Skip to content

FIX support read only view for tree input #16331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 1, 2020

Conversation

Batalex
Copy link
Contributor

@Batalex Batalex commented Jan 31, 2020

Reference Issues/PRs

Fixes #15851.

What does this implement/fix? Explain your changes.

According to https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#read-only-views

Since Cython 0.28, the memoryview item type can be declared as const to support read-only buffers as input

This should be fine with scikit-learn since the minimal Cython version supported is 28.5. We could also check the input array for the writeable flag and make a copy instead of a view if needed but I figured this solution would be better.

Any other comments?

The benchmarks now work as expected.

python benchmarks\bench_mnist.py --classifiers RandomForest ExtraTrees CART
[...]
Classification performance:
===========================
Classifier               train-time   test-time   error-rate
------------------------------------------------------------
ExtraTrees                   45.16s       0.45s       0.0294
RandomForest                 40.51s       0.39s       0.0295
CART                         21.61s       0.06s       0.1221

2020 Paris Sprint

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do the same in _decision_path_dense otherwise this sounds good to me.

Please add a unit tests that check that trees work with read-only array as input (you can use create_memmap_backed_data to make one).

Also please add a what's new entry to doc/whats_new/v0.23.rst indicating that the corresponding tree methods can now be used with joblib based multiprocessing (or @jeremiedbb might suggest a better formulation).

@rth rth changed the title ENH use read only view for tree input FIX support read only view for tree input Jan 31, 2020
@Batalex
Copy link
Contributor Author

Batalex commented Jan 31, 2020

I believe we can remove the test test_decision_tree_memmap https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tests/test_tree.py#L1826. It should be covered by the new ones and it will remove an import.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Batalex, LGTM.

I pushed a small fix to improve the what's new entry given new information from #15851 (comment) and also removed test_decision_tree_memmap as you suggested since it is indeed redundant.

Will merge on green CI.

:class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
:class:`ensemble.GradientBoostingRegressor`.
:pr:`16331` by :user:`Alexandre Batisse <batalex>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of affected estimators is obtained in #16359

@rth rth merged commit 0db80b9 into scikit-learn:master Feb 1, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ValueError: buffer source array is read-only when running bench_mnist.py
4 participants