Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Support read-only sparse datasets for Tree-based estimators #25341

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Jan 9, 2023

Reference Issues/PRs

Fixes #25333.
Fixes #25247.

What does this implement/fix? Explain your changes.

In some workflows using DecisionTree* (this include RandomForest* and ExtraTrees*):

  • users might provide NumPy arrays with read-only buffers
  • joblib might memmap arrays making their buffer read-only.

Those cases are correctly supported when using dense datasets thanks to const-qualification of memoryview in Cython. Yet those cases aren't currently supported when using sparse datasets (for instance when X is provided as a CSC or as a CSR matrix).

const-qualifying CSC or CSR buffers (i.e. data, indices and indptr) in BaseSparseSplitter allows supporting those cases when using sparse datasets.

@jjerphan jjerphan changed the title FIX const-qualify BaseSparseSplitter attributes of CSR data FIX const-qualify BaseSparseSplitter attributes of CSR or CSC data Jan 9, 2023
@jjerphan jjerphan added the Quick Review For PRs that are quick to review label Jan 10, 2023
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jjerphan jjerphan changed the title FIX const-qualify BaseSparseSplitter attributes of CSR or CSC data FIX Support read-only sparse datasets for Tree-based estimators Jan 11, 2023
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! There is a merge conflict with main to be resolved.

sklearn/ensemble/tests/test_forest.py Outdated Show resolved Hide resolved
doc/whats_new/v1.3.rst Outdated Show resolved Hide resolved
@thomasjpfan thomasjpfan added this to the 1.2.1 milestone Jan 11, 2023
jjerphan and others added 3 commits January 12, 2023 09:17
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
doc/whats_new/v1.3.rst Outdated Show resolved Hide resolved
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

doc/whats_new/v1.2.rst Show resolved Hide resolved
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@jjerphan jjerphan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jan 12, 2023
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit c3fca81 into scikit-learn:main Jan 12, 2023
@jjerphan jjerphan deleted the fix/const-qualify-X-csr-in-BaseSparseSplitter branch January 12, 2023 17:57
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
…5341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug cython module:ensemble module:tree Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read only buffer in cross_val_score with sparse matrix. ValueError: buffer source array is read-only
4 participants