Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all non-canonical modules private? #9250

Closed
amueller opened this issue Jun 29, 2017 · 30 comments
Closed

Make all non-canonical modules private? #9250

amueller opened this issue Jun 29, 2017 · 30 comments

Comments

@amueller
Copy link
Member

@amueller amueller commented Jun 29, 2017

This is maybe for 1.0. Recently we started marking files like model_selection._split private, so that everything has a single canonical import:

from sklearn.model_selection import cross_val_score

For many (older?) models that's not the case, we have

from sklearn.linear_model.logistic import LogisticRegression
from sklearn.linear_model import LogisticRegression

etc.
I think it would be nice to make all the files that are not the canonical import (according to the API documentation) private (with deprecation obviously).
That might be a bit annoying for existing users that used long import paths, but it makes auto-complete much more helpful and the module structure much less confusing for newcomers.

For example sklearn.linear_model.ridge is a module, while sklearn.linear_model.ridge_regression is a function that implements ridge regression and sklearn.linear_model.Ridge is a class that implements ridge regression. From the names this is totally unclear.

@amueller amueller added the API label Jun 29, 2017
@amueller amueller added this to the 1.0 milestone Jun 29, 2017
@jmschrei
Copy link
Member

@jmschrei jmschrei commented Jun 29, 2017

I would support this as it seems like it could clear up a lot of confusion. Do you believe there are any people directly trying to use the functions that would be made private?

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 29, 2017

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Sep 10, 2019

Since #14939 is merged, we seem to be deprecating them already for 0.22. This now seems like an easy one for some of the upcoming sprints. Although if we want it done before the end of october, we should do it ourselves maybe.

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Sep 10, 2019

Below is the list of the folders whose files need to be preceded by an underscore. The procedure is pretty straightforward:

  • rename file.py into _file.py and create a new file.py with a similar content that raises a deprecation warning (like neural_network/rbm.py)
  • update __init__.py
  • update test parametrization of sklearn/tests/test_import_deprecations.py

Folders

  • compose -- no need \o/
  • cluster
  • covariance
  • cross_decomposition
  • datasets (#15307)
  • decomposition
  • ensemble
  • experimental -- irrelevant
  • externals -- no need
  • feature_extraction (#15321)
  • gaussian_process (#15320)
  • impute -- no need
  • inspection (#15334)
  • linear_model (#15324)
  • manifold (#15318)
  • metrics (#15306)
  • metrics/cluster (#15306)
  • mixture
  • model_selection -- no need
  • neighbors (#15322)
  • neural_network -- done in #14939
  • preprocessing (#15323)
  • semi_supervised (#15317)
  • svm
  • tree
  • utils -- irrelevant since addressed by #6616 (comment)

(Note that this only addresses the folders. Files like sklearn/pipeline.py may still contain non-deprecated things.)

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Sep 10, 2019

ping @thomasjpfan @glemaitre @adrinjalali let's pick one each and review each-other? this should go relatively fast

I'll pick cluster

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Sep 11, 2019

I'm taking the tree and ensemble, since they're interrelated.

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Sep 12, 2019

I'll take preprocessing and inspection at first

@amueller
Copy link
Member Author

@amueller amueller commented Sep 18, 2019

pretty sure folks have not been using git move correctly. Try doing a single commit that does the move and commit that, before adding anything else. Your PRs don't show files as moved, which means that something went wrong.

@jnothman
Copy link
Member

@jnothman jnothman commented Sep 18, 2019

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Sep 19, 2019

I did move, as you see in this commit df7f54d but once you create a file with the same name, github gets confused.

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Oct 9, 2019

I'm working on mixture and svm

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Oct 17, 2019

Working on covariance

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Oct 17, 2019

working on cross_decomposition

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Oct 20, 2019

working on metrics and metrics/cluster

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on preprocessing

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on linear_models

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Oct 23, 2019

@thomasjpfan From the list, I only did not review the PR for the decomposition. I did not open a PR because you seem to have work on it. Could you open a PR?

@qinhanmin2014
Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Oct 28, 2019

ping @NicolasHug @thomasjpfan what about sklearn.feature_selection?

Is #15367 a blocker?

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 28, 2019

Working on feature_selection

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Oct 28, 2019

what about sklearn.feature_selection?

Thanks I missed this one

Is #15367 a blocker?

Yes but it's tracked by another issue

@adrinjalali adrinjalali added this to To do in Meeting Issues via automation Oct 29, 2019
@adrinjalali adrinjalali moved this from To do to Done in Meeting Issues Oct 29, 2019
@glemaitre glemaitre moved this from TO DO to REVIEWED AND WAITING FOR CHANGES in Guillaume's pet Oct 30, 2019
@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Oct 30, 2019
@lucasdavid
Copy link
Contributor

@lucasdavid lucasdavid commented Dec 23, 2019

I had a ton of private datasets, such as names, text and geographical data. I used to use the following utils to download and manage these:

from sklearn.datasets.base import (get_data_home, RemoteFileMetadata, _fetch_remote)

What's the recommended approach now?

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Dec 23, 2019

you can import from sklearn.datasets._base import ...

However, now you are aware that _base will be private and you might have to change your code at a new scikit-learn release since we might not provide backward-compatible code.

This said, be aware that we are going to define a developer API which will define some backward support for these types of utilities used in third-party.

sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Andy's pets
Awaiting triage
Development

No branches or pull requests

9 participants