Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all non-canonical modules private? #9250

Closed
amueller opened this issue Jun 29, 2017 · 30 comments
Closed

Make all non-canonical modules private? #9250

amueller opened this issue Jun 29, 2017 · 30 comments

Comments

@amueller
Copy link
Member

@amueller amueller commented Jun 29, 2017

This is maybe for 1.0. Recently we started marking files like model_selection._split private, so that everything has a single canonical import:

from sklearn.model_selection import cross_val_score

For many (older?) models that's not the case, we have

from sklearn.linear_model.logistic import LogisticRegression
from sklearn.linear_model import LogisticRegression

etc.
I think it would be nice to make all the files that are not the canonical import (according to the API documentation) private (with deprecation obviously).
That might be a bit annoying for existing users that used long import paths, but it makes auto-complete much more helpful and the module structure much less confusing for newcomers.

For example sklearn.linear_model.ridge is a module, while sklearn.linear_model.ridge_regression is a function that implements ridge regression and sklearn.linear_model.Ridge is a class that implements ridge regression. From the names this is totally unclear.

@amueller amueller added the API label Jun 29, 2017
@jmschrei

This comment has been minimized.

Copy link
Member

@jmschrei jmschrei commented Jun 29, 2017

I would support this as it seems like it could clear up a lot of confusion. Do you believe there are any people directly trying to use the functions that would be made private?

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Jun 29, 2017

@adrinjalali

This comment has been minimized.

Copy link
Member

@adrinjalali adrinjalali commented Sep 10, 2019

Since #14939 is merged, we seem to be deprecating them already for 0.22. This now seems like an easy one for some of the upcoming sprints. Although if we want it done before the end of october, we should do it ourselves maybe.

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Sep 10, 2019

Below is the list of the folders whose files need to be preceded by an underscore. The procedure is pretty straightforward:

  • rename file.py into _file.py and create a new file.py with a similar content that raises a deprecation warning (like neural_network/rbm.py)
  • update __init__.py
  • update test parametrization of sklearn/tests/test_import_deprecations.py

Folders

  • compose -- no need \o/
  • cluster
  • covariance
  • cross_decomposition
  • datasets (#15307)
  • decomposition
  • ensemble
  • experimental -- irrelevant
  • externals -- no need
  • feature_extraction (#15321)
  • gaussian_process (#15320)
  • impute -- no need
  • inspection (#15334)
  • linear_model (#15324)
  • manifold (#15318)
  • metrics (#15306)
  • metrics/cluster (#15306)
  • mixture
  • model_selection -- no need
  • neighbors (#15322)
  • neural_network -- done in #14939
  • preprocessing (#15323)
  • semi_supervised (#15317)
  • svm
  • tree
  • utils -- irrelevant since addressed by #6616 (comment)

(Note that this only addresses the folders. Files like sklearn/pipeline.py may still contain non-deprecated things.)

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Sep 10, 2019

ping @thomasjpfan @glemaitre @adrinjalali let's pick one each and review each-other? this should go relatively fast

I'll pick cluster

@adrinjalali

This comment has been minimized.

Copy link
Member

@adrinjalali adrinjalali commented Sep 11, 2019

I'm taking the tree and ensemble, since they're interrelated.

@glemaitre

This comment has been minimized.

Copy link
Contributor

@glemaitre glemaitre commented Sep 12, 2019

I'll take preprocessing and inspection at first

@amueller

This comment has been minimized.

Copy link
Member Author

@amueller amueller commented Sep 18, 2019

pretty sure folks have not been using git move correctly. Try doing a single commit that does the move and commit that, before adding anything else. Your PRs don't show files as moved, which means that something went wrong.

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Sep 18, 2019

@adrinjalali

This comment has been minimized.

Copy link
Member

@adrinjalali adrinjalali commented Sep 19, 2019

I did move, as you see in this commit df7f54d but once you create a file with the same name, github gets confused.

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Oct 9, 2019

I'm working on mixture and svm

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Oct 17, 2019

Working on covariance

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Oct 17, 2019

working on cross_decomposition

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Oct 20, 2019

working on metrics and metrics/cluster

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 20, 2019

I'll work on datasets

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on neighbors

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on gaussian_process

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on feature_extraction

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on inspection

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on preprocessing

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 21, 2019

Working on linear_models

@glemaitre

This comment has been minimized.

Copy link
Contributor

@glemaitre glemaitre commented Oct 23, 2019

@thomasjpfan From the list, I only did not review the PR for the decomposition. I did not open a PR because you seem to have work on it. Could you open a PR?

@qinhanmin2014

This comment has been minimized.

Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Oct 28, 2019

ping @NicolasHug @thomasjpfan what about sklearn.feature_selection?

Is #15367 a blocker?

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Oct 28, 2019

Working on feature_selection

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Oct 28, 2019

what about sklearn.feature_selection?

Thanks I missed this one

Is #15367 a blocker?

Yes but it's tracked by another issue

@adrinjalali adrinjalali added the Blocker label Oct 28, 2019
@adrinjalali adrinjalali added this to To do in Meeting Issues via automation Oct 29, 2019
@adrinjalali adrinjalali moved this from To do to Done in Meeting Issues Oct 29, 2019
@glemaitre glemaitre moved this from TO DO to REVIEWED AND WAITING FOR CHANGES in Guillaume's pet Oct 30, 2019
@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Oct 30, 2019
@lucasdavid

This comment has been minimized.

Copy link
Contributor

@lucasdavid lucasdavid commented Dec 23, 2019

I had a ton of private datasets, such as names, text and geographical data. I used to use the following utils to download and manage these:

from sklearn.datasets.base import (get_data_home, RemoteFileMetadata, _fetch_remote)

What's the recommended approach now?

@glemaitre

This comment has been minimized.

Copy link
Contributor

@glemaitre glemaitre commented Dec 23, 2019

you can import from sklearn.datasets._base import ...

However, now you are aware that _base will be private and you might have to change your code at a new scikit-learn release since we might not provide backward-compatible code.

This said, be aware that we are going to define a developer API which will define some backward support for these types of utilities used in third-party.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Andy's pets
Awaiting triage
9 participants
You can’t perform that action at this time.