
[MRG + 1] ENH add check_inverse in FunctionTransformer #9399

Merged
merged 27 commits into scikit-learn:master on Oct 25, 2017

Conversation

4 participants
@glemaitre
Contributor

glemaitre commented Jul 18, 2017

Reference Issue

We mentioned in #9041 that it would be nice to port the check_inverse into FunctionTransformer.

What does this implement/fix? Explain your changes.

Perform the check in the fit method.

Any other comments?
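For illustration, a minimal usage sketch, assuming the keyword ends up named check_inverse as in the title (func and inverse_func are the existing parameters):

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer

    # With check_inverse=True, fit() verifies on a subsample of X that
    # inverse_func(func(X)) round-trips back to X, and warns otherwise.
    transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1,
                                      check_inverse=True)
    X = np.abs(np.random.RandomState(0).randn(100, 2))
    Xt = transformer.fit_transform(X)  # no warning: expm1 inverts log1p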

@jnothman

Just some thoughts on when this might be too restrictive

[Three outdated review comments on sklearn/preprocessing/_function_transformer.py]
@glemaitre
Contributor

glemaitre commented Jul 19, 2017

@jnothman I used the assert_allclose_dense_sparse, which looks particularly ugly. Is it good enough, or should we move the base code into utils.allclose_dense_sparse?

Also, it seems that safe_indexing does not handle COO matrices. Is it worth mentioning in the docstring?
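For context, a rough sketch of what such a shared utils helper could look like (name, signature, and tolerances here are illustrative, not the final API):

    import numpy as np
    from scipy import sparse

    def allclose_dense_sparse(x, y, rtol=1e-7, atol=1e-9):
        """Illustrative sketch: element-wise closeness of two dense
        arrays or two sparse matrices."""
        if sparse.issparse(x) and sparse.issparse(y):
            # Compare through the difference; an empty .data array
            # (identical matrices) makes np.allclose vacuously True.
            diff = (x - y).tocsr()
            return np.allclose(diff.data, 0, rtol=rtol, atol=atol)
        if not sparse.issparse(x) and not sparse.issparse(y):
            return np.allclose(x, y, rtol=rtol, atol=atol)
        raise ValueError("Can only compare two dense or two sparse inputs.")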

@jnothman
Member

jnothman commented Jul 19, 2017

@amueller Looks good apart from prints and possible use of resample.

[Outdated review comments on sklearn/preprocessing/_function_transformer.py and sklearn/preprocessing/tests/test_function_transformer.py]

@amueller amueller changed the title from [MRG] EHN add check_inverse in FunctionTransformer to [MRG + 1] EHN add check_inverse in FunctionTransformer Jul 21, 2017

@amueller
Member

amueller commented Jul 21, 2017

LGTM apart from the PEP8 break.

@jorisvandenbossche

Added some comments.

One further thing: I was wondering whether the random_state kwarg is worth adding. It's used by check_inverse, so if for some reason you need to control it, I agree you need this keyword, but it feels a bit like adding 'noise' to the class in most cases.

[Four outdated review comments on sklearn/preprocessing/_function_transformer.py]
@glemaitre
Contributor

glemaitre commented Jul 27, 2017

@amueller @jnothman what are your thoughts regarding random_state?
I have the same concerns as @jorisvandenbossche, so I would need advice on the way to go.

@jnothman
Member

jnothman commented Jul 27, 2017

Drop random_state. Make it deterministic. An evenly-spaced slice X[::max(1, X.shape[0] // 100)], perhaps (good because it avoids copying the array, bad because in some cases it will only take a prefix of the data)?
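Concretely, that deterministic subsample behaves like this (a sketch, not the merged code):

    import numpy as np

    X = np.arange(1000).reshape(250, 4)
    # Evenly spaced rows, roughly 100 of them; for ndarrays this is a
    # view, so nothing is copied. With fewer than 200 rows the step is
    # 1 and every row is kept.
    X_subsample = X[::max(1, X.shape[0] // 100)]
    print(X_subsample.shape)  # (125, 4): the step is 250 // 100 == 2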

@jnothman
Member

jnothman commented Jul 27, 2017

My main concern with this PR is that users aren't going to use the feature if it's not the default. Yes, we might internally. Should we eventually make it the default behaviour? But doing so also complicates what's meant to be a simple utility.

@glemaitre
Contributor

glemaitre commented Jul 27, 2017

@jnothman If we make it the default it will not be backward-compatible, will it? (We might need to deprecate the current behaviour.) As a default it would make sense, since the majority should want this behaviour.

Drop random_state. Make it deterministic.

That makes sense to me.

bad because in some cases it will only take a prefix of the data

I'm not sure what you mean by "take a prefix of the data". If it is what I am thinking of, I am not sure that random sampling would make it more robust.

@amueller
Member

amueller commented Jul 27, 2017

👍 for making it deterministic.

@amueller
Member

amueller commented Jul 27, 2017

The test failure is related.

@codecov
codecov bot commented Jul 31, 2017

Codecov Report

Merging #9399 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #9399      +/-   ##
==========================================
+ Coverage   96.19%   96.19%   +<.01%     
==========================================
  Files         335      335              
  Lines       61800    61825      +25     
==========================================
+ Hits        59448    59473      +25     
  Misses       2352     2352

Impacted Files                                          Coverage Δ
sklearn/preprocessing/_function_transformer.py          97.95% <100%> (+0.66%) ⬆️
...n/preprocessing/tests/test_function_transformer.py   100% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 02a31f0...f3c0d10.

@glemaitre
Contributor

glemaitre commented Jul 31, 2017

@jnothman do you see any other changes?

@amueller
Member

amueller commented Aug 2, 2017

LGTM (still), apart from checking that no warning is raised in the good case.

glemaitre added some commits Aug 2, 2017

@glemaitre
Contributor

glemaitre commented Aug 17, 2017

@jnothman Could you put this PR in your review queue?

@jnothman
Member

jnothman commented Aug 17, 2017

Believe it or not, the current test suite doesn't appear to include a case of actually fitting a FunctionTransformer where inverse_func is not provided.

@jnothman
Member

jnothman commented on sklearn/preprocessing/_function_transformer.py in 677cd2a Aug 17, 2017

Needs testing.

@glemaitre
Contributor

glemaitre replied Aug 17, 2017

True. Actually, we should make it self.func is not None or self.inverse_func is not None.
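In other words, the check would be skipped when either direction is missing. A self-contained sketch of that guard (the helper name and warning text are illustrative, not the merged diff):

    import warnings
    import numpy as np

    def maybe_check_inverse(func, inverse_func, X, check_inverse=True):
        """Sketch: only verify the round trip when both directions are
        provided (a missing side defaults to the identity)."""
        if not check_inverse or func is None or inverse_func is None:
            return
        # Deterministic subsample, as agreed earlier in the thread.
        X_subset = X[::max(1, X.shape[0] // 100)]
        if not np.allclose(inverse_func(func(X_subset)), X_subset):
            warnings.warn("The provided functions are not strictly"
                          " inverse of each other.", UserWarning)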

@jnothman

Otherwise LGTM

[Outdated review comment on sklearn/preprocessing/_function_transformer.py]
@@ -610,6 +610,9 @@ a transformer that applies a log transformation in a pipeline, do::
array([[ 0.        ,  0.69314718],
       [ 1.09861229,  1.38629436]])
You can ensure that ``func`` and ``inverse_func`` are the inverse of each other

@jnothman
Member

jnothman commented Aug 17, 2017

Please note that a warning is raised and can be turned into an error with: (give a simplefilter example which tests the message)
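Along the lines of this sketch (the exact warning message is an assumption here and may differ from the merged text):

    import warnings
    import numpy as np
    from sklearn.preprocessing import FunctionTransformer

    transformer = FunctionTransformer(func=np.log1p, inverse_func=np.sqrt,
                                      check_inverse=True)
    X = np.abs(np.random.RandomState(0).randn(50, 2))
    with warnings.catch_warnings():
        # Escalate only the inverse-check warning to an error, matched
        # against the start of its message.
        warnings.filterwarnings("error",
                                message="The provided functions are not"
                                        " strictly inverse")
        transformer.fit(X)  # raises: sqrt does not invert log1p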

@glemaitre
Contributor

glemaitre commented Aug 21, 2017

@jnothman so it should be ready to be merged this time :)

@jnothman

Sorry, these should've been included in the previous batch of comments, but GitHub must've glitched.

[Outdated review comment on doc/modules/preprocessing.rst]
@glemaitre
Contributor

glemaitre commented Aug 31, 2017

@jnothman this is corrected

@jnothman jnothman changed the title from [MRG + 1] EHN add check_inverse in FunctionTransformer to [MRG + 1] ENG add check_inverse in FunctionTransformer Sep 1, 2017

@jnothman jnothman changed the title from [MRG + 1] ENG add check_inverse in FunctionTransformer to [MRG + 1] ENH add check_inverse in FunctionTransformer Sep 1, 2017

@jnothman
Member

jnothman commented Sep 1, 2017

I'm fine with this, but @amueller should confirm his +1 since we changed it to always warn, rather than having a deprecated behaviour.

@jnothman
Member

jnothman commented Sep 1, 2017

And it's still possible we should make it a custom warning class so filtering is easy. But I'm ambivalent.
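For instance (a hypothetical class; nothing like it was added in this PR):

    import warnings

    class InverseCheckWarning(UserWarning):
        """Hypothetical warning for a failed func/inverse_func check."""

    # Filtering by class needs no message matching:
    warnings.simplefilter("error", InverseCheckWarning)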

glemaitre added some commits Oct 25, 2017

@glemaitre
Contributor

glemaitre commented Oct 25, 2017

@lesteve @amueller Can you take a look at this PR?

@amueller amueller merged commit 8472350 into scikit-learn:master Oct 25, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.17%)
codecov/project 96.17% (+<.01%) compared to f84581b
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
lgtm analysis: Python No alert changes
@amueller
Member

amueller commented Oct 25, 2017

thanks!

donigian added a commit to donigian/scikit-learn that referenced this pull request Oct 28, 2017

Merge branch 'master' of github.com:scikit-learn/scikit-learn into docs/donigian-update-contribution-guidelines

* 'master' of github.com:scikit-learn/scikit-learn: (23 commits)
  fixes scikit-learn#10031: fix attribute name and shape in documentation (scikit-learn#10033)
  [MRG+1] add changelog entry for fixed and merged PR scikit-learn#10005 issue scikit-learn#9633 (scikit-learn#10025)
  [MRG] Fix LogisticRegression see also should include LogisticRegressionCV(scikit-learn#9995) (scikit-learn#10022)
  [MRG + 1] Labels of clustering should start at 0 or -1 if noise (scikit-learn#10015)
  MAINT Remove redundancy in scikit-learn#9552 (scikit-learn#9573)
  [MRG+1] correct comparison in GaussianNB for 'priors' (scikit-learn#10005)
  [MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399)
  [MRG] FIX bug in nested set_params usage (scikit-learn#9999)
  [MRG+1] Fix LOF and Isolation benchmarks (scikit-learn#9798)
  [MRG + 1] Fix negative inputs checking in mean_squared_log_error (scikit-learn#9968)
  DOC Fix typo (scikit-learn#9996)
  DOC Fix typo: x axis -> y axis (scikit-learn#9985)
  improve example plot_forest_iris.py (scikit-learn#9989)
  [MRG+1] Deprecate pooling_func unused parameter in AgglomerativeClustering (scikit-learn#9875)
  DOC update news
  DOC Fix three typos in manifold documentation (scikit-learn#9990)
  DOC add missing dot in docstring
  DOC Add what's new for 0.19.1 (scikit-learn#9983)
  Improve readability of outlier detection example. (scikit-learn#9973)
  DOC: Fixed typo (scikit-learn#9977)
  ...

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399)

* EHN add check_inverse in FunctionTransformer

* Add whats new entry and short narrative doc

* Sparse support

* better handle sparse data

* Address andreas comments

* PEP8

* Absolute tolerance default

* DOC fix docstring

* Remove random state and make check_inverse deterministic

* FIX remove random_state from init

* PEP8

* DOC motivation for the inverse

* make check_inverse=True default with a warning

* PEP8

* FIX get back X from check_array

* Andread comments

* Update whats new

* remove blank line

* joel s comments

* no check if one of forward or inverse not provided

* DOC fixes and example of filterwarnings

* DOC fix warningfiltering

* DOC fix merge error git

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399)

* EHN add check_inverse in FunctionTransformer

* Add whats new entry and short narrative doc

* Sparse support

* better handle sparse data

* Address andreas comments

* PEP8

* Absolute tolerance default

* DOC fix docstring

* Remove random state and make check_inverse deterministic

* FIX remove random_state from init

* PEP8

* DOC motivation for the inverse

* make check_inverse=True default with a warning

* PEP8

* FIX get back X from check_array

* Andread comments

* Update whats new

* remove blank line

* joel s comments

* no check if one of forward or inverse not provided

* DOC fixes and example of filterwarnings

* DOC fix warningfiltering

* DOC fix merge error git