Clone estimator for each parameter value in validation_curve #7365

Closed
wants to merge 762 commits into
from

Conversation

Projects
None yet
@Sundrique
Contributor

Sundrique commented Sep 8, 2016

In validation_curve function an estimator is not cloned like in learning_curve or GridSearchCV.fit for each value of a parameter. Thus it comes pre-trained to each iteration but very first.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 8, 2016

Member

thanks for the fix. Do you think you can create a non-regression test?

Member

amueller commented Sep 8, 2016

thanks for the fix. Do you think you can create a non-regression test?

@Sundrique

This comment has been minimized.

Show comment
Hide comment
@Sundrique

Sundrique Sep 8, 2016

Contributor

I think, I can if it can wait a couple of days.

Contributor

Sundrique commented Sep 8, 2016

I think, I can if it can wait a couple of days.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 8, 2016

Member

sure, no worries :)

Member

amueller commented Sep 8, 2016

sure, no worries :)

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Sep 21, 2016

Member

I find your mock estimator a bit of an obtuse way to test this. How about a class inheriting from something already in scikit-learn (even a dummy estimator) that simply raises an error if fit is called on it multiple times?

    def fit(self, X, y):
        assert_false(hasattr(self, 'fit_called_'))
        self.fit_called_ = True
        super(type(self), self).fit(X, y)
Member

jnothman commented Sep 21, 2016

I find your mock estimator a bit of an obtuse way to test this. How about a class inheriting from something already in scikit-learn (even a dummy estimator) that simply raises an error if fit is called on it multiple times?

    def fit(self, X, y):
        assert_false(hasattr(self, 'fit_called_'))
        self.fit_called_ = True
        super(type(self), self).fit(X, y)
@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Sep 21, 2016

Member

Please also make sure that the lines you change in this PR follow the PEP8 conventions (travis is read because of this):

https://travis-ci.org/scikit-learn/scikit-learn/jobs/160421906#L457

Member

ogrisel commented Sep 21, 2016

Please also make sure that the lines you change in this PR follow the PEP8 conventions (travis is read because of this):

https://travis-ci.org/scikit-learn/scikit-learn/jobs/160421906#L457

@Sundrique

This comment has been minimized.

Show comment
Hide comment
@Sundrique

Sundrique Sep 23, 2016

Contributor

Thanks @jnothman, @ogrisel. Will fix.

Contributor

Sundrique commented Sep 23, 2016

Thanks @jnothman, @ogrisel. Will fix.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Sep 26, 2016

Member

Actually, this change needs to be in model_selection/_validation.py, although I think it's a good idea to backport it to the deprecated module you've currently edited. Otherwise, LGTM. (And I confirm the test fails at master.)

Member

jnothman commented Sep 26, 2016

Actually, this change needs to be in model_selection/_validation.py, although I think it's a good idea to backport it to the deprecated module you've currently edited. Otherwise, LGTM. (And I confirm the test fails at master.)

@amueller amueller added this to the 0.19 milestone Oct 27, 2016

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 14, 2017

Member

This only needs a little fixing up to be merged and included in the upcoming release. @Sundrique, are you onto it? Please also add an entry to doc/whats_new.rst

Member

jnothman commented Jun 14, 2017

This only needs a little fixing up to be merged and included in the upcoming release. @Sundrique, are you onto it? Please also add an entry to doc/whats_new.rst

@Sundrique

This comment has been minimized.

Show comment
Hide comment
@Sundrique

Sundrique Jun 14, 2017

Contributor

@jnothman Yes, I am. Will add into model_selection/_validation.py, resolve the conflict and add an entry to doc/whats_new.rst. Please, let me know if anything else is needed.

Contributor

Sundrique commented Jun 14, 2017

@jnothman Yes, I am. Will add into model_selection/_validation.py, resolve the conflict and add an entry to doc/whats_new.rst. Please, let me know if anything else is needed.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 14, 2017

Member
Member

jnothman commented Jun 14, 2017

mathurinm and others added some commits Mar 2, 2017

[MRG+1] BUG: fix svd_solver validation in PCA.fit (#8496)
* BUG: fix svd_solver validation in PCA.fit

* TST: add test of pca svd_solver
[MRG] Modify Svm examples for matplotlibv2 comp (#8456)
* Modify plot_custom_kernel for matplotlib v2 comp

Add `edgecolors` attribute in scatter plot for better visualization
in matplotlib version 2

Issue: #8364

* Modify plot_oneclass.py for matplotlib v2 comp

Add `edgecolors` attribute to scatter plot for better
visualization in matplotlib version 2

Issue: #8364

* Modify plot_rbf_parameters for matplotlib v2

Add `edgecolors` attribute to scatter plot for
better visualization.

Issue: #8364

* Modify plot_separating_hyperplane_unbalanced for matplotlib v2

Add `edgecolors` attribute to scatter plot for better visualization.

Issue: #8364

* Modify plo_svm_kernels for matplotlib v2

Add `edgecolors` attribute to scatter plot for better
visualization.

Issue: #8364

* Modify plot_svm_margin for matplotlib v2 comp

Add `edgecolors` attribute to scatter plot for better
visualization.

Issue: #8364

* Modify plot_svm_nonlinear for matplotlib v2

Add `edgecolors` attribute to scatter plot for matplotlib
version 2 compatibility

Issue: #8364

* Modify file for remove flake8 error

Remove extra white space.

Issue: #8364
add html-noplot and changed help message to make.bat (#8524)
* add html-noplot and help message to make.bat

* changed spaces to tab in make.bat help

* changed all spaces to tabs in make.bat update
[MRG+1] Repeated K-Fold and Repeated Stratified K-Fold (#8120)
* Add _RepeatedSplits and RepeatedKFold class

* Add RepeatedStratifiedKFold and doc for repeated cvs

* Change default value of n_repeats

* Change input parameters of repeated cv constructor to n_splits, n_repeats, random_state

* Generate random states in split function rather than store it beforehand

* Doc changes, inheriting RepeatedKFold, RepeatedStratifiedKFold from _RepeatedSplits and other review changes

* Remove blank line, put testcases for deterministic split in loop and add StopIteration check in testcase

* Using rng directly as random_state param to create cv instance and added a check for cvargs

* Fix pep8 warnings

* Changing default values for n_splits and n_repeats and add entry in changelog

* Adding name to the feature

* Missing space
modify disadvantage (#8521)
[MRG+2] modify disadvantage
[MRG] Separated regression metrics from other metrics in test_sample_…
…weight_invariance in metrics/tests/test_common.py (#8537)

* Separated tests for regression features in test_sample_weight_invariance

* Fixed pep8

* Removed unecessary check for regression

* Updated regression metrics

* Joel's suggestions
[MRG+2] addresses #8509 improvements to f_regression documentation (#…
…8548)

* clarify role of the function and streamline introduction

* added feature selection methods to see also

* completed see also

* fixed pep related formatting for flake8checks.

* fixed extra whitespace flake8 problems, remaining failure is a copied see all line from another function, the line is over by a period, does not make sense to newline that.

* one more whitespace

* FIX small pep8 error.
[MRG] Update joblib to 0.11 (#8492)
Use pip rather than easy_install in copy_joblib.sh. Also need to remove joblib/testing.py to avoid pytest dependency.
[MRG+2] use manylinux dev wheels for numpy / scipy (#8536)
* MAINT: use manylinux dev wheels for numpy / scipy

Use daily manylinux wheels for numpy and scipy, instead of
soon-to-be-discontinued per-commit Precise wheels.

* BF: add back ATLAS install for ubuntu build entry

scikit-learn can link against BLAS libraries still at the same location
as they were duing numpy build.
[MRG + 1] Fix gradient boosting overflow and various other float comp…
…arison on == (#7970)

* reintroduced isclose() and flake8 fixes to fixes.py

* changed == 0.0 to isclose(...)

* example changes

* changed back to abs() < epsilon

* flake8 convention on file

* reverted flake8 fixes

* reverted flake8 fixes (2)

* np.finfo(np.float32).tiny instead of hard coded epsilon 1e-150

* reverted to 1e-150

* whats new modified
[MRG+1] add edgecolor to plot_pca_iris.py (#8514)
for better rendering with matplotlib 2
[MRG + 1] Correct typo in cross decomposition example (Fixes #8307) (#…
…8578)

* changed plsca to cca

* corrected variable plsca in line 56-57

vene and others added some commits Jun 9, 2017

[MRG+1] Instance level common tests (#9019)
* start work on separating instance-level tests

* minor refactoring / fixes to work without tags

* add clone into check_supervised_y_2d estimator check (which made other checks fail)

* remove duplicate check_estimator_unfitted assert

* add issue reference to whatsnew entry

* added some clones, minor fixes from vene's review

* rename estimator arg to estimator_org to make a visible distinction before and after cloning.

* more renaming for more explicit clones

* org -> orig

* allclose, fix orig stuff

* don't use set_testing_parameters in the checks!

* minor fixes for allclose

* fix some test, add more tests on classes

* added the test using pickles.

* move assert_almost_equal_dense_sparse to utils.testing, rename to assert_allclose_sparse_dense, test it

* make assert_allclose_dense_sparse more stringent

* more allclose fixes

* run test_check_estimator on all estimators

* rename set_testing_parameters to set_checking_parameters so nose doesn't think it's a tests (and I don't want to import stuff from nose as we want to remove it)

* fix in set_checking_parameters so that common tests pass

* more fixes to assert_allclose_dense_sparse

* rename alg to clusterer, don't scream even though I really want to

* ok this is not a pretty strict test that runs check_estimator with and without fitting on an instance. I also check if ``fit`` is called on the instance that is passed.

* simplify test as they didn't help at all

* it works!!! omfg

* run check_estimator clone test only on one of the configs, don't run locally by default

* Add `slow_test` decorator and documentation

* run test_check_estimator only on some estimators

* fix diags in test for older scipy

* fix pep8 and shorten

* use joblib.hash for inequality check because the pickle state machine is weird
[MRG+2] clean outlier_detection.py (#9018)
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
[MRG+1] QuantileTransformer (#8363)
* resurrect quantile scaler

* move the code in the pre-processing module

* first draft

* Add tests.

* Fix bug in QuantileNormalizer.

* Add quantile_normalizer.

* Implement pickling

* create a specific function for dense transform

* Create a fit function for the dense case

* Create a toy examples

* First draft with sparse matrices

* remove useless functions and non-negative sparse compatibility

* fix slice call

* Fix tests of QuantileNormalizer.

* Fix estimator compatibility

* List of functions became tuple of functions
* Check X consistency at transform and inverse transform time

* fix doc

* Add negative ValueError tests for QuantileNormalizer.

* Fix cosmetics

* Fix compatibility numpy <= 1.8

* Add n_features tests and correct ValueError.

* PEP8

* fix fill_value for early scipy compatibility

* simplify sampling

* Fix tests.

* removing last pring

* Change choice for permutation

* cosmetics

* fix remove remaining choice

* DOC

* Fix inconsistencies

* pep8

* Add checker for init parameters.

* hack bounds and make a test

* FIX/TST bounds are provided by the fitting and not X at transform

* PEP8

* FIX/TST axis should be <= 1

* PEP8

* ENH Add parameter ignore_implicit_zeros

* ENH match output distribution

* ENH clip the data to avoid infinity due to output PDF

* FIX ENH restraint to uniform and norm

* [MRG] ENH Add example comparing the distribution of all scaling preprocessor (#2)

* ENH Add example comparing the distribution of all scaling preprocessor

* Remove Jupyter notebook convert

* FIX/ENH Select feat before not after; Plot interquantile data range for all

* Add heatmap legend

* Remove comment maybe?

* Move doc from robust_scaling to plot_all_scaling; Need to update doc

* Update the doc

* Better aesthetics; Better spacing and plot colormap only at end

* Shameless author re-ordering ;P

* Use env python for she-bang

* TST Validity of output_pdf

* EXA Use OrderedDict; Make it easier to add more transformations

* FIX PEP8 and replace scipy.stats by str in example

* FIX remove useless import

* COSMET change variable names

* FIX change output_pdf occurence to output_distribution

* FIX partial fixies from comments

* COMIT change class name and code structure

* COSMIT change direction to inverse

* FIX factorize transform in _transform_col

* PEP8

* FIX change the magic 10

* FIX add interp1d to fixes

* FIX/TST allow negative entries when ignore_implicit_zeros is True

* FIX use np.interp instead of sp.interpolate.interp1d

* FIX/TST fix tests

* DOC start checking doc

* TST add test to check the behaviour of interp numpy

* TST/EHN Add the possibility to add noise to compute quantile

* FIX factorize quantile computation

* FIX fixes issues

* PEP8

* FIX/DOC correct doc

* TST/DOC improve doc and add random state

* EXA add examples to illustrate the use of smoothing_noise

* FIX/DOC fix some grammar

* DOC fix example

* DOC/EXA make plot titles more succint

* EXA improve explanation

* EXA improve the docstring

* DOC add a bit more documentation

* FIX advance review

* TST add subsampling test

* DOC/TST better example for the docstring

* DOC add ellipsis to docstring

* FIX address olivier comments

* FIX remove random_state in sparse.rand

* FIX spelling doc

* FIX cite example in user guide and docstring

* FIX olivier comments

* EHN improve the example comparing all the pre-processing methods

* FIX/DOC remove title

* FIX change the scaling of the figure

* FIX plotting layout

* FIX ratio w/h

* Reorder and reword the plot_all_scaling example

* Fix aspect ratio and better explanations in the plot_all_scaling.py example

* Fix broken link and remove useless sentence

* FIX fix couples of spelling

* FIX comments joel

* FIX/DOC address documentation comments

* FIX address comments joel

* FIX inline sparse and dense transform

* PEP8

* TST/DOC temporary skipping test

* FIX raise an error if n_quantiles > subsample

* FIX wording in smoothing_noise example

* EXA Denis comments

* FIX rephrasing

* FIX make smoothing_noise to be a boolearn and change doc

* FIX address comments

* FIX verbose the doc slightly more

* PEP8/DOC

* ENH: 2-ways interpolation to avoid smoothing_noise

Simplifies also the code, examples, and documentation
[MRG+1] DOC add hyperlink to example (#9097)
* DOC add hyperlink to example

* Remove useless change

* DOC fix hyperlink

* DOC fix links
[MRG] Deprecate lsh forest (#9078)
* Add deprecation message and test.

* Adding performance warning and ignore_warnings in test

* Add deprecation to whatsnew and remove LSHForest references from docs.

Removing benchmark for lsh
[MRG+1] remove n_nonzero_coefs from attr of LassoLarsCV + clean up ca…
…ll hierarchy (#9004)

* FIX : remove n_nonzero_coefs from attr of LassoLarsCV + clean up call to Lars._fit

* cleanup

* fix deprecation warning + clarify warning

* add test

* pep8

* adddress comments
FIX OvR/OvO classifier decision_function shape fixes (#9100)
* fix OVR classifier edgecase bugs

* add regression tests for OVO and OVR decision function shapes
[MRG + 1] FIX/TST revert #5802 and raise error for faulty classifier (#…
…9063)

* FIX/TST revert #5802 and raise error for faulty classifier

* FIX check_estimator take care of the rest
[FIX] BIC/AIC for Lasso (#9022)
* correcting information criterion calculation in least_angle.py

The information criterion calculation is not compatible with the original paper 
Zou, Hui, Trevor Hastie, and Robert Tibshirani. "On the “degrees of freedom” of the lasso." The Annals of Statistics 35.5 (2007): 2173-2192.
APA

* FIX : fix AIC/BIC computation in LassoLarsIC

* update what's new

* fix test

* fix test

* address comments

* DOC comments and docstring on criterion computation
Disable pager in git commands
Long commit messages can trigger a pager which is not what you want
when running flake8_diff.sh in a terminal.
CIRCLE Revert to sphinx 1.5
to fix pdf doc generation
TRAVIS put back all the builds
now that the sprint is over.
[MRG+1] fix StratifiedShuffleSplit with 2d y (#9044)
* regression test and fix for 2d stratified shuffle split

* strengthen non-overlap sss tests

* clarify test and comment

* remove iter from tests, use str instead of hash

@Sundrique Sundrique closed this Jun 14, 2017

@Sundrique Sundrique deleted the Sundrique:fix-clone-validation-curve-estimator branch Jun 14, 2017

@Sundrique

This comment has been minimized.

Show comment
Hide comment
@Sundrique

Sundrique Jun 14, 2017

Contributor

@jnothman Sorry, screwed up the branch. Created a new one #9119.

Contributor

Sundrique commented Jun 14, 2017

@jnothman Sorry, screwed up the branch. Created a new one #9119.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment