Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fixes to test_pprint.py breaking after changes to the tested API #13529

Merged
merged 43 commits into from
Apr 9, 2019

Conversation

scouvreur
Copy link
Contributor

@scouvreur scouvreur commented Mar 27, 2019

Reference Issues/PRs

This PR is an improvement for issue #13508.

What does this implement/fix? Explain your changes.

This change attempts to improve pain points found in issue #13470, where changes to the underlying API parameters for which sklearn/utils/tests/test_pprint.py is tested causes the test to fail.

As a solution, this PR adds minimal examples of the classes listed below to the test itself.

To-do

Example constructors needing to be excerpted to test_pprinting:

  • Logistic Regression
  • StandardScaler
  • Pipeline
  • RFE
  • GridSearchCV
  • CountVectorizer
  • SVC
    • LinearSVC (calls were replaced with SVC and expected string adapted)
  • PCA
  • NMF
  • SimpleImputer

@jnothman
Copy link
Member

You haven't imported BaseEstimator

@scouvreur
Copy link
Contributor Author

When I use pytest test_pprint.py to run tests on my changes, I seem to get an error at the make_pipeline(StandardScaler(), LogisticRegression(C=999)) stage, due to the a string not being 'passthrough'. Do you have any idea why that might be ?

Full stacktrace here:

$ pytest test_pprint.py 
================================================ test session starts ================================================
platform darwin -- Python 3.6.8, pytest-4.3.0, py-1.8.0, pluggy-0.9.0
rootdir: /Users/stephane.couvreur/Documents/Open_Source/scikit-learn, inifile: setup.cfg
plugins: remotedata-0.3.1, openfiles-0.3.2, cov-2.6.1
collected 9 items                                                                                                   

test_pprint.py ..F......                                                                                      [100%]

===================================================== FAILURES ======================================================
___________________________________________________ test_pipeline ___________________________________________________

    def test_pipeline():
        # Render a pipeline object
>       pipeline = make_pipeline(StandardScaler(), LogisticRegression(C=999))

test_pprint.py:94: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../pipeline.py:604: in make_pipeline
    return Pipeline(_name_estimators(steps), memory=memory)
../../pipeline.py:119: in __init__
    self._validate_steps()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, wi...                         solver='warn', tol=0.0001, verbose=0,
                                    warm_start=False))])

    def _validate_steps(self):
        names, estimators = zip(*self.steps)
    
        # validate names
        self._validate_names(names)
    
        # validate estimators
        transformers = estimators[:-1]
        estimator = estimators[-1]
    
        for t in transformers:
            if t is None or t == 'passthrough':
                continue
            if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
                    hasattr(t, "transform")):
                raise TypeError("All intermediate steps should be "
                                "transformers and implement fit and transform "
                                "or be the string 'passthrough' "
                                "'%s' (type %s) doesn't" % (t, type(t)))
    
        # We allow last estimator to be None as an identity transformation
        if (estimator is not None and estimator != 'passthrough'
                and not hasattr(estimator, "fit")):
            raise TypeError(
                "Last step of Pipeline should implement fit "
                "or be the string 'passthrough'. "
>               "'%s' (type %s) doesn't" % (estimator, type(estimator)))
E           TypeError: Last step of Pipeline should implement fit or be the string 'passthrough'. 'LogisticRegression(C=999, class_weight=None, dual=False, fit_intercept=True,
E                              intercept_scaling=1, l1_ratio=None, max_iter=100,
E                              multi_class='warn', n_jobs=None, penalty='l2',
E                              random_state=None, solver='warn', tol=0.0001, verbose=0,
E                              warm_start=False)' (type <class 'sklearn.utils.tests.test_pprint.LogisticRegression'>) doesn't

../../pipeline.py:176: TypeError
======================================== 1 failed, 8 passed in 0.87 seconds =======================================

@jnothman
Copy link
Member

jnothman commented Mar 30, 2019 via email

@scouvreur
Copy link
Contributor Author

Apologies @jnothman @NicolasHug - is code coverage reduction inevitable as we excerpt more classes to test_pprint.py ?

@NicolasHug
Copy link
Member

Don't worry about code coverage for now

@scouvreur
Copy link
Contributor Author

Thanks @NicolasHug got it - I will keep working through the list then

@scouvreur scouvreur changed the title [WIP] Fixes to test_pprint.py breaking after changes to the tested API [MRG] Fixes to test_pprint.py breaking after changes to the tested API Apr 3, 2019
@scouvreur
Copy link
Contributor Author

scouvreur commented Apr 5, 2019

Should I also excerpt the following @NicolasHug ?

  • SelectKBest (signature unlikely to change)
  • chi2 (signature unlikely to change)
  • SVC
    • LinearSVC (replace w/ SVC)
  • PCA
  • NMF
  • SimpleImputer

hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 5, 2019
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, this looks good so far

Should I also excerpt the following @NicolasHug ?

  • SelectKBestP: No (signature is unlikely to change)
  • chi2: No (same)
  • SVC yes please
  • LinearSVC No, feel free to replace its use by SVC instead (you'll need to update the expected strings ;)). Else, yes
  • PCA yes please
  • NMF Yes
  • SimpleImputer Yes

In general, try inheriting from as few classes as possible. BaseEstimator is often enough.

sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 6, 2019
@hermidalc
Copy link
Contributor

hermidalc commented Apr 8, 2019

Sorry guys - I will just wait on some changes to OpenMP (#13390, #13543) to be able to build and run tests locally

Hi @scouvreur - are you using Anaconda? Create an independent conda environment just for sklearn development. You just need to run this below and it will have everything to build and test:

conda create -n sklearn-dev docutils matplotlib numpydoc cython joblib pandas pillow pyamg pytest python scipy sphinx
conda activate sklearn-dev

@scouvreur
Copy link
Contributor Author

Thanks alot for the tip @hermidalc ! I still could not build on a Mac OS machine, but on Ubuntu I was able. In any case it is better practice to work within a clean env.

@NicolasHug
Copy link
Member

Have you tried https://scikit-learn.org/dev/developers/advanced_installation.html#mac-osx ?

Merging #13543 might take some time

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.
This should certainly make these tests less brittle

sklearn/neural_network/tests/test_mlp.py Show resolved Hide resolved
@scouvreur
Copy link
Contributor Author

scouvreur commented Apr 9, 2019

Have you tried https://scikit-learn.org/dev/developers/advanced_installation.html#mac-osx ?
Merging #13543 might take some time

Thanks @NicolasHug - that worked !

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff is a bit strange but LGTM anyway

sklearn/utils/tests/test_pprint.py Outdated Show resolved Hide resolved
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff is a bit strange but LGTM anyway

@scouvreur
Copy link
Contributor Author

Thanks @NicolasHug @hermidalc @jnothman for your feedback and support - again alot of learning from solving this ! I will keep going, see you in the next one.

@jnothman jnothman merged commit 0e3c187 into scikit-learn:master Apr 9, 2019
@jnothman
Copy link
Member

jnothman commented Apr 9, 2019

Thanks @scouvreur!

hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 9, 2019
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants