Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Flow to test for sklearn compatibility #116

Merged
merged 103 commits into from Aug 18, 2019
Merged

[MRG] Flow to test for sklearn compatibility #116

merged 103 commits into from Aug 18, 2019

Conversation

GillesVandewiele
Copy link
Contributor

Hello,

This is a PR which allows to test automatically for all tslearn estimators whether they comply to the required checks of sklearn, allowing them to be used in their utilities such as GridSearchCV, Pipeline, ... The code to do this is currently located in tslearn/testing_utils.py, but should be moved to tslearn/testing when available.

I also included an example demonstrating how GlobalGAKMeans can now be used with an sklearn pipeline, in tslearn/docs/examples/plot_gakkmeans_sklearn.

All feedback is more than welcome!

Kind regards,
Gilles

@pep8speaks
Copy link

pep8speaks commented Jul 2, 2019

Hello @GillesVandewiele! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 600:80: E501 line too long (84 > 79 characters)
Line 679:80: E501 line too long (111 > 79 characters)

Comment last updated at 2019-08-18 09:19:11 UTC

@rtavenar
Copy link
Member

rtavenar commented Jul 2, 2019

Hi @GillesVandewiele

Thanks, this helps a lot. If you are OK with that, it would be better to fix all estimators in this PR before merging so that we do not forget to change others later. Also, your testing_utils.py will be considered as a standard test if it works well, then that means that all tests should pass before merging, hence all estimators should be fixed before merging.

@rtavenar rtavenar changed the title Flow to test for sklearn compatibility + fixing 1 estimator as an example [WIP] Flow to test for sklearn compatibility Jul 2, 2019
@GillesVandewiele
Copy link
Contributor Author

Hi @rtavenar,

Sure no problem at all. Do you happen to know whether I can run the script travis is running locally (and how I would do that)?

Something that is important to take into account is that we do not break any of the current tslearn functionality, for which there is no unit test in place currently, by refactoring to aggressively.

@rtavenar
Copy link
Member

rtavenar commented Jul 2, 2019

All functionalities in tslearn should be unit tested at the moment, though I suspect this is not the case, but we're doing our best for that.

The command Travis CI is running once everything is installed is:
https://github.com/rtavenar/tslearn/blob/master/.travis.yml#L57
and you can set ${KERAS_IGNORE} to an empty string

@rtavenar rtavenar self-requested a review July 2, 2019 16:52
@rtavenar rtavenar self-assigned this Jul 2, 2019
Copy link
Member

@rtavenar rtavenar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @GillesVandewiele

Thanks for this nice piece of work. This is a very good thing to ensure that our estimators are valid sklearn ones. I suggested a few changes, let me know what you think.

I also plan to invite @johannfaouzi as an additional reviewer for this PR, once I find how to do so :)

Also, of course, tests should pass on Travis before merging

tslearn/clustering.py Outdated Show resolved Hide resolved
tslearn/clustering.py Show resolved Hide resolved
tslearn/docs/examples/plot_gakkmeans_sklearn.py Outdated Show resolved Hide resolved
tslearn/docs/examples/plot_gakkmeans_sklearn.py Outdated Show resolved Hide resolved
tslearn/docs/examples/plot_knnts_sklearn.py Outdated Show resolved Hide resolved
tslearn/preprocessing.py Show resolved Hide resolved
tslearn/utils.py Outdated Show resolved Hide resolved
tslearn/testing_utils.py Outdated Show resolved Hide resolved
tslearn/testing_utils.py Outdated Show resolved Hide resolved
print('{} is sklearn compliant.'.format(estimator[0]))


check_all_estimators()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for that call if it is in a test suite

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tests are run automatically with a bash script. All the functions that are tests should start with test_ and should raise an error if the test is not successful (using assert or a function from np.testing or sklearn.testing for instance).

There are many folders with tests in scikit-learn, you can have a look at sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py for instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also catch errors and warnings with pytest (if you want to test that an error or a warning is raised):

@rtavenar rtavenar assigned rtavenar and unassigned rtavenar Jul 2, 2019
Copy link
Contributor

@johannfaouzi johannfaouzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR! It's nice to hear from you this close after the conference ;)

I briefly looked at the PR and left some comments, I'll try to have a deeper look at it soon.

tslearn/utils.py Outdated Show resolved Hide resolved
print('{} is sklearn compliant.'.format(estimator[0]))


check_all_estimators()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tests are run automatically with a bash script. All the functions that are tests should start with test_ and should raise an error if the test is not successful (using assert or a function from np.testing or sklearn.testing for instance).

There are many folders with tests in scikit-learn, you can have a look at sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py for instance.

tslearn/preprocessing.py Show resolved Hide resolved
print('{} is sklearn compliant.'.format(estimator[0]))


check_all_estimators()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also catch errors and warnings with pytest (if you want to test that an error or a warning is raised):

tslearn/clustering.py Outdated Show resolved Hide resolved
@rtavenar rtavenar added this to In progress in Towards v.0.2 Jul 3, 2019
@rtavenar rtavenar removed this from In progress in Towards v.0.2 Jul 3, 2019
tslearn/svm.py Outdated Show resolved Hide resolved
tslearn/svm.py Outdated Show resolved Hide resolved
@rtavenar
Copy link
Member

This is great!

You really did an amazing job at that PR!

I have added some comments. You can take them into account, wait for feedback from @johannfaouzi and then we will be good.

Ah, and you can also add your name in third position in the list of authors for tslearn reference (in README.md and in the bibtex variable defined in tslearn/__init__.py (and if you could add Johann there as well that would be great :).

Best,
Romain

@GillesVandewiele
Copy link
Contributor Author

Thank you @rtavenar! I did underestimate the amount of work it was going to take, but I'm happy that it is done :)

I will implement your comments tomorrow and add @johannfaouzi and myself to the author list, thank you for that!

@GillesVandewiele
Copy link
Contributor Author

I implemented your comments and managed to reduce the travis build times to < 20 minutes (python2.7 takes roughly 5 minutes longer than the other versions).

Copy link
Contributor

@johannfaouzi johannfaouzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @GillesVandewiele for this big PR. It represents a huge amount of work. Sorry for the delay for my review.

I left a lot of comments, but most of them are redundant. My comments are not necessarily right, so take your time, the discussion is obviously open, and your remarks are more than welcomed.

My remarks are sometimes about Romain's code, but since you made some changes in the same class/function, maybe you can work on them.

tslearn/docs/examples/plot_knnts_sklearn.py Outdated Show resolved Hide resolved
tslearn/docs/examples/plot_knnts_sklearn.py Outdated Show resolved Hide resolved
tslearn/docs/examples/plot_knnts_sklearn.py Show resolved Hide resolved
tslearn/clustering.py Outdated Show resolved Hide resolved
tslearn/clustering.py Outdated Show resolved Hide resolved
tslearn/shapelets.py Outdated Show resolved Hide resolved
tslearn/shapelets.py Show resolved Hide resolved
tslearn/shapelets.py Outdated Show resolved Hide resolved
tslearn/svm.py Show resolved Hide resolved
tslearn/svm.py Show resolved Hide resolved
@GillesVandewiele
Copy link
Contributor Author

Thank you so much @johannfaouzi.

I will integrate your comments, or reply to them, as soon as I have some spare time :). I should be able to do this by the end of this week. If I'm right, Romain is currently on holidays anyway, so there's no urge.

@rtavenar
Copy link
Member

@GillesVandewiele

Since I have been away for some time, I am unsure I followed all your updates. It seems most comments have been taken into account (are now outdated). Are there still questions pending? Or are we ready for merge? @johannfaouzi what do you think?

@GillesVandewiele
Copy link
Contributor Author

Hi Romain, hope you enjoyed your holidays! There's a few issues left open where would like to hear your opinion on. I'll place a reaction on those.

Normally, I should be able to integrate these last comments this weekend, after which it should be ready to merge :).

@GillesVandewiele
Copy link
Contributor Author

Alright, I integrated the final comments. Thanks for the feedback @rtavenar and @johannfaouzi!

Travis is building, and no pep8 issues left. I think this is now ready to get merged 🙌 :)

@johannfaouzi
Copy link
Contributor

LGTM. Thanks for the huge PR!

@GillesVandewiele
Copy link
Contributor Author

Thanks @johannfaouzi! I did notice that I still used verbose_level at some places in shapelets.py, I quickly replaced those usages as well :)

@rtavenar rtavenar merged commit 6655070 into tslearn-team:dev Aug 18, 2019
@rtavenar
Copy link
Member

Thanks a lot @GillesVandewiele for this huge PR!

@rtavenar rtavenar changed the title [WIP] Flow to test for sklearn compatibility [MRG] Flow to test for sklearn compatibility Aug 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants