[MRG] accelerate stacking-classifier in test_common.py::test_ensemble_heterogeneous_estimators_behavior #21562

Merged


chritter
Contributor

@chritter chritter commented Nov 5, 2021

Reference Issues/PRs

Towards #21407

What does this implement/fix? Explain your changes.

These changes speed up the test case test_common.py::test_ensemble_heterogeneous_estimators_behavior.

Any other comments?

#DataUmbrella Sprint

@chritter chritter changed the title accelerated test ensemble stacking-classifier cv [WIP] accelerate stacking-classifier in test_common.py::test_ensemble_heterogeneous_estimators_behavior Nov 5, 2021
@@ -32,7 +32,8 @@
("lr", LogisticRegression()),
("svm", LinearSVC()),
("rf", RandomForestClassifier()),
Member

@ogrisel ogrisel Nov 5, 2021

Thanks! I think it would be possible to make it run even faster with:

Suggested change
- ("rf", RandomForestClassifier()),
+ ("rf", RandomForestClassifier(n_estimators=5, max_depth=3)),

Could you also update the related parametrized config for the same test function below?

Please also report the timings you get when running this test on your local machine with --durations=10.
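For illustration, here is a minimal sketch of the StackingClassifier configuration after the suggested change, combined with the cv=2 setting this PR uses. It assumes the iris dataset as input (the data used by the actual test is not shown in this thread), so treat it as an approximation rather than the exact test code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Assumption: iris stands in for whatever data the real test uses.
X, y = load_iris(return_X_y=True)

stacker = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("svm", LinearSVC()),
        # Suggested change: 5 shallow trees instead of the default 100 unbounded trees.
        ("rf", RandomForestClassifier(n_estimators=5, max_depth=3)),
    ],
    # 2-fold internal cross-validation instead of the default 5-fold.
    cv=2,
)
stacker.fit(X, y)
predictions = stacker.predict(X)
```

Both knobs cut work multiplicatively: the stacker fits every base estimator once per CV fold plus once on the full data, so fewer folds and a cheaper forest each shrink the total fit time.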

Contributor Author

@chritter chritter Nov 8, 2021

@ogrisel Thank you very much for your suggestions. Note that this PR is still WIP; I plan to finish the optimization following your advice and report the appropriate measurements.

@chritter
Contributor Author

chritter commented Nov 8, 2021

Speed improvements, reported as test duration averaged over 5 runs:

| Setup | Test duration |
|---|---|
| Original | 2.52s |
| with cv=2 | 1.98s |
| with cv=2 + RandomForestClassifier(n_estimators=5, max_depth=3) | 0.24s |

Note: with the final improvements, the individual fits take LR: 27ms, RF: 17ms, SVM: 1ms.
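The thread does not show how these averages were collected. The sketch below is one hypothetical way to reproduce a similar comparison: it times a fresh clone of each setup with `timeit.repeat` and averages over 5 runs. The helper names `mean_fit_seconds` and `make_stacker` are illustrative, and iris is an assumed stand-in for the test data, so absolute numbers will differ from the table.

```python
import timeit
from statistics import mean

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Assumption: iris stands in for the test's data.
X, y = load_iris(return_X_y=True)


def mean_fit_seconds(estimator, X, y, n_runs=5):
    """Hypothetical helper: average wall-clock time to fit a fresh clone."""
    times = timeit.repeat(lambda: clone(estimator).fit(X, y), repeat=n_runs, number=1)
    return mean(times)


def make_stacker(rf, cv=None):
    """Hypothetical helper building the three-estimator stack."""
    return StackingClassifier(
        estimators=[("lr", LogisticRegression()), ("svm", LinearSVC()), ("rf", rf)],
        cv=cv,  # None means the default 5-fold cross-validation
    )


setups = {
    "Original": make_stacker(RandomForestClassifier()),
    "with cv=2": make_stacker(RandomForestClassifier(), cv=2),
    "with cv=2 + small RF": make_stacker(
        RandomForestClassifier(n_estimators=5, max_depth=3), cv=2
    ),
}
results = {name: mean_fit_seconds(est, X, y) for name, est in setups.items()}

# Per-estimator fit times for the final configuration's base models.
base_models = {
    "LR": LogisticRegression(),
    "RF": RandomForestClassifier(n_estimators=5, max_depth=3),
    "SVM": LinearSVC(),
}
base_times_ms = {name: 1000 * mean_fit_seconds(est, X, y) for name, est in base_models.items()}

for name, seconds in results.items():
    print(f"{name}: {seconds:.2f}s")
for name, ms in base_times_ms.items():
    print(f"{name} fit: {ms:.0f}ms")
```

Cloning before each fit keeps the runs independent, and `number=1` makes each of the 5 repeats a single fit, matching a "5-times averaged" measurement.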

@chritter
Contributor Author

chritter commented Nov 8, 2021

Based on @ogrisel's suggestion, I have committed the same random-forest speed optimization (n_estimators=5, max_depth=3) for the voting-classifier, stacking-regressor and voting-regressor configurations. To speed up the stacking-regressor, I have also used cv=2. The following speed-ups were achieved (measured over 5 runs):

| Algorithm | Setup | Test duration |
|---|---|---|
| VotingClassifier | Original | 0.60s |
| VotingClassifier | Optimized | 0.05s |
| StackingRegressor | Original | 0.41s |
| StackingRegressor | Optimized | 0.10s |
| VotingRegressor | Original | 0.46s |
| VotingRegressor | Optimized | 0.04s |

@chritter chritter changed the title [WIP] accelerate stacking-classifier in test_common.py::test_ensemble_heterogeneous_estimators_behavior [MRG] accelerate stacking-classifier in test_common.py::test_ensemble_heterogeneous_estimators_behavior Nov 9, 2021
…nto test-speed-improvement-stacking-classifier
Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @chritter !

LGTM!

Member

@ogrisel ogrisel left a comment

Thanks very much! LGTM.

@ogrisel ogrisel merged commit 5f3d1e5 into scikit-learn:main Dec 6, 2021
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Dec 9, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022

4 participants