
HistGradientBoostingClassifier is not tracked on https://scikit-learn.org/scikit-learn-benchmarks/ #18775

Closed
ogrisel opened this issue Nov 6, 2020 · 3 comments · Fixed by #18851

Comments

@ogrisel
Member

ogrisel commented Nov 6, 2020

https://scikit-learn.org/scikit-learn-benchmarks/

I think we should bench it with a medium-sized multiclass dataset (e.g. 5 classes, 100 features, 1e4 samples) or something similar.

Ideally, a fit should last at least 5 to 10 seconds.
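For illustration, a minimal ASV-style fit benchmark along those lines could look like the sketch below (class and parameter names are hypothetical, not the actual benchmark code; note that with scikit-learn < 1.0 the estimator also required `from sklearn.experimental import enable_hist_gradient_boosting`):

```python
# Illustrative sketch only: asv times every `time_*` method of a suite
# class, and `setup` runs before the measurements.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier


class HistGradientBoostingClassifierSuite:  # hypothetical name
    def setup(self):
        # Medium-sized multiclass dataset: 5 classes, 100 features, 1e4 samples.
        self.X, self.y = make_classification(
            n_samples=10_000,
            n_features=100,
            n_classes=5,
            n_informative=20,  # needs n_classes * 2 <= 2 ** n_informative
            random_state=0,
        )
        self.estimator = HistGradientBoostingClassifier(random_state=0)

    def time_fit(self):
        self.estimator.fit(self.X, self.y)
```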

The code for the benchmarks of the estimators in the sklearn.ensemble package is hosted here:

The documentation on how to use ASV to run the benchmarks locally is here:

@NicolasHug
Member

Thanks for opening the issue.

For prediction, we can probably consider quite a few more samples (1e6?).

We should also have two versions of each benchmark: one with OMP_NUM_THREADS set to 1, and one where we leave it at the default (which hopefully does not suffer from oversubscription). Some changes can be dramatically bad in the multi-threaded case and yet go undetected in single-threaded benchmarks.
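A sketch of the prediction side of this suggestion, assuming the same kind of suite as above: fit once in `setup`, then time `predict` on a much larger array (the sizes and all names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier


class PredictSuite:  # hypothetical name
    def setup(self):
        # Fit once on a modest training set; only predict() is timed.
        X, y = make_classification(
            n_samples=10_000, n_features=100, n_classes=5,
            n_informative=20, random_state=0,
        )
        self.estimator = HistGradientBoostingClassifier(random_state=0).fit(X, y)
        # Much larger array for the timed prediction (~1e6 rows).
        rng = np.random.RandomState(0)
        self.X_pred = rng.standard_normal(size=(1_000_000, 100))

    def time_predict(self):
        self.estimator.predict(self.X_pred)
```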

@ogrisel
Member Author

ogrisel commented Nov 8, 2020

Setting the OMP_NUM_THREADS environment variable for a single test is probably not easy with asv, but we can use threadpoolctl from within the test instead.
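A minimal sketch of that approach, assuming a suite like the ones above; `threadpool_limits` is threadpoolctl's context manager for capping the size of OpenMP/BLAS thread pools:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from threadpoolctl import threadpool_limits


class SingleThreadFitSuite:  # hypothetical name
    def setup(self):
        self.X, self.y = make_classification(
            n_samples=10_000, n_features=100, n_classes=5,
            n_informative=20, random_state=0,
        )
        self.estimator = HistGradientBoostingClassifier(random_state=0)

    def time_fit_single_thread(self):
        # Caps all supported thread pools (OpenMP, BLAS, ...) to one
        # thread for the duration of the with block; no env variable needed.
        with threadpool_limits(limits=1):
            self.estimator.fit(self.X, self.y)
```

Pairing this with an unrestricted `time_fit` would give the single-threaded/multi-threaded pair suggested above.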

@jeremiedbb
Member

I opened a PR; it doesn't deal with the number of threads for now. I'll handle that in a separate PR because it involves a few other estimators (KMeans, t-SNE, ...) and I'm not sure how to do it yet.
