
[RFC] Gradient Boosting improvement #8231

Closed
2 tasks
glemaitre opened this issue Jan 24, 2017 · 10 comments

Comments

@glemaitre
Member

glemaitre commented Jan 24, 2017

Current situation

The speed performance of the scikit-learn gradient-boosting is relatively slow compared to the other implementations: xgboost and lightgbm.

Which bottlenecks have been identified

During some benchmarking, we identified a major difference between scikit-learn and xgboost.

For xgboost, all samples for a given feature are scanned. Each sample is used to update the impurity statistic of the node to which it belongs (cf. here).
On the contrary, in scikit-learn only the samples belonging to the node being split are selected (cf. here or here). This involves repeated scans of the data which are not necessary (see the sketch below).
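
To make the difference concrete, here is a minimal schematic sketch in plain Python. It is not the actual Cython code of either library; `nodes_to_split`, `node_of_sample` and the gradient-sum statistic are illustrative assumptions that only show the scanning pattern, not the threshold search itself.

```python
def per_node_scans(X_col, grad, nodes_to_split, node_of_sample):
    """scikit-learn-style: one pass over the feature column per node to
    split, selecting only the samples that belong to that node."""
    stats = {}
    for node in nodes_to_split:                 # repeated scans of the column
        total = 0.0
        for i in range(len(X_col)):
            if node_of_sample[i] == node:       # keep only this node's samples
                total += grad[i]
        stats[node] = total
    return stats


def single_scan_all_nodes(X_col, grad, nodes_to_split, node_of_sample):
    """xgboost-style: a single pass over the feature column, routing each
    sample to the statistics of the node it currently belongs to."""
    stats = {node: 0.0 for node in nodes_to_split}
    for i in range(len(X_col)):                 # one scan serves all nodes
        node = node_of_sample[i]
        if node in stats:
            stats[node] += grad[i]
    return stats
```

Both functions compute the same per-node statistics; the second touches each sample only once per feature, regardless of how many nodes are waiting to be split.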

What has been tested

In an effort to speed up the gradient boosting, @ogrisel suggested subsampling at each split, lowering the cost of sorting and computing the statistics. The results obtained did not yield a satisfactory time/accuracy trade-off able to compete with the other implementations.

Proposal

I would like to contribute with two changes:

  • Propose a prototype of gradient boosting that modifies the sample scanning and splitting strategies. It will require a change of the decision tree algorithm (that's why we want a prototype first). This change should allow us to match the performance of xgboost while using the exact method. [WIP] Alternative architecture for regression tree #8458
  • Implement an approximate splitter by binning the data, similarly to lightgbm and xgboost. This change recently allowed xgboost to match lightgbm's performance (cf. here and here). A rough sketch of the binning idea follows this list.
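
For the second item, here is a rough sketch of what histogram-based ("binned") split finding looks like, assuming an xgboost-style gain based on gradient/hessian sums. The 256-bin default, the regularisation term and all names here are assumptions for illustration, not an existing scikit-learn API.

```python
import numpy as np

def bin_feature(x, n_bins=256):
    """Map raw feature values to small integer bin indices via quantiles."""
    cut_points = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cut_points, x).astype(np.uint8), cut_points


def best_split_on_binned(binned, grad, hess, n_bins=256, reg_lambda=1.0):
    """Accumulate gradient/hessian histograms, then scan the bins
    (instead of all sorted samples) to pick the best threshold."""
    g_hist = np.zeros(n_bins)
    h_hist = np.zeros(n_bins)
    np.add.at(g_hist, binned, grad)
    np.add.at(h_hist, binned, hess)

    g_total, h_total = g_hist.sum(), h_hist.sum()
    best_gain, best_bin = -np.inf, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):                  # O(n_bins), not O(n_samples)
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + reg_lambda)
                + g_right ** 2 / (h_right + reg_lambda)
                - g_total ** 2 / (h_total + reg_lambda))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

Scanning a few hundred bins instead of all sorted samples is what makes the "hist" approach fast, at the price of slightly approximate thresholds.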

Related issue

#5212

@GaelVaroquaux @glouppe @ogrisel @jmschrei @raghavrv If you have any comments, I will be happy to hear them.

@raghavrv
Member

@glouppe was very interested in the approximate splitter. I also think it would be a nice addition that would help speed things up a lot... I'm also pinging @jnothman and @nelson-liu who might be interested in reviewing this...

@glouppe
Contributor

glouppe commented Jan 24, 2017

For xgboost, all samples for a given feature are scanned. Each sample is used to update the impurity statistic of the node to which it belongs (cf. here).
On the contrary, in scikit-learn only the samples belonging to the node being split are selected (cf. here or here). This involves repeated scans of the data which are not necessary.

Propose a prototype of gradient boosting that modifies the sample scanning and splitting strategies. It will require a change of the decision tree algorithm (that's why we want a prototype first). This change should allow us to match the performance of xgboost while using the exact method.

Can you elaborate on this?

In our case, sample_mask is only used with presorting, which we can get rid of if benchmarks show it is indeed slower than the standard implementation.

How does our standard implementation (without presorting) compare with xgboost, both in terms of algorithm and performance?

@glouppe
Contributor

glouppe commented Jan 24, 2017

Implement an approximate splitter by binning the data

+1 for binning

@jmschrei
Member

I'm also a bit confused about what you mean by repetitive scans. We have two different sorting mechanisms: presorting and not. Presorting works better on smaller datasets with a smaller number of features, whereas not presorting works better on larger amounts of data. XGBoost uses presorting (when I last looked at the code) and we do not, but we provide the option allowing the user to do so.

@glouppe
Contributor

glouppe commented Jan 24, 2017

In our case, sample_mask is only used with presorting, which we can get rid of if benchmarks show it is indeed slower than the standard implementation.

Ok, I think I understand. We could get rid of the sample_mask logic by looking for splits across all nodes in parallel -- this is what I think they do in xgboost by maintaining several Criterion-like objects.
Is that more or less what you have in mind?

@glemaitre
Member Author

@glouppe In xgboost, sample_mask stores the ID of the node each sample currently belongs to and, as you said, allows maintaining several criteria.
This makes a single scan of sample_mask possible, whereas in scikit-learn a scan is performed for each node to be split. (See the rough sketch below.)
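
A rough illustration of that bookkeeping (hypothetical names, not the actual sample_mask code): a single integer array maps each sample to the node it currently sits in, so applying a round of splits is again one pass over the samples.

```python
import numpy as np

def apply_splits(X, node_of_sample, splits):
    """`splits` maps a node id to (feature, threshold, left_id, right_id).
    A single scan reassigns every sample of a split node to the matching
    child; samples sitting in leaves are left untouched."""
    for i in range(X.shape[0]):
        node = node_of_sample[i]
        if node in splits:
            feature, threshold, left_id, right_id = splits[node]
            node_of_sample[i] = (left_id if X[i, feature] <= threshold
                                 else right_id)
    return node_of_sample

# e.g. node 0 splits on feature 1 at threshold 0.5 into children 1 and 2
X = np.array([[0.2, 0.1], [0.9, 0.8], [0.4, 0.3]])
node_of_sample = np.zeros(3, dtype=np.intp)
apply_splits(X, node_of_sample, {0: (1, 0.5, 1, 2)})   # -> [1, 2, 1]
```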

@glouppe
Contributor

glouppe commented Jan 25, 2017

@glouppe In xgboost, sample_mask stores the ID of the node each sample currently belongs to and, as you said, allows maintaining several criteria.
This makes a single scan of sample_mask possible, whereas in scikit-learn a scan is performed for each node to be split.

Yup, looks like something promising.

+1 for both changes.

(However, to avoid never-ending PRs, please propose the changes separately, ideally with the fewest possible changes to other parts of the codebase.)

@glemaitre
Member Author

@glouppe Yep, I am gonna make separate PRs.

@lorentzenchr
Member

Can we close this now that #12807 (HistGradientBoostingClassifier and HistGradientBoostingRegressor) is merged?

@glemaitre
Member Author

Yes this is done.
