
[MRG] implement least absolute deviation loss in GBDTs #13896

Merged: 19 commits merged into scikit-learn:master on Sep 9, 2019

Conversation

NicolasHug (Contributor) commented on May 16, 2019

This PR implements the least absolute deviation (or MAE) loss for GBDTs.

It's not as trivial as it seems, since the leaf values have to be updated after the tree is trained. Indeed, for MAE we train the trees to fit the gradients (or more accurately, a Newton–Raphson step), but we need them to predict the median of the residuals. See the original paper for more details.

See also the current implementation of this for the old version of GBDTs, in particular _update_terminal_region() in sklearn/ensemble/gb_losses.py.
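
Roughly, the idea is something like the following sketch (hypothetical names such as leaf.sample_indices and leaf.value, used only for illustration; this is not the PR's actual code):

    # Rough sketch of the post-fit leaf update for the LAD loss.
    import numpy as np

    def update_leaves_values_lad(leaves, y_true, raw_predictions):
        # The tree was grown to fit gradients, but for least absolute
        # deviation each leaf must output the median of the residuals of
        # the samples it contains.
        for leaf in leaves:
            idx = leaf.sample_indices
            residuals = y_true[idx] - raw_predictions[idx]
            leaf.value = np.median(residuals)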

Will ping when this is ready.


This is a bit slower than LightGBM, since the leaf values update is not done in parallel here.

python benchmarks/bench_hist_gradient_boosting.py --lightgbm --problem regression --n-samples-max 5000000 --n-trees 50 --loss least_absolute_deviation

NicolasHug changed the title from [WIP] implement least absolute deviation loss in GBDTs to [MRG] implement least absolute deviation loss in GBDTs on May 16, 2019
NicolasHug (Author, Contributor) commented on May 16, 2019

I think it's ready, ping @ogrisel @glemaitre @adrinjalali since you guys are the most familiar with the code ;)

# loss
# If the hessians are constant, we consider they are equal to 1.
# - This is correct for the half LS loss
# - For LAD loss, hessians are actually 0, but they are always

adrinjalali (Member) commented on Jun 14, 2019

if that's the case, doesn't it make sense to actually set them to 0? I'm kinda worried about maintainability of this.

adrinjalali (Member) commented on Jun 14, 2019

Or even have a parameter in the loss class which is the function that gives you the constant hessians, i.e. for LAD it'd be np.zeros.

NicolasHug (Author, Contributor) commented on Jun 14, 2019

IMO it's more maintainable to have the convention that constant hessians are always 1 rather than have custom constant hessians for each loss. Especially when these hessians are never used, like here.

NicolasHug (Author, Contributor) commented on Jun 14, 2019

Also, this is always implicit and never explained in the papers, but the hessians need to be treated as 1 (even though they're 0) so that the leaf value computation is an average instead of a sum.
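
A quick sanity check of that convention (a hedged sketch using the standard Newton leaf-value formula, not the PR's code): with constant hessians set to 1, the leaf value reduces to the negative mean of the gradients, whereas hessians of 0 would give a division by zero.

    # Standard Newton leaf value: -sum(gradients) / (sum(hessians) + l2_reg).
    # Treating the constant hessians as 1 turns the sum of gradients into a
    # mean; treating them as 0 would divide by zero (or by l2_reg alone).
    import numpy as np

    gradients = np.array([0.3, -0.1, 0.5, 0.2])  # gradients of samples in one leaf
    hessians = np.ones_like(gradients)           # constant hessians, by convention
    l2_regularization = 0.0

    leaf_value = -gradients.sum() / (hessians.sum() + l2_regularization)
    assert np.isclose(leaf_value, -gradients.mean())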

glemaitre (Contributor) left a comment

It looks good otherwise. You will need an entry in what's new as well.

adrinjalali (Member) commented on Jun 19, 2019

I just went to check our docs, and noticed/remembered we still don't have good examples or a user guide for these classes. But other than that, it seems the loss function names are not consistent with the other ensemble methods; for instance, GBR calls this LAD, I think.

    ('binary_crossentropy', 2, 1),
    ('categorical_crossentropy', 3, 3),
])
@pytest.mark.skipif(Y_DTYPE != np.float64,
                    reason='Need 64 bits float precision for numerical checks')
def test_numerical_gradients(loss, n_classes, prediction_dim):
@pytest.mark.parametrize('seed', range(1))

NicolasHug (Author, Contributor) commented on Jun 19, 2019

^^

I can remove it, it's just convenient when testing locally
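
For context, what the test above checks is that the analytical gradients match finite-difference estimates; for the LAD loss the gradient is simply sign(raw_prediction - y_true). A generic sketch of the idea (independent of the module's internal loss API, which the real test goes through):

    # Generic finite-difference check of the LAD gradient.
    import numpy as np

    rng = np.random.RandomState(0)
    y_true = rng.normal(size=10)
    raw_predictions = rng.normal(size=10)
    eps = 1e-7

    analytical = np.sign(raw_predictions - y_true)
    numerical = (np.abs(y_true - (raw_predictions + eps)) -
                 np.abs(y_true - (raw_predictions - eps))) / (2 * eps)
    assert np.allclose(analytical, numerical)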

NicolasHug (Author, Contributor) commented on Jun 19, 2019

> we still don't have good examples/user guide for these classes.

Don't worry, that's still on my stack; I'm just waiting for a few more features.

> the loss function names are not consistent with the other ensemble methods

Right, we already broke name consistency with the other implementation ('ls' vs 'least_squares', 'deviance' vs 'binary_crossentropy', etc.)

NicolasHug and others added 5 commits Jun 19, 2019
adrinjalali (Member) left a comment

LGTM :)

NicolasHug added 2 commits Jun 21, 2019
NicolasHug (Author, Contributor) commented on Aug 2, 2019

@glemaitre you were OK with this I think, do you wanna have another look so we can merge? Thanks!

NicolasHug (Author, Contributor) commented on Aug 21, 2019

Ping @glemaitre ;)

Maybe @ogrisel too?

@@ -668,7 +672,8 @@ class HistGradientBoostingRegressor(BaseHistGradientBoosting, RegressorMixin):
     Parameters
     ----------
-    loss : {'least_squares'}, optional (default='least_squares')
+    loss : {'least_squares', 'least_absolute_deviation'}, \
+            optional (default='least_squares')

glemaitre (Contributor) commented on Aug 23, 2019

Do we remove the optional and the parentheses right away?

NicolasHug (Author, Contributor) commented on Aug 23, 2019

I'd rather keep the current docstring consistent with the other entries, but I'm +1 on updating all of them in another PR.

glemaitre (Contributor) commented on Aug 23, 2019

OK fine with me
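
For reference, the new value can be used like this (a minimal sketch against the 0.22-era API, where the estimator still required the experimental opt-in import; the loss was renamed 'absolute_error' in later releases):

    # Minimal usage sketch of the new loss.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=1000, noise=10, random_state=0)
    est = HistGradientBoostingRegressor(loss='least_absolute_deviation',
                                        random_state=0).fit(X, y)
    print(est.predict(X[:5]))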

glemaitre (Contributor) commented on Aug 23, 2019

I am still fine with the PR. But before merging, would it make sense to add a test in test_compare_lightgbm.py to check that we get a similar result?
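
Something along these lines, for instance (a rough sketch only; the actual test builds the LightGBM estimator with matched parameters through the module's own helpers, and the settings below are illustrative):

    # Rough comparison sketch: sklearn's LAD loss vs. LightGBM's L1 objective.
    import numpy as np
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.datasets import make_regression
    from lightgbm import LGBMRegressor

    X, y = make_regression(n_samples=1000, noise=10, random_state=0)

    sk = HistGradientBoostingRegressor(loss='least_absolute_deviation',
                                       max_iter=50, random_state=0).fit(X, y)
    lgbm = LGBMRegressor(objective='regression_l1', n_estimators=50,
                         random_state=0).fit(X, y)

    # The two models are not expected to be bitwise identical, but their
    # predictions should be strongly correlated.
    print(np.corrcoef(sk.predict(X), lgbm.predict(X))[0, 1])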

glemaitre (Contributor) commented on Aug 23, 2019

and regarding the test, WDYT?

NicolasHug (Author, Contributor) commented on Aug 23, 2019

I agree. It currently fails; I'm investigating why. This has to do with the initial predictions being different between LightGBM and sklearn.

NicolasHug (Author, Contributor) commented on Aug 23, 2019

OK I sorted it out. Added a comment:

    # - We don't check the least_absolute_deviation loss here. This is because
    #   LightGBM's computation of the median (used for the initial value of
    #   raw_prediction) is a bit off (they'll e.g. return midpoints when there
    #   is no need to.). Since these tests only run 1 iteration, the
    #   discrepancy between the initial values leads to biggish differences in
    #   the predictions. These differences are much smaller with more
    #   iterations.
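
To illustrate the kind of discrepancy this refers to (a generic NumPy illustration, not LightGBM's actual code): with an even number of samples, a "midpoint" median averages the two middle values, while picking an actual sample value as the median gives a slightly different initial prediction.

    # Midpoint median vs. an actual sample value, for an even-sized array.
    import numpy as np

    y = np.array([1.0, 2.0, 3.0, 4.0])
    print(np.median(y))                    # 2.5: midpoint of the two middle values
    print(np.sort(y)[(len(y) - 1) // 2])   # 2.0: lower of the two middle values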
glemaitre merged commit 5cf88db into scikit-learn:master on Sep 9, 2019
17 checks passed:
- LGTM analysis: C/C++: No code changes detected
- LGTM analysis: JavaScript: No code changes detected
- LGTM analysis: Python: No new or fixed alerts
- ci/circleci: deploy, doc, doc-min-dependencies, lint: all passed
- codecov/patch: 100% of diff hit (target 96.9%)
- codecov/project: 96.91% (+<.01%) compared to edcfa0d
- scikit-learn.scikit-learn Build #20190823.28: succeeded
- scikit-learn.scikit-learn: Linux py35_conda_openblas, Linux py35_ubuntu_atlas, Linux pylatest_conda_mkl_pandas, Linux32 py35_ubuntu_atlas_32bit, Windows py35_pip_openblas_32bit, Windows py37_conda_mkl, macOS pylatest_conda_mkl: all succeeded
glemaitre (Contributor) commented on Sep 9, 2019

OK, makes sense. Merging then. Thanks @NicolasHug
