GradientBoostingClassifier train_score bug #2053

Closed
AngeldsWang opened this Issue Jun 11, 2013 · 11 comments

@AngeldsWang

When I increased n_estimators to 1000, the train score suddenly jumped to a large number:
...
built tree 723 of 1000, train score = 1.318154e-01
built tree 724 of 1000, train score = 1.317394e-01
built tree 725 of 1000, train score = 1.317101e-01
built tree 726 of 1000, train score = 6.835322e+12
built tree 727 of 1000, train score = 6.835322e+12
built tree 728 of 1000, train score = 6.835322e+12
built tree 729 of 1000, train score = 6.835322e+12
built tree 730 of 1000, train score = 6.835322e+12
...

and it stayed at 6.835322e+12 with no further change.

Thanks

@ogrisel

ogrisel (Member) commented Jun 13, 2013

It looks like a numerical precision issue. Could you please provide a script to reproduce the issue? Have you tried to reproduce it on a random dataset?

@pprett

pprett (Member) commented Jun 13, 2013

Gradient boosting with log loss is a bit prone to numerical precision issues. Can you please post your parameters and dataset stats? It might help to lower the learning rate; otherwise, I'd try to identify for which examples the residuals explode and remove them from the training set (this is akin to the "trick" that Friedman proposed in the original paper).
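
One rough way to act on that suggestion from the outside, sketched under assumptions: fit a short probe model, flag training samples it is already extremely (and wrongly) confident about, and refit without them. The probe size, the 1e-8 threshold, and the learning_rate=0.05 value are illustrative choices, not anything recommended in this thread; features and labels are the reporter's arrays and are assumed to be NumPy arrays.

    # Probe model: fit a modest number of stages first.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    probe = GradientBoostingClassifier(n_estimators=100, max_depth=3)
    probe.fit(features, labels)

    # For binomial deviance, p * (1 - p) shows up in the denominator of the
    # leaf-value updates, so samples with p extremely close to 0 or 1 are the
    # usual suspects for exploding values.
    proba = probe.predict_proba(features)[:, 1]
    suspects = np.where(proba * (1 - proba) < 1e-8)[0]
    print("potentially problematic samples:", suspects)

    # Refit without them, with a somewhat lower learning rate.
    mask = np.ones(len(labels), dtype=bool)
    mask[suspects] = False
    classifier = GradientBoostingClassifier(n_estimators=1000, max_depth=3,
                                            learning_rate=0.05, verbose=2)
    classifier.fit(features[mask], labels[mask])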

@ogrisel

ogrisel (Member) commented Jun 13, 2013

@pprett would it be possible to detect over/underflows from time to time and raise an exception as soon as it is detected, maybe with some runtime info to help identify the cause?
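
A minimal sketch of how a user could already watch for this from the outside, using the monitor callable that GradientBoostingClassifier.fit accepts (it is invoked after every boosting stage; returning True stops training). The assumption that train_score_[i] is filled in by the time the monitor runs at stage i is mine, not something stated in this thread; features and labels are the reporter's arrays.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def stop_on_overflow(i, est, local_vars):
        # Stop boosting instead of silently continuing with a garbage score.
        if not np.isfinite(est.train_score_[i]):
            print("non-finite train score at stage %d" % i)
            return True
        return False

    clf = GradientBoostingClassifier(n_estimators=1000, max_depth=3, verbose=2)
    clf.fit(features, labels, monitor=stop_on_overflow)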

@pprett

pprett (Member) commented Jun 13, 2013

Actually, I thought we had fixed that issue... anyway, I'll try to make it more user friendly.

@AngeldsWang

AngeldsWang commented Jun 14, 2013

Well, I only changed n_estimators = 1000 and verbose = 2 to print the training info:

    classifier = GradientBoostingClassifier(n_estimators=1000, max_depth=3, verbose=2)
    classifier.fit(features, labels)

My dataset has 300,000 samples, each with a 326-dimensional feature vector.
By the way, I found that using a higher-dimensional feature vector, e.g. 500 dimensions, avoids this result.
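
For anyone who wants to try reproducing this on a random dataset, as asked above, a sketch along the lines below should roughly match the reported setup. The make_classification parameters (class balance, number of informative features) are guesses, since the original data is not available, and a run of this size takes a while.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Synthetic data with roughly the reported shape: 300,000 samples, 326 features.
    X, y = make_classification(n_samples=300000, n_features=326,
                               n_informative=50, random_state=0)

    clf = GradientBoostingClassifier(n_estimators=1000, max_depth=3, verbose=2)
    clf.fit(X, y)  # watch the verbose output for a sudden jump in the train score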

@pprett

pprett (Member) commented Jul 23, 2013

It seems to me that the overflow happens in the computation of the loss but not in the gradient. @AngeldsWang what do you see when you look at the test error? It would be great if you could plot the test error as a function of the boosting iteration; you can do this via staged_predict:

    for i, y_pred in enumerate(clf.staged_predict(X_test)):
        score[i] = accuracy_score(y_test, y_pred)

I'm curious what happens on iterations 725 and 726.
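
Filled out with the imports and the allocation the snippet above leaves implicit, that check could look like the following sketch. Here clf is assumed to be an already fitted GradientBoostingClassifier, and X_test / y_test a held-out split, which is not shown in this thread.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import accuracy_score

    # Test accuracy after each boosting stage.
    score = np.zeros(clf.n_estimators)
    for i, y_pred in enumerate(clf.staged_predict(X_test)):
        score[i] = accuracy_score(y_test, y_pred)

    plt.plot(np.arange(1, clf.n_estimators + 1), score)
    plt.xlabel("boosting iteration")
    plt.ylabel("test accuracy")
    plt.show()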

@ghost ghost assigned pprett Jul 23, 2013

@amueller

amueller (Member) commented Jan 7, 2014

any news on this? @AngeldsWang

@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014

@amueller

amueller (Member) commented Jul 18, 2014

Should we close this?

@arjoly

arjoly (Member) commented Jul 18, 2014

Either it comes from instability in the computation of the variance, or from one of the formulas of gradient boosting. Hard to fix without data.

@amueller amueller closed this Jul 18, 2014

@tachim

tachim commented Apr 22, 2015

I'm also running into a similar issue:

      Iter       Train Loss   Remaining Time
         1           0.0595            1.44m
         2           0.0564            1.04m
         3           0.0543           54.16s
         4           0.0522           49.75s
         5           0.0497           47.08s
         6           0.0484           45.19s
         7           0.0476           43.65s
         8           0.0468           42.42s
         9           0.0460           41.42s
        10           0.0461           40.60s
        20         833.2245           34.44s
        30         833.2211           29.60s
        40         833.2189           25.10s
        50 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000           20.84s
        60 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000           16.69s
        70 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000           12.53s
        80 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000            8.35s
        90 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000            4.18s
       100 2333498399553429960986828230989390666086532362194493871599624416397864438013180040627383594896908185059774844247687200338083840.0000            0.00s
@arjoly

arjoly (Member) commented Apr 22, 2015

In that case, you might want to try a lower learning_rate.
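
A hedged example of that suggestion: pass a smaller learning_rate (shrinkage) so the raw predictions grow more slowly. The value 0.01 is only illustrative, and features and labels are again the reporter's arrays.

    from sklearn.ensemble import GradientBoostingClassifier

    clf = GradientBoostingClassifier(n_estimators=1000, max_depth=3,
                                     learning_rate=0.01, verbose=2)
    clf.fit(features, labels)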
