AdaBoost Classifier: stop iterations when sample weights overflow and print a warning message #10096
Conversation
take care of PEP8
sklearn/ensemble/weight_boosting.py (Outdated)

@@ -145,7 +145,11 @@ def fit(self, X, y, sample_weight=None):
                 random_state)

         # Early termination
         if sample_weight is None:

         if sample_weight is None and math.isnan(estimator_error):
no need to test sample_weight twice.

if sample_weight is None:
    if math.isnan(estimator_error):
        do_something_specific
    break
sklearn/ensemble/weight_boosting.py (Outdated)

        if sample_weight is None:

        if sample_weight is None and math.isnan(estimator_error):
            print("Underflow of weighted error occured during iterations! Iterations stopped ! High chances of Overfitting!, Try decreasing the learning rate or n_estimators to avoid this! ")
use proper warning messages
and lines should be <80 characters
sklearn/ensemble/weight_boosting.py (Outdated)

@@ -497,6 +501,9 @@ def _boost_real(self, iboost, X, y, sample_weight, random_state):
         # Error fraction
         estimator_error = np.mean(
             np.average(incorrect, weights=sample_weight, axis=0))
+        if math.isnan(estimator_error):
+            return None,None,estimator_error
space after commas (otherwise it is not PEP8)
sklearn/ensemble/weight_boosting.py (Outdated)

@@ -552,6 +559,8 @@ def _boost_discrete(self, iboost, X, y, sample_weight, random_state):
         # Error fraction
         estimator_error = np.mean(
             np.average(incorrect, weights=sample_weight, axis=0))
+        if math.isnan(estimator_error):
+            return None,None,estimator_error
idem
What does "idem" mean? Sorry if the question is lame.
Welcome. Tag your title [WIP] and [MRG] when you are ready for revision. Thx
sklearn/ensemble/weight_boosting.py (Outdated)

        if sample_weight is None:
            if estimator_error is not None and math.isnan(estimator_error):
                print("Early termination due to underflow of estimated_error.Iterations stopped")
We don't use print statements for warnings (see the contributing guide for a warning example). Or see this example in the code:

scikit-learn/sklearn/preprocessing/data.py, lines 180 to 185 in 3e85359:

if not np.allclose(mean_2, 0):
    warnings.warn("Numerical issues were encountered "
                  "when scaling the data "
                  "and might not be solved. The standard "
                  "deviation of the data is probably "
                  "very close to 0. ")
Moreover, if you introduce an if statement, the code is not tested, therefore a test needs to be written. See the test of the previous example here:

scikit-learn/sklearn/preprocessing/tests/test_data.py, lines 219 to 220 in 3e85359:

w = "standard deviation of the data is probably very close to 0"
x_scaled = assert_warns_message(UserWarning, w, scale, x)
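For context, here is a minimal sketch of how the draft's print() could become a warning. This is not the PR's final code: the helper name and message text are illustrative only.

```python
import math
import warnings

def warn_on_nan_error(sample_weight, estimator_error):
    """Illustrative helper (not scikit-learn API): warn instead of print."""
    if sample_weight is None and math.isnan(estimator_error):
        warnings.warn("Early termination due to underflow of "
                      "estimator_error. Iterations stopped.",
                      UserWarning)
        return True
    return False
```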
Thanks! I'll work on the test
@massich I am not so sure what a good test for this would look like. A separate function for testing? Here's a snapshot. Any suggestions would be great.
If there's no place where something similar is tested, create a new function. But I would use a more descriptive name like test_early_stop_when_estimator_error_becomes_nan. I would also add a reference to the original issue.
To avoid PEP8 errors, I would also set the warning message into a variable to further compare to. (And make sure that the warning raised and the warning caught are the same; try to find a better name than w or early_stop_warnmsg.)

early_stop_warnmsg = ("Early termination due to underflow of "
                      "estimated_error. Iterations stopped")
assert_warns_message(UserWarning, early_stop_warnmsg,
                     clf.fit, iris.data, iris.target)
Great. Thanks, I will make these changes.
@massich I don't think the estimator error is underflowing. If it were underflowing, wouldn't the estimated error be approximated to zero? What I suppose is that the sample_weight values are shooting up to infinity when summed (probably due to a high learning rate), causing an inf/inf division for the estimated error and giving a NaN. Any suggestions?
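A standalone sketch of the failure mode being described (the values here are made up, not from the PR): exponential weight updates overflow float64 to inf, and the weighted average then divides inf by inf.

```python
import numpy as np

sample_weight = np.ones(3)
# A large exponent, e.g. learning_rate * estimator_weight accumulating
# over many boosting rounds, overflows float64 to inf.
sample_weight *= np.exp(1000.0)
print(sample_weight)                                  # [inf inf inf]

incorrect = np.array([1.0, 0.0, 1.0])
# np.average divides by sample_weight.sum() == inf, yielding NaN.
print(np.average(incorrect, weights=sample_weight))   # nan
```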
@massich Any changes needed? Actually there is no underflow of the estimator error; sample weights reaching infinite values are causing the NaN. The original issue title is not what the problem is, so should I change my PR title?
Yes, please update the PR title. Does normalizing the sample_weight help? If this is about numerical precision, it might also help to perform the boosting in log space, so using log-sum-exp.
We do normalise the sample weight after each boosting step, which requires summing up all the sample weights, and that's where we get 'inf'.
You're right, we do. Yes, please do try log-sum-exp.
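A sketch of what the log-sum-exp suggestion could look like (illustrative only; it assumes scipy.special.logsumexp, and these variable names are not the PR's): keep the weights in log space and normalize by subtracting logsumexp instead of dividing by a sum that may overflow.

```python
import numpy as np
from scipy.special import logsumexp

log_w = np.array([1000.0, 1001.0, 999.0])   # np.exp() of these would overflow
log_w -= logsumexp(log_w)                   # log(w_i / sum_j w_j), no inf ever formed
print(np.exp(log_w))                        # finite weights that sum to 1.0
```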
@jnothman Converting to log-space isn't working well; it is affecting accuracy and causing test_iris() to fail due to low accuracy. So I think it's better to generate a warning and stop iterations.
@Fenil3510 that sounds odd. Can you please commit the change so we can review?
@amueller please review
Converted to log-space. I hope the conversion is right.
sklearn/ensemble/weight_boosting.py (Outdated)

@@ -581,7 +581,8 @@ def _boost_discrete(self, iboost, X, y, sample_weight, random_state):
             # Only boost the weights if I will fit again
             if not iboost == self.n_estimators - 1:
                 # Only boost positive weights
-                sample_weight *= np.exp(estimator_weight * incorrect *
+                sample_weight = np.log(np.exp(sample_weight) +
You have the exp and the log the wrong way around. You want to turn a product into a sum of logs.
Ok,

sample_weight = np.log(sample_weight * np.exp(estimator_weight * incorrect ...))

This is fine I guess. I'll commit the changes if the above is fine.
No, you need to exp(log(sample_weight) + est_weight * incorrect * ...)
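To spell out the identity the review is pointing at, here is a toy check (not the patch itself; the array values stand in for real boosting quantities): multiplying by exp(b) is the same as adding b to the log and exponentiating.

```python
import numpy as np

sample_weight = np.array([0.2, 0.3, 0.5])
boost = np.array([0.1, 0.0, 0.4])   # stands in for estimator_weight * incorrect * mask

direct = sample_weight * np.exp(boost)
via_logs = np.exp(np.log(sample_weight) + boost)
assert np.allclose(direct, via_logs)   # a product becomes a sum of logs
```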
sklearn/ensemble/weight_boosting.py (Outdated)

@@ -581,7 +581,7 @@ def _boost_discrete(self, iboost, X, y, sample_weight, random_state):
             # Only boost the weights if I will fit again
             if not iboost == self.n_estimators - 1:
                 # Only boost positive weights
-                sample_weight = np.log(np.exp(sample_weight) +
+                sample_weight = np.exp(np.log(sample_weight) *
no, you need a sum of logs. That * should be a +.
Oh! I am really sorry, will fix.
Well, Travis gave you green for everything but flake8.
But that still means we're getting overflow...
@jnothman Yes, there is still an overflow. Anything else we could try?
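A quick numeric check (with illustrative values) of why the rewrite alone cannot remove the overflow: exp(log(w) + b) evaluates to exactly the same number as w * exp(b), so it hits inf at the same point. The overflow only disappears if the final exp is avoided, e.g. by normalizing in log space first.

```python
import numpy as np

w = np.array([1e300])
b = np.array([700.0])          # np.exp(700) is about 1e304

print(w * np.exp(b))           # [inf]
print(np.exp(np.log(w) + b))   # [inf] as well: same value, same overflow
```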
Thank you for the PR @fenilsuchak!
Please add an entry to the change log at doc/whats_new/v0.24.rst with tag |Fix|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
sklearn/ensemble/weight_boosting.py (Outdated)

-            sample_weight *= np.exp(estimator_weight * incorrect *
-                                    ((sample_weight > 0) |
-                                     (estimator_weight < 0)))
+            sample_weight = np.exp(np.log(sample_weight) +
Let's include a comment here explaining why we are doing this in log space.
Not sure if we had an advantage doing this; we still had an overflow.
w = "Sample weights have reached infinite values" | ||
clf = AdaBoostClassifier(n_estimators=30, learning_rate=5., | ||
algorithm="SAMME") | ||
assert_warns_message(UserWarning, w, clf.fit, iris.data, iris.target) |
We have been moving toward using pytest.warns:

msg = "Sample weights have reached infinite values"
with pytest.warns(UserWarning, match=msg):
    clf.fit(iris.data, iris.target)
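A self-contained sketch of how that suggestion could be wrapped into a test. The function name is illustrative, and the warning text assumes the message used elsewhere in this PR, not a confirmed final API.

```python
import pytest
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

def test_adaboost_warns_on_infinite_sample_weights():
    # A deliberately large learning_rate drives the weights to overflow.
    iris = load_iris()
    clf = AdaBoostClassifier(n_estimators=30, learning_rate=5.,
                             algorithm="SAMME")
    msg = "Sample weights have reached infinite values"
    with pytest.warns(UserWarning, match=msg):
        clf.fit(iris.data, iris.target)
```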
@cmarmo not sure why the tests are failing.
Not sure either, but it's failing on upstream/master too with the same error...
It's not the same issue, but it's a compilation-on-macOS issue, as usual :)
Ok, will wait for the merge of #17913 then to rerun the tests again?
Hi @thomasjpfan, this is a three-year-old PR... do you mind checking if your comments have been addressed? The check failure is unrelated to the PR itself. Thanks!
sklearn/ensemble/_weight_boosting.py (Outdated)

            sample_weight = np.exp(np.log(sample_weight) +
                                   estimator_weight * incorrect *
                                   ((sample_weight > 0) |
                                    (estimator_weight < 0)))
What was the reason behind estimator_weight < 0?
Yes, this looks like a backward incompatible change to me. I reverted this change.
Thanks @fenilsuchak! I fixed conflicts and added a changelog entry.
Co-authored-by: Fenil Suchak <fenilsuchak@fenil.local> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Fixes #10077
Error fixed in the AdaBoost classifier. At some iterations the weighted error was underflowing due to a high learning rate, making the error NaN and hence making subsequent iterations useless.
This update halts the iterations when such a case is encountered and prints a warning message to inform about the underflow.
This is my first PR. I am open to all sorts of criticism.