ENH: raise an error when MLP diverges #29773

Merged
adrinjalali merged 19 commits into scikit-learn:main from MarcBresson:main on Oct 14, 2024

@MarcBresson

Contributor

Reference Issues/PRs

Fixes #29504

What does this implement/fix? Explain your changes.

When the MLP weights overflowed with early_stopping=True, the scorer crashed because it could not compute a validation score. Now, when the weights overflow, the validation score is replaced by inf and training continues.

Any other comments?

Because the model diverges, early_stopping will not actually stop training before the number of epochs reaches the maximum (max_iter). One condition for early stopping to trigger is that the new score must be lower than or equal to the best score. Since the model diverges, the new score is always greater than the best score (which is generally reached at the first epoch).
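The stopping condition described above can be replayed with a minimal pure-Python sketch. It assumes a simplified rule modeled on MLP early stopping: a score counts as "no improvement" when it is below `best + tol`, and the counter resets otherwise. The helper `replay_no_improvement_count` and its `tol` default are hypothetical, not scikit-learn code.

```python
import math


def replay_no_improvement_count(scores, tol=1e-4):
    """Return the no-improvement counter after each epoch.

    Hypothetical helper: a simplified rule modeled on MLP early stopping,
    not scikit-learn's implementation.
    """
    best = -math.inf
    count = 0
    history = []
    for score in scores:
        # Assumed rule: "no improvement" means not beating best by more than tol.
        if score < best + tol:
            count += 1
        else:
            count = 0
        if score > best:
            best = score
        history.append(count)
    return history


# Plateauing model: the counter grows, so early stopping can eventually fire.
print(replay_no_improvement_count([0.30, 0.29, 0.28]))
# Diverging model whose validation score is replaced by inf after epoch 1:
# inf is never below best + tol, so the counter never grows.
print(replay_no_improvement_count([0.30, math.inf, math.inf, math.inf]))
```

Under these assumptions a `nan` score behaves the same way as `inf`: every comparison involving `nan` is `False`, so the counter resets on each epoch.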

@github-actions

github-actions Bot commented Sep 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 32c3074. Link to the linter CI: here

MarcBresson force-pushed the main branch 2 times, most recently from 51f9830 to 5a37950 on September 5, 2024 07:49

    - self.validation_scores_.append(self._score(X_val, y_val))
    + try:
    +     val_score = self._score(X_val, y_val)
    + except ValueError:

Member

Since we could catch many kinds of ValueError here, I assume the exception contains something telling us that it comes from a np.nan or something similar, right?

Contributor Author

Good idea, I'll match on that.

    try:
        val_score = self._score(X_val, y_val)
    except ValueError:
        val_score = np.inf

Member

We will need to add a non-regression test to check that the code works as expected.

@glemaitre

Member

In addition, we will need an entry in the changelog:

Please add an entry to the change log at doc/whats_new/v1.5.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

MarcBresson force-pushed the main branch 2 times, most recently from 69d14e0 to 4c2bfc3 on September 7, 2024 07:14
@MarcBresson

Contributor Author

OK, I set the random state to make the results reproducible :) It should be good now; the test passes on my laptop.

glemaitre self-requested a review on September 9, 2024 20:52
@glemaitre

Member

I see that we will not be able to get this PR into 1.5.2.
I moved the entry from 1.5.2 to 1.6.

Otherwise everything looks good (I slightly changed the changelog entry).

glemaitre left a comment

Member

Thanks @MarcBresson

Let's have a second review.

@glemaitre

Member

@adrinjalali, do you want to have a look after the debugging in this PR?

glemaitre added the Waiting for Second Reviewer label (First reviewer is done, need a second one!) on Sep 9, 2024
Comment on lines +705 to +711

    try:
        val_score = self._score(X_val, y_val)
    except ValueError as e:
        if str(e) == "Input contains NaN.":
            val_score = np.inf
        else:
            raise e

Member

This is quite brittle. It depends on internals and on the exact error text raised by accuracy_score and r2_score (neither of which is ideal, lol).

I'd rather have an explicit check for NaN here and, if there is a NaN, apply this logic, instead of relying on the scorer to raise that exception.
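The explicit check suggested above could look like the following minimal sketch. It assumes numeric regression predictions; `score_with_finite_check` and the stand-in `neg_mse` score are hypothetical names, and returning `np.nan` as the sentinel is an assumption of this sketch rather than what the PR settled on.

```python
import numpy as np


def neg_mse(y_true, y_pred):
    # Stand-in score (higher is better) so the sketch does not need scikit-learn.
    return -float(np.mean((np.asarray(y_true, dtype=float) - y_pred) ** 2))


def score_with_finite_check(y_true, y_pred, score_function):
    """Hypothetical helper: inspect predictions before calling the scorer."""
    y_pred = np.asarray(y_pred, dtype=float)
    # Explicit check instead of matching the scorer's "Input contains NaN." text.
    if not np.all(np.isfinite(y_pred)):
        return np.nan
    return score_function(y_true, y_pred)


print(score_with_finite_check([1.0, 2.0], [1.0, 2.5], neg_mse))
print(score_with_finite_check([1.0, 2.0], [np.nan, np.inf], neg_mse))
```

Checking `np.isfinite` catches both `inf` and `nan`, so the wording of any downstream error message no longer matters.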

@MarcBresson Sep 12, 2024

Contributor Author

X_val and y_val don't contain any NaN. But because the model weights are huge, computing the score generates NaN values, which raise the error when the scorer validates its input.

The exception is raised deep in the code (see #29504 for the full traceback), which is why I ended up just adding the try/except statement.

I can change the logic of the _score function to account for that.
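A small NumPy illustration of how finite, NaN-free inputs can still produce a NaN prediction once weights explode. The weight values and the two elementwise steps are made up for the example; this is not the MLP's actual forward pass.

```python
import numpy as np

x = np.array([1.0, 2.0])  # finite, NaN-free validation sample
w_hidden = np.array([1e300, 1e300])  # illustrative exploded weights
w_out = np.array([1e200, -1e200])  # mixed signs in the output weights

# Silence the overflow/invalid warnings NumPy would otherwise emit.
with np.errstate(over="ignore", invalid="ignore"):
    hidden = x * w_hidden  # [1e300, 2e300]: still finite
    contributions = hidden * w_out  # [inf, -inf]: float64 overflow
    y_pred = contributions.sum()  # inf + (-inf) is nan

print(hidden, contributions, y_pred)
```

A scorer that validates its inputs would then reject `y_pred`, which is the kind of "Input contains NaN" failure reported in #29504.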

Contributor Author

OK, I managed to do it better :) Thank you for pushing me.

        Non-regression test for:
        https://github.com/scikit-learn/scikit-learn/issues/29504
        """
        mlp = MLPRegressor(

Member

We need to test both the regressor and the classifier.

Contributor Author

Because a classifier predicts 0s and 1s, the accuracy score can never be NaN.
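A simplified illustration of why accuracy stays finite, assuming binary labels are obtained by thresholding an output probability at 0.5 (an assumption for this sketch, not MLPClassifier's actual prediction code). Because `nan > 0.5` is `False`, even NaN outputs map to a valid label.

```python
import numpy as np

# Hypothetical network outputs for a diverged binary classifier.
proba = np.array([np.nan, 0.7, 0.2, np.nan])
y_true = np.array([0, 1, 1, 1])

# Threshold at 0.5: comparisons with nan are False, so nan maps to label 0.
labels = (proba > 0.5).astype(int)
accuracy = float(np.mean(labels == y_true))

print(labels, accuracy)
```

The accuracy is a fraction of matching labels, so it stays in [0, 1] however badly the weights have diverged.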

Member

I'm confused cause the code seems to touch both classifier and regressor, but according to your comment here classifier is fine as is?

Comment on doc/whats_new/v1.6.rst (outdated)
:mod:`sklearn.neural_network`
.............................

- |Fix| :class:`neural_network.MLPClassifier` and :class:`neural_network.MLPRegressor`

Member

So here this doesn't apply to MLPClassifier?

Member

Oops, I'm the culprit here. I think I changed the changelog entry from the base class to the public classes, so I added MLPClassifier, but it might never have been affected.

      """Private score method without input validation"""
    - # Input validation would remove feature names, so we disable it
    - return accuracy_score(y, self._predict(X, check_input=False))
    + return super()._score_with_function(X, y, score_function=accuracy_score)

Member

so does this apply to Classifier or not?

Contributor Author

It does not apply to the classifier, but I did a small refactor so that

        # Input validation would remove feature names, so we disable it
        return accuracy_score(y, self._predict(X, check_input=False))

is only written once :)
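The refactor described above follows a simple pattern: a shared base-class method owns the prediction call, and each subclass only passes its own score function. A toy sketch, where all class names, the canned predictions, and the `accuracy`/`neg_mse` helpers are hypothetical stand-ins rather than scikit-learn code:

```python
import numpy as np


class ToyBaseMLP:
    """Toy stand-in for a shared MLP base class (not scikit-learn code)."""

    def __init__(self, predictions):
        # Canned predictions so the sketch runs without any training.
        self._canned = np.asarray(predictions)

    def _predict(self, X, check_input=True):
        return self._canned

    def _score_with_function(self, X, y, score_function):
        # The only place that calls _predict(X, check_input=False).
        y_pred = self._predict(X, check_input=False)
        return score_function(y, y_pred)


def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == y_pred))


def neg_mse(y_true, y_pred):
    return -float(np.mean((np.asarray(y_true, dtype=float) - y_pred) ** 2))


class ToyClassifier(ToyBaseMLP):
    def _score(self, X, y):
        return self._score_with_function(X, y, score_function=accuracy)


class ToyRegressor(ToyBaseMLP):
    def _score(self, X, y):
        return self._score_with_function(X, y, score_function=neg_mse)


print(ToyClassifier([0, 1, 1])._score(None, [0, 1, 0]))
print(ToyRegressor([1.0, 2.0])._score(None, [1.0, 3.0]))
```

Any later change to how predictions are checked only has to be made in `_score_with_function`.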

@MarcBresson

Contributor Author

@adrinjalali, sorry for all the confusion. As much as I love atomic commits, aa7e7a8 was about both implementing your suggestion and doing a small refactor so that self._predict(X, check_input=False) is only written twice.

scikit-learn-bot and others added 6 commits September 19, 2024 10:29
Co-authored-by: Mr. Snrub <45150804+s-banach@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Shruti Nath <shrutinath@Shrutis-Laptop.local>
…istant when the model was strictly diverging with early_stopping
@adrinjalali

Member

@MarcBresson we don't care about commits inside PRs much at all. Better not to force push so that we can see the history. At the end we squash and merge anyway.

For the conflicts, you might want to use the version of the lockfiles in main.

@MarcBresson

Contributor Author

noted! The last change was just about changing np.NaN to np.nan :)

adrinjalali merged commit ef784c8 into scikit-learn:main on Oct 14, 2024

Labels

module:neural_network, Waiting for Second Reviewer (First reviewer is done, need a second one!)

Development

Successfully merging this pull request may close these issues.

Get error "ValueError: Input contains NaN" when MLP regression model is exploding numerically and when early_stopping=True

6 participants