Skip to content

Yield stack trace information in resilient mode model_selection warnings #15622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 58 commits into from
Dec 14, 2019

Conversation

GregoryMorse
Copy link
Contributor

@GregoryMorse GregoryMorse commented Nov 13, 2019

Fixes #15621

Now a stack trace will be included in the FitFailedWarning so that not having early termination will still allow proper analysis of errors.

Note: this change could break functionality where the warning message is parsed and assumes a 2 line message with the warning tag line and the specific Python exception detail.

The new format would be warning tag line + stack trace + specific Python exception detail. If we want backward compatibility, then a flag would need to be added. However aside from test code which is being fixed, I think generally this could be an upgrade requirement, rather than over-cluttering the class parameters for every class which uses this - may be more than just GridSearchCV.

@rth
Copy link
Member

rth commented Nov 13, 2019

Thanks please add a unit test.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use format_exc?

@GregoryMorse
Copy link
Contributor Author

I will see about the unit test shortly, I realized the Exception object does not have the traceback which needs to be retrieved so it was giving errors. Also format_exception is certainly better and simply not trimming it to line 0.

@GregoryMorse GregoryMorse changed the title Yield stack trace information in resilient mode warnings Yield stack trace information in resilient mode model_selection warnings Nov 13, 2019
@GregoryMorse
Copy link
Contributor Author

GregoryMorse commented Nov 13, 2019

It seems the functionality is basically tested already through test_validation.py. However _utility.py assert_warns_message and it would be pretty contrived to put the exact stack trace in. However that macro-like function could be changed to support a stacktrace injected into the middle of the message which will clear the errors that are occurring now via regular expressions perhaps if making it not overly specific. Does this require such a precise stacktrace specific test? It would be highly unmaintainable since when line numbers change, so would the message need to change. For now I am trying to just make the errors vanish while having a stacktrace which I think is near achieved.

@GregoryMorse
Copy link
Contributor Author

I edited above for this important detail:

Note: this change could break functionality where the warning message is parsed and assumes a 2 line message with the warning tag line and the specific Python exception detail.

The new format would be warning tag line + stack trace + specific Python exception detail. If we want backward compatibility, then a flag would need to be added. However aside from test code which is being fixed, I think generally this could be an upgrade requirement, rather than over-cluttering the class parameters for every class which uses this - may be more than just GridSearchCV.

@GregoryMorse
Copy link
Contributor Author

GregoryMorse commented Nov 14, 2019

It seems to be passing now. However some of the build systems seem to have an out of date exceptions.py or some bizarre newline detail is there. I think it's because they are not getting the version from the PR but from master. Passed on 5, failed on 4...

@GregoryMorse
Copy link
Contributor Author

Now with unit test and attempt at making it as backward compatible as possible.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm if tests pass

@GregoryMorse
Copy link
Contributor Author

I don't understand what they changed in doctest in Python 3.8 but its making getting straight forward things like this difficult. All the tests passed and the ellipsis should work just fine in all.

@jnothman
Copy link
Member

jnothman commented Dec 6, 2019

I don't think it's a change in py 3.8 ... We don't run doctest in all CI

I hope one of us can help soon

@GregoryMorse
Copy link
Contributor Author

GregoryMorse commented Dec 6, 2019

.. We don't run doctest in all CI

The error message shows that it does not compare. But I was able to resolve the Py 3.8 doctest before using '\\\\n' instead of '\\n' so I am almost sure its some intricacy with newlines and ellipsis. I am not sure the doctest version on Py3.8 vs the rest. I have not yet even installed Py3.8 even on my own system.

@GregoryMorse
Copy link
Contributor Author

GregoryMorse commented Dec 6, 2019

=================================== FAILURES ===================================
________________ [doctest] sklearn.exceptions.FitFailedWarning _________________
[gw1] linux -- Python 3.8.0 /usr/share/miniconda/envs/testvenv/bin/python
125     Examples
126     --------
127     >>> from sklearn.model_selection import GridSearchCV
128     >>> from sklearn.svm import LinearSVC
129     >>> from sklearn.exceptions import FitFailedWarning
130     >>> import warnings
131     >>> warnings.simplefilter('always', FitFailedWarning)
132     >>> gs = GridSearchCV(LinearSVC(), {'C': [-1, -2]}, error_score=0, cv=2)
133     >>> X, y = [[1, 2], [3, 4], [5, 6], [7, 8]], [0, 0, 1, 1]
134     >>> with warnings.catch_warnings(record=True) as w:
Expected:
    FitFailedWarning('Estimator fit failed. The score on this train-test
    partition for these parameters will be set to 0.000000.
    Details: \n...ValueError: Penalty term must be positive; got (C=-2)\n'
    ...)
Got:
    FitFailedWarning('Estimator fit failed. The score on this train-test partition for these parameters will be set to 0.000000. Details: \nTraceback (most recent call last):\n  File "/home/vsts/work/1/s/sklearn/model_selection/_validation.py", line 515, in _fit_and_score\n    estimator.fit(X_train, y_train, **fit_params)\n  File "/home/vsts/work/1/s/sklearn/svm/_classes.py", line 230, in fit\n    raise ValueError("Penalty term must be positive; got (C=%r)"\nValueError: Penalty term must be positive; got (C=-2)\n')

Really I have no idea what changed in doctest 3.8 as I already checked https://docs.python.org/3/library/doctest.html and there are no significant changes since Python 3.4. Obviously the new line and ellipsis have some details that must have changed somehow.

A section What About Exceptions? implies tracebacks are handled specially. But in our case we catch the exception and merely yield the string for comparison so none of that should apply at all.

@GregoryMorse
Copy link
Contributor Author

I have tried being very creative even helping guide the pattern matching along. And everything always passes fine in Py3.5/3.7. As for pylatest, nothing yet has worked.

Copy link
Contributor Author

@GregoryMorse GregoryMorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this fix. I don't exactly understand still what changed in Python 3.8 that is causing such peculiarities with new lines.

@thomasjpfan
Copy link
Member

The issue stemmed from "..." being a greedy regex:

Reference: https://docs.python.org/3.7/library/doctest.html#doctest.ELLIPSIS

(I was using 3.7 for testing this locally)

@GregoryMorse
Copy link
Contributor Author

The issue stemmed from "..." being a greedy regex:
Reference: https://docs.python.org/3.7/library/doctest.html#doctest.ELLIPSIS
(I was using 3.7 for testing this locally)

I had known about it and tried to work around it - why I put ValueError in there twice but alas I failed to think about it enough. Regardless, appreciate your efforts - the PR is looking good now.

GregoryMorse and others added 2 commits December 12, 2019 18:25
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an Enhancement to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:

@thomasjpfan thomasjpfan merged commit a05a5bc into scikit-learn:master Dec 14, 2019
@thomasjpfan
Copy link
Member

Thank you @GregoryMorse !

panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants