Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] add warning when importing old or new pickle. #7248

Merged
merged 4 commits into from Sep 10, 2016

Conversation

Projects
None yet
6 participants
@amueller
Copy link
Member

amueller commented Aug 25, 2016

Adds a warning when loading a possibly incompatible pickle from a different version of sklearn.
Fixes #7135.

@amueller amueller changed the title add warning when importing old or new pickle. [MRG] add warning when importing old or new pickle. Aug 25, 2016

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Aug 25, 2016

So, this works nicely. My test is a bit crazy, removing __getstate__ from BaseEstimator by monkey-patching, to emulate the old (current) behavior. I could also overwrite the __getstate__ on a single estimator, making it the vanilla return self.__dict__.

ping @jnothman ;)

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Aug 25, 2016

one question remains, should we leave the __version__ string stored in the estimator in unpacking? We can remove it to make pickle.loads(pickle.dumps(est)) a no-op.

@MechCoder MechCoder added this to the 0.18 milestone Aug 26, 2016

pickle_version = state.get("__version__", "pre-0.18")
if pickle_version != __version__:
warnings.warn(
"Trying to unpickle estimator from version {} when with using "

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

drop "with"

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

Given the lack of traceback with warnings, we should probably mention the estimator. (Although we could use stacklevel, the previous stack entries will be in the pickle module.) The annoyance of that will be that the message will be printed for every estimator loaded.

return dict(self.__dict__.items(), __version__=__version__)

def __setstate__(self, state):
pickle_version = state.get("__version__", "pre-0.18")

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

should this be pop?

@@ -292,6 +293,20 @@ def __repr__(self):
return '%s(%s)' % (class_name, _pprint(self.get_params(deep=False),
offset=len(class_name),),)

def __getstate__(self):
return dict(self.__dict__.items(), __version__=__version__)

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

Is there a way to constrain this to estimators within scikit-learn??

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

type(self).__module__.startswith('sklearn.')?

This comment has been minimized.

Copy link
@amueller

amueller Aug 26, 2016

Author Member

Any idea how to test that? Should I? I guess I could overwrite type(self).__module__ but I'm not sure it's worth it.

"Trying to unpickle estimator from version {} when with using "
"version {}. This might lead to breaking code or invalid "
" results. Use at your own risk.".format(
pickle_version, __version__), UserWarning)

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 26, 2016

Member

I think we might want to give this a custom warning type so that someone can easily silence it. (Then again, silencing by message regexp might be safer??)

if pickle_version != __version__:
warnings.warn(
"Trying to unpickle estimator from version {} when with using "
"version {}. This might lead to breaking code or invalid "

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 26, 2016

Member

To fix the Travis failure for Python 2.6, use {0} here.

# check that warning is raised on different version
tree_pickle_other = tree_pickle.replace(sklearn.__version__.encode(),
b"something")
message = ("Trying to unpickle estimator from version {} when with using "

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 26, 2016

Member

You need to use {0} and {1} here too for python 2.6.

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Aug 26, 2016

done


# check that no warning is raised for external estimators
DecisionTreeClassifier.__module__ = "notsklearn"
assert_no_warnings(pickle.loads, tree_pickle_noversion)

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 27, 2016

Member

You probably need to reset the __module__ after the assert_no_warnings. I am guessing this is the reason of the test failures.

This comment has been minimized.

Copy link
@amueller

amueller Aug 29, 2016

Author Member

oh yeah that's true.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 27, 2016

Is there a way to make this monkey-patching not break stuff if tests are run with parallelism?

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Aug 29, 2016

maybe I should just remove the monkey patching. It's evil.

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Aug 29, 2016

Ah, wait, it's the fiddeling with the __module__ that breaks tests, not the way more intrusive monkey-patching of BaseEstimator... I'll fix the module stuff in a bit, not sure if the monkey patching of the BaseEstimator test is worth the potential trouble.

if type(self).__module__.startswith('sklearn.'):
return dict(self.__dict__.items(), __version__=__version__)
else:
return dict(self.__dict__.items())

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux Aug 29, 2016

Member

Why don't we simply add a version_ attribute to BaseEstimator?

That way the version number of sklearn is stored on pickle without such a hack for everything that inherits from BaseEstimator.

This comment has been minimized.

Copy link
@amueller

amueller Aug 29, 2016

Author Member

Why hack? This adds the information to the file when the estimator is stored.
What would be the benefit of a version_ attribute outside of serialization?

The __setstate__ would still need to be changed the same way, right? (only without a pop)

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux Aug 29, 2016

Member

Why hack? This adds the information to the file when the estimator is stored.
What would be the benefit of a version_ attribute outside of serialization?

Clarity and simplicity. Here we are relying on overriding the pickling
mechanism. We wouldn't need so. Less overriding of __ methods leads to
less tricky bug to debug.

The setstate would still need to be changed the same way, right? (only
without a pop)

I believe so.

This comment has been minimized.

Copy link
@amueller

amueller Aug 29, 2016

Author Member

I'm not fundamentally against adding it as an attribute to BaseEstimator but I'd rather make the attribute private to not pollute the autocomplete too much.

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux via email Aug 29, 2016

Member

This comment has been minimized.

Copy link
@jnothman

jnothman Aug 29, 2016

Member

I think the class attribute issue is the bigger problem here. It doesn't get pickled unless it's set on the instance, does it?

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux Aug 30, 2016

Member

Can we still have that only on models inside scikit-learn? It would be a class
attribute,

+1: see in the class of BaseEstimator. It should be very simple, just
adding

   _version_ = sklearn.__version__

before the init in the class.

The only doubt that I have is about circular imports by accessing the
central sklearn module.

This comment has been minimized.

Copy link
@GaelVaroquaux

GaelVaroquaux via email Aug 30, 2016

Member

This comment has been minimized.

Copy link
@amueller

amueller Aug 31, 2016

Author Member

@GaelVaroquaux does that mean you don't object to the current solution?

This comment has been minimized.

Copy link
@amueller

amueller Sep 7, 2016

Author Member

ping @GaelVaroquaux [sorry, I know you're busy]

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Sep 7, 2016

@jnothman wdyt?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Sep 8, 2016

IsotonicRegression overrides __[gs]etstate__ without calling super.
This should be (part of) a common test. (Sorry for the added work.)


# check that not including any version also works:
# TreeNoVersion has no getstate, like pre-0.18
tree = TreeNoVersion().fit(iris.data, iris.target)

This comment has been minimized.

Copy link
@lesteve

lesteve Sep 8, 2016

Member

This is pretty neat compared to the previous version with monkey-patching!

add warning when importing old or new pickle.
add tests for pickle warning

add test that loading something stored with no custom __getstate__ will still work (and raise a warning)

added "within sklearn" check, included estimator name in warning message.

add that pickle warnings only apply to sklearn estimators.

changed tests to be threadsave and contain no monkey business
@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Sep 8, 2016

@jnothman added the tests to common tests. Only isotonic is not tested (because 1d input)....

@amueller amueller force-pushed the amueller:pickle_warning branch from 1edd87c to b28cfbc Sep 8, 2016

@@ -835,6 +835,7 @@ def check_estimators_pickle(name, Estimator):

# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
assert_true(b"version" in pickled_estimator)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 8, 2016

Member

Hmm... Perhaps only if module.startswith('sklearn.')!

This comment has been minimized.

Copy link
@amueller

amueller Sep 8, 2016

Author Member

yes

This comment has been minimized.

Copy link
@amueller

amueller Sep 8, 2016

Author Member

Done

@@ -835,6 +835,8 @@ def check_estimators_pickle(name, Estimator):

# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
if Estimator.__module__.startswith('sklearn.'):
assert_true(b"version" in pickled_estimator)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 8, 2016

Member

"version", no?

This comment has been minimized.

Copy link
@amueller

amueller Sep 8, 2016

Author Member

took me a while to figure out what you mean ;) __version__? the underscores get mangled, I think. I felt this was enough. I can try to get the mangling to work, but I'd rather merge this and get 0.18 out of the door...


def __setstate__(self, state):
if type(self).__module__.startswith('sklearn.'):
pickle_version = state.pop("__version__", "pre-0.18")

This comment has been minimized.

Copy link
@ogrisel

ogrisel Sep 8, 2016

Member

I am afraid that __version__ is very generic attribute name and could potentially conflict with other attribute with the same name either a downstream project or with future Python.

I would would feel more comfortable to name this: _sklearn_version and let leave the double-sided __ attribute name convention to Python language-level features.

This comment has been minimized.

Copy link
@amueller

amueller Sep 8, 2016

Author Member

Fine by me. I used it as it is usually used for package versions.

This comment has been minimized.

Copy link
@amueller

amueller Sep 8, 2016

Author Member

And done.

@ogrisel

This comment has been minimized.

Copy link
Member

ogrisel commented Sep 8, 2016

@ogrisel

This comment has been minimized.

Copy link
Member

ogrisel commented Sep 10, 2016

Merging!

@ogrisel ogrisel merged commit b4872fe into scikit-learn:master Sep 10, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Sep 12, 2016

sweet, thanks :)

@amueller

This comment has been minimized.

Copy link
Member Author

amueller commented Sep 12, 2016

needs a whatsnew, right?

amueller added a commit to amueller/scikit-learn that referenced this pull request Sep 12, 2016

jnothman added a commit that referenced this pull request Sep 13, 2016

rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016

rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

@tirkarthi tirkarthi referenced this pull request Jun 4, 2018

Open

Scikit-learn warning #158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.