Breakage in User Code #8316

Closed
HolgerPeters opened this Issue Feb 8, 2017 · 3 comments

Comments

Projects
None yet
2 participants
@HolgerPeters
Contributor

HolgerPeters commented Feb 8, 2017

With this ticket, I would like to report some breakage in code involving scikit learn.

In commit b4872fe you introduce __getstate__ for the base estimator, in order to annotate it with a scikit-learn version. Problem is, that I have estimator classes that inherit from the base estimator, and also overwrite the __getstate__ method in which some temporary data is excluded from the data being pickled. The definition of the alternate __getstate__ is done via a mixin class, from which my estimator inherits. This estimator also inherits from scikit learns BaseEstimator. Now with b4872fe BaseEstimator defines __getstate__ which overwrites the logic in my variant of __getstate__.

So as a rough sketch, the situation is as follows:

class MyMixin(object):
    # ...
    def __getstate__(self):
        values_for_serialization = self.__dict__.copy()
        values_for_serialization['expert_'] = None
        return values_for_serialization

class MyEstimator(BaseEstimator, MyMixin):
    # ...

Of course I can fix this in my codebase (and as such, I do not request an upstream fix), but I wanted to communicate what kind of subtleties overwriting standard methods can bring.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Feb 8, 2017

Member

We tried to limit its effect with the
if type(self).__module__.startswith('sklearn.'):
but I think we should have also used return super(...).__getstate__() instead of return dict(self.__dict__.items()) to proceed up the MRO in the case of multiple inheritance. Would you be able to confirm that this fixes your case? Would you like to submit a pull request?

Member

jnothman commented Feb 8, 2017

We tried to limit its effect with the
if type(self).__module__.startswith('sklearn.'):
but I think we should have also used return super(...).__getstate__() instead of return dict(self.__dict__.items()) to proceed up the MRO in the case of multiple inheritance. Would you be able to confirm that this fixes your case? Would you like to submit a pull request?

@jnothman jnothman added the Bug label Feb 8, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Feb 8, 2017

Member

And you may not request an upstream fix, but I consider this a bug. I think we are not often enough aware of multiple inheritance/MRO issues, not just involved in overwriting standard methods.

Member

jnothman commented Feb 8, 2017

And you may not request an upstream fix, but I consider this a bug. I think we are not often enough aware of multiple inheritance/MRO issues, not just involved in overwriting standard methods.

@HolgerPeters

This comment has been minimized.

Show comment
Hide comment
@HolgerPeters

HolgerPeters Feb 8, 2017

Contributor

I will prepare a pull request. Glad to help.

Contributor

HolgerPeters commented Feb 8, 2017

I will prepare a pull request. Glad to help.

@jnothman jnothman closed this Feb 20, 2017

jnothman added a commit that referenced this issue Feb 20, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message

sergeyf added a commit to sergeyf/scikit-learn that referenced this issue Feb 28, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message

Sundrique added a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message

NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message

paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017

[MRG+1] Fix pickling bug due to multiple inheritance & __getstate__ (#…
…8324)


Fixes #8316


* Don't use test classes to group tests

* only use formatting for parts of the string that change

* Flake 8 column limit

* Make the modification of the estimator more explicit in the tests

* As suggested in code review, prefer formatting over two literals

* Also assert, that __setstate__ overwriting works in mixin

* Remove cache property

* Use assertion functions from sklearn.utils.testing

* remove the protocol argument in tests

* Rename attributes to better convey their purpose

* Revert change of module in TreeNoVersion

* Adhere to column-limit

* changelog entry

* Fix commit message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment