Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pickling slots in subclasses of common classes #70766

Closed
serhiy-storchaka opened this issue Mar 17, 2016 · 10 comments
Closed

Support pickling slots in subclasses of common classes #70766

serhiy-storchaka opened this issue Mar 17, 2016 · 10 comments
Labels
3.11 only security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Mar 17, 2016

BPO 26579
Nosy @rhettinger, @pitrou, @vstinner, @avassalotti, @bitdancer, @berkerpeksag, @serhiy-storchaka
PRs
  • bpo-26579: Add object.__getstate__(). #2821
  • Files
  • copyreg_getstate.patch
  • copyreg_getstate2.patch
  • object_getstate.patch: Add object.getstate
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-04-06.17:02:02.310>
    created_at = <Date 2016-03-17.08:43:06.421>
    labels = ['extension-modules', 'type-feature', 'library', '3.11']
    title = 'Support pickling slots in subclasses of common classes'
    updated_at = <Date 2022-04-07.14:57:28.953>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2022-04-07.14:57:28.953>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-04-06.17:02:02.310>
    closer = 'serhiy.storchaka'
    components = ['Extension Modules', 'Library (Lib)']
    creation = <Date 2016-03-17.08:43:06.421>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['42183', '43462', '43716']
    hgrepos = []
    issue_num = 26579
    keywords = ['patch']
    message_count = 5.0
    messages = ['261903', '268830', '270403', '416888', '416931']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'pitrou', 'vstinner', 'alexandre.vassalotti', 'r.david.murray', 'berker.peksag', 'serhiy.storchaka']
    pr_nums = ['2821']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26579'
    versions = ['Python 3.11']

    Linked PRs

    @serhiy-storchaka
    Copy link
    Member Author

    Pickling and copying instances of subclasses of some basic classes pickles and copies instance attributes. Example:

    >>> class BA(bytearray):
    ...     pass
    ... 
    >>> b = BA(b'abc')
    >>> b.x = 10
    >>> c = copy.copy(b)
    >>> c.x
    10
    >>> c = pickle.loads(pickle.dumps(b))
    >>> c.x
    10

    But this doesn't work if attributes are saved not in instance dictionary, but in slots.

    >>> class BA(bytearray):
    ...     __slots__ = ('x',)
    ... 
    >>> b = BA(b'abc')
    >>> b.x = 10
    >>> c = copy.copy(b)
    >>> c.x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: x
    >>> c = pickle.loads(pickle.dumps(b))
    >>> c.x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: x

    Since using __slots__ is implementation detail, this failure can be considered as a bug.

    Proposed patch adds support of pickling and copying slots in subclasses of all classes that already support pickling and copying non-slot attributes. It is backward compatible, classes with slots can be unpickled on older Python versions without slots. Affected classes: bytearray, set, frozenset, weakref.WeakSet, collections.OrderedDict, collections.deque, datetime.tzinfo.

    The patch adds the copyreg._getstate() function for Python classes and exposes the _PyObject_GetState() function for extension classes. An alternative (and simpler for end user) solution would be to add default implementation as object.__getstate__(). But this is not easy to reject non-pickleable classes (bpo-22995) in this case, since __getstate__ is looked up as instance attribute, not as other special methods.

    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 17, 2016
    @serhiy-storchaka
    Copy link
    Member Author

    Synchronized with current sources.

    @serhiy-storchaka
    Copy link
    Member Author

    An alternative way is to expose a default state as object.__getstate__(). It is more efficient since it is implemented in C. Following patch implements this approach.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Jul 23, 2017
    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 884eba3 by Serhiy Storchaka in branch 'main':
    bpo-26579: Add object.__getstate__(). (GH-2821)
    884eba3

    @serhiy-storchaka serhiy-storchaka added 3.11 only security fixes and removed 3.7 (EOL) end of life labels Apr 6, 2022
    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2022

    bpo-26579: Add object.__getstate__(). (GH-2821)
    884eba3

    This change introduced reference leaks: see bpo-47250.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    oscarbenjamin added a commit to oscarbenjamin/sympy that referenced this issue May 16, 2022
    This is needed after the changes in
    
      python/cpython#70766
    
    Since object now has a __getstate__ method we need to make sure not to
    call it with any arguments. Probably the __getstate__ methods can be
    simplified (or removed?) in light of the cpython changes but for now
    this is a quick fix to restore previous behaviour.
    oscarbenjamin added a commit to oscarbenjamin/sympy that referenced this issue May 16, 2022
    This is needed after the changes in
    
      python/cpython#70766
    
    Since object now has a __getstate__ method we need to make sure not to
    call it with any arguments. Probably the __getstate__ methods can be
    simplified (or removed?) in light of the cpython changes but for now
    this is a quick fix to restore previous behaviour.
    oscarbenjamin added a commit to oscarbenjamin/sympy that referenced this issue May 16, 2022
    This is needed after the changes in
    
      python/cpython#70766
    
    Since object now has a __getstate__ method we need to make sure not to
    call it with any arguments. Probably the __getstate__ methods can be
    simplified (or removed?) in light of the cpython changes but for now
    this is a quick fix to restore previous behaviour.
    BenjaminBossan added a commit to BenjaminBossan/scikit-learn that referenced this issue Dec 13, 2022
    Since Python 3.11, objects have a __getstate__ method by default:
    
    python/cpython#70766
    
    Therefore, the exception in BaseEstimator.__getstate__ will no longer be
    raised, thus not falling back on using the object's __dict__:
    
    https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/base.py#L274-L280
    
    If the instance dict of the object is empty, the return value will,
    however, be None. Therefore, the line below calling state.items()
    results in an error.
    
    In this bugfix, it is checked if the state is None and if it is, the
    object's __dict__ is used (which should always be empty).
    
    Not addressed in this PR is how to deal with slots (see also discussion
    in scikit-learn#10079). When there are __slots__, __getstate__ will actually return
    a tuple, as documented here:
    
    https://docs.python.org/3/library/pickle.html#object.__getstate__
    
    The user would thus still get an indiscriptive error message.
    @gpshead
    Copy link
    Member

    gpshead commented Jul 13, 2023

    This caused object.__getstate__ to exist where it hadn't in the past, causing trouble for CLIF generated extension modules.
    See https://github.com/google/clif/pull/64/files#diff-de65cd2a0ef58411075450cf769b38dd6d53803108d536309289cda5810a379aR23 for an example of what we're unfortunately likely to have to do with CLIF for its generated extension modules to not be broken by this behavior change.

    It'd be nice if the 3.11 What's New documentation covered this API change. https://docs.python.org/3/whatsnew/3.11.html#other-language-changes only extolls the change's virtues and does not mention anything about the behavior change consequences and workarounds at the Python and C API levels that people may need to adopt to retain compatibility.

    @serhiy-storchaka
    Copy link
    Member Author

    I don't know what can be suggested as a general workaround. Each problem requires an individual solution, depending on why that code was broken by that change.

    @lwinkler
    Copy link

    This also creates a bug in pyyaml and ruamel.yaml. It seems that for an empty class, the new getstate method returns None. This causes the serialization to crash.

    @serhiy-storchaka : Is there a good reason why getstate returns None for an empty class instead of an empty directory ?

    Minimal code:

    class Empty:
        pass
    
    empty = Empty()
    print("getstate", empty.__getstate__())
    

    prints: getstate None

    @serhiy-storchaka
    Copy link
    Member Author

    @serhiy-storchaka : Is there a good reason why getstate returns None for an empty class instead of an empty directory ?

    Because pickle.dumps() creates more compact pickle if __getstate__() returns None. Also, it saves time in __getstate__() for creating an empty dict and saves time in pickle.loads() for calling __setstate__().

    Add def __getstate__(self): return {} to an empty class and compare the result of pickle.dumps().

    @nitzmahone
    Copy link

    Accounting for this in PyYAML shouldn't be a big deal- IIUC None has always been a documented valid response from __getstate__() anyway, so we need to be able to handle and preserve that properly.

    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 23, 2023
    pythonGH-108379)
    
    (cherry picked from commit b6be188)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 23, 2023
    pythonGH-108379)
    
    (cherry picked from commit b6be188)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    gpshead added a commit that referenced this issue Aug 23, 2023
    …w. (GH-108379) (#108385)
    
    gh-70766: Mention the object getstate caveat in 3.11 What's new. (GH-108379)
    (cherry picked from commit b6be188)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    Yhg1s pushed a commit that referenced this issue Aug 23, 2023
    …w. (GH-108379) (#108384)
    
    gh-70766: Mention the object getstate caveat in 3.11 What's new. (GH-108379)
    (cherry picked from commit b6be188)
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants