Pickling of methodcaller, attrgetter, and itemgetter #67144

Closed
anntzer mannequin opened this issue Nov 27, 2014 · 23 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@anntzer
Mannequin

anntzer mannequin commented Nov 27, 2014

BPO 22955
Nosy @rhettinger, @pitrou, @zware, @serhiy-storchaka, @anntzer, @MojoVampire, @thatneat
Files
  • pickle_getter_and_caller.patch
  • pickle_getter_and_caller2.patch
  • issue22955.diff: josh.r's patch with itemgetter and attrgetter reimplementations
  • pickle_getter_and_caller3.patch
  • pickle_getter_and_caller4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-05-27.08:49:19.200>
    created_at = <Date 2014-11-27.06:33:03.813>
    labels = ['type-feature', 'library']
    title = 'Pickling of methodcaller, attrgetter, and itemgetter'
    updated_at = <Date 2016-05-16.19:28:48.781>
    user = 'https://github.com/anntzer'

    bugs.python.org fields:

    activity = <Date 2016-05-16.19:28:48.781>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-05-27.08:49:19.200>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2014-11-27.06:33:03.813>
    creator = 'Antony.Lee'
    dependencies = []
    files = ['37314', '37315', '37320', '37448', '39395']
    hgrepos = []
    issue_num = 22955
    keywords = ['patch']
    message_count = 23.0
    messages = ['231752', '231768', '231831', '231841', '231848', '231849', '231850', '231851', '231872', '231873', '232363', '232364', '232370', '232512', '232616', '232617', '232641', '243364', '243676', '243687', '243746', '265718', '265727']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'pitrou', 'python-dev', 'zach.ware', 'serhiy.storchaka', 'Antony.Lee', 'josh.r', 'jason.curtis']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22955'
    versions = ['Python 3.5']

    @anntzer
    Mannequin Author

    anntzer mannequin commented Nov 27, 2014

    methodcaller and attrgetter objects seem to be picklable, but in fact the pickling is erroneous:

    >>> import operator, pickle
    >>> pickle.loads(pickle.dumps(operator.methodcaller("foo")))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: methodcaller needs at least one argument, the method name
    >>> pickle.loads(pickle.dumps(operator.attrgetter("foo")))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: attrgetter expected 1 arguments, got 0

    When looking at the pickle disassembly, it seems that the argument to the constructor is indeed not pickled.

    >>> import pickletools; pickletools.dis(pickle.dumps(operator.methodcaller("foo")))
        0: \x80 PROTO      3
        2: c    GLOBAL     'operator methodcaller'
       25: q    BINPUT     0
       27: )    EMPTY_TUPLE
       28: \x81 NEWOBJ
       29: q    BINPUT     1
       31: .    STOP
    highest protocol among opcodes = 2
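The same check can be done programmatically instead of reading the disassembly by eye. The following is a minimal sketch, assuming Python 3.5 or later (where this issue is fixed); `pickled_strings` is a hypothetical helper name, not part of `pickle` or `pickletools`.

```python
import operator
import pickle
import pickletools


def pickled_strings(obj):
    """Return every string argument that appears in obj's pickle stream."""
    data = pickle.dumps(obj)
    # genops() yields (opcode, argument, position) triples.
    return [arg for _op, arg, _pos in pickletools.genops(data)
            if isinstance(arg, str)]


# If the constructor argument is pickled, "foo" shows up in the stream.
print(pickled_strings(operator.methodcaller("foo")))
print(pickled_strings(operator.attrgetter("foo")))
```

On an affected 3.4 build the list would be expected to contain only the global reference and no `'foo'`, matching the disassembly above.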

    @anntzer anntzer mannequin added the stdlib Python modules in the Lib dir label Nov 27, 2014
    @pitrou pitrou added the type-feature A feature request or enhancement label Nov 27, 2014
    @serhiy-storchaka
    Member

    serhiy-storchaka commented Nov 27, 2014

    I think this issue needs different solutions for 3.5 and for the maintained releases. We can implement pickling of methodcaller, attrgetter and itemgetter in 3.5 (I agree this is a good idea). And it would be good if pickling these types raised an exception in the maintained releases.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 27, 2014
    @zware
    Member

    zware commented Nov 28, 2014

    Note that pickling of the pure Python version of methodcaller works as expected:

    Python 3.4.2 (default, Nov 20 2014, 12:40:10) 
    [GCC 4.8.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.modules['_operator'] = None
    >>> import operator
    >>> import pickle
    >>> pickle.loads(pickle.dumps(operator.methodcaller('foo')))
    <operator.methodcaller object at 0x7ff869945898>

    The pure Python attrgetter and itemgetter don't work due to using functions defined in __init__().

    2.7 already raises TypeError on attempts to pickle any of the three.
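To illustrate why a function defined inside `__init__()` blocks pickling, here is a small sketch. `ClosureGetter` and `ReduceGetter` are hypothetical stand-ins written for this example; they are not the stdlib implementations, only the same pattern in miniature.

```python
import pickle


class ClosureGetter:
    """Stores a nested function on the instance, like the 3.4 pure-Python style."""

    def __init__(self, attr):
        def func(obj):
            return getattr(obj, attr)
        self._call = func  # a local function: pickle cannot look it up by name

    def __call__(self, obj):
        return self._call(obj)


class ReduceGetter:
    """Stores only plain data and tells pickle how to rebuild the object."""

    def __init__(self, attr):
        self._attr = attr

    def __call__(self, obj):
        return getattr(obj, self._attr)

    def __reduce__(self):
        # Recreate the object by calling the class with the original argument.
        return self.__class__, (self._attr,)


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(ClosureGetter("real")))  # the nested function gets in the way
    print(can_pickle(ReduceGetter("real")))
```

The exact exception type raised for the nested function varies between Python versions, which is why `can_pickle` catches several.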

    @zware zware changed the title Pickling of methodcaller and attrgetter Pickling of methodcaller, attrgetter, and itemgetter Nov 28, 2014
    @rhettinger
    Contributor

    rhettinger commented Nov 29, 2014

    +1 for adding pickling support to Python 3.5.

    I don't see much of a need for any revision to 3.4.

    @MojoVampire
    Mannequin

    MojoVampire mannequin commented Nov 29, 2014

    I've made a patch that I believe should cover all three cases, including tests.

    In addition to the pickling behavior, I've made two other changes:

    1. methodcaller verifies during construction that the name is a string (PyUnicode), and interns it; attrgetter did this already, and I tweaked methodcaller to match for correctness and performance reasons
    2. I added proper repr functionality to all three objects. Partially this is just to make it look nicer, but it was also a decent way to spot verify that the pickle/unpickle sequence behaved correctly
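A short sketch of what these two changes look like from the user's side, assuming Python 3.5 or later where they landed. The exact wording of the error message is not relied on here.

```python
import operator

# repr now shows how the object was constructed.
print(repr(operator.attrgetter("a.b")))   # operator.attrgetter('a.b')
print(repr(operator.itemgetter(1)))       # operator.itemgetter(1)
print(repr(operator.methodcaller("foo", 1, bar=2)))

# The method name must be a string; anything else is rejected at construction.
try:
    operator.methodcaller(42)
except TypeError as exc:
    name_error = exc
else:
    name_error = None
print(type(name_error).__name__)
```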

    Anyone care to review?

    @MojoVampire
    Mannequin

    MojoVampire mannequin commented Nov 29, 2014

    Don't bother reviewing just yet. There is an issue with attrgetter's pickling (which the unit tests caught), and I need to update the pure Python modules to match.

    @MojoVampire
    Mannequin

    MojoVampire mannequin commented Nov 29, 2014

    Okay, this one passes the tests for the built-in module. I'm not sure what's going wrong with the pure Python module. I'm getting the error:

    _pickle.PicklingError: Can't pickle <class 'operator.attrgetter'>: it's not the same object as operator.attrgetter
    

    once for each of the three objects. Anyone recognize this? Is this some weird artifact of the multiple imports required to test both pure Python and C versions of the module that I need to work around, or did I make a mistake somewhere else?

    @MojoVampire
    Mannequin

    MojoVampire mannequin commented Nov 29, 2014

    Ah, solved it (I think). The bootstrapper used to import the Python and C versions of the module leaves sys.modules unpopulated (might pickle itself populate it when it finds no module of that name?). I added a setUp method to the unittest class for operator that explicitly sets sys.modules['operator'] to whichever version is being tested at the time so pickle's lookup works as expected. Is that the right solution? New patch uploaded with that change.
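The "not the same object" error can be reproduced outside the test suite. This is a sketch under stated assumptions: `fake_operator`, `build_module` and `try_pickle` are hypothetical names invented for the example, standing in for two separately imported copies of one module.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_operator"  # hypothetical module name, for illustration only


def build_module():
    """Create a fresh module object holding its own Getter class."""
    module = types.ModuleType(MODULE_NAME)

    def __init__(self, name):
        self.name = name

    # __module__ tells pickle which module to look the class up in.
    module.Getter = type("Getter", (), {"__module__": MODULE_NAME,
                                        "__init__": __init__})
    return module


def try_pickle(obj):
    """Return the pickled bytes, or the PicklingError if pickling fails."""
    try:
        return pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return exc


registered = build_module()
shadow = build_module()  # a second copy, like the other implementation under test
sys.modules[MODULE_NAME] = registered

# pickle finds registered.Getter by name, which is not shadow.Getter.
before = try_pickle(shadow.Getter("foo"))

# Pointing sys.modules at the copy under test makes the lookup match.
sys.modules[MODULE_NAME] = shadow
after = try_pickle(shadow.Getter("foo"))

print(type(before).__name__, type(after).__name__)
```

Swapping the `sys.modules` entry before pickling is the same idea as the `setUp` approach described above.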

    @zware
    Member

    zware commented Nov 29, 2014

    I'd prefer to just reimplement itemgetter and attrgetter to make them picklable rather than adding pickling methods to them; see attached patch.

    I also posted a few comments, but I just went ahead and addressed them myself in this patch. I'm not qualified to give the _operator.c changes a proper review, but they look good enough to me if others agree that __reduce__ is the best approach in C.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Nov 29, 2014

    operator.methodcaller is similar to functools.partial which is pickleable and can be used as a sample.

    In the C implementation some code can be shared between the __repr__ and __reduce__ methods.

    As for tests, different protocols should be tested. Compatibility between the C and Python implementations should also be tested: instances pickled with one implementation should be unpicklable with the other. Move the pickle tests into a new test class.

    If __repr__ methods are added, they need tests. The restriction on the method name type should be tested too.
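A minimal sketch of the per-protocol part of this suggestion, assuming Python 3.5 or later. `check_all_protocols` is a hypothetical helper, not the code from the committed test suite, and the cross-implementation check (C pickles loaded by the Python version and vice versa) is left out because it needs both module copies imported separately.

```python
import operator
import pickle


def check_all_protocols(func, sample):
    """Round-trip func through every pickle protocol and compare results."""
    expected = func(sample)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        clone = pickle.loads(pickle.dumps(func, proto))
        assert clone(sample) == expected, proto
    return True


# (callable, sample input) pairs covering the three types.
CASES = [
    (operator.attrgetter("real"), 3 + 4j),
    (operator.itemgetter(0, 2), "abc"),
    (operator.methodcaller("upper"), "abc"),
    (operator.methodcaller("split", ",", maxsplit=1), "a,b,c"),
]

for func, sample in CASES:
    check_all_protocols(func, sample)
```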

    @rhettinger
    Contributor

    rhettinger commented Dec 9, 2014

    > I'd prefer to just reimplement itemgetter and attrgetter to make
    > them picklable rather than adding pickling methods to them;
    > see attached patch.

    That isn't the usual approach. The pickling methods are there for a reason. I prefer to leave the existing code in a stable state and avoid unnecessary code churn or risk introducing bugs into code that is working correctly and as designed.

    @rhettinger
    Contributor

    rhettinger commented Dec 9, 2014

    Please remember that a potential new pickling feature is the least important part of the design of methodcaller, itemgetter, and attrgetter. Pickle support should be driven by the design rather than become a predominant consideration.

    One other note: the OP's original concern has very little to do with these particular objects. Instead, it is the pickling and unpickling tools themselves that tend to have crummy error messages when presented with objects that weren't specially designed with pickle support.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Dec 9, 2014

    > Instead, it is the pickling and unpickling tools themselves that tend to have crummy error messages when presented with objects that weren't specially designed with pickle support.

    See bpo-22995 about this.

    @zware
    Member

    zware commented Dec 12, 2014

    Serhiy: functools.partial is a somewhat less than ideal comparison. The pure-Python version is not picklable, the Python and C versions return different things (the Python version is a function returning a function, the C version is a regular class and returns an instance). Also, both versions make their necessary attributes public anyway, unlike methodcaller.

    Raymond: Not necessarily the usual approach, no. However, I think my reimplementations of the pure-Python itemgetter and attrgetter have a few benefits, namely:

    • they're somewhat less complex and thus a bit easier to understand
    • they're slightly faster
    • they don't require extra pickling methods, which to me just seem like clutter when it's so simple to not need them

    Note that I have no intention of reimplementing the C versions: those are much more mature than the Python versions, and would likely require pickling methods anyway.

    All that said, I'm not going to fight about it; if I'm overruled, I'm overruled.

    Josh: Serhiy's points about needing more tests stand; would you like to add them? You can use your patch or mine as a base, depending on how you feel about reimplementing the pure-Python (item|attr)getter. If you use yours, please remember to look through my comments on it.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Dec 13, 2014

    > functools.partial is a somewhat less than ideal comparison. The pure-Python version is not picklable, the Python and C versions return different things (the Python version is a function returning a function, the C version is a regular class and returns an instance).

    It looks like the Python version of functools.partial() needs a fix.

    Reimplementing the pure-Python itemgetter and attrgetter as automatically picklable Python classes has a disadvantage: it makes the pickling incompatible between the Python and C versions. This means that an itemgetter pickled in CPython cannot be unpickled on a Python implementation that doesn't use the C accelerator, and vice versa.
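The compatibility concern is easier to see from what `__reduce__` hands to pickle. The sketch below shows the values observed on CPython 3.5 and later, after the fix for this issue; other implementations may differ in detail.

```python
import operator

ag = operator.attrgetter("a", "b.c")
ig = operator.itemgetter(1, 2)

# Each result is (callable, args): pickle stores the callable by its public
# name in the operator module plus the original constructor arguments.
ag_reduced = ag.__reduce__()
ig_reduced = ig.__reduce__()

print(ag_reduced)
print(ig_reduced)
```

Because the stream only names `operator.attrgetter` or `operator.itemgetter` and carries constructor arguments, either implementation can rebuild the object. A default instance-dict pickle would instead depend on private attribute names specific to one implementation.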

    @zware
    Member

    zware commented Dec 13, 2014

    Serhiy Storchaka added the comment:

    > Reimplementations of the pure-Python itemgetter and attrgetter to
    > automatically pickleable Python classes have a disadvantage. It makes
    > the pickling incompatible between Python and C versions. This means
    > that itemgetter pickled in CPython will be not unpickleable on Python
    > implementation which don't use C accelerator and vice versa.

    That's a very good point that I hadn't thought about. Consider my
    patch withdrawn.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Dec 14, 2014

    Here is a revised version of Josh's patch. I added tests for consistency between both implementations and fixed inconsistencies and bugs.

    I still hesitate about the pickling format of methodcaller. First, there is an asymmetry between positional and keyword arguments. Second, methodcaller is not inheritable for now, but if it becomes so in the future (as functools.partial is), it would be harder to extend the pickling format to support instance attributes.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented May 16, 2015

    A methodcaller with keyword arguments pickled with pickle_getter_and_caller3.patch needs Python 3.5 to unpickle. The following patch pickles it in a backward-compatible way.
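A small sketch of how a methodcaller with keyword arguments round-trips, assuming Python 3.5 or later. It only uses the generic `(callable, args)` contract of `__reduce__`; the precise shape of the callable is an implementation detail and is printed rather than relied upon.

```python
import operator
import pickle

caller = operator.methodcaller("split", ",", maxsplit=1)

# __reduce__ returns a callable and the positional arguments to pass to it.
constructor, args = caller.__reduce__()[:2]
print(constructor, args)

# Calling the pair by hand rebuilds an equivalent methodcaller ...
rebuilt = constructor(*args)
# ... which is what pickle does when loading.
clone = pickle.loads(pickle.dumps(caller))

print(rebuilt("a,b,c"), clone("a,b,c"))
```

Printing `constructor` shows where the keyword arguments travel, which is the positional versus keyword asymmetry mentioned earlier in the thread.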

    @python-dev
    Mannequin

    python-dev mannequin commented May 20, 2015

    New changeset 435bc22f39e3 by Serhiy Storchaka in branch 'default':
    Issue bpo-22955: attrgetter, itemgetter and methodcaller objects in the operator
    https://hg.python.org/cpython/rev/435bc22f39e3

    @python-dev
    Mannequin

    python-dev mannequin commented May 20, 2015

    New changeset c93e5ba1cc20 by Serhiy Storchaka in branch 'default':
    Issue bpo-22955: Fixed test_operator. It left Python implementation in
    https://hg.python.org/cpython/rev/c93e5ba1cc20

    @python-dev
    Mannequin

    python-dev mannequin commented May 21, 2015

    New changeset 2688655e431a by Serhiy Storchaka in branch 'default':
    Issue bpo-22955: Fixed reference leak in attrgetter.__repr__().
    https://hg.python.org/cpython/rev/2688655e431a

    @thatneat
    Mannequin

    thatneat mannequin commented May 16, 2016

    This is still an issue with operator.attrgetter in 3.4.3, even after clearing sys.modules['_operator']:

    $ python3
    Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.modules['_operator'] = None
    >>> import operator
    >>> import pickle
    >>> pickle.loads(pickle.dumps(operator.attrgetter("foo")))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <function attrgetter.__init__.<locals>.func at 0x7f25728d5bf8>: attribute lookup func on operator failed

    @serhiy-storchaka
    Member

    serhiy-storchaka commented May 16, 2016

    This is a new feature in 3.5.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022