Cannot modify dictionaries inside dictionaries using Managers from multiprocessing #51015

Closed
carlosdf mannequin opened this issue Aug 23, 2009 · 21 comments
Labels: 3.7, type-bug (An unexpected behavior, bug, or error)

Comments

@carlosdf
Mannequin

carlosdf mannequin commented Aug 23, 2009

BPO 6766
Nosy @1st1, @applio, @76creates
Files
  • mp_proxy_hack.diff: a crude hack that appears to work
  • test_dict_dict_arrays.py
  • nesting.py: Code example from comment
  • issue_6766_py36.patch
  • issue_6766_py36.nogit.patch: Patch for 3.6 branch in proper format
  • issue_6766_py36.nogit.yuryfeedback.patch: Patch for 3.6 with suggestions from Yury
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/applio'
    closed_at = <Date 2016-09-08.13:13:59.768>
    created_at = <Date 2009-08-23.17:46:28.718>
    labels = ['type-bug', '3.7']
    title = 'Cannot modify dictionaries inside dictionaries using Managers from multiprocessing'
    updated_at = <Date 2019-02-28.16:44:16.304>
    user = 'https://bugs.python.org/carlosdf'

    bugs.python.org fields:

    activity = <Date 2019-02-28.16:44:16.304>
    actor = 'dusan76'
    assignee = 'davin'
    closed = True
    closed_date = <Date 2016-09-08.13:13:59.768>
    closer = 'berker.peksag'
    components = []
    creation = <Date 2009-08-23.17:46:28.718>
    creator = 'carlosdf'
    dependencies = []
    files = ['15119', '21646', '34414', '44402', '44448', '44453']
    hgrepos = []
    issue_num = 6766
    keywords = ['patch']
    message_count = 20.0
    messages = ['91889', '93948', '93951', '93957', '93961', '93962', '98529', '98548', '133655', '213534', '255795', '257226', '274625', '274874', '274911', '274920', '274937', '309914', '311724', '336850']
    nosy_count = 16.0
    nosy_names = ['jnoller', 'carlosdf', 'terrence', 'kghose', 'python-dev', 'sbt', 'dariosg', 'yselivanov', 'Richard.Fothergill', 'dan.oreilly', 'Waldemar.Parzonka', 'davin', 'Justin Patrin', 'John_81', 'Snidhi', 'dusan76']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6766'
    versions = ['Python 3.7']


    It's not possible to modify a dict inside a dict using a manager from
    multiprocessing.

    Ex:

    from multiprocessing import Process, Manager

    def f(d):
        d['1'] = '1'
        d['2']['1'] = 'Try To Write'

    if __name__ == '__main__':
        manager = Manager()

        d = manager.dict()
        d['2'] = manager.dict()

        print d

        p = Process(target=f, args=(d,))
        p.start()
        p.join()
        print d

        d['2'] = 5
        print d
    

    The output under Windows 7 (32-bit) / Python 2.6.2 (32-bit) is:

    {'2': {}}
    {'1': '1', '2': {}}
    {'1': '1', '2': 5}

    The output is the same if you change "d['2'] = manager.dict()" to
    "d['2'] = dict()"

    @carlosdf carlosdf mannequin added the type-bug An unexpected behavior, bug, or error label Aug 23, 2009
    @terrence
    Mannequin

    terrence mannequin commented Oct 13, 2009

    I get the same results on:
    Python 2.6.2 (r262:71600, Sep 14 2009, 18:47:57)
    [GCC 4.3.2] on linux2

    I think this is the same issue I was seeing yesterday. You can exercise
    the issue and cause an exception with just 6 lines:

    ##### CODE #####
    from multiprocessing import Manager
    manager = Manager()
    ns_proxy = manager.Namespace()
    evt_proxy = manager.Event()
    ns_proxy.my_event_proxy = evt_proxy
    print ns_proxy.my_event_proxy
    ##### TRACEBACK #####
    Traceback (most recent call last):
      File "test_nsproxy.py", line 39, in <module>
        print ns_proxy.my_event_proxy
      File "/usr/lib64/python2.6/multiprocessing/managers.py", line 989, in __getattr__
        return callmethod('__getattribute__', (key,))
      File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
        raise convert_to_error(kind, result)
    multiprocessing.managers.RemoteError:

    Unserializable message: ('#RETURN', <threading._Event object at 0x1494790>)
    ---------------------------------------------------------------------

    Storing a proxy into a proxied object and then accessing the proxy
    returns a copy of the object itself and not the stored proxy. Thus,
    updates to the nested dict are local and do not update the real object,
    and proxies to unpicklable objects raise an exception when accessed.

    @terrence
    Mannequin

    terrence mannequin commented Oct 14, 2009

    When a manager receives a message, it unpickles the arguments; this
    calls BaseProxy.__reduce__, which calls RebuildProxy. If we are in the
    manager, this returns the actual object, otherwise it returns a new
    proxy. If we naively disable the ability for proxied objects to be
    unredirected in the manager, as in the attached svn diff, this solves
    the problem that Carlos and I are seeing. Surprisingly, after applying
    this change, the full multiprocessing regression test still runs fine.
    I'm sure this change should have some greater impact, but I'm not sure
    what. I would appreciate it if someone more knowledgeable could comment.
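The pickling hook described in this comment can be inspected directly. The following is a minimal sketch for Python 3, not part of the attached diff. `pickle_rebuilder` is a hypothetical helper name, and `RebuildProxy` is imported from `multiprocessing.managers`, where it is a module-level function rather than a documented public API.

```python
from multiprocessing import Manager
from multiprocessing.managers import RebuildProxy


def pickle_rebuilder(obj):
    """Return the callable that pickle would use to rebuild `obj`.

    Hypothetical helper: __reduce__() returns a tuple whose first item
    is the rebuild function.
    """
    return obj.__reduce__()[0]


if __name__ == "__main__":
    with Manager() as manager:
        proxy = manager.dict()
        # A proxy is pickled as "call RebuildProxy with my token",
        # not as a copy of the referent's contents.
        print(pickle_rebuilder(proxy) is RebuildProxy)
```

Whatever `RebuildProxy` returns on the receiving side (a proxy or the referent itself) is therefore what ends up stored in the manager process.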

    @jnoller
    Mannequin

    jnoller mannequin commented Oct 14, 2009

    Nothing jumps out to me off the top of my head - I can take a closer look
    at this after my pycon planning duties finish up in a few weeks. I agree
    this is unintended behavior. I'll need to audit the tests to make sure
    that A> This is being tested, and B> Those tests are not disabled.

    When we included multiprocessing, some tests were deemed too unstable at
    the time, and we disabled them. This was unfortunate, and I haven't been able
    to circle back and spend the time needed to refactor the test suite.

    @terrence
    Mannequin

    terrence mannequin commented Oct 14, 2009

    The tests for the SyncManager are being automagically generated at
    import time -- I was not quite able to follow that well enough to know
    exactly what is getting tested, or if they are even enabled. It did not
    appear to contain any recursion, however.

    @jnoller
    Mannequin

    jnoller mannequin commented Oct 14, 2009

    Yeah, the auto-generation is too clever and needs to be pulled out
    entirely.

    @kghose
    Mannequin

    kghose mannequin commented Jan 29, 2010

    Even with the patch, I cannot resolve this problem. I can still reproduce it with the patched version using the following code. My system is:

    Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
    IPython 0.10
    Platform is Mac OS X (10.5.8) Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009

    import multiprocessing as mp
    
    def f(d):
      d['f'] = {}
      d['f']['msg'] = 'I am here'
    
    manager = mp.Manager()
    d = manager.dict()
    
    p = mp.Process(target=f, args=(d,))
    
    p.start()
    p.join()

    print d

    d = {}
    f(d)

    print d

    Output:

    {'f': {}}
    {'f': {'msg': 'I am here'}}

    @terrence
    Mannequin

    terrence mannequin commented Jan 30, 2010

    Kaushik, in your example, d is a dict proxy, so assignment to d['f'] correctly ferries the assignment (a new normal dict) to the d['f'] in the original process. The new dict, however, is not a dict proxy, it's just a dict, so assignment of d['f']['msg'] goes nowhere. All hope is not lost, however, because the Manager can be forked to new processes. The slightly modified example below shows how this works:

    from multiprocessing import Process, Manager
    def f(m, d):
        d['f'] = m.dict()
        d['f']['msg'] = 'I am here'
    
    m = Manager()
    d = m.dict()
    p = Process(target=f, args=(m,d))
    p.start()
    p.join()
    print d
    {'f': <DictProxy object, typeid 'dict' at 0x7f1517902810>}
    print d['f']
    {'msg': 'I am here'}

    With the attached patch, the above works as shown; without it, the output is the same as in your original example.
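The distinction drawn in this comment can be checked in the parent process alone. The following is a minimal sketch for Python 3; `nested_write_sticks` is a hypothetical helper name. A plain dict stored in a managed dict comes back as a fresh local copy on every lookup, so a write through it is lost. A nested `manager.dict()` only behaves as a shared object on interpreters that include the fix for this issue.

```python
from multiprocessing import Manager


def nested_write_sticks(outer, key):
    """Write through a nested lookup, then re-read to see if it persisted."""
    outer[key]["msg"] = "I am here"
    return outer[key].get("msg") == "I am here"


if __name__ == "__main__":
    with Manager() as m:
        d = m.dict()

        d["plain"] = {}          # stored by value in the manager process
        print("plain dict:", nested_write_sticks(d, "plain"))

        d["managed"] = m.dict()  # stored as a reference to a second managed dict
        print("managed dict:", nested_write_sticks(d, "managed"))
```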

    @dariosg
    Mannequin

    dariosg mannequin commented Apr 13, 2011

    Hello,
    Trying to share a dictionary of dictionaries of lists with a manager I get the same problem with the patch applied in Python 2.7 (r27:82500, Nov 24 2010, 18:24:29) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2.

    The shared variable is results, and what I'm trying to do is parse multiple files simultaneously.

    The quality of the code is not very good because I'm a newbie Python programmer.

    Best regards,
    Darío

    @RichardFothergill
    Mannequin

    RichardFothergill mannequin commented Mar 14, 2014

    I'm getting these results on both:
    Python 3.2.3 (default, Apr 10 2013, 06:11:55)
    [GCC 4.6.3] on linux2
    and
    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2

    The symptoms are exactly as Terrence described.

    Nesting proxied containers is supposed to be a supported use case! From the documentation: http://docs.python.org/2/library/multiprocessing.html#proxy-objects

    >>> a = manager.list()
    >>> b = manager.list()
    >>> a.append(b)         # referent of a now contains referent of b
    >>> print a, b
    [[]] []
    >>> b.append('hello')
    >>> print a, b
    [['hello']] ['hello']
    
    The documented code works as expected, but:
    >>> a[0].append('world')  # Appends to b?
    >>> print a, b
    [['hello']] ['hello']

    I've attached my reproduction as a script.

    @JustinPatrin
    Mannequin

    JustinPatrin mannequin commented Dec 2, 2015

    I'm still running into these issues with Python 2.7.10. I'm trying to find a way to share dynamically allocated sub-dictionaries through multiprocessing as well as dynamically allocated RLock and Value instances. I can use the manager to create them but when I put them in a managed dict the various issues related in this ticket happen.

    @applio
    Member

    applio commented Dec 30, 2015

    Two core issues are compounding one another here:

    1. An un-pythonic, inconsistent behavior currently exists with how managed lists and dicts return different types of values.
    2. Confusion comes from reading what is currently in the docs regarding the expected behavior of nested managed objects (e.g. managed dict containing other managed dicts).

    As Terrence described, it is RebuildProxy where the decision is made to not return a proxy object but a new local instance (copy) of the managed object from the Server. Unfortunately there are use cases where Terrence's proposed modification won't work such as a managed list that contains a reference to itself or more generally a managed list/dict that contains a reference to another managed list/dict when an attempt is made to delete the outer managed list/dict before the inner. The reference counting implementation in multiprocessing.managers.Server obtains a lock before decrementing reference counts and any deleting of objects whose count has dropped to zero. In fact, when an object's ref count drops to zero, it deletes the object synchronously and won't release the lock until it's done. If that object contains a reference to another proxy object (managed by the same Manager and Server), it will follow a code path that leads it to wait forever for that same lock to be released before it can decref that managed object.

    I agree with Jesse's earlier assessment that the current behavior (returning a copy of the managed object and not a proxy) is unintended and has unintended consequences. There are hints in Richard's (sbt's) code that also suggest this is the case. Merely better documenting the current behavior does nothing to address the lack of or at least limited utility suggested in the comments here or the extra complications described in bpo-20854. As such, I believe this is behavior that should be addressed in 2.7 as well as 3.x.

    My proposed patch makes the following changes:

    1. Changes RebuildProxy to always return a proxy object (just like Terrence).
    2. Changes Server's decref() to asynchronously delete objects after their ref counts drop to 0.
    3. Updates the documentation to clarify the expected behavior and clean up the terminology to hopefully minimize potential for confusion or misinterpretation.
    4. Adds tests to validate this expected behavior and verify no lock contention.
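The lock contention that motivates change 2 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyRegistry` and `simulate` are hypothetical names, the whole thing runs in one thread, and it is not the code in `multiprocessing.managers.Server`. A timeout on the lock stands in for what would otherwise be an indefinite wait.

```python
import threading


class ToyRegistry:
    """Toy ref-counting registry; NOT the real multiprocessing Server."""

    def __init__(self, defer_delete):
        self.lock = threading.Lock()   # non-reentrant mutex
        self.refcount = {}             # ident -> reference count
        self.children = {}             # ident -> idents it refers to
        self.defer_delete = defer_delete
        self.blocked = False           # set when a decref could not get the lock

    def add(self, ident, children=()):
        self.refcount[ident] = 1
        self.children[ident] = list(children)
        for child in children:
            self.refcount[child] += 1

    def decref(self, ident):
        # The timeout stands in for "waits forever" in this toy model.
        if not self.lock.acquire(timeout=0.1):
            self.blocked = True
            return
        try:
            self.refcount[ident] -= 1
            dead = self.refcount[ident] == 0
            if dead and not self.defer_delete:
                self._delete(ident)    # deletion while still holding the lock
        finally:
            self.lock.release()
        if dead and self.defer_delete:
            self._delete(ident)        # deletion after the lock is released

    def _delete(self, ident):
        del self.refcount[ident]
        for child in self.children.pop(ident):
            self.decref(child)         # nested decref needs the same lock


def simulate(defer_delete):
    reg = ToyRegistry(defer_delete)
    reg.add("inner")
    reg.add("outer", children=["inner"])
    reg.decref("outer")
    return reg


if __name__ == "__main__":
    print("delete under lock blocked:", simulate(False).blocked)
    print("deferred delete blocked:", simulate(True).blocked)
```

In the toy model, deleting while the lock is held leaves the nested decref unable to proceed, while deferring the deletion until the lock is released lets it complete.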

    Concerned about performance, I've attempted applying the #2 change without the others and put it through stress tests on a 4.0GHz Core i7-4790K in an iMac-Retina5K-late2014 OS X system and discovered no degradation in execution speed or memory overhead. If anything, with #2 applied it was slightly faster, but the differences are too small to be regarded as anything more significant than noise.
    In separate tests, applying the #1 and #2 changes together has no noteworthy impact when stress testing with non-nested managed objects, but stress testing with nested managed objects does show a slowdown in execution speed corresponding to the number of nested managed objects and the requisite additional communication surrounding them.

    These proposed changes enable the following code to execute and terminate cleanly:
    import multiprocessing
    m = multiprocessing.Manager()
    a = m.list()
    b = m.list([4, 5])
    a.append(b)
    print(str(a))
    print(str(b))
    print(repr(a[0]))
    a[0].append(6)
    print(str(b))

    To produce the following output:
    [<ListProxy object, typeid 'list' at 0x110b99260>]
    [4, 5]
    <ListProxy object, typeid 'list' at 0x110b7a538>
    [4, 5, 6]

    Justin: I've tested the RLock values I've gotten back from my managed lists too -- just didn't have that in the example above.

    Patches to be attached shortly (after cleaning up a bit).

    @applio applio assigned applio and unassigned jnoller Dec 30, 2015
    @applio
    Member

    applio commented Sep 6, 2016

    Attaching patch for default (3.6) branch which implements what was previously described and discussed, updates the documentation to explain this updated behavior, and includes new tests.

    @yselivanov: Can you think of any edge cases that should be handled but we're missing?

    @applio
    Member

    applio commented Sep 7, 2016

    Updating previously supplied patch for 3.6 to the right format.

    @applio
    Member

    applio commented Sep 7, 2016

    Attaching updated patch to reflect Yury's suggested changes from review.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset 39e7307f9aee by Davin Potts in branch 'default':
    Fixes issue bpo-6766: Updated multiprocessing Proxy Objects to support nesting
    https://hg.python.org/cpython/rev/39e7307f9aee

    @applio
    Member

    applio commented Sep 8, 2016

    Fixed in upcoming 3.6.
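For reference, here is a Python 3 port of the script from the original report, as a minimal sketch; the `run` wrapper is a hypothetical addition so the result can be copied out before the manager shuts down. On 3.6 and later, where the nested `manager.dict()` is returned as a proxy, the write made in the child process is expected to be visible in the parent.

```python
from multiprocessing import Manager, Process


def f(d):
    d["1"] = "1"
    d["2"]["1"] = "Try To Write"   # nested write through the inner lookup


def run():
    with Manager() as manager:
        d = manager.dict()
        d["2"] = manager.dict()    # nested container created by the manager
        p = Process(target=f, args=(d,))
        p.start()
        p.join()
        # Copy plain values out before the manager process is stopped.
        return {"1": d.get("1"), "2": d["2"].copy()}


if __name__ == "__main__":
    print(run())
```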

    @John81
    Mannequin

    John81 mannequin commented Jan 14, 2018

    Hi all, I'm trying to use multiprocessing with a 3d list. From the documentation I expected it to work. As I found this report a bit later, I opened a bug report here: https://bugs.python.org/issue32538. Am I doing something wrong or is it still not working in 3.6.3?

    @Snidhi
    Mannequin

    Snidhi mannequin commented Feb 6, 2018

    Hi team,

    Looks like this issue remains per code below:

    import multiprocessing, sys, time, traceback

    if __name__ == '__main__':
        print(sys.version)

        mpd = multiprocessing.Manager().dict()
        mpd['prcss'] = {'q': 'queue_1', 'ctlg': 'ctlg_1'}

        # update 1 - doesn't work!
        mpd['prcss'].update({'name': 'concfun_1'})
        print('Result of failed update 1:', mpd['prcss'])

        # update 2 - doesn't work!
        mpd['prcss']['name'] = 'concfun_1'
        print('Result of failed update 2:', mpd['prcss'])

        # update 3 - works!
        mpd_prcss = mpd['prcss']
        mpd_prcss['name'] = 'concfun_1'
        mpd['prcss'] = mpd_prcss
        print('Result of successful update 3:', mpd['prcss'])

    ### --- output ###
    3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
    Result of failed update 1: {'q': 'queue_1', 'ctlg': 'ctlg_1'}
    Result of failed update 2: {'q': 'queue_1', 'ctlg': 'ctlg_1'}
    Result of successful update 3: {'q': 'queue_1', 'ctlg': 'ctlg_1', 'name': 'concfun_1'}
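The pattern that works in "update 3" can be wrapped in a small helper. This is a minimal sketch; `update_nested` is a hypothetical name. It relies only on the fact that a plain dict stored in a managed dict is handed back as a local copy, so the change has to be assigned back to the key to reach the manager.

```python
from multiprocessing import Manager


def update_nested(shared, key, **changes):
    """Read the nested value, modify the local copy, then assign it back."""
    local = shared[key]     # a local copy when `shared` is a managed dict
    local.update(changes)
    shared[key] = local     # the reassignment is what the manager sees
    return local


if __name__ == "__main__":
    with Manager() as manager:
        mpd = manager.dict()
        mpd["prcss"] = {"q": "queue_1", "ctlg": "ctlg_1"}
        update_nested(mpd, "prcss", name="concfun_1")
        print(mpd["prcss"])
```

The read and the write are two separate calls to the manager, so processes that update the same key concurrently can overwrite each other unless they coordinate, for example with a shared lock.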

    @76creates
    Mannequin

    76creates mannequin commented Feb 28, 2019

    Hey folks,

    This is still an issue with 3.7.2

    ===============================================

    # Python 3.7.2 (default, Jan 10 2019, 23:51:51)
    # [GCC 8.2.1 20181127] on linux

    from multiprocessing import Manager
    
    manager = Manager()
    d = manager.dict({})

    d["test"] = {"a": 123}
    # update fails
    d["test"]["a"] = 321
    # add fails
    d["test"]["b"] = 321

    print(d)

    @76creates 76creates mannequin added the 3.7 label Feb 28, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @charxhit

    charxhit commented Aug 20, 2022

    Just to be clear, the patch detailed by @applio only works if the nested dictionaries/lists are created through a manager. Therefore, you must store your dictionaries/lists inside a manager before nesting them inside another managed dictionary/list. There is, however, a way to handle this automatically without explicitly storing the nested objects inside a manager detailed here: https://stackoverflow.com/a/73418403/16310741
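The requirement that nested containers be created through a manager can be sketched as a conversion helper. This is an illustration only, assuming Python 3.6 or later; `to_managed` is a hypothetical name and this is not the approach from the linked answer. It recursively replaces plain dicts and lists with manager-created ones before they are stored.

```python
from multiprocessing import Manager


def to_managed(manager, value):
    """Recursively replace plain dicts/lists with manager-created ones.

    Hypothetical helper; `manager` only needs .dict() and .list() factories.
    """
    if isinstance(value, dict):
        return manager.dict(
            {k: to_managed(manager, v) for k, v in value.items()}
        )
    if isinstance(value, list):
        return manager.list([to_managed(manager, v) for v in value])
    return value


if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        d["test"] = to_managed(manager, {"a": 123})
        d["test"]["a"] = 321    # goes through the inner proxy
        d["test"]["b"] = 321
        print(d["test"].copy())
```

Every nested container becomes its own managed object, so each nested access costs an extra round trip to the manager process.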
