Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise an error if a dict is modified during a lookup #58413

Closed
vstinner opened this issue Mar 5, 2012 · 29 comments
Closed

Raise an error if a dict is modified during a lookup #58413

vstinner opened this issue Mar 5, 2012 · 29 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

vstinner commented Mar 5, 2012

BPO 14205
Nosy @gvanrossum, @rhettinger, @vstinner, @markshannon, @JimJJewett
Files
  • dict_lookup.patch
  • nomodify.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-03-09.13:05:00.568>
    created_at = <Date 2012-03-05.21:19:54.963>
    labels = ['interpreter-core']
    title = 'Raise an error if a dict is modified during a lookup'
    updated_at = <Date 2012-05-13.18:50:18.220>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-05-13.18:50:18.220>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-03-09.13:05:00.568>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2012-03-05.21:19:54.963>
    creator = 'vstinner'
    dependencies = []
    files = ['24740', '24741']
    hgrepos = []
    issue_num = 14205
    keywords = ['patch']
    message_count = 29.0
    messages = ['154976', '154977', '154989', '154991', '154993', '154994', '155021', '155022', '155023', '155024', '155026', '155027', '155029', '155037', '155147', '155194', '155196', '155197', '155199', '155205', '155214', '155219', '155230', '155231', '155232', '155266', '156858', '157334', '160543']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'rhettinger', 'vstinner', 'Mark.Shannon', 'python-dev', 'Jim.Jewett']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue14205'
    versions = ['Python 3.3']

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2012

    Lib/test/crashers/nasty_eq_vs_dict.py does crash Python because of an
    infinite loop in the dictionary lookup. The script modifies the
    dictionary at each lookup, whereas Python tries a new lookup each time
    that the dictionary is modified.

    I proposed to make the lookup fail with a RuntimeError if the
    dictionary has been modified during a lookup. It should not occur if
    you are not doing something evil.

    @vstinner vstinner added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 5, 2012
    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2012

    Another possibility is to avoid the modification of a dictionary
    during a lookup: nomodify.patch.

    @markshannon
    Copy link
    Member

    I much prefer dict_lookup.patch to nomodify.patch.
    It doesn't increase the memory use of dict. One extra word per dict could be a lot of memory for a large application.

    There is a very subtle semantic change, but I think it is a positive one.
    Raising a runtimne seesm sensible as the dict iterators already raise a RuntimeError if the size of the dict changes.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2012

    I much prefer dict_lookup.patch to nomodify.patch.
    It doesn't increase the memory use of dict. One extra word
    per dict could be a lot of memory for a large application.

    nomodify.patch is the correct fix, but I agree that dict_lookup.patch is better (and sufficient) in practive.

    Raising a runtimne seesm sensible as the dict iterators already
    raise a RuntimeError if the size of the dict changes.

    Yes, that's how I chose the exception.

    >>> d={k: str(k) for k in range(10)}
    >>> for k in d:
    ...  del d[k]
    ... 
    RuntimeError: dictionary changed size during iteration
    
    >>> d={k for k in range(10)}
    >>> for k in d:
    ...  d.remove(k)
    ... 
    RuntimeError: Set changed size during iteration
    
    >>> d=list(range(10))
    >>> def keyfunc(x):
    ...  d.append(1)
    ...  return x
    ... 
    >>> d.sort(key=keyfunc)
    ValueError: list modified during sort

    @rhettinger rhettinger self-assigned this Mar 5, 2012
    @gvanrossum
    Copy link
    Member

    Yeah, dict_lookup.patch seems fine.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 6, 2012

    New changeset 934aaf2191d0 by Victor Stinner in branch 'default':
    Close bpo-14205: dict lookup raises a RuntimeError if the dict is modified during
    http://hg.python.org/cpython/rev/934aaf2191d0

    @python-dev python-dev mannequin closed this as completed Mar 6, 2012
    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Mar 6, 2012

    Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

    I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.

    Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2012

    Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

    The issue was triggered without threads.If the __eq__ method of the
    objects used for keys use C functions releasing the GIL, you may
    trigger the issue.

    Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?

    If you implement a special type modifying the dict that contains the
    object, you should implement your own retry function on lookup failure
    (catch RuntimeError).

    Honestly, if you get RuntimeError, you should change your program, not
    retry the lookup!

    @markshannon
    Copy link
    Member

    Jim Jewett wrote:

    Jim Jewett <jimjjewett@gmail.com> added the comment:

    Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

    So, they are writing to a dict in one thread while reading from the same
    dict in another thread, without any external locks and with keys written
    in Python.

    I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.

    I suspect, they are already seeing sporadic failures.
    I think raising an exception is better than weird failures.

    We should document the new behaviour, as it is a change in semantics.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2012

    If the __eq__ method of the
    objects used for keys use C functions releasing the GIL, you may
    trigger the issue.

    Oh, I mean: trigger the issue with threads. I hope that your objects
    don't call C functions like open() in their __eq__() method!

    @gvanrossum
    Copy link
    Member

    On Tue, Mar 6, 2012 at 8:56 AM, Mark Shannon <report@bugs.python.org> wrote:

    Mark Shannon <mark@hotpy.org> added the comment:

    Jim Jewett wrote:
    > Jim Jewett <jimjjewett@gmail.com> added the comment:
    >
    > Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

    So, they are writing to a dict in one thread while reading from the same
    dict in another thread, without any external locks and with keys written
    in Python.
    >
    > I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.
    >
    I suspect, they are already seeing sporadic failures.
    I think raising an exception is better than weird failures.

    We should document the new behaviour, as it is a change in semantics.

    +1 on everything you said.

    @vstinner vstinner reopened this Mar 6, 2012
    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Mar 6, 2012

    On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:

    Jim Jewett:
    > Can't this be triggered by non-malicious code that just happened
    > to have a python comparison and get hit with a thread switch?

    So, they are writing to a dict in one thread while reading from the
    same dict in another thread, without any external locks and with
    keys written in Python.

    Correct. For example, it could be a configuration manager, or a
    cache, or even a worklist. If they're just adding new keys, or even
    deleting other (==> NOT the one being looked up) keys, why should that
    keep them from finding the existing, unchanged keys?

    > I'm not sure how often it happens, but today it would not be visible
    > to the user; after the patch, users will see a sporadic failure that
    > they can't easily defend against.

    I suspect, they are already seeing sporadic failures.

    How?

    The chain terminates as soon as the dict doesn't resize; without
    malicious intent, the odds of hitting several resizes in a row are so
    miniscule that it probably hasn't even slowed them down.

    @gvanrossum
    Copy link
    Member

    On Tue, Mar 6, 2012 at 9:43 AM, Jim Jewett <report@bugs.python.org> wrote:
    >
    > Jim Jewett <jimjjewett@gmail.com> added the comment:
    >
    > On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:
    >
    >> Jim Jewett:
    >>> Can't this be triggered by non-malicious code that just happened
    >>> to have a python comparison and get hit with a thread switch?
    >
    >> So, they are writing to a dict in one thread while reading from the
    >> same dict in another thread, without any external locks and with
    >> keys written in Python.
    >
    > Correct.  For example, it could be a configuration manager, or a
    > cache, or even a worklist.  If they're just adding new keys, or even
    > deleting other (==> NOT the one being looked up) keys, why should that
    > keep them from finding the existing, unchanged keys?

    Use a lock or a built-in key class. I realize that neither is ideal,
    but then, neither was the old situation.

    >> I'm not sure how often it happens, but today it would not be visible
    >> to the user; after the patch, users will see a sporadic failure that
    >> they can't easily defend against.

    > I suspect, they are already seeing sporadic failures.

    How?

    The chain terminates as soon as the dict doesn't resize; without
    malicious intent, the odds of hitting several resizes in a row are so
    miniscule that it probably hasn't even slowed them down.

    Now I'm torn. If we'd have this RuntimeError from the start, would we
    consider it a flaw in the dict implementation or a feature? The
    RuntimeError when changing a dict's size while iterating over it is
    definitely a feature (so as to allow the implementation to rehash
    everything upon insert/delete). But this is not quite the same. Or is
    it? On the one hand I think the scenario is pretty unlikely (mostly
    because it involves a user-defined comparison); OTOH it would be quite
    nasty to debug. Or would it? You do get a decent error message...

    Note that Victor's alternate fix (nomodify.diff) has the same problem
    -- it just raises RuntimeError in the mutating thread rather than in
    the lookup thread.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Mar 6, 2012

    On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum

    Jim Jewett:

    >...  If they're just adding new keys, or even deleting other
    > (==> NOT the one being looked up) keys, why should that
    > keep them from finding the existing, unchanged keys?

    > ... The chain terminates as soon as the dict doesn't resize; without
    > malicious intent, the odds of hitting several resizes in a row are so
    > miniscule that it probably hasn't even slowed them down.

    Now I'm torn. If we'd have this RuntimeError from the start, would we
    consider it a flaw in the dict implementation or a feature?

    I would personally have considered it a flaw. Wrapping all dict
    access just in case some other code added a weird key seems ugly and
    inefficient.

    RuntimeError when changing a dict's size while iterating over it is
    definitely a feature (so as to allow the implementation to rehash
    everything upon insert/delete).

    In that case, you've explicitly asked for the whole set (or at least a
    snapshot); the RuntimeError indicates that you can't get it. Maybe
    there should be a way to say "I don't need the set to be perfect", but
    there isn't, and you aren't getting what you actually did ask for, so
    an Error is appropriate.

    But in this case, you're asking for a specific key, and the key is
    there. It may even be a perfectly normal string, that just happened
    to hash-collide with something some other code inserted.

    The RuntimeError exposes a race condition to user code -- and it may
    not be the code that stuck the oddball class in the dict in the first
    place.

    Note that Victor's alternate fix (nomodify.diff) has the same problem
    -- it just raises RuntimeError in the mutating thread rather than in
    the lookup thread.

    Right; I'm saying that raising an error is the wrong thing to do.

    But there is an analogy to the hash-code change vs collision counting.
    Breaking an application that got unlucky once is wrong. But if the
    bad luck *continues* to the point where it would only occur one time
    in a zillion, RuntimeError is better than a likely infinite loop.

    (It really was a question initially, but I've now convinced at least myself.)

    I would rather see something like replacing

    349 else {
    350 /* The compare did major nasty stuff to the
    351 * dict: start over.
    352 * XXX A clever adversary could prevent this
    353 * XXX from terminating.
    354 */
    355 return lookdict(mp, key, hash);

    with

    349 else {
    350 /* The compare did major nasty stuff to the
    351 * dict: start over.
    352 * XXX A clever adversary could prevent this
    353 * XXX from terminating.
    354 */
    355 return lookdict_paranoid(mp, key, hash, 1);

    where lookdict_paranoid is a near-copy of lookdict that adds a
    paranoia parameter and replaces those same lines with

                else {
                    /* The compare did major nasty stuff *again*. */
                    if (paranoia < PARANOIA_LEVEL) {
                        return lookdict_paranoid(mp, key, hash, paranoia+1);
                    }
                    /* Something is wrong; raise a RuntimeError. */
                }

    @rhettinger rhettinger removed their assignment Mar 7, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 8, 2012

    New changeset 0255bafbccf2 by Victor Stinner in branch 'default':
    Issue bpo-14205: document the change of dict[key] behaviour if dict is modified
    http://hg.python.org/cpython/rev/0255bafbccf2

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 9, 2012

    Guido: So, what do you think? Do you like the new behaviour, or would you prefer a softer solution like retrying N times? Python < 3.3 retries the lookup an unlimited number of times which lead to a crash, that's the issue fixed by my change.

    @gvanrossum
    Copy link
    Member

    I'm still torn. Can you implement the counter without adding an extra field to the dict object? I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.

    The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

    I'm prepared to make the current semantic change a 3.3 feature, but I'm not sure it's safe to backport.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 9, 2012

    Can you implement the counter without adding an extra field to the dict object?

    Add a counter requires to change the prototype of the C lookup function:
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

    I don't know if it is a problem for ABI compatibility or extension
    modules. I suppose that it is safer and simpler to not change it :-)

     I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.

    The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

    Is it really useful to only retry once? Retrying once means that it
    would work in most cases, but not in some corner cases. For example,
    retrying once may hide the problem if you have only 1 client (1
    thread), but you get the exception when you have more clients (more
    threads).

    Or do I misunderstand the problem?

    @gvanrossum
    Copy link
    Member

    On Thu, Mar 8, 2012 at 4:56 PM, STINNER Victor <report@bugs.python.org> wrote:

    STINNER Victor <victor.stinner@gmail.com> added the comment:

    > Can you implement the counter without adding an extra field to the dict object?

    Add a counter requires to change the prototype of the C lookup function:
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

    I don't know if it is a problem for ABI compatibility or extension
    modules. I suppose that it is safer and simpler to not change it :-)

    Agreed.

    > I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.
    >
    > The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

    Is it really useful to only retry once? Retrying once means that it
    would work in most cases, but not in some corner cases. For example,
    retrying once may hide the problem if you have only 1 client (1
    thread), but you get the exception when you have more clients (more
    threads).

    Or do I misunderstand the problem?

    Sorry, I was joking; anyway, 1 in this context would mean do the
    lookup exactly once (so 0 would mean not to do it at all). One retry
    would effectively mean try it twice, which would have to be treated
    skeptically. :-)

    Given all this I think we should keep it as you have committed and add
    it to the docs and whatsnew.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Mar 9, 2012

    On Thu, Mar 8, 2012 at 7:43 PM, STINNER Victor wrote:

    Python < 3.3 retries the lookup an unlimited number of times which
    lead to a crash, that's the issue fixed by my change.

    How does it lead to a crash? I think it just leads to an infinite
    loop (which is worse) if the data is actually malicious, but
    eventually fixes itself if the data is merely unlucky or even
    ill-behaved.

    Guido van Rossume:

    > Can you implement the counter without adding an extra field to the dict object?

    Yes. The logic does add complexity, but there is no per-dict charge,
    and the extra logic can be isolated to a function that will normally
    not be paged in.

    Add a counter requires to change the prototype of the C lookup
    function:
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

    I don't know if it is a problem for ABI compatibility or extension
    modules. I suppose that it is safer and simpler to not change it :-)

    That should still be the signature for the primary lookup; it can do
    its own delegation to a private function with an extra parameter
    if/when needed.

    The numbers 0, 1 and infinity are special; all others are to be
    treated with skepticism.

    Yes; it isn't clear whether to cut off after 1 retry or 3 or 10...

    Is it really useful to only retry once? Retrying once means that it
    would work in most cases, but not in some corner cases. For example,
    retrying once may hide the problem if you have only 1 client (1
    thread), but you get the exception when you have more clients (more
    threads).

    Without threads, there shouldn't be any problems for innocent code.
    Even with threads, the number of resets is not (for innocent code)
    tied to the number of alternative threads; only to the fact that they
    can run mid-lookup.

    -jJ

    @markshannon
    Copy link
    Member

    Unmodified CPython (without the patch) already passes the new test in the patch.

    The unmodified code already raises a Runtime error; a recursion limit exceeded error!

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 9, 2012

    Unmodified CPython (without the patch) already passes the new test in the patch.

    You should try Lib/test/crashers/nasty_eq_vs_dict.py, not my test.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 9, 2012

    New changeset a428f85de29c by Victor Stinner in branch 'default':
    Issue bpo-14205: Document the dict lookup change in What's New in Python 3.3
    http://hg.python.org/cpython/rev/a428f85de29c

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 9, 2012

    Guido> Given all this I think we should keep it as you have committed
    Guido> and add it to the docs and whatsnew.

    I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.

    @vstinner vstinner closed this as completed Mar 9, 2012
    @markshannon
    Copy link
    Member

    STINNER Victor wrote:

    STINNER Victor <victor.stinner@gmail.com> added the comment:

    Guido> Given all this I think we should keep it as you have committed
    Guido> and add it to the docs and whatsnew.

    I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.

    The tests still need to fixed

    test_mutating_lookuptest doesn't fail on the unmodified version of cpython.
    Lib/test/crashers/nasty_eq_vs_dict.py has been removed, but
    no equivalent test has been added to Lib/test/test_dict.py

    There should be at least one failing test for this issue.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 9, 2012

    New changeset 70fbb02d588c by Victor Stinner in branch 'default':
    Issue bpo-14205: Fix test_dict.test_mutating_lookup()
    http://hg.python.org/cpython/rev/70fbb02d588c

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Mar 26, 2012

    @gvanrossum
    Copy link
    Member

    Was the crasher ever converted into a unittest? I think that should be done regardless of the outcome of the ongoing discussion about this issue.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 13, 2012

    New changeset 93748e2d64e3 by Antoine Pitrou in branch 'default':
    Issue bpo-14417: Mutating a dict during lookup now restarts the lookup instead of raising a RuntimeError (undoes issue bpo-14205).
    http://hg.python.org/cpython/rev/93748e2d64e3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants