Raise an error if a dict is modified during a lookup #58413

vstinner · 2012-03-05T21:19:55Z

BPO	14205
Nosy	@gvanrossum, @rhettinger, @vstinner, @markshannon, @JimJJewett
Files	dict_lookup.patch nomodify.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-03-09.13:05:00.568>
created_at = <Date 2012-03-05.21:19:54.963>
labels = ['interpreter-core']
title = 'Raise an error if a dict is modified during a lookup'
updated_at = <Date 2012-05-13.18:50:18.220>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2012-05-13.18:50:18.220>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2012-03-09.13:05:00.568>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2012-03-05.21:19:54.963>
creator = 'vstinner'
dependencies = []
files = ['24740', '24741']
hgrepos = []
issue_num = 14205
keywords = ['patch']
message_count = 29.0
messages = ['154976', '154977', '154989', '154991', '154993', '154994', '155021', '155022', '155023', '155024', '155026', '155027', '155029', '155037', '155147', '155194', '155196', '155197', '155199', '155205', '155214', '155219', '155230', '155231', '155232', '155266', '156858', '157334', '160543']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'rhettinger', 'vstinner', 'Mark.Shannon', 'python-dev', 'Jim.Jewett']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue14205'
versions = ['Python 3.3']

vstinner · 2012-03-05T21:19:55Z

Lib/test/crashers/nasty_eq_vs_dict.py does crash Python because of an
infinite loop in the dictionary lookup. The script modifies the
dictionary at each lookup, whereas Python tries a new lookup each time
that the dictionary is modified.

I proposed to make the lookup fail with a RuntimeError if the
dictionary has been modified during a lookup. It should not occur if
you are not doing something evil.

vstinner · 2012-03-05T21:20:58Z

Another possibility is to avoid the modification of a dictionary
during a lookup: nomodify.patch.

markshannon · 2012-03-05T23:28:56Z

I much prefer dict_lookup.patch to nomodify.patch.
It doesn't increase the memory use of dict. One extra word per dict could be a lot of memory for a large application.

There is a very subtle semantic change, but I think it is a positive one.
Raising a runtimne seesm sensible as the dict iterators already raise a RuntimeError if the size of the dict changes.

vstinner · 2012-03-05T23:43:13Z

I much prefer dict_lookup.patch to nomodify.patch.
It doesn't increase the memory use of dict. One extra word
per dict could be a lot of memory for a large application.

nomodify.patch is the correct fix, but I agree that dict_lookup.patch is better (and sufficient) in practive.

Raising a runtimne seesm sensible as the dict iterators already
raise a RuntimeError if the size of the dict changes.

Yes, that's how I chose the exception.

>>> d={k: str(k) for k in range(10)}
>>> for k in d:
...  del d[k]
... 
RuntimeError: dictionary changed size during iteration

>>> d={k for k in range(10)}
>>> for k in d:
...  d.remove(k)
... 
RuntimeError: Set changed size during iteration

>>> d=list(range(10))
>>> def keyfunc(x):
...  d.append(1)
...  return x
... 
>>> d.sort(key=keyfunc)
ValueError: list modified during sort

gvanrossum · 2012-03-05T23:50:47Z

Yeah, dict_lookup.patch seems fine.

python-dev · 2012-03-06T00:13:03Z

New changeset 934aaf2191d0 by Victor Stinner in branch 'default':
Close bpo-14205: dict lookup raises a RuntimeError if the dict is modified during
http://hg.python.org/cpython/rev/934aaf2191d0

jimjjewett · 2012-03-06T16:35:37Z

Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.

Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?

vstinner · 2012-03-06T16:54:11Z

Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

The issue was triggered without threads.If the __eq__ method of the
objects used for keys use C functions releasing the GIL, you may
trigger the issue.

Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't?

If you implement a special type modifying the dict that contains the
object, you should implement your own retry function on lookup failure
(catch RuntimeError).

Honestly, if you get RuntimeError, you should change your program, not
retry the lookup!

markshannon · 2012-03-06T16:56:29Z

Jim Jewett wrote:

Jim Jewett <jimjjewett@gmail.com> added the comment:

Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

So, they are writing to a dict in one thread while reading from the same
dict in another thread, without any external locks and with keys written
in Python.

I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.

I suspect, they are already seeing sporadic failures.
I think raising an exception is better than weird failures.

We should document the new behaviour, as it is a change in semantics.

vstinner · 2012-03-06T16:59:20Z

If the __eq__ method of the
objects used for keys use C functions releasing the GIL, you may
trigger the issue.

Oh, I mean: trigger the issue with threads. I hope that your objects
don't call C functions like open() in their __eq__() method!

gvanrossum · 2012-03-06T17:21:44Z

On Tue, Mar 6, 2012 at 8:56 AM, Mark Shannon <report@bugs.python.org> wrote:

Mark Shannon <mark@hotpy.org> added the comment:

Jim Jewett wrote:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch?

So, they are writing to a dict in one thread while reading from the same
dict in another thread, without any external locks and with keys written
in Python.
>
> I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against.
>
I suspect, they are already seeing sporadic failures.
I think raising an exception is better than weird failures.

We should document the new behaviour, as it is a change in semantics.

+1 on everything you said.

jimjjewett · 2012-03-06T17:43:10Z

On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:

Jim Jewett:
> Can't this be triggered by non-malicious code that just happened
> to have a python comparison and get hit with a thread switch?

So, they are writing to a dict in one thread while reading from the
same dict in another thread, without any external locks and with
keys written in Python.

Correct. For example, it could be a configuration manager, or a
cache, or even a worklist. If they're just adding new keys, or even
deleting other (==> NOT the one being looked up) keys, why should that
keep them from finding the existing, unchanged keys?

> I'm not sure how often it happens, but today it would not be visible
> to the user; after the patch, users will see a sporadic failure that
> they can't easily defend against.

I suspect, they are already seeing sporadic failures.

How?

The chain terminates as soon as the dict doesn't resize; without
malicious intent, the odds of hitting several resizes in a row are so
miniscule that it probably hasn't even slowed them down.

gvanrossum · 2012-03-06T18:05:09Z

On Tue, Mar 6, 2012 at 9:43 AM, Jim Jewett <report@bugs.python.org> wrote:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:
>
>> Jim Jewett:
>>> Can't this be triggered by non-malicious code that just happened
>>> to have a python comparison and get hit with a thread switch?
>
>> So, they are writing to a dict in one thread while reading from the
>> same dict in another thread, without any external locks and with
>> keys written in Python.
>
> Correct.  For example, it could be a configuration manager, or a
> cache, or even a worklist.  If they're just adding new keys, or even
> deleting other (==> NOT the one being looked up) keys, why should that
> keep them from finding the existing, unchanged keys?

Use a lock or a built-in key class. I realize that neither is ideal,
but then, neither was the old situation.

>> I'm not sure how often it happens, but today it would not be visible
>> to the user; after the patch, users will see a sporadic failure that
>> they can't easily defend against.

> I suspect, they are already seeing sporadic failures.

How?

The chain terminates as soon as the dict doesn't resize; without
malicious intent, the odds of hitting several resizes in a row are so
miniscule that it probably hasn't even slowed them down.

Now I'm torn. If we'd have this RuntimeError from the start, would we
consider it a flaw in the dict implementation or a feature? The
RuntimeError when changing a dict's size while iterating over it is
definitely a feature (so as to allow the implementation to rehash
everything upon insert/delete). But this is not quite the same. Or is
it? On the one hand I think the scenario is pretty unlikely (mostly
because it involves a user-defined comparison); OTOH it would be quite
nasty to debug. Or would it? You do get a decent error message...

Note that Victor's alternate fix (nomodify.diff) has the same problem
-- it just raises RuntimeError in the mutating thread rather than in
the lookup thread.

jimjjewett · 2012-03-06T20:33:47Z

On Tue, Mar 6, 2012 at 1:05 PM, Guido van Rossum

Jim Jewett:

>... If they're just adding new keys, or even deleting other
> (==> NOT the one being looked up) keys, why should that
> keep them from finding the existing, unchanged keys?

> ... The chain terminates as soon as the dict doesn't resize; without
> malicious intent, the odds of hitting several resizes in a row are so
> miniscule that it probably hasn't even slowed them down.

Now I'm torn. If we'd have this RuntimeError from the start, would we
consider it a flaw in the dict implementation or a feature?

I would personally have considered it a flaw. Wrapping all dict
access just in case some other code added a weird key seems ugly and
inefficient.

RuntimeError when changing a dict's size while iterating over it is
definitely a feature (so as to allow the implementation to rehash
everything upon insert/delete).

In that case, you've explicitly asked for the whole set (or at least a
snapshot); the RuntimeError indicates that you can't get it. Maybe
there should be a way to say "I don't need the set to be perfect", but
there isn't, and you aren't getting what you actually did ask for, so
an Error is appropriate.

But in this case, you're asking for a specific key, and the key is
there. It may even be a perfectly normal string, that just happened
to hash-collide with something some other code inserted.

The RuntimeError exposes a race condition to user code -- and it may
not be the code that stuck the oddball class in the dict in the first
place.

Note that Victor's alternate fix (nomodify.diff) has the same problem
-- it just raises RuntimeError in the mutating thread rather than in
the lookup thread.

Right; I'm saying that raising an error is the wrong thing to do.

But there is an analogy to the hash-code change vs collision counting.
Breaking an application that got unlucky once is wrong. But if the
bad luck *continues* to the point where it would only occur one time
in a zillion, RuntimeError is better than a likely infinite loop.

(It really was a question initially, but I've now convinced at least myself.)

I would rather see something like replacing

349 else {
350 /* The compare did major nasty stuff to the
351 * dict: start over.
352 * XXX A clever adversary could prevent this
353 * XXX from terminating.
354 */
355 return lookdict(mp, key, hash);

with

349 else {
350 /* The compare did major nasty stuff to the
351 * dict: start over.
352 * XXX A clever adversary could prevent this
353 * XXX from terminating.
354 */
355 return lookdict_paranoid(mp, key, hash, 1);

where lookdict_paranoid is a near-copy of lookdict that adds a
paranoia parameter and replaces those same lines with

            else {
                /* The compare did major nasty stuff *again*. */
                if (paranoia < PARANOIA_LEVEL) {
                    return lookdict_paranoid(mp, key, hash, paranoia+1);
                }
                /* Something is wrong; raise a RuntimeError. */
            }

python-dev · 2012-03-08T01:50:33Z

New changeset 0255bafbccf2 by Victor Stinner in branch 'default':
Issue bpo-14205: document the change of dict[key] behaviour if dict is modified
http://hg.python.org/cpython/rev/0255bafbccf2

vstinner · 2012-03-09T00:43:10Z

Guido: So, what do you think? Do you like the new behaviour, or would you prefer a softer solution like retrying N times? Python < 3.3 retries the lookup an unlimited number of times which lead to a crash, that's the issue fixed by my change.

gvanrossum · 2012-03-09T00:50:02Z

I'm still torn. Can you implement the counter without adding an extra field to the dict object? I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.

The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

I'm prepared to make the current semantic change a 3.3 feature, but I'm not sure it's safe to backport.

vstinner · 2012-03-09T00:56:16Z

Can you implement the counter without adding an extra field to the dict object?

Add a counter requires to change the prototype of the C lookup function:
PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

I don't know if it is a problem for ABI compatibility or extension
modules. I suppose that it is safer and simpler to not change it :-)

I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.

The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

Is it really useful to only retry once? Retrying once means that it
would work in most cases, but not in some corner cases. For example,
retrying once may hide the problem if you have only 1 client (1
thread), but you get the exception when you have more clients (more
threads).

Or do I misunderstand the problem?

gvanrossum · 2012-03-09T01:01:47Z

On Thu, Mar 8, 2012 at 4:56 PM, STINNER Victor <report@bugs.python.org> wrote:

STINNER Victor <victor.stinner@gmail.com> added the comment:

> Can you implement the counter without adding an extra field to the dict object?

Add a counter requires to change the prototype of the C lookup function:
PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

I don't know if it is a problem for ABI compatibility or extension
modules. I suppose that it is safer and simpler to not change it :-)

Agreed.

> I worry that we'll get the same objection we had when MAL proposed collision counting as a solution to the hash DoS attack.
>
> The numbers 0, 1 and infinity are special; all others are to be treated with skepticism.

Is it really useful to only retry once? Retrying once means that it
would work in most cases, but not in some corner cases. For example,
retrying once may hide the problem if you have only 1 client (1
thread), but you get the exception when you have more clients (more
threads).

Or do I misunderstand the problem?

Sorry, I was joking; anyway, 1 in this context would mean do the
lookup exactly once (so 0 would mean not to do it at all). One retry
would effectively mean try it twice, which would have to be treated
skeptically. :-)

Given all this I think we should keep it as you have committed and add
it to the docs and whatsnew.

jimjjewett · 2012-03-09T04:28:08Z

On Thu, Mar 8, 2012 at 7:43 PM, STINNER Victor wrote:

Python < 3.3 retries the lookup an unlimited number of times which
lead to a crash, that's the issue fixed by my change.

How does it lead to a crash? I think it just leads to an infinite
loop (which is worse) if the data is actually malicious, but
eventually fixes itself if the data is merely unlucky or even
ill-behaved.

Guido van Rossume:

> Can you implement the counter without adding an extra field to the dict object?

Yes. The logic does add complexity, but there is no per-dict charge,
and the extra logic can be isolated to a function that will normally
not be paged in.

Add a counter requires to change the prototype of the C lookup
function:
PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, Py_hash_t hash);

I don't know if it is a problem for ABI compatibility or extension
modules. I suppose that it is safer and simpler to not change it :-)

That should still be the signature for the primary lookup; it can do
its own delegation to a private function with an extra parameter
if/when needed.

The numbers 0, 1 and infinity are special; all others are to be
treated with skepticism.

Yes; it isn't clear whether to cut off after 1 retry or 3 or 10...

Is it really useful to only retry once? Retrying once means that it
would work in most cases, but not in some corner cases. For example,
retrying once may hide the problem if you have only 1 client (1
thread), but you get the exception when you have more clients (more
threads).

Without threads, there shouldn't be any problems for innocent code.
Even with threads, the number of resets is not (for innocent code)
tied to the number of alternative threads; only to the fact that they
can run mid-lookup.

-jJ

markshannon · 2012-03-09T09:00:09Z

Unmodified CPython (without the patch) already passes the new test in the patch.

The unmodified code already raises a Runtime error; a recursion limit exceeded error!

vstinner · 2012-03-09T09:44:08Z

Unmodified CPython (without the patch) already passes the new test in the patch.

You should try Lib/test/crashers/nasty_eq_vs_dict.py, not my test.

python-dev · 2012-03-09T13:04:09Z

New changeset a428f85de29c by Victor Stinner in branch 'default':
Issue bpo-14205: Document the dict lookup change in What's New in Python 3.3
http://hg.python.org/cpython/rev/a428f85de29c

vstinner · 2012-03-09T13:05:00Z

Guido> Given all this I think we should keep it as you have committed
Guido> and add it to the docs and whatsnew.

I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.

markshannon · 2012-03-09T13:52:47Z

STINNER Victor wrote:

STINNER Victor <victor.stinner@gmail.com> added the comment:

Guido> Given all this I think we should keep it as you have committed
Guido> and add it to the docs and whatsnew.

I updated What's New in Python 3.3 document. I also wrote an email to python-dev to notify all developers of this change. Let's close the issue.

The tests still need to fixed

test_mutating_lookuptest doesn't fail on the unmodified version of cpython.
Lib/test/crashers/nasty_eq_vs_dict.py has been removed, but
no equivalent test has been added to Lib/test/test_dict.py

There should be at least one failing test for this issue.

python-dev · 2012-03-09T21:59:09Z

New changeset 70fbb02d588c by Victor Stinner in branch 'default':
Issue bpo-14205: Fix test_dict.test_mutating_lookup()
http://hg.python.org/cpython/rev/70fbb02d588c

jimjjewett · 2012-03-26T19:35:43Z

See http://bugs.python.org/issue14417

gvanrossum · 2012-04-01T23:26:21Z

Was the crasher ever converted into a unittest? I think that should be done regardless of the outcome of the ongoing discussion about this issue.

python-dev · 2012-05-13T18:50:18Z

New changeset 93748e2d64e3 by Antoine Pitrou in branch 'default':
Issue bpo-14417: Mutating a dict during lookup now restarts the lookup instead of raising a RuntimeError (undoes issue bpo-14205).
http://hg.python.org/cpython/rev/93748e2d64e3

vstinner added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 5, 2012

rhettinger self-assigned this Mar 5, 2012

python-dev mannequin closed this as completed Mar 6, 2012

vstinner reopened this Mar 6, 2012

rhettinger removed their assignment Mar 7, 2012

vstinner closed this as completed Mar 9, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raise an error if a dict is modified during a lookup #58413

Raise an error if a dict is modified during a lookup #58413

vstinner commented Mar 5, 2012

vstinner commented Mar 5, 2012

vstinner commented Mar 5, 2012

markshannon commented Mar 5, 2012

vstinner commented Mar 5, 2012

gvanrossum commented Mar 5, 2012

python-dev mannequin commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

vstinner commented Mar 6, 2012

markshannon commented Mar 6, 2012

vstinner commented Mar 6, 2012

gvanrossum commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

gvanrossum commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

python-dev mannequin commented Mar 8, 2012

vstinner commented Mar 9, 2012

gvanrossum commented Mar 9, 2012

vstinner commented Mar 9, 2012

gvanrossum commented Mar 9, 2012

jimjjewett mannequin commented Mar 9, 2012

markshannon commented Mar 9, 2012

vstinner commented Mar 9, 2012

python-dev mannequin commented Mar 9, 2012

vstinner commented Mar 9, 2012

markshannon commented Mar 9, 2012

python-dev mannequin commented Mar 9, 2012

jimjjewett mannequin commented Mar 26, 2012

gvanrossum commented Apr 1, 2012

python-dev mannequin commented May 13, 2012

Raise an error if a dict is modified during a lookup #58413

Raise an error if a dict is modified during a lookup #58413

Comments

vstinner commented Mar 5, 2012

vstinner commented Mar 5, 2012

vstinner commented Mar 5, 2012

markshannon commented Mar 5, 2012

vstinner commented Mar 5, 2012

gvanrossum commented Mar 5, 2012

python-dev mannequin commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

vstinner commented Mar 6, 2012

markshannon commented Mar 6, 2012

vstinner commented Mar 6, 2012

gvanrossum commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

gvanrossum commented Mar 6, 2012

jimjjewett mannequin commented Mar 6, 2012

python-dev mannequin commented Mar 8, 2012

vstinner commented Mar 9, 2012

gvanrossum commented Mar 9, 2012

vstinner commented Mar 9, 2012

gvanrossum commented Mar 9, 2012

jimjjewett mannequin commented Mar 9, 2012

markshannon commented Mar 9, 2012

vstinner commented Mar 9, 2012

python-dev mannequin commented Mar 9, 2012

vstinner commented Mar 9, 2012

markshannon commented Mar 9, 2012

python-dev mannequin commented Mar 9, 2012

jimjjewett mannequin commented Mar 26, 2012

gvanrossum commented Apr 1, 2012

python-dev mannequin commented May 13, 2012