-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OrderedDict.popitem()/__str__() raises KeyError #68855
Comments
While testing pywikibot using requests and urllib3 on Python 3.6 we got an interesting error: ====================================================================== Traceback (most recent call last):
File "/home/travis/build/xZise/pywikibot-core/tests/wikidataquery_tests.py", line 252, in testQueryApiGetter
data = w.query(q)
File "/home/travis/build/xZise/pywikibot-core/pywikibot/data/wikidataquery.py", line 601, in query
data = self.getDataFromHost(fullQueryString)
File "/home/travis/build/xZise/pywikibot-core/pywikibot/data/wikidataquery.py", line 563, in getDataFromHost
resp = http.fetch(url)
File "/home/travis/build/xZise/pywikibot-core/pywikibot/comms/http.py", line 359, in fetch
error_handling_callback(request)
File "/home/travis/build/xZise/pywikibot-core/pywikibot/comms/http.py", line 276, in error_handling_callback
raise request.data
File "/home/travis/build/xZise/pywikibot-core/pywikibot/comms/http.py", line 255, in _http_process
auth=auth, timeout=timeout, verify=True)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/adapters.py", line 337, in send
conn = self.get_connection(request.url, proxies)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/adapters.py", line 251, in get_connection
conn = self.poolmanager.connection_from_url(url)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/poolmanager.py", line 139, in connection_from_url
return self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/poolmanager.py", line 125, in connection_from_host
self.pools[pool_key] = pool
File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/_collections.py", line 66, in __setitem__
_key, evicted_value = self._container.popitem(last=False)
KeyError: ('https', 'eu.wiktionary.org', 443) Now that doesn't make much sense, as OrderedDict.popitem() only returns a KeyError if it's empty but then not with that error message. Unfortunately I can't reproduce it without doing the complete pywikibot test suite. When I only execute the tests where it occurred I don't get any failures. So I modified the output in urllib3 and returned the key as well as the content before popitem(last=False) is called (I shortened the values): Key: ('https', 'bn.wikipedia.org', 443) As you can see it is not empty and the key in the KeyError is the first key in the OrderedDict. Also the key there is different from the key in the original exception I noticed so it's not a specific key that failed. I don't think versions before Python 3.6 are affected as we had tests running on Python 3.5 (before Travis switched to 3.6 recently) and these all worked. Also not all popitem() calls in that line fail. I'm using Python 3.6.0a0 (default:d6c91b8242d2, Jul 18 2015, 16:36:01). See also: urllib3/urllib3#680 and https://phabricator.wikimedia.org/T106212 |
Looking further into this issue, OrderedDict.pop() using the key returned from the KeyError (using eval(str(error))) also yields a KeyError. And OrderedDict.popitem() does not change the dictionary (so it's not like the KeyError is raised even though it worked). Also it appears to be empty actually. list(OrderedDict) returns an empty list even though str(OrderedDict) does not. So maybe some operations do only remove entries from one part of the data so that popitem() still thinks it's in the cache. Now I'm not familiar with urllib3 so I'm not sure how that internal OrderedDict is used to narrow down what might cause that issue. And additionally I tried to output keys(), items() and values() separately and suddenly I get a KeyError even when I just do str(): conn = self.poolmanager.connection_from_url(url)
File "/home/xzise/.pyenv/versions/3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/poolmanager.py", line 139, in connection_from_url
return self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
File "/home/xzise/.pyenv/versions/3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/poolmanager.py", line 125, in connection_from_host
self.pools[pool_key] = pool
File "/home/xzise/.pyenv/versions/3.6-dev/lib/python3.6/site-packages/requests/packages/urllib3/_collections.py", line 66, in __setitem__
__before = str(self._container)
KeyError: ('https', 'ru.wikipedia.org', 443) |
Just to get this out of the way, are you running your tests against the latest beta (3)? There were some known bugs in earlier betas that have since been fixed. |
Correct me if I'm wrong but the travis-ci logs [1] seem to indicate it's using Python 3.6.0a0. [1] https://travis-ci.org/xZise/pywikibot-core/builds/71550286#L144 |
Well as I described in the opening post: "I don't think versions before Python 3.6 are affected as we had tests running on Python 3.5 (before Travis switched to 3.6 recently) and these all worked." Now tbh I don't know if a version of 3.5 is affected but was never tested by Travis. And if then the 3.6 branch was branched of that version. If that how it works. As I'm using pyenv I could easily test this on Python 3.5b3. By the way the last official test run on Pyhton 3.5 using Travis was using 3.5a4+: https://travis-ci.org/wikimedia/pywikibot-core/jobs/71443578#L144 |
Ah, sorry. I wasn't thinking past Python 3.5 (which is about to go to beta 4). While 3.6.0a0, doesn't tell us much, d6c91b8242d2 (r96935) does. That revision has all the necessary fixes to OrderedDict. I'll look into this some more. Do you get the failure consistently when running the test suite? Also, in case you weren't aware: Python 3.5 (hence 3.6) introduces a C implementation of OrderedDict. |
It is consistent as in it happens on every run of the test suite. But unfortunately I haven't checked if it's always happening at the same place. Luckily we have 4 builds on Travis with 3.6 and in all it happened from the beginning and got > 100 matches for “KeyError:”:
Maybe I can do additional analysis but I'm pretty sure for me locally I didn't get failures so soon. And no I wasn't aware about OrderedDict implemented in C. Now I haven't done tests in 3.5.0b3 (which seems to be the newest version of 3.5 available via pyenv at the moment) as it's relatively cumbersome and prevents me from doing any development on pywikibot at the same time. Anyway I still might do it and report whether I get the error too (unless OrderedDict is still implemented in Python in that version). |
Thanks for the extra info. I'm going to see if I can reproduce the issue by running the pywikibot test suite locally. What's the best way to set that up? Are there instructions somewhere? As to the C implementation, it was first released (as a special exception) in 3.5b2. |
Yes see the tests/README.rst. And afaik do you only need to have requests and six installed. |
Okay I did a test run on all three 3.5 betas available to me through pyenv. The beta 3 failed as Python 3.6 does that popitem() raises a KeyError. The beta 2 had the bug that popitem() does not support keyword arguments so I wasn't able to test it there. And the beta 1 works fine so it looks like the C implementation is the cause of this. |
I've thus far been unsuccessful in running the pywikibot test suite. I'm guessing there are some prerequisites (e.g. an account on some wiki site). Is there a way to run the tests without network access? Also, I ran into some trouble with i18n.__file__, but hacked my way around it. |
Oh sorry, I basically never need to install pywikibot anew so it's easy to forget but there is a submodule in scripts/i18n which needs to be cloned as well. With the following commands I could reproduce the error (and you don't even need to install requests and six): $ git clone --recursive git@github.com:wikimedia/pywikibot-core.git test
$ cd test/
$ echo "mylang='en'" >> user-config.py
$ echo "family='wikipedia'" >> user-config.py
$ python setup.py test With that you only need a network connection (as this error is in urllib3 I doubt it works without network connection) but you don't need any wiki account. If you want to test it with an account, execute “generate_user_files.py” before. Just as a note I've activated write and write failure tests but without an account these tests should not be run anyway. |
Just as a note to the tests: You may not get the issues with OrderedDict as a failure/error at the end of the test suite. And you may (depending on the version) get a few errors because NoUsername was raised. That is unrelated to this issue and can be fixed by using a version after 6255530f has been merged. |
That worked. I'll take a close look at what's going on as soon as I can. |
The attached test case raises a KeyError for __str__() |
@mark, note that you get the same behavior with the pure Python OrderedDict. Calling dict.* methods on an OrderedDict gives you undefined behavior. I expect the same is true for most subclasses of builtin types that override builtin methods. Anyway, the problem with using dict.* methods with OrderedDict is a separate issue from the bug with pop/popitem reported here. I've opened bpo-24721 to address the former. |
Regarding this bug, it's clear now that the ordered keys and the underlying dict are getting out of sync somewhere. This is either due to a bug in the C OrderedDict implementation or the use of the concrete dict C-API (or dict.* methods; thanks Mark). It's undoubtedly the former since the pure Python implementation did not have this problem. I'm hoping to have some time today to identify the source of the problem. |
It produces the same exception under Python 3.4, too. |
Really this bug and not the one mentioned by Mark? I've been using 3.4(.3) since we use requests (and therefore urllib3) which happened a few months ago. I also can't remember any failure on a Python 3.4 Travis build. |
Mark's test case file produced a KeyError under 3.4 when I tried it. |
Ah okay. If you look at Eric's response you'll see that this is a different issue and tracked in bpo-24721. I mean it is possible that urllib3 does it too (it's devs would need to answer that) but in the end highly unlikely as it doesn't happen on versions before 3.5b2. |
I've been able to pare down the tests that run to reproduce the bug. These 3 must run: archivebot Given how long the entire suite takes, this helps drastically improve my iteration time while debugging. At this point I should be able to more quickly identify where things go wrong. As soon as possible I'll post a simplified reproducer that does not rely on the pywikibot test suite. I'm not there yet, but getting closer. |
It's starting to look like this is threading-related. There's a single requests.Session defined as a module global in pywikibot.comms.http which is shared by all threads handling requests. requests.Session uses OrderedDict for an LRU cache for redirects. That's where we're running into trouble. So it seems that I missed a spot that is susceptible to issues with threading (not necessarily in pop/popitem). The catch is that it's not very racy. The failure is reproducing quite consistently. I'm going to continue investigating and constructing a simple reproducer. |
After some quick tests, it looks like during the tests only a single thread is used, so it is *not* threading-related. That makes a bit more sense to me given the consistency. Notwithstanding this development, I should be able to isolate the problem soon (when I can free up some time) given the information I have now. FWIW, I've attached a file containing source that I added to requests/packages/urllib3/_collections.py while investigating. In that same file I then set RecentlyUsedContainer.ContainerCls to _LoggingOrderedDict. |
Also, here is the output from running the pywikibot suite (using just the 3 test modules) with the logging odict wrapper. |
Getting closer. Here's a reproducer derived from the calls made while running the pywikibot test suite (3 tests only). Note that the KeyError indicates a different key for each run, even though the code is consistent. This means that order preservation is failing and implies that popitem is depending on the underlying dict rather than on the linked list. I'll look into that next. |
As expected (considering the link to dict ordering here), the inconsistent results are tied to hash randomization: $ for i in `seq 1 25`; do echo $i; PYTHONHASHSEED=$i ./python /tmp/odict_reproduce.py; done
1
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1794, in <module>
od.popitem(last=False)
KeyError: ('https', 'ml.wikipedia.org', 443)
2
3
4
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1737, in <module>
od.popitem(last=False)
KeyError: ('https', 'li.wikipedia.org', 443)
5
6
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1699, in <module>
od.popitem(last=False)
KeyError: ('https', 'ksh.wikipedia.org', 443)
7
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1889, in <module>
od.popitem(last=False)
KeyError: ('https', 'nds-nl.wikipedia.org', 443)
8
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1794, in <module>
od.popitem(last=False)
KeyError: ('https', 'ml.wikipedia.org', 443)
9
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1528, in <module>
od.popitem(last=False)
KeyError: ('https', 'ga.wikipedia.org', 443)
10
11
12
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1585, in <module>
od.popitem(last=False)
KeyError: ('https', 'hu.wikipedia.org', 443)
13
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1737, in <module>
od.popitem(last=False)
KeyError: ('https', 'li.wikipedia.org', 443)
14
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1775, in <module>
od.popitem(last=False)
KeyError: ('https', 'mg.wikipedia.org', 443)
15
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1965, in <module>
od.popitem(last=False)
KeyError: ('https', 'pdc.wikipedia.org', 443)
16
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1680, in <module>
od.popitem(last=False)
KeyError: ('https', 'it.wikipedia.org', 443)
17
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1452, in <module>
od.popitem(last=False)
KeyError: ('https', 'fa.wikipedia.org', 443)
18
19
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1965, in <module>
od.popitem(last=False)
KeyError: ('https', 'nn.wikipedia.org', 443)
20
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1642, in <module>
od.popitem(last=False)
KeyError: ('https', 'ilo.wikipedia.org', 443)
21
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1319, in <module>
od.popitem(last=False)
KeyError: ('https', 'da.wikipedia.org', 443)
22
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1661, in <module>
od.popitem(last=False)
KeyError: ('https', 'it.wikipedia.org', 443)
23
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1547, in <module>
od.popitem(last=False)
KeyError: ('https', 'gl.wikipedia.org', 443)
24
25
Traceback (most recent call last):
File "/tmp/odict_reproduce.py", line 1699, in <module>
od.popitem(last=False)
KeyError: ('https', 'ksh.wikipedia.org', 443) |
For the reproducer I'm sticking with a seed of 1: PYTHONHASHSEED=1 ./python /tmp/odict_reproduce.py |
I've verified that it is definitely the linked list that is getting updated incorrectly at the point that a key is popped off. The underlying dict is working fine. The erroneous behavior is happening with either pop, popitem, or __delitem__. However, it is likely in the common code used to remove a node from the linked list (e.g. _odict_clear_node). |
It's looking like the problem is in _odict_add_new_node. If that's the case then I should be able to resolve this issue soon. |
It turns out the problem was that the odict resize mechanism was not getting triggered in all the cases that it should have been. dict resizes after a certain number of insertions, whether or not previous deletions have cleared out space. odict only resizes its fast lookup table when it needs to do a fast node lookup and only when the current dict keys size does not match the current size of the odict fast lookup table. This bug is the consequence of a corner case which odict did not handle correctly. When you delete an item and then insert another, the resulting size of the dict keys is the same as the it was before the deletion. However, the insertion may have triggered a resize. This matters because when a resize happens the keys are re-inserted into the hash table. If it so happens that they key you just removed was in a slot that would have otherwise been occupied by another key (i.e. there was an earlier collision) then upon resizing that other key will be moved to its preferred slot. Here's a patch that changes odict to rely on the pointer value of dict's ma_keys (rather than on ma_keys.dk_size) to indicate the need for a resize. This works because dict creates a new keys struct every time it resizes. I'll point out that one alternative would be to track a "state" counter on dict that increments each time there's a resize. Then odict could check that rather than the ma_keys pointer value. |
Thank you for figuring it out and providing a patch after I couldn't really give good information. I haven't been able to test it for myself but your explanation makes sense :) |
@fabian, hey, thanks for bringing it to our attention! |
If I don't get any feedback before then, I'll commit the patch on Friday. |
Okay I think I've run the test suite on 3.6 although it only works with https://gerrit.wikimedia.org/r/#/c/229683/ now as inspect.getargspec has been removed (when I initially posted this, it was just deprecated). Well writing this I just thought I could've tested that on 3.5… well anyway I don't get any KeyErrors. |
That's good to know. Thanks. |
New changeset adb510322c8f by Eric Snow in branch '3.5': New changeset 47287c998bb0 by Eric Snow in branch 'default': |
I've pushed the fix for RC1. Thanks again Fabian for bringing it to our attention. |
FWIW I set up a test harness that runs test_collections and odict_reproduce with monotonically increasing PYTHONHASHSEED values. I let it run overnight; it's now past iteration 158600. Apart from some weirdness in the low 70ks that I can't reproduce it's been rock solid. So, looks good! |
Yeah, I stopped it at 300k iterations. No problems. Glad this fix got in for rc1. |
Thanks for the double-check, Larry. I'd done similar testing, though not |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: