Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeakValueDictionary does not handle unhashable keys correctly #15956

Closed
saraedum opened this issue Mar 16, 2014 · 19 comments
Closed

WeakValueDictionary does not handle unhashable keys correctly #15956

saraedum opened this issue Mar 16, 2014 · 19 comments

Comments

@saraedum
Copy link
Member

The WeakValueDictionary behaves differently from regular dicts for unhashable keys:

sage: {}[[]]
TypeError: unhashable type: 'list'
sage: sage.misc.weak_dict.WeakValueDictionary()[[]]
KeyError: []

This is because http://docs.python.org/2/c-api/dict.html#PyDict_GetItem does not throw a TypeError.

CC: @nbruin @simon-king-jena

Component: misc

Keywords: hash

Author: Julian Rueth, Nils Bruin

Branch/Commit: 52e02b1

Reviewer: Nils Bruin, Julian Rueth

Issue created by migration from https://trac.sagemath.org/ticket/15956

@saraedum saraedum added this to the sage-6.2 milestone Mar 16, 2014
@nbruin
Copy link
Contributor

nbruin commented Mar 16, 2014

comment:1

Hm, unfortunate situation. In python 3 we'd be able to use PyDict_GetItemWithError:
http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1109
I'd be tempted to let this sit for now. We'd basically have to write our own.

@saraedum
Copy link
Member Author

comment:2

You are right. It is annoying to fix this in python2. I ran into it while working on #11895 (disable hashing of padics) btw.

@saraedum
Copy link
Member Author

Branch: u/saraedum/ticket/15956

@nbruin
Copy link
Contributor

nbruin commented Mar 20, 2014

comment:5

The proposed fix is very elegant and minimal, but it does noticeably affect a code path that deserves to be very fast: testing that a key does not occur in the dictionary. Timings:

%cpaste
cython("""
def test(N,D):
    cdef long n=N
    for i in range(n):
        1 in D
""")
--
sage: D=sage.misc.weak_dict.WeakValueDictionary()
sage: D[2]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 122 ms per loop
//with the extra hash call:
5 loops, best of 3: 151 ms per loop
sage: D[1]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 466 ms per loop
//this of course times the same with the extra hash call

Compared with a normal dictionary:

sage: N={}
sage: N[2]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 90.6 ms per loop
sage: N[1]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 377 ms per loop

The change from 122ms to 151ms is an increase of 25% in time taken (and this is for integers, which hash extremely quickly. I've also tried it with a string key '1', where I got 101ms vs. 127ms, so a penalty of around 30ms/10^7 seems to be expected).

Of course, it's nice if sage.misc.weak_dict.WeakValueDictionary is a perfect drop-in replacement for weakref.WeakValueDictionary, but having better performance at the expense of different edge-case error reporting (and it's just reporting "key not here" in cases where that's actually true, and where some other dictionaries raise "key not hashable", so it's hard to think of an edge case where that seriously affects legitimate program logic) shouldn't just be dismissed.

We could of course patch python and backport PyDict_GetItemWithError and get a proper fix, without performance degradation.

For the current proposal, I'd like to see some real-world motivation to bring the error reporting in line with other dictionaries before I'd consider significantly degrading performance.

A further problem with the current proposal: it only fixes one symptom. The nastier example http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1046 (stack overflow during key comparison) isn't addressed by verifying that the key hashes. Going all the way with the PyDict_GetItemWithError solution (or equivalent) would also solve that.


New commits:

3a05a1aImproved handling of unhashable keys in weak dictionary

@nbruin
Copy link
Contributor

nbruin commented Mar 20, 2014

Commit: 3a05a1a

@nbruin
Copy link
Contributor

nbruin commented Mar 20, 2014

Changed branch from u/saraedum/ticket/15956 to u/nbruin/ticket/15956

@nbruin
Copy link
Contributor

nbruin commented Mar 20, 2014

comment:7

Turns out backporting PyDict_GetItemWithError isn't nearly as bad as it seems in cython. We are relying on details about the PyDict implementation that are not officially part of the CAPI, but they do occur in Python.h. We're already doing worse here by digging up the non-exported Dummy value for dict tables.

The current implementation is actually a little faster than PyDict_GetItem (because we don't need to clear errors, probably). We could unwrap the use of these even further into the various places where PyDict_GetItemWithError is used, but for now this should be good enough.


New commits:

c60d3c6trac #15956: Provide PyDict_GetItemWithError and use it to obtain appropriate errors when looking up weakdict entries

@nbruin
Copy link
Contributor

nbruin commented Mar 20, 2014

Changed commit from 3a05a1a to c60d3c6

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 21, 2014

Changed commit from c60d3c6 to 109c161

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 21, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

109c161move a comment to its proper place

@nbruin
Copy link
Contributor

nbruin commented Mar 26, 2014

comment:9

PING I'm happy with Julian's doctests. The current use of PyDict_GetItemWithError actually gives slightly better performance than we had before AND we get fully compliant error reporting. So I'd be happy with the current solution. If Julian can give my change a positive review, we can get this tiny, unimportant improvement merged. Please complete the reviewer field when you're done.

@nbruin
Copy link
Contributor

nbruin commented Mar 26, 2014

Reviewer: Nils Bruin, ???

@nbruin
Copy link
Contributor

nbruin commented Mar 26, 2014

Author: Julian, Nils Bruin

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 27, 2014

Changed commit from 109c161 to 52e02b1

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 27, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

52e02b1trac #15956: improve error detection by adding "except ..." to some type declarations and inline some things in "__contains__" to obtain better performance than standard dict on non-occurring keys

@saraedum
Copy link
Member Author

saraedum commented Apr 2, 2014

comment:11

Thanks for taking care of this :)

@saraedum
Copy link
Member Author

saraedum commented Apr 2, 2014

Changed author from Julian, Nils Bruin to Julian Rueth, Nils Bruin

@saraedum
Copy link
Member Author

saraedum commented Apr 2, 2014

Changed reviewer from Nils Bruin, ??? to Nils Bruin, Julian Rueth

@vbraun
Copy link
Member

vbraun commented Apr 4, 2014

Changed branch from u/nbruin/ticket/15956 to 52e02b1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants