Need a more efficient way to perform dict.get(key, default) #82459
In performance-critical Python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0, etc.) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method:
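The original snippet was lost in the migration; it was presumably something along these lines (a minimal sketch, with an illustrative dict and key):

```python
# Look up a key, falling back on a default if it is missing.
d = {"a": 1, "b": 2}

value = d.get("missing_key", 0)  # returns 0 instead of raising KeyError
```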
... however the performance of this natural approach is very poor (2.2 times slower!) compared to the time actually needed to look up the value:
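The timeit runs themselves did not survive the migration; a sketch of the kind of comparison being described (absolute numbers will vary by interpreter version and hardware):

```python
import timeit

d = {"key": 1}

# Compare the raw [] lookup against the equivalent .get() call.
t_subscript = timeit.timeit("d['key']", globals={"d": d}, number=100_000)
t_get = timeit.timeit("d.get('key')", globals={"d": d}, number=100_000)

print(f"[] lookup : {t_subscript:.4f}s")
print(f".get()    : {t_get:.4f}s")
```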
There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks.

Custom dict subclass with a __missing__() override: a promising approach, but use of a custom class instead of dict seems to increase [] cost significantly:
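The elided example presumably resembled this (a sketch; the class name is illustrative):

```python
class DefaultingDict(dict):
    """dict subclass whose [] returns None for missing keys."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent;
        # unlike defaultdict, nothing is stored in the dict.
        return None


d = DefaultingDict(a=1)
print(d["a"])     # normal lookup still works
print(d["nope"])  # falls back to None instead of raising KeyError
```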
get() with caching of function lookup - somewhat better but not great:
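Something like the following, presumably (a sketch of the cached-lookup trick):

```python
d = {"a": 1}
d_get = d.get  # bind the method once, outside the hot loop

# Each call now skips the attribute lookup on d.
total = 0
for key in ("a", "b", "a"):
    total += d_get(key, 0)
```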
[] and "in" (inevitably a bit slow due to needing to do the lookup twice):
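A sketch of the two-lookup pattern being described:

```python
d = {"a": 1}
key = "b"

# "key in d" hashes and probes once; d[key] then does it all again.
value = d[key] if key in d else 0
```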
try/except approach: quickest solution if the key exists, but clunky syntax, and VERY slow if it doesn't exist:
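Presumably along these lines (a sketch):

```python
d = {"a": 1}
try:
    value = d["b"]   # fast path when the key exists
except KeyError:
    value = 0        # raising and catching is expensive when it doesn't
```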
collections.defaultdict: reasonable performance if the key exists, but unwanted behaviour of adding the key if missing (which, if used with an unbounded universe of keys, could produce a memory leak):
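The side effect being described can be shown directly (a sketch):

```python
from collections import defaultdict

d = defaultdict(int, a=1)
value = d["b"]  # returns 0, but ALSO inserts "b" into the dict
print("b" in d)  # the unwanted key-creation behaviour
```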
I bet we can do better! Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging, perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with [] performance comparable to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?
dict.get() is a method call, whereas "key in dict" and "dict[key]" are operators. Method calls are still slower than operators, even after past optimizations. For example, when dict.get was converted to METH_FASTCALL, it was around 20 ns faster. See also the closed bpo-17170, "string method lookup is too slow".
This benchmark is not a fair comparison: the 'XXX' key is created at the first access. In short, this benchmark measures the performance of a dict lookup:
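The point can be demonstrated directly (a sketch; 'XXX' is the key name from the original, now-elided benchmark):

```python
from collections import defaultdict

d = defaultdict(int)
print("XXX" in d)  # key does not exist yet
d["XXX"]           # first access creates the key...
print("XXX" in d)  # ...so subsequent timings measure an ordinary dict lookup
```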
This issue doesn't propose any concrete solution, but discusses ideas. I suggest opening a thread on the python-ideas mailing list instead, and closing this issue. I bet that defaultdict is *not* faster once you manage to write a fair micro-benchmark.
Thanks... yep, I realise method calls are slower than operators; I'm hoping we can still find a cunning way to speed up this use case nonetheless. :D For example, by having a configuration option on dict (or making a new subclass) that gives the (speedy!) [] operator the same no-exception semantics you'd get from calling get(). As you can see from my timeit benchmarks, none of the current workarounds are very appealing for this use case, and a 2.2x slowdown for this common operation is a shame.
I also suggest not focusing on such micro-benchmarks. They're not relevant to making a whole application faster. Use PyPy if you care about performance. PyPy has a very efficient dictionary implementation, and its JIT compiler can go way further than CPython. In some cases, PyPy can even replace a hash table lookup with an array access.
Was LOAD_METHOD optimized for builtin methods? |
Maybe this can be done with specialization. |
The differences are much less dramatic now (3.12):
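The measurements themselves were lost in the migration; they could be reproduced with something like the following (results depend on the CPython version and machine, so no numbers are given here):

```python
import timeit

d = {"key": 1}

# Time the three lookup styles discussed above under the same conditions.
results = {
    stmt: timeit.timeit(stmt, globals={"d": d}, number=1_000_000)
    for stmt in ("d['key']", "d.get('key')", "'key' in d")
}
for stmt, t in results.items():
    print(f"{stmt:<14} {t:.4f}s")
```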