Need a more efficient way to perform dict.get(key, default) #82459

ben-spiller · 2019-09-25T14:35:17Z

BPO	38278
Nosy	@gvanrossum, @vstinner, @methane, @markshannon, @serhiy-storchaka, @ben-spiller, @tahia-khan, @iritkatriel

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-09-25.14:35:17.224>
labels = ['interpreter-core', 'type-feature', '3.7']
title = 'Need a more efficient way to perform dict.get(key, default)'
updated_at = <Date 2021-07-14.02:31:58.867>
user = 'https://github.com/ben-spiller'

bugs.python.org fields:

activity = <Date 2021-07-14.02:31:58.867>
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2019-09-25.14:35:17.224>
creator = 'benspiller'
dependencies = []
files = []
hgrepos = []
issue_num = 38278
keywords = []
message_count = 8.0
messages = ['353206', '353207', '353208', '353209', '353210', '353211', '353333', '397450']
nosy_count = 8.0
nosy_names = ['gvanrossum', 'vstinner', 'methane', 'Mark.Shannon', 'serhiy.storchaka', 'benspiller', 'ta1hia', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue38278'
versions = ['Python 3.7']

ben-spiller · 2019-09-25T14:35:16Z

In performance-critical python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method:

python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
5000000 loops, best of 5: 74.6 nsec per loop

... however the performance of natural approach is very poor (2.2 times slower!) compared to the time really needed to look up the value:

python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop

There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks:

custom dict subclass with __missing__() override: promising approach, but use of a custom class instead of dict seems to increase [] cost significantly:

python -m timeit -s "class mydict(dict):" -s " def __missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']"
5000000 loops, best of 5: 60.4 nsec per loop

get() with caching of function lookup - somewhat better but not great:

python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
5000000 loops, best of 5: 59.8 nsec per loop

[] and "in" (inevitably a bit slow due to needing to do the lookup twice):

python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None"
5000000 loops, best of 5: 53.9 nsec per loop

try/except approach: quickest solution if it exists, but clunky syntax, and VERY slow if it doesn't exist

python -m timeit -s "d={'abc':123}" "try:" " x=d['abc']" "except KeyError: pass"
5000000 loops, best of 5: 41.8 nsec per loop
python -m timeit -s "d={'abc':123}" "try:" " x=d['XXX']" "except KeyError: pass"
1000000 loops, best of 5: 174 nsec per loop

collections.defaultdict: reasonable performance if key exists, but unwanted behaviour of adding the key if missing (which if used with an unbounded universe of keys could produce a memory leak):

python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop

I bet we can do better!

Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

vstinner · 2019-09-25T14:46:22Z

dict.get() is a method call wheras "key in dict" and "dict[key]" are operators. Method calls are still slower than operators, even after past optimizations. For example, when dict.get was converted to METH_FASTCALL, it was around 20 ns faster:
https://vstinner.github.io/fastcall-microbenchmarks.html

See also closed bpo-17170 "string method lookup is too slow".

vstinner · 2019-09-25T14:49:32Z

python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop

This benchmark is not a fair comparison: the 'XXX' key is created at the first access. In short, this benchmark measure the performance of a dict lookup:

python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop

vstinner · 2019-09-25T14:52:10Z

Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

This issue doesn't propose any concrete solution, but discuss ideas. I suggest you to open a thread on the python-ideas mailing list instead. I suggest to close this issue.

I bet that defaultdict is *not* faster once you will manage to write a fair micro-benchmark.

ben-spiller · 2019-09-25T14:52:30Z

Thanks... yep I realise method calls are slower than operators, am hoping we can still find a cunning way to speed up this use case nonetheless. :D For example by having a configuration option on dict (or making a new subclass) that gives the (speedy!) [] operator the same no-exception semantics you'd get from calling get(). As you can see from my timeit benchmarks none of the current workarounds are very appealing for this use case, and a 2.2x slowdown for this common operation is a shame.

vstinner · 2019-09-25T14:56:19Z

I also suggest you to not focus on such micro-benchmarks. It's not relevant to make a whole application faster. Use PyPy if you care about performances. PyPy has a very efficient implementation for dictionary and it's JIT compiler can go way further than CPython. In some cases, PyPy can even replace hash table lookup with array access:
https://twitter.com/cfbolz/status/1175493837516627968

serhiy-storchaka · 2019-09-26T19:06:33Z

Was LOAD_METHOD optimized for builtin methods?

iritkatriel · 2021-07-13T22:56:44Z

Was LOAD_METHOD optimized for builtin methods?

Maybe this can be done with specialization.

iritkatriel · 2022-06-23T09:26:20Z

The differences are much less dramatic now (3.12):

% ./python.exe -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
10000000 loops, best of 5: 28.7 nsec per loop
% ./python.exe -m timeit -s "d={'abc':123}" "x=d['abc']"
10000000 loops, best of 5: 20.2 nsec per loop
% ./python.exe -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
10000000 loops, best of 5: 25.8 nsec per loop

ben-spiller mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Sep 25, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka closed this as completed Jun 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need a more efficient way to perform dict.get(key, default) #82459

Need a more efficient way to perform dict.get(key, default) #82459

ben-spiller mannequin commented Sep 25, 2019

ben-spiller mannequin commented Sep 25, 2019

vstinner commented Sep 25, 2019

vstinner commented Sep 25, 2019

vstinner commented Sep 25, 2019

ben-spiller mannequin commented Sep 25, 2019

vstinner commented Sep 25, 2019

serhiy-storchaka commented Sep 26, 2019

iritkatriel commented Jul 13, 2021

iritkatriel commented Jun 23, 2022 •

edited

Need a more efficient way to perform dict.get(key, default) #82459

Need a more efficient way to perform dict.get(key, default) #82459

Comments

ben-spiller mannequin commented Sep 25, 2019

ben-spiller mannequin commented Sep 25, 2019

vstinner commented Sep 25, 2019

vstinner commented Sep 25, 2019

vstinner commented Sep 25, 2019

ben-spiller mannequin commented Sep 25, 2019

vstinner commented Sep 25, 2019

serhiy-storchaka commented Sep 26, 2019

iritkatriel commented Jul 13, 2021

iritkatriel commented Jun 23, 2022 • edited

iritkatriel commented Jun 23, 2022 •

edited