Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a more efficient way to perform dict.get(key, default) #82459

Closed
ben-spiller mannequin opened this issue Sep 25, 2019 · 9 comments
Closed

Need a more efficient way to perform dict.get(key, default) #82459

ben-spiller mannequin opened this issue Sep 25, 2019 · 9 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@ben-spiller
Copy link
Mannequin

ben-spiller mannequin commented Sep 25, 2019

BPO 38278
Nosy @gvanrossum, @vstinner, @methane, @markshannon, @serhiy-storchaka, @ben-spiller, @tahia-khan, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-09-25.14:35:17.224>
labels = ['interpreter-core', 'type-feature', '3.7']
title = 'Need a more efficient way to perform dict.get(key, default)'
updated_at = <Date 2021-07-14.02:31:58.867>
user = 'https://github.com/ben-spiller'

bugs.python.org fields:

activity = <Date 2021-07-14.02:31:58.867>
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2019-09-25.14:35:17.224>
creator = 'benspiller'
dependencies = []
files = []
hgrepos = []
issue_num = 38278
keywords = []
message_count = 8.0
messages = ['353206', '353207', '353208', '353209', '353210', '353211', '353333', '397450']
nosy_count = 8.0
nosy_names = ['gvanrossum', 'vstinner', 'methane', 'Mark.Shannon', 'serhiy.storchaka', 'benspiller', 'ta1hia', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue38278'
versions = ['Python 3.7']

@ben-spiller
Copy link
Mannequin Author

ben-spiller mannequin commented Sep 25, 2019

In performance-critical python code, it's quite common to need to get an item from a dictionary, falling back on a default (e.g. None, 0 etc) if it doesn't yet exist. The obvious way to do this based on the documentation is to call the dict.get() method:

python -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
5000000 loops, best of 5: 74.6 nsec per loop

... however the performance of natural approach is very poor (2.2 times slower!) compared to the time really needed to look up the value:

python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop

There are various ways to do this more efficiently, but they all have significant performance or functional drawbacks:

custom dict subclass with __missing__() override: promising approach, but use of a custom class instead of dict seems to increase [] cost significantly:

python -m timeit -s "class mydict(dict):" -s " def __missing__(self,key):return None" -s "d = mydict({'abc':123})" "x=d['abc']"
5000000 loops, best of 5: 60.4 nsec per loop

get() with caching of function lookup - somewhat better but not great:

python -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
5000000 loops, best of 5: 59.8 nsec per loop

[] and "in" (inevitably a bit slow due to needing to do the lookup twice):

python -m timeit -s "d={'abc':123}" "x=d['abc'] if 'abc' in d else None"
5000000 loops, best of 5: 53.9 nsec per loop

try/except approach: quickest solution if it exists, but clunky syntax, and VERY slow if it doesn't exist

python -m timeit -s "d={'abc':123}" "try:" " x=d['abc']" "except KeyError: pass"
5000000 loops, best of 5: 41.8 nsec per loop
python -m timeit -s "d={'abc':123}" "try:" " x=d['XXX']" "except KeyError: pass"
1000000 loops, best of 5: 174 nsec per loop

collections.defaultdict: reasonable performance if key exists, but unwanted behaviour of adding the key if missing (which if used with an unbounded universe of keys could produce a memory leak):

python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop

I bet we can do better!

Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

@ben-spiller ben-spiller mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Sep 25, 2019
@vstinner
Copy link
Member

dict.get() is a method call wheras "key in dict" and "dict[key]" are operators. Method calls are still slower than operators, even after past optimizations. For example, when dict.get was converted to METH_FASTCALL, it was around 20 ns faster:
https://vstinner.github.io/fastcall-microbenchmarks.html

See also closed bpo-17170 "string method lookup is too slow".

@vstinner
Copy link
Member

python -m timeit -s "import collections; d=collections.defaultdict(lambda: None); d['abc']=123; " "x=d['XXX']"
5000000 loops, best of 5: 34.3 nsec per loop

This benchmark is not a fair comparison: the 'XXX' key is created at the first access. In short, this benchmark measure the performance of a dict lookup:

python -m timeit -s "d={'abc':123}" "x=d['abc']"
5000000 loops, best of 5: 33 nsec per loop

@vstinner
Copy link
Member

Lots of solutions are possible - maybe some kind of peephole optimization to make dict.get() itself perform similarly to the [] operator, or if that's challenging perhaps providing a class or option that behaves like defaultdict but without the auto-adding behaviour and with comparable [] performance to the "dict" type - for example dict.raiseExceptionOnMissing=False, or perhaps even some kind of new syntax (e.g. dict['key', default=None]). Which option would be easiest/nicest?

This issue doesn't propose any concrete solution, but discuss ideas. I suggest you to open a thread on the python-ideas mailing list instead. I suggest to close this issue.

I bet that defaultdict is *not* faster once you will manage to write a fair micro-benchmark.

@ben-spiller
Copy link
Mannequin Author

ben-spiller mannequin commented Sep 25, 2019

Thanks... yep I realise method calls are slower than operators, am hoping we can still find a cunning way to speed up this use case nonetheless. :D For example by having a configuration option on dict (or making a new subclass) that gives the (speedy!) [] operator the same no-exception semantics you'd get from calling get(). As you can see from my timeit benchmarks none of the current workarounds are very appealing for this use case, and a 2.2x slowdown for this common operation is a shame.

@vstinner
Copy link
Member

I also suggest you to not focus on such micro-benchmarks. It's not relevant to make a whole application faster. Use PyPy if you care about performances. PyPy has a very efficient implementation for dictionary and it's JIT compiler can go way further than CPython. In some cases, PyPy can even replace hash table lookup with array access:
https://twitter.com/cfbolz/status/1175493837516627968

@serhiy-storchaka
Copy link
Member

Was LOAD_METHOD optimized for builtin methods?

@iritkatriel
Copy link
Member

Was LOAD_METHOD optimized for builtin methods?

Maybe this can be done with specialization.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel
Copy link
Member

iritkatriel commented Jun 23, 2022

The differences are much less dramatic now (3.12):

% ./python.exe -m timeit -s "d={'abc':123}" "x=d.get('abc',None)"
10000000 loops, best of 5: 28.7 nsec per loop
% ./python.exe -m timeit -s "d={'abc':123}" "x=d['abc']"
10000000 loops, best of 5: 20.2 nsec per loop
% ./python.exe -m timeit -s "d={'abc':123}; G=d.get" "x=G('abc',None)"
10000000 loops, best of 5: 25.8 nsec per loop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants