dict: Use smaller entry for Unicode-key only dict. #91001
Currently, PyDictKeyEntry is 24 bytes (hash, key, and value). We can drop the hash from the entry when all keys are Unicode, because Unicode objects already cache their hash. This will cause some performance regression in microbenchmarks, because the dict needs one more indirect access to compare hash values. On the other hand, it reduces RAM usage: on 64-bit builds each entry shrinks from 24 to 16 bytes. Additionally, unlike docstrings and annotations, much of this memory is **hot** RAM, so the change should make Python more cache efficient. Work-in-progress code: methane#43
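As a rough illustration of the size claim, the two entry layouts can be mirrored with ctypes structures. This is a sketch, not the C definitions: the field names follow CPython's PyDictKeyEntry (me_hash, me_key, me_value), while the class names GeneralEntry and UnicodeEntry and the helper entries_bytes are hypothetical. The printed numbers assume a 64-bit build, where Py_ssize_t and pointers are 8 bytes.

```python
import ctypes

# Mirror of the general entry: cached hash + key pointer + value pointer.
class GeneralEntry(ctypes.Structure):
    _fields_ = [
        ("me_hash", ctypes.c_ssize_t),   # cached hash of the key
        ("me_key", ctypes.c_void_p),     # stands in for PyObject *key
        ("me_value", ctypes.c_void_p),   # stands in for PyObject *value
    ]

# Hypothetical Unicode-only entry: no hash field, because str objects
# already cache their own hash.
class UnicodeEntry(ctypes.Structure):
    _fields_ = [
        ("me_key", ctypes.c_void_p),
        ("me_value", ctypes.c_void_p),
    ]

def entries_bytes(n_entries, entry_type):
    """Bytes used by the entries array alone (the index table is ignored)."""
    return n_entries * ctypes.sizeof(entry_type)

print(ctypes.sizeof(GeneralEntry), ctypes.sizeof(UnicodeEntry))  # 24 16 on 64-bit
saved = entries_bytes(1000, GeneralEntry) - entries_bytes(1000, UnicodeEntry)
print(saved)  # 8000 bytes per 1000 entries on 64-bit
```

The model only shows where the per-entry saving comes from. It ignores the separate index table and the extra load of the hash from the str object itself, which is the indirection behind the microbenchmark regression mentioned above.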
CPython, at least, allows users to insert non-string keys into namespace dicts that are conceptually string-key only.

```pycon
>>> globals()[0] = 'zero'
>>> globals()[0]
'zero'
>>> vars()
{'__name__': '__main__', ..., 0: 'zero'}
```

This is for consenting adults only, as it prevents sorting the keys and applying string-only operations to them:

```pycon
>>> dir()
...
TypeError: '<' not supported between instances of 'int' and 'str'
```
Do you propose to:
1. Only use StringKeyDicts when non-string keys are not possible? (Where would this be?)
2. Switch to a normal dict when a non-string key is added? (But likely not switch back when the last non-string key is removed.)
3. Deprecate and remove the option to add non-string keys to namespace dicts? (Proposed and rejected at least once as not gaining much.)
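Option 2 can be modelled in a few lines of Python. This is a toy sketch, not CPython's implementation: the class KindTrackingDict and its kind attribute are hypothetical names. The dict starts in a "unicode" mode, falls back permanently to a "general" mode on the first key that is not an exact str, and does not switch back when that key is deleted.

```python
class KindTrackingDict:
    """Toy model: Unicode-only layout until a non-str key is inserted,
    then a one-way conversion to the general layout."""

    def __init__(self):
        self.kind = "unicode"   # hypothetical label for the hash-less layout
        self._data = {}

    def __setitem__(self, key, value):
        # Only exact str keys are modelled as eligible, since str
        # subclasses may override __hash__ and __eq__.
        if self.kind == "unicode" and type(key) is not str:
            # A real implementation would rebuild the entries here,
            # storing each key's hash; the model only flips the label.
            self.kind = "general"
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        # Deliberately no switch back to "unicode" (as in option 2).
        del self._data[key]

ns = KindTrackingDict()
ns["x"] = 1
print(ns.kind)   # unicode
ns[0] = "zero"
print(ns.kind)   # general
del ns[0]
print(ns.kind)   # general
```

Never switching back keeps deletion cheap: otherwise every removal would need to rescan the remaining keys to learn whether they are all strings again.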
In most cases, the first PyDict_SetItem() call decides which format is used. But _PyDict_NewPresized() can be a problem: it creates the hash table before the first key is inserted, when 5 < (expected size) < 87382. In the CPython code base, _PyDict_NewPresized() is called from three places:
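The difference between lazy allocation and presizing can be illustrated with a small model that counts layout conversions. It assumes, as described above, that a lazily created dict picks its layout from the first inserted key, while a presized table must commit to a layout before any key is known. The function build and its presized_kind parameter are hypothetical.

```python
def build(keys, presized_kind=None):
    """Return (final_kind, conversions) after inserting `keys` in order.

    presized_kind=None models a lazily allocated dict whose first key
    decides the layout; a string models a table allocated up front with
    a guessed layout, as a presized constructor has to do.
    """
    kind = presized_kind
    conversions = 0
    for key in keys:
        is_str = type(key) is str
        if kind is None:
            # Lazy case: the first key picks the layout for free.
            kind = "unicode" if is_str else "general"
        elif kind == "unicode" and not is_str:
            # The already-allocated table must be rebuilt.
            kind = "general"
            conversions += 1
    return kind, conversions

print(build([1, 2, 3]))                          # ('general', 0)
print(build([1, 2, 3], presized_kind="unicode"))  # ('general', 1)
```

With a Unicode-only guess, building a presized dict from non-Unicode keys pays for one extra conversion that the lazy path avoids.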
The current pull request assumes that a presized dict will have Unicode-only keys, so building a dict from non-Unicode keys becomes slower.
There are several approaches to fixing this problem:
I think this performance regression is acceptable.
```c
// Create a new dict from keys and values.
```
I added _PyDict_FromItems() to the PR.
The overhead of checking the key types is not large.
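When all keys are available up front, as they are when building a dict from keys and values, the layout can be chosen by scanning the keys before allocating. The Python sketch below shows that idea; from_items is a hypothetical stand-in and does not reproduce the C signature of _PyDict_FromItems(). The check is a single pass of exact type tests that stops at the first non-str key.

```python
def from_items(keys, values):
    """Build a dict from parallel sequences, choosing the layout up front."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # One pass of cheap exact type checks; all() stops at the first miss.
    kind = "unicode" if all(type(k) is str for k in keys) else "general"
    # The model stores items in a regular dict and only reports the
    # layout a real implementation would presize.
    return kind, dict(zip(keys, values))

kind, d = from_items(("a", "b"), (1, 2))
print(kind, d)   # unicode {'a': 1, 'b': 2}
kind, d = from_items(("a", 1), (1, 2))
print(kind)      # general
```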
Note: this issue was migrated from bugs.python.org; its metadata reflects the state of the issue at the time it was migrated and might not reflect the current state.