-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compact and ordered dict #71537
Comments
I've implemented compact ordered dictionary, introduced in PyPy blog 1. To finish my work, I really need core developer's help. Please see TODO comment See also: |
#if SIZEOF_VOID_P == 4
You can use PY_INT32_T for 32-bit signed integers, short for 16-bit signed integers (if SIZEOF_SHORT == 2) and signed char for 8-bit signed integers. Or introduce PY_INT16_T and PY_INT8_T similarly as it was done for PY_INT32_T. |
Thanks!
I found PY_INT32_T in pyport.h. It seems only available when stdint.h is available. |
When stdint.h is not available you can define PY_INT32_T externally. E.g. CFLAGS="-DPY_INT32_T=__int32". |
Make sense! I did it. |
As I posted to python-dev ML, this compact ordered dict doesn't keep insertion order >>> class A:
... ...
...
>>> a = A()
>>> b = A()
>>> a.a = 1
>>> a.b = 2
>>> b.b = 3
>>> b.a = 4
>>> a.__dict__.items()
dict_items([('a', 1), ('b', 2)])
>>> b.__dict__.items()
dict_items([('a', 4), ('b', 3)]) |
That doesn't sound right. It should be available always. |
To elaborate: assuming not Windows, the configure script has two checks: if AC_CHECK_TYPE(int32_t, ...) succeeds (which should happen whenever int32_t is defined in either stdint.h or inttypes.h), it #defines HAVE_INT32_T. If AC_TYPE_INT32_T succeeds (which should happen whenever int32_t is *not* defined in either stdint.h or inttypes.h), it #defines int32_t to be a suitable type. Then the pyport check makes sure than in both circumstances, PY_INT32_T is #defined. I don't believe there are any platforms out there that we care about for Python for which both checks fail. |
Thank you, mark. I've added PY_INT16_T and PY_UINT16_T for windows, too. methane@dfaa44c I'll post next patchset soon. |
FYI, bench result of USABLE_FRACTION(n) = (n/2): Report on Linux ip-10-0-1-249 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) x86_64 I've changed recommended usable fraction from 5/8~2/3 to 1/2~2/3. |
Last patch I've posted implements "strict ordering rule" on key sharing dict.
I ran sphinx-build on this patch and master branch. https://gist.github.com/methane/df89221222cc2474af1fe61a960e100d Summary |
And Python benchmark result is here. pickup ### call_method_slots ### ### call_method_unknown ### ### call_simple ### ### chameleon_v2 ### ### chaos ### ### django_v3 ### ### formatted_logging ### [97/1823] ### normal_startup ### ### raytrace ### ### regex_compile ### ### richards ### ### startup_nosite ### |
Anyone can review this by Python 3.6a3 ? |
ping |
You make need to ask for review help on python-dev. Patches this big take a lot of time and energy to review. Making pings into the blind doesn't help much (most of the patch reviewers are highly time constrained). |
I see, and I'm sorry about that. |
Update the patch to use standard int types instead of adding PY_INT16_T |
New changeset 0bd618fe0639 by Victor Stinner in branch 'default': New changeset 66a539984c9c by Victor Stinner in branch 'default': |
Hi INADA Naoki, we discuss a lot about your compact dict change at the current Python core dev sprint. We *all* want your change in Python 3.6! I looked at the code and I found some minor issues, but sadly I'm very busy right now with soooo many other topics. I proposed to just push your change *right now* (before Python 3.6 feature freeze, scheduled for this week-end with the first beta) but "polish" the code later. I made a tiny change to not hardcode PyDictKeysObject.dk_indices size (8). I plan to write more documentations about the implementation and add some assertions, because right now it's quite hard to see the overall picture and understand details. I already started to write some doc locally. I discussed with Eric Snow about the PEP-468 (preserve keyword ordering). He asked me the remove the sentence from the change, because he is going to rephrase his PEP to replace "OrderedDict" with "ordered mapping" (don't be specific on the type). But technically it's true that compact dict preserves the keyword order, so implement the PEP :-) I pushed the latest patch using intXX_t types. I made another change: I skipped test_gdb which failed because python-gdb.py wasn't updated for the new dict: I opened the issue bpo-28023 to try to not forget to do it. I already worked on this file, so I can try to update the gdb plugin. I also added "_PyDict_Dummy() function has been removed." to the commit message ;-) Congrats Naoki and thanks! -- TODO: The compact dict is not documented in What's New in Python 3.6. It would be nice to run some benchmarks and compare the memory usage to give some exciting numbers. |
Did you test the patch with your superstable benchmarking tools Victor? |
New changeset 36545fa93d2b by Victor Stinner in branch 'default': |
Serhiy Storchaka added the comment:
Not yet, as I wrote, we should benchmark the code (CPU and memory) to See my long comment for more details ;-) |
New changeset 19902d840448 by Victor Stinner in branch 'default': |
New changeset 9434f511937d by Raymond Hettinger in branch 'default': |
New changeset 48a1f97d03b4 by Victor Stinner in branch 'default': New changeset fc8aaa073eb4 by Victor Stinner in branch 'default': New changeset 378e000a6878 by Victor Stinner in branch 'default': |
Isn't this premature to commit changes to critical part of Python core without having reliable benchmark results? |
For an unknown reason, the code fails on a few buildbots, but not all of them.
These buildbot slaves use non-Intel CPU: POWER7, POWER8, s390x System Z (I don't know this platform at all!). gcc -pthread -o Programs/_freeze_importlib Programs/_freeze_importlib.o (...) -lpthread -ldl -lutil -lm |
Please keep it open for the remaining issue: "TODO: The compact dict is not documented in What's New in Python 3.6. It would be nice to run some benchmarks and compare the memory usage to give some exciting numbers." |
What affect will this have with hash randomization? |
Hash randomization is still used and useful to avoid the hash DoS. |
It seems like the memory usage is between 20% and 25% smaller. Great job! Memory usage, Python 3.5 => Python 3.6 on Linux x86_64: ./python -c 'import sys; print(sys.getsizeof({str(i):i for i in range(10)}))'
Note: the size is the the size of the container itself, not of keys nor values. |
As I expected, a dictionary lookup is a _little bit_ slower (3%) between Python 3.5 and Python 3.6: $ ./python -m perf timeit -s 'd={str(i):i for i in range(100)}' 'd["10"]; d["20"]; d["30"]; d["40"]; d["50"]; d["10"]; d["20"]; d["30"]; d["40"]; d["50"]' --rigorous Median +- std dev: [lookup35] 309 ns +- 10 ns -> [lookup36] 320 ns +- 8 ns: 1.03x slower |
There are a lot of other changes in interpreter core between 3.5 and 3.5 (such as new bytecode and optimized function calls). Could you compare the performance between the version just before adding new dict implementation and the version just after this? |
3% slowdown in microbench is not surprising. Instead, I've added freelist for most compact PyDictKeys. |
New changeset 0773e5cb8608 by Victor Stinner in branch 'default': |
Serhiy: "Could you compare the performance between the version just before adding new dict implementation and the version just after this?" I'm running benchmarks, I will keep you in touch :-) Since the main issue (implement compact dict) is solved, I close the issue. Please open new specific issue if you want to enhance the implementation, fix bugs, etc. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: