Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact and ordered dict #71537

Closed
methane opened this issue Jun 19, 2016 · 36 comments
Closed

Compact and ordered dict #71537

methane opened this issue Jun 19, 2016 · 36 comments
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane
Copy link
Member

methane commented Jun 19, 2016

BPO 27350
Nosy @rhettinger, @jcea, @vstinner, @jab, @methane, @ethanfurman, @serhiy-storchaka, @sdrees
Files
  • compact-dict.patch
  • compact-dict.patch: Use SIZEOF_VOID_P
  • compact-dict.patch
  • compact-dict.patch: Use Py_LOCAL_INLINE
  • compact-dict.patch: empty keys and values arn't special enough to break the rules
  • compact-dict.patch: fix leak
  • compact-dict.patch: Win32 support, fix comments and some refactoring.
  • compact-dict.patch: more nit fix
  • compact-dict.patch: ordered split dict, and bugfix
  • compact-dict.patch: Add comments and tests
  • compact-dict.patch: small fixups, thanks for Jim
  • compact-dict.patch: Added freelist for PyDictKeys object.
  • compact-dict.patch: Fix compile error on Windows
  • compact-dict.patch: Add more comments and NEWS entry
  • compact-dict.patch: Wrap lines longer than 79 characters
  • compact-dict.patch: Update patch for current tip
  • compact-dict.patch: Stop adding PY_INT16_T
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-12.12:49:21.502>
    created_at = <Date 2016-06-19.03:43:15.983>
    labels = ['interpreter-core', 'performance']
    title = 'Compact and ordered dict'
    updated_at = <Date 2017-05-17.17:24:07.031>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-05-17.17:24:07.031>
    actor = 'jcea'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-12.12:49:21.502>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2016-06-19.03:43:15.983>
    creator = 'methane'
    dependencies = []
    files = ['43463', '43465', '43467', '43472', '43480', '43484', '43509', '43520', '43542', '43558', '44086', '44110', '44194', '44259', '44267', '44395', '44418']
    hgrepos = []
    issue_num = 27350
    keywords = ['patch']
    message_count = 36.0
    messages = ['268837', '268842', '268843', '268844', '268846', '269004', '269011', '269012', '269068', '269072', '269367', '269369', '269429', '272098', '272135', '272144', '274703', '275065', '275067', '275071', '275090', '275092', '275099', '275112', '275116', '275118', '275120', '275357', '275378', '275382', '275581', '275587', '275589', '275626', '276030', '276031']
    nosy_count = 9.0
    nosy_names = ['rhettinger', 'jcea', 'vstinner', 'jab', 'methane', 'ethan.furman', 'python-dev', 'serhiy.storchaka', 'dilettant']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue27350'
    versions = ['Python 3.6']

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2016

    I've implemented compact ordered dictionary, introduced in PyPy blog 1.

    To finish my work, I really need core developer's help. Please see TODO comment
    in the patch.

    See also:
    https://mail.python.org/pipermail/python-dev/2016-June/145299.html
    methane#1 (same to attached patch)

    @methane methane added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jun 19, 2016
    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Jun 19, 2016

    How to support sizeof(Py_ssize_t) == 4?

    #if SIZEOF_VOID_P == 4

    Can I use int8_t, int16_t and int32_t

    You can use PY_INT32_T for 32-bit signed integers, short for 16-bit signed integers (if SIZEOF_SHORT == 2) and signed char for 8-bit signed integers. Or introduce PY_INT16_T and PY_INT8_T similarly as it was done for PY_INT32_T.

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2016

    > How to support sizeof(Py_ssize_t) == 4?
    #if SIZEOF_VOID_P == 4

    Thanks!

    > Can I use int8_t, int16_t and int32_t
    You can use PY_INT32_T for 32-bit signed integers, short for 16-bit signed integers (if SIZEOF_SHORT == 2) and signed char for 8-bit signed integers. Or introduce PY_INT16_T and PY_INT8_T similarly as it was done for PY_INT32_T.

    I found PY_INT32_T in pyport.h. It seems only available when stdint.h is available.
    What is the meaning of using PY_INT32_T instead of int32_t ?

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Jun 19, 2016

    When stdint.h is not available you can define PY_INT32_T externally. E.g. CFLAGS="-DPY_INT32_T=__int32".

    @methane
    Copy link
    Member Author

    methane commented Jun 19, 2016

    Make sense! I did it.

    @methane
    Copy link
    Member Author

    methane commented Jun 21, 2016

    As I posted to python-dev ML, this compact ordered dict doesn't keep insertion order
    for key-shared dict.

    >>> class A:
    ...   ...
    ...
    >>> a = A()
    >>> b = A()
    >>> a.a = 1
    >>> a.b = 2
    >>> b.b = 3
    >>> b.a = 4
    >>> a.__dict__.items()
    dict_items([('a', 1), ('b', 2)])
    >>> b.__dict__.items()
    dict_items([('a', 4), ('b', 3)])

    @mdickinson
    Copy link
    Member

    mdickinson commented Jun 21, 2016

    I found PY_INT32_T in pyport.h. It seems only available when stdint.h is available.

    That doesn't sound right. It should be available always.

    @mdickinson
    Copy link
    Member

    mdickinson commented Jun 21, 2016

    That doesn't sound right. It should be available always.

    To elaborate: assuming not Windows, the configure script has two checks: if AC_CHECK_TYPE(int32_t, ...) succeeds (which should happen whenever int32_t is defined in either stdint.h or inttypes.h), it #defines HAVE_INT32_T. If AC_TYPE_INT32_T succeeds (which should happen whenever int32_t is *not* defined in either stdint.h or inttypes.h), it #defines int32_t to be a suitable type. Then the pyport check makes sure than in both circumstances, PY_INT32_T is #defined.

    I don't believe there are any platforms out there that we care about for Python for which both checks fail.

    @methane
    Copy link
    Member Author

    methane commented Jun 22, 2016

    Thank you, mark.

    I've added PY_INT16_T and PY_UINT16_T for windows, too.

    methane@dfaa44c
    methane@af80dc2

    I'll post next patchset soon.

    @methane
    Copy link
    Member Author

    methane commented Jun 22, 2016

    FYI, bench result of USABLE_FRACTION(n) = (n/2):

    Report on Linux ip-10-0-1-249 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) x86_64
    Total CPU cores: 2
    +--------------+---------+------------+--------------+------------------------+
    | Benchmark | default | usable n/2 | Change | Significance |
    +==============+=========+============+==============+========================+
    | 2to3 | 5.18 | 5.17 | 1.00x faster | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | chameleon_v2 | 4.03 | 3.87 | 1.04x faster | Significant (t=61.85) |
    +--------------+---------+------------+--------------+------------------------+
    | django_v3 | 0.4 | 0.39 | 1.01x faster | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | fastpickle | 0.33 | 0.33 | 1.00x slower | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | fastunpickle | 0.41 | 0.42 | 1.04x slower | Significant (t=-56.83) |
    +--------------+---------+------------+--------------+------------------------+
    | json_dump_v2 | 1.93 | 1.94 | 1.01x slower | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | json_load | 0.33 | 0.33 | 1.01x faster | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | nbody | 0.19 | 0.19 | 1.00x slower | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | regex_v8 | 0.04 | 0.04 | 1.00x faster | Not significant |
    +--------------+---------+------------+--------------+------------------------+
    | tornado_http | 0.19 | 0.2 | 1.01x slower | Not significant |
    +--------------+---------+------------+--------------+------------------------+

    I've changed recommended usable fraction from 5/8~2/3 to 1/2~2/3.

    @methane
    Copy link
    Member Author

    methane commented Jun 27, 2016

    Last patch I've posted implements "strict ordering rule" on key sharing dict.

    • Insertion order should be strictly equal to order in shared key.
      If insertion position is not equal to ma_used, convert it to combined
      form.

    • Deleting from split table is prohibited. Convert the table to combined form. (to keep ma_used == next insertion position rule).

    I ran sphinx-build on this patch and master branch.
    ("intern" in the result is incomplete implementation of my new idea.
    Please ignore it in this issue.)

    https://gist.github.com/methane/df89221222cc2474af1fe61a960e100d

    Summary
    -------------
    Speed: No regression from master branch.
    Memory usage: Reduced from 172452k to 160876k

    @methane
    Copy link
    Member Author

    methane commented Jun 27, 2016

    And Python benchmark result is here.
    https://gist.github.com/methane/e7a2ae389ca13905508905f5fa4ad46c

    pickup
    -------

    ### call_method_slots ###
    Min: 0.282221 -> 0.266215: 1.06x faster
    Avg: 0.282379 -> 0.266448: 1.06x faster
    Significant (t=780.35)
    Stddev: 0.00015 -> 0.00032: 2.0993x larger

    ### call_method_unknown ###
    Min: 0.280347 -> 0.273366: 1.03x faster
    Avg: 0.280579 -> 0.273538: 1.03x faster
    Significant (t=800.73)
    Stddev: 0.00011 -> 0.00011: 1.0061x smaller

    ### call_simple ###
    Min: 0.202884 -> 0.208746: 1.03x slower
    Avg: 0.203037 -> 0.208866: 1.03x slower
    Significant (t=-642.05)
    Stddev: 0.00013 -> 0.00009: 1.3236x smaller

    ### chameleon_v2 ###
    Min: 3.715995 -> 3.819750: 1.03x slower
    Avg: 3.738736 -> 3.851562: 1.03x slower
    Significant (t=-62.21)
    Stddev: 0.00851 -> 0.01602: 1.8817x larger

    ### chaos ###
    Min: 0.195156 -> 0.203020: 1.04x slower
    Avg: 0.195803 -> 0.203704: 1.04x slower
    Significant (t=-179.73)
    Stddev: 0.00026 -> 0.00035: 1.3371x larger

    ### django_v3 ###
    Min: 0.369087 -> 0.386535: 1.05x slower
    Avg: 0.370569 -> 0.388277: 1.05x slower
    Significant (t=-184.30)
    Stddev: 0.00064 -> 0.00072: 1.1206x larger

    ### formatted_logging ### [97/1823]
    Min: 0.226584 -> 0.233564: 1.03x slower
    Avg: 0.229062 -> 0.235876: 1.03x slower
    Significant (t=-35.32)
    Stddev: 0.00133 -> 0.00140: 1.0505x larger

    ### normal_startup ###
    Min: 0.277946 -> 0.269843: 1.03x faster
    Avg: 0.279173 -> 0.271878: 1.03x faster
    Significant (t=17.65)
    Stddev: 0.00175 -> 0.00374: 2.1348x larger

    ### raytrace ###
    Min: 0.961784 -> 0.991375: 1.03x slower
    Avg: 0.963318 -> 0.994727: 1.03x slower
    Significant (t=-159.11)
    Stddev: 0.00073 -> 0.00183: 2.5204x larger

    ### regex_compile ###
    Min: 0.256675 -> 0.267169: 1.04x slower
    Avg: 0.257559 -> 0.268213: 1.04x slower
    Significant (t=-136.14)
    Stddev: 0.00050 -> 0.00060: 1.1996x larger

    ### richards ###
    Min: 0.129063 -> 0.134478: 1.04x slower
    Avg: 0.130158 -> 0.135382: 1.04x slower
    Significant (t=-18.28)
    Stddev: 0.00284 -> 0.00036: 7.8299x smaller

    ### startup_nosite ###
    Min: 0.165788 -> 0.159139: 1.04x faster
    Avg: 0.166089 -> 0.159848: 1.04x faster
    Significant (t=219.05)
    Stddev: 0.00021 -> 0.00035: 1.6594x larger

    @methane
    Copy link
    Member Author

    methane commented Jun 28, 2016

    Anyone can review this by Python 3.6a3 ?

    @methane
    Copy link
    Member Author

    methane commented Aug 6, 2016

    ping

    @rhettinger
    Copy link
    Contributor

    rhettinger commented Aug 7, 2016

    You make need to ask for review help on python-dev. Patches this big take a lot of time and energy to review. Making pings into the blind doesn't help much (most of the patch reviewers are highly time constrained).

    @methane
    Copy link
    Member Author

    methane commented Aug 8, 2016

    I see, and I'm sorry about that.

    @methane
    Copy link
    Member Author

    methane commented Sep 7, 2016

    Update the patch to use standard int types instead of adding PY_INT16_T
    ref: https://bugs.python.org/issue17884

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset 0bd618fe0639 by Victor Stinner in branch 'default':
    Implement compact dict
    https://hg.python.org/cpython/rev/0bd618fe0639

    New changeset 66a539984c9c by Victor Stinner in branch 'default':
    Add Py_MEMBER_SIZE macro
    https://hg.python.org/cpython/rev/66a539984c9c

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2016

    Hi INADA Naoki, we discuss a lot about your compact dict change at the current Python core dev sprint. We *all* want your change in Python 3.6!

    I looked at the code and I found some minor issues, but sadly I'm very busy right now with soooo many other topics. I proposed to just push your change *right now* (before Python 3.6 feature freeze, scheduled for this week-end with the first beta) but "polish" the code later. I made a tiny change to not hardcode PyDictKeysObject.dk_indices size (8).

    I plan to write more documentations about the implementation and add some assertions, because right now it's quite hard to see the overall picture and understand details. I already started to write some doc locally.

    I discussed with Eric Snow about the PEP-468 (preserve keyword ordering). He asked me the remove the sentence from the change, because he is going to rephrase his PEP to replace "OrderedDict" with "ordered mapping" (don't be specific on the type). But technically it's true that compact dict preserves the keyword order, so implement the PEP :-)

    I pushed the latest patch using intXX_t types.

    I made another change: I skipped test_gdb which failed because python-gdb.py wasn't updated for the new dict: I opened the issue bpo-28023 to try to not forget to do it. I already worked on this file, so I can try to update the gdb plugin.

    I also added "_PyDict_Dummy() function has been removed." to the commit message ;-)

    Congrats Naoki and thanks!

    --

    TODO: The compact dict is not documented in What's New in Python 3.6. It would be nice to run some benchmarks and compare the memory usage to give some exciting numbers.

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 8, 2016

    Did you test the patch with your superstable benchmarking tools Victor?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset 36545fa93d2b by Victor Stinner in branch 'default':
    Add assertions to dk_set_index()
    https://hg.python.org/cpython/rev/36545fa93d2b

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2016

    Serhiy Storchaka added the comment:

    Did you test the patch with your superstable benchmarking tools Victor?

    Not yet, as I wrote, we should benchmark the code (CPU and memory) to
    document the code:

    See my long comment for more details ;-)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset 19902d840448 by Victor Stinner in branch 'default':
    Split lookdict_unicode_nodummy() assertion to debug
    https://hg.python.org/cpython/rev/19902d840448

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset 9434f511937d by Raymond Hettinger in branch 'default':
    Issue bpo-27350: Add credits
    https://hg.python.org/cpython/rev/9434f511937d

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset 48a1f97d03b4 by Victor Stinner in branch 'default':
    dk_get_index/dk_set_index uses a type indices variable
    https://hg.python.org/cpython/rev/48a1f97d03b4

    New changeset fc8aaa073eb4 by Victor Stinner in branch 'default':
    Reindeint DK_xxx macros
    https://hg.python.org/cpython/rev/fc8aaa073eb4

    New changeset 378e000a6878 by Victor Stinner in branch 'default':
    Add documentation to the dict implementation
    https://hg.python.org/cpython/rev/378e000a6878

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 8, 2016

    Isn't this premature to commit changes to critical part of Python core without having reliable benchmark results?

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2016

    For an unknown reason, the code fails on a few buildbots, but not all of them.

    • PPC64 Fedora 3.x: GNU/Linux ppc64 POWER7 Fedora 20 GCC Compile Farm host gcc110.fsffrance.org
    • PPC64LE Fedora 3.x: GNU/Linux ppc64le POWER8 Fedora 21 GCC Compile Farm host gcc112.fsffrance.org
    • s390x Debian 3.x: GNU/Linux s390x System Z Debian Jessie
    • s390x RHEL 3.x: GNU/Linux s390x System Z RHEL 7.1
    • s390x SLES 3.x: GNU/Linux s390x System Z SLES 12

    These buildbot slaves use non-Intel CPU: POWER7, POWER8, s390x System Z (I don't know this platform at all!).

    gcc -pthread -o Programs/_freeze_importlib Programs/_freeze_importlib.o (...) -lpthread -ldl -lutil -lm
    ./Programs/_freeze_importlib \
    ./Lib/importlib/_bootstrap.py Python/importlib.h
    _freeze_importlib: Objects/dictobject.c:805: lookdict_unicode_nodummy: Assertion `ep->me_key != ((void *)0)' failed.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 9, 2016

    benjamin.peterson set status: open -> closed

    Please keep it open for the remaining issue:

    "TODO: The compact dict is not documented in What's New in Python 3.6. It would be nice to run some benchmarks and compare the memory usage to give some exciting numbers."

    @vstinner vstinner reopened this Sep 9, 2016
    @ethanfurman
    Copy link
    Member

    ethanfurman commented Sep 9, 2016

    What affect will this have with hash randomization?

    @vstinner
    Copy link
    Member

    vstinner commented Sep 9, 2016

    What affect will this have with hash randomization?

    Hash randomization is still used and useful to avoid the hash DoS.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 10, 2016

    It seems like the memory usage is between 20% and 25% smaller. Great job!

    Memory usage, Python 3.5 => Python 3.6 on Linux x86_64:

    ./python -c 'import sys; print(sys.getsizeof({str(i):i for i in range(10)}))'

    • 10 items: 480 B => 384 B (-20%)
    • 100 items: 6240 B => 4720 B (-24%)
    • 1000 items: 49248 B => 36984 B (-25%)

    Note: the size is the the size of the container itself, not of keys nor values.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 10, 2016

    As I expected, a dictionary lookup is a _little bit_ slower (3%) between Python 3.5 and Python 3.6:

    $ ./python -m perf timeit -s 'd={str(i):i for i in range(100)}' 'd["10"]; d["20"]; d["30"]; d["40"]; d["50"]; d["10"]; d["20"]; d["30"]; d["40"]; d["50"]' --rigorous

    Median +- std dev: [lookup35] 309 ns +- 10 ns -> [lookup36] 320 ns +- 8 ns: 1.03x slower

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 10, 2016

    There are a lot of other changes in interpreter core between 3.5 and 3.5 (such as new bytecode and optimized function calls). Could you compare the performance between the version just before adding new dict implementation and the version just after this?

    @methane
    Copy link
    Member Author

    methane commented Sep 10, 2016

    3% slowdown in microbench is not surprising.
    Compact dict introduces one additional indirection.

    Instead, I've added freelist for most compact PyDictKeys.
    So I think overall performance is almost same to before compact dict.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 12, 2016

    New changeset 0773e5cb8608 by Victor Stinner in branch 'default':
    Issue bpo-27350: Document compact dict memory usage
    https://hg.python.org/cpython/rev/0773e5cb8608

    @vstinner
    Copy link
    Member

    vstinner commented Sep 12, 2016

    Serhiy: "Could you compare the performance between the version just before adding new dict implementation and the version just after this?"

    I'm running benchmarks, I will keep you in touch :-)

    Since the main issue (implement compact dict) is solved, I close the issue. Please open new specific issue if you want to enhance the implementation, fix bugs, etc.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants