Speed-up dict.copy() up to 5.5 times. #75362
Comments
It's possible to significantly improve the performance of shallow dict copy. Currently, PyDict_Copy creates a new empty dict object and then inserts key/values into it one by one. My idea is to simply memcpy the whole keys/items region and do the necessary increfs after it. This works just fine for non-key-sharing dicts.
With the following simple microbenchmark:
import time

N = 1000000

for size in [0, 1, 10, 20, 50, 100, 500, 1000]:
    d = dict([(str(i), i) for i in range(size)])

    # Time N shallow copies of a dict with `size` entries.
    t = time.monotonic()
    for i in range(N):
        d.copy()
    e = time.monotonic() - t

    # Report the elapsed time for this size.
    print(f'size={size}: {N} copies in {e:.4f}s')
Output for 3.7 master:
Output for patched 3.7:
|
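For readers who want a rough feel for the difference described above, here is a minimal Python-level sketch. It compares d.copy() and dict(d) with a hand-written loop that inserts items one by one. The helper name copy_one_by_one is made up for this illustration, and the loop is only a loose analogue of what the C code does, so the numbers say nothing precise about the patch itself.

```python
import timeit


def copy_one_by_one(d):
    """Hypothetical helper: copy a dict by inserting each item in turn."""
    new = {}
    for key, value in d.items():
        new[key] = value
    return new


d = {str(i): i for i in range(100)}

candidates = [
    ("d.copy()", d.copy),
    ("dict(d)", lambda: dict(d)),
    ("one-by-one loop", lambda: copy_one_by_one(d)),
]

for name, fn in candidates:
    # Absolute timings depend on the machine and the Python version.
    elapsed = timeit.timeit(fn, number=20_000)
    print(f"{name:16s} {elapsed:.4f}s")
```

All three produce an equal but distinct dict; only the cost of getting there differs.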
Why not create a "preallocated" dict in that case, using _PyDict_NewPresized()? |
I don't think it's related to the proposed patch. Please take a look at the PR. |
I like the idea. Slightly off topic: copy-on-write can be implemented via dk_refcnt. |
The PR adds over 50 lines of code to optimise a feature that is not used very often. There are two obvious ways of copying, dict(d) and d.copy(); the PR optimises only the latter, and I'm not sure it is the more widely used one. The PR duplicates low-level code, which increases the maintenance cost.
The PR changes the behavior. Currently, copying has the effect of compacting the dict.
>>> import sys
>>> d = dict.fromkeys(range(2000))
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> sys.getsizeof(dict(d))
41020
>>> for i in range(1000): del d[i]
...
>>> sys.getsizeof(dict(d))
20544
>>> sys.getsizeof(d.copy())
20544
>>> sys.getsizeof(d)
41020
>>> import sys
>>> d = dict.fromkeys(range(2000))
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> d = dict.fromkeys(range(2000))
>>> for i in range(1999): del d[i]
...
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
136
The PR preserves the non-compact layout in the copy. |
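The sessions above can be reproduced with a small script. This is a sketch only: size_report is a name invented here, and the byte counts reported by sys.getsizeof differ between Python versions and platforms, so the script prints them rather than assuming the figures quoted in the comment.

```python
import sys


def size_report(n=2000, keep=1):
    """Hypothetical helper: sizes of a dict before and after deletions, and of its copies."""
    d = dict.fromkeys(range(n))
    full = sys.getsizeof(d)
    # Delete all but the last `keep` keys; the dict object itself is reused.
    for i in range(n - keep):
        del d[i]
    return {
        "full": full,
        "after_del": sys.getsizeof(d),
        "d.copy()": sys.getsizeof(d.copy()),
        "dict(d)": sys.getsizeof(dict(d)),
        "items_left": len(d),
    }


report = size_report()
for label, value in report.items():
    print(f"{label:10s} {value}")
```

If the copy comes out much smaller than the emptied original, the copy was compacted; if the two sizes match, the sparse layout was carried over.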
Why doesn't "del" compact the dict? |
I've added this check. See the updated PR.
The check that INADA suggested enables compacting on copy, if it is needed.
I started to look into the problem because I need this for my upcoming PEP, so please don't dismiss this idea right away. I also think that copying a dict isn't a "not very often used feature"; it depends on your frame of reference. In some applications you do copy dicts a lot. 50 lines of code speeding up one of the core methods 5.5x is a fair price to pay.
That can also be easily optimized, btw. I'll see if I can do that without impacting the performance of creating new dicts.
FWIW, the PR doesn't duplicate any of the code. It provides a new implementation that is more efficient than the old approach. |
This is a good question, btw. |
The side effect of this patch is making dict.copy() atomic. This would be a worthwhile feature if it were extended to the dict constructor. For now the only way of making an atomic (or almost atomic) copy of a dict is dict(list(d.items())). It isn't very time and memory efficient. If you make dict copying remove holes and extend your patch to the dict constructor, it could be more useful. Look at the set implementation. It doesn't just use memcpy; it contains a specialized insertion implementation for the case where all items are unique. Fast copying is more important for sets than for dicts, since copying is more common for sets: it is a part of set operations, and it is common to convert a set to a frozenset. |
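The dict(list(d.items())) idiom mentioned above can be wrapped in a small helper. This is a sketch: the name snapshot is invented here, and the code makes no claim about how atomic the copy really is on a given interpreter. It only shows the shape of the idiom and that the result is independent of the source dict.

```python
def snapshot(d):
    """Hypothetical helper: copy a dict via an intermediate list of its items."""
    # list(d.items()) materialises all (key, value) pairs first;
    # the new dict is then built from that private list.
    return dict(list(d.items()))


config = {"host": "localhost", "port": 8080}
frozen = snapshot(config)

# Later changes to the original do not affect the snapshot.
config["port"] = 9090
config["debug"] = True
print(frozen)
```

The intermediate list is what makes this approach cost extra time and memory, as the comment points out.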
I've pushed a new version of the patch that I intend to merge tomorrow. The last version has only one minor change: it uses the fast path for "slightly" non-compact dicts too (dicts with *at most* 3 unused entries). This protects us from pathological cases where a huge dict is almost emptied with pop/del and then gets copied -- we indeed want the copy to be compact. Although I believe that the real issue is that del and pop don't compact dicts from time to time, I don't want that issue to hold off this patch in any way.
It's already useful -- I'm supporting a large code base (>0.5M LOC) which uses dict.copy() extensively, and it shows up in profiles. I've seen it in many other places (particularly ORMs love to store information in dicts and use dict.copy() to track dirty state/changes). Please don't say that dict.copy() is not a common operation or that dict(other_dict) is more common than other_dict.copy() -- that's simply incorrect.
I agree. I'll work on that later in a follow-up PR. Let's move in small steps. |
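To illustrate the ORM-style usage mentioned above, here is a minimal sketch of dirty-state tracking built on dict.copy(). TrackedRecord and its methods are hypothetical names, not taken from any real ORM; the point is only that a snapshot is taken with dict.copy() each time the record is marked clean.

```python
class TrackedRecord:
    """Hypothetical record that tracks which fields changed since the last save."""

    _MISSING = object()  # sentinel for "field absent"

    def __init__(self, **fields):
        self._data = dict(fields)
        self._clean = self._data.copy()  # snapshot of the last saved state

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def dirty_fields(self):
        """Return the set of field names that differ from the snapshot."""
        keys = self._data.keys() | self._clean.keys()
        return {
            key
            for key in keys
            if self._data.get(key, self._MISSING)
            != self._clean.get(key, self._MISSING)
        }

    def mark_clean(self):
        """Take a fresh snapshot, e.g. after a successful save."""
        self._clean = self._data.copy()


record = TrackedRecord(name="alice", age=30)
record["age"] = 31
record["email"] = "alice@example.com"
changed = record.dirty_fields()
record.mark_clean()
after_save = record.dirty_fields()
```

In a pattern like this, every load and every save of a record performs one shallow copy, which is why dict.copy() can show up in profiles of such code.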
Yury: Would you mind opening an issue to investigate why dicts are not compacted automatically? |
Victor: done; https://bugs.python.org/issue32623 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.