gh-150815: Speed up copy.deepcopy() of containers with atomic elements #150822
gaborbernat wants to merge 2 commits into
…lements

The dict, list and tuple deep-copiers send every element back through deepcopy(), paying a function call even for atomic immutable elements that deepcopy() returns unchanged. Inline the atomic-type check into the three copiers so those elements are returned as-is. Behavior is identical, including shared references, recursion and int/tuple subclasses.
aeab6f7 to 8b8b7e8
- def _deepcopy_list(x, memo, deepcopy=deepcopy):
+ def _deepcopy_list(x, memo, deepcopy=deepcopy, _atomic=_atomic_types):
Is this trick of capturing the global in a local still a net positive in newer Python?
The benchmark numbers I provided were measured on a build of the main branch. I haven't tried enabling the JIT or any other advanced features, but out of the box there is a significant benefit here.
But did those gains come from using a local or only from doing the type(...) in check?
I think we should let the performance people do their thing and not try to artificially speed things up like this.
There are more options available to make deepcopy faster (see #91610 (comment)). If we want to make deepcopy faster, I believe we should gather enough support so that a core dev can review a C (or Rust?) implementation. A C implementation would give much larger performance gains (see https://github.com/percolab/copium for example).
The benchmark shows the speedup comes entirely from the inlined type(...) in _atomic_types check, not from binding the global to a local default argument, so drop the local capture for a minimal change.
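For readers unfamiliar with the trick under discussion, here is a minimal sketch of the two lookup styles, assuming a simplified stand-in set. ATOMIC_TYPES, count_atomic_global, count_atomic_local and compare are hypothetical names for illustration, not code from the PR, and this does not reproduce the PR's 3-way benchmark. Timings vary by interpreter version and build.

```python
import timeit

# Hypothetical stand-in for the atomic-type set; the real one lives in copy.py.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, str})


def count_atomic_global(items):
    """Look ATOMIC_TYPES up as a module global on every iteration."""
    n = 0
    for item in items:
        if type(item) in ATOMIC_TYPES:
            n += 1
    return n


def count_atomic_local(items, _atomic=ATOMIC_TYPES):
    """Bind the same set once as a default argument, so it is a local."""
    n = 0
    for item in items:
        if type(item) in _atomic:
            n += 1
    return n


def compare(items, number=200):
    """Return (global_seconds, local_seconds) for `number` calls of each."""
    t_global = timeit.timeit(lambda: count_atomic_global(items), number=number)
    t_local = timeit.timeit(lambda: count_atomic_local(items), number=number)
    return t_global, t_local


if __name__ == "__main__":
    data = ["a", 1, None, [], {}] * 1000
    print(compare(data))
```

Both functions perform the same membership test; only the way the set is looked up differs, which is the variable the reviewers asked to isolate.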
I don't think a pure-Python tweak and a C/Rust … I've also reduced this PR to the minimal change: a 3-way benchmark (base / inline-check-with-local-capture / inline-check-using-the-global) showed the entire speedup comes from the inlined …
copy.deepcopy() copies a structure by sending every element back through deepcopy(). For elements that need no copying at all (strings, ints, None, booleans, floats and the other immutable atomic types), that round trip costs a full function call each, even though the value handed back is the same object. Real data is dominated by these atomic leaves: a parsed JSON document, a settings dict cloned before mutation, a record copied inside a framework. The keys are strings and most values are strings and numbers, so copying spends most of its time calling deepcopy() only to get the same object straight back.

This folds the atomic-type check that already gates the top of deepcopy() into the dict, list and tuple copiers, so an atomic element is returned as-is without the per-item call. The check is the same one deepcopy() runs, and atomic objects are not memoized either way, so the result is identical for shared references, recursive structures and int/tuple subclasses.

Deep-copying 105 JSON documents drawn from the top-1000 PyPI projects improves from 1.21 ms to 990 µs, 22% faster. This follows the atomic fast path added in gh-114264, extending it from the entry point to the per-element loop.
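The idea can be sketched for the list case as follows. This is an illustrative approximation, not the patch itself: ATOMIC_TYPES is a hypothetical, shortened stand-in for the set that copy.py checks, and deepcopy_list_inlined is a made-up name that delegates non-atomic elements to the public copy.deepcopy().

```python
import copy

# Assumption: a shortened stand-in for copy.py's internal atomic-type set.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, complex, bytes, str})


def deepcopy_list_inlined(x, memo):
    """Deep-copy a list, skipping the deepcopy() call for atomic elements."""
    y = []
    memo[id(x)] = y  # register early so self-referencing lists resolve to y
    append = y.append
    for item in x:
        if type(item) in ATOMIC_TYPES:
            append(item)  # exact atomic type: reuse the same object
        else:
            # Subclasses and containers take the normal path, sharing the memo.
            append(copy.deepcopy(item, memo))
    return y


if __name__ == "__main__":
    shared = [1, 2]
    source = ["text", 3.5, None, shared, shared]
    result = deepcopy_list_inlined(source, {})
    print(result == source, result[3] is result[4], result[3] is shared)
```

Because the test uses type(item) rather than isinstance(), an int or str subclass still goes through deepcopy(), which is why subclass behavior is unchanged.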
Benchmark (pyperf)
Run base vs patched by swapping Lib/copy.py on the same interpreter. The figure above is from 105 JSON documents in the top-1000 PyPI corpus; the self-contained script below builds an equivalent atomic-heavy structure and shows a comparable percentage gain.

Resolves #150815.
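A rough stand-in for this kind of measurement, using the standard-library timeit module instead of pyperf. It is not the author's script: build_document and time_deepcopy are hypothetical names, and the data is made up to resemble an atomic-heavy JSON document. Run it once against each version of Lib/copy.py and compare the two numbers.

```python
import copy
import timeit


def build_document(records=200):
    """Build a JSON-like structure dominated by atomic leaves (made-up data)."""
    return {
        "meta": {"version": 1, "source": "example", "ok": True, "note": None},
        "items": [
            {"id": i, "name": f"item-{i}", "score": i * 0.5, "tags": ["a", "b", "c"]}
            for i in range(records)
        ],
    }


def time_deepcopy(doc, number=50, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    runs = timeit.repeat(lambda: copy.deepcopy(doc), number=number, repeat=repeat)
    return min(runs) / number


if __name__ == "__main__":
    doc = build_document()
    print(f"deepcopy: {time_deepcopy(doc) * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes, but pyperf remains the better tool when the difference between builds is small.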