gh-150815: Speed up copy.deepcopy() of containers with atomic elements #150822

Open
gaborbernat wants to merge 2 commits into python:main from gaborbernat:opt/deepcopy-inline-atomic

Conversation

Contributor

@gaborbernat gaborbernat commented Jun 2, 2026

copy.deepcopy() copies a structure by sending every element back through deepcopy(). For elements that need no copying at all — strings, ints, None, booleans, floats and the other immutable atomic types — that round trip costs a full function call each, even though the value handed back is the same object. Real data is dominated by these atomic leaves: a parsed JSON document, a settings dict cloned before mutation, a record copied inside a framework. The keys are strings and most values are strings and numbers, so copying spends most of its time calling deepcopy() only to get the same object straight back.

This folds the atomic-type check that already gates the top of deepcopy() into the dict, list and tuple copiers, so an atomic element is returned as-is without the per-item call. The check is the same one deepcopy() runs, and atomic objects are not memoized either way, so the result is identical for shared references, recursive structures and int/tuple subclasses.
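
As a rough illustration of the idea described above, the sketch below inlines an atomic-type check into a list copier. It is not the code from this PR: `ATOMIC_TYPES` and `deepcopy_list_sketch` are hypothetical names, and the set of types is an assumed subset of what `Lib/copy.py` treats as atomic.

```python
import copy

# Assumption: a subset of the immutable atomic types, for illustration only.
ATOMIC_TYPES = frozenset({type(None), int, float, bool, complex, bytes, str})


def deepcopy_list_sketch(x, memo):
    """Deep-copy a list, skipping the deepcopy() call for atomic elements."""
    y = []
    memo[id(x)] = y  # register the copy first so recursive lists resolve
    append = y.append
    for a in x:
        if type(a) in ATOMIC_TYPES:
            # Exact type match only: subclasses such as an int subclass
            # fail this test and still go through copy.deepcopy().
            append(a)
        else:
            append(copy.deepcopy(a, memo))
    return y


inner = [1, 2]
src = ["a", 1, None, 2.5, inner, inner]
out = deepcopy_list_sketch(src, {})
```

Because non-atomic elements still go through `copy.deepcopy()` with the same memo, the two references to `inner` map to a single copied list, and a list that contains itself is copied as a list that contains its own copy.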

Deep-copying 105 JSON documents drawn from the top-1000 PyPI projects improves from 1.21 ms to 990 µs, 22% faster. This follows the atomic fast path added in gh-114264, extending it from the entry point to the per-element loop.

| Benchmark | base | patched |
|---|---|---|
| deepcopy 105 real corpus JSON objects | 1.21 ms | 990 µs: 22% faster |

Benchmark (pyperf)

Run base vs patched by swapping Lib/copy.py on the same interpreter. The figure above is from 105 JSON documents in the top-1000 PyPI corpus; the self-contained script below builds an equivalent atomic-heavy structure and shows a comparable percentage gain.

import copy, pyperf

# Representative of parsed-JSON / config data: string keys, scalar leaves.
doc = {
    "name": "example-package", "version": "1.2.3", "private": False,
    "scripts": {"build": "tsc", "test": "pytest", "lint": "ruff check ."},
    "keywords": ["cli", "async", "http", "json"],
    "dependencies": {f"dep{i}": f"^{i}.0.0" for i in range(20)},
    "authors": [{"name": f"Person {i}", "email": f"p{i}@example.com", "active": True} for i in range(10)],
    "config": {"timeout": 30, "retries": 3, "verbose": False, "level": None},
}
objs = [doc] * 50

runner = pyperf.Runner()
runner.bench_func("deepcopy atomic-heavy structures", lambda: [copy.deepcopy(o) for o in objs])

Resolves #150815.

…lements

The dict, list and tuple deep-copiers send every element back through
deepcopy(), paying a function call even for atomic immutable elements that
deepcopy() returns unchanged. Inline the atomic-type check into the three
copiers so those elements are returned as-is. Behavior is identical, including
shared references, recursion and int/tuple subclasses.
gaborbernat force-pushed the opt/deepcopy-inline-atomic branch from aeab6f7 to 8b8b7e8 on June 2, 2026 23:45
Comment thread Lib/copy.py Outdated


-def _deepcopy_list(x, memo, deepcopy=deepcopy):
+def _deepcopy_list(x, memo, deepcopy=deepcopy, _atomic=_atomic_types):
Member

Is this trick of capturing the global in a local still a net positive in newer Python?

Contributor Author

The benchmark numbers I provided were measured against a build of the main branch. I haven't tried enabling the JIT or any other advanced features, but out of the box the benefit is significant.

Member

But did those gains come from using a local, or only from inlining the type(...) in _atomic_types check?

Member

I think we should let the performance people do their thing and not try to artificially speed things up like this.

@eendebakpt
Contributor

There are more options available to make deepcopy faster (see #91610 (comment)).

If we want to make deepcopy faster, I believe we should gather enough support for a core dev to review a C (or Rust?) implementation. A C implementation would give much larger performance gains (see https://github.com/percolab/copium for example).

The benchmark shows the speedup comes entirely from the inlined
type(...) in _atomic_types check, not from binding the global to a
local default argument, so drop the local capture for a minimal change.
@gaborbernat
Contributor Author

I don't think a pure-Python tweak and a C/Rust deepcopy are opposing directions — they optimize different ends and can coexist. This change helps every build today with no extension to compile and no new maintenance surface, and it doesn't block or complicate a future C implementation; if one lands, this just becomes a small fast path that the C version supersedes.

I've also reduced this PR to the minimal change: a 3-way benchmark (base / inline-check-with-local-capture / inline-check-using-the-global) showed the entire speedup comes from the inlined type(...) in _atomic_types check, and the local-variable capture added nothing measurable, so I dropped it.
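
For readers who want to try a comparison of this shape themselves, here is a minimal sketch of a 3-way timing harness. It is not the script behind the numbers quoted in this thread: it uses `timeit` instead of pyperf, works on a standalone list copier instead of a patched `Lib/copy.py`, and every name in it (`ATOMIC`, `copy_list_base`, `copy_list_local`, `copy_list_global`, `compare`) is hypothetical.

```python
import copy
import timeit

# Assumption: a subset of the immutable atomic types, for illustration only.
ATOMIC = frozenset({type(None), int, float, bool, complex, bytes, str})


def copy_list_base(x, memo):
    """Baseline: every element goes back through copy.deepcopy()."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(copy.deepcopy(a, memo))
    return y


def copy_list_local(x, memo, _atomic=ATOMIC):
    """Inlined check, with the set captured as a default-argument local."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(a if type(a) in _atomic else copy.deepcopy(a, memo))
    return y


def copy_list_global(x, memo):
    """Inlined check, looking the set up as a module global on each pass."""
    y = []
    memo[id(x)] = y
    for a in x:
        y.append(a if type(a) in ATOMIC else copy.deepcopy(a, memo))
    return y


def compare(data, number=200, repeat=5):
    """Return the best-of-`repeat` time in seconds for each variant."""
    results = {}
    for func in (copy_list_base, copy_list_local, copy_list_global):
        timings = timeit.repeat(lambda: func(data, {}), number=number, repeat=repeat)
        results[func.__name__] = min(timings)
    return results


if __name__ == "__main__":
    sample = [f"item-{i}" for i in range(100)] + list(range(100))
    for name, seconds in compare(sample).items():
        print(f"{name}: {seconds:.4f} s")
```

If the two inlined variants land within noise of each other while both beat the baseline, that points to the inlined check, and not the local capture, as the source of the gain. Timings from `timeit` are noisier than pyperf's, so treat small differences with caution.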

Development

Successfully merging this pull request may close these issues.

Speed up copy.deepcopy() of containers holding atomic elements

4 participants