Feature or enhancement
Proposal:
Inspired by some of the optimization patterns used in #148700, I took a look at the pure-Python pickler and found some low-hanging fruit. Some of these changes do make the code more complex to maintain, but the double-digit performance gains might be worth it given dill and cloudpickle usage patterns.
FYI, my testing so far shows a 19-37% speed increase for dill.
Below is Opus's summary of the things kept in the PR I will be submitting. The PR includes microbenchmark experiments (Misc/pickle-perf-diary.md) for a much broader set of changes that did not survive.
The pure-Python pickle._Pickler (Lib/pickle.py) is used as the fallback when
the _pickle C accelerator is unavailable and by code that explicitly
subclasses pickle.Pickler in Python. Its save() hot path currently does
more work than necessary relative to Modules/_pickle.c::save():
- the memo.get(id(obj)) and reducer_override probes run before the atomic-
type dispatch, so int / None / bool / float values pay for checks the
C reference implementation skips for these types;
- _batch_appends and _batch_setitems use itertools.batched() + enumerate()
even for exact list / dict instances, where the C accelerator's
batch_list_exact / batch_dict_exact skip that machinery;
- memoize() dispatches through put() to write a single MEMOIZE byte for
protocol >= 4 (the common case);
- save_long rebuilds the 2-byte BININT1 opcode via struct.pack on every
small-int save;
- framer.commit_frame() is invoked as a full method call per save() even
though the hot check is a single length compare;
- bytes values aren't in the save() dispatch fast path alongside str.
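As an illustration of the save_long item above, here is a minimal sketch of replacing a per-call struct.pack with a table built once. The names binint1_via_struct, binint1_via_table, and BININT1_TABLE are hypothetical and chosen for this example; the actual patch may do this differently.

```python
import pickle
import struct


def binint1_via_struct(value: int) -> bytes:
    """Rebuild the two-byte BININT1 opcode on every call."""
    return pickle.BININT1 + struct.pack("<B", value)


# Hypothetical alternative: build all 256 possible opcodes once at import time.
BININT1_TABLE = tuple(pickle.BININT1 + bytes((i,)) for i in range(256))


def binint1_via_table(value: int) -> bytes:
    """Look up the prebuilt opcode; only valid for 0 <= value <= 255."""
    return BININT1_TABLE[value]


# Both helpers produce identical bytes for every value BININT1 can encode.
for i in range(256):
    assert binint1_via_struct(i) == binint1_via_table(i)
```

The table costs 256 small bytes objects up front and turns each small-int save into a single index operation.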
The attached patch addresses each of these, with measured dump-time
reductions of 20% to 49% on pure-Python _Pickler dump across a
representative workload set (list_of_ints, list_of_strs, dict of str->int,
deep list, nested list of dicts, list of short bytes, dict of bytes->int).
The full CPython test suite, dill 0.4.1, and cloudpickle 3.1.2 all pass /
match baseline. A detailed experiment ledger with raw benchmark data is
committed under Misc/pickle-perf-diary.md and Misc/pickle-perf-data/ on the
proposed branch.
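For anyone who wants to reproduce numbers of this kind locally, a rough timing sketch follows. It is not the harness used for the figures above: the workload sizes, the repeat counts, and the helper name time_pure_python_dump are assumptions made for this example. It relies on pickle._dumps, which always goes through the pure-Python _Pickler.

```python
import pickle
import timeit

# Three of the workload shapes named above; the sizes are arbitrary.
workloads = {
    "list_of_ints": list(range(1000)),
    "list_of_strs": [str(i) for i in range(1000)],
    "dict_str_int": {str(i): i for i in range(1000)},
}


def time_pure_python_dump(obj, protocol=pickle.HIGHEST_PROTOCOL, number=20):
    """Return the best observed seconds per dump using the pure-Python pickler."""
    timings = timeit.repeat(
        lambda: pickle._dumps(obj, protocol), number=number, repeat=3
    )
    return min(timings) / number


results = {name: time_pure_python_dump(obj) for name, obj in workloads.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per dump")
```

Running the same script on the base branch and on the proposed branch gives a before/after comparison for each workload.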
There is one notable user-facing change, which can be addressed, but I'd defer to reviewers on how important it is: atomic types no longer invoke reducer_override, which actually aligns the pure-Python pickler with _pickle.c::save().
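A small sketch for observing this difference: it records which types each pickler offers to reducer_override. The helper offered_types and the payload are hypothetical and written for this example. The pickle documentation already notes that reducer_override may not be called for None, True, False, and exact instances of int, float, and several other built-in types, which is what the C pickler does.

```python
import io
import pickle


def offered_types(pickler_base, obj):
    """Pickle obj with a subclass of pickler_base; return types seen by reducer_override."""
    seen = set()

    class Recorder(pickler_base):
        def reducer_override(self, candidate):
            seen.add(type(candidate))
            return NotImplemented  # fall back to the normal pickling path

    buf = io.BytesIO()
    Recorder(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    assert pickle.loads(buf.getvalue()) == obj  # the round trip is unaffected
    return seen


payload = [1, 2.5, None, True, complex(1, 2)]

# pickle.Pickler is the C accelerator when available, else the same as pickle._Pickler.
c_types = offered_types(pickle.Pickler, payload)
py_types = offered_types(pickle._Pickler, payload)

print("pickle.Pickler :", sorted(t.__name__ for t in c_types))
print("pickle._Pickler:", sorted(t.__name__ for t in py_types))
```

Code that relies on reducer_override seeing int, float, bool, or None values in the pure-Python pickler would be the code affected by the change.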
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Linked PRs