Pure-Python pickle._Pickler can be significantly optimized #148706

@mjbommar

Description

Feature or enhancement

Proposal:

Inspired by some of the optimization patterns used in #148700, I took a look at the pure-Python pickler and found some low-hanging fruit. Some of these changes do make the code harder to maintain, but the double-digit performance gains might be worth it given dill and cloudpickle usage patterns.

FYI, my testing so far showed a 19-37% speed increase for dill.

Below is Opus's summary of the things kept in the PR I will be submitting. The PR includes microbenchmark experiments (Misc/pickle-perf-diary.md) for a much broader set of changes that did not survive.

  The pure-Python pickle._Pickler (Lib/pickle.py) is used as the fallback when
  the _pickle C accelerator is unavailable and by code that explicitly
  subclasses pickle.Pickler in Python. Its save() hot path currently does
  more work than necessary relative to Modules/_pickle.c::save():

  - the memo.get(id(obj)) and reducer_override probes run before the atomic-
    type dispatch, so int / None / bool / float values pay for checks the
    C reference implementation skips for these types;
  - _batch_appends and _batch_setitems use itertools.batched() + enumerate()
    even for exact list / dict instances, where the C accelerator's
    batch_list_exact / batch_dict_exact skip that machinery;
  - memoize() dispatches through put() to write a single MEMOIZE byte for
    protocol >= 4 (the common case);
  - save_long rebuilds the 2-byte BININT1 opcode via struct.pack on every
    small-int save;
  - framer.commit_frame() is invoked as a full method call per save() even
    though the hot check is a single length compare;
  - bytes values aren't in the save() dispatch fast path alongside str.
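As an illustration of the save_long bullet above, here is a minimal sketch of one way to avoid the per-call struct.pack: precompute all 256 BININT1 opcode strings once. The table name and helper functions are hypothetical and are not taken from the patch; the sketch only shows that a lookup table yields the same bytes as the pack-based construction.

```python
import pickle
import struct

# Hypothetical name: a 256-entry table of ready-made BININT1 opcode strings,
# built once at import time instead of on every small-int save.
BININT1_TABLE = tuple(pickle.BININT1 + bytes((n,)) for n in range(256))


def binint1_via_pack(n):
    """Build the opcode byte plus 1-byte payload with struct.pack on each call."""
    return pickle.BININT1 + struct.pack("<B", n)


def binint1_via_table(n):
    """Fetch the same 2-byte sequence from the precomputed table."""
    return BININT1_TABLE[n]


# Both constructions agree for every value BININT1 can encode (0..255).
assert all(binint1_via_pack(n) == binint1_via_table(n) for n in range(256))
```

The trade-off is a small, fixed amount of memory in exchange for skipping a function call and a bytes concatenation on each small-int save.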

  The attached patch addresses each of these, with measured changes of -20%
  to -49% in pure-Python _Pickler dump time across a representative workload set
  (list_of_ints, list_of_strs, dict of str->int, deep list, nested list of
  dicts, list of short bytes, dict of bytes->int). Full CPython test suite,
  dill 0.4.1, and cloudpickle 3.1.2 all pass / match baseline. Detailed
  experiment ledger with raw bench data committed under
  Misc/pickle-perf-diary.md and Misc/pickle-perf-data/ on the proposed
  branch.
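For readers who want to reproduce this kind of measurement, the following is a rough sketch of a timing harness for the pure-Python pickler. It is not the benchmark from the PR: the workload contents, sizes, and helper names are assumptions chosen to resemble the workload names listed above.

```python
import io
import pickle
import timeit

N = 10_000  # arbitrary size, chosen only for illustration

# Hypothetical stand-ins for some of the workload names mentioned above.
WORKLOADS = {
    "list_of_ints": list(range(N)),
    "list_of_strs": [f"item-{i}" for i in range(N)],
    "dict_str_int": {f"key-{i}": i for i in range(N)},
    "list_of_short_bytes": [i.to_bytes(4, "little") for i in range(N)],
}


def dump_pure_python(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize obj with pickle._Pickler, bypassing the C accelerator."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


def bench(obj, repeat=5, number=3):
    """Return the best observed time in seconds for a single dump."""
    timer = timeit.Timer(lambda: dump_pure_python(obj))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for name, obj in WORKLOADS.items():
        print(f"{name:>20}: {bench(obj) * 1e3:8.2f} ms per dump")
```

Running the script on a baseline checkout and on the proposed branch, with the same interpreter build options, would give comparable per-workload numbers.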

There is one notable user-facing change. It can be addressed, but I'd defer to reviewers on how important it is: atomic types no longer invoke reducer_override, which actually aligns the pure-Python pickler with _pickle.c::save().
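To see which objects reach reducer_override in each implementation, a small probe like the one below can help. It is only a sketch: the record_types helper and the payload are made up for illustration. It relies on the documented caveat that reducer_override may not be called for None, True, False, and exact instances of int, float, bytes, tuple, list, dict, set, frozenset, and str.

```python
import io
import pickle


def record_types(pickler_cls, obj):
    """Pickle obj with a subclass of pickler_cls and return the set of types
    passed to reducer_override, together with the pickled bytes."""
    seen = set()

    class Recording(pickler_cls):
        def reducer_override(self, o):
            seen.add(type(o))
            return NotImplemented  # fall back to the normal reduction path

    buf = io.BytesIO()
    Recording(buf, protocol=4).dump(obj)
    return seen, buf.getvalue()


if __name__ == "__main__":
    # range(3) is not an atomic type, so it should reach reducer_override
    # in both implementations; the int, None and float entries may not.
    payload = [1, None, 2.5, range(3)]
    for cls in (pickle.Pickler, pickle._Pickler):
        seen, _ = record_types(cls, payload)
        names = sorted(t.__name__ for t in seen)
        print(f"{cls.__module__}.{cls.__qualname__}: {names}")
```

On a build with the C accelerator, comparing the two printed lines before and after the patch shows whether the pure-Python pickler has converged on the accelerator's behavior for atomic types.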

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response
