Pure-Python pickle._Pickler can be significantly optimized #148706

# Feature or enhancement

### Proposal:

Inspired by some of the optimization patterns used in #148700, I took a look at the pure-Python pickler and found some low-hanging fruit. Some of these changes do make the code more complex to maintain, but the double-digit performance gains might be worth it given `dill` and `cloudpickle` usage patterns.

FYI, my testing so far shows a 19-37% speed increase for `dill`.

Below is Opus's summary of the changes *kept* in the PR I will be submitting. The PR also includes microbenchmark experiments (`Misc/pickle-perf-diary.md`) covering a much broader set of changes that did not make the cut.

```
  The pure-Python pickle._Pickler (Lib/pickle.py) is used as the fallback when
  the _pickle C accelerator is unavailable and by code that explicitly
  subclasses pickle.Pickler in Python. Its save() hot path currently does
  more work than necessary relative to Modules/_pickle.c::save():

  - the memo.get(id(obj)) and reducer_override probes run before the atomic-
    type dispatch, so int / None / bool / float values pay for checks the
    C reference implementation skips for these types;
  - _batch_appends and _batch_setitems use itertools.batched() + enumerate()
    even for exact list / dict instances, where the C accelerator's
    batch_list_exact / batch_dict_exact skip that machinery;
  - memoize() dispatches through put() to write a single MEMOIZE byte for
    protocol >= 4 (the common case);
  - save_long rebuilds the 2-byte BININT1 opcode via struct.pack on every
    small-int save;
  - framer.commit_frame() is invoked as a full method call per save() even
    though the hot check is a single length compare;
  - bytes values aren't in the save() dispatch fast path alongside str.

  The attached patch addresses each of these, with measured reductions of
  20% to 49% in pure-Python _Pickler dump time across a representative
  workload set
  (list_of_ints, list_of_strs, dict of str->int, deep list, nested list of
  dicts, list of short bytes, dict of bytes->int). Full CPython test suite,
  dill 0.4.1, and cloudpickle 3.1.2 all pass / match baseline. Detailed
  experiment ledger with raw bench data committed under
  Misc/pickle-perf-diary.md and Misc/pickle-perf-data/ on the proposed
  branch.
```

There is one notable user-facing change that can be addressed, though I'd defer to reviewers on how important it is: atomic types no longer invoke `reducer_override`, which actually aligns the pure-Python pickler with `_pickle.c::save()`.
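One way to observe this behavior difference is to record which types reach `reducer_override` in each implementation. For context, the `pickle` documentation already notes that, for performance reasons, `reducer_override()` may not be called for `None`, `True`, `False`, and exact instances of `int`, `float`, `bytes`, `str`, `dict`, `set`, `frozenset`, `list` and `tuple`. The sketch below is a hypothetical probe (`types_seen_by_reducer_override` and `SAMPLE` are made-up names); it relies on the private `pickle._Pickler` class, so what the pure-Python line prints depends on the Python version and on whether the proposed patch is applied.

```python
import io
import pickle

# The atomic types named above; _pickle.c's save() handles these before it
# ever reaches reducer_override.
ATOMIC_TYPES = {type(None), bool, int, float}


def types_seen_by_reducer_override(pickler_cls, obj, protocol=5):
    """Dump obj with a subclass of pickler_cls and return the set of types
    that were passed to reducer_override along the way."""
    seen = set()

    class RecordingPickler(pickler_cls):
        def reducer_override(self, o):
            seen.add(type(o))
            return NotImplemented  # fall back to the normal pickling path

    RecordingPickler(io.BytesIO(), protocol=protocol).dump(obj)
    return seen


# complex is not special-cased by either pickler, so it should reach
# reducer_override in both implementations.
SAMPLE = [1, 2.5, None, True, "s", b"b", {"k": 1}, 1 + 2j]

c_seen = types_seen_by_reducer_override(pickle.Pickler, SAMPLE)
py_seen = types_seen_by_reducer_override(pickle._Pickler, SAMPLE)

for label, seen in (("C Pickler", c_seen), ("pure-Python _Pickler", py_seen)):
    hits = sorted(t.__name__ for t in seen & ATOMIC_TYPES)
    print(f"{label}: atomic types passed to reducer_override: {hits}")
```

On CPython, `pickle.Pickler` is the C accelerator, so its line should list no atomic types; a subclass of `pickle._Pickler` that relied on seeing ints or floats in `reducer_override` is the kind of code this change would affect.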

### Has this already been discussed elsewhere?

No response given

### Links to previous discussion of this feature:

_No response_


### Linked PRs
* gh-148707
