cache_utils: fix QuantizedLayer to correctly propagate reorder_cache, crop, and batch ops to quantized buffers by GitGlimpse895 · Pull Request #45510 · huggingface/transformers

GitGlimpse895 · 2026-04-19T07:34:56Z

What does this PR do?

QuantizedLayer maintains two separate storage regions: a full-precision
residual buffer (self.keys / self.values) and a quantized buffer
(self._quantized_keys / self._quantized_values). However, the four
mutation methods inherited from DynamicLayer — reorder_cache,
crop, batch_repeat_interleave, and batch_select_indices — only
operated on the residual buffer, silently leaving the quantized buffer
untouched.
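To make the two-region layout concrete, here is a minimal pure-Python sketch. ToyTwoBufferLayer and its list-based buffers are hypothetical stand-ins, not the transformers classes; only the attribute names keys and _quantized_keys are taken from the description above. It illustrates how a reorder that touches only the residual leaves the quantized region in its old batch order.

```python
# Toy stand-in for a layer with two storage regions (hypothetical, not the
# transformers implementation). Each buffer is a list indexed by batch row;
# each row is a list of per-token entries.
class ToyTwoBufferLayer:
    def __init__(self, quantized_rows, residual_rows):
        self._quantized_keys = [list(r) for r in quantized_rows]  # older tokens
        self.keys = [list(r) for r in residual_rows]              # recent tokens

    def reorder_residual_only(self, beam_idx):
        # Mimics the inherited behaviour described above: only self.keys moves.
        self.keys = [self.keys[i] for i in beam_idx]

    def full_rows(self):
        # Per-row concatenation of the quantized part and the residual part.
        return [q + r for q, r in zip(self._quantized_keys, self.keys)]


layer = ToyTwoBufferLayer(
    quantized_rows=[["a0", "a1"], ["b0", "b1"]],  # beam a, beam b
    residual_rows=[["a2"], ["b2"]],
)
layer.reorder_residual_only([1, 0])  # swap the two beams
stale_rows = layer.full_rows()
print(stale_rows)  # each row now mixes tokens from two different beams
```

No exception is raised in the toy either: both buffers still have matching shapes, so the mix-up is only visible by inspecting the contents.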

Concrete failure modes:

- Beam search (reorder_cache): the quantized buffer stays in
  original beam order while the residual reorders, causing crossed
  attention across beams with no error raised.
- Constrained generation rollback (crop): cumulative_length
  diverges from the actual stored state, corrupting subsequent
  get_seq_length calls.
- Group beam search / contrastive decoding (batch_select_indices,
  batch_repeat_interleave): batch dimension of the quantized buffer
  is never updated, producing mismatched batch sizes between the two
  storage regions.
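The crop and batch-size failures can be sketched the same way. Everything below is an illustrative toy under stated assumptions: ToyLayer, stored_length, and crop_residual_only are hypothetical names, and the counter only borrows the name cumulative_length from the description above. It is not a reproduction of the library's actual bookkeeping.

```python
# Hypothetical toy layer for a single batch row: a token counter plus two buffers.
class ToyLayer:
    def __init__(self, quantized, residual):
        self._quantized = list(quantized)  # stand-in for the quantized region
        self.residual = list(residual)     # stand-in for the full-precision region
        self.cumulative_length = len(self._quantized) + len(self.residual)

    def stored_length(self):
        # Number of tokens actually held across both regions.
        return len(self._quantized) + len(self.residual)

    def crop_residual_only(self, max_length):
        # Mimics a crop that truncates the residual but leaves the quantized
        # region and the counter untouched.
        self.residual = self.residual[:max_length]


layer = ToyLayer(quantized=["t0", "t1"], residual=["t2", "t3", "t4", "t5"])
layer.crop_residual_only(3)  # caller wants a sequence of length 3
print(layer.cumulative_length, layer.stored_length())  # counter and storage disagree


def select_rows(rows, indices):
    # Toy batch selection over a list of batch rows.
    return [rows[i] for i in indices]


quantized_rows = [["a0"], ["b0"], ["c0"]]
residual_rows = [["a1"], ["b1"], ["c1"]]
residual_rows = select_rows(residual_rows, [0, 2])  # only the residual is updated
batch_sizes = (len(quantized_rows), len(residual_rows))
print(batch_sizes)  # the two regions no longer share a batch dimension
```

In the crop case the requested length, the counter, and the stored length all end up different, which is the kind of divergence the second item describes.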

This PR overrides all four methods in QuantizedLayer. Since
_quantized_keys/_quantized_values are opaque backend objects for
both QuantoQuantizedLayer and HQQQuantizedLayer, the fix uses a
dequantize → operate → re-quantize pattern, which is backend-agnostic
and does not compound quantization error meaningfully beyond what is
already introduced at the original quantization step.
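As a rough illustration of the dequantize → operate → re-quantize pattern, the sketch below uses a deliberately simple fixed-step quantizer. SCALE, quantize, dequantize, and apply_to_quantized are hypothetical helpers written for this example; the real backends keep their own scales and packed formats, so this shows the shape of the approach rather than the PR's code.

```python
SCALE = 0.25  # hypothetical fixed step size; real backends pick scales per group


def quantize(rows):
    # rows: list (batch) of lists (tokens) of floats -> integer codes
    return [[round(x / SCALE) for x in row] for row in rows]


def dequantize(codes):
    return [[c * SCALE for c in row] for row in codes]


def apply_to_quantized(codes, op):
    """Dequantize, apply `op` to the full-precision rows, then re-quantize."""
    return quantize(op(dequantize(codes)))


def reorder(beam_idx):
    return lambda rows: [rows[i] for i in beam_idx]


def crop(max_length):
    return lambda rows: [row[:max_length] for row in rows]


def repeat_interleave(repeats):
    return lambda rows: [list(row) for row in rows for _ in range(repeats)]


codes = quantize([[0.1, 0.6, 1.3], [2.0, -0.4, 0.9]])

reordered = apply_to_quantized(codes, reorder([1, 0]))
cropped = apply_to_quantized(codes, crop(2))
repeated = apply_to_quantized(codes, repeat_interleave(2))
roundtrip = apply_to_quantized(codes, lambda rows: rows)  # no-op in between

print(reordered, cropped, repeated, roundtrip == codes)
```

With a fixed step size the no-op round trip returns the original codes exactly, because every dequantized value already sits on the quantization grid. Backends that recompute scales per group may not be exactly idempotent, so this only hints at why the extra error can stay small; it is not a guarantee.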

Fixes # (issue)

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

@gante @SunMarc

I confirm that this is not a pure code agent PR.

cache_utils: fix QuantizedLayer to correctly propagate reorder_cache,…

851a938

GitGlimpse895 wants to merge 1 commit into huggingface:main from GitGlimpse895:fix/quantized-layer-cache-ops
