cache_utils: fix QuantizedLayer to correctly propagate reorder_cache, crop, and batch ops to quantized buffers #45510

Open
GitGlimpse895 wants to merge 1 commit into huggingface:main from GitGlimpse895:fix/quantized-layer-cache-ops

GitGlimpse895 commented Apr 19, 2026

What does this PR do?

QuantizedLayer maintains two separate storage regions: a full-precision
residual buffer (self.keys / self.values) and a quantized buffer
(self._quantized_keys / self._quantized_values). However, the four
mutation methods inherited from DynamicLayer (reorder_cache,
crop, batch_repeat_interleave, and batch_select_indices) only
operate on the residual buffer, silently leaving the quantized buffer
untouched.

Concrete failure modes:

  • Beam search (reorder_cache): the quantized buffer stays in
    original beam order while the residual reorders, causing crossed
    attention across beams with no error raised.
  • Constrained generation rollback (crop): cumulative_length
    diverges from the actual stored state, corrupting subsequent
    get_seq_length calls.
  • Group beam search / contrastive decoding (batch_select_indices,
    batch_repeat_interleave): batch dimension of the quantized buffer
    is never updated, producing mismatched batch sizes between the two
    storage regions.
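
To make the beam-search failure concrete, here is a minimal toy sketch. It is not the transformers implementation: `ToyTwoBufferLayer`, `reorder_residual_only`, and `reorder_both` are hypothetical names, and strings stand in for per-position key/value vectors. It only shows what happens when one of two buffers follows `beam_idx` and the other does not.

```python
# Toy model of a two-buffer cache layer. All names are illustrative
# assumptions; this is not the transformers API.
class ToyTwoBufferLayer:
    def __init__(self, quantized_part, residual_part):
        # Both buffers are indexed [batch][position].
        self.quantized = [list(row) for row in quantized_part]
        self.residual = [list(row) for row in residual_part]

    def reorder_residual_only(self, beam_idx):
        # Mirrors the behaviour described above: only the residual
        # buffer follows beam_idx, the quantized buffer is untouched.
        self.residual = [self.residual[i] for i in beam_idx]

    def reorder_both(self, beam_idx):
        # Intended behaviour: both buffers follow beam_idx.
        self.quantized = [self.quantized[i] for i in beam_idx]
        self.residual = [self.residual[i] for i in beam_idx]

    def full_sequence(self):
        # What attention would see: older (quantized) positions followed
        # by the recent (residual) positions, per batch row.
        return [q + r for q, r in zip(self.quantized, self.residual)]


def make_layer():
    # Beam "a" and beam "b", two older positions and one recent position each.
    return ToyTwoBufferLayer(
        quantized_part=[["a0", "a1"], ["b0", "b1"]],
        residual_part=[["a2"], ["b2"]],
    )


beam_idx = [1, 0]  # swap the two beams

buggy = make_layer()
buggy.reorder_residual_only(beam_idx)
buggy_view = buggy.full_sequence()

fixed = make_layer()
fixed.reorder_both(beam_idx)
fixed_view = fixed.full_sequence()

print(buggy_view)  # each row mixes entries from beams "a" and "b"
print(fixed_view)  # each row holds a single beam's history
```

In the residual-only case every row ends up with the old positions of one beam and the newest position of the other, and nothing raises because the shapes still match.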

This PR overrides all four methods in QuantizedLayer. Since
_quantized_keys/_quantized_values are opaque backend objects for
both QuantoQuantizedLayer and HQQQuantizedLayer, the fix uses a
dequantize → operate → re-quantize pattern, which is backend-agnostic
and does not compound quantization error meaningfully beyond what is
already introduced at the original quantization step.
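
The dequantize → operate → re-quantize pattern can be sketched with a toy quantizer. Everything below is an assumption made for illustration: `ToyQuantized`, `quantize`, `dequantize`, and `apply_to_quantized` are hypothetical stand-ins for an opaque backend, not the Quanto or HQQ APIs, and the per-row 4-bit affine scheme is chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]  # indexed [batch][position]

LEVELS = 15  # 4-bit codes in the range 0..15


@dataclass(frozen=True)
class ToyQuantized:
    """Opaque stand-in for a backend's quantized tensor (hypothetical)."""
    codes: List[List[int]]
    scales: List[float]
    zeros: List[float]


def quantize(x: Matrix) -> ToyQuantized:
    # Per-row affine quantization: code = round((v - min) / scale).
    codes, scales, zeros = [], [], []
    for row in x:
        lo, hi = min(row), max(row)
        span = hi - lo
        scale = span / LEVELS if span > 0 else 1.0  # guard constant rows
        codes.append([round((v - lo) / scale) for v in row])
        scales.append(scale)
        zeros.append(lo)
    return ToyQuantized(codes, scales, zeros)


def dequantize(q: ToyQuantized) -> Matrix:
    return [[c * s + z for c in row]
            for row, s, z in zip(q.codes, q.scales, q.zeros)]


def apply_to_quantized(q: ToyQuantized,
                       op: Callable[[Matrix], Matrix]) -> ToyQuantized:
    # Backend-agnostic: never touches q's internals beyond the public
    # quantize/dequantize pair.
    return quantize(op(dequantize(q)))


def reorder(beam_idx: List[int]) -> Callable[[Matrix], Matrix]:
    # Batch-dimension reorder, as a beam-search step would request.
    return lambda x: [x[i] for i in beam_idx]


original = [[0.0, 0.5, 1.0, 1.5], [-2.0, 0.0, 1.1, 4.0]]
q = quantize(original)
q_reordered = apply_to_quantized(q, reorder([1, 0]))

print(q.codes)
print(q_reordered.codes)
```

In this toy, a batch reorder leaves each row's values on the grid they were already quantized to, so re-quantizing recovers the same codes. An operation that changes which values share a scale (for example, cropping away a row's minimum or maximum) would re-derive the scale and could add some error, which is the trade-off of treating the backend as opaque.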

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@gante @SunMarc

  • I confirm that this is not a pure code agent PR.
