
Remove cache_position in more models (3) #44759

Merged
Cyrilvallez merged 19 commits into main from cache-position on Mar 18, 2026

Conversation

@Cyrilvallez (Member) commented Mar 16, 2026

What does this PR do?

Follow-up to several related PRs, the most recent being #44602.

This PR completes all the models that may need non-trivial treatment. Only about 30-40 models still mention cache_position, and those are trivial cases of argument forwarding or cache updates. They will be extremely easy to remove in #44667 (or any other PR)

@Cyrilvallez changed the title from "start on the mambas" to "Remove cache_position in more models (3)" on Mar 16, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member)

I think it's overlapping with #44667 😅 Saw an opportunity and started deleting last week

@Cyrilvallez (Member Author)

Ha indeed 😬 Though from what I'm seeing, I don't think the changes are fully correct in #44667, at least for the mambas! If you don't mind, can we keep this PR at least for the mambas and a few audio models which need special treatment? I believe most of the rest is very standard, and everything can simply be erased

@zucchini-nlp (Member)

@Cyrilvallez ah yeah, the PR is still at fix-and-replace stage and I didn't have time to check out mambas and audio models. Those need special treatment

I am fine with merging this PR first with mambas and I'll rebase the second one later. Does that sound good?

@huggingface deleted a comment from the github-actions bot on Mar 17, 2026
@Cyrilvallez (Member Author)

run-slow: kyutai_speech_to_text mamba encoder_decoder clipseg falcon_h1 clvp xglm x_clip gptj musicgen_melody musicgen recurrent_gemma vision_encoder_decoder csm shieldgemma2 speech_to_text ctrl moshi chameleon zamba2 nemotron_h zamba owlvit umt5 groupvit falcon_mamba cpmant owlv2 falcon mamba2 codegen

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

Commit Info

Context | Commit   | Description
RUN     | 0961e31a | workflow commit (merge commit)
PR      | db7035e5 | branch commit (from PR)
main    | bbe251a4 | base commit (on main)

⚠️ No test being reported (jobs are skipped or cancelled)!

@github-actions (Contributor)

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/chameleon", "models/clipseg", "models/clvp", "models/codegen", "models/cpmant", "models/csm", "models/ctrl", "models/encoder_decoder", "models/falcon", "models/falcon_h1", "models/falcon_mamba", "models/gptj", "models/groupvit", "models/kyutai_speech_to_text", "models/mamba", "models/mamba2", "models/moshi", "models/musicgen", "models/musicgen_melody", "models/nemotron_h", "models/owlv2", "models/owlvit", "models/recurrent_gemma", "models/shieldgemma2", "models/speech_to_text", "models/umt5", "models/vision_encoder_decoder", "models/x_clip", "models/xglm", "models/zamba", "models/zamba2"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

Commit Info

Context | Commit   | Description
RUN     | a2a93b4a | workflow commit (merge commit)
PR      | aabc9f73 | branch commit (from PR)
main    | af93d384 | base commit (on main)

⚠️ Model CI failed to report results

The test failure analysis could not be completed. Please check the workflow run for details.

@Cyrilvallez (Member Author)

run-slow is not working, but I personally checked that all IntegrationTests give the same results on this PR and on main! So all good!

@huggingface deleted a comment from the github-actions bot on Mar 18, 2026
@huggingface deleted a comment from the github-actions bot on Mar 18, 2026
Comment on lines +140 to +145
if cache_init:
    self.conv_states[layer_idx].copy_(new_conv_state)
else:
    # Shift the window left by one slot and write the newest column into
    # the last slot, without consulting cache_position.
    conv_state = self.conv_states[layer_idx].roll(shifts=-1, dims=-1)
    conv_state[:, :, -1:] = new_conv_state
    self.conv_states[layer_idx].copy_(conv_state)
Member Author

@vasqu this is what I discussed with you offline. This is technically not correct here and in some other mambas, but should be fine in practice. So I simplified it to remove the useless usage of cache_position and align with mamba2. The behavior is the same as before (technically wrong), but let's fix it later when we refactor the mamba caches, as it should be fine in practice!
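The simplified update can be sketched in plain Python, with lists standing in for tensors (a minimal illustration of the roll-and-overwrite semantics; the function name and shapes are mine, not the actual transformers code):

```python
def update_conv_state(conv_states, layer_idx, new_conv_state, cache_init):
    """Sketch of the simplified mamba convolution-cache update.

    conv_states[layer_idx] is a fixed-size window of columns, oldest first.
    On the prefill step (cache_init=True) the whole window is replaced; on
    every decode step the window is shifted left by one slot and the newest
    column overwrites the last slot, with no cache_position involved.
    """
    if cache_init:
        conv_states[layer_idx] = list(new_conv_state)
    else:
        window = conv_states[layer_idx]
        # Equivalent of roll(shifts=-1, dims=-1) followed by writing the
        # newest column into the final slot.
        conv_states[layer_idx] = window[1:] + [new_conv_state[-1]]
    return conv_states[layer_idx]
```

As the comment notes, the window shifts on every decode step regardless of the actual position, which is why this is "technically wrong" for non-contiguous positions but fine for ordinary step-by-step decoding.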

Contributor

Yup, let's add a small comment there tho so others are aware

@vasqu (Contributor) left a comment

Careful approval, mostly nits but a few smaller questions which might be relevant

def prepare_inputs_for_generation(
self,
decoder_input_ids,
next_sequence_length: int | None = None,
Contributor

Just to be sure this is the correct order, so that passing positional args here wouldn't mess things up

Member Author

Yes, I mirrored the general prepare_inputs_for_generation here. Though we always pass next_sequence_length as a kwarg in generate anyway to avoid those issues 👌
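The point about keyword passing can be sketched as follows (a simplified, hypothetical stand-in for the method, not the actual generate internals): when every argument after the main input is passed by keyword, the position of next_sequence_length in the signature can never silently capture an unrelated positional argument.

```python
def prepare_inputs_for_generation(decoder_input_ids, next_sequence_length=None,
                                  attention_mask=None):
    # Simplified stand-in: just bundle the inputs into a model-inputs dict.
    return {
        "decoder_input_ids": decoder_input_ids,
        "next_sequence_length": next_sequence_length,
        "attention_mask": attention_mask,
    }

# generate-style call site: everything after the main input goes by keyword,
# so reordering the signature cannot misroute arguments.
model_inputs = prepare_inputs_for_generation([1, 2, 3], next_sequence_length=4)
```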

Contributor

Same as in t5 I suppose? Just that this did not have any copies or similar anymore?

Member Author

Exactly!

end_idx = (model_inputs["cache_position"][-1] + 1) * self.config.downsample_factor
past_key_values = model_inputs.get("past_key_values")
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
current_seq_len = model_inputs.get("position_ids").shape[-1]
Contributor

Sure that it's shape?

Member Author

Yes, as it was already truncated in the super() call. We could use the main input (i.e. input_ids/inputs_embeds), but since it can switch between the two, it's easier to use the position_ids, which are always there
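The arithmetic in the snippet above amounts to recovering `cache_position[-1] + 1` (the total number of tokens seen so far) as the cache length plus the length of the already-truncated position_ids. A small sketch with plain integers (function and argument names are mine; downsample_factor stands for the config value):

```python
def compute_end_idx(past_seen_tokens, current_seq_len, downsample_factor):
    # cache_position[-1] + 1 equals the tokens already in the cache plus the
    # tokens in the current (already truncated) forward pass.
    total_tokens = past_seen_tokens + current_seq_len
    return total_tokens * downsample_factor

# Prefill: nothing cached yet, 7 new tokens, downsample factor 2.
prefill_end = compute_end_idx(0, 7, 2)
# One decode step later: 7 tokens cached, 1 new token.
decode_end = compute_end_idx(7, 1, 2)
```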

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: chameleon, clipseg, clvp, codegen, cpmant, csm, ctrl, encoder_decoder, falcon, falcon_h1, falcon_mamba, gptj, groupvit, kyutai_speech_to_text, mamba, mamba2

@Cyrilvallez merged commit 83a6c5b into main on Mar 18, 2026. 29 checks passed.
@Cyrilvallez deleted the cache-position branch on March 18, 2026 at 13:09
4 participants