Remove cache_position in more models (2)#44602
Conversation
run-slow: bridgetower mt5 qwen2_5_vl xlm_roberta audioflamingo3 data2vec_text gpt2 qwen2_5_omni roberta smolvlm camembert gpt_neo paddleocr_vl pix2struct idefics3 llama4 bark mllama bert_generation qwen2_vl whisper xlm_roberta_xl electra esm clip bert decision_transformer idefics2 pop2piano t5 blt blip_text idefics udop bloom roc_bert roberta_prelayernorm autoformer qwen2_audio ernie switch_transformers longt5 xmod
CI Results
Model CI Report: ❌ 2 new failed tests from this PR 😭
The 2 failed idefics2 tests are false positives: they both pass locally, on this PR and on main.
vasqu left a comment:
Just a few smaller comments; overall this looks good, thanks!
You'll probably need to resolve some merge conflicts because of the capturing PR I merged, sorry 😬
```diff
     return relative_buckets

-def compute_bias(self, query_length, key_length, device=None, cache_position=None):
+def compute_bias(self, query_length, key_length, device=None, past_seen_tokens=0):
```
That's the cleanup, I suppose.
Well, that and all the weird stuff around real_seq_length, passing a query_length to the Attention's forward that is not the actual query length, etc.
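The scalar offset works because, with a cache, the only thing the bias computation needs to know is where the current query positions start. A minimal, hypothetical sketch of the indexing idea (the helper name and signature are illustrative, not the actual transformers code):

```python
def relative_positions(query_length, key_length, past_seen_tokens=0):
    """Illustrative only: relative distances between cached-decoding query
    positions and key positions, T5-style.

    With a cache, the new query positions simply start at `past_seen_tokens`,
    so a scalar offset can replace a full `cache_position` tensor.
    """
    return [
        [k - (past_seen_tokens + q) for k in range(key_length)]
        for q in range(query_length)
    ]

# Decoding one new token with 4 tokens already in the cache: the single
# query row sees keys at distances -4..0.
print(relative_positions(1, 5, past_seen_tokens=4))  # [[-4, -3, -2, -1, 0]]
```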
```python
class LongT5Block(GradientCheckpointingLayer):
    def __init__(self, config, has_relative_attention_bias=False, layer_idx: int | None = None):
        super().__init__()
        self.layer_idx = layer_idx
```
Interesting, was this not needed before?
Wrong leftover! Good catch!
[For maintainers] Suggested jobs to run (before merge) run-slow: audioflamingo3, autoformer, bark, bert, bert_generation, blip, bloom, blt, bridgetower, camembert, clip, data2vec, decision_transformer, electra, ernie, ernie4_5_vl_moe
What does this PR do?
As per the title. Follow-up of #44330.
It also takes the opportunity to simplify t5 and its children, because the way they computed position_bias was super convoluted and required several additional, unnecessary arguments.
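For context, T5-style models map those relative distances into a fixed number of buckets (exact distances near zero, log-spaced ones further out) before looking up a learned bias per bucket. A simplified scalar sketch of that bucketing idea, illustrative only and not the actual transformers implementation:

```python
import math

def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Scalar sketch of T5-style relative position bucketing.

    Half the buckets (in the bidirectional case) go to positive distances;
    small distances get one bucket each, larger ones share log-spaced buckets.
    """
    bucket = 0
    if bidirectional:
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        relative_position = abs(relative_position)
    else:
        relative_position = -min(relative_position, 0)
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        # Small distances map one-to-one onto buckets.
        bucket += relative_position
    else:
        # Larger distances are compressed logarithmically, capped at the
        # last available bucket.
        large = max_exact + int(
            math.log(relative_position / max_exact)
            / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        )
        bucket += min(large, num_buckets - 1)
    return bucket
```

Note that nothing here depends on per-token cache positions: once the relative distance is known, bucketing is a pure function of that distance, which is why a scalar past-length offset suffices.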