
@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Nov 18, 2025

What does this PR do?

To finalize the work on the rope config, I am moving rotary_partial_emb into the rope parameter dict as well. Along with it, I did some clean-up of the standardization logic, since we can make a few assumptions given the models we currently have.

Note: this PR completely breaks BC, and users will no longer have access to config.rope_theta since I pop it from the config kwargs manually. That is clearer, imo, than having two rope thetas in different places.
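The BC break above means any code that reads `config.rope_theta` has to switch to the rope parameter dict instead. A minimal sketch of the change, using illustrative class and key names (assumptions for this example, not the exact `transformers` API):

```python
# Illustrative sketch of the config layout change; the class names and the
# exact dict keys are assumptions, not the real transformers config classes.

class OldStyleConfig:
    """Pre-PR: rope_theta is a top-level config attribute."""
    def __init__(self, rope_theta=10000.0):
        self.rope_theta = rope_theta

class NewStyleConfig:
    """Post-PR: rope_theta lives only inside the rope parameter dict."""
    def __init__(self, rope_theta=10000.0):
        self.rope_parameters = {"rope_type": "default", "rope_theta": rope_theta}

config = NewStyleConfig()
# config.rope_theta would now raise AttributeError; read the dict instead:
theta = config.rope_parameters["rope_theta"]
print(theta)  # 10000.0
```

The point of keeping a single source of truth is exactly what the comment says: with the value only in `rope_parameters`, there is no risk of two thetas drifting apart.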

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 26, 2025
Comment on lines 85 to 91
def get_standardized_rope_params(config):
"""
    Helper to standardize the config's rope params field by ensuring the params are defined for each
    layer type. For old models, the function duplicates the single rope param dict across layer types (backward compatibility).
"""
    rope_parameters = getattr(config, "rope_parameters", {})
    layer_types = getattr(config, "layer_types", None)
    rope_theta = getattr(config, "rope_theta", None)

Member Author


This could have been simplified by making a few assumptions, and we can make them because only 2 models have a nested rope parameterization.
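The simplification described here can be sketched as follows, assuming the helper's only job for old configs is to duplicate a single flat rope parameter dict across the declared layer types (the function name and keys below are hypothetical, chosen to mirror the snippet under review):

```python
def standardize_rope_params(rope_parameters, layer_types=None):
    """Sketch of the standardization idea: if a config declares per-layer
    types but carries one flat rope param dict, duplicate that dict per
    layer type (backward compatibility for older models)."""
    if layer_types is None:
        # No per-layer attention types: the flat dict is already standard.
        return rope_parameters
    if all(t in rope_parameters for t in layer_types):
        # Already nested per layer type (the rare nested-parameterization case).
        return rope_parameters
    # Old-style flat dict: copy it once per layer type.
    return {t: dict(rope_parameters) for t in layer_types}

flat = {"rope_type": "default", "rope_theta": 1_000_000.0}
nested = standardize_rope_params(flat, ["full_attention", "sliding_attention"])
print(nested["sliding_attention"]["rope_theta"])  # 1000000.0
```

Because only two models nest their rope parameters, the "already nested" branch can stay a simple membership check rather than a general schema walk.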

@vasqu vasqu mentioned this pull request Nov 27, 2025
5 tasks
Contributor

@vasqu vasqu left a comment


Just my 2 cents 😄

Collaborator

@ArthurZucker ArthurZucker left a comment


In general, if we can put stuff in PreTrainedConfig I am happy too, but it's fine this way as well.

zucchini-nlp and others added 6 commits November 27, 2025 16:19
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…loftr.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Collaborator

@ArthurZucker ArthurZucker left a comment


RotaryEmbeddingConfigMixin is my Christmas gift! ty, it's a lot better.

@zucchini-nlp
Member Author

run-slow: llama, gemma3, qwen2, qwen2_vl, mistral, mixtral, modernbert, llava

@zucchini-nlp zucchini-nlp changed the title Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 28, 2025
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere, cohere2, csm, cwm, dbrx, deepseek_v2, deepseek_v3, dia, diffllama

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/llama", "models/llava", "models/mistral", "models/mixtral", "models/modernbert", "models/qwen2", "models/qwen2_vl"]
quantizations: []

@zucchini-nlp
Member Author

The doc-builder and weight-tying tests will fail, but those failures are unrelated to this PR.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@ArthurZucker ArthurZucker merged commit 078ff68 into huggingface:main Nov 28, 2025
20 of 25 checks passed