
@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Nov 18, 2025

What does this PR do?

To finalize the work on the rope config, I am moving rotary_partial_emb into the rope parameter dict as well. Along with it, I did some clean-up of the standardization logic, since we can make a few assumptions given the models we currently have.

Note: this PR completely breaks BC, and users will no longer have access to config.rope_theta since I pop it from the config kwargs manually. That is clearer, imo, than having two rope thetas in different places.
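The BC break above means any code that reads `config.rope_theta` has to switch to the rope parameter dict instead. A minimal sketch of the change, using illustrative class and key names (assumptions for this example, not the exact `transformers` API):

```python
# Illustrative sketch of the config layout change; the class names and the
# exact dict keys are assumptions, not the real transformers config classes.

class OldStyleConfig:
    """Pre-PR: rope_theta is a top-level config attribute."""
    def __init__(self, rope_theta=10000.0):
        self.rope_theta = rope_theta

class NewStyleConfig:
    """Post-PR: rope_theta lives only inside the rope parameter dict."""
    def __init__(self, rope_theta=10000.0):
        self.rope_parameters = {"rope_type": "default", "rope_theta": rope_theta}

config = NewStyleConfig()
# config.rope_theta would now raise AttributeError; read the dict instead:
theta = config.rope_parameters["rope_theta"]
print(theta)  # 10000.0
```

The point of keeping a single source of truth is exactly what the comment says: with the value only in `rope_parameters`, there is no risk of two thetas drifting apart.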

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 26, 2025
Comment on lines 85 to 91
def get_standardized_rope_params(config):
"""
    Helper to standardize the config's rope params field by ensuring the params are defined for each
    layer type. For old models, the function duplicates the single rope param dict across layer types (backward compatibility).
"""
    rope_parameters = getattr(config, "rope_parameters", {})
    layer_types = getattr(config, "layer_types", None)
    rope_theta = getattr(config, "rope_theta", None)

Member Author


This could have been simplified by making a few assumptions, and we can make them because only 2 models have a nested rope parameterization.
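The simplification described here can be sketched as follows, assuming the helper's only job for old configs is to duplicate a single flat rope parameter dict across the declared layer types (the function name and keys below are hypothetical, chosen to mirror the snippet under review):

```python
def standardize_rope_params(rope_parameters, layer_types=None):
    """Sketch of the standardization idea: if a config declares per-layer
    types but carries one flat rope param dict, duplicate that dict per
    layer type (backward compatibility for older models)."""
    if layer_types is None:
        # No per-layer attention types: the flat dict is already standard.
        return rope_parameters
    if all(t in rope_parameters for t in layer_types):
        # Already nested per layer type (the rare nested-parameterization case).
        return rope_parameters
    # Old-style flat dict: copy it once per layer type.
    return {t: dict(rope_parameters) for t in layer_types}

flat = {"rope_type": "default", "rope_theta": 1_000_000.0}
nested = standardize_rope_params(flat, ["full_attention", "sliding_attention"])
print(nested["sliding_attention"]["rope_theta"])  # 1000000.0
```

Because only two models nest their rope parameters, the "already nested" branch can stay a simple membership check rather than a general schema walk.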

@vasqu vasqu mentioned this pull request Nov 27, 2025
5 tasks
Contributor

@vasqu vasqu left a comment


Just my 2 cents 😄

Collaborator

@ArthurZucker ArthurZucker left a comment


In general, if we can put stuff in PreTrainedConfig I am happy too, but it's fine this way as well.

zucchini-nlp and others added 6 commits November 27, 2025 16:19
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…loftr.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Collaborator

@ArthurZucker ArthurZucker left a comment


RotaryEmbeddingConfigMixin is my Christmas gift! ty, it's a lot better.

@zucchini-nlp
Member Author

run-slow: llama, gemma3, qwen2, qwen2_vl, mistral, mixtral, modernbert, llava

@zucchini-nlp zucchini-nlp changed the title Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 28, 2025
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere, cohere2, csm, cwm, dbrx, deepseek_v2, deepseek_v3, dia, diffllama

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/llama", "models/llava", "models/mistral", "models/mixtral", "models/modernbert", "models/qwen2", "models/qwen2_vl"]
quantizations: []

@zucchini-nlp
Member Author

The doc-builder and weight-tying tests will fail, but those failures are unrelated to this PR.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@ArthurZucker ArthurZucker merged commit 078ff68 into huggingface:main Nov 28, 2025
20 of 25 checks passed