🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪
#42255
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def get_standardized_rope_params(config):
    """
    Helper to standardize the config's rope params field by ensuring the params are defined for each
    layer type. For old models the fn will duplicate the single rope param for each layer type
    (backward compatibility).
    """
    rope_parameters = getattr(config, "rope_parameters", None)
    layer_types = getattr(config, "layer_types", None)
    # Fall back to the legacy top-level attribute if the dict does not carry it
    rope_theta = rope_parameters.get("rope_theta") if rope_parameters else None
    if rope_theta is None:
        rope_theta = getattr(config, "rope_theta", None)
    if rope_parameters is None:
        rope_parameters = {}
```
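A minimal sketch of the backward-compatibility behavior described in the docstring, i.e. duplicating a single flat rope param dict across layer types. The helper name, layer-type names, and the nestedness check are illustrative assumptions, not the actual transformers implementation:

```python
# Sketch: duplicate a single rope parameter dict across layer types for
# backward compatibility. Names are illustrative, not the exact
# transformers implementation.

def standardize_rope_params_sketch(rope_parameters, layer_types):
    """Return a per-layer-type mapping of rope params.

    If `rope_parameters` is a flat dict (old-style config), copy it for
    every layer type; if it is already keyed by layer type, keep it.
    """
    if layer_types is None:
        return rope_parameters
    # Already nested per layer type (only a couple of models do this)
    if all(t in rope_parameters for t in layer_types):
        return rope_parameters
    # Old-style flat dict: duplicate it for each layer type
    return {t: dict(rope_parameters) for t in layer_types}


params = {"rope_type": "default", "rope_theta": 10000.0}
nested = standardize_rope_params_sketch(params, ["full_attention", "sliding_attention"])
print(nested["sliding_attention"]["rope_theta"])  # 10000.0
```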
This could have been simplified if we make a few assumptions, and we can make assumptions because only 2 models have a nested rope parameterization.
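For context, a "nested" rope parameterization keys the params by layer type, while most models use a single flat dict. The layer-type names and theta values below are hypothetical examples, not taken from any specific model config:

```python
# Flat (most models): one rope param dict shared by all layers
flat = {"rope_type": "default", "rope_theta": 10000.0}

# Nested (only a couple of models): one dict per layer type.
# Layer-type names and values are hypothetical examples.
nested = {
    "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}

# One possible simplifying assumption: if every value is itself a dict,
# the parameterization is nested; otherwise treat it as flat.
is_nested = all(isinstance(v, dict) for v in nested.values())
print(is_nested)  # True
```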
vasqu
left a comment
Just my 2 cents 😄
src/transformers/models/efficientloftr/configuration_efficientloftr.py
ArthurZucker
left a comment
In general, if we can put stuff in PreTrainedConfig I am also happy, but it's fine this way as well
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…loftr.py Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
ArthurZucker
left a comment
RotaryEmbeddingConfigMixin is my Christmas gift! ty, it's a lot better
run-slow: llama, gemma3, qwen2, qwen2_vl, mistral, mixtral, modernbert, llava
[For maintainers] Suggested jobs to run (before merge): run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere, cohere2, csm, cwm, dbrx, deepseek_v2, deepseek_v3, dia, diffllama
This comment contains models: ["models/gemma3", "models/llama", "models/llava", "models/mistral", "models/mixtral", "models/modernbert", "models/qwen2", "models/qwen2_vl"]
Doc-builder and weight tying tests will fail but are not related.
CI Results: ✅ No failing test specific to this PR 🎉!
What does this PR do?
To finalize the work on rope config, I am moving `rotary_partial_emb` to the rope parameter dict as well. Along with it, I did some clean-up on standardization, because we can make a few assumptions with the models we have.

Note, this PR is breaking BC completely: users will no longer have access to `config.rope_theta` since I pop it from config kwargs manually. That way is clearer imo than having two rope thetas in different places.
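To illustrate the breaking change, a hedged sketch of old vs. new config access (the class names and the exact keys inside the rope parameter dict are assumptions for illustration, not the real transformers classes):

```python
# Sketch of the BC break described above: `rope_theta` now lives only in
# the rope parameter dict, not as a top-level config attribute.
# Class names and dict keys are illustrative assumptions.

class OldStyleConfig:
    def __init__(self):
        self.rope_theta = 10000.0  # old: top-level attribute

class NewStyleConfig:
    def __init__(self):
        # new: everything rope-related lives in one dict
        self.rope_parameters = {"rope_type": "default", "rope_theta": 10000.0}

new = NewStyleConfig()
assert not hasattr(new, "rope_theta")      # top-level access is gone
theta = new.rope_parameters["rope_theta"]  # read from the dict instead
print(theta)  # 10000.0
```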