[RoPE] run RoPE tests when the model uses RoPE #40630
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Nice! I think we can infer it directly from the model, no? I.e., based on the model's module names, we can get the class dynamically and reinstantiate directly from it.
That would avoid setting it explicitly in all model testers. WDYT?
Thanks! I also noticed this when working on the RoPE refactoring. I guess we can enable these tests without checking for `rotary_embedding_layer`, and instead have a heuristic to check if the model has a layer called `self.rotary_embedding`.
That way we'll be sure the test is run in all models, and if a model is special it will skip the test. WDYT?
@Cyrilvallez @zucchini-nlp great PR comments, we can (and should!) automate test runs. The latest commit removes the manual parameterization.
tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py
Thanks for making the test suite better
```python
# Retrieves the RoPE layer class from the base model class. Assumption: the RoPE layer is under a few
# possible attribute names and is found in the base model class. In some (inconsistent) cases, it may be
# found in the self_attention layer instead.
base_model = self.model_tester.base_model_class(config)
```
We might need to call `model.get_decoder()` as well, for models where the LM backbone is hidden inside the base model. Though I guess these tests aren't yet used in multimodal models.
Good point.
Given that the tests are only run on decoder-only models for now, I'd rather leave it as is (and upgrade when it's needed) 🤗
Super nice, thanks!
tests/causal_lm_tester.py
```python
for rope_attr in possible_rope_attributes:
    rope_class = getattr(base_model, rope_attr, None)  # expected pattern
    if (
        rope_class is None
        and hasattr(base_model, "layers")
        and hasattr(base_model.layers[0], "self_attention")
    ):
        rope_class = getattr(base_model.layers[0].self_attention, rope_attr, None)  # fallback
    if rope_class is not None:
        rope_class = type(rope_class)
        break
```
I think we can make it a bit more general / catch more modules if we do something like

```python
for name, module in model.named_modules():
    if any(potential_name in name for potential_name in possible_rope_attributes):
        rope_class = type(module)
        break
```

-- it would avoid edge cases when layers are not named `layers`, or the attention is not `self_attention`.
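As a standalone illustration of this name-based lookup, here is a minimal sketch using plain Python stand-ins (`TinyModel`, `RotaryEmbedding`, and the hand-rolled `named_modules()` generator are hypothetical mocks that mimic `torch.nn.Module.named_modules()`, not real transformers classes; only the search loop mirrors the suggestion above):

```python
class RotaryEmbedding:
    """Stand-in for a model's RoPE module."""


class Attention:
    def __init__(self):
        self.rotary_emb = RotaryEmbedding()


class TinyModel:
    def __init__(self):
        # Note: the attention block is deliberately NOT named "self_attention",
        # which is exactly the edge case the name-based search handles.
        self.attn = Attention()

    def named_modules(self):
        # Mimics torch.nn.Module.named_modules(): yields (qualified_name, module) pairs.
        yield "", self
        yield "attn", self.attn
        yield "attn.rotary_emb", self.attn.rotary_emb


possible_rope_attributes = ("rotary_emb", "rotary_embedding", "rotary_pos_emb")


def find_rope_class(model):
    # Walk every submodule; match on the qualified name rather than a fixed
    # attribute path, so nesting and attribute naming don't matter.
    for name, module in model.named_modules():
        if any(candidate in name for candidate in possible_rope_attributes):
            return type(module)
    return None


rope_class = find_rope_class(TinyModel())
print(rope_class.__name__)  # -> RotaryEmbedding
```

Because the match is on the qualified module name, the same loop finds the RoPE module whether it lives on the base model, inside an attention block, or anywhere else in the tree.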
Added 👍
(and confirmed that it doesn't have a significant negative impact on test runtime)
[For maintainers] Suggested jobs to run (before merge): run-slow: arcee, dbrx, deepseek_v2, ernie4_5, ernie4_5_moe, hunyuan_v1_dense, hunyuan_v1_moe, llama, minimax, mistral, nemotron, phi, phi3, phimoe, recurrent_gemma, stablelm
* enable rope tests
* no manual rope test parameterization
* Apply suggestions from code review
* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py
* PR comment: use generalist torch code to find the rope layer
What does this PR do?
The `CausalLMModelTest` mixin has RoPE tests, but it requires setting `rotary_embedding_layer` in each model tester. In most models, it was unset, so they were not running RoPE-related tests -- exposing ourselves to easily preventable issues like #40461 😱

This PR:
* removes `rotary_embedding_layer` from the `CausalLMModelTest` mixin;
* adds automatic RoPE layer detection to `CausalLMModelTest`, and uses it to enable RoPE tests on each model.

No more manual errors 🤗