
Conversation

gante
Member

@gante gante commented Sep 2, 2025

What does this PR do?

The CausalLMModelTest mixin has RoPE tests, but they require setting rotary_embedding_layer in each model tester. In most models it was unset, so the RoPE-related tests were not running -- exposing us to easily preventable issues like #40461 😱

This PR:

  • Removes rotary_embedding_layer from the CausalLMModelTest mixin.
  • Automates detection of RoPE-compatible models in CausalLMModelTest, and uses it to enable RoPE tests on each model (a rough sketch of the idea follows this list). No more manual errors 🤗
  • Adds test skips where needed.
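
As a rough illustration of the automatic detection mentioned above (a sketch only -- the helper name and exact heuristic here are hypothetical, not the code added in this PR), the mixin can instantiate the model, look for a rotary-embedding module, and skip the RoPE tests otherwise:

import unittest

def model_uses_rope(model) -> bool:
    # Hypothetical heuristic: a model uses RoPE if any submodule name looks
    # like a rotary embedding module.
    rope_name_hints = ("rotary_emb", "rotary_embedding", "rotary_pos_emb")
    return any(
        any(hint in name for hint in rope_name_hints)
        for name, _ in model.named_modules()
    )

class CausalLMModelTest(unittest.TestCase):
    def test_rope_embeddings(self):
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        base_model = self.model_tester.base_model_class(config)
        if not model_uses_rope(base_model):
            self.skipTest("This model does not use rotary position embeddings")
        # ... the RoPE-specific checks (default / linear / dynamic / yarn init) go here

The actual implementation differs in details (see the review thread below), but the shape of the check is the same: detect RoPE from the instantiated model instead of from a manually set attribute.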

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@Cyrilvallez Cyrilvallez left a comment


Nice! I think we can infer it directly from the model, no? I.e., based on the model's module names, we can get the class dynamically and reinstantiate directly from it.
That would avoid setting it explicitly in all model testers. WDYT?

Member

@zucchini-nlp zucchini-nlp left a comment


Thanks! Also noticed this when working on the RoPE refactoring. I guess we can enable these tests without checking for rotary_embedding_layer and instead have a heuristic that checks whether the model has a layer called self.rotary_embedding.

That way we'll be sure the test is run on all models, and if a model is special it will skip the test. WDYT?

@gante
Member Author

gante commented Sep 3, 2025

@Cyrilvallez @zucchini-nlp great PR comments, we can (and should!) automate test runs.

The latest commit removes the manual rotary_embedding_layer -- we now automatically run RoPE tests on models with RoPE, and programmatically find the RoPE class in the model. This should future-proof things for a while 🤗

@gante gante changed the title from "[RoPE] enable missing rope tests on many modern models" to "[RoPE] run RoPE tests when the model uses RoPE" on Sep 3, 2025
Member

@zucchini-nlp zucchini-nlp left a comment


Thanks for making the test suite better

# Retrieves the RoPE layer class from the base model class. Assumption: the RoPE layer is under a few
# possible attribute names and is found in the base model class. In some (inconsistent) cases, it may be
# found in the self_attention layer instead.
base_model = self.model_tester.base_model_class(config)
Member


we might need to call model.get_decoder() as well, for models where the LM backbone is hidden inside the base model. Though I guess these tests aren't yet used in multimodal models

Member Author


Good point.

Given that the tests are only run on decoder-only models for now, I'd rather leave it as is (and upgrade when it's needed) 🤗
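
For reference, that future upgrade could look roughly like the following -- a sketch only, assuming the multimodal wrapper exposes get_decoder() as transformers multimodal models typically do; the helper name is hypothetical:

def find_rope_class(model, name_hints=("rotary_emb", "rotary_embedding")):
    # Hypothetical helper: search the decoder backbone when the LM is nested
    # inside a wrapper (e.g. a multimodal model); otherwise search the model itself.
    search_root = model.get_decoder() if hasattr(model, "get_decoder") else model
    for name, module in search_root.named_modules():
        if any(hint in name for hint in name_hints):
            return type(module)
    return None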

Member

@Cyrilvallez Cyrilvallez left a comment


Super nice, thanks!

Comment on lines 375 to 385
for rope_attr in possible_rope_attributes:
    rope_class = getattr(base_model, rope_attr, None)  # expected pattern
    if (
        rope_class is None
        and hasattr(base_model, "layers")
        and hasattr(base_model.layers[0], "self_attention")
    ):
        rope_class = getattr(base_model.layers[0].self_attention, rope_attr, None)  # fallback
    if rope_class is not None:
        rope_class = type(rope_class)
        break
Member


I think we can make it a bit more general/catch more modules if we do something like

for name, module in model.named_modules():
    if any(potential_name in name for potential_name in possible_rope_attributes):
        rope_class = type(module)
        break

-- it would avoid edge cases where the decoder layers are not named "layers", or the attention module is not named "self_attention"
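
For context, once the class has been found this way it can be reinstantiated directly from a (possibly modified) config, as suggested earlier in the thread -- a minimal sketch, assuming the rotary classes accept a config argument the way the modern Llama-style implementation does (LlamaRotaryEmbedding stands in for the dynamically found class):

import torch
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

# Sketch only: exercise a RoPE class with a scaled config. In the real test the
# class would be the one found via named_modules() above, not a hard-coded import.
config = LlamaConfig(hidden_size=64, num_attention_heads=4, rope_scaling={"rope_type": "yarn", "factor": 2.0})
rope_class = LlamaRotaryEmbedding
rope = rope_class(config=config)
dummy_hidden = torch.zeros(1, 10, config.hidden_size)
position_ids = torch.arange(10).unsqueeze(0)
cos, sin = rope(dummy_hidden, position_ids)  # per-position rotary cos/sin tables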

Member Author

@gante gante Sep 9, 2025


Added 👍

(and confirmed that it doesn't have a significant negative impact on test runtime)

@gante gante enabled auto-merge (squash) September 9, 2025 16:02
Contributor

github-actions bot commented Sep 9, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: arcee, dbrx, deepseek_v2, ernie4_5, ernie4_5_moe, hunyuan_v1_dense, hunyuan_v1_moe, llama, minimax, mistral, nemotron, phi, phi3, phimoe, recurrent_gemma, stablelm

@gante gante merged commit d33c189 into huggingface:main Sep 9, 2025
24 checks passed
@gante gante deleted the yarn_init_test branch September 9, 2025 16:14
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
* enable rope tests

* no manual rope test parameterization

* Apply suggestions from code review

* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py

* PR comment: use generalist torch code to find the rope layer