Use attention interface in RoFormerSelfAttention by HamzaDogann · Pull Request #46147 · huggingface/transformers

HamzaDogann · 2026-05-21T20:30:41Z

RoFormer's self-attention was using a hardcoded eager implementation,
making it impossible to use alternative attention backends.

This PR replaces the hardcoded computation with ALL_ATTENTION_FUNCTIONS
dispatch and adds a local eager_attention_forward as the default fallback,
preserving existing behavior while enabling flash_attention_2, sdpa,
and custom attention implementations via _attn_implementation.

Closes #46144

Replace hardcoded eager attention computation with `ALL_ATTENTION_FUNCTIONS` dispatch, enabling users to plug in custom attention backends (e.g. `flash_attention_2`, `sdpa`) via the standard `_attn_implementation` config field. Adds a local `eager_attention_forward` function as the default fallback that preserves existing behavior. Closes huggingface#46144

Copilot

Pull request overview

This PR updates RoFormer’s self-attention to use the shared Transformers attention interface (ALL_ATTENTION_FUNCTIONS) instead of a hardcoded eager implementation, enabling alternative attention backends and custom attention implementations via config._attn_implementation.

Changes:

Added a local eager_attention_forward fallback and switched RoFormerSelfAttention to dispatch through ALL_ATTENTION_FUNCTIONS.get_interface(...).
Stored config and precomputed scaling in RoFormerSelfAttention for reuse by attention backends.

…lags - Pass an explicit `is_causal=False` and forward `output_attentions` to the attention interface, so `flash_attention_2`/`sdpa` no longer depend on a missing `module.is_causal` attribute (RoFormer precomputes a bidirectional mask, so no extra causal mask is needed). - Mark `eager_attention_forward` as `# Copied from` BERT so `make fixup` keeps it in sync with the shared implementation. - Declare `_supports_flash_attn`, `_supports_sdpa`, `_supports_flex_attn` and `_supports_attention_backend` on `RoFormerPreTrainedModel` so the new backends can actually be selected.

github-actions · 2026-05-21T21:12:01Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: roformer

Rocketknight1 · 2026-05-22T09:41:22Z

When the issue creator says

The fix is simple and I would like to create a PR for it.

This means you should not rush in and immediately vibe code a PR

Copilot AI review requested due to automatic review settings May 21, 2026 20:30

Copilot started reviewing on behalf of HamzaDogann May 21, 2026 20:31 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread src/transformers/models/roformer/modeling_roformer.py

Comment thread src/transformers/models/roformer/modeling_roformer.py

Comment thread src/transformers/models/roformer/modeling_roformer.py

Rocketknight1 closed this May 22, 2026

Rocketknight1 added the Code agent slop label May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use attention interface in RoFormerSelfAttention#46147

Use attention interface in RoFormerSelfAttention#46147
HamzaDogann wants to merge 2 commits into
huggingface:mainfrom
HamzaDogann:fix/roformer-attention-interface

HamzaDogann commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Rocketknight1 commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

HamzaDogann commented May 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Rocketknight1 commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants