
Allow Attention and Experts to be used as standalone modules #43622

Merged
Cyrilvallez merged 9 commits into main from experts-default on Jan 30, 2026

Conversation

@Cyrilvallez Cyrilvallez (Member) commented Jan 30, 2026

What does this PR do?

As per the title. This also fixes a vLLM regression on experts. It allows the Experts and Attention modules to be used on their own, for example:

import torch
from transformers import AutoConfig
from transformers.models.mixtral.modeling_mixtral import MixtralExperts
config = AutoConfig.from_pretrained("mistralai/mixtral-8x7b-v0.1")
experts = MixtralExperts(config)
experts.forward(torch.randn((1, 64, experts.config.hidden_size)))
 File "/home/harry/transformers/src/transformers/integrations/moe.py", line 317, in forward
    experts_forward = ALL_EXPERTS_FUNCTIONS[self.config._experts_implementation]
                      ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/harry/transformers/src/transformers/utils/generic.py", line 1070, in __getitem__
    return self._global_mapping[key]
           ~~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: None
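
The same standalone pattern applies to the attention module. The snippet below is a hypothetical sketch, not taken from the PR description: the exact forward signature, in particular whether rotary position embeddings must be passed in explicitly, may differ between versions.

import torch
from transformers import AutoConfig
from transformers.models.mixtral.modeling_mixtral import MixtralAttention, MixtralRotaryEmbedding

config = AutoConfig.from_pretrained("mistralai/mixtral-8x7b-v0.1")
attention = MixtralAttention(config, layer_idx=0)

hidden_states = torch.randn((1, 64, config.hidden_size))
position_ids = torch.arange(64).unsqueeze(0)
# Recent Mixtral attention takes precomputed rotary embeddings (cos, sin);
# older versions compute them internally from position_ids instead.
rotary_emb = MixtralRotaryEmbedding(config=config)
position_embeddings = rotary_emb(hidden_states, position_ids)
attn_output, attn_weights = attention(hidden_states, position_embeddings, attention_mask=None)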

@vasqu vasqu (Contributor) left a comment

Could we add a small test so that we don't have a regression?
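
For reference, a minimal sketch of what such a regression test could look like (hypothetical, not necessarily the test added in this PR), using a tiny config so no weights need to be downloaded:

import torch
from transformers import MixtralConfig
from transformers.models.mixtral.modeling_mixtral import MixtralExperts


def test_experts_standalone_forward():
    # Tiny, illustrative sizes so the test stays fast.
    config = MixtralConfig(hidden_size=32, intermediate_size=64, num_local_experts=4, num_experts_per_tok=2)
    experts = MixtralExperts(config)
    hidden_states = torch.randn((1, 8, config.hidden_size))
    # Mirrors the repro from the PR description: building the module from a bare
    # config and calling it must not raise a KeyError on the unset
    # config._experts_implementation attribute. Depending on the version, the
    # forward may also expect routing indices/weights as extra arguments.
    experts(hidden_states)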

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hmellor hmellor added the "for patch" label (tag issues / labels that should be included in the next patch) on Jan 30, 2026
@Cyrilvallez Cyrilvallez changed the title from "[moe] Keep the default forward when config attribute is None" to "Allow Attention and Experts to be used as standalone modules" on Jan 30, 2026
@vasqu vasqu (Contributor) left a comment

LGTM, just 2 nits

@github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audio_spectrogram_transformer, audioflamingo3, bamba, bart, bert, bert_generation, bigbird_pegasus, biogpt

@Cyrilvallez Cyrilvallez merged commit 255c62a into main Jan 30, 2026
24 of 26 checks passed
@Cyrilvallez Cyrilvallez deleted the experts-default branch January 30, 2026 16:35