Fix loading of Qwen3 FP8 #43494

Merged

vasqu merged 2 commits into huggingface:main from githubnemo:issue/fp8-loading-exception on Jan 27, 2026
Conversation

@githubnemo
Contributor

The Qwen3 MoE config was missing the attribute mapping for the num_local_experts config variable, which made it impossible to load FP8-quantized models due to the following exception:

```
Traceback (most recent call last):
  File ".../exps/train-qwen3-lora.py", line 4, in <module>
    base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-30B-A3B-Thinking-2507-FP8')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../transformers/src/transformers/models/auto/auto_factory.py", line 372, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../transformers/src/transformers/modeling_utils.py", line 4075, in from_pretrained
    hf_quantizer.preprocess_model(
  File ".../transformers/src/transformers/quantizers/base.py", line 167, in preprocess_model
    self._process_model_before_weight_loading(model, **kwargs)
  File ".../transformers/src/transformers/quantizers/quantizer_finegrained_fp8.py", line 106, in _process_model_before_weight_loading
    model = replace_with_fp8_linear(
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 617, in replace_with_fp8_linear
    new_module = FP8Expert(
                 ^^^^^^^^^^
  File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 496, in __init__
    self.num_experts = config.num_local_experts
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../transformers/src/transformers/configuration_utils.py", line 164, in __getattribute__
    return super().__getattribute__(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3MoeConfig' object has no attribute 'num_local_experts'
```

A small reproducer is added in the form of a unit test.
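The mechanism behind the fix is `PretrainedConfig.attribute_map`, visible in the traceback's `__getattribute__` frame: attribute lookups are resolved through this mapping, so a config that stores its expert count under one name can also answer to another. Below is a minimal sketch of the idea using a hypothetical toy config (the actual PR edits `Qwen3MoeConfig` itself, and the exact mapping entry may differ):

```python
from transformers import PretrainedConfig

# Hypothetical toy config illustrating the attribute_map mechanism.
# Qwen3's MoE config stores the expert count as `num_experts`, while the
# FP8 integration reads `config.num_local_experts`; mapping one name to
# the other lets both lookups succeed.
class ToyMoeConfig(PretrainedConfig):
    model_type = "toy_moe"
    # PretrainedConfig.__getattribute__ consults attribute_map, so
    # `num_local_experts` resolves to the canonical `num_experts`.
    attribute_map = {"num_local_experts": "num_experts"}

    def __init__(self, num_experts=128, **kwargs):
        self.num_experts = num_experts
        super().__init__(**kwargs)

config = ToyMoeConfig()
assert config.num_local_experts == config.num_experts == 128
```

With such a mapping in place, `FP8Expert.__init__`'s read of `config.num_local_experts` no longer raises `AttributeError`.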

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@githubnemo force-pushed the issue/fp8-loading-exception branch from a98c65f to efe56ef on January 26, 2026 11:58
@Rocketknight1
Member

cc @MekkCyber for quants

Contributor

@vasqu left a comment

LGTM, see the fix I meant for proper FP8 usage in the tests, but cc @MekkCyber @SunMarc for a sanity check in case I'm wrong

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_moe, qwen3_omni_moe, finegrained_fp8

@vasqu
Contributor

vasqu commented Jan 27, 2026

Merging since I verified locally that it works
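For anyone repeating that local check, a minimal sketch follows. The checkpoint id comes from the traceback above; `device_map="auto"` is an assumption on my part, since the 30B FP8 checkpoint needs accelerator memory to load:

```python
from transformers import AutoModelForCausalLM

# With the attribute mapping in place, FP8 checkpoint loading should no
# longer raise AttributeError during quantizer preprocessing.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Thinking-2507-FP8",
    device_map="auto",  # assumed: place FP8 weights on available accelerators
)
print(model.config.num_local_experts)  # resolves via attribute_map
```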

@vasqu vasqu merged commit a1f63d5 into huggingface:main Jan 27, 2026
21 checks passed