Merged
Conversation
The Qwen3 MoE config was missing the attribute mapping for the `num_local_experts` config variable, which made it impossible to load FP8 quantized models due to the following exception:
```
Traceback (most recent call last):
File ".../exps/train-qwen3-lora.py", line 4, in <module>
base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-30B-A3B-Thinking-2507-FP8')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/models/auto/auto_factory.py", line 372, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/modeling_utils.py", line 4075, in from_pretrained
hf_quantizer.preprocess_model(
File ".../transformers/src/transformers/quantizers/base.py", line 167, in preprocess_model
self._process_model_before_weight_loading(model, **kwargs)
File ".../transformers/src/transformers/quantizers/quantizer_finegrained_fp8.py", line 106, in _process_model_before_weight_loading
model = replace_with_fp8_linear(
^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 617, in replace_with_fp8_linear
new_module = FP8Expert(
^^^^^^^^^^
File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 496, in __init__
self.num_experts = config.num_local_experts
^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/configuration_utils.py", line 164, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3MoeConfig' object has no attribute 'num_local_experts'
```
A small reproducer is added in the form of a unit test.
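For reference, a minimal sketch of what such a reproducer could look like (assuming the fix aliases `num_local_experts` to the existing `num_experts` field via the config's `attribute_map`; this is not necessarily the exact test added in this PR):

```python
from transformers import Qwen3MoeConfig

# Hypothetical reproducer: before the fix, reading `num_local_experts`
# raised AttributeError because Qwen3MoeConfig only defines `num_experts`.
config = Qwen3MoeConfig()
assert config.num_local_experts == config.num_experts
```

In `transformers`, `PretrainedConfig.attribute_map` is the mechanism that resolves such aliases in `__getattribute__`, which is the code path visible in the traceback above.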
Force-pushed from a98c65f to efe56ef
Member
cc @MekkCyber for quants
vasqu approved these changes on Jan 26, 2026
Contributor
vasqu left a comment
LGTM, see the fix I meant for proper FP8 usage in the tests, but cc @MekkCyber @SunMarc for sanity checking in case I'm wrong
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen3_moe, qwen3_omni_moe, finegrained_fp8
Contributor
Merging since I verified locally that it works.