Merged
Conversation
The Qwen3 MoE config was missing the attribute mapping for the `num_local_experts` config variable, which made it impossible to load FP8 quantized models due to the following exception:
```
Traceback (most recent call last):
File ".../exps/train-qwen3-lora.py", line 4, in <module>
base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-30B-A3B-Thinking-2507-FP8')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/models/auto/auto_factory.py", line 372, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/modeling_utils.py", line 4075, in from_pretrained
hf_quantizer.preprocess_model(
File ".../transformers/src/transformers/quantizers/base.py", line 167, in preprocess_model
self._process_model_before_weight_loading(model, **kwargs)
File ".../transformers/src/transformers/quantizers/quantizer_finegrained_fp8.py", line 106, in _process_model_before_weight_loading
model = replace_with_fp8_linear(
^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 617, in replace_with_fp8_linear
new_module = FP8Expert(
^^^^^^^^^^
File ".../transformers/src/transformers/integrations/finegrained_fp8.py", line 496, in __init__
self.num_experts = config.num_local_experts
^^^^^^^^^^^^^^^^^^^^^^^^
File ".../transformers/src/transformers/configuration_utils.py", line 164, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3MoeConfig' object has no attribute 'num_local_experts'
```
A small reproducer is added in the form of a unit test.
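For reference, a minimal sketch of what such a reproducer could look like (assuming the fix aliases `num_local_experts` to the existing `num_experts` field via the config's `attribute_map`; this is not necessarily the exact test added in this PR):

```python
from transformers import Qwen3MoeConfig

# Hypothetical reproducer: before the fix, reading `num_local_experts`
# raised AttributeError because Qwen3MoeConfig only defines `num_experts`.
config = Qwen3MoeConfig()
assert config.num_local_experts == config.num_experts
```

In `transformers`, `PretrainedConfig.attribute_map` is the mechanism that resolves such aliases in `__getattribute__`, which is the code path visible in the traceback above.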
Force-pushed from a98c65f to efe56ef
Member
cc @MekkCyber for quants
vasqu approved these changes on Jan 26, 2026
Contributor
vasqu left a comment
LGTM, see the fix I meant for proper FP8 usage in the tests, but cc @MekkCyber @SunMarc for sanity checking in case I'm wrong
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen3_moe, qwen3_omni_moe, finegrained_fp8
Contributor
Merging since I verified locally that it works.