[model] Support Marco #9137
Code Review
This pull request updates the Qwen3.5 best practices documentation by adding links to the Megatron-LM repository and introduces support for the Marco-Nano and Marco-Mini models. However, these dense models were incorrectly registered under a Mixture-of-Experts (MoE) architecture block, which will cause loading errors because they do not match the expected architecture.
```python
ModelGroup([
    Model('AIDC-AI/Marco-Nano-Base', 'AIDC-AI/Marco-Nano-Base'),
    Model('AIDC-AI/Marco-Nano-Instruct', 'AIDC-AI/Marco-Nano-Instruct'),
    Model('AIDC-AI/Marco-Mini-Base', 'AIDC-AI/Marco-Mini-Base'),
    Model('AIDC-AI/Marco-Mini-Instruct', 'AIDC-AI/Marco-Mini-Instruct'),
    Model('AIDC-AI/Marco-Mini-Global-Base', 'AIDC-AI/Marco-Mini-Global-Base'),
], TemplateType.qwen3),
```
The Marco-Nano and Marco-Mini models are dense models, but they are currently registered within the LLMModelType.qwen3_moe block, which is configured with architectures=['Qwen3MoeForCausalLM']. Since these models do not have a Mixture-of-Experts architecture, loading them using this registration will fail. They should be moved to a dense model registration group, such as LLMModelType.qwen3 (which uses Qwen3ForCausalLM) or LLMModelType.qwen2 (which uses Qwen2ForCausalLM), depending on their underlying architecture.
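A minimal sketch of the suggested fix, assuming ms-swift's `register_model`/`ModelMeta` registration pattern and that the names used are already in scope in the file containing the original registration; the `get_model_tokenizer_with_flash_attn` loader is an assumption borrowed from the existing dense Qwen registrations, not something confirmed by this PR:

```python
# Sketch: register the dense Marco models under LLMModelType.qwen3, whose
# architectures entry (Qwen3ForCausalLM) matches dense checkpoints, instead
# of the qwen3_moe block (Qwen3MoeForCausalLM) used in the diff above.
register_model(
    ModelMeta(
        LLMModelType.qwen3,
        [
            ModelGroup([
                Model('AIDC-AI/Marco-Nano-Base', 'AIDC-AI/Marco-Nano-Base'),
                Model('AIDC-AI/Marco-Nano-Instruct', 'AIDC-AI/Marco-Nano-Instruct'),
                Model('AIDC-AI/Marco-Mini-Base', 'AIDC-AI/Marco-Mini-Base'),
                Model('AIDC-AI/Marco-Mini-Instruct', 'AIDC-AI/Marco-Mini-Instruct'),
                Model('AIDC-AI/Marco-Mini-Global-Base', 'AIDC-AI/Marco-Mini-Global-Base'),
            ]),
        ],
        TemplateType.qwen3,
        get_model_tokenizer_with_flash_attn,  # assumed loader; reuse whatever the qwen3 block already uses
        architectures=['Qwen3ForCausalLM'],   # dense architecture named in the review comment
    ))
```

In practice the simpler change is likely to move the new `ModelGroup` into the existing `LLMModelType.qwen3` registration rather than creating a second `ModelMeta`, so the dense Marco models inherit that block's loader and version requirements.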
#9132