[docs] support deepseek_v4 readme#9430
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds the --loss_scale ignore_empty_think parameter to several Megatron SFT training scripts and introduces a new README documentation file for DeepSeek-V4 fine-tuning support. The reviewer identified several issues in the newly added README, including grammatical errors, typos (such as 'paddind_free'), incorrect markdown code block closures, and multiple empty placeholder sections that need to be completed.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive training and fine-tuning support for DeepSeek-V4 in Megatron-SWIFT, including documentation in both Chinese and English, and updates various example scripts to use the --loss_scale ignore_empty_think flag. However, there are a couple of critical issues in the implementation: first, the configuration class reassignment in vllm_engine.py was accidentally nested under a model-specific check, which will break configuration loading for all other models; second, the layer-filtering logic in the documentation scripts only checks for layers. instead of model.layers., which could fail to filter layers properly and lead to out-of-memory errors when loading official weights.
| if k.startswith('layers.'): | ||
| idx = int(k[len('layers.'):].split('.', 1)[0]) | ||
| if idx >= 4: | ||
| continue |
There was a problem hiding this comment.
The official HuggingFace DeepSeek-V4 model keys typically start with model.layers. rather than layers.. If the keys are prefixed with model., the condition k.startswith('layers.') will evaluate to False, and the script will fail to filter out layers 4 and above. This will cause the script to load the entire model, potentially leading to out-of-memory (OOM) errors. Please update the condition to support both model.layers. and layers. prefixes.
| if k.startswith('layers.'): | |
| idx = int(k[len('layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue | |
| if k.startswith('model.layers.'): | |
| idx = int(k[len('model.layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue | |
| elif k.startswith('layers.'): | |
| idx = int(k[len('layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue |
| if k.startswith('layers.'): | ||
| idx = int(k[len('layers.'):].split('.', 1)[0]) | ||
| if idx >= 4: | ||
| continue |
There was a problem hiding this comment.
The official HuggingFace DeepSeek-V4 model keys typically start with model.layers. rather than layers.. If the keys are prefixed with model., the condition k.startswith('layers.') will evaluate to False, and the script will fail to filter out layers 4 and above. This will cause the script to load the entire model, potentially leading to out-of-memory (OOM) errors. Please update the condition to support both model.layers. and layers. prefixes.
| if k.startswith('layers.'): | |
| idx = int(k[len('layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue | |
| if k.startswith('model.layers.'): | |
| idx = int(k[len('model.layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue | |
| elif k.startswith('layers.'): | |
| idx = int(k[len('layers.'):].split('.', 1)[0]) | |
| if idx >= 4: | |
| continue |
No description provided.