From e19c29f98e486eb35c117037475fecf4c05fbda4 Mon Sep 17 00:00:00 2001
From: Jintao Huang
Date: Thu, 30 Apr 2026 15:40:18 +0800
Subject: [PATCH 1/4] update docs

---
 README.md                                       | 2 +-
 README_CN.md                                    | 2 +-
 docs/source/GetStarted/SWIFT-installation.md    | 2 +-
 docs/source/Megatron-SWIFT/Custom-Model.md      | 2 +-
 docs/source/Megatron-SWIFT/Quick-start.md       | 2 +-
 docs/source_en/GetStarted/SWIFT-installation.md | 2 +-
 docs/source_en/Megatron-SWIFT/Custom-Model.md   | 2 +-
 docs/source_en/Megatron-SWIFT/Quick-start.md    | 2 +-
 requirements/framework.txt                      | 4 ++--
 requirements/install_all.sh                     | 2 +-
 10 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index b3b33d0672..935bf99bd4 100644
--- a/README.md
+++ b/README.md
@@ -145,7 +145,7 @@ Running Environment:
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.30 | 0.29.1 | RLHF |
+| trl | >=0.15,<1.0 | 0.29.1 | RLHF |
 | deepspeed | >=0.14 | 0.18.9 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.19.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |
diff --git a/README_CN.md b/README_CN.md
index 7d192bdc55..84cd20a9b5 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -141,7 +141,7 @@ uv pip install -e . --torch-backend=auto
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.30 | 0.29.1 | RLHF |
+| trl | >=0.15,<1.0 | 0.29.1 | RLHF |
 | deepspeed | >=0.14 | 0.18.9 | 训练 |
 | vllm | >=0.5.1 | 0.11.0/0.19.1 | 推理/部署 |
 | sglang | >=0.4.6 | | 推理/部署 |
diff --git a/docs/source/GetStarted/SWIFT-installation.md b/docs/source/GetStarted/SWIFT-installation.md
index bbc4abd9b5..f406820212 100644
--- a/docs/source/GetStarted/SWIFT-installation.md
+++ b/docs/source/GetStarted/SWIFT-installation.md
@@ -151,7 +151,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.30 | 0.29.1 | RLHF |
+| trl | >=0.15,<1.0 | 0.29.1 | RLHF |
 | deepspeed | >=0.14 | 0.18.9 | 训练 |
 | vllm | >=0.5.1 | 0.11.0/0.19.1 | 推理/部署 |
 | sglang | >=0.4.6 | | 推理/部署 |
diff --git a/docs/source/Megatron-SWIFT/Custom-Model.md b/docs/source/Megatron-SWIFT/Custom-Model.md
index a28d864d40..d8cb38061a 100644
--- a/docs/source/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source/Megatron-SWIFT/Custom-Model.md
@@ -1,7 +1,7 @@
 # Megatron-SWIFT 自定义模型
 
 
-这里介绍如何在Mcore-Bridge中注册模型,以支持新模型在Megatron-SWIFT中的训练。我们将以MiniMax-M2.7为例子介绍。
+这里介绍如何在[Mcore-Bridge](https://github.com/modelscope/mcore-bridge)中注册模型,以支持新模型在Megatron-SWIFT中的训练。我们将以MiniMax-M2.7为例子介绍。
 
 ## 下载模型
 
diff --git a/docs/source/Megatron-SWIFT/Quick-start.md b/docs/source/Megatron-SWIFT/Quick-start.md
index ea39c5d766..d96d7613d8 100644
--- a/docs/source/Megatron-SWIFT/Quick-start.md
+++ b/docs/source/Megatron-SWIFT/Quick-start.md
@@ -78,7 +78,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | transformers | >=4.33 | 4.57.6/5.6.2 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | LoRA |
-| trl | >=0.15,<0.30 | | RLHF |
+| trl | >=0.15,<1.0 | | RLHF |
 
 ## 快速入门案例
 
diff --git a/docs/source_en/GetStarted/SWIFT-installation.md b/docs/source_en/GetStarted/SWIFT-installation.md
index 3e98f55085..f5adbfdfa2 100644
--- a/docs/source_en/GetStarted/SWIFT-installation.md
+++ b/docs/source_en/GetStarted/SWIFT-installation.md
@@ -150,7 +150,7 @@ More images can be found [here](https://modelscope.cn/docs/intro/environment-set
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.30 | 0.29.1 | RLHF |
+| trl | >=0.15,<1.0 | 0.29.1 | RLHF |
 | deepspeed | >=0.14 | 0.18.9 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.19.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |
diff --git a/docs/source_en/Megatron-SWIFT/Custom-Model.md b/docs/source_en/Megatron-SWIFT/Custom-Model.md
index 97ccc501bd..fa6cc0d40b 100644
--- a/docs/source_en/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source_en/Megatron-SWIFT/Custom-Model.md
@@ -1,7 +1,7 @@
 # Megatron-SWIFT Custom Model
 
 
-This guide explains how to register a model in Mcore-Bridge to support training new models in Megatron-SWIFT. We will use MiniMax-M2.7 as an example.
+This guide explains how to register a model in [Mcore-Bridge](https://github.com/modelscope/mcore-bridge) to support training new models in Megatron-SWIFT. We will use MiniMax-M2.7 as an example.
 
 ## Download the Model
 
diff --git a/docs/source_en/Megatron-SWIFT/Quick-start.md b/docs/source_en/Megatron-SWIFT/Quick-start.md
index e1531aa212..2ca0ec938f 100644
--- a/docs/source_en/Megatron-SWIFT/Quick-start.md
+++ b/docs/source_en/Megatron-SWIFT/Quick-start.md
@@ -78,7 +78,7 @@ Recommended Operating Environment:
 | transformers | >=4.33 | 4.57.6/5.6.2 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.20 | | LoRA |
-| trl | >=0.15,<0.30 | | RLHF |
+| trl | >=0.15,<1.0 | | RLHF |
 
 ## Quick Start Example
 
diff --git a/requirements/framework.txt b/requirements/framework.txt
index bb816f824a..a8f9dd302e 100644
--- a/requirements/framework.txt
+++ b/requirements/framework.txt
@@ -32,8 +32,8 @@ sortedcontainers>=1.5.9
 tensorboard
 tiktoken
 tqdm
-transformers>=4.33,<5.7.0
+transformers>=4.33,<5.8.0
 transformers_stream_generator
-trl>=0.15,<0.30
+trl>=0.15,<1.0
 uvicorn
 zstandard
diff --git a/requirements/install_all.sh b/requirements/install_all.sh
index 21a2a9e80c..6abb7c702d 100644
--- a/requirements/install_all.sh
+++ b/requirements/install_all.sh
@@ -3,7 +3,7 @@
 # pip install sglang -U
 pip install "vllm>=0.5.1" -U
 pip install "lmdeploy>=0.5,<0.10.2" -U --no-deps
-pip install "transformers<5.7" "trl<0.30" peft -U
+pip install "transformers<5.8" "trl<1.0" peft -U
 pip install auto_gptq optimum bitsandbytes "gradio<5.33" -U
 pip install git+https://github.com/modelscope/ms-swift.git#egg=ms-swift[all]
 pip install timm "deepspeed<0.19" -U

From da592a1a9b82304c0342ff45455ef5ca3f61dae4 Mon Sep 17 00:00:00 2001
From: Jintao Huang
Date: Thu, 30 Apr 2026 16:26:02 +0800
Subject: [PATCH 2/4] update

---
 swift/dataset/preprocessor/core.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/swift/dataset/preprocessor/core.py b/swift/dataset/preprocessor/core.py
index 24fcd82aff..ab0a7654e5 100644
--- a/swift/dataset/preprocessor/core.py
+++ b/swift/dataset/preprocessor/core.py
@@ -262,8 +262,8 @@ def _new_init(self, schema=None, features=None, *args, **kwargs):
                 }]
                 features['messages'] = messages_feature_with_loss
                 features['rejected_messages'] = messages_feature_with_loss
-                features['positive_messages'] = [messages_feature]
-                features['negative_messages'] = [messages_feature]
+                features['positive_messages'] = messages_feature
+                features['negative_messages'] = messages_feature
                 features['images'] = [{'bytes': Value(dtype='binary'), 'path': Value(dtype='string')}]
                 features['objects'] = {
                     'ref': Sequence(feature=Value(dtype='string'), length=-1),

From ac575b48b56b850cc05699bc5803adc62a700495 Mon Sep 17 00:00:00 2001
From: Jintao Huang
Date: Thu, 30 Apr 2026 16:31:18 +0800
Subject: [PATCH 3/4] update

---
 docs/source/Megatron-SWIFT/Custom-Model.md    | 2 +-
 docs/source_en/Megatron-SWIFT/Custom-Model.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/Megatron-SWIFT/Custom-Model.md b/docs/source/Megatron-SWIFT/Custom-Model.md
index d8cb38061a..1570c42be4 100644
--- a/docs/source/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source/Megatron-SWIFT/Custom-Model.md
@@ -14,7 +14,7 @@ model_dir = safe_snapshot_download('MiniMax/MiniMax-M2.7', download_model=False)
 print(f'model_dir: {model_dir}')
 ```
 
-由于模型权重很大,为了加速支持模型的效率,我们采用懒下载的方式,并只下载`num_layers`层的权重,构建mini版本的模型,用于做接入测试。以MiniMax-M2.7为例,我们构建了一层的BF16版本的权重。若有些模型出现前3层为Dense,之后为MoE,则你可以构建4层的权重。
+由于模型权重很大,为了加速支持模型的效率,我们采用懒下载的方式,并只下载`num_layers`层的权重,构建mini版本的模型,用于做接入测试。以MiniMax-M2.7为例,我们构建了一层的BF16版本的权重。若有些模型出现前3层为Dense,之后为MoE,则你可以构建4层的权重。若出现Attention交替的情况,例如Qwen3.5采用linear-attention和full-attention交替,你也需要更多的层数。
 
 ```python
 import os
diff --git a/docs/source_en/Megatron-SWIFT/Custom-Model.md b/docs/source_en/Megatron-SWIFT/Custom-Model.md
index fa6cc0d40b..718e51c0a7 100644
--- a/docs/source_en/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source_en/Megatron-SWIFT/Custom-Model.md
@@ -14,7 +14,7 @@ model_dir = safe_snapshot_download('MiniMax/MiniMax-M2.7', download_model=False)
 print(f'model_dir: {model_dir}')
 ```
 
-Since model weights are very large, to speed up the model integration process, we use lazy downloading and only download weights for `num_layers` layers, building a mini version of the model for integration testing. Taking MiniMax-M2.7 as an example, we build a one-layer BF16 version of the weights. If some models have the first 3 layers as Dense and the rest as MoE, you can build 4 layers of weights.
+Since model weights are very large, to speed up the model integration process, we use lazy downloading and only download weights for `num_layers` layers, building a mini version of the model for integration testing. Taking MiniMax-M2.7 as an example, we build a one-layer BF16 version of the weights. If some models have the first 3 layers as Dense and the rest as MoE, you can build 4 layers of weights. If alternating attention types are used, for example Qwen3.5 alternates between linear attention and full attention, you will also need more layers.
 
 ```python
 import os

From bc71f095b89ca0f6baa53abc8ee813026d3712eb Mon Sep 17 00:00:00 2001
From: Jintao Huang
Date: Thu, 30 Apr 2026 16:32:26 +0800
Subject: [PATCH 4/4] update

---
 docs/source/Megatron-SWIFT/Custom-Model.md    | 2 +-
 docs/source_en/Megatron-SWIFT/Custom-Model.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/Megatron-SWIFT/Custom-Model.md b/docs/source/Megatron-SWIFT/Custom-Model.md
index 1570c42be4..d517ec9758 100644
--- a/docs/source/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source/Megatron-SWIFT/Custom-Model.md
@@ -1,4 +1,4 @@
-# Megatron-SWIFT 自定义模型
+# 自定义Megatron模型
 
 
 这里介绍如何在[Mcore-Bridge](https://github.com/modelscope/mcore-bridge)中注册模型,以支持新模型在Megatron-SWIFT中的训练。我们将以MiniMax-M2.7为例子介绍。
diff --git a/docs/source_en/Megatron-SWIFT/Custom-Model.md b/docs/source_en/Megatron-SWIFT/Custom-Model.md
index 718e51c0a7..74938d2d1a 100644
--- a/docs/source_en/Megatron-SWIFT/Custom-Model.md
+++ b/docs/source_en/Megatron-SWIFT/Custom-Model.md
@@ -1,4 +1,4 @@
-# Megatron-SWIFT Custom Model
+# Custom Megatron Model
 
 
 This guide explains how to register a model in [Mcore-Bridge](https://github.com/modelscope/mcore-bridge) to support training new models in Megatron-SWIFT. We will use MiniMax-M2.7 as an example.
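
The Custom-Model change in this series says the mini test checkpoint needs enough layers for every kind of layer to appear: 4 layers when the first 3 are Dense and the rest MoE, and more when attention types alternate (as with Qwen3.5's linear/full attention). That rule can be sketched as a "shortest prefix covering all layer types" helper. This is a minimal illustration under stated assumptions, not part of ms-swift or Mcore-Bridge: `min_test_layers` and the `(attention, mlp)` labels are hypothetical, and the 3:1 linear/full pattern below is an assumed example, not Qwen3.5's actual layout.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-layer label: (attention kind, MLP kind), e.g. ("full", "moe").
LayerType = Tuple[str, str]


def min_test_layers(layer_types: Sequence[LayerType]) -> int:
    """Return the length of the shortest prefix of the layer stack that
    contains every distinct layer type, i.e. how many layers a mini test
    checkpoint needs so that each kind of layer is exercised at least once."""
    remaining = set(layer_types)
    for index, layer_type in enumerate(layer_types):
        remaining.discard(layer_type)
        if not remaining:
            return index + 1
    # Only reached when the stack is empty.
    return 0


if __name__ == "__main__":
    # Uniform stack (hypothetical depth): a single layer covers every type.
    uniform: List[LayerType] = [("full", "moe")] * 8
    # First 3 layers Dense, the rest MoE: 4 layers are needed.
    dense_then_moe: List[LayerType] = [("full", "dense")] * 3 + [("full", "moe")] * 45
    # Assumed 3:1 linear/full attention pattern, repeated: 4 layers are needed.
    alternating: List[LayerType] = ([("linear", "moe")] * 3 + [("full", "moe")]) * 12
    for name, stack in [("uniform", uniform), ("dense_then_moe", dense_then_moe),
                        ("alternating", alternating)]:
        print(name, min_test_layers(stack))
```

Combining both patterns (a Dense prefix followed by alternating attention) only adds more distinct types; the helper still returns the first depth at which all of them have appeared. Deriving the real per-layer layout from the model's config is model-specific and not shown here.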