Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用


## 🎉 News
- 2023.1.15: Support yuan series: yuan-2b-instruct, **yuan-2b-janus-instruct**, yuan-51b-instruct, yuan-102b-instruct.
- 🔥2023.1.12: Support **deepseek-moe** series: deepseek-moe-16b, [deepseek-moe-16b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat).
- 🔥2023.1.4: Support for **VLLM deployment**, compatible with the **OpenAI API** style. For more details, please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署)
- 2023.1.4: Update [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) to facilitate viewing the training speed and GPU memory required for different models.
Expand Down Expand Up @@ -140,6 +141,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
- mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary), [mistral-7b-instruct-v2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary), [mixtral-moe-7b](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary), [mixtral-moe-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)
- baichuan series: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)
- internlm series: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)
- yuan series: [yuan-2b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary), [yuan-2b-janus-instruct](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary), [yuan-51b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary), [yuan-102b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)
- xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
- bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
- zephyr series: [zephyr-7b-beta-chat](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)
Expand Down Expand Up @@ -172,7 +174,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
- Custom Dataset
- Supported Templates:
- Text Generation: default-generation, default-generation-bos, chatglm-generation
- Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, yi, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, codefuse-codellama, deepseek-coder, cogagent
- Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, cogagent

## 🔥SCEdit

Expand Down
4 changes: 3 additions & 1 deletion README_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning)是一个可扩展
用户可以查看 [SWIFT官方文档](docs/source/GetStarted/快速使用.md) 来了解详细信息。

## 🎉 新闻
- 2023.1.15: 支持yuan系列: yuan-2b-instruct, **yuan-2b-janus-instruct**, yuan-51b-instruct, yuan-102b-instruct.
- 🔥2023.1.12: 支持**deepseek-moe**系列: deepseek-moe-16b, [deepseek-moe-16b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat).
- 🔥2023.1.4: 支持**VLLM部署**, 兼容**OpenAI API**样式, 具体可以查看[VLLM推理加速与部署](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署).
- 2023.1.4: 更新[Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md), 方便查看不同模型训练的速度和所需显存.
Expand Down Expand Up @@ -138,6 +139,7 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning)是一个可扩展
- mistral 系列: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary), [mistral-7b-instruct-v2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary), [mixtral-moe-7b](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary), [mixtral-moe-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)
- baichuan 系列: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)
- internlm 系列: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)
- yuan 系列: [yuan-2b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary), [yuan-2b-janus-instruct](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary), [yuan-51b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary), [yuan-102b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)
- xverse 系列: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
- bluelm 系列: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
- zephyr 系列: [zephyr-7b-beta-chat](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)
Expand Down Expand Up @@ -170,7 +172,7 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning)是一个可扩展
- 自定义数据集
- 支持的对话模板:
- 文本生成: default-generation, default-generation-bos, chatglm-generation
- 对话: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, yi, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, codefuse-codellama, deepseek-coder, cogagent
- 对话: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, cogagent

## 🔥SCEdit

Expand Down
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@
- `--lora_bias_trainable`: 默认为`'none'`, 可以选择的值: 'none', 'all'. 如果你要将bias全都设置为可训练, 你可以设置为`'all'`.
- `--lora_modules_to_save`: 默认为`[]`. 如果你想要训练embedding, lm_head, 或者layer_norm, 你可以设置此参数, 例如: `--lora_modules_to_save wte ln_1 ln_2 ln_f lm_head`.
- `--lora_dtype`: 默认为`'fp32'`, 指定lora模块的dtype类型. 如果是`AUTO`则跟随原始模块的dtype类型. 你可以选择的值: 'fp16', 'bf16', 'fp32', 'AUTO'.
- `--neftune_alpha`: `NEFTune`添加的噪声系数.
- `--neftune_noise_alpha`: `NEFTune`添加的噪声系数, 可以提升模型在指令微调中的性能, 默认为`None`. 通常可以设置为5, 10, 15. 你可以查看[相关论文](https://arxiv.org/abs/2310.05914).
- `--gradient_checkpointing`: 是否开启gradient checkpointing, 默认为`True`. 该参数可以用于节约显存, 虽然这会略微降低训练速度. 该参数在max_length较大, batch_size较大时作用显著.
- `--deepspeed_config_path`: 用于指定deepspeed的配置文件的路径, 默认为`None`, 即不开启deepspeed. deepspeed可以节约显存. 我们书写了默认的[ZeRO-2的配置文件](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/ds_config/zero2.json).
- `--batch_size`: 训练时的batch_size, 默认为`1`. 增大batch_size可以增加GPU的利用率, 但不一定会增加训练速度, 因为在一个batch中, 需要对较短的句子按该batch中最长句子的长度进行padding, 从而引入无效的计算量.
Expand Down
4 changes: 4 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,10 @@
|internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
|internlm-20b|[Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✔||
|internlm-20b-chat|[Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
|yuan-2b-instruct|[YuanLLM/Yuan2.0-2B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘||
|yuan-2b-janus-instruct|[YuanLLM/Yuan2-2B-Janus-hf](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘||
|yuan-51b-instruct|[YuanLLM/Yuan2.0-51B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘||
|yuan-102b-instruct|[YuanLLM/Yuan2.0-102B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘||
|xverse-7b|[xverse/XVERSE-7B](https://modelscope.cn/models/xverse/XVERSE-7B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
|xverse-7b-chat|[xverse/XVERSE-7B-Chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✘||
|xverse-13b|[xverse/XVERSE-13B](https://modelscope.cn/models/xverse/XVERSE-13B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
Expand Down
23 changes: 21 additions & 2 deletions docs/source/LLM/自定义与拓展.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@

**格式1:**

Pretraining
Pre-Training

```csv
response
Expand Down Expand Up @@ -99,6 +99,24 @@ AAAAA,BBBBB,CCCCC
{"messages": [{"role": "user", "content": "AAAAA"}, {"role": "assistant", "content": "BBBBB"}, {"role": "user", "content": "CCCCC"}, {"role": "assistant", "content": "DDDDD"}]}
```

**Qwen-VL 系列**

输入格式兼容: [qwen-vl github](https://github.com/QwenLM/Qwen-VL#data-preparation), 且同样支持csv, json, jsonl格式.

```json
[{"conversations": [{"from": "user", "value": "Picture 1:<img>img_path</img>\n11111"}, {"from": "assistant", "value": "22222"}]},
{"conversations": [{"from": "user", "value": "Picture 1:<img>img_path</img>\nPicture 2:<img>img_path</img>\naaaaa"}, {"from": "assistant", "value": "bbbbb"}, {"from": "user", "value": "Picture 1:<img>img_path</img>\nccccc"}, {"from": "assistant", "value": "ddddd"}]},
{"conversations": [{"from": "user", "value": "AAAAA"}, {"from": "assistant", "value": "BBBBB"}, {"from": "user", "value": "CCCCC"}, {"from": "assistant", "value": "DDDDD"}]}]
```

**Qwen-Audio 系列**

```jsonl
{"conversations": [{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\n11111"}, {"from": "assistant", "value": "22222"}]}
{"conversations": [{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nAudio 2:<audio>audio_path</audio>\naaaaa"}, {"from": "assistant", "value": "bbbbb"}, {"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nccccc"}, {"from": "assistant", "value": "ddddd"}]}
{"conversations": [{"from": "user", "value": "AAAAA"}, {"from": "assistant", "value": "BBBBB"}, {"from": "user", "value": "CCCCC"}, {"from": "assistant", "value": "DDDDD"}]}
```

**强化学习(DPO)**

```jsonl
Expand Down Expand Up @@ -257,7 +275,8 @@ if __name__ == '__main__':
- `get_function`: 默认值为`None`. 获取model和tokenizer的函数. 如果传入None, 则使用修饰器方案进行模型注册. 如果传入一个函数, 则使用正常方案进行注册.
- `requires`: 默认为`[]`. 表示模型所需要的区别于其他模型的依赖. 该参数一般不需要设置.
- `torch_dtype`: 默认为`None`. 表示模型所推荐使用的torch_dtype. 该参数一般不需要设置.
- `revision`: 默认为`'master'`. 用于指定模型的版本号. 如果`model_id_or_path`是本地的模型目录, 则该参数失效. 该参数一般不需要设置.
- `use_hf`: 默认为`False`, 即设置为modelscope hub. 如果你要使用huggingface hub, 你可以设置为True.
- `revision`: 默认为`None`. 用于指定模型的版本号, 如果`use_hf`为False, 则设置为'master', 如果`use_hf`为True, 则设置为'main'. 如果`model_id_or_path`是本地的模型目录, 则该参数失效. 该参数一般不需要设置.
- `ignore_file_pattern`: 默认为`None`. 表示下载的时候需要忽略的文件名的正则pattern, 该参数会传递给`snapshot_download`. 例如`r'.+\.bin$'`, `r'.+\.savetensors$'`等. 该参数一般不需要设置.
- `**kwargs`: 其他用于注释模型能力的参数. 该参数一般不需要设置.

Expand Down
12 changes: 11 additions & 1 deletion swift/llm/sft.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import inspect
import os
from functools import partial
from typing import Any, Dict, Union
Expand Down Expand Up @@ -152,6 +153,14 @@ def llm_sft(args: SftArguments) -> Dict[str, Union[str, Any]]:
additional_saved_files = []
if args.sft_type == 'full':
additional_saved_files = get_additional_saved_files(args.model_type)

kwargs = {}
parameters = inspect.signature(
Seq2SeqTrainingArguments.__init__).parameters
for key in ['neftune_noise_alpha']:
if key in parameters:
kwargs[key] = getattr(args, key)

training_args = Seq2SeqTrainingArguments(
output_dir=args.output_dir,
evaluation_strategy=evaluation_strategy,
Expand Down Expand Up @@ -203,7 +212,8 @@ def llm_sft(args: SftArguments) -> Dict[str, Union[str, Any]]:
save_on_each_node=args.save_on_each_node,
acc_strategy=args.acc_strategy,
save_safetensors=args.save_safetensors,
logging_first_step=True)
logging_first_step=True,
**kwargs)

if args.gradient_checkpointing:
model.config.use_cache = False # fix transformers==4.36
Expand Down
7 changes: 5 additions & 2 deletions swift/llm/tuner.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Copyright (c) Alibaba, Inc. and its affiliates.

import torch
import transformers
from packaging import version

from swift.trainers import TrainerCallback
from swift.tuners import (LongLoRAConfig, LongLoRAModelType, LoraConfig,
Expand Down Expand Up @@ -83,8 +85,9 @@ def prepare_model(model, args: SftArguments):
else:
raise ValueError(f'args.sft_type: {args.sft_type}')

if args.neftune_alpha > 0.001:
neftune_config = NEFTuneConfig(noise_alpha=args.neftune_alpha)
if version.parse(transformers.__version__) < version.parse(
'4.35') and args.neftune_noise_alpha not in {None, 0.}:
neftune_config = NEFTuneConfig(noise_alpha=args.neftune_noise_alpha)
model = Swift.prepare_model(model, {'neftune': neftune_config})
logger.info(f'neftune_config: {neftune_config}')

Expand Down
7 changes: 5 additions & 2 deletions swift/llm/utils/argument.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ class SftArguments:
lora_modules_to_save: List[str] = field(default_factory=list)
lora_dtype: Literal['fp16', 'bf16', 'fp32', 'AUTO'] = 'fp32'

neftune_alpha: float = 0.0
neftune_noise_alpha: Optional[float] = None # e.g. 5, 10, 15

gradient_checkpointing: Optional[bool] = None
deepspeed_config_path: Optional[str] = None # e.g. 'ds_config/zero2.json'
Expand Down Expand Up @@ -170,6 +170,7 @@ class SftArguments:
per_device_eval_batch_size: Optional[int] = None
# compatibility. (Deprecated)
only_save_model: Optional[bool] = None
neftune_alpha: Optional[float] = None

def __post_init__(self) -> None:
handle_compatibility(self)
Expand Down Expand Up @@ -388,7 +389,7 @@ class InferArguments:
def __post_init__(self) -> None:
if self.ckpt_dir is not None and not self.check_ckpt_dir_correct(
self.ckpt_dir):
raise ValueError(
logger.warning(
f'The checkpoint dir {self.ckpt_dir} passed in is invalid, please make sure'
'the dir contains a `configuration.json` file.')
handle_compatibility(self)
Expand Down Expand Up @@ -623,6 +624,8 @@ def handle_compatibility(args: Union[SftArguments, InferArguments]) -> None:
if isinstance(args, SftArguments):
if args.only_save_model is not None:
args.save_only_model = args.only_save_model
if args.neftune_alpha is not None:
args.neftune_noise_alpha = args.neftune_alpha
if args.per_device_train_batch_size is not None:
args.batch_size = args.per_device_train_batch_size
if args.per_device_eval_batch_size is not None:
Expand Down
Loading