4 changes: 2 additions & 2 deletions README.md
@@ -65,7 +65,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
- 2024.1.4: Support for **VLLM deployment**, compatible with the **OpenAI API** style. For more details, please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署)
- 2024.1.4: Update the [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) to make it easy to view the training speed and GPU memory required for different models.
- 🔥 2023.12.29: Support web-ui for training and inference; use `swift web-ui` after installing ms-swift.
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh) to start training!
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora_ddp_mp/dpo.sh) to start training!
- 🔥 2023.12.28: Support SCEdit! This tuner can significantly reduce GPU memory usage during training and inference and replace ControlNet in controllable image generation scenarios; see the section below for details.
- 2023.12.23: Support [codegeex2-6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codegeex2_6b).
- 2023.12.19: Support [phi2-3b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/phi2_3b).
@@ -113,7 +113,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
- Quickly perform **inference** on LLMs and build a **Web-UI**; see the [LLM Inference Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM推理文档.md).
- Rapidly **fine-tune** LLMs, run inference, and build a Web-UI; see the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md).
- Use the web **interface** to fine-tune and run inference; see the [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
- **DPO training** is supported; start with [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh).
- **DPO training** is supported; start with [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora_ddp_mp/dpo.sh).
- Use VLLM for **inference acceleration** and **deployment (OpenAI API)**; please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md) for more information.
- View the models and datasets supported by Swift in [supported models and datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
- Extend and customize models, datasets, and chat templates in Swift; see [Customization and Expansion](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
4 changes: 2 additions & 2 deletions README_CN.md
@@ -63,7 +63,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
- 2024.1.4: Support **VLLM deployment**, compatible with the **OpenAI API** style; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署) for details.
- 2024.1.4: Update the [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) to make it easy to view the training speed and GPU memory required for different models.
- 🔥 2023.12.29: Support web-ui for SFT training and inference; after installing ms-swift, launch it with `swift web-ui`.
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh) to start training!
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora_ddp_mp/dpo.sh) to start training!
- 🔥 2023.12.28: Support SCEdit! This tuner significantly reduces GPU memory usage in U-Net and supports low-memory controllable image generation (replacing ControlNet); see the section below for details.
- 2023.12.23: Support [codegeex2-6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codegeex2_6b).
- 2023.12.19: Support [phi2-3b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/phi2_3b).
@@ -111,7 +111,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
- Quickly perform **inference** on LLMs and build a **Web-UI**; see the [LLM Inference Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM推理文档.md).
- Rapidly **fine-tune** LLMs, run inference, and build a Web-UI; see the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md).
- Use the web **interface** to fine-tune and run inference; see the [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
- **DPO training** is supported; start with [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh).
- **DPO training** is supported; start with [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora_ddp_mp/dpo.sh).
- Use VLLM for **inference acceleration** and **deployment (OpenAI API)**; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md).
- View the models and datasets supported by Swift in [supported models and datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
- **Extend** models, datasets, and chat templates in Swift; see [Customization and Expansion](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
5 changes: 5 additions & 0 deletions docs/source/GetStarted/界面训练推理.md
@@ -5,3 +5,8 @@ swift web-ui
```

This launches UI-based training and inference.

The web-ui command takes no arguments; everything is controlled from the interface. However, a few environment variables are available:

> WEBUI_SHARE=1 controls whether the Gradio app runs in share mode
> SWIFT_UI_LANG=en/zh controls the language of the web-ui interface
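
For example, a shared, English-language UI could be launched as follows (a minimal sketch using only the variables listed above):

```bash
# Launch the web-ui in Gradio share mode with the English interface
WEBUI_SHARE=1 SWIFT_UI_LANG=en swift web-ui
```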
97 changes: 97 additions & 0 deletions docs/source/LLM/LLM人类对齐训练文档.md
@@ -0,0 +1,97 @@
# LLM Human Alignment Training Documentation
## Table of Contents
- [Environment Setup](#environment-setup)
- [Human Alignment Training](#human-alignment-training)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 all work; a GPU with <=24 GB of memory requires at least a dual-GPU environment. Because human alignment training loads two models onto one card, it uses more GPU memory than fine-tuning: roughly the additional footprint of one inference model.
```bash
# Set the global pip mirror
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

# Align the environment (if you hit errors, run the code below; the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```

## Human Alignment Training
The shell script below runs a human alignment training job. First, change into the working directory:

```shell
cd examples/pytorch/llm
```

Run the following command:

```shell
# Experimental environment: 4*A100
# Memory usage: 4 * 20G, dual-GPU device_map * 2 DDP processes
nproc_per_node=2

PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
llm_dpo.py \
--model_type mistral-7b \
--ref_model_type mistral-7b \
--model_revision master \
--sft_type lora \
--tuner_backend swift \
--dtype AUTO \
--output_dir output \
--dataset hh-rlhf \
--train_dataset_sample -1 \
--truncation_strategy truncation_left \
--val_dataset_sample 2000 \
--num_train_epochs 3 \
--max_length 1024 \
--max_prompt_length 512 \
--check_dataset_strategy none \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--weight_decay 0.01 \
--learning_rate 5e-5 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--max_grad_norm 1.0 \
--warmup_ratio 0.03 \
--eval_steps 2000 \
--save_steps 2000 \
--save_total_limit 2 \
--logging_steps 10 \
```
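
Note that the script keeps the effective global batch size at 16 regardless of the process count: `--gradient_accumulation_steps` is computed as `16 / $nproc_per_node`, so batch_size × processes × accumulation steps stays constant. A quick sanity check of the arithmetic:

```bash
# batch_size (1) * nproc_per_node (2) * gradient_accumulation_steps (16 / 2 = 8) = 16
echo $((1 * 2 * 16 / 2))   # prints 16
```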

### Shell Scripts

The shell scripts can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/dpo).

```bash
# The scripts below must be run from this directory
cd examples/pytorch/llm
```

**Tips**:

- We set `--gradient_checkpointing true` by default during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, set `--dtype AUTO` or `--dtype fp16`, since it does not support bf16.
- If your machine has high-performance GPUs such as the A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces GPU memory usage (A10, 3090, V100, and similar GPUs do not support training with flash-attn). Models that support flash-attn are listed in [Supported Models](./支持的模型和数据集.md#模型)
- If you need to train offline, use `--model_cache_dir` and set `--check_model_is_latest false`, as in the sketch after these tips. See [Command Line Arguments](./命令行参数.md) for the exact meaning of each argument.
- If you want to push weights to the ModelScope Hub during training, set `--push_to_hub true`.
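
A minimal sketch of such an offline run (the cache path is a placeholder; the flags are the ones described above):

```bash
# Hypothetical offline DPO run: load the model from a local cache directory and
# skip the remote latest-version check
python llm_dpo.py \
    --model_type mistral-7b \
    --model_cache_dir /path/to/local/mistral-7b \
    --check_model_is_latest false
```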

```bash
# DPO training for mistral-7b, max_length=1024, bs=1
# Recommended experimental environment: V100, A10, 3090; 2, 4, or 8 GPUs
bash scripts/dpo/lora_ddp_mp/dpo.sh
bash scripts/dpo/lora_ddp_mp/infer.sh
```

Since DPO training produces either a full model or adapter weights, the LoRA merge and inference steps are the same as after fine-tuning, so please refer to the corresponding steps in the [Fine-tuning Documentation](./LLM微调文档#Merge LoRA).
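
One way to fold the merge into inference (a sketch based on the infer.sh shown later in this PR; the checkpoint path is a placeholder) is to flip `--merge_lora_and_save`:

```bash
# Run inference and merge the trained LoRA adapter into the base weights
python llm_infer.py \
    --ckpt_dir output/mistral-7b/vx-xxx-xxx/checkpoint-xxx \
    --load_dataset_config true \
    --merge_lora_and_save true
```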
7 changes: 7 additions & 0 deletions docs/source/LLM/命令行参数.md
@@ -92,6 +92,13 @@
- `--repetition_penalty`: Default is `1.05`. This argument only takes effect when `predict_with_generate` is set to True.
- `--num_beams`: Default is `1`. This argument only takes effect when `predict_with_generate` is set to True.

## DPO Arguments

DPO arguments inherit from the SFT arguments above and add the following:

- `--ref_model_type`: The type of the reference model; the available `model_type` values are listed in `MODEL_MAPPING.keys()`.
- `--max_prompt_length`: The maximum prompt length. This argument is passed to the DPOTrainer so that prompt lengths do not exceed this value. Default is `1024`.


## merge-lora infer app-ui Command Line Arguments
- `--model_type`: Default is `None`. For a detailed description, see the `sft.sh` command line arguments section.
9 changes: 9 additions & 0 deletions docs/source/LLM/自定义与拓展.md
@@ -99,7 +99,16 @@ AAAAA,BBBBB,CCCCC
{"messages": [{"role": "user", "content": "AAAAA"}, {"role": "assistant", "content": "BBBBB"}, {"role": "user", "content": "CCCCC"}, {"role": "assistant", "content": "DDDDD"}]}
```

**Reinforcement Learning (DPO)**

```jsonl
{"query": "11111", "response": "22222", "rejected_response": "33333"}
{"query": "aaaaa", "response": "bbbbb", "rejected_response": "ccccc"}
{"query": "AAAAA", "response": "BBBBB", "rejected_response": "CCCCC"}
```
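
Assuming such a file is saved locally, it could be passed to DPO training via the custom dataset path flag (a sketch; `--custom_train_dataset_path` is taken from the custom-dataset section of this document, and the file name is a placeholder):

```bash
# Hypothetical run: DPO training on a local JSONL file in the format shown above
python llm_dpo.py \
    --model_type mistral-7b \
    --custom_train_dataset_path dpo_data.jsonl
```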

### Registering Datasets

Below is an example of **registering a dataset**. The complete Python file can be found at [custom.py](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/custom.py), and the shell script at [custom](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/custom).

6 changes: 3 additions & 3 deletions examples/pytorch/llm/scripts/dpo/lora/dpo.sh
@@ -1,7 +1,7 @@
# Experimental environment: 8*A100
# Memory usage: 8 * 50G
# Experimental environment: 2*A100
# Memory usage: 2 * 20G
PYTHONPATH=../../.. \
accelerate launch llm_dpo.py \
python llm_dpo.py \
--model_type mistral-7b \
--ref_model_type mistral-7b \
--model_revision master \
40 changes: 40 additions & 0 deletions examples/pytorch/llm/scripts/dpo/lora_ddp_mp/dpo.sh
@@ -0,0 +1,40 @@
# Experimental environment: 4*A100
# Memory usage: 4 * 20G
nproc_per_node=2

PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
llm_dpo.py \
--model_type mistral-7b \
--ref_model_type mistral-7b \
--model_revision master \
--sft_type lora \
--tuner_backend swift \
--dtype AUTO \
--output_dir output \
--dataset hh-rlhf \
--train_dataset_sample -1 \
--truncation_strategy truncation_left \
--val_dataset_sample 2000 \
--num_train_epochs 3 \
--max_length 1024 \
--max_prompt_length 512 \
--check_dataset_strategy none \
--lora_rank 8 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--weight_decay 0.01 \
--learning_rate 5e-5 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--max_grad_norm 1.0 \
--warmup_ratio 0.03 \
--eval_steps 2000 \
--save_steps 2000 \
--save_total_limit 2 \
--logging_steps 10 \
14 changes: 14 additions & 0 deletions examples/pytorch/llm/scripts/dpo/lora_ddp_mp/infer.sh
@@ -0,0 +1,14 @@
# Experimental environment: A10, 3090
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_infer.py \
--ckpt_dir output/mistral-7b/vx-xxx-xxx/checkpoint-xxx \
--load_dataset_config true \
--eval_human true \
--use_flash_attn false \
--max_new_tokens 1024 \
--temperature 0.3 \
--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
4 changes: 3 additions & 1 deletion swift/llm/dpo.py
@@ -31,7 +31,7 @@ def llm_dpo(args: DPOArguments) -> str:

# Loading Model and Tokenizer
model_kwargs = {'low_cpu_mem_usage': True}
if (is_dist() and not is_ddp_plus_mp()) or 'HF_ACCELERATOR' in os.environ:
if is_dist() and not is_ddp_plus_mp():
model_kwargs['device_map'] = {'': local_rank}
else:
model_kwargs['device_map'] = 'auto'
@@ -61,6 +61,8 @@ def llm_dpo(args: DPOArguments) -> str:
ref_model = deepcopy(model)

logger.info(f'model_config: {model.config}')
if hasattr(model, 'hf_device_map'):
logger.info(f'model device_map {model.hf_device_map}')
generation_config = GenerationConfig(
max_new_tokens=args.max_new_tokens,
temperature=args.temperature,
2 changes: 1 addition & 1 deletion swift/llm/tuner.py
@@ -85,7 +85,7 @@ def prepare_model(model, args: SftArguments):

if args.neftune_alpha > 0.001:
neftune_config = NEFTuneConfig(noise_alpha=args.neftune_alpha)
model = Swift.prepare_model(model, neftune_config)
model = Swift.prepare_model(model, {'neftune': neftune_config})
logger.info(f'neftune_config: {neftune_config}')

class TrainerAdapterCallback(TrainerCallback):
4 changes: 2 additions & 2 deletions swift/tuners/base.py
@@ -56,7 +56,7 @@ def __init__(self,
new_adapters.append(DEFAULT_ADAPTER)
else:
logger.warn(
f'Adater {DEFAULT_ADAPTER} has been patched, skip.')
f'Adapter {DEFAULT_ADAPTER} has been patched, skip.')
elif isinstance(config, dict):
assert (all(isinstance(c, SwiftConfig) for c in config.values()))
for adapter_name, _config in config.items():
@@ -66,7 +66,7 @@ def __init__(self,
new_adapters.append(adapter_name)
else:
logger.warn(
f'Adater {adapter_name} has been patched, skip.')
f'Adapter {adapter_name} has been patched, skip.')
self.model = model

self.extra_state_keys = extra_state_keys or []
2 changes: 1 addition & 1 deletion swift/tuners/neftune.py
@@ -55,7 +55,7 @@ def neftune_hook(module, args, output):
sub_module.nef_activated = True

def state_dict_callback(state_dict, adapter_name):
return state_dict
return {}

def mark_trainable_callback(model):
return