24 changes: 15 additions & 9 deletions examples/pytorch/llm/README.md
@@ -16,9 +16,10 @@

## Features
1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full (full-parameter fine-tuning), ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), baichuan-7b, baichuan-13b, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-13b, llama2-70b, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), qwen-7b-chat, baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
3. supported features: quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, push to ModelScope Hub, custom datasets, ...
4. supported datasets: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, ...
5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy_llama, default, ...

## Prepare the Environment
Experimental environment: A10, 3090, A100, ... (V100 does not support bf16 or quantization)
@@ -58,20 +59,25 @@ cd swift/examples/pytorch/llm
# sft(qlora) and infer qwen-7b; requires 16GB VRAM.
# If you want to use quantization, you need to `pip install bitsandbytes`.
# If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
bash scripts/qwen_7b/qlora/sft.sh
bash scripts/qwen_7b/qlora/infer.sh
bash scripts/qwen_7b_chat/qlora/sft.sh
bash scripts/qwen_7b_chat/qlora/infer.sh

# sft(qlora+ddp) and infer qwen-7b; requires 4 * 16GB VRAM.
bash scripts/qwen_7b/qlora_ddp/sft.sh
bash scripts/qwen_7b/qlora_ddp/infer.sh
bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

# sft(lora+ddp) and infer qwen-7b; requires 4 * 22GB VRAM.
bash scripts/qwen_7b_chat/lora_ddp/sft.sh
bash scripts/qwen_7b_chat/lora_ddp/infer.sh

# sft(full) and infer qwen-7b; requires 95GB VRAM.
bash scripts/qwen_7b/full/sft.sh
bash scripts/qwen_7b/full/infer.sh
bash scripts/qwen_7b_chat/full/sft.sh
bash scripts/qwen_7b_chat/full/infer.sh

# For more scripts, please see the `scripts/` folder
```

## Extend Datasets
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/models.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/datasets.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `instruction`, `output`.
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/model.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/dataset.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `query`, `response`.
3. If you need to extend the template, you can modify the `TEMPLATE_MAPPING` in `utils/preprocess.py`.
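To make point 2 above concrete, here is a minimal sketch of a custom dataset loader. The registration shape of `DATASET_MAPPING` and all names below are assumptions for illustration; only the `get_*_dataset` naming convention and the `query`/`response` columns come from the README.

```python
# Hypothetical entry for utils/dataset.py; check the actual DATASET_MAPPING
# format in the repository, since this registration shape is an assumption.
from datasets import Dataset

def get_my_custom_dataset() -> Dataset:
    # The loader must return the two columns named in the README: query, response.
    data = {
        'query': ['What is 1 + 1?', "Who wrote 'Journey to the West'?"],
        'response': ['1 + 1 equals 2.', "Wu Cheng'en."],
    }
    return Dataset.from_dict(data)

DATASET_MAPPING = {
    'my-custom-dataset': get_my_custom_dataset,  # assumed: name -> loader function
    # ... existing entries ...
}
```

With an entry like this, the dataset could presumably be selected via `--dataset my-custom-dataset` in the sft scripts, mirroring how `alpaca-en,alpaca-zh` is passed below.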
25 changes: 15 additions & 10 deletions examples/pytorch/llm/README_CN.md
@@ -17,10 +17,10 @@

## Features
1. [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full-parameter fine-tuning, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), baichuan-7b, baichuan-13b, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-13b, llama2-70b, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), qwen-7b-chat, baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
3. supported features: quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, push to ModelScope Hub, custom datasets, ...
4. supported datasets: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, ...

5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy_llama, default, ...

## Prepare the Environment
Experimental environment: A10, 3090, A100, ... (V100 does not support bf16 or quantization)
@@ -61,20 +61,25 @@ cd swift/examples/pytorch/llm
# sft(qlora) and infer qwen-7b; requires 16GB VRAM.
# If you want to use quantization, you need to `pip install bitsandbytes`.
# If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
bash scripts/qwen_7b/qlora/sft.sh
bash scripts/qwen_7b/qlora/infer.sh
bash scripts/qwen_7b_chat/qlora/sft.sh
bash scripts/qwen_7b_chat/qlora/infer.sh

# sft(qlora+ddp) and infer qwen-7b; requires 4 * 16GB VRAM.
bash scripts/qwen_7b/qlora_ddp/sft.sh
bash scripts/qwen_7b/qlora_ddp/infer.sh
bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

# sft(lora+ddp) and infer qwen-7b; requires 4 * 22GB VRAM.
bash scripts/qwen_7b_chat/lora_ddp/sft.sh
bash scripts/qwen_7b_chat/lora_ddp/infer.sh

# sft(full) and infer qwen-7b; requires 95GB VRAM.
bash scripts/qwen_7b/full/sft.sh
bash scripts/qwen_7b/full/infer.sh
bash scripts/qwen_7b_chat/full/sft.sh
bash scripts/qwen_7b_chat/full/infer.sh

# For more scripts, please see the `scripts/` folder
```

## Extend Datasets
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/models.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/datasets.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `instruction`, `output`.
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/model.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/dataset.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `query`, `response`.
3. If you need to extend the template, you can modify the `TEMPLATE_MAPPING` in `utils/preprocess.py`.
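Point 1 admits a similar sketch. The entry fields below are assumptions (the README only states that `model_id` may be a local path, in which case `revision` is ignored):

```python
# Hypothetical entry for utils/model.py; the real MODEL_MAPPING fields may differ.
MODEL_MAPPING = {
    'qwen-7b-local': {
        'model_id': '/path/to/local/qwen-7b',  # local path, so `revision` has no effect
        'revision': None,
        'template': 'chatml',  # assumed link to TEMPLATE_MAPPING in utils/preprocess.py
    },
    # ... existing entries ...
}
```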
@@ -1,9 +1,9 @@
# 12G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type baichuan-13b \
--model_type baichuan-13b-chat \
--sft_type lora \
--ckpt_dir "runs/baichuan-13b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/baichuan-13b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--max_new_tokens 1024 \
@@ -5,7 +5,7 @@ torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type baichuan-13b \
--model_type baichuan-13b-chat \
--sft_type lora \
--output_dir runs \
--ddp_backend nccl \
@@ -1,9 +1,9 @@
# 40G
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_infer.py \
--model_type llama2-70b \
--model_type llama2-70b-chat \
--sft_type lora \
--ckpt_dir "runs/llama2-70b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/llama2-70b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--max_new_tokens 1024 \
@@ -2,10 +2,10 @@
# llama2 is not good at Chinese
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_sft.py \
--model_type llama2-70b \
--model_type llama2-70b-chat \
--sft_type lora \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset alpaca-en \
--dataset_sample 20000 \
--num_train_epochs 1 \
--max_length 1024 \
2 changes: 2 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/infer.sh
@@ -3,10 +3,12 @@ CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
2 changes: 2 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/sft.sh
@@ -7,6 +7,7 @@ torchrun \
src/llm_sft.py \
--model_type qwen-7b \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
@@ -15,6 +16,7 @@ torchrun \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
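The two hunks above thread `--template_type chatml` and `--bnb_4bit_comp_dtype bf16` through the qwen-7b qlora scripts. As a rough sketch of what the 4-bit flags presumably amount to inside `src/llm_sft.py` (the actual wiring is an assumption), the `transformers`/`bitsandbytes` configuration would look like:

```python
# Sketch only: assumes --quantization_bit 4 maps to load_in_4bit and
# --bnb_4bit_comp_dtype bf16 maps to bnb_4bit_compute_dtype.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # --quantization_bit 4
    bnb_4bit_compute_dtype=torch.bfloat16,  # --bnb_4bit_comp_dtype bf16
)
model = AutoModelForCausalLM.from_pretrained(
    'qwen/Qwen-7B',              # illustrative model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # --dtype bf16
    trust_remote_code=True,
)
```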
@@ -1,10 +1,11 @@
# 19G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type full \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--max_new_tokens 1024 \
--temperature 0.9 \
@@ -2,8 +2,9 @@
# Experimental environment: 8 * 3090
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type full \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
@@ -22,6 +23,6 @@ python src/llm_sft.py \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-full \
--hub_model_id qwen-7b-chat-full \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
@@ -1,10 +1,11 @@
# 19G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--max_new_tokens 1024 \
--temperature 0.9 \
@@ -5,8 +5,9 @@ torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
@@ -29,6 +30,6 @@ torchrun \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-lora \
--hub_model_id qwen-7b-chat-lora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
@@ -1,12 +1,14 @@
# 10G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
@@ -1,15 +1,17 @@
# 16GB VRAM
CUDA_VISIBLE_DEVICES=0 \
python src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
@@ -26,6 +28,6 @@ python src/llm_sft.py \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-qlora \
--hub_model_id qwen-7b-chat-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
16 changes: 16 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b_chat/qlora_ddp/infer.sh
@@ -0,0 +1,16 @@
# 10G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
--top_p 0.9 \
--do_sample true \
38 changes: 38 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b_chat/qlora_ddp/sft.sh
@@ -0,0 +1,38 @@
# 4 * 16GB VRAM
nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--batch_size 1 \
--weight_decay 0. \
--learning_rate 1e-4 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-chat-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
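A note on the accumulation arithmetic in this new script: `$(expr 16 / $nproc_per_node)` evaluates to 4 with `nproc_per_node=4`, so each rank accumulates 4 micro-batches of `--batch_size 1`, for an effective global batch size of 1 * 4 GPUs * 4 steps = 16; the division keeps that global batch size fixed if the GPU count is changed.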