24 changes: 15 additions & 9 deletions examples/pytorch/llm/README.md
@@ -16,9 +16,10 @@

## Features
1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full (full-parameter fine-tuning), ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), baichuan-7b, baichuan-13b, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-13b, llama2-70b, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), qwen-7b-chat, baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
3. supported features: quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, push to ModelScope Hub, custom datasets, ...
4. supported datasets: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, ...
5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy_llama, default, ...

## Prepare the Environment
Experimental environment: A10, 3090, A100, ... (V100 does not support bf16 or quantization)
@@ -58,20 +59,25 @@ cd swift/examples/pytorch/llm
# sft(qlora) and infer qwen-7b; requires 16GB VRAM.
# If you want to use quantization, you need to `pip install bitsandbytes`.
# If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
bash scripts/qwen_7b/qlora/sft.sh
bash scripts/qwen_7b/qlora/infer.sh
bash scripts/qwen_7b_chat/qlora/sft.sh
bash scripts/qwen_7b_chat/qlora/infer.sh

# sft(qlora+ddp) and infer qwen-7b; requires 4 * 16GB VRAM.
bash scripts/qwen_7b/qlora_ddp/sft.sh
bash scripts/qwen_7b/qlora_ddp/infer.sh
bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

# sft(lora+ddp) and infer qwen-7b; requires 4 * 22GB VRAM.
bash scripts/qwen_7b_chat/lora_ddp/sft.sh
bash scripts/qwen_7b_chat/lora_ddp/infer.sh

# sft(full) and infer qwen-7b; requires 95GB VRAM.
bash scripts/qwen_7b/full/sft.sh
bash scripts/qwen_7b/full/infer.sh
bash scripts/qwen_7b_chat/full/sft.sh
bash scripts/qwen_7b_chat/full/infer.sh

# For more scripts, please see the `scripts/` folder
```

## Extend Datasets
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/models.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/datasets.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `instruction`, `output`.
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/model.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/dataset.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `query`, `response`.
3. If you need to extend the template, you can modify the `TEMPLATE_MAPPING` in `utils/preprocess.py`.
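To make point 2 above concrete, here is a minimal sketch of a custom dataset loader. The registration shape of `DATASET_MAPPING` and all names below are assumptions for illustration; only the `get_*_dataset` naming convention and the `query`/`response` columns come from the README.

```python
# Hypothetical entry for utils/dataset.py; check the actual DATASET_MAPPING
# format in the repository, since this registration shape is an assumption.
from datasets import Dataset

def get_my_custom_dataset() -> Dataset:
    # The loader must return the two columns named in the README: query, response.
    data = {
        'query': ['What is 1 + 1?', "Who wrote 'Journey to the West'?"],
        'response': ['1 + 1 equals 2.', "Wu Cheng'en."],
    }
    return Dataset.from_dict(data)

DATASET_MAPPING = {
    'my-custom-dataset': get_my_custom_dataset,  # assumed: name -> loader function
    # ... existing entries ...
}
```

With an entry like this, the dataset could presumably be selected via `--dataset my-custom-dataset` in the sft scripts, mirroring how `alpaca-en,alpaca-zh` is passed below.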
25 changes: 15 additions & 10 deletions examples/pytorch/llm/README_CN.md
@@ -17,10 +17,10 @@

## Features
1. [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full-parameter fine-tuning, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), baichuan-7b, baichuan-13b, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-13b, llama2-70b, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
2. supported models: [**qwen-7b**](https://github.com/QwenLM/Qwen-7B), qwen-7b-chat, baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, polylm-13b, ...
3. supported features: quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, push to ModelScope Hub, custom datasets, ...
4. supported datasets: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, ...

5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy_llama, default, ...

## Prepare the Environment
Experimental environment: A10, 3090, A100, ... (V100 does not support bf16 or quantization)
@@ -61,20 +61,25 @@ cd swift/examples/pytorch/llm
# sft(qlora) and infer qwen-7b; requires 16GB VRAM.
# If you want to use quantization, you need to `pip install bitsandbytes`.
# If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
bash scripts/qwen_7b/qlora/sft.sh
bash scripts/qwen_7b/qlora/infer.sh
bash scripts/qwen_7b_chat/qlora/sft.sh
bash scripts/qwen_7b_chat/qlora/infer.sh

# sft(qlora+ddp) and infer qwen-7b; requires 4 * 16GB VRAM.
bash scripts/qwen_7b/qlora_ddp/sft.sh
bash scripts/qwen_7b/qlora_ddp/infer.sh
bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

# sft(lora+ddp) and infer qwen-7b; requires 4 * 22GB VRAM.
bash scripts/qwen_7b_chat/lora_ddp/sft.sh
bash scripts/qwen_7b_chat/lora_ddp/infer.sh

# sft(full) and infer qwen-7b; requires 95GB VRAM.
bash scripts/qwen_7b/full/sft.sh
bash scripts/qwen_7b/full/infer.sh
bash scripts/qwen_7b_chat/full/sft.sh
bash scripts/qwen_7b_chat/full/infer.sh

# For more scripts, please see the `scripts/` folder
```

## Extend Datasets
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/models.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/datasets.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `instruction`, `output`.
1. If you need to extend the model, you can modify the `MODEL_MAPPING` in `utils/model.py`. `model_id` can be specified as a local path; in this case, `revision` has no effect.
2. If you need to extend or customize the dataset, you can modify the `DATASET_MAPPING` in `utils/dataset.py`. You need to customize the `get_*_dataset` function, which returns a dataset with two columns: `query`, `response`.
3. If you need to extend the template, you can modify the `TEMPLATE_MAPPING` in `utils/preprocess.py`.
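Point 1 admits a similar sketch. The entry fields below are assumptions (the README only states that `model_id` may be a local path, in which case `revision` is ignored):

```python
# Hypothetical entry for utils/model.py; the real MODEL_MAPPING fields may differ.
MODEL_MAPPING = {
    'qwen-7b-local': {
        'model_id': '/path/to/local/qwen-7b',  # local path, so `revision` has no effect
        'revision': None,
        'template': 'chatml',  # assumed link to TEMPLATE_MAPPING in utils/preprocess.py
    },
    # ... existing entries ...
}
```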
@@ -1,9 +1,9 @@
# 12G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type baichuan-13b \
--model_type baichuan-13b-chat \
--sft_type lora \
--ckpt_dir "runs/baichuan-13b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/baichuan-13b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--max_new_tokens 1024 \
@@ -5,7 +5,7 @@ torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type baichuan-13b \
--model_type baichuan-13b-chat \
--sft_type lora \
--output_dir runs \
--ddp_backend nccl \
@@ -1,9 +1,9 @@
# 40G
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_infer.py \
--model_type llama2-70b \
--model_type llama2-70b-chat \
--sft_type lora \
--ckpt_dir "runs/llama2-70b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/llama2-70b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--max_new_tokens 1024 \
@@ -2,10 +2,10 @@
# llama2 is not good at Chinese
CUDA_VISIBLE_DEVICES=0,1 \
python src/llm_sft.py \
--model_type llama2-70b \
--model_type llama2-70b-chat \
--sft_type lora \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset alpaca-en \
--dataset_sample 20000 \
--num_train_epochs 1 \
--max_length 1024 \
2 changes: 2 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/infer.sh
@@ -3,10 +3,12 @@ CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
2 changes: 2 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/sft.sh
@@ -7,6 +7,7 @@ torchrun \
src/llm_sft.py \
--model_type qwen-7b \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
@@ -15,6 +16,7 @@ torchrun \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
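The two hunks above thread `--template_type chatml` and `--bnb_4bit_comp_dtype bf16` through the qwen-7b qlora scripts. As a rough sketch of what the 4-bit flags presumably amount to inside `src/llm_sft.py` (the actual wiring is an assumption), the `transformers`/`bitsandbytes` configuration would look like:

```python
# Sketch only: assumes --quantization_bit 4 maps to load_in_4bit and
# --bnb_4bit_comp_dtype bf16 maps to bnb_4bit_compute_dtype.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # --quantization_bit 4
    bnb_4bit_compute_dtype=torch.bfloat16,  # --bnb_4bit_comp_dtype bf16
)
model = AutoModelForCausalLM.from_pretrained(
    'qwen/Qwen-7B',              # illustrative model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # --dtype bf16
    trust_remote_code=True,
)
```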
@@ -1,10 +1,11 @@
# 19G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type full \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--max_new_tokens 1024 \
--temperature 0.9 \
@@ -2,8 +2,9 @@
# Experimental environment: 8 * 3090
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type full \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
@@ -22,6 +23,6 @@ python src/llm_sft.py \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-full \
--hub_model_id qwen-7b-chat-full \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
@@ -1,10 +1,11 @@
# 19G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--max_new_tokens 1024 \
--temperature 0.9 \
@@ -5,8 +5,9 @@ torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
@@ -29,6 +30,6 @@ torchrun \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-lora \
--hub_model_id qwen-7b-chat-lora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
@@ -1,12 +1,14 @@
# 10G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
@@ -1,15 +1,17 @@
# 16GB VRAM
CUDA_VISIBLE_DEVICES=0 \
python src/llm_sft.py \
--model_type qwen-7b \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
@@ -26,6 +28,6 @@ python src/llm_sft.py \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-qlora \
--hub_model_id qwen-7b-chat-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
16 changes: 16 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b_chat/qlora_ddp/infer.sh
@@ -0,0 +1,16 @@
# 10G
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
--eval_human true \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--max_new_tokens 1024 \
--temperature 0.9 \
--top_k 50 \
--top_p 0.9 \
--do_sample true \
38 changes: 38 additions & 0 deletions examples/pytorch/llm/scripts/qwen_7b_chat/qlora_ddp/sft.sh
@@ -0,0 +1,38 @@
# 4 * 16GB VRAM
nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
--nproc_per_node=$nproc_per_node \
--master_port 29500 \
src/llm_sft.py \
--model_type qwen-7b-chat \
--sft_type lora \
--template_type chatml \
--dtype bf16 \
--output_dir runs \
--ddp_backend nccl \
--dataset alpaca-en,alpaca-zh \
--dataset_sample -1 \
--num_train_epochs 1 \
--max_length 1024 \
--quantization_bit 4 \
--bnb_4bit_comp_dtype bf16 \
--lora_rank 64 \
--lora_alpha 32 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--batch_size 1 \
--weight_decay 0. \
--learning_rate 1e-4 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn false \
--push_to_hub false \
--hub_model_id qwen-7b-chat-qlora \
--hub_private_repo true \
--hub_token 'your-sdk-token' \
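A note on the accumulation arithmetic in this new script: `$(expr 16 / $nproc_per_node)` evaluates to 4 with `nproc_per_node=4`, so each rank accumulates 4 micro-batches of `--batch_size 1`, for an effective global batch size of 1 * 4 GPUs * 4 steps = 16; the division keeps that global batch size fixed if the GPU count is changed.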