diff --git a/README.md b/README.md index d618793efb..7bfe020f36 100644 --- a/README.md +++ b/README.md @@ -41,26 +41,27 @@ Users can check the [documentation of Swift](docs/source/GetStarted/Introduction - 🔥 2023.11.07: Support the fine-tuning of the yi-6b model, scripts can be found at: `scripts/yi_6b`. - 🔥 2023.10.30: Support QA-LoRA and LongLoRA to decrease memory usage in training. - 🔥 2023.10.30: Support ROME (Rank One Model Editing) to add/modify knowledge; training is not needed! -- 🔥 2023.10.27: Support for chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found in `scripts/chatglm3_6b_32k`. -- 🔥 2023.10.17: Supported int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. The corresponding shell script can be found at `scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`. -- 2023.10.15: Supported ziya2-13b model series: ziya2-13b, ziya2-13b-chat. The corresponding shell script can be found at `scripts/ziya2_13b_chat`. -- 2023.10.12: Supported mistral-7b model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. The corresponding shell script can be found at `scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`. -- 🔥 2023.10.7: Supported DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on 2*A10. The corresponding shell script can be found at `scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`. -- 🔥 2023.9.25: Supported qwen-14b model series: qwen-14b, qwen-14b-chat. The corresponding shell script can be found at `scripts/qwen_14b`, `scripts/qwen_14b_chat`. -- 2023.9.12: Supported training with MP+DDP to accelerate full-parameter fine-tuning speed. The corresponding shell script can be found at `scripts/qwen_7b_chat/full_mp_ddp/sft.sh`. +- 🔥 2023.10.27: Support for chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. +- 🔥 2023.10.17: Supported int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. +- 2023.10.15: Supported ziya2-13b model series: ziya2-13b, ziya2-13b-chat. +- 2023.10.12: Supported mistral-7b model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. +- 🔥 2023.10.7: Supported DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on 2*A10. +- 🔥 2023.9.25: Supported qwen-14b model series: qwen-14b, qwen-14b-chat. +- 2023.9.12: Supported training with MP+DDP to accelerate full-parameter fine-tuning speed. ## ✨ LLM SFT Example Press [this link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm) to view the detailed documentation of these examples. ### Basic Usage Quickly fine-tune, infer with LLM, and build a Web-UI. -#### Run using Python + ```bash git clone https://github.com/modelscope/swift.git cd swift pip install .[llm] ``` +#### Run using Python ```python # Experimental environment: A10, 3090, A100, ... # 16GB GPU memory @@ -129,14 +130,15 @@ CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx' - Supported SFT Methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full (full parameter fine-tuning) - Supported Features: quantization, DDP, model parallelism, gradient checkpointing, pushing to modelscope hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
- Supported Models: - - 🔥 qwen series: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) - - 🔥 qwen-vl series: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) + - qwen series: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) + - qwen-vl series: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) - baichuan series: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary) - chatglm series: [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary), [chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary), [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary), [chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary), [chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary) - llama series: [llama2-7b](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary), [llama2-7b-chat](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary), [llama2-13b](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary), [llama2-13b-chat](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary), [llama2-70b](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary), [llama2-70b-chat](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary) - openbuddy series: 
[openbuddy-llama2-13b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary), [openbuddy-llama-65b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary), [openbuddy-llama2-70b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary), [openbuddy-mistral-7b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v13.1/summary) - internlm series: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary) - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary) + - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary) - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary) - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary) - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary) diff --git a/README_CN.md b/README_CN.md index c226f9b461..ec58dc1c0e 100644 --- a/README_CN.md +++ b/README_CN.md @@ -37,28 +37,29 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning)是一个可扩展 ## 🎉 新闻 - 🔥 2023.11.08: 支持xverse-65b模型的训练和推理流程,脚本在`scripts/xverse_65b`. - 🔥 2023.11.07: 支持yi-6b模型的训练和推理流程,脚本在`scripts/yi_6b`. -- 🔥 2023.10.30: 支持 QA-LoRA 和 LongLoRA两种新的tuners +- 🔥 2023.10.30: 支持 QA-LoRA 和 LongLoRA两种新的tuners. - 🔥 2023.10.30: 支持使用ROME(Rank One Model Editing)来编辑模型,在无需训练的情况下即可给模型灌注新知识! -- 🔥 2023.10.27: 支持chatglm3系列模型: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. 对应的sh脚本可以查看`scripts/chatglm3_6b_32k`. -- 🔥 2023.10.17: 支持int4, int8模型的SFT: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. 对应的sh脚本可以查看`scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`. -- 2023.10.15: 支持ziya2-13b系列模型: ziya2-13b, ziya2-13b-chat. 对应的sh脚本可以查看`scripts/ziya2_13b_chat`. -- 2023.10.12: 支持mistral-7b系列模型: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. 对应的sh脚本可以查看`scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`. -- 🔥 2023.10.7: 支持DeepSpeed ZeRO-2, 使得lora(不仅仅是qlora)可以在双卡A10上运行DDP. 对应的sh脚本可以查看`scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`. 
-- 🔥 2023.9.25: 支持**qwen-14b**系列模型: qwen-14b, qwen-14b-chat. 对应的sh脚本可以查看`scripts/qwen_14b`, `scripts/qwen_14b_chat`. -- 2023.9.12: 支持MP+DDP的方式训练, 加快全参数微调的速度, 对应的sh脚本可以查看`scripts/qwen_7b_chat/full_mp_ddp/sft.sh`. +- 🔥 2023.10.27: 支持chatglm3系列模型: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. +- 🔥 2023.10.17: 支持int4, int8模型的SFT: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. +- 2023.10.15: 支持ziya2-13b系列模型: ziya2-13b, ziya2-13b-chat. +- 2023.10.12: 支持mistral-7b系列模型: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. +- 🔥 2023.10.7: 支持DeepSpeed ZeRO-2, 使得lora(不仅仅是qlora)可以在双卡A10上运行DDP. +- 🔥 2023.9.25: 支持**qwen-14b**系列模型: qwen-14b, qwen-14b-chat. +- 2023.9.12: 支持MP+DDP的方式训练, 加快全参数微调的速度. ## ✨ 大模型微调的例子 可以[在这里](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm) 查看LLM微调的使用文档。 ### 简单使用 快速对LLM进行微调, 推理并搭建Web-UI. -#### 使用python运行 + ```bash git clone https://github.com/modelscope/swift.git cd swift pip install .[llm] ``` +#### 使用python运行 ```python # Experimental environment: A10, 3090, A100, ... # 16GB GPU memory @@ -128,14 +129,15 @@ CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx' - 支持的SFT方法: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), 全参数微调 - 支持的特性: 模型量化, DDP, 模型并行, gradient checkpointing, 支持推送ModelScope Hub, 自定义数据集, 多模态和Agent SFT, 多轮对话, ... - 支持的模型 - - 🔥 qwen 系列: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) - - 🔥 qwen-vl 系列: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) + - qwen 系列: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) + - qwen-vl 系列: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) - baichuan 系列: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), 
[baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary) - chatglm 系列: [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary), [chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary), [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary), [chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary), [chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary) - llama 系列: [llama2-7b](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary), [llama2-7b-chat](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary), [llama2-13b](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary), [llama2-13b-chat](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary), [llama2-70b](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary), [llama2-70b-chat](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary) - openbuddy 系列: [openbuddy-llama2-13b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary), [openbuddy-llama-65b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary), [openbuddy-llama2-70b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary), [openbuddy-mistral-7b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v13.1/summary) - internlm 系列: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary) - xverse 系列: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary) + - bluelm 系列: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary) - mistral 系列: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary) - ziya 系列: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary) - skywork 系列: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), 
[skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary) diff --git a/examples/pytorch/llm/README.md b/examples/pytorch/llm/README.md index 2928ec8543..29f5a20a50 100644 --- a/examples/pytorch/llm/README.md +++ b/examples/pytorch/llm/README.md @@ -16,6 +16,7 @@ ## 🎉 News +- 🔥 2023.10.15: Support for **bluelm** series models: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. The corresponding shell script can be found in `scripts/bluelm_7b_chat`. - 2023.10.31: Support Web UI. Run command: `python app.py`. - 2023.10.30: Support for **skywork-13b** series models: skywork-13b, skywork-13b-chat. The corresponding shell script can be found in `scripts/skywork_13b`. - 🔥 2023.10.27: Support for **chatglm3** series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found in `scripts/chatglm3_6b_32k`. @@ -37,14 +38,15 @@ - Supported SFT Methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full (full parameter fine-tuning) - Supported Features: quantization, DDP, model parallelism, gradient checkpointing, pushing to modelscope hub, custom datasets, multimodal and agent SFT, multi-round chat, ... - Supported Models: - - 🔥 qwen series: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) - - 🔥 qwen-vl series: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) + - qwen series: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) + - qwen-vl series: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) - baichuan series: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary),
[baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary) - chatglm series: [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary), [chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary), [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary), [chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary), [chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary) - llama series: [llama2-7b](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary), [llama2-7b-chat](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary), [llama2-13b](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary), [llama2-13b-chat](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary), [llama2-70b](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary), [llama2-70b-chat](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary) - openbuddy series: [openbuddy-llama2-13b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary), [openbuddy-llama-65b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary), [openbuddy-llama2-70b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary), [openbuddy-mistral-7b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v13.1/summary) - internlm series: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary) - xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary) + - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary) - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary) - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary) - skywork series: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), [skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary) @@ -255,7 +257,7 @@ We support two methods for **customizing datasets**. 1. [Recommended] **Command line arguments**: It is **more convenient for supporting local custom datasets**. 2. 
**Registering datasets**: It is more flexible and allows for **further extension and development of swift**, but it requires some programming skills. Method 1 relies on Method 2 for implementation. -#### 📌[Recommended] Command Line Arguments +#### 📌 [Recommended] Command Line Arguments Explanation of command line arguments: 1. `--custom_train_dataset_path`: The default value is `None`, which means no custom dataset is used. You can specify it in the following format: `--custom_train_dataset_path alpaca.csv` or specify multiple training datasets like `--custom_train_dataset_path alpaca.csv chatml.jsonl swift.jsonl`. The script will automatically preprocess and concatenate them. diff --git a/examples/pytorch/llm/README_CN.md b/examples/pytorch/llm/README_CN.md index c75906d5f3..8b81d7cd6d 100644 --- a/examples/pytorch/llm/README_CN.md +++ b/examples/pytorch/llm/README_CN.md @@ -16,6 +16,7 @@ ## 🎉 新闻 +- 🔥 2023.10.15: 支持**bluelm**系列模型: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. 对应的sh脚本可以查看`scripts/bluelm_7b_chat`. - 2023.10.31: 支持Web UI. 运行命令: `python app.py`. - 2023.10.30: 支持**skywork-13b**系列模型: skywork-13b, skywork-13b-chat. 对应的sh脚本可以查看`scripts/skywork_13b`. - 🔥 2023.10.27: 支持**chatglm3**系列模型: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. 对应的sh脚本可以查看`scripts/chatglm3_6b_32k`. @@ -37,14 +38,15 @@ - 支持的SFT方法: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), 全参数微调 - 支持的特性: 模型量化, DDP, 模型并行, gradient checkpointing, 支持推送ModelScope Hub, 自定义数据集, 多模态和Agent SFT, 多轮对话, ... - 支持的模型 - - 🔥 qwen 系列: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) - - 🔥 qwen-vl 系列: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) + - qwen 系列: [qwen-7b](https://modelscope.cn/models/qwen/Qwen-7B/summary), [qwen-7b-chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), [qwen-14b](https://modelscope.cn/models/qwen/Qwen-14B/summary), [qwen-14b-chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary), [qwen-7b-chat-int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary), [qwen-14b-chat-int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary), [qwen-7b-chat-int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary), [qwen-14b-chat-int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary) + - qwen-vl 系列: [qwen-vl](https://modelscope.cn/models/qwen/Qwen-VL/summary), [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary), [qwen-vl-chat-int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary) - baichuan 系列: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), 
[baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary) - chatglm 系列: [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary), [chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary), [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary), [chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary), [chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary) - llama 系列: [llama2-7b](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary), [llama2-7b-chat](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary), [llama2-13b](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary), [llama2-13b-chat](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary), [llama2-70b](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary), [llama2-70b-chat](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary) - openbuddy 系列: [openbuddy-llama2-13b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary), [openbuddy-llama-65b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary), [openbuddy-llama2-70b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary), [openbuddy-mistral-7b-chat](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v13.1/summary) - internlm 系列: [internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary), [internlm-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary), [internlm-7b-chat-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary), [internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary), [internlm-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary) - xverse 系列: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary) + - bluelm 系列: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary) - mistral 系列: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary) - ziya 系列: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary) - skywork 系列: [skywork-13b](https://modelscope.cn/models/skywork/Skywork-13B-base/summary), 
[skywork-13b-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary) @@ -251,12 +253,12 @@ bash scripts/qwen_7b_chat/qlora_ddp_ds/infer.sh ## 📝 使用文档 -### 📌自定义数据集 +### 自定义数据集 我们支持两种**自定义数据集**的方法. 1. 【推荐】**命令行参数**的形式: **更加方便支持本地自定义数据集**. 2. **注册数据集**的方式: 更加灵活, 可以对swift**进一步拓展和开发**, 但需要一定的编程门槛. 方法一在实现上借助了方法二. -#### 【推荐】命令行参数的形式 +#### 📌 【推荐】命令行参数的形式 命令行参数含义介绍: 1. `--custom_train_dataset_path`: 默认值为`None`, 表示不使用自定义数据集. 你可以像如下形式进行指定: `--custom_train_dataset_path alpaca.csv`或者指定多个训练数据集`--custom_train_dataset_path alpaca.csv chatml.jsonl swift.jsonl`, 脚本会进行自动的预处理和拼接. @@ -458,7 +460,7 @@ if __name__ == '__main__': - `--model_type`: 表示你选择的模型类型, 默认是`None`, 即如果没有指定`model_id_or_path`, 则选择`'qwen-7b-chat'`, 如果指定了, 则会根据`model_id_or_path`以及`MODEL_MAPPING`推断`model_type`. 这两个参数不能同时指定. 可以选择的`model_type`可以查看`MODEL_MAPPING.keys()`. - `--model_id_or_path`: 表示模型在ModelScope Hub中的`model_id`, 或者是本地的模型目录`model_dir`, 不区分大小写, 默认为`None`. 如果`--model_id_or_path`未被注册, 则会抛出异常. 你可以使用`model_type`的方式指定模型类型, 也可以通过`model_id_or_path`的方式指定模型类型. - `--model_revision`: 表示模型在ModelScope Hub中对应`model_id`的版本号, 默认为`None`. 如果`model_id_or_path`使用本地的模型目录, 则该参数失效. model_revision指定为None, 则使用注册在`MODEL_MAPPING`中的revision. 否则强制使用model_revision. -- `model_cache_dir`: 默认为`None`. 如果模型在本地已经有缓存, 且缓存路径并非ModelScope默认cache路径, 可以通过指定该参数从cache_dir中导入model和tokenizer. +- `--model_cache_dir`: 默认为`None`. 如果模型在本地已经有缓存, 且缓存路径并非ModelScope默认cache路径, 可以通过指定该参数从cache_dir中导入model和tokenizer. - `--sft_type`: 表示微调的方式, 默认是`'lora'`. 你可以选择的值包括: 'lora', 'full'. 如果你要使用lora或qlora, 你需要选择`--sft_type lora`. qlora需额外设置`--quantization_bit 4`. 如果你要使用全参数微调, 则需选择`--sft_type full`. - `--tuner_backend`: 表示lora, qlora的后端支持, 默认是`'swift'`. 你可以选择的值包括: 'swift', 'peft'. - `--template_type`: 表示使用的对话模板的类型, 默认是`None`, 即根据`model_type`查找`MODEL_MAPPING`中的`template`. 可以选择的`template_type`可以查看`TEMPLATE_MAPPING.keys()`. diff --git a/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/infer.sh b/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/infer.sh new file mode 100644 index 0000000000..4055c4e1e0 --- /dev/null +++ b/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/infer.sh @@ -0,0 +1,15 @@ +# Experimental environment: A10, 3090 +PYTHONPATH=../../.. \ +CUDA_VISIBLE_DEVICES=0 \ +python llm_infer.py \ + --ckpt_dir "output/bluelm-7b-chat/vx_xxx/checkpoint-xxx" \ + --load_args_from_ckpt_dir true \ + --eval_human false \ + --max_length 2048 \ + --max_new_tokens 2048 \ + --temperature 0.9 \ + --top_k 20 \ + --top_p 0.9 \ + --repetition_penalty 1.05 \ + --do_sample true \ + --merge_lora_and_save false \ diff --git a/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/sft.sh b/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/sft.sh new file mode 100644 index 0000000000..d75226c4a2 --- /dev/null +++ b/examples/pytorch/llm/scripts/bluelm_7b_chat/lora/sft.sh @@ -0,0 +1,36 @@ +# Experimental environment: A10, 3090 +# 17GB GPU memory +PYTHONPATH=../../.. 
\ +CUDA_VISIBLE_DEVICES=0 \ +python llm_sft.py \ + --model_id_or_path vivo-ai/BlueLM-7B-Chat \ + --model_revision master \ + --sft_type lora \ + --tuner_backend swift \ + --template_type bluelm \ + --dtype bf16 \ + --output_dir output \ + --dataset blossom-math-zh \ + --train_dataset_sample -1 \ + --num_train_epochs 1 \ + --max_length 2048 \ + --check_dataset_strategy warning \ + --lora_rank 8 \ + --lora_alpha 32 \ + --lora_dropout_p 0.05 \ + --lora_target_modules AUTO \ + --gradient_checkpointing true \ + --batch_size 1 \ + --weight_decay 0.01 \ + --learning_rate 1e-4 \ + --gradient_accumulation_steps 16 \ + --max_grad_norm 0.5 \ + --warmup_ratio 0.03 \ + --eval_steps 100 \ + --save_steps 100 \ + --save_total_limit 2 \ + --logging_steps 10 \ + --push_to_hub false \ + --hub_model_id bluelm-7b-chat-lora \ + --hub_private_repo true \ + --hub_token 'your-sdk-token' \ diff --git a/examples/pytorch/llm/scripts/yi_6b/lora/infer.sh b/examples/pytorch/llm/scripts/yi_6b/lora/infer.sh index fa33b46012..696c04c810 100644 --- a/examples/pytorch/llm/scripts/yi_6b/lora/infer.sh +++ b/examples/pytorch/llm/scripts/yi_6b/lora/infer.sh @@ -5,8 +5,8 @@ python llm_infer.py \ --ckpt_dir "output/yi-6b/vx_xxx/checkpoint-xxx" \ --load_args_from_ckpt_dir true \ --eval_human false \ - --max_length 256 \ - --max_new_tokens 256 \ + --max_length 2048 \ + --max_new_tokens 2048 \ --temperature 0.9 \ --top_k 20 \ --top_p 0.9 \ diff --git a/examples/pytorch/llm/scripts/yi_6b/lora/sft.sh b/examples/pytorch/llm/scripts/yi_6b/lora/sft.sh index 28bc7dfb97..9847890285 100644 --- a/examples/pytorch/llm/scripts/yi_6b/lora/sft.sh +++ b/examples/pytorch/llm/scripts/yi_6b/lora/sft.sh @@ -31,6 +31,6 @@ python llm_sft.py \ --save_total_limit 2 \ --logging_steps 10 \ --push_to_hub false \ - --hub_model_id yi-6b-qlora \ + --hub_model_id yi-6b-lora \ --hub_private_repo true \ --hub_token 'your-sdk-token' \ diff --git a/swift/llm/utils/__init__.py b/swift/llm/utils/__init__.py index c36c613795..b6f353e4af 100644 --- a/swift/llm/utils/__init__.py +++ b/swift/llm/utils/__init__.py @@ -1,13 +1,16 @@ # Copyright (c) Alibaba, Inc. and its affiliates. 
from .argument import InferArguments, RomeArguments, SftArguments -from .dataset import (DATASET_MAPPING, AlpacaPreprocessor, - ConversationsPreprocessor, DatasetName, - GetDatasetFunction, get_dataset, get_dataset_from_repo, - register_dataset) +from .dataset import (DATASET_MAPPING, DatasetName, GetDatasetFunction, + get_dataset, get_dataset_from_repo, register_dataset) from .model import (MODEL_MAPPING, GetModelTokenizerFunction, LoRATM, ModelType, get_model_tokenizer, get_model_tokenizer_from_repo, get_model_tokenizer_from_sdk, register_model) +from .preprocess import (AlpacaPreprocessor, ClsPreprocessor, + ComposePreprocessor, ConversationsPreprocessor, + PreprocessFunc, RenameColumnsPreprocessor, + SmartPreprocessor, SwiftPreprocessor, + TextGenerationPreprocessor) from .template import (DEFAULT_SYSTEM, TEMPLATE_MAPPING, History, Prompt, Template, TemplateType, get_template, register_template) from .utils import (data_collate_fn, dataset_map, download_dataset, diff --git a/swift/llm/utils/dataset.py b/swift/llm/utils/dataset.py index 1e24382b73..db6c851bef 100644 --- a/swift/llm/utils/dataset.py +++ b/swift/llm/utils/dataset.py @@ -699,13 +699,15 @@ def _check_dataset( def get_dataset( - dataset_name_list: List[str], + dataset_name_list: Union[List[str], str], dataset_test_ratio: float = 0., dataset_seed: Union[RandomState, int] = 42, check_dataset_strategy: Literal['none', 'discard', 'error', 'warning'] = 'none' ) -> Tuple[HfDataset, Optional[HfDataset]]: """Returns train_dataset and val_dataset""" + if isinstance(dataset_name_list, str): + dataset_name_list = [dataset_name_list] train_dataset_list: List[HfDataset] = [] val_dataset_list: List[HfDataset] = [] random_state = dataset_seed
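With the change above, `get_dataset` now accepts a bare dataset name as well as a list, coercing the string to a one-element list internally. A minimal usage sketch (the `blossom-math-zh` dataset name is borrowed from the bluelm `sft.sh` script earlier in this diff, and the import path follows the `swift/llm/utils/__init__.py` exports shown above):

```python
from swift.llm.utils import get_dataset

# After this patch, these two calls are equivalent:
train_ds, val_ds = get_dataset('blossom-math-zh', dataset_test_ratio=0.01)
train_ds, val_ds = get_dataset(['blossom-math-zh'], dataset_test_ratio=0.01)
```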
diff --git a/swift/llm/utils/model.py b/swift/llm/utils/model.py index c2368fde5d..8f6743c6ae 100644 --- a/swift/llm/utils/model.py +++ b/swift/llm/utils/model.py @@ -1,4 +1,5 @@ # Copyright (c) Alibaba, Inc. and its affiliates. +import inspect import os from functools import partial from types import MethodType @@ -7,11 +8,14 @@ import torch import torch.distributed as dist import torch.nn.functional as F +import transformers from modelscope import (AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig, GPTQConfig, Model, read_config, snapshot_download) +from packaging import version from torch import Tensor from torch import dtype as Dtype +from torch.nn import Module from transformers import (PretrainedConfig, PreTrainedModel, PreTrainedTokenizerBase) from transformers.models.auto.auto_factory import _BaseAutoModelClass @@ -81,6 +85,11 @@ class ModelType: xverse_13b = 'xverse-13b' xverse_13b_chat = 'xverse-13b-chat' xverse_65b = 'xverse-65b' + # vivo + bluelm_7b = 'bluelm-7b' + bluelm_7b_32k = 'bluelm-7b-32k' + bluelm_7b_chat = 'bluelm-7b-chat' + bluelm_7b_chat_32k = 'bluelm-7b-chat-32k' # mistral mistral_7b = 'mistral-7b' mistral_7b_chat = 'mistral-7b-chat' @@ -110,6 +119,7 @@ class LoRATM(NamedTuple): mistral = ['q_proj', 'k_proj', 'v_proj'] ziya = ['q_proj', 'k_proj', 'v_proj'] yi = ['q_proj', 'k_proj', 'v_proj'] + bluelm = ['q_proj', 'k_proj', 'v_proj'] GetModelTokenizerFunction = Callable[..., Tuple[Optional[PreTrainedModel], @@ -118,7 +128,7 @@ class LoRATM(NamedTuple): def register_model( model_type: str, - model_id_or_path: str, + model_id_or_path: Optional[str], lora_target_modules: Optional[List[str]] = None, template: str = TemplateType.default, get_function: Optional[GetModelTokenizerFunction] = None, @@ -173,6 +183,14 @@ def _register_model( return _register_model +@register_model(ModelType.bluelm_7b_chat_32k, 'vivo-ai/BlueLM-7B-Chat-32K', + LoRATM.bluelm, TemplateType.bluelm) +@register_model(ModelType.bluelm_7b_chat, 'vivo-ai/BlueLM-7B-Chat', + LoRATM.bluelm, TemplateType.bluelm) +@register_model(ModelType.bluelm_7b_32k, 'vivo-ai/BlueLM-7B-Base-32K', + LoRATM.bluelm, TemplateType.default_generation) +@register_model(ModelType.bluelm_7b, 'vivo-ai/BlueLM-7B-Base', LoRATM.bluelm, + TemplateType.default_generation) @register_model(ModelType.yi_34b, '01ai/Yi-34B', LoRATM.yi, TemplateType.default_generation) @register_model(ModelType.yi_6b, '01ai/Yi-6B', LoRATM.yi, @@ -679,6 +697,15 @@ def get_skywork_model_tokenizer(model_dir: str, return model, tokenizer +def fix_transformers_upgrade(module: PreTrainedModel) -> None: + # since transformers 4.35.0, the arguments of _set_gradient_checkpointing have changed + if version.parse(transformers.__version__) >= version.parse('4.35.0'): + if isinstance(module, PreTrainedModel) and hasattr(module, '_set_gradient_checkpointing') \ + and 'value' in inspect.signature(module._set_gradient_checkpointing).parameters.keys(): + module._set_gradient_checkpointing = MethodType( + PreTrainedModel._set_gradient_checkpointing, module) + + def get_model_tokenizer( model_type: str, torch_dtype: Optional[Dtype] = None, @@ -711,7 +738,8 @@ def get_model_tokenizer( if is_dist() and not is_local_master(): dist.barrier() model_dir = model_id_or_path - if not os.path.exists(model_id_or_path): + if model_id_or_path is not None and not os.path.exists( + model_id_or_path): revision = model_info['revision'] model_dir = snapshot_download( model_id_or_path, @@ -726,10 +754,12 @@ kwargs['automodel_class'] = model_info['automodel_class'] model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs) + if model is not None: + fix_transformers_upgrade(model) assert tokenizer.eos_token is not None if
tokenizer.pad_token is None: tokenizer.pad_token = tokenizer.eos_token - if model is not None: + if model is not None and model_dir is not None: generation_config_path = os.path.join(model_dir, 'generation_config.json') generation_config = getattr(model, 'generation_config', None) diff --git a/swift/llm/utils/template.py b/swift/llm/utils/template.py index d5a2a49f4d..ff145e9afd 100644 --- a/swift/llm/utils/template.py +++ b/swift/llm/utils/template.py @@ -23,6 +23,7 @@ class TemplateType: ziya = 'ziya' skywork = 'skywork' yi = 'yi' + bluelm = 'bluelm' Prompt = List[Union[str, List[Union[str, int]]]] @@ -142,6 +143,10 @@ def register_template(template_type: str, template: Template) -> None: TemplateType.skywork, Template([], ['[USER]{{QUERY}}[SEP][BOT]'], None, ['[SEP]'])) +register_template( + TemplateType.bluelm, + Template([['bos_token_id']], ['[|Human|]:{{QUERY}}[|AI|]:'], [], + [['eos_token_id']])) Context = Union[str, List[int]] diff --git a/swift/llm/utils/utils.py b/swift/llm/utils/utils.py index 9105db1a3a..d43229a5cc 100644 --- a/swift/llm/utils/utils.py +++ b/swift/llm/utils/utils.py @@ -33,7 +33,7 @@ from swift.hub import ModelScopeConfig from swift.utils import (get_dist_setting, get_logger, is_ddp_plus_mp, is_dist, is_local_master, is_master, lower_bound, parse_args, - upper_bound) + stat_array, upper_bound) from .template import History, Template logger = get_logger() @@ -199,16 +199,11 @@ def x_main(argv: Union[List[str], _TArgsClass, NoneType] = None, def stat_dataset(dataset: HfDataset) -> None: """Statistical analysis was performed on the dataset""" _token_len = [] - for d in dataset: - _token_len.append(len(d['input_ids'])) - _token_len = np.array(_token_len) - mean = _token_len.mean().item() - std = _token_len.std().item() - min_ = _token_len.min().item() - max_ = _token_len.max().item() - logger.info( - f'Dataset Token Length: {mean:.6f}±{std:.6f}, min={min_:.6f}, max={max_:.6f}, size={_token_len.shape[0]}' - ) + input_ids = dataset['input_ids'] + for i in range(len(dataset)): + _token_len.append(len(input_ids[i])) + _, stat_str = stat_array(_token_len) + logger.info(f'Dataset Token Length: {stat_str}') def data_collate_fn(batch: List[Dict[str, Any]], @@ -297,12 +292,13 @@ def find_all_linear_for_lora(model: Module, quantization_bit: int, def sort_by_max_length(dataset: HfDataset, num_dataset: int) -> HfDataset: - dataset_len = [len(d['input_ids']) for d in tqdm(dataset)] + input_ids = dataset['input_ids'] + dataset_len = [len(input_ids[i]) for i in range(len(dataset))] idx = heapq.nlargest( num_dataset, range(len(dataset_len)), key=lambda i: dataset_len[i]) input_ids = [] labels = [] - for i in tqdm(idx): + for i in idx: input_ids.append(dataset[i]['input_ids']) labels.append(dataset[i]['labels']) return HfDataset.from_dict({'input_ids': input_ids, 'labels': labels}) diff --git a/swift/trainers/mixin.py b/swift/trainers/mixin.py index a31cc20384..3d399b2530 100644 --- a/swift/trainers/mixin.py +++ b/swift/trainers/mixin.py @@ -522,8 +522,9 @@ def _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, self.control.should_log = False logs: Dict[str, float] = {} metrics_log = {'loss': tr_loss} # loss first - metrics_log.update(self._custom_metrics) - self._custom_metrics = {} + if hasattr(self, '_custom_metrics'): + metrics_log.update(self._custom_metrics) + self._custom_metrics = {} for k, v in metrics_log.items(): # all_gather + mean() to get average loss over all processes v_scalar = self._nested_gather(v).mean().item()
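The `stat_dataset` rewrite above delegates the mean/std/min/max computation to a new `stat_array` helper, which is exported from `swift.utils` and implemented in `swift/utils/np_utils.py` in the hunks that follow. A small usage sketch, with the values worked out by hand for a toy token-length list:

```python
from swift.utils import stat_array

token_lens = [3, 1, 4, 1, 5]
stats, stat_str = stat_array(token_lens)
# stats == {'mean': 2.8, 'std': 1.6, 'min': 1, 'max': 5}
print(f'Dataset Token Length: {stat_str}')
# Dataset Token Length: 2.800000±1.600000, min=1.000000, max=5.000000, size=5
```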
diff --git a/swift/utils/__init__.py b/swift/utils/__init__.py index af24703c49..b4885fb10b 100644 --- a/swift/utils/__init__.py +++ b/swift/utils/__init__.py @@ -4,7 +4,7 @@ from .logger import get_logger from .metric import (compute_acc_metrics, compute_nlg_metrics, preprocess_logits_for_metrics) -from .np_utils import get_seed, transform_jsonl_to_df +from .np_utils import get_seed, stat_array, transform_jsonl_to_df from .tb_utils import (TB_COLOR, TB_COLOR_SMOOTH, plot_images, read_tensorboard_file, tensorboard_smoothing) from .torch_utils import (broadcast_string, get_dist_setting, is_ddp_plus_mp, diff --git a/swift/utils/np_utils.py b/swift/utils/np_utils.py index 086a67b6cd..70efa1b5e3 100644 --- a/swift/utils/np_utils.py +++ b/swift/utils/np_utils.py @@ -1,9 +1,11 @@ # Copyright (c) Alibaba, Inc. and its affiliates. -from typing import Any, Dict, List +from typing import Any, Dict, List, Tuple, Union import numpy as np +from numpy import ndarray from numpy.random import RandomState from pandas import DataFrame +from torch import Tensor def transform_jsonl_to_df(dict_list: List[Dict[str, Any]]) -> DataFrame: @@ -21,3 +23,16 @@ def get_seed(random_state: RandomState) -> int: seed_max = np.iinfo(np.int32).max seed = random_state.randint(0, seed_max) return seed + + +def stat_array( + array: Union[ndarray, List[int], + Tensor]) -> Tuple[Dict[str, float], str]: + if isinstance(array, list): + array = np.array(array) + mean = array.mean().item() + std = array.std().item() + min_ = array.min().item() + max_ = array.max().item() + string = f'{mean:.6f}±{std:.6f}, min={min_:.6f}, max={max_:.6f}, size={array.shape[0]}' + return {'mean': mean, 'std': std, 'min': min_, 'max': max_}, string diff --git a/tests/llm/test_template.py b/tests/llm/test_template.py index bdc6d1f156..c169a37b6b 100644 --- a/tests/llm/test_template.py +++ b/tests/llm/test_template.py @@ -46,10 +46,10 @@ def test_template(self): <|endoftext|>""" self.assertTrue(result == text) + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_chatglm3_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.chatglm3_6b template_type = TemplateType.chatglm3 model, tokenizer = get_model_tokenizer(model_type, load_model=True) @@ -70,10 +70,10 @@ def test_chatglm3_template(self): response = model.chat(tokenizer, query, max_length=None)[0] print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_qwen_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.qwen_7b_chat template_type = TemplateType.chatml model, tokenizer = get_model_tokenizer(model_type, load_model=True) @@ -87,10 +87,10 @@ def test_qwen_template(self): response = model.chat(tokenizer, query, None, max_length=None)[0] print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_llama_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.llama2_7b_chat template_type = TemplateType.llama _, tokenizer = get_model_tokenizer(model_type, load_model=False) @@ -117,10 +117,10 @@ def test_llama_template(self): response = model.chat({'text': query}, tokenizer)['response'] print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time
caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_baichuan_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.baichuan2_7b_chat template_type = TemplateType.baichuan model, tokenizer = get_model_tokenizer(model_type, load_model=True) @@ -132,10 +132,10 @@ def test_baichuan_template(self): response = model.chat(tokenizer, [{'role': 'user', 'content': query}]) print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_chatglm2_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.chatglm2_6b template_type = TemplateType.chatglm2 model, tokenizer = get_model_tokenizer(model_type, load_model=True) @@ -156,10 +156,10 @@ def test_chatglm2_template(self): response = model.chat(tokenizer, query)[0] print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') def test_internlm_template(self): - if not __name__ == '__main__': - # avoid ci test - return model_type = ModelType.internlm_20b_chat template_type = TemplateType.internlm model, tokenizer = get_model_tokenizer(model_type, load_model=True) @@ -180,6 +180,30 @@ def test_internlm_template(self): response = model.chat(tokenizer, query)[0] print(f'official response: {response}') + @unittest.skip( + 'To avoid excessive testing time caused by downloading models and ' + 'to prevent OOM (Out of Memory) errors.') + def test_bluelm_template(self): + model_type = ModelType.bluelm_7b_chat + template_type = TemplateType.bluelm + model, tokenizer = get_model_tokenizer(model_type, load_model=True) + template = get_template(template_type, tokenizer) + model.generation_config = GenerationConfig( + max_new_tokens=128, + temperature=0.9, + top_k=20, + top_p=0.9, + repetition_penalty=1.05, + do_sample=True, + eos_token_id=tokenizer.eos_token_id, + pad_token_id=tokenizer.eos_token_id) + query = '12345+234=?' + print(f'query: {query}') + response, _ = inference(model, template, query, verbose=False) + print(f'swift response: {response}') + response = model.chat(tokenizer, query)[0] + print(f'official response: {response}') + if __name__ == '__main__': unittest.main()
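To close, a sketch of what the new `bluelm` template registered in `swift/llm/utils/template.py` encodes. This assumes the four `Template` fields are prefix, prompt, chat separator, and suffix, as the neighboring `skywork` registration suggests; `render_bluelm` is a hypothetical helper for illustration, not code from this patch:

```python
# Hypothetical rendering of
# Template([['bos_token_id']], ['[|Human|]:{{QUERY}}[|AI|]:'], [], [['eos_token_id']])
# for a single-turn query/response pair.
def render_bluelm(tokenizer, query: str, response: str) -> list:
    input_ids = [tokenizer.bos_token_id]  # prefix: a single bos token
    # prompt: the query wrapped in BlueLM's role markers
    input_ids += tokenizer.encode(
        f'[|Human|]:{query}[|AI|]:', add_special_tokens=False)
    # assistant response, closed by the eos suffix
    input_ids += tokenizer.encode(response, add_special_tokens=False)
    input_ids += [tokenizer.eos_token_id]
    return input_ids
```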