diff --git "a/docs/source/Customization/\346\226\260\345\242\236\346\250\241\345\236\213.md" "b/docs/source/Customization/\346\226\260\345\242\236\346\250\241\345\236\213.md" index 194d51e2be..0493c34e2e 100644 --- "a/docs/source/Customization/\346\226\260\345\242\236\346\250\241\345\236\213.md" +++ "b/docs/source/Customization/\346\226\260\345\242\236\346\250\241\345\236\213.md" @@ -7,3 +7,8 @@ swift sft --model my-model --model_type llama --template chatml --dataset xxx ``` 如果需要新增model_type和template请给我们提交issue,如果您阅读了我们的源代码,也可以在llm/template和llm/model中添加新的类型。 + + +## 模型注册 + +请参考[examples](https://github.com/modelscope/swift/blob/main/examples/custom/model.py)中的示例代码。你可以通过指定`--custom_register_path xxx.py`来加载注册的内容。 diff --git "a/docs/source/Instruction/\344\275\277\347\224\250tuners.md" "b/docs/source/Instruction/\344\275\277\347\224\250tuners.md" index 9b47c57f55..c84ca6fe0c 100644 --- "a/docs/source/Instruction/\344\275\277\347\224\250tuners.md" +++ "b/docs/source/Instruction/\344\275\277\347\224\250tuners.md" @@ -18,9 +18,9 @@ tuner是指附加在模型上的额外结构部分,用于减少训练参数量 - Res-Tuning: [Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone](https://arxiv.org/abs/2310.19859) < [arXiv](https://arxiv.org/abs/2310.19859) | [Project Page](https://res-tuning.github.io/) | [Usage](ResTuning.md) > - [PEFT](https://github.com/huggingface/peft)提供的tuners, 如AdaLoRA、DoRA、Fourierft等 -# 接口列表 +## 接口列表 -## Swift类静态接口 +### Swift类静态接口 - `Swift.prepare_model(model, config, **kwargs)` - 接口作用:加载某个tuner到模型上,如果是PeftConfig的子类,则使用Peft库的对应接口加载tuner。在使用SwiftConfig的情况下,本接口可以传入SwiftModel实例并重复调用,此时和config传入字典的效果相同。 @@ -78,7 +78,7 @@ tuner是指附加在模型上的额外结构部分,用于减少训练参数量 - adapter_name:`str`或`List[str]`或`Dict[str, str]`类型或`None`,待加载tuner目录中的tuner名称,如果为`None`则加载所有名称的tuners,如果是`str`或`List[str]`则只加载某些具体的tuner,如果是`Dict`,则将`key`指代的tuner加载起来后换成`value`的名字 - revision: 如果model_id是魔搭的id,则revision可以指定对应版本号 -## SwiftModel接口 +### SwiftModel接口 下面列出用户可能调用的接口列表,其他内部接口或不推荐使用的接口可以通过`make docs`命令查看API 
Doc文档。 diff --git "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" index 53adbe43e0..5472a8d0da 100644 --- "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" +++ "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" @@ -11,6 +11,7 @@ - load_dataset_config: 当指定resume_from_checkpoint/ckpt_dir会读取保存文件中的`args.json`,将默认为None的参数进行赋值(可通过手动传入进行覆盖). 如果将该参数设置为True, 则会额外读取数据参数. 默认为False - use_hf: 默认为False. 控制模型下载、数据集下载、模型push的hub - hub_token: hub token. modelscope的hub token可以查看[这里](https://modelscope.cn/my/myaccesstoken) +- custom_register_path: 自定义模型、对话模板和数据集注册的`.py`文件路径 ### 模型参数 @@ -37,7 +38,6 @@ - 🔥model_name: 仅用于自我认知任务,传入模型中文名和英文名,以空格分隔 - 🔥model_author: 仅用于自我认知任务,传入模型作者的中文名和英文名,以空格分隔 - custom_dataset_info: 自定义简单数据集注册,参考[新增数据集](../Customization/新增数据集.md) -- custom_register_path: 自定义复杂数据集注册,参考[新增数据集](../Customization/新增数据集.md) ### 模板参数 - 🔥template: 模板类型,默认使用model对应的template类型。如果为自定义模型,请参考[支持的模型和数据集](./支持的模型和数据集.md)手动传入这个字段 diff --git "a/docs/source/Instruction/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" "b/docs/source/Instruction/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" index 8787e327c0..5f99f037f1 100644 --- "a/docs/source/Instruction/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" +++ "b/docs/source/Instruction/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" @@ -212,24 +212,24 @@ 
|[swift/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/swift/Meta-Llama-3-70B-Instruct-AWQ)|llama3|llama3|-|-|[study-hjt/Meta-Llama-3-70B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-AWQ)| |[ChineseAlpacaGroup/llama-3-chinese-8b-instruct](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct)|llama3|llama3|-|-|[hfl/llama-3-chinese-8b-instruct](https://huggingface.co/hfl/llama-3-chinese-8b-instruct)| |[ChineseAlpacaGroup/llama-3-chinese-8b](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b)|llama3|llama3|-|-|[hfl/llama-3-chinese-8b](https://huggingface.co/hfl/llama-3-chinese-8b)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)| -|[LLM-Research/Meta-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)| -|[LLM-Research/Meta-Llama-3.1-405B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)| 
-|[LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct-FP8)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit)|llama3|llama3|-|-|[unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)| 
-|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)| -|[AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF](https://modelscope.cn/models/AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF)|llama3|llama3|-|-|[nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)| 
+|[LLM-Research/Meta-Llama-3.1-405B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)| +|[LLM-Research/Meta-Llama-3.1-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)| +|[LLM-Research/Meta-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)| +|[LLM-Research/Meta-Llama-3.1-405B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct-FP8)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4)| 
+|[LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit)|llama3_1|llama3_2|transformers>=4.43|-|[unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)| 
+|[LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)| +|[AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF](https://modelscope.cn/models/AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF)|llama3_1|llama3_2|transformers>=4.43|-|[nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)| |[LLM-Research/Llama-3.2-1B](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)| |[LLM-Research/Llama-3.2-3B](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)| |[LLM-Research/Llama-3.2-1B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B-Instruct)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)| diff --git a/docs/source_en/Customization/New-model.md b/docs/source_en/Customization/New-model.md index 6a44b8af7c..81010683af 100644 --- a/docs/source_en/Customization/New-model.md +++ b/docs/source_en/Customization/New-model.md @@ -7,3 +7,6 @@ swift sft --model my-model --model_type llama --template chatml --dataset xxx ``` If you need to add a new `model_type` or `template`, please submit an issue to us. 
If you have read our source code, you can also add new types in `llm/template` and `llm/model`. + + +## Model Registration +Please refer to the example code in [examples](https://github.com/modelscope/swift/blob/main/examples/custom/model.py). You can load the registered content by specifying `--custom_register_path xxx.py`. diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md index 9afddf595d..4ff6d2e4bc 100644 --- a/docs/source_en/Instruction/Command-line-parameters.md +++ b/docs/source_en/Instruction/Command-line-parameters.md @@ -11,6 +11,7 @@ The introduction to command line parameters will cover base arguments, atomic ar - load_dataset_config: When specifying resume_from_checkpoint/ckpt_dir, it will read the `args.json` in the saved file and assign values to any parameters that are None (can be overridden by manual input). If this parameter is set to True, it will read the data parameters as well. Default is False. - use_hf: Default is False. Controls model and dataset downloading, and model pushing to the hub. - hub_token: Hub token. You can check the modelscope hub token [here](https://modelscope.cn/my/myaccesstoken). +- custom_register_path: The path to a `.py` file that registers custom models, chat templates, and datasets. ### Model Arguments @@ -38,7 +39,6 @@ The introduction to command line parameters will cover base arguments, atomic ar - 🔥model_name: For self-awareness tasks, input the model's Chinese and English names separated by space. - 🔥model_author: For self-awareness tasks, input the model author's Chinese and English names separated by space. - custom_dataset_info: Custom simple dataset registration, refer to [Add New Dataset](../Customization/New-dataset.md). -- custom_register_path: Custom complex dataset registration, refer to [Add New Dataset](../Customization/New-dataset.md). 
### Template Arguments diff --git a/docs/source_en/Instruction/Supported-models-and-datasets.md b/docs/source_en/Instruction/Supported-models-and-datasets.md index 92eea802b1..5324a1ac61 100644 --- a/docs/source_en/Instruction/Supported-models-and-datasets.md +++ b/docs/source_en/Instruction/Supported-models-and-datasets.md @@ -212,24 +212,24 @@ The table below introduces the models integrated with ms-swift: |[swift/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/swift/Meta-Llama-3-70B-Instruct-AWQ)|llama3|llama3|-|-|[study-hjt/Meta-Llama-3-70B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-AWQ)| |[ChineseAlpacaGroup/llama-3-chinese-8b-instruct](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct)|llama3|llama3|-|-|[hfl/llama-3-chinese-8b-instruct](https://huggingface.co/hfl/llama-3-chinese-8b-instruct)| |[ChineseAlpacaGroup/llama-3-chinese-8b](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b)|llama3|llama3|-|-|[hfl/llama-3-chinese-8b](https://huggingface.co/hfl/llama-3-chinese-8b)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)| -|[LLM-Research/Meta-Llama-3.1-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)| 
-|[LLM-Research/Meta-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)| -|[LLM-Research/Meta-Llama-3.1-405B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-70B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct-FP8)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8)|llama3|llama3|-|-|[meta-llama/Meta-Llama-3.1-405B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit)|llama3|llama3|-|-|[unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)| 
-|[LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)| -|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|llama3|llama3|-|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)| -|[AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF](https://modelscope.cn/models/AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF)|llama3|llama3|-|-|[nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)| 
+|[LLM-Research/Meta-Llama-3.1-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)| +|[LLM-Research/Meta-Llama-3.1-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)| +|[LLM-Research/Meta-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)| +|[LLM-Research/Meta-Llama-3.1-405B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct-FP8)| 
+|[LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8)|llama3_1|llama3_2|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit)|llama3_1|llama3_2|transformers>=4.43|-|[unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)| 
+|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)| +|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|llama3_1|llama3_2|transformers>=4.43|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)| +|[AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF](https://modelscope.cn/models/AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF)|llama3_1|llama3_2|transformers>=4.43|-|[nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)| |[LLM-Research/Llama-3.2-1B](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)| |[LLM-Research/Llama-3.2-3B](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)| 
|[LLM-Research/Llama-3.2-1B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B-Instruct)|llama3_2|llama3_2|transformers>=4.45|-|[meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)| diff --git a/docs/source_en/Instruction/Use-tuners.md b/docs/source_en/Instruction/Use-tuners.md index 4835ef9400..f960591893 100644 --- a/docs/source_en/Instruction/Use-tuners.md +++ b/docs/source_en/Instruction/Use-tuners.md @@ -18,9 +18,9 @@ Tuners refer to additional structural components attached to a model, aimed at r - Res-Tuning: [Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone](https://arxiv.org/abs/2310.19859) < [arXiv](https://arxiv.org/abs/2310.19859) | [Project Page](https://res-tuning.github.io/) | [Usage](ResTuning.md) > - Tuners provided by [PEFT](https://github.com/huggingface/peft), such as AdaLoRA, DoRA, Fourierft, etc. -# Interface List +## Interface List -## Swift Class Static Interfaces +### Swift Class Static Interfaces - `Swift.prepare_model(model, config, **kwargs)` - Function: Loads a tuner into a model. If it is a subclass of `PeftConfig`, it uses the corresponding interface from the Peft library to load the tuner. When using `SwiftConfig`, this interface can accept `SwiftModel` instances and can be called repeatedly, functioning similarly to passing a dictionary of configs. @@ -67,7 +67,7 @@ Tuners refer to additional structural components attached to a model, aimed at r - `adapter_name`: Can be of type `str`, `List[str]`, `Dict[str, str]`, or `None`. If `None`, all tuners in the specified directory will be loaded. If it is a `str` or `List[str]`, only specific tuners will be loaded. If it is a `Dict`, the key represents the tuner to load, which will be renamed to the corresponding value. - `revision`: If `model_id` is an ID from the model hub, `revision` can specify the corresponding version number. 
-## SwiftModel Interfaces +### SwiftModel Interfaces Below is a list of interfaces that users may call. Other internal or less recommended interfaces can be viewed by running the `make docs` command to access the API Doc. diff --git a/examples/custom/dataset.py b/examples/custom/dataset.py index a57f91ad7d..21c42e016e 100644 --- a/examples/custom/dataset.py +++ b/examples/custom/dataset.py @@ -1,23 +1,30 @@ +# Copyright (c) Alibaba, Inc. and its affiliates. from typing import Any, Dict, Optional -from swift.llm import AlpacaPreprocessor, register_dataset -from swift.llm.dataset.register import DatasetMeta +from swift.llm import DatasetMeta, ResponsePreprocessor, load_dataset, register_dataset -class CustomPreprocessor(AlpacaPreprocessor): +class CustomPreprocessor(ResponsePreprocessor): + prompt = """Task: Based on the given two sentences, provide a similarity score between 0.0 and 5.0. +Sentence 1: {text1} +Sentence 2: {text2} +Similarity score: """ def preprocess(self, row: Dict[str, Any]) -> Optional[Dict[str, Any]]: - response = row['response'] - prefix_prompt = 'Answer: ' - if response and response.startswith(prefix_prompt): - response = response[len(prefix_prompt):].strip() - row['output'] = response - return super().preprocess(row) + return super().preprocess({ + 'query': self.prompt.format(text1=row['text1'], text2=row['text2']), + 'response': f"{row['label']:.1f}" + }) register_dataset( DatasetMeta( - ms_dataset_id='AI-ModelScope/LongAlpaca-12k', - hf_dataset_id='Yukang/LongAlpaca-12k', + ms_dataset_id='swift/stsb', + hf_dataset_id='SetFit/stsb', preprocess_func=CustomPreprocessor(), )) + +if __name__ == '__main__': + dataset = load_dataset(['swift/stsb'])[0] + print(f'dataset: {dataset}') + print(f'dataset[0]: {dataset[0]}') diff --git a/examples/custom/model.py b/examples/custom/model.py new file mode 100644 index 0000000000..771a5150e7 --- /dev/null +++ b/examples/custom/model.py @@ -0,0 +1,29 @@ +# Copyright (c) Alibaba, Inc. and its affiliates. 
+from swift.llm import (InferRequest, Model, ModelGroup, ModelMeta, PtEngine, RequestConfig, TemplateMeta,
+                       get_model_tokenizer_with_flash_attn, register_model, register_template)
+
+register_template(
+    TemplateMeta('custom', ['<extra_id_0>System\n{{SYSTEM}}\n\n'],
+                 ['<extra_id_1>User\n{{QUERY}}\n<extra_id_1>Assistant\n'], ['\n']))
+
+register_model(
+    ModelMeta(
+        model_type='custom',
+        model_groups=[
+            ModelGroup([Model('AI-ModelScope/Nemotron-Mini-4B-Instruct', 'nvidia/Nemotron-Mini-4B-Instruct')])
+        ],
+        template='custom',
+        get_function=get_model_tokenizer_with_flash_attn,
+        ignore_file_pattern=['nemo']))
+
+if __name__ == '__main__':
+    infer_request = InferRequest(messages=[{'role': 'user', 'content': '你是谁'}])
+    request_config = RequestConfig(max_tokens=512, temperature=0)
+    engine = PtEngine('AI-ModelScope/Nemotron-Mini-4B-Instruct')
+    response = engine.infer([infer_request], request_config)
+    swift_response = response[0].choices[0].message.content
+
+    engine.default_template.template_backend = 'jinja'
+    response = engine.infer([infer_request], request_config)
+    jinja_response = response[0].choices[0].message.content
+    assert swift_response == jinja_response, (f'swift_response: {swift_response}\njinja_response: {jinja_response}')
diff --git a/swift/llm/__init__.py b/swift/llm/__init__.py
index 83cb53eb73..5eda55a59c 100644
--- a/swift/llm/__init__.py
+++ b/swift/llm/__init__.py
@@ -25,13 +25,14 @@
                             RLHFArguments, WebUIArguments, BaseArguments)
     from .template import (TEMPLATE_MAPPING, Template, Word, get_template, TemplateType, register_template,
                            TemplateInputs, Messages, TemplateMeta, get_template_meta, InferRequest)
-    from .model import (MODEL_MAPPING, ModelType, get_model_tokenizer, safe_snapshot_download, HfConfigFactory,
-                        ModelInfo, ModelMeta, ModelKeys, register_model_arch, MultiModelKeys, ModelArch, get_model_arch,
-                        MODEL_ARCH_MAPPING, get_model_info_meta, get_model_name)
+    from .model import (register_model, MODEL_MAPPING, ModelType, get_model_tokenizer, safe_snapshot_download,
+                        HfConfigFactory, ModelInfo, ModelMeta, ModelKeys, register_model_arch, MultiModelKeys,
+                        ModelArch, get_model_arch, MODEL_ARCH_MAPPING, get_model_info_meta, get_model_name, ModelGroup,
+                        Model, get_model_tokenizer_with_flash_attn, get_model_tokenizer_multimodal)
     from .dataset import (AlpacaPreprocessor, ResponsePreprocessor, MessagesPreprocessor, AutoPreprocessor,
                           DATASET_MAPPING, MediaResource, register_dataset, register_dataset_info, EncodePreprocessor,
                           LazyLLMDataset, ConstantLengthDataset, standard_keys, load_dataset, DATASET_TYPE,
-                          sample_dataset, RowPreprocessor)
+                          sample_dataset, RowPreprocessor, DatasetMeta)
     from .utils import (deep_getattr, to_device, History, history_to_messages, messages_to_history, Processor,
                         save_checkpoint, ProcessorMixin)
     from .base import SwiftPipeline
@@ -66,13 +67,14 @@
         'model': [
             'MODEL_MAPPING', 'ModelType', 'get_model_tokenizer', 'safe_snapshot_download', 'HfConfigFactory',
             'ModelInfo', 'ModelMeta', 'ModelKeys', 'register_model_arch', 'MultiModelKeys', 'ModelArch',
-            'MODEL_ARCH_MAPPING', 'get_model_arch', 'get_model_info_meta', 'get_model_name'
+            'MODEL_ARCH_MAPPING', 'get_model_arch', 'get_model_info_meta', 'get_model_name', 'register_model',
+            'ModelGroup', 'Model', 'get_model_tokenizer_with_flash_attn', 'get_model_tokenizer_multimodal'
         ],
         'dataset': [
             'AlpacaPreprocessor', 'ClsPreprocessor', 'ComposePreprocessor', 'MessagesPreprocessor', 'DATASET_MAPPING',
             'MediaResource', 'register_dataset', 'register_dataset_info', 'EncodePreprocessor', 'LazyLLMDataset',
             'ConstantLengthDataset', 'standard_keys', 'load_dataset', 'DATASET_TYPE', 'sample_dataset',
-            'RowPreprocessor', 'ResponsePreprocessor'
+            'RowPreprocessor', 'ResponsePreprocessor', 'DatasetMeta'
         ],
         'utils': [
             'deep_getattr', 'to_device', 'History', 'history_to_messages', 'messages_to_history', 'Processor',
diff --git a/swift/llm/argument/base_args/base_args.py b/swift/llm/argument/base_args/base_args.py
index b8cae50a1b..6f4f2cd499 100644
--- a/swift/llm/argument/base_args/base_args.py
+++ b/swift/llm/argument/base_args/base_args.py
@@ -1,5 +1,6 @@
 # Copyright (c) Alibaba, Inc. and its affiliates.
 import os
+import sys
 from dataclasses import dataclass, field, fields
 from typing import Any, Dict, Literal, Optional
 
@@ -15,6 +16,7 @@
 from .model_args import ModelArguments
 from .quant_args import QuantizeArguments
 from .template_args import TemplateArguments
+from .utils import to_abspath
 
 logger = get_logger()
 
@@ -38,6 +40,7 @@ class BaseArguments(GenerationArguments, QuantizeArguments, DataArguments, Templ
         load_dataset_config (bool): Flag to determine if dataset configuration should be loaded. Default is False.
         use_hf (bool): Flag to determine if Hugging Face should be used. Default is False.
         hub_token (Optional[str]): SDK token for authentication. Default is None.
+        custom_register_path (Optional[str]): Path to custom .py file for dataset registration. Default is None.
         ignore_args_error (bool): Flag to ignore argument errors for notebook compatibility. Default is False.
         use_swift_lora (bool): Use swift lora, a compatible argument
     """
 
@@ -52,15 +55,27 @@ class BaseArguments(GenerationArguments, QuantizeArguments, DataArguments, Templ
     # None: use env var `MODELSCOPE_API_TOKEN`
     hub_token: Optional[str] = field(
         default=None, metadata={'help': 'SDK token can be found in https://modelscope.cn/my/myaccesstoken'})
+    custom_register_path: Optional[str] = None  # .py
 
     # extra
     ignore_args_error: bool = False  # True: notebook compatibility
     use_swift_lora: bool = False  # True for using tuner_backend == swift, don't specify this unless you know what you are doing  # noqa
 
+    def _init_custom_register(self) -> None:
+        """Register custom .py file to datasets"""
+        if self.custom_register_path is None:
+            return
+        self.custom_register_path = to_abspath(self.custom_register_path, True)
+        folder, fname = os.path.split(self.custom_register_path)
+        sys.path.append(folder)
+        __import__(os.path.splitext(fname)[0])  # rstrip('.py') would strip trailing characters, not the suffix
+        logger.info(f'Successfully registered `{self.custom_register_path}`')
+
     def __post_init__(self):
         if self.use_hf or use_hf_hub():
             self.use_hf = True
             os.environ['USE_HF'] = '1'
+        self._init_custom_register()
         self._init_model_kwargs()
         self.rank, self.local_rank, world_size, self.local_world_size = get_dist_setting()
         # The Seq2SeqTrainingArguments has a property called world_size, which cannot be assigned a value.
diff --git a/swift/llm/argument/base_args/data_args.py b/swift/llm/argument/base_args/data_args.py
index 3bca47747c..fc6cc4b399 100644
--- a/swift/llm/argument/base_args/data_args.py
+++ b/swift/llm/argument/base_args/data_args.py
@@ -1,12 +1,9 @@
 # Copyright (c) Alibaba, Inc. and its affiliates.
-import os
-import sys
 from dataclasses import dataclass, field
 from typing import List, Literal, Optional
 
 from swift.llm import DATASET_MAPPING, register_dataset_info
 from swift.utils import get_logger
-from .utils import to_abspath
 
 logger = get_logger()
 
@@ -28,7 +25,6 @@ class DataArguments:
         model_name (List[str]): List containing Chinese and English names of the model. Default is [None, None].
         model_author (List[str]): List containing Chinese and English names of the model author.
             Default is [None, None].
-        custom_register_path (Optional[str]): Path to custom .py file for dataset registration. Default is None.
         custom_dataset_info (Optional[str]): Path to custom dataset_info.json file. Default is None.
     """
     # dataset_id or dataset_dir or dataset_path
@@ -50,18 +46,8 @@ class DataArguments:
     model_author: List[str] = field(
         default_factory=lambda: [None, None], metadata={'help': "e.g. ['魔搭', 'ModelScope']"})
-    custom_register_path: Optional[str] = None  # .py
     custom_dataset_info: Optional[str] = None  # .json
 
-    def _init_custom_register(self) -> None:
-        """Register custom .py file to datasets"""
-        if self.custom_register_path is None:
-            return
-        self.custom_register_path = to_abspath(self.custom_register_path, True)
-        folder, fname = os.path.split(self.custom_register_path)
-        sys.path.append(folder)
-        __import__(fname.rstrip('.py'))
-
     def _init_custom_dataset_info(self):
         """register custom dataset_info.json to datasets"""
         if self.custom_dataset_info is None:
@@ -78,7 +64,6 @@ def __post_init__(self):
         else:
             msg = 'args.streaming is True'
         logger.info(f'Because {msg}, setting split_dataset_ratio: {self.split_dataset_ratio}')
-        self._init_custom_register()
         self._init_custom_dataset_info()
 
     def get_dataset_kwargs(self):
diff --git a/swift/llm/dataset/__init__.py b/swift/llm/dataset/__init__.py
index 5722e97b18..340a378fe9 100644
--- a/swift/llm/dataset/__init__.py
+++ b/swift/llm/dataset/__init__.py
@@ -8,7 +8,7 @@
 from .media import MediaResource
 from .preprocessor import (AlpacaPreprocessor, AutoPreprocessor, MessagesPreprocessor, ResponsePreprocessor,
                            RowPreprocessor, standard_keys)
-from .register import DATASET_MAPPING, register_dataset, register_dataset_info
+from .register import DATASET_MAPPING, DatasetMeta, register_dataset, register_dataset_info
 from .utils import (ConstantLengthDataset, EncodePreprocessor, GetLengthPreprocessor, LazyLLMDataset,
                     PackingPreprocessor, sample_dataset)
diff --git a/swift/llm/model/__init__.py b/swift/llm/model/__init__.py
index 6b2c834c9c..6361aace10 100644
--- a/swift/llm/model/__init__.py
+++ b/swift/llm/model/__init__.py
@@ -4,5 +4,6 @@
 from .model_arch import MODEL_ARCH_MAPPING, ModelArch, ModelKeys, MultiModelKeys, get_model_arch, register_model_arch
 from .register import (MODEL_MAPPING, Model, ModelGroup, ModelMeta, fix_do_sample_warning, get_default_device_map,
                        get_default_torch_dtype, get_model_info_meta, get_model_name, get_model_tokenizer,
-                       get_model_tokenizer_from_local, get_model_tokenizer_with_flash_attn, get_model_with_value_head)
+                       get_model_tokenizer_multimodal, get_model_tokenizer_with_flash_attn, get_model_with_value_head,
+                       register_model)
 from .utils import HfConfigFactory, ModelInfo, safe_snapshot_download
diff --git a/swift/llm/model/constant.py b/swift/llm/model/constant.py
index f674122318..3fb6ba3c86 100644
--- a/swift/llm/model/constant.py
+++ b/swift/llm/model/constant.py
@@ -16,6 +16,7 @@ class LLMModelType:
     llama = 'llama'
     llama3 = 'llama3'
+    llama3_1 = 'llama3_1'
     llama3_2 = 'llama3_2'
     reflection = 'reflection'
     yi = 'yi'
diff --git a/swift/llm/model/model/llama.py b/swift/llm/model/model/llama.py
index 3bec7388e2..3036f20a92 100644
--- a/swift/llm/model/model/llama.py
+++ b/swift/llm/model/model/llama.py
@@ -97,61 +97,64 @@ def get_model_tokenizer_llama(model_dir: str,
             Model('ChineseAlpacaGroup/llama-3-chinese-8b-instruct', 'hfl/llama-3-chinese-8b-instruct'),
             Model('ChineseAlpacaGroup/llama-3-chinese-8b', 'hfl/llama-3-chinese-8b'),
         ]),
+    ],
+    TemplateType.llama3,
+    get_model_tokenizer_with_flash_attn,
+    architectures=['LlamaForCausalLM'],
+    model_arch=ModelArch.llama,
+))
+
+register_model(
+    ModelMeta(
+        LLMModelType.llama3_1,
+        [
         # llama3.1
-        ModelGroup(
-            [
-                # chat
-                Model('LLM-Research/Meta-Llama-3.1-8B-Instruct', 'meta-llama/Meta-Llama-3.1-8B-Instruct'),
-                Model('LLM-Research/Meta-Llama-3.1-70B-Instruct', 'meta-llama/Meta-Llama-3.1-70B-Instruct'),
-                Model('LLM-Research/Meta-Llama-3.1-405B-Instruct', 'meta-llama/Meta-Llama-3.1-405B-Instruct'),
-                # base
-                Model('LLM-Research/Meta-Llama-3.1-8B', 'meta-llama/Meta-Llama-3.1-8B'),
-                Model('LLM-Research/Meta-Llama-3.1-70B', 'meta-llama/Meta-Llama-3.1-70B'),
-                Model('LLM-Research/Meta-Llama-3.1-405B', 'meta-llama/Meta-Llama-3.1-405B'),
-                # fp8
-                Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8', 'meta-llama/Meta-Llama-3.1-70B-Instruct-FP8'),
-                Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8',
-                      'meta-llama/Meta-Llama-3.1-405B-Instruct-FP8'),
-            ],
-            requires=['transformers>=4.43']),
+        ModelGroup([
+            # chat
+            Model('LLM-Research/Meta-Llama-3.1-8B-Instruct', 'meta-llama/Meta-Llama-3.1-8B-Instruct'),
+            Model('LLM-Research/Meta-Llama-3.1-70B-Instruct', 'meta-llama/Meta-Llama-3.1-70B-Instruct'),
+            Model('LLM-Research/Meta-Llama-3.1-405B-Instruct', 'meta-llama/Meta-Llama-3.1-405B-Instruct'),
+            # base
+            Model('LLM-Research/Meta-Llama-3.1-8B', 'meta-llama/Meta-Llama-3.1-8B'),
+            Model('LLM-Research/Meta-Llama-3.1-70B', 'meta-llama/Meta-Llama-3.1-70B'),
+            Model('LLM-Research/Meta-Llama-3.1-405B', 'meta-llama/Meta-Llama-3.1-405B'),
+            # fp8
+            Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8', 'meta-llama/Meta-Llama-3.1-70B-Instruct-FP8'),
+            Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8', 'meta-llama/Meta-Llama-3.1-405B-Instruct-FP8'),
+        ]),
         # llama3.1-quant
-        ModelGroup(
-            [
-                # bnb-nf4
-                Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4',
-                      'hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4'),
-                Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit',
-                      'unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit'),
-                Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4',
-                      'hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4'),
-                # gptq-int4
-                Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4'),
-                Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4'),
-                Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4'),
-                # awq-int4
-                Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4'),
-                Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4'),
-                Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4',
-                      'hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4'),
-            ],
-            requires=['transformers>=4.43']),
+        ModelGroup([
+            # bnb-nf4
+            Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4',
+                  'hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4'),
+            Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit',
+                  'unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit'),
+            Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4',
+                  'hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4'),
+            # gptq-int4
+            Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4'),
+            Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4'),
+            Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4'),
+            # awq-int4
+            Model('LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4'),
+            Model('LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4'),
+            Model('LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4',
+                  'hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4'),
+        ]),
         # nvidia Nemotron
-        ModelGroup(
-            [
-                Model('AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF',
-                      'nvidia/Llama-3.1-Nemotron-70B-Instruct-HF'),
-            ],
-            requires=['transformers>=4.43'],
-        )
+        ModelGroup([
+            Model('AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF', 'nvidia/Llama-3.1-Nemotron-70B-Instruct-HF'),
+        ])
     ],
-    TemplateType.llama3,
+    TemplateType.llama3_2,
     get_model_tokenizer_with_flash_attn,
     architectures=['LlamaForCausalLM'],
+    requires=['transformers>=4.43'],
     model_arch=ModelArch.llama,
 ))
diff --git a/swift/llm/model/model/telechat.py b/swift/llm/model/model/telechat.py
index 9de58f9d1f..561c839175 100644
--- a/swift/llm/model/model/telechat.py
+++ b/swift/llm/model/model/telechat.py
@@ -19,7 +19,7 @@
         TemplateType.telechat,
         get_model_tokenizer_with_flash_attn,
         model_arch=ModelArch.telechat,
-        architectures=['TelechatForCausalLM'],
+        architectures=['TelechatForCausalLM', 'TeleChatForCausalLM'],
     ))
 
 register_model(
@@ -34,7 +34,7 @@
         TemplateType.telechat2_115b,
         get_model_tokenizer_with_flash_attn,
         model_arch=ModelArch.telechat,
-        architectures=['TelechatForCausalLM'],
+        architectures=['TelechatForCausalLM', 'TeleChatForCausalLM'],
     ))
 
 register_model(
diff --git a/swift/llm/model/register.py b/swift/llm/model/register.py
index e49a1ecc98..96a84626bc 100644
--- a/swift/llm/model/register.py
+++ b/swift/llm/model/register.py
@@ -39,6 +39,10 @@ class ModelGroup:
     requires: Optional[List[str]] = None
     tags: List[str] = field(default_factory=list)
 
+    def __post_init__(self):
+        if not isinstance(self.models, (tuple, list)):
+            self.models = [self.models]
+
 
 @dataclass
 class ModelMeta:
@@ -62,6 +66,10 @@ class ModelMeta:
     requires: List[str] = field(default_factory=list)
     tags: List[str] = field(default_factory=list)
 
+    def __post_init__(self):
+        if not isinstance(self.model_groups, (list, tuple)):
+            self.model_groups = [self.model_groups]
+
     def get_matched_model_group(self, model_name: str) -> Optional[ModelGroup]:
         for model_group in self.model_groups:
             for model in model_group.models:
diff --git a/tests/test_align/test_template/test_llm.py b/tests/test_align/test_template/test_llm.py
index e2909c7323..1312e8133d 100644
--- a/tests/test_align/test_template/test_llm.py
+++ b/tests/test_align/test_template/test_llm.py
@@ -25,7 +25,7 @@ def _infer_model(pt_engine, system=None, messages=None):
     response = resp[0].choices[0].message.content
     messages += [{'role': 'assistant', 'content': response}]
     logger.info(f'model: {pt_engine.model_info.model_name}, messages: {messages}')
-    return messages
+    return response
 
 
 def test_qwen2_5():
@@ -50,13 +50,6 @@ def test_glm4():
     _infer_model(pt_engine)
 
 
-def test_llama():
-    pt_engine = PtEngine('LLM-Research/Llama-3.2-1B-Instruct')
-    _infer_model(pt_engine)
-    pt_engine.default_template.template_backend = 'jinja'
-    _infer_model(pt_engine)
-
-
 def test_qwq():
     pt_engine = PtEngine('Qwen/QwQ-32B-Preview')
     _infer_model(pt_engine)
@@ -117,6 +110,18 @@ def test_glm_edge():
     _infer_model(pt_engine)
 
 
+def test_llama():
+    # pt_engine = PtEngine('LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4')
+    # pt_engine = PtEngine('LLM-Research/Meta-Llama-3.1-8B-Instruct')
+    # pt_engine = PtEngine('LLM-Research/Meta-Llama-3-8B-Instruct')
+    # pt_engine = PtEngine('LLM-Research/Llama-3.2-1B-Instruct')
+    pt_engine = PtEngine('AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF')
+    res = _infer_model(pt_engine)
+    pt_engine.default_template.template_backend = 'jinja'
+    res2 = _infer_model(pt_engine)
+    assert res == res2, f'res: {res}, res2: {res2}'
+
+
 if __name__ == '__main__':
     from swift.llm import PtEngine, RequestConfig, get_template, get_model_tokenizer
     from swift.utils import get_logger, seed_everything
@@ -131,6 +136,6 @@ def test_glm_edge():
     # test_deepseek_moe()
     # test_codegeex4()
     # test_glm4()
-    # test_llama()
     # test_telechat()
-    test_glm_edge()
+    # test_glm_edge()
+    test_llama()
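The `--custom_register_path` mechanism added to `base_args.py` above boils down to three steps: resolve the user's `.py` file to an absolute path, put its folder on `sys.path`, and import it by module name so its top-level `register_model()` / `register_dataset()` calls execute. The following standalone sketch illustrates that flow outside of swift; the helper name `import_register_file` and the sample file are illustrative, not part of the swift API. Note `os.path.splitext` is used to derive the module name: `fname.rstrip('.py')` strips trailing *characters* from the set `{'.', 'p', 'y'}`, so a file named `happy.py` would wrongly become `ha`.

```python
import importlib
import os
import sys
import tempfile


def import_register_file(py_path: str):
    """Import a user-supplied .py file so its top-level register_*() calls run.

    Mirrors the idea of BaseArguments._init_custom_register: add the file's
    folder to sys.path, then import the file by module name.
    """
    py_path = os.path.abspath(py_path)
    folder, fname = os.path.split(py_path)
    module_name = os.path.splitext(fname)[0]  # 'happy.py' -> 'happy'
    sys.path.insert(0, folder)
    return importlib.import_module(module_name)


if __name__ == '__main__':
    # Write a tiny "register file" and import it, the way
    # `swift sft --custom_register_path xxx.py` would at startup.
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, 'happy.py')
    with open(path, 'w') as f:
        f.write("REGISTERED = ['custom']\n")
    module = import_register_file(path)
    print(module.REGISTERED)  # ['custom']
```

Because the registration happens as an import side effect, the custom model type and template are available before argument parsing resolves `--model_type` and `--template`, which is why the PR moves `custom_register_path` from `DataArguments` up into `BaseArguments`.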