Merged
2 changes: 1 addition & 1 deletion docs/source/LLM/LLM人类对齐训练文档.md
@@ -94,4 +94,4 @@ bash scripts/dpo/lora_ddp_mp/dpo.sh
bash scripts/dpo/lora_ddp_mp/infer.sh
```

-Since DPO training produces either a full model or adapter weights, the LoRA merging and inference steps are the same as for fine-tuning, so please refer to the corresponding steps in the [fine-tuning documentation](./LLM微调文档#Merge LoRA).
+Since DPO training produces either a full model or adapter weights, the LoRA merging and inference steps are the same as for fine-tuning, so please refer to the corresponding steps in the [fine-tuning documentation](./LLM微调文档.md#merge-lora).
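The link fix in this hunk (`#Merge LoRA` → `#merge-lora`) matches GitHub's auto-generated heading anchors, which lowercase the heading text and replace spaces with hyphens. A rough sketch of that slug rule (this helper is illustrative, not part of the repo, and only approximates GitHub's behavior):

```python
import re

def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule:
    lowercase, drop punctuation, spaces become hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation
    slug = re.sub(r"\s+", "-", slug)      # spaces -> hyphens
    return slug

print(github_slug("Merge LoRA"))  # merge-lora
print(github_slug("Web-UI"))     # web-ui
```

This is why `#Merge LoRA` never resolved: the rendered anchor for the `## Merge LoRA` heading is `merge-lora`.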
6 changes: 6 additions & 0 deletions docs/source/LLM/LLM微调文档.md
@@ -2,6 +2,7 @@
 ## Table of Contents
 - [Environment Setup](#环境准备)
 - [Fine-tuning](#微调)
+- [DPO](#dpo)
 - [Merge LoRA](#merge-lora)
 - [Inference](#推理)
 - [Web-UI](#web-ui)
@@ -33,6 +34,8 @@ pip install -r requirements/llm.txt -U
```

 ## Fine-tuning
+If you want to fine-tune and run inference through a web UI, see the [UI training and inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
+
 ### Using Python
```python
# Experimental environment: A10, 3090, V100, ...
@@ -215,6 +218,9 @@ bash scripts/qwen_7b_chat/qlora_ddp_ds/sft.sh
bash scripts/qwen_7b_chat/qlora_ddp_ds/infer.sh
```

+## DPO
+If you want to use DPO for human alignment, see the [human alignment training documentation](./LLM人类对齐训练文档.md).

## Merge LoRA
Note: merge-lora is **currently** not supported for bnb and auto_gptq quantized models, as it would cause a significant loss of precision.
```bash
6 changes: 3 additions & 3 deletions docs/source/LLM/命令行参数.md
@@ -94,10 +94,10 @@

## DPO Parameters

-DPO parameters inherit all of the SFT parameters above and in addition introduce the following parameters
+DPO parameters inherit all of the SFT parameters above; in addition, the following parameters are introduced:

-- `--ref_model_type` the reference model type; the available `model_type` values can be found in `MODEL_MAPPING.keys()`
-- `--max_prompt_length` the maximum prompt length; this parameter is passed into the DPOTrainer so that the prompt length does not exceed this value, default 1024
+- `--ref_model_type` the reference model type. The available `model_type` values can be found in `MODEL_MAPPING.keys()`.
+- `--max_prompt_length` the maximum prompt length. This parameter is passed into the DPOTrainer to ensure the prompt length does not exceed this value; default: 1024.
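The effect of `--max_prompt_length` can be illustrated with a minimal sketch. The helper below is hypothetical (not part of swift or trl); it assumes the common DPO-trainer convention of keeping the *end* of an over-long prompt, since the tokens nearest the response matter most:

```python
def truncate_prompt(prompt_tokens, max_prompt_length=1024):
    """Keep at most max_prompt_length tokens from a tokenized prompt.

    Hypothetical helper: assumes the trainer keeps the tail of an
    over-long prompt, which is a common convention for DPO-style
    trainers (the exact truncation side may differ in practice).
    """
    if len(prompt_tokens) <= max_prompt_length:
        return prompt_tokens
    return prompt_tokens[-max_prompt_length:]

tokens = list(range(2000))  # stand-in for tokenizer output
truncated = truncate_prompt(tokens, max_prompt_length=1024)
print(len(truncated))  # 1024
```

Prompts already within the limit pass through unchanged; only over-long prompts are cut down to `max_prompt_length` tokens.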


## merge-lora infer app-ui Command Line Arguments