10 changes: 10 additions & 0 deletions docs/source/LLM/LLM微调文档.md
@@ -291,6 +291,16 @@ swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged'
```

**Manual** evaluation:
```bash
# Direct inference
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx' --eval_human true

# Merge the LoRA delta weights and run inference
swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged' --eval_human true
```
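
The same manual evaluation can also be driven from Python. A minimal sketch, assuming the `InferArguments`/`infer_main` interface used in the best-practice docs (the checkpoint path is the same placeholder as in the CLI example above):
```python
from swift.llm import InferArguments, infer_main

# Direct inference with manual (interactive) evaluation.
infer_args = InferArguments(
    ckpt_dir='xxx/vx_xxx/checkpoint-xxx',  # placeholder path, as in the CLI example
    eval_human=True)
infer_main(infer_args)
```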

## Web-UI
If you want to deploy with VLLM and provide an **API** interface, see [VLLM Inference Acceleration and Deployment](./VLLM推理加速与部署.md#部署).

12 changes: 10 additions & 2 deletions docs/source/LLM/LLM推理文档.md
@@ -389,6 +389,14 @@ history: [('浙江的省会在哪里?', '浙江的省会是杭州。'), ('这
"""
```

### Using the CLI
```bash
# qwen
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-7b-chat
# yi
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-6b-chat
```
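
The CLI calls above can also be invoked from Python. A minimal sketch, assuming the `InferArguments`/`infer_main` interface used in the best-practice docs:
```python
from swift.llm import InferArguments, infer_main

# Equivalent of `swift infer --model_type qwen-7b-chat`;
# set CUDA_VISIBLE_DEVICES in the environment if needed.
infer_args = InferArguments(model_type='qwen-7b-chat')
infer_main(infer_args)
```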

### Fine-tuned Model
If you want to run inference with a fine-tuned model, see the [LLM Fine-tuning Documentation](./LLM微调文档.md#微调后模型).

@@ -397,7 +405,7 @@ history: [('浙江的省会在哪里?', '浙江的省会是杭州。'), ('这
### qwen-7b-chat
Using the CLI:
```bash
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_id_or_path qwen/Qwen-7B-Chat
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b-chat
```

Using Python:
@@ -425,7 +433,7 @@ app_ui_main(infer_args)
### qwen-7b
Using the CLI:
```bash
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_id_or_path qwen/Qwen-7B
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b
```

Using Python:
26 changes: 19 additions & 7 deletions docs/source/LLM/VLLM推理加速与部署.md
@@ -20,17 +20,13 @@ pip install -e .[llm]
# vllm versions are tied to specific CUDA versions; choose the right version according to `https://docs.vllm.ai/en/latest/getting_started/installation.html`
pip install vllm -U

# If you want to run inference with auto_gptq-based models.
# Models that use auto_gptq: `https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md#模型`
# auto_gptq versions are tied to specific CUDA versions; choose the right version according to `https://github.com/PanQiWei/AutoGPTQ#quick-installation`
pip install auto_gptq -U

# Environment alignment (if you run into errors, you can run the code below; the repo is tested with the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```

## Inference Acceleration
vllm does not support bnb or auto_gptq quantized models. The models supported by vllm can be found in [Supported Models](./支持的模型和数据集.md#模型).

### qwen-7b-chat
```python
@@ -164,6 +160,14 @@ history: [('浙江的省会在哪?', '浙江的省会是杭州。'), ('这有
"""
```

### Using the CLI
```bash
# qwen
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-7b-chat --infer_backend vllm
# yi
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-6b-chat --infer_backend vllm
```
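
For programmatic use of the vllm backend, a minimal sketch, assuming the `prepare_vllm_engine_template`/`inference_vllm` interface imported in `swift/llm/app_ui.py` below, and the response-dict fields (`response`, `history`) shown in the fine-tuned-model example:
```python
from swift.llm import (InferArguments, inference_vllm,
                       prepare_vllm_engine_template)

# Build the vllm engine and chat template for a vllm-supported model type.
args = InferArguments(model_type='qwen-7b-chat', infer_backend='vllm')
llm_engine, template = prepare_vllm_engine_template(args)

# Batched requests; each returned dict is assumed to carry 'response' and 'history'.
request_list = [{'query': '浙江的省会在哪?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
print(resp_list[0]['response'])
```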

### Fine-tuned Model

**Single-sample inference**:
@@ -194,18 +198,26 @@ print(f"response: {resp['response']}")
print(f"history: {resp['history']}")
```

Evaluation with a **dataset**:
**Using the CLI**:
```bash
# Merge the LoRA delta weights and use vllm for accelerated inference
swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'

# Evaluate with a dataset
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged' --infer_backend vllm
# Manual evaluation
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged' \
--infer_backend vllm \
    --eval_human true
```

## Web-UI Acceleration

### Original Model
```bash
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_id_or_path qwen/Qwen-7B-Chat --infer_backend vllm
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b-chat --infer_backend vllm
```
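
The Python equivalent is a short script. A minimal sketch, assuming `app_ui_main` and `InferArguments` are importable from `swift.llm` as in the app-ui examples earlier in these docs:
```python
from swift.llm import InferArguments, app_ui_main

# Launch the web UI for the original model with the vllm backend.
infer_args = InferArguments(model_type='qwen-7b-chat', infer_backend='vllm')
app_ui_main(infer_args)
```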

### Fine-tuned Model
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -91,7 +91,7 @@
- `--model_cache_dir`: Default is `None`. A detailed description of this parameter can be found under the `sft.sh command-line arguments`.
- `--sft_type`: Default is `'lora'`. A detailed description of this parameter can be found under the `sft.sh command-line arguments`.
- `--template_type`: Default is `'AUTO'`. A detailed description of this parameter can be found under the `sft.sh command-line arguments`.
- `--infer_backend`: You can choose 'AUTO', 'vllm', or 'pt'. Defaults to 'AUTO' for intelligent selection: if no `ckpt_dir` is passed or full-parameter fine-tuning is used, and vllm is installed and the model supports vllm, the vllm engine is used; otherwise native torch is used for inference. For setting up the vllm environment, see [VLLM Inference Acceleration and Deployment](./VLLM推理加速与部署.md#环境准备).
- `--infer_backend`: You can choose 'AUTO', 'vllm', or 'pt'. Defaults to 'AUTO' for intelligent selection: if no `ckpt_dir` is passed or full-parameter fine-tuning is used, and vllm is installed and the model supports vllm, the vllm engine is used; otherwise native torch is used for inference. For setting up the vllm environment, see [VLLM Inference Acceleration and Deployment](./VLLM推理加速与部署.md#环境准备); the models supported by vllm can be found in [Supported Models](./支持的模型和数据集.md#模型).
- `--ckpt_dir`: Required. The value is the checkpoint path saved in the SFT stage, e.g. `'/path/to/your/vx_xxx/checkpoint-xxx'`.
- `--load_args_from_ckpt_dir`: Whether to read the configuration from the `sft_args.json` file in `ckpt_dir`. Default is `True`.
- `--load_dataset_config`: This parameter only takes effect when `--load_args_from_ckpt_dir true`. That is, whether to read the dataset-related configuration from the `sft_args.json` file in `ckpt_dir`. Default is `True`.
6 changes: 3 additions & 3 deletions docs/source/LLM/自我认知微调最佳实践.md
@@ -72,7 +72,7 @@ My name is QianWen, developed by Alibaba Cloud. I am designed to answer various

Using the CLI:
```bash
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-7b-chat --eval_human true
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-7b-chat
```

## Fine-tuning
@@ -273,11 +273,11 @@ result = app_ui_main(infer_args)
Using the CLI:
```bash
# Use app-ui directly
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen-7b-chat/vx-xxx/checkpoint-xxx' --eval_human true
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen-7b-chat/vx-xxx/checkpoint-xxx'

# Merge the LoRA delta weights and use app-ui
swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen-7b-chat/vx-xxx/checkpoint-xxx-merged' --eval_human true
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen-7b-chat/vx-xxx/checkpoint-xxx-merged'
```

## Learn More
7 changes: 3 additions & 4 deletions swift/llm/app_ui.py
@@ -12,8 +12,6 @@ def clear_session() -> History:

def gradio_generation_demo(args: InferArguments) -> None:
import gradio as gr
if args.merge_lora_and_save:
merge_lora(args)
if args.infer_backend == 'vllm':
from swift.llm import prepare_vllm_engine_template, inference_stream_vllm, inference_vllm
llm_engine, template = prepare_vllm_engine_template(args)
@@ -49,8 +47,6 @@ def model_generation(query: str) -> str:

def gradio_chat_demo(args: InferArguments) -> None:
import gradio as gr
if args.merge_lora_and_save:
merge_lora(args)
if args.infer_backend == 'vllm':
from swift.llm import prepare_vllm_engine_template, inference_stream_vllm
llm_engine, template = prepare_vllm_engine_template(args)
@@ -93,6 +89,9 @@ def model_chat(query: str, history: History) -> Tuple[str, History]:


def llm_app_ui(args: InferArguments) -> None:
args.eval_human = True
if args.merge_lora_and_save:
merge_lora(args)
if args.template_type.endswith('generation'):
gradio_generation_demo(args)
else:
5 changes: 4 additions & 1 deletion swift/llm/utils/argument.py
@@ -428,14 +428,17 @@ def __post_init__(self) -> None:
logger.warning('Setting overwrite_generation_config: False')
if self.ckpt_dir is None:
self.sft_type = 'full'
support_vllm = MODEL_MAPPING[self.model_type].get(
'support_vllm', False)
if self.infer_backend == 'AUTO':
if self.sft_type == 'full' and is_vllm_available(
) and MODEL_MAPPING[self.model_type].get('support_vllm', False):
) and support_vllm:
self.infer_backend = 'vllm'
else:
self.infer_backend = 'pytorch'
if self.infer_backend == 'vllm':
assert self.quantization_bit == 0, 'not support bnb'
assert support_vllm, f'vllm not support `{self.model_type}`'
if self.sft_type == 'lora':
assert self.merge_lora_and_save is True, 'please set `--merge_lora_and_save true`'

3 changes: 3 additions & 0 deletions swift/llm/utils/vllm_utils.py
@@ -28,6 +28,9 @@ def get_vllm_engine(model_type: str,
if engine_kwargs is None:
engine_kwargs = {}
model_info = MODEL_MAPPING[model_type]
support_vllm = model_info.get('support_vllm', False)
if not support_vllm:
raise ValueError(f'vllm not support `{model_type}`')
model_id_or_path = model_info['model_id_or_path']
ignore_file_pattern = model_info['ignore_file_pattern']
model_dir = kwargs.get('model_dir', None)
8 changes: 5 additions & 3 deletions tests/llm/test_run.py
@@ -88,15 +88,17 @@ def test_loss_matching(self):
infer_main([
'--ckpt_dir', best_model_checkpoint, '--show_dataset_sample',
'2', '--max_new_tokens', '100', '--use_flash_attn',
str(bool_var), '--use_vllm',
str(bool_var), '--verbose',
str(bool_var), '--infer_backend', {
True: 'vllm',
False: 'pytorch'
}[bool_var], '--verbose',
str(bool_var), '--merge_lora_and_save',
str(bool_var)
])
loss = output['log_history'][-1]['train_loss']
losses.append(loss)
torch.cuda.empty_cache()
self.assertTrue(abs(losses[0] - losses[1]) < 1e-4)
self.assertTrue(abs(losses[0] - losses[1]) < 2e-4)

def test_vl_audio(self):
output_dir = 'output'