4 changes: 2 additions & 2 deletions README.md
@@ -62,7 +62,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用


## 🎉 News
- 2023.1.4: Support for **VLLM deployment**, compatible with the OpenAI API style. For more details, please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署)
- 2023.1.4: Support for **VLLM deployment**, compatible with the **OpenAI API** style. For more details, please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署)
- 2023.1.4: Update [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) to facilitate viewing the training speed and GPU memory required for different models.
- 🔥 2023.12.29: Support web-ui for training and inference; use `swift web-ui` after installing ms-swift.
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh) to start training!
@@ -113,7 +113,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
- Quickly perform **inference** on LLM and build a **Web-UI**, see the [LLM Inference Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM推理文档.md).
- Rapidly **fine-tune** and perform inference on LLM, and build a Web-UI. See the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md) and [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
- **DPO training** supported, start by using [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh).
- Utilize VLLM for **inference acceleration** and **deployment(openai API)**. Please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md) for more information.
- Utilize VLLM for **inference acceleration** and **deployment(OpenAI API)**. Please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md) for more information.
- View the models and datasets supported by Swift. You can check [supported models and datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
- Expand and customize models, datasets, and dialogue templates in Swift, see [Customization and Expansion](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
- Check command-line parameters for fine-tuning and inference, see [Command-Line parameters](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md).
4 changes: 2 additions & 2 deletions README_CN.md
@@ -60,7 +60,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
Users can check the [official SWIFT documentation](docs/source/GetStarted/快速使用.md) for detailed information.

## 🎉 News
- 2023.1.4: Support **VLLM deployment**, compatible with the openai API style. For details, see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署).
- 2023.1.4: Support **VLLM deployment**, compatible with the **OpenAI API** style. For details, see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署).
- 2023.1.4: Update [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) to make it easy to view the training speed and GPU memory required by different models.
- 🔥 2023.12.29: Support web-ui for SFT training and inference; after installing ms-swift, launch it with `swift web-ui`.
- 🔥 2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and two datasets for this task: AI-ModelScope/stack-exchange-paired and AI-ModelScope/hh-rlhf. Use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh) to start training!
@@ -111,7 +111,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
- Quickly perform **inference** on LLMs and build a **Web-UI**; see the [LLM Inference Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM推理文档.md).
- Rapidly **fine-tune** LLMs, run inference, and build a Web-UI; see the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md) and the [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
- **DPO training** supported; use [this script](https://github.com/modelscope/swift/blob/v1.5.0/examples/pytorch/llm/scripts/dpo/lora/dpo.sh) to start training.
- Use VLLM for **inference acceleration** and **deployment (openai API)**; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md).
- Use VLLM for **inference acceleration** and **deployment (OpenAI API)**; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md).
- View the models and datasets supported by swift; see [Supported Models and Datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
- **Extend** and customize models, datasets, and chat templates in swift; see [Customization and Extension](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
- Look up the command-line parameters for fine-tuning and inference; see [Command-Line Parameters](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md).
120 changes: 103 additions & 17 deletions docs/source/LLM/VLLM推理加速与部署.md
@@ -19,6 +19,7 @@ pip install -e .[llm]

# vllm must match your CUDA version; choose the right version according to `https://docs.vllm.ai/en/latest/getting_started/installation.html`
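# (sketch, not part of the original diff) to check which CUDA version
# your torch build was compiled for before choosing a vllm wheel:
python -c 'import torch; print(torch.version.cuda)'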
pip install vllm -U
pip install openai -U

# Environment alignment (if you run into errors, run the code below; the repo is tested with the latest environment)
pip install -r requirements/framework.txt -U
@@ -239,14 +240,50 @@ Swift uses VLLM as the inference backend and is compatible with the OpenAI API style.
For the client-side OpenAI API parameters, refer to: https://platform.openai.com/docs/api-reference/introduction.

### Original Models
**qwen-7b-chat**
#### qwen-7b-chat

Server:
**Server:**
```bash
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b-chat
```

Client:
**Client:**

Using swift:
```python
from swift.llm import get_model_list_client, XRequest, inference_client

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')

query = '浙江的省会在哪里?'
request_kwargs = XRequest(model=model_type, seed=42)
resp = inference_client(query, request_kwargs=request_kwargs)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')

history = [(query, response)]
query = '这有什么好吃的?'
request_kwargs = XRequest(model=model_type, stream=True, seed=42)
stream_resp = inference_client(query, history, request_kwargs=request_kwargs)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()

"""Out[0]
model_type: qwen-7b-chat
query: 浙江的省会在哪里?
response: 浙江省的省会是杭州市。
query: 这有什么好吃的?
response: 杭州有许多美食,例如西湖醋鱼、东坡肉、龙井虾仁、叫化童子鸡等。此外,杭州还有许多特色小吃,如西湖藕粉、杭州小笼包、杭州油条等。
"""
```

Using openai:
```python
from openai import OpenAI
client = OpenAI(
Expand All @@ -263,7 +300,8 @@ messages = [{
}]
resp = client.chat.completions.create(
model=model_type,
messages=messages)
messages=messages,
seed=42)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
Expand All @@ -272,14 +310,15 @@ print(f'response: {response}')
messages.append({'role': 'assistant', 'content': response})
query = '这有什么好吃的?'
messages.append({'role': 'user', 'content': query})
stream = client.chat.completions.create(
stream_resp = client.chat.completions.create(
model=model_type,
messages=messages,
stream=True)
stream=True,
seed=42)

print(f'query: {query}')
print('response: ', end='')
for chunk in stream:
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()

Expand All @@ -288,19 +327,67 @@ model_type: qwen-7b-chat
query: 浙江的省会在哪里?
response: 浙江省的省会是杭州市。
query: 这有什么好吃的?
response:
浙江省是一个美食天堂,有着丰富多样的美食,如新鲜海鲜、麻糍、竹筒饭、西湖醋鱼、小吃等。至于具体哪个更好吃,可能还要看您个人的口味。
response: 杭州有许多美食,例如西湖醋鱼、东坡肉、龙井虾仁、叫化童子鸡等。此外,杭州还有许多特色小吃,如西湖藕粉、杭州小笼包、杭州油条等。
"""
```
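Since the server speaks the OpenAI API style over plain HTTP, the endpoint can also be smoke-tested without any client library. A minimal sketch, assuming the service listens on the default `http://127.0.0.1:8000` (adjust the host and port if you changed the deployment settings; the request body is illustrative):

```bash
# Hedged example: host, port, and payload are assumptions, not from the diff;
# the path follows the standard OpenAI chat-completions API.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-7b-chat",
    "messages": [{"role": "user", "content": "浙江的省会在哪里?"}],
    "seed": 42
  }'
```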

**qwen-7b**
#### qwen-7b

Server:
**Server:**
```bash
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b
```

Client:
**Client:**

Using swift:
```python
from swift.llm import get_model_list_client, XRequest, inference_client

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')

query = '浙江 -> 杭州\n安徽 -> 合肥\n四川 ->'
request_kwargs = XRequest(model=model_type, max_tokens=32, temperature=0.1, seed=42)
resp = inference_client(query, request_kwargs=request_kwargs)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')

request_kwargs.stream = True
stream_resp = inference_client(query, request_kwargs=request_kwargs)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()

"""Out[0]
model_type: qwen-7b
query: 浙江 -> 杭州
安徽 -> 合肥
四川 ->
response: 成都
广东 -> 广州
江苏 -> 南京
浙江 -> 杭州
安徽 -> 合肥
四川 -> 成都

query: 浙江 -> 杭州
安徽 -> 合肥
四川 ->
response: 成都
广东 -> 广州
江苏 -> 南京
浙江 -> 杭州
安徽 -> 合肥
四川 -> 成都
"""
```

Using openai:
```python
from openai import OpenAI
client = OpenAI(
Expand All @@ -311,20 +398,19 @@ model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')

query = '浙江 -> 杭州\n安徽 -> 合肥\n四川 ->'
kwargs = {'model': model_type, 'prompt': query, 'seed': 42, 'temperature': 0., 'max_tokens': 32}
kwargs = {'model': model_type, 'prompt': query, 'seed': 42, 'temperature': 0.1, 'max_tokens': 32}

resp = client.completions.create(**kwargs)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')

# streaming
query = '浙江 -> 杭州\n安徽 -> 合肥\n四川 ->'
stream = client.completions.create(stream=True, **kwargs)
stream_resp = client.completions.create(stream=True, **kwargs)
response = resp.choices[0].text
print(f'query: {query}')
print('response: ', end='')
for chunk in stream:
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()

@@ -360,4 +446,4 @@ swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged'
```

The client example code is the same as for the original model.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -9,7 +9,7 @@
- `--model_id_or_path`: the `model_id` of the model on the ModelScope Hub, case-insensitive; defaults to `None`. If `--model_id_or_path` is not registered, an exception is raised. You can specify the model type either via `model_type` or via `model_id_or_path`.
- `--model_revision`: the revision of the corresponding `model_id` on the ModelScope Hub; defaults to `None`. If `model_revision` is `None`, the revision registered in `MODEL_MAPPING` is used; otherwise the `model_revision` passed on the command line is used.
- `--model_cache_dir`: defaults to `None`. If the model is already cached locally and the cache path is not ModelScope's default cache path, you can use this argument to load the model and tokenizer from that cache_dir.
- `--sft_type`: the fine-tuning method; defaults to `'lora'`. Available values: 'lora', 'full'. To use qlora, set `--sft_type lora --quantization_bit 4`.
- `--sft_type`: the fine-tuning method; defaults to `'lora'`. Available values: 'lora', 'full', 'longlora', 'qalora'. To use qlora, set `--sft_type lora --quantization_bit 4` (a minimal invocation sketch follows this list).
- `--freeze_parameters`: when sft_type is set to 'full', freezes the bottommost parameters of the model. The range is 0. ~ 1., defaulting to `0.`. This provides a compromise between lora and full-parameter fine-tuning.
- `--tuner_backend`: the backend support for lora and qlora; defaults to `'swift'`. Available values: 'swift', 'peft'.
- `--template_type`: the type of chat template to use; defaults to `'AUTO'`, which looks up the `template` in `MODEL_MAPPING` according to `model_type`. The available `template_type` values can be found in `TEMPLATE_MAPPING.keys()`.
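As noted in the `--sft_type` entry above, qlora is not a separate `sft_type`: it is lora combined with 4-bit quantization. A minimal invocation sketch (the model choice is illustrative, not prescribed by this document):

```bash
# qlora = lora + 4-bit quantization; qwen-7b-chat is only an example model_type
swift sft \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --quantization_bit 4
```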
1 change: 1 addition & 0 deletions requirements/framework.txt
@@ -1,4 +1,5 @@
accelerate
dacite
datasets
jieba
matplotlib
2 changes: 2 additions & 0 deletions requirements/llm.txt
@@ -1,6 +1,8 @@
charset_normalizer
cpm_kernels
fastapi
gradio>=3.40.0
sentencepiece
tiktoken
transformers_stream_generator
uvicorn
2 changes: 1 addition & 1 deletion swift/llm/deploy.py
@@ -217,7 +217,7 @@ async def _generate_stream():
usage=usage_info,
id=request_id,
created=created_time)
yield f'data:{json.dumps(asdict(response))}\n\n'
yield f'data:{json.dumps(asdict(response), ensure_ascii=False)}\n\n'
yield 'data:[DONE]\n\n'

if request.stream:
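The `ensure_ascii=False` change above matters for non-ASCII model output: with the default `ensure_ascii=True`, `json.dumps` escapes every Chinese character as a `\uXXXX` sequence, which inflates the streamed SSE payload and makes it unreadable when inspected. A standalone illustration (not part of the diff):

```python
import json

chunk = {'content': '杭州'}
print(json.dumps(chunk))                      # {"content": "\u676d\u5dde"}  (escaped)
print(json.dumps(chunk, ensure_ascii=False))  # {"content": "杭州"}  (raw UTF-8)
```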
39 changes: 15 additions & 24 deletions swift/llm/tuner.py
@@ -5,28 +5,28 @@
from swift.tuners import (LongLoRAConfig, LongLoRAModelType, LoraConfig,
LoRAConfig, NEFTuneConfig, Swift)
from swift.utils import freeze_model_parameters, get_logger
from .utils import SftArguments, find_all_linear_for_lora
from .utils import SftArguments, find_all_linear_for_lora, is_lora

logger = get_logger()


def prepare_model(model, args: SftArguments):
# Preparing LoRA
if args.sft_type in ('lora', 'qalora', 'longlora'):
if is_lora(args.sft_type):
if args.resume_from_checkpoint is None:
if 'ALL' in args.lora_target_modules:
assert len(args.lora_target_modules) == 1
args.lora_target_modules = find_all_linear_for_lora(
model, args.quantization_bit, args.model_type)
logger.info(
f'Setting lora_target_modules: {args.lora_target_modules}')
lora_kwargs = {
'r': args.lora_rank,
'target_modules': args.lora_target_modules,
'lora_alpha': args.lora_alpha,
'lora_dropout': args.lora_dropout_p
}
if args.sft_type == 'lora':
lora_kwargs = {
'r': args.lora_rank,
'target_modules': args.lora_target_modules,
'lora_alpha': args.lora_alpha,
'lora_dropout': args.lora_dropout_p
}
if args.tuner_backend == 'swift':
lora_config = LoRAConfig(
lora_dtype=args.lora_dtype, **lora_kwargs)
Expand All @@ -36,35 +36,26 @@ def prepare_model(model, args: SftArguments):
model = Swift.prepare_model(model, lora_config)
logger.info(f'lora_config: {lora_config}')
elif args.sft_type == 'longlora':
assert args.tuner_backend != 'peft', (
'peft does not support longlora. You need to set `--tuner_backend swift`.'
)
assert args.tuner_backend == 'swift'
assert LongLoRAModelType.LLAMA in args.model_type
longlora_config = LongLoRAConfig(
r=args.lora_rank,
target_modules=args.lora_target_modules,
lora_alpha=args.lora_alpha,
lora_dropout=args.lora_dropout_p,
lora_dtype=args.lora_dtype,
model_type=LongLoRAModelType.LLAMA,
use_flash_attn=args.use_flash_attn)
use_flash_attn=args.use_flash_attn,
**lora_kwargs)
model = Swift.prepare_model(model, longlora_config)
logger.info(f'longlora_config: {longlora_config}')
elif args.sft_type == 'qalora':
assert args.tuner_backend == 'swift'
assert getattr(
model, 'quantization_method',
None) == 'gptq', 'qalora must be used with auto_gptq'
lora_kwargs = {}
lora_config = LoRAConfig(
r=args.lora_rank,
target_modules=args.lora_target_modules,
lora_alpha=args.lora_alpha,
lora_dropout=args.lora_dropout_p,
qalora_config = LoRAConfig(
lora_dtype=args.lora_dtype,
use_qa_lora=True,
**lora_kwargs)
model = Swift.prepare_model(model, lora_config)
logger.info(f'lora_config: {lora_config}')
model = Swift.prepare_model(model, qalora_config)
logger.info(f'qalora_config: {qalora_config}')
else:
model = Swift.from_pretrained(
model, args.resume_from_checkpoint, is_trainable=True)
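The `is_lora` helper imported from `.utils` above (and re-exported from `.argument` in the next file) presumably centralizes the tuner-family check that the old condition spelled out inline. A plausible sketch, not the verbatim definition:

```python
# Sketch only: the real definition lives in swift/llm/utils/argument.py.
# It mirrors the old inline check `args.sft_type in ('lora', 'qalora', 'longlora')`.
def is_lora(sft_type: str) -> bool:
    return sft_type in ('lora', 'longlora', 'qalora')
```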
8 changes: 5 additions & 3 deletions swift/llm/utils/__init__.py
@@ -1,6 +1,7 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .argument import (DeployArguments, DPOArguments, InferArguments,
RomeArguments, SftArguments)
RomeArguments, SftArguments, is_lora)
from .client_utils import get_model_list_client, inference_client
from .dataset import (DATASET_MAPPING, DatasetName, GetDatasetFunction,
HfDataset, add_self_cognition_dataset, get_dataset,
get_dataset_from_repo, load_dataset_from_local,
Expand All @@ -23,9 +24,10 @@
CompletionResponseChoice,
CompletionResponseStreamChoice,
CompletionStreamResponse, DeltaMessage, Model,
ModelList, UsageInfo, random_uuid)
ModelList, UsageInfo, XRequest, random_uuid)
from .template import (DEFAULT_SYSTEM, TEMPLATE_MAPPING, History, Prompt,
Template, TemplateType, get_template, register_template)
StopWords, Template, TemplateType, get_template,
register_template)
from .utils import (LazyLLMDataset, LLMDataset, data_collate_fn, dataset_map,
download_dataset, find_all_linear_for_lora, get_time_info,
history_to_messages, inference, inference_stream,
Expand Down