23 changes: 20 additions & 3 deletions README.md
@@ -141,7 +141,7 @@ sft_args = SftArguments(
dataset=[DatasetName.blossom_math_zh],
output_dir='output',
gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
print(f'best_ckpt_dir: {best_ckpt_dir}')
torch.cuda.empty_cache()
infer_args = InferArguments(
@@ -159,7 +159,11 @@ web_ui_main(infer_args)
```bash
# Experimental environment: A10, 3090, A100, ...
# 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--dataset blossom-math-zh \
+--output_dir output

# Using DDP
# Experimental environment: 2 * 3090
@@ -169,18 +173,31 @@ NPROC_PER_NODE=2 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset blossom-math-zh \
+--output_dir output

# Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--custom_train_dataset_path chatml.jsonl \
+--output_dir output
```

**Inference**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

**Web-UI**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

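Taken together, the README change above means `sft_main` now returns a dict whose `'best_model_checkpoint'` entry holds the checkpoint path, and the CLI snippets gain an explicit `--output_dir`. A minimal sketch of the resulting quick-start flow, assuming the `swift.llm` import paths used elsewhere in this PR (with `model_type` standing in for the lines the hunk truncates):

```python
# Sketch only; the import paths and model_type value are assumptions
# based on other files in this PR, not confirmed by the hunk above.
import torch
from swift.llm import DatasetName, InferArguments, ModelType, SftArguments
from swift.llm.run import infer_main, sft_main

sft_args = SftArguments(
    model_type=ModelType.qwen_7b_chat,  # assumed; truncated in the hunk
    dataset=[DatasetName.blossom_math_zh],
    output_dir='output',
    gradient_checkpointing=True)
# sft_main now returns a dict; the best checkpoint path sits under the
# 'best_model_checkpoint' key instead of being the return value itself.
result = sft_main(sft_args)
best_ckpt_dir = result['best_model_checkpoint']
print(f'best_ckpt_dir: {best_ckpt_dir}')
torch.cuda.empty_cache()

infer_args = InferArguments(ckpt_dir=best_ckpt_dir)
infer_main(infer_args)
```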
25 changes: 21 additions & 4 deletions README_CN.md
@@ -138,7 +138,7 @@ sft_args = SftArguments(
dataset=[DatasetName.blossom_math_zh],
output_dir='output',
gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
print(f'best_ckpt_dir: {best_ckpt_dir}')
torch.cuda.empty_cache()
infer_args = InferArguments(
@@ -156,7 +156,11 @@ web_ui_main(infer_args)
```bash
# Experimental environment: A10, 3090, A100, ...
# 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--dataset blossom-math-zh \
+--output_dir output

# Using DDP
# Experimental environment: 2 * 3090
@@ -166,18 +170,31 @@ NPROC_PER_NODE=2 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset blossom-math-zh \
+--output_dir output

# Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--custom_train_dataset_path chatml.jsonl \
+--output_dir output
```

**Inference**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

-**Web-UI**
+**Web-UI**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

31 changes: 24 additions & 7 deletions examples/pytorch/llm/README.md
@@ -104,7 +104,7 @@ sft_args = SftArguments(
dataset=[DatasetName.blossom_math_zh],
output_dir='output',
gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
print(f'best_ckpt_dir: {best_ckpt_dir}')
torch.cuda.empty_cache()
infer_args = InferArguments(
@@ -122,7 +122,11 @@ web_ui_main(infer_args)
```bash
# Experimental environment: A10, 3090, A100, ...
# 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--dataset blossom-math-zh \
+--output_dir output

# Using DDP
# Experimental environment: 2 * 3090
@@ -132,18 +136,31 @@ NPROC_PER_NODE=2 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset blossom-math-zh \
+--output_dir output

# Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--custom_train_dataset_path chatml.jsonl \
+--output_dir output
```

**Inference**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

**Web-UI**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

@@ -574,9 +591,9 @@ The template initialization function retrieves the complete chat template based
- `--check_model_is_latest`: Check whether the model is the latest version; default is `True`. If you need to train without an internet connection, set this parameter to `False`.
- `--max_new_tokens`: The maximum number of new tokens to generate. The default value is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--do_sample`: Whether to use sampling during generation. The default value is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--temperature`: The temperature value for sampling during generation. The default value is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--temperature`: The temperature value for sampling during generation. The default value is `0.3`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_k`: The value of k for top-k sampling during generation. The default value is `20`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--top_p`: The cumulative probability threshold for top-p sampling during generation. The default value is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--top_p`: The cumulative probability threshold for top-p sampling during generation. The default value is `0.7`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--repetition_penalty`: The repetition penalty applied during generation. The default value is `1.05`. This parameter only takes effect when `predict_with_generate` is set to True.

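For comparison, the revised sampling defaults above map onto a Hugging Face `transformers` generation config roughly as follows; this is a sketch for orientation, and swift's internal wiring may differ:

```python
# Illustrative only: the post-change generation defaults from the list
# above, expressed as a transformers GenerationConfig.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.3,         # lowered from 0.9 in this PR
    top_k=20,
    top_p=0.7,               # lowered from 0.9 in this PR
    repetition_penalty=1.05)
```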

@@ -606,9 +623,9 @@ The template initialization function retrieves the complete chat template based
- `--bnb_4bit_use_double_quant`: Default value is `True`. For specific parameter details, please refer to the `sft.sh Command Line Arguments`. This parameter is not effective if `quantization_bit` is set to 0.
- `--max_new_tokens`: Maximum number of new tokens to generate. Default value is `2048`.
- `--do_sample`: Whether to use greedy decoding or sampling for generation. Default value is `True`.
-- `--temperature`: Default value is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--temperature`: Default value is `0.3`. This parameter only takes effect when `do_sample` is set to True.
- `--top_k`: Default value is `20`. This parameter only takes effect when `do_sample` is set to True.
-- `--top_p`: Default value is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--top_p`: Default value is `0.7`. This parameter only takes effect when `do_sample` is set to True.
- `--repetition_penalty`: Default value is `1.05`.
- `--use_flash_attn`: Default value is `None`, which means 'auto'. For specific parameter details, please refer to the `sft.sh Command Line Arguments`. The models that support 'flash_attn' include: qwen series, qwen-vl series, llama series, openbuddy series, mistral series, yi series, ziya series.
- `--ignore_args_error`: Default value is `False`. For specific parameter details, please refer to the `sft.sh Command Line Arguments`.
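
Assuming these CLI flags map one-to-one onto `InferArguments` fields (an assumption; this diff only shows `model_type` and `ckpt_dir` being passed), a programmatic equivalent of the new inference defaults might look like:

```python
# Hypothetical sketch: field names are inferred from the CLI flags above;
# verify them against the InferArguments definition before relying on this.
from swift.llm import InferArguments, ModelType

args = InferArguments(
    model_type=ModelType.qwen_7b_chat,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.7,
    repetition_penalty=1.05)
```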
33 changes: 25 additions & 8 deletions examples/pytorch/llm/README_CN.md
@@ -103,7 +103,7 @@ sft_args = SftArguments(
dataset=[DatasetName.blossom_math_zh],
output_dir='output',
gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
print(f'best_ckpt_dir: {best_ckpt_dir}')
torch.cuda.empty_cache()
infer_args = InferArguments(
@@ -121,7 +121,11 @@ web_ui_main(infer_args)
```bash
# Experimental environment: A10, 3090, A100, ...
# 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--dataset blossom-math-zh \
+--output_dir output

# Using DDP
# Experimental environment: 2 * 3090
@@ -131,18 +135,31 @@ NPROC_PER_NODE=2 \
swift sft \
--model_id_or_path qwen/Qwen-7B-Chat \
--dataset blossom-math-zh \
+--output_dir output

# Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+--model_id_or_path qwen/Qwen-7B-Chat \
+--custom_train_dataset_path chatml.jsonl \
+--output_dir output
```

**Inference**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

-**Web-UI**
+**Web-UI**:
```bash
# Original Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat

# Fine-tuned Model
CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
```

@@ -577,9 +594,9 @@ if __name__ == '__main__':
- `--check_model_is_latest`: Check whether the model is the latest version; default is `True`. If you need to train without an internet connection, set this parameter to `False`.
- `--max_new_tokens`: Default is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--do_sample`: Default is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--temperature`: Default is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--temperature`: Default is `0.3`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--top_k`: Default is `20`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--top_p`: Default is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--top_p`: Default is `0.7`. This parameter only takes effect when `predict_with_generate` is set to True.
- `--repetition_penalty`: Default is `1.05`. This parameter only takes effect when `predict_with_generate` is set to True.


@@ -609,9 +626,9 @@ if __name__ == '__main__':
- `--bnb_4bit_use_double_quant`: Default is `True`. See `sft.sh Command Line Arguments` for details. This parameter has no effect if `quantization_bit` is set to 0.
- `--max_new_tokens`: Maximum number of new tokens to generate; default is `2048`.
- `--do_sample`: Whether to use greedy decoding or sampling for generation; default is `True`.
-- `--temperature`: Default is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True.
- `--top_k`: Default is `20`. This parameter only takes effect when `do_sample` is set to True.
-- `--top_p`: Default is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--top_p`: Default is `0.7`. This parameter only takes effect when `do_sample` is set to True.
- `--repetition_penalty`: Default is `1.05`.
- `--use_flash_attn`: Default is `None`, i.e. 'auto'. See `sft.sh Command Line Arguments` for details.
- `--ignore_args_error`: Default is `False`. See `sft.sh Command Line Arguments` for details.
2 changes: 1 addition & 1 deletion examples/pytorch/llm/app.py
@@ -12,5 +12,5 @@
# or chat
args = InferArguments(model_type=ModelType.qwen_7b_chat_int4)
# or load from ckpt dir
-# args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx', load_args_from_ckpt_dir=True)
+# args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
web_ui_main(args)
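The dropped `load_args_from_ckpt_dir=True` suggests that loading arguments from the checkpoint directory is now the default whenever `ckpt_dir` is supplied; that is an inference from this hunk, not something the diff states. Under that assumption, launching the web UI from a fine-tuned checkpoint reduces to:

```python
# Assumption: with this PR, passing ckpt_dir alone is enough and the
# training-time arguments are recovered from the checkpoint directory.
from swift.llm import InferArguments
from swift.llm.run import web_ui_main  # import path assumed, mirroring llm_infer.py

args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
web_ui_main(args)
```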
3 changes: 2 additions & 1 deletion examples/pytorch/llm/llm_infer.py
@@ -4,4 +4,5 @@
from swift.llm.run import infer_main

if __name__ == '__main__':
-    infer_main()
+    result = infer_main()
+    print(f'infer_main result: {result}')
4 changes: 2 additions & 2 deletions examples/pytorch/llm/llm_sft.py
@@ -4,5 +4,5 @@
from swift.llm.run import sft_main

if __name__ == '__main__':
-    best_ckpt_dir = sft_main()
-    print(f'best_ckpt_dir: {best_ckpt_dir}')
+    output = sft_main()
+    print(f'sft_main output: {output}')
@@ -8,9 +8,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -8,9 +8,8 @@ python llm_infer.py \
--eval_human false \
--max_length 2048 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
5 changes: 2 additions & 3 deletions examples/pytorch/llm/scripts/baichuan2_7b/qlora/infer.sh
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 2048 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.7 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \
@@ -7,9 +7,8 @@ python llm_infer.py \
--eval_human false \
--max_length 4096 \
--max_new_tokens 2048 \
---temperature 0.9 \
---top_k 20 \
---top_p 0.9 \
+--temperature 0.1 \
+--top_p 0.7 \
--repetition_penalty 1.05 \
--do_sample true \
--merge_lora_and_save false \