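## Introduction
The [previous articles](https://golfxiao.blog.csdn.net/article/details/141440847) all involved writing some training code by hand. Some frameworks support fine-tuning with no code at all, and [LLaMA-Factory](https://llamafactory.readthedocs.io/zh-cn/latest/) is one of them. It is a platform dedicated to fine-tuning and training large language models, with features such as:
- Support for common model families: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, and more.
- Support for single-GPU and multi-GPU training.
- Support for full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning.
……
There are many other useful features; see the documentation for details: [https://llamafactory.readthedocs.io/zh-cn/latest/](https://llamafactory.readthedocs.io/zh-cn/latest/)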

In this article we use LLaMA-Factory to run a multi-GPU training job.



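## Preparing the dataset

For SFT, LLaMA-Factory supports several data formats. Here we use Alpaca, which is simple and clear: each record needs only three fields:
- `instruction`: the human instruction;
- `input`: the human input;
- `output`: the model's answer.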

```json
{
  "instruction": "Calculate the total cost of these items.",
  "input": "Input: car - $3000, clothes - $100, book - $20.",
  "output": "The total cost of the car, clothes, and book is $3000 + $100 + $20 = $3120."
}
```
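To make the roles of the three fields concrete, here is a minimal sketch of how the user-side prompt is formed. It assumes, based on our reading of the LLaMA-Factory data documentation, that `instruction` and `input` are joined with a newline; `build_user_prompt` is a hypothetical helper for illustration, not a LLaMA-Factory function.

```python
def build_user_prompt(record: dict) -> str:
    """Join `instruction` and `input` with a newline, skipping empty parts.

    Assumption: this mirrors how the Alpaca fields are combined into the
    human turn; the chat template is applied on top of this text later.
    """
    parts = [record.get("instruction", ""), record.get("input", "")]
    return "\n".join(p for p in parts if p)


sample = {
    "instruction": "Calculate the total cost of these items.",
    "input": "Input: car - $3000, clothes - $100, book - $20.",
    "output": "The total cost is $3120.",
}
print(build_user_prompt(sample))
```

The `output` field is not part of the prompt; it is the target the model is trained to produce.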
To match this format, we wrap the conversion in a function called `to_alpaca`.

In [None]:
import json
import os

def to_alpaca(input_path, output_path):
    """Convert a JSON Lines file into a single Alpaca-style JSON array."""
    with open(input_path, 'r', encoding='utf-8') as infile, open(output_path, 'w', encoding='utf-8') as outfile:
        dataset = []
        for line in infile:
            # Parse one JSON object per line
            data = json.loads(line)
            # The expected answer is itself serialized as a JSON string
            response = {"is_fraud": data["label"], "fraud_speaker": data["fraud_speaker"], "reason": data["reason"]}
            item = {
                'input': data['input'],
                'output': json.dumps(response, ensure_ascii=False),
                'instruction': data['instruction'],
            }
            dataset.append(item)
        # Write the whole list to the output file
        outfile.write(json.dumps(dataset, indent=4, ensure_ascii=False))
        print(f"convert over，{input_path} to {output_path}")


Convert all of the data built earlier in one batch.

In [None]:
# Input files built earlier (JSON Lines, one sample per line)
input_files = [
    '../dataset/fraud/train_test/train0819.jsonl',
    '../dataset/fraud/train_test/test0819.jsonl',
    '../dataset/fraud/train_test/eval0819.jsonl',
]

def filename(path):
    """Return the file name of `path` without directory or extension."""
    filename_with_ext = os.path.basename(path)
    name, _ext = os.path.splitext(filename_with_ext)
    return name

for input_path in input_files:
    # The converted file is a single JSON array, so it gets the .json extension
    output_path = f'../dataset/fraud/train_test/{filename(input_path)}_alpaca.json'
    to_alpaca(input_path, output_path)

convert over，../dataset/fraud/train_test/train0819.jsonl to ../dataset/fraud/train_test/train0819_alpaca.json
convert over，../dataset/fraud/train_test/test0819.jsonl to ../dataset/fraud/train_test/test0819_alpaca.json
convert over，../dataset/fraud/train_test/eval0819.jsonl to ../dataset/fraud/train_test/eval0819_alpaca.json
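Before registering the converted files, it can be worth checking that each one really is a JSON array whose records carry the three Alpaca fields. The following is a minimal sketch; `check_alpaca_file` is a hypothetical helper written for this article, not part of LLaMA-Factory.

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output")


def check_alpaca_file(path):
    """Return the number of records; raise ValueError on the first malformed one."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{path}: expected a JSON array at the top level")
    for i, record in enumerate(records):
        missing = [k for k in REQUIRED_FIELDS if k not in record]
        if missing:
            raise ValueError(f"{path}: record {i} is missing {missing}")
        # In this dataset `output` is itself a JSON string, so it must parse.
        # json.JSONDecodeError is a subclass of ValueError.
        json.loads(record["output"])
    return len(records)
```

Calling it on each converted file, for example the training file reported in the output above, returns the record count or fails on the first bad record.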

After converting the dataset, register it in the `data/dataset_info.json` file under the LLaMA-Factory installation directory by appending the new dataset at the end of the file.

In [None]:
{
  "identity": {
    "file_name": "identity.json"
  },
  ……
  "anti_fraud": {
    "file_name": "train0819_alpaca.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
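The same entry can also be added programmatically instead of editing the file by hand. This is only a sketch under the column mapping shown above; `register_dataset` is a hypothetical helper, and the path to `dataset_info.json` depends on where LLaMA-Factory is installed.

```python
import json


def register_dataset(info_path, name, file_name):
    """Add (or overwrite) an Alpaca-style dataset entry in dataset_info.json."""
    with open(info_path, "r", encoding="utf-8") as f:
        info = json.load(f)
    info[name] = {
        "file_name": file_name,
        # Map LLaMA-Factory's column roles to the Alpaca field names
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
    with open(info_path, "w", encoding="utf-8") as f:
        json.dump(info, f, indent=2, ensure_ascii=False)
    return info
```

Pass the dataset name `anti_fraud` and the name of the converted training file; the file itself must sit in the directory that `dataset_dir` points to in the training config.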

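## Parameter configuration
LLaMA-Factory stores training parameters in YAML files. The `examples` subdirectory of the installation directory contains sample configurations for the various fine-tuning methods; you can copy one and modify it.

![Screenshot](https://i-blog.csdnimg.cn/direct/95c191a523e840fc969c0d014c82047e.png)

View the configuration file: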

In [9]:
!cat /data2/downloads/LLaMA-Factory/qwen2_lora_sft.yaml 

### model
model_name_or_path: /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct
resume_from_checkpoint: /data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1200

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.2


### dataset
dataset_dir: /data2/downloads/LLaMA-Factory/data
dataset: anti_fraud
template: qwen
cutoff_len: 1024
max_samples: 200000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826
logging_steps: 10
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 16
gradient_accumulation_steps: 1
gradient_checkpointing: true
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 8
eval_str

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
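The `huggingface/tokenizers` message above is a warning, not an error. As the message itself suggests, setting `TOKENIZERS_PARALLELISM` explicitly should silence it. A minimal sketch for a notebook cell, assuming it runs before any tokenizer is used in the kernel:

```python
import os

# Disable tokenizer-level parallelism so that forking a child process
# (for example via a `!` shell command) no longer triggers the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```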


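## Starting the training run

Set the `CUDA_VISIBLE_DEVICES` environment variable so that training may use four GPUs, numbered 1, 2, 3, and 4.

Then launch training with the `llamafactory-cli` command.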

In [4]:
import os 

os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3,4"
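One detail that is easy to trip over: once `CUDA_VISIBLE_DEVICES` is set, the process renumbers the visible GPUs from 0, so `cuda:0` inside the training processes refers to the first listed card rather than to physical GPU 0. A small sketch of that mapping; `visible_gpu_map` is a hypothetical helper and only parses the variable, it does not query the driver.

```python
import os


def visible_gpu_map(visible=None):
    """Map logical device index (as seen by the process) to the listed GPU id."""
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    ids = [v.strip() for v in visible.split(",") if v.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


print(visible_gpu_map("1,2,3,4"))  # {0: '1', 1: '2', 2: '3', 3: '4'}
```

With four entries in the list, LLaMA-Factory launches a distributed run across four processes, which matches the `Initializing distributed tasks` line in the log below.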

In [5]:
!llamafactory-cli train /data2/downloads/LLaMA-Factory/qwen2_lora_sft.yaml 

[2024-08-27 18:06:56,229] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
  def forward(ctx, input, weight, bias=None):
  def backward(ctx, grad_output):
08/27/2024 18:07:00 - INFO - llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:20962
W0827 18:07:02.274000 139932174230464 torch/distributed/run.py:779] 
W0827 18:07:02.274000 139932174230464 torch/distributed/run.py:779] *****************************************
W0827 18:07:02.274000 139932174230464 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0827 18:07:02.274000 139932174230464 torch/distributed/run.py:779] *****************************************
[2024-08-27 18:07:06,832] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-27 18

## Evaluation on the validation set

In [1]:
%run evaluate.py
testdata_path = '/data2/anti_fraud/dataset/eval0819.jsonl'
model_path = '/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct'
device = 'cuda'

Evaluate the performance of several checkpoints on the validation set.

In [2]:
%%time
## eval_loss=0.0152
checkpoint_path_900 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-900'
evaluate(model_path, checkpoint_path_900, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:22<00:00, 11.59it/s]

tn：1160, fp:5, fn:103, tp:1080
precision: 0.9953917050691244, recall: 0.9129332206255283
CPU times: user 3min 26s, sys: 21.7 s, total: 3min 48s
Wall time: 3min 25s





In [9]:
%%time
## eval_loss=0.0137
checkpoint_path_1400 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1400'
evaluate(model_path, checkpoint_path_1400, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:13<00:00, 12.12it/s]

tn：1162, fp:3, fn:67, tp:1116
precision: 0.9973190348525469, recall: 0.9433643279797126
CPU times: user 3min 19s, sys: 19.3 s, total: 3min 38s
Wall time: 3min 16s





In [4]:
%%time
## eval_loss=0.020
checkpoint_path_1800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1800'
evaluate(model_path, checkpoint_path_1800, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:22<00:00, 11.62it/s]

tn：1161, fp:4, fn:17, tp:1166
precision: 0.9965811965811966, recall: 0.9856297548605241
CPU times: user 3min 23s, sys: 21.7 s, total: 3min 45s
Wall time: 3min 24s





In [3]:
%%time
## eval_loss=0.035
checkpoint_path_2800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-2800'
evaluate(model_path, checkpoint_path_2800, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:19<00:00, 11.77it/s]

tn：1161, fp:4, fn:9, tp:1174
precision: 0.9966044142614601, recall: 0.9923922231614539
CPU times: user 3min 21s, sys: 21.9 s, total: 3min 43s
Wall time: 3min 22s





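Judging from the results on the validation set, both precision and recall improved markedly. Multi-GPU training did clearly better than single-GPU training, most likely because the larger global batch size makes training more stable.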
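To put a number on the batch-size argument: under data-parallel training, each optimizer step consumes the per-device batch on every GPU. A minimal sketch using the values from the config (`per_device_train_batch_size: 16`, `gradient_accumulation_steps: 1`) and the four visible GPUs; both helper functions are hypothetical, and the single-GPU line simply assumes the same per-device settings for comparison.

```python
import math


def global_batch_size(per_device_batch, num_gpus, grad_accum_steps=1):
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device_batch * num_gpus * grad_accum_steps


def approx_steps_per_epoch(num_train_samples, global_batch):
    """Rough optimizer steps per epoch (ignores drop-last and rounding details)."""
    return math.ceil(num_train_samples / global_batch)


multi_gpu = global_batch_size(16, 4, 1)
print(multi_gpu)  # 64

# Hypothetical comparison: the same per-device settings on a single GPU
print(global_batch_size(16, 1, 1))  # 16
```

A larger global batch averages gradients over more samples per step, which is the usual reason given for smoother training.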

## Evaluation on the test set


In [2]:
%run evaluate.py
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
model_path = '/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct'
device = 'cuda:1'

Evaluate the performance of the same checkpoints on the test set.

In [4]:
%%time
## eval_loss=0.0152
checkpoint_path_900 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-900'
evaluate(model_path, checkpoint_path_900, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:43<00:00, 10.51it/s]

tn：1142, fp:25, fn:171, tp:1011
precision: 0.9758687258687259, recall: 0.8553299492385786
CPU times: user 3min 44s, sys: 25.3 s, total: 4min 10s
Wall time: 3min 46s





In [5]:
%%time
## eval_loss=0.0137
checkpoint_path_1400 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1400'
evaluate(model_path, checkpoint_path_1400, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:22<00:00, 11.62it/s]

tn：1136, fp:31, fn:162, tp:1020
precision: 0.9705042816365367, recall: 0.8629441624365483
CPU times: user 3min 26s, sys: 19.6 s, total: 3min 46s
Wall time: 3min 24s





In [6]:
%%time
## eval_loss=0.020
checkpoint_path_1800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1800'
evaluate(model_path, checkpoint_path_1800, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:29<00:00, 11.20it/s]

tn：1129, fp:38, fn:104, tp:1078
precision: 0.9659498207885304, recall: 0.9120135363790186
CPU times: user 3min 29s, sys: 23.8 s, total: 3min 52s
Wall time: 3min 32s





In [7]:
%%time
## eval_loss=0.035
checkpoint_path_2800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-2800'
evaluate(model_path, checkpoint_path_2800, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:50<00:00, 10.18it/s]

tn：1112, fp:55, fn:69, tp:1113
precision: 0.9529109589041096, recall: 0.9416243654822335
CPU times: user 3min 43s, sys: 31.6 s, total: 4min 15s
Wall time: 3min 53s





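Compared with the validation set, the results on the test set are noticeably worse, which suggests that the model overfitted during training.
The loss and gradient values point the same way: by step 2885 the training loss had already reached 0 and the gradient norm had become very small (about 0.00016).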
```json
{'loss': 0.0, 'grad_norm': 0.0001625923760002479, 'learning_rate': 8.870936304049726e-07, 'epoch': 9.43}
```
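The gap between the two datasets is easier to see side by side. The sketch below recomputes precision, recall, and F1 from the confusion-matrix counts printed in the evaluation outputs above; `prf` is a hypothetical helper, and F1 is an extra summary metric that the original evaluation did not print.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tn, fp, fn, tp) per checkpoint, copied from the evaluation outputs
validation = {
    900: (1160, 5, 103, 1080),
    1400: (1162, 3, 67, 1116),
    1800: (1161, 4, 17, 1166),
    2800: (1161, 4, 9, 1174),
}
test = {
    900: (1142, 25, 171, 1011),
    1400: (1136, 31, 162, 1020),
    1800: (1129, 38, 104, 1078),
    2800: (1112, 55, 69, 1113),
}

for step in validation:
    _, fp_v, fn_v, tp_v = validation[step]
    _, fp_t, fn_t, tp_t = test[step]
    _, _, f1_v = prf(tp_v, fp_v, fn_v)
    _, _, f1_t = prf(tp_t, fp_t, fn_t)
    print(f"checkpoint-{step}: val F1={f1_v:.4f}  test F1={f1_t:.4f}  gap={f1_v - f1_t:.4f}")
```

A persistent gap between the validation and test columns at every checkpoint is consistent with the overfitting reading above.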