## Introduction

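Cross-validation is mainly about how a dataset is split.

Usually we split a dataset into three parts by uniform random sampling: a training set, a validation set, and a test set. The three sets must not overlap, and a common ratio is 8:1:1 (the split we used in the [previous article](https://golfxiao.blog.csdn.net/article/details/141325192)). Each set has its own purpose:
- Training set: used to train the model, i.e. to learn its weights and biases. These are the learned parameters.
- Validation set: used during training to choose hyperparameters such as batch size, learning rate, and number of epochs. It takes no part in gradient descent and does not determine the learned parameters.
- Test set: used only to evaluate the final model after training. It is involved neither in determining the learned parameters nor in choosing hyperparameters.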

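> Note: never use the test set during training, whether for training itself or for hyperparameter selection. Doing so leaks test data to the model ahead of time, which amounts to cheating and inflates the measured test accuracy.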
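
As a minimal sketch of the 8:1:1 split described above (not the code used in the previous article), the helper below shuffles sample indices and cuts them into three disjoint parts. The name `split_indices`, the seed, and the sample count of 1000 are illustrative assumptions.

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # uniform random order, reproducible via seed
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # whatever remains goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

Because the three slices come from one shuffled list, no index can appear in more than one set.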

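Cross-validation differs in that the manual split produces only a training set and a test set. A validation set is then carved out of the training set dynamically, and different training/validation combinations are rotated across several rounds of training. For example:
- Round 1: split the training set into 5 equal parts, train on 4 of them, and validate on the remaining one.
- Round 2: train on a different 4 parts and validate on the remaining one.
- ...
- Repeat until every part has been used for both training and validation.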

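The benefit is that the model makes fuller use of the data and learns the overall characteristics of the data more completely, which reduces the risk of overfitting.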
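
The rotation can be illustrated with a small pure-Python sketch. It is only a toy model of the idea, not sklearn's implementation; the function name `kfold_rounds` and the toy size of 10 samples are assumptions made for illustration.

```python
import random

def kfold_rounds(n, k, seed=0):
    """Yield (train, val) index lists; every sample is validated exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    for i in range(k):
        val = indices[i::k]                 # every k-th shuffled sample, offset i
        val_set = set(val)
        train = [j for j in indices if j not in val_set]
        yield train, val

rounds = list(kfold_rounds(10, 5))
for i, (train, val) in enumerate(rounds):
    print(f"round {i}: train={sorted(train)} val={sorted(val)}")
```

Across the 5 rounds each sample appears in the validation part once and in the training part 4 times.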

## Training process

#### Initialization

In [52]:
%run trainer.py

In [2]:
traindata_path = '/data2/anti_fraud/dataset/train0819.jsonl'
evaldata_path = '/data2/anti_fraud/dataset/eval0819.jsonl'
model_path = '/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct'
output_path = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_1'

In [3]:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
device = 'cuda'

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
train_dataset, eval_dataset = load_dataset(traindata_path, evaldata_path, tokenizer)

Map:   0%|          | 0/18787 [00:00<?, ? examples/s]

Map:   0%|          | 0/2348 [00:00<?, ? examples/s]

#### Data preparation

In [51]:
import glob
import gc
import numpy as np
from datasets import Dataset, concatenate_datasets
from sklearn.model_selection import KFold

Concatenate the training and validation sets into a single dataset.

In [53]:
datasets = concatenate_datasets([train_dataset, eval_dataset])
len(datasets)

21135

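Create a `KFold` object to split the dataset into folds.
- `n_splits=5`: split the dataset into 5 parts.
- `shuffle=True`: shuffle the sample order before `kf.split` splits the dataset. Without a fixed `random_state`, each call to `kf.split` produces a different shuffle.

> `KFold` is the k-fold cross-validation splitter provided by sklearn. It divides the dataset into k subsets of (nearly) equal size, called folds. On each iteration one fold serves as the validation set and the remaining k-1 folds (4 here) as the training set, and the process repeats k times.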
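
Fold sizes are only exactly equal when the sample count is divisible by k. The sketch below reproduces the sizing rule given in sklearn's `KFold` documentation (the first `n_samples % n_splits` folds get one extra sample); the helper name `fold_sizes` is hypothetical.

```python
def fold_sizes(n_samples, n_splits):
    """Sizes of the validation folds under KFold's documented sizing rule."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

print(fold_sizes(21135, 5))  # [4227, 4227, 4227, 4227, 4227]
print(fold_sizes(10, 3))     # [4, 3, 3]
```

With the 21135 samples used here, every fold has 4227 validation samples and 16908 training samples.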

In [54]:
kf = KFold(n_splits=5, shuffle=True)
kf

KFold(n_splits=5, random_state=None, shuffle=True)

Splitting with `KFold` yields the indices of the samples within the dataset rather than the samples themselves, as the example below shows.

In [55]:
indexes = kf.split(np.arange(len(datasets)))
train_indexes, val_indexes = next(indexes)
train_indexes, val_indexes, len(train_indexes), len(val_indexes)

(array([    0,     2,     3, ..., 21129, 21131, 21134]),
 array([    1,     9,    12, ..., 21130, 21132, 21133]),
 16908,
 4227)

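#### Hyperparameter definitions

Define a function that builds the hyperparameters, covering both the training arguments and the LoRA fine-tuning configuration. Compared with the earlier articles, the changes are:
- Evaluate and save the model once per epoch instead of every 100 steps, because the latter produced too many redundant checkpoints.
- Set `num_train_epochs` to 2, so each fold's dataset is trained for 2 epochs; with k=5 that is 10 epochs of training in total.

> Note: with `per_device_train_batch_size=16`, training unexpectedly ran out of memory (OOM), so the batch size `per_device_train_batch_size` is temporarily lowered to 8.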
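
A quick arithmetic check relates these settings to the `checkpoint-4226` directories that appear later. It assumes one optimizer step per batch of 8 and that the incomplete final batch does not count as a step; the exact rounding depends on settings inside `trainer.py` (for example gradient accumulation or `dataloader_drop_last`) that are not shown in this article.

```python
n_train = 16908      # training samples per fold (4/5 of 21135)
batch_size = 8
epochs = 2

steps_per_epoch = n_train // batch_size   # assumption: the partial last batch is not a step
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)       # 2113 4226
```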

In [13]:
def build_arguments(output_path):
    train_args = build_train_arguments(output_path)
    train_args.eval_strategy='epoch'
    train_args.save_strategy='epoch'
    train_args.num_train_epochs = 2
    train_args.per_device_train_batch_size = 8
    
    lora_config = build_loraconfig()
    lora_config.lora_dropout = 0.2   # higher dropout to improve generalization
    lora_config.r = 16
    lora_config.lora_alpha = 32
    return train_args, lora_config
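
To give a feel for what `r = 16` and `lora_alpha = 32` mean, here is a small arithmetic sketch based on the standard LoRA formulation: the low-rank update is scaled by `lora_alpha / r`, and one adapted linear layer gains `r * (d_in + d_out)` parameters. Which modules `build_loraconfig` actually targets is not shown here, so the 1536 x 1536 projection is a hypothetical example (1536 is the model's `hidden_size`).

```python
def lora_extra_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

r, lora_alpha = 16, 32
hidden_size = 1536                                      # hidden_size of Qwen2-1.5B-Instruct

scaling = lora_alpha / r
print(scaling)                                          # 2.0
print(lora_extra_params(hidden_size, hidden_size, r))   # 49152
```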

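Training has to rotate through different training/validation combinations, and swapping the dataset means creating a new trainer and passing in a new model instance. Only the first round starts from scratch; each later round must load the latest checkpoint saved by the previous round so that training continues from the earlier result.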

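Define a `find_last_checkpoint` function that finds the latest checkpoint in a directory.
 - `glob.glob` finds all subdirectories under the given directory that match the `checkpoint-*` pattern.
 - `os.path.getctime` returns a path's creation time on Windows, or the time of its last metadata change on Unix-like systems.
 - `max` uses these timestamps to pick the most recently created directory, i.e. the latest checkpoint.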

In [56]:
# Find the most recent checkpoint directory
def find_last_checkpoint(output_dir):
    checkpoint_dirs = glob.glob(os.path.join(output_dir, 'checkpoint-*'))
    last_checkpoint_dir = max(checkpoint_dirs, key=os.path.getctime)
    return last_checkpoint_dir

find_last_checkpoint("/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_1")

'/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_1/checkpoint-3522'
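
Timestamps can be misleading if checkpoint directories are copied or touched after training. As an alternative sketch (not the function used in this article), the latest checkpoint can be chosen by the step number in the directory name; it also returns `None` instead of raising when no checkpoint exists. The name `find_last_checkpoint_by_step` is hypothetical.

```python
import glob
import os
import re
import tempfile

def find_last_checkpoint_by_step(output_dir):
    """Return the checkpoint-<step> directory with the largest step, or None."""
    best_step, best_dir = -1, None
    for path in glob.glob(os.path.join(output_dir, "checkpoint-*")):
        match = re.fullmatch(r"checkpoint-(\d+)", os.path.basename(path))
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_dir = int(match.group(1)), path
    return best_dir

# Demo on a throwaway directory with three fake checkpoints
with tempfile.TemporaryDirectory() as tmp:
    for step in (4226, 2113, 500):
        os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
    print(os.path.basename(find_last_checkpoint_by_step(tmp)))  # checkpoint-4226
```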

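Define a new model-loading function that loads the latest trained model from the base model plus a given checkpoint, and sets each parameter's `requires_grad` attribute according to the training goal: the LoRA parameters are marked as requiring gradients, and all other parameters are frozen.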

In [29]:
def load_model_v2(model_path, checkpoint_path='', device='cuda'):
    # Load the base model
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
    # Load the LoRA weights from the checkpoint, if one is given
    if checkpoint_path:
        model = PeftModel.from_pretrained(model, model_id=checkpoint_path).to(device)
    # Freeze the parameters of the base model
    for param in model.base_model.parameters():
        param.requires_grad = False

    # Make the parameters of the inserted LoRA modules trainable
    for name, param in model.named_parameters():
        if 'lora' in name:
            param.requires_grad = True
    return model

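In this training procedure the first round uses a freshly initialized LoRA adapter, while later rounds must initialize the adapter from a given checkpoint. This makes the [original build_trainer function](https://golfxiao.blog.csdn.net/article/details/141500352) no longer general enough, so we define a new trainer-building function and move the adapter-loading logic outside of it.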

In [30]:
def build_trainer_v2(model, tokenizer, train_args, train_dataset, eval_dataset):
    # Required when gradient checkpointing is enabled
    if train_args.gradient_checkpointing:
        model.enable_input_require_grads()
    return Trainer(
        model=model,
        args=train_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],  # early-stopping callback
    )
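
One thing worth noting about `early_stopping_patience=5`: with evaluation once per epoch and `num_train_epochs = 2`, each fold sees only two evaluations, and every fold builds a fresh callback, so the patience can never run out within a fold. The sketch below is a simplified model of patience-based early stopping written for illustration; it is not the transformers implementation, and `early_stop_eval` is a hypothetical name.

```python
def early_stop_eval(val_losses, patience):
    """Return the 1-based evaluation index at which training would stop, or None."""
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss          # new best: reset the counter
            bad_evals = 0
        else:
            bad_evals += 1       # no improvement at this evaluation
            if bad_evals >= patience:
                return i
    return None

print(early_stop_eval([0.5, 0.4, 0.45, 0.46, 0.47], patience=2))  # 4
print(early_stop_eval([0.01142, 0.013666], patience=5))           # None
```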

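Define the main cross-validation training loop.
- `kf.split` produces 5 sets of indices, and the loop iterates over them 5 times.
- `datasets.select` picks different samples by index as the training and validation sets on each iteration.
- To keep one iteration's results from being overwritten by the next, each iteration builds its own output directory `output_path` from `fold`.
- If `last_checkpoint_path` is set, the model is loaded from that checkpoint; otherwise `get_peft_model` inserts a new LoRA adapter into the model.
- The new `build_trainer_v2` function builds the trainer, and training starts.
- After each iteration, the latest checkpoint from that run is located and used as the starting point for the next one.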

In [37]:
results = []
last_checkpoint_path = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_0/checkpoint-4226'

for fold, (train_index, val_index) in enumerate(kf.split(np.arange(len(datasets)))):
    # Fold 0 was already trained in an earlier run (see last_checkpoint_path above), so skip it
    if fold < 1:
        continue
    print(f"fold={fold} start, train_index={train_index}, val_index={val_index}")
    train_dataset = datasets.select(train_index)
    eval_dataset = datasets.select(val_index)
    print(f"train data: {len(train_dataset)}, eval: {len(eval_dataset)}")

    output_path = f'/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_{fold}'
    train_args, lora_config = build_arguments(output_path)
    if last_checkpoint_path:
        model = load_model_v2(model_path, last_checkpoint_path, device)
    else:
        model = load_model(model_path, device)
        model = get_peft_model(model, lora_config)

    model.print_trainable_parameters()
    trainer = build_trainer_v2(model, tokenizer, train_args, train_dataset, eval_dataset)
    train_result = trainer.train()
    print(f"fold={fold}, result = {train_result}")
    results.append(train_result)

    # Release the old model and trainer (including its optimizer) to free GPU memory
    del model
    del trainer
    torch.cuda.empty_cache()
    gc.collect()
    
    last_checkpoint_path = find_last_checkpoint(output_path)


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
loading configuration file /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct/config.json
Model config Qwen2Config {
  "_name_or_path": "/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null

fold=1 start, train_index=[    0     1     2 ... 21131 21133 21134], val_index=[    3     8    10 ... 21124 21130 21132]
train data: 16908, eval: 4227


All model checkpoint weights were used when initializing Qwen2ForCausalLM.

All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
loading configuration file /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}

Detected kernel version 4.15.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.

| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0088 | 0.01142 |
| 2 | 0.0046 | 0.013666 |


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


fold=1, result = TrainOutput(global_step=4226, training_loss=0.007882393647162119, metrics={'train_runtime': 3364.7696, 'train_samples_per_second': 10.05, 'train_steps_per_second': 1.256, 'total_flos': 8.266749122550989e+16, 'train_loss': 0.007882393647162119, 'epoch': 2.0})


fold=2 start, train_index=[    0     1     2 ... 21130 21132 21134], val_index=[    6    22    37 ... 21125 21131 21133]
train data: 16908, eval: 4227


| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0032 | 0.004718 |
| 2 | 0.003 | 0.004082 |




fold=2, result = TrainOutput(global_step=4226, training_loss=0.004613035453062235, metrics={'train_runtime': 3361.5076, 'train_samples_per_second': 10.06, 'train_steps_per_second': 1.257, 'total_flos': 8.219199150976205e+16, 'train_loss': 0.004613035453062235, 'epoch': 2.0})


fold=3 start, train_index=[    0     1     2 ... 21132 21133 21134], val_index=[    5    16    20 ... 21126 21128 21129]
train data: 16908, eval: 4227


| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0072 | 0.001999 |
| 2 | 0.0 | 0.000814 |




fold=3, result = TrainOutput(global_step=4226, training_loss=0.003411124254891226, metrics={'train_runtime': 3358.0538, 'train_samples_per_second': 10.07, 'train_steps_per_second': 1.258, 'total_flos': 8.188047700785562e+16, 'train_loss': 0.003411124254891226, 'epoch': 2.0})


fold=4 start, train_index=[    2     3     4 ... 21131 21132 21133], val_index=[    0     1     7 ... 21116 21120 21134]
train data: 16908, eval: 4227


| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0049 | 0.002273 |
| 2 | 0.0102 | 0.002139 |




fold=4, result = TrainOutput(global_step=4226, training_loss=0.0025673676080894925, metrics={'train_runtime': 3361.8695, 'train_samples_per_second': 10.059, 'train_steps_per_second': 1.257, 'total_flos': 8.252965690146816e+16, 'train_loss': 0.0025673676080894925, 'epoch': 2.0})


Collected results from the 5 rounds of training. Epochs 1-2 come from round 0 (fold 0), epochs 3-4 from round 1, and so on:

| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.0233 | 0.02189 |
| 2 | 0.0138 | 0.01614 |
| 3 | 0.008800 | 0.011420 |
| 4 | 0.004600 | 0.013666 |
| 5 | 0.003200 | 0.004718 |
| 6 | 0.003000 | 0.004082 |
| 7 | 0.007200 | 0.001999 |
| 8 | 0.000000 | 0.000814 |
| 9 | 0.004900 | 0.002273 |
| 10 | 0.010200 | 0.002139 |
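
As a small sanity check on the table, the snippet below locates the epoch with the lowest validation loss and maps it back to its training round. The values are copied from the table; the mapping assumes 2 epochs per round with rounds numbered from 0.

```python
val_losses = [0.02189, 0.01614, 0.011420, 0.013666, 0.004718,
              0.004082, 0.001999, 0.000814, 0.002273, 0.002139]

# Epochs are 1-based in the table
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
best_round = (best_epoch - 1) // 2   # 2 epochs per round
print(best_epoch, best_round, val_losses[best_epoch - 1])  # 8 3 0.000814
```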



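By comparison, in the earlier [Fraud Text Classification Fine-Tuning (7): Second Round of Single-GPU LoRA Tuning](https://golfxiao.blog.csdn.net/article/details/141500352), overfitting set in at around step 2300 (roughly two passes over the data): the validation loss stopped falling at 0.0161 and began to rise. With K-fold cross-validation training, the loss did not reach its minimum until after the 4th iteration (roughly eight passes over the data), and only the 5th iteration showed slight overfitting relative to the 4th. Overfitting was greatly alleviated, and the validation loss fell to a much lower value of 0.000814, which suggests that the data is being used thoroughly in both training and validation. Keep in mind, though, that from round 1 onward each round's validation fold had already been seen as training data in earlier rounds, so these validation losses are optimistic.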

In [42]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_0/checkpoint-4226'
evaluate(model_path, checkpoint_path, evaldata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:20<00:00, 11.71it/s]

tn：1151, fp:14, fn:74, tp:1109
precision: 0.98753339269813, recall: 0.937447168216399





In [38]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_1/checkpoint-4226'
evaluate(model_path, checkpoint_path, evaldata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:20<00:00, 11.69it/s]

tn：1158, fp:7, fn:16, tp:1167
precision: 0.9940374787052811, recall: 0.9864750633981403





In [39]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_2/checkpoint-4226'
evaluate(model_path, checkpoint_path, evaldata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:20<00:00, 11.71it/s]

tn：1162, fp:3, fn:3, tp:1180
precision: 0.9974640743871513, recall: 0.9974640743871513





In [40]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_3/checkpoint-4226'
evaluate(model_path, checkpoint_path, evaldata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:20<00:00, 11.69it/s]

tn：1164, fp:1, fn:4, tp:1179
precision: 0.9991525423728813, recall: 0.9966187658495351





In [41]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_4/checkpoint-4226'
evaluate(model_path, checkpoint_path, evaldata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2348/2348 [03:22<00:00, 11.57it/s]

tn：1164, fp:1, fn:1, tp:1182
precision: 0.9991546914623838, recall: 0.9991546914623838





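## Evaluation on the test set
Because in cross-validation training both the validation and training sets took part in updating the model's learned parameters, evaluating on the validation set is no longer meaningful. We therefore run the final evaluation directly on the test set.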

In [48]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_0/checkpoint-4226'
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
evaluate(model_path, checkpoint_path, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:19<00:00, 11.75it/s]

tn：1135, fp:32, fn:128, tp:1054
precision: 0.9705340699815838, recall: 0.8917089678510999





In [49]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_2/checkpoint-4226'
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
evaluate(model_path, checkpoint_path, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:21<00:00, 11.64it/s]

tn：1133, fp:34, fn:64, tp:1118
precision: 0.9704861111111112, recall: 0.9458544839255499





In [50]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_3/checkpoint-4226'
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
evaluate(model_path, checkpoint_path, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:21<00:00, 11.66it/s]

tn：1128, fp:39, fn:64, tp:1118
precision: 0.9662921348314607, recall: 0.9458544839255499





In [43]:
%run evaluate.py
checkpoint_path='/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0903_4/checkpoint-4226'
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
evaluate(model_path, checkpoint_path, testdata_path, device, batch=True, debug=True)

run in batch mode, batch_size=8


progress: 100%|██████████| 2349/2349 [03:22<00:00, 11.58it/s]

tn：1124, fp:43, fn:50, tp:1132
precision: 0.963404255319149, recall: 0.9576988155668359





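**Summary**: This article introduced K-fold cross-validation, cycling through different training and validation sets over several rounds of training. This brought the loss down to a lower value and largely alleviated the overfitting seen in every previous training run. On the test set, which the model never saw during training, recall improved markedly (from about 0.89 with the round-0 checkpoint to about 0.96 with the round-4 checkpoint), while precision stayed at roughly 0.96 to 0.97. K-fold cross-validation does appear to let the model learn from the data more thoroughly and yields a model that generalizes better.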
