<a href="https://colab.research.google.com/github/reven404/learning-ai-practice/blob/main/fine_tune_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hugging Face Transformers Fine-Tuning Quickstart

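This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- Introducing the Trainer
- Running the training
- Saving the model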

## YelpReviewFull Dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

### Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

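### Supported Tasks and Leaderboards
Text classification and sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given text.

### Languages
The reviews are mainly written in English.

### Dataset Structure

#### Data Instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set: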

```json
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

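#### Data Fields

- 'text': The review text, escaped with double quotes ("); any internal double quote is escaped with two double quotes (""). Newlines are escaped with a backslash followed by an "n" character, that is, "\n".
- 'label': The score of the review (between 1 and 5 stars), stored as a class index from 0 to 4.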
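As a small illustration of the two fields, the sketch below maps a label index to its star rating and turns the escaped `\n` sequences into real newlines. The label names are the ones that appear in this notebook's sample output; the helper names `label_to_stars` and `unescape_review` and the shortened sample record are hypothetical and not part of the Datasets API.

```python
# Star-rating names as they appear in the label column of the sample output.
LABEL_NAMES = ["1 star", "2 star", "3 stars", "4 stars", "5 stars"]


def label_to_stars(label: int) -> str:
    """Map a class index (0-4) to its human-readable star rating."""
    return LABEL_NAMES[label]


def unescape_review(text: str) -> str:
    """Replace the literal two-character sequence backslash + n with a newline."""
    return text.replace("\\n", "\n")


# Made-up, shortened record used only for illustration.
sample = {"label": 0, "text": "Never again.\\nThey sold me a used tire."}
print(label_to_stars(sample["label"]))
print(unescape_review(sample["text"]))
```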

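#### Data Splits

The Yelp reviews full star dataset is constructed by randomly taking 130,000 training samples and 10,000 test samples for each review star from 1 to 5. In total there are 650,000 training samples and 50,000 test samples.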

## Download the Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
WORK_HOME='/content/drive/MyDrive/Colab Notebooks'
os.environ['TRANSFORMERS_CACHE'] = f'{WORK_HOME}/NLP/HuggingfaceCash'
os.environ['HF_DATASETS_CACHE'] = f'{WORK_HOME}/NLP/HuggingfaceCash/Datasets'
os.environ['HF_HOME'] = f'{WORK_HOME}/hf'
os.environ['HF_HUB_CACHE'] = f'{WORK_HOME}/hf/hub/cache'

Mounted at /content/drive


In [None]:
# @title
# Quote version specifiers so the shell does not treat ">" as output redirection
!pip install "torch>=2.1.2" transformers timm datasets evaluate scikit-learn pandas peft accelerate autoawq optimum auto-gptq "bitsandbytes>0.39.0" jiwer "soundfile>=0.12.1" librosa langchain gradio trl

In [None]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

In [None]:
dataset["train"][1234]

{'label': 2,
 'text': 'Having lived near this Atria\'s location in the past, I can say that I\'ve spent more than my fair share of time at the PNC Park Atria\'s. My wife and I stopped in this past week, and for me, I was reminded of why I enjoy Atria\'s...but this having been our first time at Atria\'s together, and witnessing my wife\'s experience, I realized why the place could garner such low reviews.\\n\\nFor someone like myself, who\'s actually had enough positive and negative food experiences at Atria\'s to know exactly what to order and what to steer clear of, I can make the experience a positive one, but that\'s a situation I\'m afforded now thanks to my previous patience and convenience. My wife, opting to stray from my suggestions had an awful dinner. Her plate of bland fish and asparagus left her turned off completely. Luckily, we were using a gift card, so it wasn\'t as frustrating for her as it could\'ve been. Regardless, walking away from an entire plate of food is dishea

In [None]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML

In [None]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset) - 1)
        while pick in picks:
            pick = random.randint(0, len(dataset) - 1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(dataset["train"])

Unnamed: 0,label,text
0,4 stars,"Always great food, drinks & energy! If you are at the Fashion Show Mall this place is worth your visit. Good margaritas and yummy food! Their special events are excellent too. I have not had a chance to go to the cooking events but I have wanted to. It was fun to do Cinco de Mayo here!"
1,2 star,"Like most reviewers have said, its overpriced. The food is good, but not good enough to justify a $60 bill for one glass of wine and \""good\"" crab cakes. And there are no frills - no bread for the table, no sides, nada...so if I'm going to pay that much for a meal I better get more than 2 crab cakes and a glass of wine. I was disappointed and I feel like I could spend less than that and get a fantastic meal at a place like Eleven. \n\nAlso the service left much to be desired during our trip there. Earlier that day, I had made reservations online but I had somehow messed up the online reservation, but called the restaurant immediately after realizing my mistake to let them know the glitch with the online reservation. The gentleman I spoke with said he would take care of it and there would be no problem and my reservation would be changed to reflect the day/time we wanted. \n\nFast forward 6 hours and the hostess was completely clueless and said that we didn't have a reservation for us. I explained the situation from earlier that day but I was met with 3 hostess people looking at each other and the computer screen of reservations with confusion. They talked it over and said that they would get us a table shortly...now I'm not sure what \""shortly\"" means to most people, but my expectation was 5-10 minutes. Fast forward 25 minutes later, with no updates from them during this time, it was my friend and I who were pestering them with what was going on. We would have been find if they told us that sorry but we are unable to accommodate you, rather than having us wait in their lobby area for nearly 30 minutes!\n\nAll in all - sub par experience and I doubt I'd go back."
2,3 stars,"Tried out a slider and wasn't very impressed. However, I will say that the tomato bisque is simply delectable and should definitely be tried. What I found odd was that the waiters have numbers as name tags instead of their actual names... Are these waiters nothing but numbers to this establishment? Haha, I hope not."
3,2 star,Horrible horrible went with a friend his 2nd time there he had bad experience the first time but went on the first day (no waffles or sweet potato pancakes) so we went around noon place has been open for 2 weeks nice hostess got a table right away then sat there for 10 mins befits someone took our drink order still missing waffles on the menu so I went for a burger I wanted soup the waitress said it's not ready it was 1245pm 1/2hr later we got our food ehhh jus a burger not impressed saw some of the Mack bros there on the way out as we paid hostess asked how it was and I said ok actually not good 1/2hr for 2 burgers that sucked and are over priced he over heard and stared me down like he was gonna fight me lol I'll never go back there again
4,4 stars,"Sal\u00fad is a welcome addition to the neighborhood, especially across from my favorite yoga studio. The staff was very helpful and blended my juice fresh for me, which I appreciated. I am looking forward to returning soon!"
5,4 stars,"La Santisima, known in a past life as La Condesa, makes some of the best...THE BEST burritos and salsas in the valley. It looks like fast casual food, but it's a sit down restaurant that costs less than competitors while offering way higher quality eats. \n\nLike the 15+ varieties of salsa, ranging from a spicy, smoky chipotle to a light, creamy cilantro. Your food is going to take forever, so you might as well enjoy as many as you can. They all come across as very fresh--none of that canned crap. \n\nMy burrito, the Gaucho, was up there with the best I've had. And I've had a LOT of burros. The grilled white cheese, rich steak that had a faint wine flavor, cucumbers, and some peppers. Amazing. Great! The portion was huge as well, and I could barely finish it. I have a HUGE appetite. \n\nThe coolest part was that it was only around $9. That's somewhere between fast-casual and sit-down prices, but with better quality than virtually any Mexican restaurant. For about another $4 you can add some awesome aguas frescas like the horchata, complete with a touch of strawberry. I actually don't like that type of fruit infusion in something I expect to be straight up rice and cinnamon, but I can appreciate the quality ingredients. \n\nI'm sold on this place, and I definitely recommend you should go, but that doesn't mean I'm going to overlook a few flaws that irk me. The first is that I'm not sure the furniture is either comfortable, tasteful, or clean. The high wooden tables are much obliged for taller patrons like myself, but they were sticky...despite being actively cleaned as we walked in. Gross. The seats are shoddy as well. The airbrush-on-canvas Frida Kahlo paired with the \""no no THIS IS HOW MEXICAN FOOD IS DONE WE USE PROPER NAMES\"" motif on the menu give this a unique, yet eyeroll-worthy vibe I can best describe as try-hard and totally hipster. \n\nService is questionable as well. 
Although I appreciate that everyone who is not American actually takes their time at restaurants, I don't find anything particularly complex that warrants 20+ minutes for a burrito, and tacos/beans/rice for my dining partner. This is on top of the 10+ minutes it took to get a menu. We actually had to get up and grab them ourselves! They were not really paying attention. \n\nOh yeah, and then there's the warm tap water. Gross. \n\nAnyway, Santisima comes across as a place with great cooks that don't know how to run a restaurant. I can't say I enjoyed any aspect of dining in besides salsa sampling, and so I've gotta deduct points even though that was a 5-star burrito. Fortunately, you can avoid these issues by just ordering something to go, which is probably what I'll gladly do when I undoubtedly return."
6,5 stars,"A long time ago in a toxic waste dump far away called New Jersey, a young boy had his first pizza, a thin crusted thin crust pizza with fresh toppings all over it. It was heaven to that boy and for the last 20 years he has been trying to find a western region version close to that perfect bite. Those Guys Pies comes awfully close. Led by Roy Bass, who helped start Secret Pizza at the Cosmo, Those Guys Pies not only makes superb pizza but excellent cheesesteak (Roy hails from Cherry Hill, just a skimming stone away from Philly), and even homemade mozzarella sticks. The ingredients are fresh and clearly Roy and his partner in crime put east coast love into their creations.\n\nMy one regret is they do not serve slices at night. Still, leftover pizza will be good in the morning :)\n\nEnjoy the pie with the transplanted east coaster Seal of Approval."
7,1 star,"Bland food, rude service (from an Italian grandfather--which is weird); however, I've since learned not to blame it on Peru!!"
8,1 star,"I wish I'd consulted yelp. I came here this past week, after not eating for an entire day.\n\n I was super pissed my friend insisted on finishing his crappy $10ish meal, while I sat there sipping on sprite and trying to forget the horrifying food I'd just sampled.\n\n\""Do NOT go in there! Whoo!\"""
9,2 star,"To put it bluntly there are better places to eat at for the money you would spend here. Landrys tries to be an upscale restaurant but really falls short. \n\nTo start I had the lobster bisque and it was good! I wish I had just stuck to the bisque as my crab cakes were disappointing. There was a bit too much breading and not enough crab. Additionally they put some weird sauce on the bottom of the plate which I felt did not compliment the cakes. My Freind had the clam chowder which she said was also ver good. \n\nThe atmosphere is slightly upscale- maybe more along the lines of \""I want to be an upscale restaurant but don't know how.\"" Our server Rosa was very disinterested in us from the beginning when we told her water to start and let is look over the menu. She then proceeded to try and sell up some rewards card for $25.00- all of which happened before we really got to look at the menu and order..... Not the way you want to start off dinner!"


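## Preprocess the Data

After downloading the dataset, use a tokenizer to process the text. Inputs of varying length can be handled with padding and truncation strategies.

The Datasets `map` method applies a preprocessing function to the entire dataset in one call.

The cell below processes the whole dataset, padding every example to the maximum length: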

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)



In [None]:
show_random_elements(tokenized_datasets["train"], num_examples=1)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,3 stars,"I've come in here a few times and I must say that it is delicious food at cheap prices. I got an entire teriyaki bowl and dumplings with a drink for under 7 dollars. Decent atmosphere, it is usually quiet, so perhaps a good spot to get some work done as you eat. Great place for lunch w friends. Take out is always an option too. Check it out if you're in the mood for something asian and different.","[101, 146, 112, 1396, 1435, 1107, 1303, 170, 1374, 1551, 1105, 146, 1538, 1474, 1115, 1122, 1110, 13108, 2094, 1120, 10928, 7352, 119, 146, 1400, 1126, 2072, 21359, 16383, 2293, 7329, 1105, 17549, 11082, 1114, 170, 3668, 1111, 1223, 128, 5860, 119, 13063, 3452, 6814, 117, 1122, 1110, 1932, 3589, 117, 1177, 3229, 170, 1363, 3205, 1106, 1243, 1199, 1250, 1694, 1112, 1128, 3940, 119, 2038, 1282, 1111, 5953, 192, 2053, 119, 5055, 1149, 1110, 1579, 1126, 5146, 1315, 119, 23114, 1122, 1149, 1191, 1128, 112, 1231, 1107, 1103, 6601, 1111, 1380, 1112, 1811, 1105, 1472, 119, 102, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, ...]"
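The trailing zeros in `input_ids` and `attention_mask` above come from padding to the maximum length. A minimal pure-Python sketch of that idea follows, assuming a pad id of 0 as seen in the output. The helper `pad_or_truncate` is hypothetical, and unlike a real tokenizer it does not preserve the closing `[SEP]` token when it truncates.

```python
from typing import Dict, List

PAD_ID = 0  # assumed pad id, matching the trailing zeros in input_ids above


def pad_or_truncate(token_ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Force a token id list to exactly max_length and build its attention mask."""
    kept = token_ids[:max_length]      # truncation: drop tokens past the limit
    n_pad = max_length - len(kept)     # padding: number of slots left to fill
    return {
        "input_ids": kept + [PAD_ID] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


short = pad_or_truncate([101, 146, 1400, 102], max_length=6)
long = pad_or_truncate([101, 146, 1400, 1126, 2072, 102], max_length=4)
print(short)
print(long)
```

The attention mask is what lets the model ignore the padded positions, which is why it switches from 1 to 0 at the same index where the zeros begin.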


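### Data Sampling

Take 1,000 samples each from the training and test sets to demonstrate small-scale training on BERT (with the PyTorch Trainer). The shuffled full training set is also kept for the full-scale run.

The `shuffle()` function randomly reorders the rows of the dataset. For more control over the shuffling algorithm, pass the `generator` parameter to use a different `numpy.random.Generator`.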

In [None]:
full_train_dataset = tokenized_datasets["train"].shuffle(seed=42)
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
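Conceptually, `shuffle(seed=42).select(range(1000))` draws a reproducible random subset: shuffle the row order with a fixed seed, then keep the first 1,000 rows. The sketch below shows the idea with the standard library only. It is a stand-in, not the Datasets implementation, so the indices it picks will differ from the ones the notebook actually uses. `sample_indices` is a hypothetical helper.

```python
import random
from typing import List


def sample_indices(num_rows: int, sample_size: int, seed: int) -> List[int]:
    """Shuffle row indices with a fixed seed, then keep the first sample_size."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the order is reproducible
    return indices[:sample_size]


first = sample_indices(num_rows=650_000, sample_size=1000, seed=42)
second = sample_indices(num_rows=650_000, sample_size=1000, seed=42)
print(len(first), first == second)
```

Fixing the seed is what makes the small training and evaluation subsets identical across notebook restarts.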

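## Fine-Tuning Configuration

### Load the BERT Model

Loading the model prints a warning that some weights (the `classifier.weight` and `classifier.bias` of the classification head) are newly initialized. This is entirely expected when fine-tuning: the head used for the pretraining task is discarded and replaced with a new classification head, for which there are no pretrained weights. The library therefore warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.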

In [None]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
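To get a sense of how small the newly initialized head is, the arithmetic below counts its parameters. It assumes the classifier is a single linear layer over a 768-dimensional representation (the hidden size of BERT base) and takes the total from the "Number of trainable parameters" line in the training log later in this notebook.

```python
HIDDEN_SIZE = 768   # assumed hidden size of the BERT base encoder
NUM_LABELS = 5      # one class per star rating

# A linear layer has one weight per (input, output) pair plus one bias per output.
classifier_weights = HIDDEN_SIZE * NUM_LABELS
classifier_biases = NUM_LABELS
head_params = classifier_weights + classifier_biases

TOTAL_TRAINABLE = 108_314_117  # from the training log in this notebook
pretrained_params = TOTAL_TRAINABLE - head_params

print(head_params)
print(pretrained_params)
```

Under these assumptions only a few thousand of the roughly 108 million parameters start from random values; everything else is loaded from the checkpoint.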


### Training Hyperparameters (TrainingArguments)

Full list of parameters and default values: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**The most important setting: the path where model weights are saved (`output_dir`)**

In [None]:
from transformers import TrainingArguments,logging

model_dir = f"{WORK_HOME}/models/bert-base-cased-finetune-yelp"

logging.set_verbosity_info()

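# logging_steps defaults to 500; set it to 100 to suit our training data and step count
# Note: resume_from_checkpoint set here is not used by Trainer directly;
# resuming is requested later via trainer.train(resume_from_checkpoint=True)
training_args = TrainingArguments(output_dir=model_dir,
                                  resume_from_checkpoint=True,
                                  per_device_train_batch_size=16,
                                  num_train_epochs=5,
                                  logging_steps=100)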

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [None]:
# Full hyperparameter configuration
print(training_args)

TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_la

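### Evaluating Metrics During Training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** gives access to dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more) with a single line of code. **Full list of supported metrics: https://huggingface.co/evaluate-metric**

The Trainer does not automatically evaluate model performance during training, so we need to pass it a function that computes and reports metrics.

The Evaluate library provides a simple accuracy metric, which can be loaded with the `evaluate.load` function: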

In [None]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")


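Next, call the metric's `compute` function to calculate the accuracy of the predictions.

Before passing predictions to `compute`, the logits must be converted to predicted labels (**all Transformers models return logits**).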

In [None]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
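What `compute_metrics` does can be sketched without NumPy or Evaluate: take the index of the largest logit in each row as the predicted class, then count how often it matches the reference label. The helpers `argmax` and `accuracy` and the toy logits are made up for illustration.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit, i.e. the predicted class."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose arg-max class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up logits for three reviews over the five star classes.
toy_logits = [
    [2.1, 0.3, -1.0, -0.5, -2.0],   # arg-max is class 0
    [-1.2, 0.1, 0.4, 1.9, 0.2],     # arg-max is class 3
    [0.0, 0.2, 0.1, -0.3, 0.15],    # arg-max is class 1
]
toy_labels = [0, 3, 2]
print(accuracy(toy_logits, toy_labels))
```

No softmax is needed here, because the largest logit and the largest probability always fall on the same class.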

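#### Monitoring Metrics During Training

To monitor how evaluation metrics change during training, set the `evaluation_strategy` parameter in `TrainingArguments` so that metrics are reported at the end of each epoch.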

In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir=model_dir,
                                  save_total_limit=2,
                                  evaluation_strategy="epoch",
                                  per_device_train_batch_size=16,
                                  # per_device_eval_batch_size=32,
                                  num_train_epochs=3,
                                  logging_steps=5000)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
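The `save_total_limit=2` setting above caps how many checkpoints stay on disk; the training log further down shows the effect as "Deleting older checkpoint ... due to args.save_total_limit". The sketch below is a simplified model of that rotation, keeping only the newest checkpoints by step number. `rotate_checkpoints` is a hypothetical helper, and the real Trainer may apply additional rules that this ignores.

```python
import re
from typing import List, Tuple


def rotate_checkpoints(names: List[str], save_total_limit: int) -> Tuple[List[str], List[str]]:
    """Split checkpoint folder names into (kept, deleted), keeping the newest ones."""
    def step(name: str) -> int:
        # The training step is the number after "checkpoint-".
        return int(re.search(r"checkpoint-(\d+)$", name).group(1))

    ordered = sorted(names, key=step)  # oldest first
    n_delete = max(0, len(ordered) - save_total_limit)
    return ordered[n_delete:], ordered[:n_delete]


kept, deleted = rotate_checkpoints(
    ["checkpoint-116000", "checkpoint-117000", "checkpoint-116500"],
    save_total_limit=2,
)
print(kept, deleted)
```

A small limit matters in this notebook because the checkpoints are written to Google Drive, where every saved copy of the model weights counts against storage.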


## Start Training

### Instantiate the Trainer

A `kernel version` warning may appear; for now it does not affect running this example.

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=small_train_dataset,
    train_dataset=full_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


## Monitor GPU Usage with nvidia-smi

To watch GPU usage in real time, poll `nvidia-smi` with the `watch` command: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                                       a4ab7d6551f4: Sat Apr  6 16:07:06 2024

Sat Apr  6 16:07:06 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0              61W /  70W |  11843MiB / 15360MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
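To turn a snapshot like the one above into a single number, the memory column can be parsed from the status row. The sketch below works on the row text copied from the output above, so it runs without a GPU. `parse_memory_usage` is a hypothetical helper, and the regular expression assumes the `used MiB / total MiB` layout shown here.

```python
import re
from typing import Tuple


def parse_memory_usage(line: str) -> Tuple[int, int]:
    """Extract (used MiB, total MiB) from an nvidia-smi status row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError("no memory usage found in line")
    return int(match.group(1)), int(match.group(2))


# Status row copied from the nvidia-smi snapshot above.
row = "| N/A   77C    P0              61W /  70W |  11843MiB / 15360MiB |    100%      Default |"
used, total = parse_memory_usage(row)
print(f"{used} / {total} MiB = {used / total:.1%}")
```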

In [None]:
# trainer.train()
trainer.train(resume_from_checkpoint=True)

Loading model from /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/checkpoint-116500.
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 650,000
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 121,875
  Number of trainable parameters = 108,314,117
  Continuing training from checkpoint, will skip to saved global_step
  Continuing training from epoch 2
  Continuing training from global step 116500
  Will skip the first 2 epochs then the first 35250 batches in the first epoch.


Epoch,Training Loss,Validation Loss,Accuracy
3,0.5993,0.727575,0.71


Saving model checkpoint to /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117000
Configuration saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117000/config.json
Model weights saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117000/model.safetensors
Deleting older checkpoint [/content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/checkpoint-116000] due to args.save_total_limit
Saving model checkpoint to /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117500
Configuration saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117500/config.json
Model weights saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/tmp-checkpoint-117500/model.safetensors
Deleting older checkpoint [/content/drive/MyDrive/Colab

TrainOutput(global_step=121875, training_loss=0.0264668609775641, metrics={'train_runtime': 8164.7464, 'train_samples_per_second': 238.832, 'train_steps_per_second': 14.927, 'total_flos': 5.1345926792994816e+17, 'train_loss': 0.0264668609775641, 'epoch': 3.0})
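The step counts in the log above follow directly from the dataset size, batch size, and epoch count. The arithmetic below reproduces them, assuming a single device and no gradient accumulation, as the log reports.

```python
import math

NUM_EXAMPLES = 650_000
BATCH_SIZE = 16          # per_device_train_batch_size on a single device
NUM_EPOCHS = 3
RESUME_STEP = 116_500    # global step of the checkpoint that was loaded

steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

# Resuming skips whole epochs first, then the remaining batches of the current epoch.
epochs_skipped, batches_skipped = divmod(RESUME_STEP, steps_per_epoch)

print(steps_per_epoch, total_steps)
print(epochs_skipped, batches_skipped)
```

The results match the "Total optimization steps" line and the "Will skip the first 2 epochs then the first 35250 batches" line in the log.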

In [None]:
small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(100))

In [None]:
trainer.evaluate(small_test_dataset)

The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 100
  Batch size = 8


{'eval_loss': 0.9230320453643799,
 'eval_accuracy': 0.6,
 'eval_runtime': 5.378,
 'eval_samples_per_second': 18.594,
 'eval_steps_per_second': 2.417,
 'epoch': 3.0}

### Save the Model and Training State

- Use `trainer.save_model` to save the model; it can later be reloaded with `from_pretrained()`
- Use `trainer.save_state` to save the training state

In [None]:
trainer.save_model(model_dir)

Saving model checkpoint to /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp
Configuration saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/config.json
Model weights saved in /content/drive/MyDrive/Colab Notebooks/models/bert-base-cased-finetune-yelp/model.safetensors


In [None]:
trainer.save_state()
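`trainer.save_state()` writes a `trainer_state.json` file into the output directory. The sketch below shows one way to read a few progress fields back from such a file, assuming it contains the `global_step`, `epoch`, and `log_history` keys. `summarize_trainer_state` is a hypothetical helper, and the stand-in file uses illustrative values so the example runs without a training run.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def summarize_trainer_state(path: Path) -> Dict[str, Any]:
    """Read a trainer_state.json file and return a few progress fields."""
    state = json.loads(path.read_text(encoding="utf-8"))
    return {
        "global_step": state["global_step"],
        "epoch": state["epoch"],
        "num_log_entries": len(state["log_history"]),
    }


# Stand-in file with illustrative values; a real file holds many more fields.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "trainer_state.json"
    state_file.write_text(json.dumps({
        "global_step": 121875,
        "epoch": 3.0,
        "log_history": [{"step": 121875, "eval_accuracy": 0.71}],
    }), encoding="utf-8")
    summary = summarize_trainer_state(state_file)

print(summary)
```

In the notebook itself the same function could be pointed at `Path(model_dir) / "trainer_state.json"` after the cell above has run.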

In [None]:
# trainer.model.save_pretrained("./")