<a href="https://colab.research.google.com/github/reven404/learning-ai-practice/blob/main/fine_tune_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with Fine-Tuning in Hugging Face Transformers

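This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics for training
- A basic introduction to the Trainer
- Running the training
- Saving the model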

## The YelpReviewFull Dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

### Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It was extracted from the Yelp Dataset Challenge 2015 data.

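### Supported Tasks and Leaderboards
Text classification and sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given text.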

### Languages
The reviews are mainly written in English.

### Dataset Structure

#### Data Instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set looks as follows:

```python
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

#### Data Fields

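- 'text': The review text, escaped with double quotes ("); any internal double quote is escaped as two double quotes (""). Newlines are escaped as a backslash followed by an "n" character, that is, "\n".
- 'label': The star rating of the review (1 to 5), stored as a class index from 0 to 4 (0 = 1 star, 4 = 5 stars).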
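
To illustrate this escaping scheme, the sketch below parses one hypothetical raw CSV row with Python's `csv` module, which already undoes the doubled-quote escaping, and then turns the two-character `\n` sequences into real newlines (as the example above shows, the loaded text can still contain them). The row content, the `"<stars>","<text>"` layout and the `stars - 1` conversion to a 0-based class index are assumptions for illustration; `load_dataset` does the actual loading for you.

```python
import csv
import io


def parse_raw_row(line: str) -> tuple[int, str]:
    """Parse one raw CSV row of the assumed form "<stars>","<text>"."""
    # csv.reader un-escapes doubled quotes ("") inside a quoted field
    stars, text = next(csv.reader(io.StringIO(line)))
    # Turn the two-character escape sequence backslash + "n" into a real newline
    text = text.replace("\\n", "\n")
    # Assumption: star ratings 1-5 map to class indices 0-4
    return int(stars) - 1, text


# Hypothetical raw row: doubled quotes around "pizza" and an escaped newline
raw = '"3","Great ""pizza"".\\nWould return."'
label, text = parse_raw_row(raw)
print(label)  # 2
print(text)
```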

#### Data Splits

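The full-star Yelp reviews dataset is built by randomly taking 130,000 training samples and 10,000 test samples for each star rating from 1 to 5. In total there are 650,000 training samples and 50,000 test samples.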
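
Because the dataset is balanced across the 5 classes, the split sizes follow directly from the per-class counts. The small sketch below checks that arithmetic and the mapping between 0-based label indices and star ratings; the label names are copied from the table printed by `show_random_elements` later in this notebook and are assumed to be listed in index order.

```python
# Class names as displayed later in this notebook (assumed to be in index order 0..4)
LABEL_NAMES = ["1 star", "2 star", "3 stars", "4 stars", "5 stars"]
TRAIN_PER_CLASS = 130_000
TEST_PER_CLASS = 10_000

num_classes = len(LABEL_NAMES)
train_total = TRAIN_PER_CLASS * num_classes  # 650,000
test_total = TEST_PER_CLASS * num_classes    # 50,000


def label_name(label: int) -> str:
    """Map a 0-based class index to its display name, e.g. 0 -> "1 star"."""
    return LABEL_NAMES[label]


print(train_total, test_total)       # 650000 50000
print(label_name(0), label_name(4))  # 1 star 5 stars
```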

## Downloading the Dataset

```python
# Quote the version specifiers: unquoted, the shell treats ">" as output redirection
!pip install "torch>=2.1.2" transformers ffmpeg ffmpeg-python timm datasets evaluate scikit-learn pandas peft accelerate autoawq optimum auto-gptq "bitsandbytes>0.39.0" jiwer "soundfile>=0.12.1" librosa langchain gradio trl
```

```python
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")
```

```text
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
```


```python
dataset
```

```text
DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})
```

```python
dataset["train"][1234]
```

```text
{'label': 2,
 'text': 'Having lived near this Atria\'s location in the past, I can say that I\'ve spent more than my fair share of time at the PNC Park Atria\'s. My wife and I stopped in this past week, and for me, I was reminded of why I enjoy Atria\'s...but this having been our first time at Atria\'s together, and witnessing my wife\'s experience, I realized why the place could garner such low reviews.\\n\\nFor someone like myself, who\'s actually had enough positive and negative food experiences at Atria\'s to know exactly what to order and what to steer clear of, I can make the experience a positive one, but that\'s a situation I\'m afforded now thanks to my previous patience and convenience. My wife, opting to stray from my suggestions had an awful dinner. Her plate of bland fish and asparagus left her turned off completely. Luckily, we were using a gift card, so it wasn\'t as frustrating for her as it could\'ve been. Regardless, walking away from an entire plate of food is dishea
```

```python
import random
import pandas as pd
import datasets
from IPython.display import display, HTML
```

```python
def show_random_elements(dataset, num_examples=10):
    """Display num_examples randomly chosen rows of a Dataset as an HTML table."""
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Sample distinct row indices without replacement
    picks = random.sample(range(len(dataset)), num_examples)

    df = pd.DataFrame(dataset[picks])
    # Replace ClassLabel integer ids with their readable names (e.g. 0 -> "1 star")
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
```

```python
show_random_elements(dataset["train"])
```

```text
,label,text
0,2 star,Don't believe the hype. Started off with the lobster bisque soup of which the lobster was chewy. My friend had the heirloom and barrata salad and it was fairly decent. We shared the ribeye which was a bit dry but bloody if that can even happen. As a side we had the lobster mashed potatoes that had pieces of the lobster shell mixed in which brutalized my gums. We also had the sautéed asparagus that never made it to our table but we can only imagine what that would have tasted like. \n\nAs far as service our waiter was good and the team that assisted him was good as well. I was disappointed that I had to box my own food for the price I paid but hey it is what it is.
1,4 stars,"Fellow Yelpers,\n\nI had the pleasure of eating here for the first time a couple of years ago and recently had the chance to re-visit this place. It is still as good as it was back then. Consistency is a major plus for me, especially when talking about the quality of food in a restaurant. \n\nAll this talk of a Greek default as of late put me in the mood for their amazing cuisine once again. The location of this restaurant is great, tucked just far enough away from uptown where it is rather laid back yet you can still feel as if you are in the middle of Tryon's main drag.\n\nThe interior is spacious and decorated very well. The kitchen is more or less open and can be seen from the main dining area (see my photo). There is a partition separating the dining area from a smaller one where the bar is located. The bar has an awesome setup (see my photo) and is a good spot to just sit down and enjoy a drink or Greek beer.\n\nOnto the food, I ordered the Feta Cheese and Olives appetizer (see my photo) which was very delicious. The cheese was warm but not melted which led to a perfect texture. I enjoyed the liberal way in which olive oil was poured over the cheese much like the way I pour syrup on my pancakes. I shoot for a mild infarction every time.\n\nI ordered the Filet Mignon Kebab with Herb Rice (see my photo) which was quite good. The Filet was tender and cooked at the right temperature. Everything else combined for a great meal. I would recommend this joint for anyone wanting to try a great Greek place in the Charlotte area. \n\nIf a Greek default ever meant the closing of all Greek restaurants around the world, I would personally call Ben Bernanke and ask him to use his Houdini tricks and magically digitize billions of Dollars to fund their operations here in the US. Although this would go against all my economic beliefs, damn it I love Greek food. Go to Greek Isles and get lost in this Mediterranean sea of palatable goodies.\n\nAntío sas!"
2,2 star,"The food and atmosphere were terrific but the service was pretty poor. It really ruined the whole experience for us. The server took a long time to even take our order, refused to split checks or even allow us to use multiple credit cards, had other people deliver our food, never refilled our drinks and was anxious to get us out of there. The place was only about half full and this girl claimed to have worked there for a while but obviously was not interested in doing a good job. For the money, I'd go someplace that actually wanted our business."
3,1 star,"Definitely the WORST hotel on the strip! If you want rude front desk people checking you in, wrong reservations, overflowing toilets, unmade beds and 5am wake up calls with construction workers drilling away outside your room...then this is the place to stay!! \n\nOVERALL WORST EXPERIENCE OF MY LIFE! I have been to Vegas 100s of times and never have I been so disgusted by the \""customer service\"" or lack thereof. And to top it off, when I tried to complain, the manager Jose (who was so rude) gave us a measly comp at the cafe! What an INSULT! It took over 2.5 months to finally get a hold of another manager (Darryl) who ended up being even worse than Jose. \n\nAfter spending over $450 for 4 nights and enduring 2 unbearably loud mornings waking up to pounding drilling noises, one of these managers should have adequately compensated us for ALL THE INCONVENIENCE!!! The Caesars Acquisition Company (which owns Bally's, Harrah's, Planet Hollywood, Paris, Flamingo..etc) SHOULD BE ASHAMED OF THEMSELVES!!! At least learn how to train your management at Bally's to be a little accommodating!!!!\n\nCorporate offices will definitely be hearing from me soon! Attached with a video clip of the horrendous construction noises!"
4,4 stars,"Yeah, there's a reason the hotcakes are so popular. \n\nThey're really, really good. They're not as thick as traditional pancakes and the edges are thin and crisp, like lacy crepes. I ordered mine with a breakfast (as I failed to eat dinner the night before - I was on a plane, and then I was too tired), with strawberry topping, and it was SO DELICIOUS. If I had spent a few more days, I'd probably go again, but I'm sure this would be on my list for a return trip. \n\nThe rest of the breakfast was normal diner fare. I have no complaints, but it wasn't particularly memorable. The sausage was greasy but good. The coffee was subpar, though."
5,2 star,"Mediterranean tofu scramble - warning: it has a lot of liquid from the tofu. It's more of an entree then a breakfast scramble. It seems like a basic toss with salt and pepper \nChicken and waffle - not crunchy! Soggy tofu with the \""crunch\"" layer falling apart and off. The waffle was also very soggy. \nOM burger - best of the three entrees but still not that great. The patty was falling apart and was again, very soggy. I loved the sesame bun though. It was crisp, soft and covered with sesame! \n\nSides:\nKale chips - plain and simple oil with salt \nLocal vegetables - eggplant, sweet potato, onion and pepper."
6,3 stars,"A fun and bumping place to be at.\n\nI didn't even think I was going to be able to do the annual pilgrimage to Vegas this year. But thanks to overbooked flights on Southwest in Seattle and me sweet talking them. I had a Saturday night stay in Vegas on them. Fantastic!!!! \n\nI've not been to Vegas since City Center and Aria open, so this was my chance to check it out. I didn't actually stayed here. This is more about the casino.\n\nI did a brief walk around Crystals (looks nice) because I didn't have time to waste. On to the craps table. \n\nFrom what I can tell it's nice and modern but then again...you should expect this from a casino that's new. Too much brown for my taste but to each their own. \n\nMet these guys from Denver, who live in different parts of the country now and they do an annual trip to Vegas. We played craps from 10pm-5am. More than fantastic!!! Dealers are nice and courtesy. Drink service was timely. \n\nA good variety of people. The usual trendy restaurants and eye candy at the bars/clubs. \n\nI know nothing about their rooms. But one of my best friend has stayed here and he likes it. Chances are I will too.\n\nNice place. Next time I'm back I will consider staying here."
7,4 stars,"This is has been on my list for a while, so it was one of my first choices for Spring Restaurant Week this year. I came here for lunch during the week with fellow Yelpers and we took advantage of their $30.14 prie fixe menu (Three Square $4 donation). I also utilized Open Table to make my reservation.\n\nThe three of us started with the Vine-Ripened Heirloom Tomato Salad for our appetizer. The fresh heirloom tomatoes (2-3 different types/colors) were served with a dollop of creamy Burrata cheese, red onion, extra virgin olive oil, balsamic vinegar, and fresh local basil. Really crisp and refreshing; great start to our meal.\n\nFor the entrée, I selected the Delmonico's Prime Hamburger served on a toasted brioche roll with cheddar cheese, crispy bacon, and the usual fixings (lettuce, tomato, onion). It was perfectly medium rare with juices oozing out with every squeeze. The burger had great flavor and seasoning, so that made it stand out from other burgers I've had. It was also an extremely generous portion. I could barely finish half of it (and I eat A LOT). My only complaint was that although the burger started out juicy and delicious, it dried out really quickly and made it almost difficult to eat. I'm not sure what attributed to this, but it was odd. The fries were delicious! A unique preparation of crispiness.\n\nI also had the opportunity to taste the Butternut Squash Ravioli prepared with sage brown butter, toasted hazelnuts, and parmesan cheese and Pan Roasted Atlantic Salmon served with a sweet corn and crab vinaigrette and fresh basil. Both were really delicious! The salmon was so soft with a succulent rareness in the center of the filet. I was also really impressed with the large chunks of crab meat and whole kernels of corn.\n\nFor dessert, I opted for the White Chocolate Macadamia Nut Bread Pudding with Vanilla Bean Crème Anglaise. I'm glad someone else chose Emeril's Banana Cream Pie because it was a tough decision for me (both being my favorite desserts). The bread pudding was moist, warm, and very yummy. The pecan caramel-like sauce was a bit sweet, but fortunately, the pudding was not. The banana cream pie was topped with caramel sauce, chocolate shavings, and whipped cream. Yum. It was good, but I'll have to order one in the future to get a really good taste of it.\n\nThe restaurant is nice and I like the dimly lit rooms with spotlights on each table. It wasn't very busy during our meal, but I'm sure it would still be that nice and quiet. The service was also phenomenal. I was extremely pleased with the offering of Harney & Sons tea (transported in the most gorgeous wooden box display) and the fact that it was complimentary. Everyone was so prompt to refill my pot with hot water and of course, it was served with fresh lemon and honey. We also stayed past their lunch service and they were so kind to allow us to loiter for that long."
8,5 stars,Amazing food and great service!! The food was fresh and very tasty. We will definitely be back to this fantastic restaurant. Such a hidden gem! Support the locally owned restaurants :)
9,3 stars,"U-Swirl was one of the first frozen yogurt places to pop up in the Anthem area. For a while it's only competitor was Golden Spoon. Now, with the surplus of fro-yo shops, it is often forgotten about. I do prefer Yogurtland to USwirl, but I gave this place another try out of convenience. \n\nMy friends and I were looking through Yelp for a new joint, but were met by closed doors. By the time we got up the hill, it was either USwirl or Golden Spoon. At least this place lets you customize your own instead of having an employee make it for you. I never liked the game of \""will I get the person who is generous with the toppings?\"". This way, you know that you are to blame if your cup is expensive.\n\nCake flavors are the rage this season. Everywhere has cake batter, red velvet, or devil's food now. No different here, except I liked there version of the first two better than anywhere else. It is not sickeningly sweet. Yogurtland has a better selection of tarts and tropical flavors though. \n\nThe best part about USwirl is that they have passion, strawberry, and orange pearls. They look like large fish eggs and when you bite into them, they pop."
```


## Preprocessing the Data

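After downloading the dataset, use a tokenizer to process the text. Because the inputs vary in length, apply padding and truncation strategies to bring them to a uniform length.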
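
To make the two strategies concrete, here is a simplified, pure-Python sketch of what `truncation=True` combined with `padding="max_length"` produces for a single BERT-style sequence. It assumes the token IDs have already been computed (the IDs in the usage lines are arbitrary) and uses the `[CLS]`=101, `[SEP]`=102 and `[PAD]`=0 IDs visible in the tokenized output later in this notebook; the real tokenizer also handles sentence pairs, `token_type_ids` and other truncation strategies.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-cased special-token IDs seen in this notebook's output


def encode_fixed_length(token_ids, max_length=512):
    """Truncate and pad one sequence to exactly max_length tokens (simplified sketch)."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    # Truncation: keep at most max_length - 2 content tokens to leave room for special tokens
    body = list(token_ids)[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Padding: fill the rest with [PAD] and mask those positions out
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask}


short = encode_fixed_length([7592, 2088], max_length=6)
print(short["input_ids"])       # [101, 7592, 2088, 102, 0, 0]
print(short["attention_mask"])  # [1, 1, 1, 1, 0, 0]
truncated = encode_fixed_length(range(1, 11), max_length=6)
print(truncated["input_ids"])   # [101, 1, 2, 3, 4, 102]
```

Every output has the same length, which is what lets a whole batch be stacked into one tensor; the attention mask tells the model which positions are padding.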

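The Datasets `map` method applies a preprocessing function to the entire dataset in one call; with `batched=True`, the function receives batches of examples rather than one example at a time.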
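
The batched contract of `map` can be illustrated with a toy, pure-Python stand-in: the function receives a dict mapping column names to lists (one batch), returns a dict of new columns, and those columns are added next to the existing ones. `toy_batched_map` and `count_words` below are hypothetical helpers for illustration only; the real `Dataset.map` also caches results, supports multiprocessing and more. The default batch size of 1000 mirrors the `batch_size` default of `Dataset.map`.

```python
def toy_batched_map(columns, fn, batch_size=1000):
    """Apply fn to consecutive batches of a column-oriented dataset (hypothetical helper)."""
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, num_rows, batch_size):
        # Slice every column to build one batch, e.g. {"text": [...], "label": [...]}
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    # Returned columns are added alongside (or overwrite) the original ones
    return {**columns, **new_columns}


def count_words(batch):
    # A batched function returns one output value per example in the batch
    return {"num_words": [len(text.split()) for text in batch["text"]]}


data = {"text": ["great food", "slow service and cold fries", "ok"], "label": [4, 1, 2]}
result = toy_batched_map(data, count_words, batch_size=2)
print(result["num_words"])  # [2, 5, 1]
```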

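Below, we process the whole dataset with the pad-to-maximum-length strategy (`padding="max_length"`), which pads every sequence to the model's maximum input length, 512 tokens for `bert-base-cased`: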

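```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    # Pad every review to the model's maximum length and truncate longer ones
    return tokenizer(examples["text"], padding="max_length", truncation=True)


# batched=True passes batches of examples to tokenize_function for faster processing
tokenized_datasets = dataset.map(tokenize_function, batched=True)
```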


```python
show_random_elements(tokenized_datasets["train"], num_examples=1)
```

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,5 stars,"Excellent experience all around. Food was very good and the service was equally very good. We had the monkey bread and lamb meatballs for appetizers (both wonderful), my wife had the ribs with pork belly while I had the cider-braised pot roast, both of which were tender and flavorful, \n\nWe will definitely be back and would highly recommend to anyone who enjoys great food with great service in a great atmosphere!","[101, 25764, 2541, 1155, 1213, 119, 6702, 1108, 1304, 1363, 1105, 1103, 1555, 1108, 7808, 1304, 1363, 119, 1284, 1125, 1103, 16019, 8162, 1105, 2495, 12913, 6092, 20088, 1111, 12647, 26883, 26542, 113, 1241, 7310, 114, 117, 1139, 1676, 1125, 1103, 10346, 1114, 19915, 7413, 1229, 146, 1125, 1103, 172, 18494, 118, 12418, 3673, 9814, 187, 20219, 117, 1241, 1104, 1134, 1127, 8886, 1105, 16852, 2365, 117, 165, 183, 165, 183, 2924, 1162, 1209, 5397, 1129, 1171, 1105, 1156, 3023, 18029, 1106, 2256, 1150, 16615, 1632, 2094, 1114, 1632, 1555, 1107, 170, 1632, 6814, 106, 102, 0, 0, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, ...]"


### Sampling the Data

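We use 1,000 samples to demonstrate small-scale training of BERT (with the PyTorch `Trainer`). The cell below also keeps a shuffled copy of the full training set, which the training run later in this notebook uses.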

`shuffle()` randomly reorders the rows of the dataset. For more control over the shuffling algorithm, pass the `generator` parameter to use a different `numpy.random.Generator`.
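
Conceptually, `shuffle(seed=...)` followed by `select(range(k))` amounts to permuting the row indices with a seeded random generator and keeping the first `k`. The sketch below shows that idea with Python's `random.Random`; it is not how the `datasets` library implements it (the library uses a NumPy generator and an indices mapping), so the resulting order will differ from the library's for the same seed.

```python
import random


def shuffled_subset(rows, k, seed=42):
    """Return the first k rows after a seeded shuffle (illustrative sketch)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation every time
    return [rows[i] for i in indices[:k]]


rows = [f"review {i}" for i in range(10)]
a = shuffled_subset(rows, 3, seed=42)
b = shuffled_subset(rows, 3, seed=42)
print(a == b)  # True: reproducible across calls
```

Because the permutation depends only on the seed, taking the first `k` rows of an already shuffled set gives the same subset as reshuffling with the same seed and selecting again.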

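```python
# Shuffle the full training set with a fixed seed for reproducibility
full_train_dataset = tokenized_datasets["train"].shuffle(seed=42)
# First 1,000 shuffled rows (same permutation as full_train_dataset, since the seed is the same)
small_train_dataset = full_train_dataset.select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
```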

## Fine-Tuning Configuration

### Loading the BERT Model

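The warning below tells us that some weights were newly initialized: the `classifier.weight` and `classifier.bias` of the classification head. This is entirely normal when fine-tuning: the head used for the pretraining task is dropped and replaced with a new classification head with `num_labels=5` outputs. Since there are no pretrained weights for this new head, the library warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.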
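
For a sense of scale, the new head is a single linear layer from BERT's pooled output to the 5 class logits, so only a small number of parameters start from random values. The sketch below does that arithmetic and a toy forward pass in pure Python, assuming the `bert-base` hidden size of 768; the dropout that `BertForSequenceClassification` applies before the classifier is omitted, and the toy weights are made up.

```python
HIDDEN_SIZE = 768  # hidden size of bert-base models
NUM_LABELS = 5     # num_labels passed to from_pretrained in this notebook

# classifier.weight has shape (NUM_LABELS, HIDDEN_SIZE); classifier.bias has shape (NUM_LABELS,)
new_params = NUM_LABELS * HIDDEN_SIZE + NUM_LABELS
print(new_params)  # 3845


def linear_head(pooled, weight, bias):
    """Compute logits = weight @ pooled + bias for one example (toy forward pass)."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]


# Toy example: a 3-dimensional "pooled output" and 2 classes with made-up weights
print(linear_head([1.0, 2.0, 3.0], [[0.1, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, -1.0]))  # [0.1, 2.0]
```

Compared with the roughly 108 million pretrained parameters of `bert-base`, these few thousand new ones are why fine-tuning can reach useful accuracy quickly.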

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
```


```text
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```


### Training Hyperparameters (TrainingArguments)

Full list of parameters and their default values: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**The most important setting: the directory where model weights are saved (`output_dir`)**

```python
from transformers import TrainingArguments

model_dir = "models/bert-base-cased-finetune-yelp"

# logging_steps defaults to 500; given our training data size and batch size, set it to 100
training_args = TrainingArguments(output_dir=model_dir,
                                  per_device_train_batch_size=16,
                                  num_train_epochs=5,
                                  logging_steps=100)
```
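
Why lower `logging_steps`? With `gradient_accumulation_steps` at its default of 1 and the last partial batch kept (`dataloader_drop_last=False`), one epoch takes `ceil(num_samples / total_batch_size)` optimizer steps. The sketch below applies that formula under those assumptions (`num_devices` is a simplification for data-parallel runs): for the 1,000-sample subset at batch size 16 for 5 epochs it gives 315 steps, so the default of 500 would never log. The same formula reproduces the `global_step=20313` reported later for one epoch over the 650,000-sample training set at batch size 32 on one GPU.

```python
import math


def total_training_steps(num_samples, per_device_batch_size, num_epochs, num_devices=1,
                         gradient_accumulation_steps=1):
    """Estimate optimizer steps, assuming the last partial batch is kept (sketch)."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return steps_per_epoch * num_epochs


def num_log_events(total_steps, logging_steps):
    """Number of times the loss is logged when logging every logging_steps steps."""
    return total_steps // logging_steps


small_steps = total_training_steps(1000, 16, 5)
print(small_steps, num_log_events(small_steps, 500), num_log_events(small_steps, 100))  # 315 0 3
print(total_training_steps(650_000, 32, 1))  # 20313
```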

```python
# Print the full hyperparameter configuration
print(training_args)
```

```text
TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_la
```

### Evaluating Metrics During Training (Evaluate)

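The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** gives you dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more) with a single line of code. **Full list of currently supported metrics: https://huggingface.co/evaluate-metric**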

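By default, the `Trainer` does not evaluate the model during training (`evaluation_strategy` defaults to `no`, as the printed configuration above shows), and when it does evaluate, it reports only the loss. We therefore need to pass the trainer a function that computes and reports metrics such as accuracy.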

The Evaluate library provides a simple accuracy metric, which you can load with the `evaluate.load` function:
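
Accuracy is simply the fraction of predictions that match the reference labels. The pure-Python function below mirrors that definition and the `{"accuracy": ...}` dictionary shape returned by the `accuracy` metric's `compute`; it is a sketch for understanding, not a replacement for `evaluate`.

```python
def accuracy(predictions, references):
    """Fraction of predictions equal to their reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(int(p == r) for p, r in zip(predictions, references))
    return {"accuracy": correct / len(references)}


print(accuracy([0, 2, 4, 4], [0, 2, 3, 4]))  # {'accuracy': 0.75}
```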

```python
import numpy as np
import evaluate

metric = evaluate.load("accuracy")
```



Next, call the `compute` function to calculate the accuracy of the predictions.

Before passing predictions to `compute`, we need to convert the logits into predicted labels (**Transformers classification models return logits, not labels**).
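
Taking the `argmax` over the last axis is enough to turn logits into class predictions: softmax is monotonic, so the class with the largest logit also has the largest probability. The sketch below checks this in pure Python on made-up logits for 2 examples and 5 classes; `compute_metrics` in the next cell does the same with `np.argmax(logits, axis=-1)`.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest value in row (first one on ties, like np.argmax)."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up logits with shape (batch_size, num_labels) = (2, 5)
logits = [[-1.2, 0.3, 2.5, 0.1, -0.4],
          [0.9, 0.2, -0.5, 1.7, 1.6]]
preds_from_logits = [argmax(row) for row in logits]
preds_from_probs = [argmax(softmax(row)) for row in logits]
print(preds_from_logits, preds_from_probs)  # [2, 3] [2, 3]
```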

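```python
def compute_metrics(eval_pred):
    # eval_pred holds the model's logits and the reference labels
    logits, labels = eval_pred
    # Predicted class = index of the largest logit for each example
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```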

#### Monitoring Metrics During Training

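To monitor how evaluation metrics change during training, set the `evaluation_strategy` parameter in `TrainingArguments`; with `"epoch"`, metrics are reported at the end of each epoch. (Newer Transformers releases rename this parameter to `eval_strategy`.)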

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir=model_dir,
                                  evaluation_strategy="epoch",  # evaluate at the end of every epoch
                                  per_device_train_batch_size=32,
                                  # per_device_eval_batch_size=32,
                                  num_train_epochs=1,
                                  logging_steps=30)
```

## Starting Training

### Instantiating the Trainer

A `kernel version` warning may be printed here; it does not currently affect running this example.

```python
trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=small_train_dataset,  # 1,000-sample subset for a quick run
    train_dataset=full_train_dataset,      # all 650,000 shuffled training samples
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
```

## Checking GPU Usage with nvidia-smi

To monitor GPU usage in real time, use the `watch` command to poll it: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                               bc3e50a81c78: Sun Mar 24 09:14:48 2024

Sun Mar 24 09:14:48 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0              46W / 400W |    891MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
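
If you would rather log GPU usage from Python than watch the full table, `nvidia-smi` also has a machine-readable query mode (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below parses that CSV output; the sample line is modeled on the A100 readout above, and `read_gpu_stats` assumes an NVIDIA driver with `nvidia-smi` on the `PATH`. Fields reported as `[N/A]` are not handled in this sketch.

```python
import subprocess

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(output):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in output.strip().splitlines():
        index, name, mem_used, mem_total, util = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
            "utilization_pct": int(util),
        })
    return gpus


def read_gpu_stats():
    """Run nvidia-smi once and return parsed stats (requires an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(output)


sample = "0, NVIDIA A100-SXM4-40GB, 891, 40960, 0\n"
print(parse_gpu_csv(sample))
```

Calling `read_gpu_stats()` in a loop with `time.sleep(1)` gives roughly the same polling as `watch -n 1 nvidia-smi`, but as structured data you can log next to training metrics.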

```python
trainer.train()
```

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.7405        | 0.677498        | 0.701    |

```text
TrainOutput(global_step=20313, training_loss=0.7587748494286989, metrics={'train_runtime': 14497.1781, 'train_samples_per_second': 44.836, 'train_steps_per_second': 1.401, 'total_flos': 1.710267926016e+17, 'train_loss': 0.7587748494286989, 'epoch': 1.0})
```

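```python
# 100 test samples with a different seed; drawn from the same test split as
# small_eval_dataset, so the two subsets may overlap
small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(100))
```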

```python
trainer.evaluate(small_test_dataset)
```

```text
{'eval_loss': 0.8341051340103149,
 'eval_accuracy': 0.67,
 'eval_runtime': 0.9204,
 'eval_samples_per_second': 108.65,
 'eval_steps_per_second': 14.124,
 'epoch': 1.0}
```

### Saving the Model and Training State

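- Use `trainer.save_model` to save the model; it can later be reloaded with `from_pretrained()`. Because no tokenizer was passed to the `Trainer` here, the tokenizer is not saved alongside the model, so load it from `bert-base-cased` when reloading.
- Use `trainer.save_state` to save the training state (written to `trainer_state.json` in `output_dir`).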
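
The saved `trainer_state.json` includes a `log_history` list that mixes training-loss entries with evaluation entries. As a sketch, assuming evaluation entries carry `eval_`-prefixed keys such as `eval_accuracy` (as in the `trainer.evaluate` output above), the helper below picks the best evaluation entry. `best_eval_entry` and `load_log_history` are hypothetical helpers, and the history used here is made up for illustration, not read from a real file.

```python
import json


def best_eval_entry(log_history, metric="eval_accuracy", greater_is_better=True):
    """Return the log_history entry with the best value of `metric`, or None (sketch)."""
    entries = [entry for entry in log_history if metric in entry]
    if not entries:
        return None
    chooser = max if greater_is_better else min
    return chooser(entries, key=lambda entry: entry[metric])


def load_log_history(path):
    """Read the log_history list from a trainer_state.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


# Made-up history: training-loss entries and evaluation entries are interleaved
history = [
    {"loss": 1.10, "step": 30, "epoch": 0.5},
    {"eval_loss": 0.95, "eval_accuracy": 0.58, "step": 60, "epoch": 1.0},
    {"loss": 0.90, "step": 90, "epoch": 1.5},
    {"eval_loss": 0.88, "eval_accuracy": 0.62, "step": 120, "epoch": 2.0},
]
print(best_eval_entry(history)["step"])  # 120
```

With a real run, pass `load_log_history(os.path.join(model_dir, "trainer_state.json"))` instead of the made-up list.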

```python
trainer.save_model(model_dir)
```

```python
trainer.save_state()
```

```python
# Alternative: save the model weights and config directly
# trainer.model.save_pretrained("./")
```

## Homework: train on the full YelpReviewFull dataset and see how high the accuracy can go
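
Before launching longer full-dataset runs for this exercise, it can help to estimate wall-clock time from measured throughput. The sketch below uses the `train_samples_per_second` value reported by `trainer.train()` above (about 44.8 samples/s for one epoch on the A100 shown, roughly 4 hours per epoch); `estimate_training_hours` is a hypothetical helper, and throughput on other hardware or with other settings will differ.

```python
def estimate_training_hours(num_samples, num_epochs, samples_per_second):
    """Rough wall-clock estimate from a measured training throughput."""
    return num_samples * num_epochs / samples_per_second / 3600


# Throughput reported by trainer.train() in this notebook (A100, batch size 32)
measured = 44.836
print(round(estimate_training_hours(650_000, 1, measured), 2))  # 4.03
print(round(estimate_training_hours(650_000, 3, measured), 1))  # 12.1
```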