# Getting Started with Fine-Tuning in Hugging Face Transformers

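This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- Overview of the Trainer
- Running the training
- Saving the model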

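## The YelpReviewFull Dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

### Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

### Supported Tasks and Leaderboards
Text classification, sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given review text.

### Languages
The reviews are mainly written in English.

### Dataset Structure

#### Data Instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set: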

```json
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

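#### Data Fields

- 'text': the review text, escaped with double quotes ("); any internal double quote is escaped as two double quotes (""). Newlines are escaped as a backslash followed by an "n" character, i.e. "\n".
- 'label': the review score, from 1 to 5 stars. In the loaded dataset it is stored as a class index from 0 to 4, so a label of 0 means 1 star, as in the example above.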
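
As a small illustration of the label field, the sketch below converts a class index into a star rating. The label names are copied from the strings that appear in the notebook outputs (`1 star`, `2 star`, `3 stars`, and so on). The helper `label_to_stars` is a hypothetical name used only here, and the 0-based indexing is an assumption inferred from the sample with `'label': 0`.

```python
# Label names as they appear in the notebook outputs (assumed to be indexed 0..4).
LABEL_NAMES = ["1 star", "2 star", "3 stars", "4 stars", "5 stars"]


def label_to_stars(label: int) -> int:
    """Convert a 0-based class index into a 1-5 star rating."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"label must be in 0..{len(LABEL_NAMES) - 1}, got {label}")
    return label + 1


print(label_to_stars(0), LABEL_NAMES[0])
```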

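#### Data Splits

The Yelp reviews full star dataset is built by randomly taking 130,000 training samples and 10,000 test samples for each star rating from 1 to 5. In total there are 650,000 training samples and 50,000 test samples.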

## Downloading the Dataset

In [1]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")

Downloading readme: 100%|██████████| 6.72k/6.72k [00:00<00:00, 23.3MB/s]
Downloading data: 100%|██████████| 299M/299M [00:24<00:00, 12.0MB/s] 
Downloading data: 100%|██████████| 23.5M/23.5M [00:01<00:00, 19.4MB/s]
Generating train split: 100%|██████████| 650000/650000 [00:01<00:00, 363627.46 examples/s]
Generating test split: 100%|██████████| 50000/50000 [00:00<00:00, 311312.27 examples/s]


In [2]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

In [3]:
dataset["train"][100]

{'label': 0,
 'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. 

In [4]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML

In [5]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [6]:
show_random_elements(dataset["train"])

Unnamed: 0,label,text
0,5 stars,"Loved it.\n\nI had a salad (boring I know). But it was amazing. Really fresh and delicious cheese (I have no idea what kind however), great guacamole, and a strong and wonderful 3330 margarita. \n\nThe waiters were fairly attentive and the food didn't take too long. I think we were there until their closing time or 9:30 or 10. The restaurant appeared to be very friendly to all types of families. The atmosphere was super comfortable but still had a decor and an appropriate level of mex restaurant ambiance.\n\nThe sign out front is the best part though! It's a great place to grab a meal and judging by the margarita probably an ok place to just have chips and drinks too!"
1,3 stars,"Me and my friends just came here for a short while walking the strip during the day. I was hoping there would be a lot of people, but it was kind of empty. Maybe it gets more crowded at night.\n\nAnyways, me and my friends were watching the live band, in which the singer asked my boyfriend to be spanked, which he accepted, haha. Also I treated my friend to a drink for his birthday ($11, pricey!) but it was made really good... I wish I knew the name of it. Anyways after telling the bartender it was my friend's birthday he gave him a crazy elevated shot.... I videotaped that moment for his future wife and employers.\n\nThen we continued on walking the strip..."
2,2 star,"Food is good. But as Ally K. mentioned, the service SUCKS... jus tlike any other Cheesecake.\n\nWe came here only for dessert and I guess our waitor thought we weren't worth his time. So he didn't come at all to check up on our desserts or anything. \n\nWe got the strawberry shortcake. They have this at Cheesecake as well, but it's a little different at GL because they have sugar on top of the shortbread.\n\nI also tried their strawberry smoothie. It has coconut in it. It's just like Jamba Juice strawberry's wild with coconut.\n\nTheir creme brulee comes with regular and chocolate. The chocolate one is VERY rich in chocolate. For 6.95, it's not bad for two large servings. \n\nJust don't expect great service and you'll be fine."
3,4 stars,"Bellagio is definitely the high end hotel on the Strip. It is indeed quite luxurious in the resort and in the room. The room was beautiful with a large entrance that you most often see in master bedrooms in multimillion dollar mansions. The bathroom was nice and large with a huge soak tub.\n\nWe splurge on a fountain view, but we were located in the side building not the main building. I'm sure that would have even more of an upgrade. It's nice that they reserve a channel to play the music while the fountain show is going on. The Internet is free and it's unlimited to the number of devices, which is great!\n\nThe lay out of the resort was nice. From the front desk to the elevators, there is the seasonal room where they change the design monthly. You'll also pass the spa, shops, and a coffee shop. Near the coffee shop, there are some tables with a couple of chairs for you to sit and enjoy your drink. They line the windows so you can see the well manicured grounds outside.\n\nThe elevators to the rooms to the side building is tucked away, which we appreciated so there isn't a crowd pooling in the middle of the walkway.\n\nThe one downside was that the check in line was really slow, but once you get up there, you get the staff member's full undivided attention.\n\nI would return if I was in the mood to splurge for a special occasion or something."
4,2 star,"Watching movies at SMG was great, but I will agree with the other reviewers about the bad service for dinner. My husband and I went here tonight and read the reviews ahead of time, so we knew what we were walking in to. We ordered the raspberry crush mojito from the menu before the show, and the bartender didn't know what it was. We ended up getting drinks with no raspberries or mint and not resembling mojitos at all. We ordered just popcorn and cheese fries for the show. The popcorn came right away, but the fries didn't show up for a while, and when they did, they weren't hot. We asked to send them back, but they never came. We asked them what happened and when they finally arrived they were hot, and the server apologized profusely and removed it from the bill. Sketchy service at best and I am guessing this is due to not having dedicated servers, and rather a pool of servers running about.i will mention the cheese fries were good once they came out hot."
5,4 stars,"I have to admit, when I was trying to find a place to eat for a nicer dinner on our last trip to Vegas, I felt that finally being able to eat at a Thomas Keller restaurant would be putting my name on the fast track to earning some hardcore foodie street cred. Granted, it wasn't Per Se or the French Laundry, but one has to start somewhere.\n\nWe had a reservation for a Sunday evening, late enough that we were hungry but not so late that we still couldn't enjoy a later show. We were seated promptly, though strangely we were not greeted for several minutes. The cocktail list is pretty on-par in terms of Vegas pricing, and considering that these are actually well made drinks, as opposed to watered down slushies in plastic cups, this was a welcome change. I do wish that they had more wine offerings under $50 a bottle, because there were essentially none in this price range, however my Bouchon Cocktail was dangerous and fresh, and my husband's Manhattan was excellent.\n\nMy husband ordered the steak and frites, and I ordered the scallop special for the evening. The couple dining next to us felt their steak was tough, and although my husband didn't find it tough, he did feel that it was cooked more than he was expecting. The frites, by the way, tasted exactly like McDonald's fries. Not a bad thing, but interesting.\n\nMy scallop special was good, though not outstanding: the scallops were well seared and obviously cooked well. The dish came with tomatoes, eggplant and zucchini, which were cooked amazingly, considering I do not often like 2 of those vegetables. The dish also came with some incredibly unremarkable shrimp, whose purpose I couldn't identify, and a savory herbs de provence pain perdu, which was heavenly. I would have been very happy with just the seared scallops, the pain perdu and the vegetables. 
The shrimp, as well as the broth the dish sat in, didn't say much to me.\n\nFor dessert, my husband ordered the chocolate and burnt orange dessert, which we both enjoyed. I got one of the cheaper wines by the glass, and was incredibly pleased with it. I'm thankful and happy we were able to eat here, but would definitely choose somewhere new the next time I visit."
6,4 stars,"The beer is AMAZING, especially the Red Roover. I would recommend any beer lover to stop by. The only reason I did not give them a 5 is due to lack of food. I know they provide free popcorn and do schedule food trucks, but it would be an even better location with some bar food to pair with their delicious beer."
7,3 stars,Went there a couple weeks ago for a business lunch and am catching-up on my reviews. I have eaten at this location many times and have never had a bad meal. It seems like the portions have gotten smaller though. It is always jammed for lunch but usually you are seated quickly. The food is good and I especially like the pica de gallo (salsa). The fact that they have been in business for so many years shows that they know what they are doing. The prices are very reasonable.
8,3 stars,"Went there to try their happy hour small plates menu. Didn't get past the bar so that's all I can comment on. Lively. Tried the crabcakes (excellent), lobster salad roll (pretty good) and fried oysters (excellent.) Side of fries good but too expensive. Service was a bit slow and the drinks expensive for a happy hour."
9,1 star,"Bought an Audi TT with low miles and went to Yelp to see who could give it an inspection and tune up. Exklusiv showed up on top and so I went. I say I went because showing up is the ONLY way to get service. Phones not answered. Vmails and Emails not returned. I drive all the way there to schedule an appointment for a recommended Oil Change, Transmission Fluid Flush, Brake Fluid Flush, 4 spark plugs, and it turns out I need new brake pads and rotors so I agree...$1100 which I am happy to pay for great car service. I pick up the car and a couple of days later there's a foul smell coming form the engine AND an oil slick in my garage. Transmission fluid leak. I call to get Derek to help BUT no phone, vmail or email answered so I drive over, in a different car. Derek tells me to bring it in, looks at it and finds that a seal failed. He replaces it and we're all good BUT there's fluid all over the under-carriage. He pulls out a pressure washer to clean my car but that's not working either so he dabs it with paper towels. Car stinks for a couple of days and we're good. I ask him if I need more fluid since a lot has gushed out and he says no. So I ask him if it was overfilled in the first place, only makes sense right? He says no, go figure...While I'm there I tell him that my brake pedal is REALLY low, especially given that I now have brand new brakes all around AND new brake fluid. He tells me that Audi brake pedals that go virtually to the floor before braking is \""normal\"". Anyone who has ever driven a car knows that's not normal but I just want out at this point. I go to a tire shop to buy new tires a few days later and ask them about the brakes and in 5 minutes they tell me my master brake cylinder needs to be replaced. They replace it and brakes are fine, pedal is nice and high and firm. 
The tire shop tells me they also replaced my brake pads because the ones in place were the wrong part # and were moving around in the caliper causing a noisy and possibly dangerous situation. Unbelievable, Derek installed the wrong brake pads on my car!!!\n\nSo I give Derek at Exklusiv $1100 for Transmission, Oil change and brakes and he royally screws up 2 out of 3. I figure $1100 is a lot for an oil change so I go and check my oil just for fun. As I pull on the plastic ring of the dipstick to pull it out it comes without the dipstick. Yep he also broke my dipstick and glued it back together with cheap glue. So now I have to spend $40 for a new Audi dipstick.\n\nRecap: Bad/zero communication - Oil leak - Oil mess - lying about brake pedal feel - missing broken master cylinder - wrong brake pads - broken dipstick.\nIf you still want to take a chance on this place good luck getting hold of anyone to get an appointment!"


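## Preprocessing the Data

Once the dataset is downloaded, use a tokenizer to process the text. Because the inputs have different lengths, apply padding and truncation so that every example ends up with the same length.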
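
To make padding and truncation concrete, here is a minimal pure-Python sketch that operates on a list of token IDs. It is a simplification, not the tokenizer's implementation: a real tokenizer also takes care of special tokens when truncating. The function name `pad_or_truncate` and the default `pad_id=0` are assumptions chosen for illustration.

```python
from typing import List, Tuple


def pad_or_truncate(
    token_ids: List[int], max_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]    # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)   # padding: number of slots still empty
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([101, 1188, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 1188, 2957, 9870, 1110, 102], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore padded positions, which is why the tokenized output contains an `attention_mask` column next to `input_ids`.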

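The `map` method of Datasets applies a preprocessing function to the whole dataset in one pass.

The cell below pads every example to the maximum length and processes the entire dataset: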

In [7]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)

tokenizer_config.json: 100%|██████████| 29.0/29.0 [00:00<00:00, 139kB/s]
config.json: 100%|██████████| 570/570 [00:00<00:00, 4.00MB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.14MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 771kB/s]
Map: 100%|██████████| 650000/650000 [03:25<00:00, 3167.69 examples/s]
Map: 100%|██████████| 50000/50000 [00:16<00:00, 2977.15 examples/s]


In [8]:
show_random_elements(tokenized_datasets["train"], num_examples=1)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,1 star,"This Gamestop is a joke.\n\nI bought a new game from this place, but they only had an open-case game, so they put a sticker on it to tell if the case was opened or not. I decided that I didn't want the game without even opening it. I made sure the sticker was intact.\n\nI go back to return the game, they open it right in front of my face, and they tell me that I can't return it because the sticker was ripped. I tell them this is a joke, and I did not open it and they did. They still me after a couple minutes of arguing that I cannot return it, only trade it in. I got tired of arguing and I just traded it in. I did not want to deal with them. I walked out of the store disappointed.\n\nMoral of this story: never shop at this Gamestop, ever.","[101, 1188, 2957, 9870, 1110, 170, 8155, 119, 165, 183, 165, 183, 2240, 3306, 170, 1207, 1342, 1121, 1142, 1282, 117, 1133, 1152, 1178, 1125, 1126, 1501, 118, 1692, 1342, 117, 1177, 1152, 1508, 170, 6166, 1200, 1113, 1122, 1106, 1587, 1191, 1103, 1692, 1108, 1533, 1137, 1136, 119, 146, 1879, 1115, 146, 1238, 112, 189, 1328, 1103, 1342, 1443, 1256, 2280, 1122, 119, 146, 1189, 1612, 1103, 6166, 1200, 1108, 9964, 119, 165, 183, 165, 183, 2240, 1301, 1171, 1106, 1862, 1103, 1342, 117, 1152, 1501, 1122, 1268, 1107, 1524, 1104, 1139, 1339, 117, 1105, 1152, 1587, 1143, 1115, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]"


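### Sampling the Data

To demonstrate small-scale training on BERT (with the PyTorch Trainer), use 6,500 training samples and 500 evaluation samples.

The `shuffle()` method randomly reorders the rows of the dataset. For more control over the shuffling algorithm, pass the `generator` argument to use a different `numpy.random.Generator`.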

In [9]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(6500))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(500))
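
The idea behind `shuffle(seed=...)` followed by `select(range(n))` can be sketched with the standard library alone: shuffle the row indices with a fixed seed, then keep the first `n`. This is only an illustration of the idea. It uses Python's `random` module rather than the NumPy generator used by Datasets, so the rows it picks will differ from the real ones. `shuffled_subset_indices` is a hypothetical helper name.

```python
import random
from typing import List


def shuffled_subset_indices(num_rows: int, subset_size: int, seed: int) -> List[int]:
    """Shuffle row indices with a fixed seed, then keep the first subset_size."""
    if subset_size > num_rows:
        raise ValueError("subset_size cannot exceed num_rows")
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the order is reproducible
    return indices[:subset_size]


first = shuffled_subset_indices(num_rows=100, subset_size=10, seed=42)
second = shuffled_subset_indices(num_rows=100, subset_size=10, seed=42)
print(first == second)
```

Fixing the seed is what makes the small training and evaluation subsets reproducible across runs.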

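## Fine-Tuning Configuration

### Loading the BERT Model

Loading the model prints a warning that some weights were newly initialized (the `classifier` layer). This is entirely normal when fine-tuning: the head used for the pretraining task is dropped and replaced with a new classification head, for which there are no pretrained weights. The library therefore warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.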

In [10]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

model.safetensors: 100%|██████████| 436M/436M [00:33<00:00, 12.9MB/s] 
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Training Hyperparameters (TrainingArguments)

Full list of parameters and their defaults: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**The most important setting: the path where model weights are saved (`output_dir`)**

In [11]:
from transformers import TrainingArguments

model_dir = "models/bert-base-cased-finetune-yelp"

# logging_steps defaults to 500; given our training data size and step count, set it to 100
training_args = TrainingArguments(output_dir=model_dir,
                                  per_device_train_batch_size=16,
                                  num_train_epochs=5,
                                  logging_steps=100)

In [12]:
# Full hyperparameter configuration
print(training_args)

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=

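### Evaluating Metrics During Training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** provides dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more), each loadable with a single line of code. **Full list of currently supported metrics: https://huggingface.co/evaluate-metric**

The Trainer does not automatically evaluate model performance during training, so we need to pass it a function that computes and reports metrics.

The Evaluate library provides a simple accuracy metric that you can load with `evaluate.load`: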

In [13]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

Downloading builder script: 100%|██████████| 


Next, call `compute` to calculate the accuracy of the predictions.

Before passing predictions to `compute`, convert the logits to predicted class indices (**all Transformers models return logits**).

In [14]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
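
To make the logits-to-prediction step concrete, the following pure-Python sketch reproduces the same two operations by hand: take the argmax of each row of logits, then compute the fraction of predictions that match the labels. The logits and labels are hypothetical toy values, and `argmax` and `accuracy` are illustrative helpers rather than part of Evaluate.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in one row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for 4 examples and 3 classes.
toy_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 3.0], [0.9, 0.8, 0.7]]
toy_labels = [1, 0, 2, 1]
print(accuracy(toy_logits, toy_labels))  # 3 of 4 correct -> 0.75
```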

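#### Monitoring Metrics During Training

To monitor how the evaluation metrics change during training, set the `evaluation_strategy` parameter in `TrainingArguments`. With `"epoch"`, metrics are reported at the end of each epoch.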

In [15]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir=model_dir,
                                  evaluation_strategy="epoch", 
                                  per_device_train_batch_size=16,
                                  num_train_epochs=3,
                                  logging_steps=30)

## Start Training

### Instantiating the Trainer

A `kernel version` warning may appear at this step; it does not currently affect running this example.

In [16]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

## Using nvidia-smi to Monitor GPU Usage

To view GPU usage in real time, poll with the `watch` command: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                                                   Wed Dec 20 14:37:41 2023

Wed Dec 20 14:37:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   64C    P0              69W /  70W |   6665MiB / 15360MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     18395      C   /root/miniconda3/bin/python                6660MiB |
+---------------------------------------------------------------------------------------+
```
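
If you prefer to read the same numbers from Python instead of watching the table, a minimal sketch is to call `nvidia-smi` with its CSV query options and parse the result. The helper names and the dictionary keys are assumptions made for this example. The sample line uses the utilization and memory figures from the table above.

```python
import subprocess
from typing import Dict

# Ask nvidia-smi for utilization and memory as plain CSV numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> Dict[str, int]:
    """Parse one CSV line such as '98, 6665, 15360' into named integer fields."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "memory_used_mib": used, "memory_total_mib": total}


def query_first_gpu() -> Dict[str, int]:
    """Run nvidia-smi once and parse the first GPU's line (needs an NVIDIA driver)."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_line(output.splitlines()[0])


sample = parse_gpu_line("98, 6665, 15360")
print(sample)
```

Only `parse_gpu_line` is exercised here; `query_first_gpu` is meant to be called on a machine where `nvidia-smi` is installed.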

In [17]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,1.0354,0.962203,0.564
2,0.6928,0.931731,0.572
3,0.4215,1.094285,0.612


TrainOutput(global_step=1221, training_loss=0.7741010347221056, metrics={'train_runtime': 586.8279, 'train_samples_per_second': 33.23, 'train_steps_per_second': 2.081, 'total_flos': 5130803778048000.0, 'train_loss': 0.7741010347221056, 'epoch': 3.0})
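
The `global_step=1221` reported above can be checked with simple arithmetic. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept, which matches `_n_gpu=1`, `gradient_accumulation_steps=1`, and `dataloader_drop_last=False` in the printed configuration. `total_optimizer_steps` is a hypothetical helper name.

```python
import math


def total_optimizer_steps(num_samples: int, batch_size: int, num_epochs: int) -> int:
    """Steps = ceil(samples / batch size) per epoch, times the number of epochs."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# 6,500 samples, batch size 16, 3 epochs: 407 steps per epoch, 1221 in total.
print(total_optimizer_steps(6500, 16, 3))
```

The same formula gives 13,542 steps for the full-dataset run in the homework section (650,000 samples, batch size 48, 1 epoch), which agrees with the `global_step` reported there.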

In [18]:
small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(1000))

In [19]:
trainer.evaluate(small_test_dataset)

{'eval_loss': 1.1566851139068604,
 'eval_accuracy': 0.593,
 'eval_runtime': 10.4592,
 'eval_samples_per_second': 95.61,
 'eval_steps_per_second': 11.951,
 'epoch': 3.0}

### Saving the Model and Training State

- Use `trainer.save_model` to save the model; it can later be reloaded with `from_pretrained()`
- Use `trainer.save_state` to save the training state
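
As a sketch of the reload step, the helper below loads the saved classifier back with `from_pretrained()`. It assumes the model was saved to the `model_dir` path used in this notebook. `load_finetuned` is a hypothetical helper name. Because the Trainer here was not given a tokenizer, the tokenizer is loaded from the original `bert-base-cased` checkpoint rather than from the output directory.

```python
def load_finetuned(model_dir: str = "models/bert-base-cased-finetune-yelp"):
    """Reload the fine-tuned classifier and the tokenizer of the base checkpoint."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    # The tokenizer was not saved alongside the model in this notebook,
    # so it comes from the checkpoint the model was fine-tuned from.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    return model, tokenizer


# Usage (after trainer.save_model(model_dir) has run):
# model, tokenizer = load_finetuned()
```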

In [20]:
trainer.save_model(model_dir)

In [21]:
trainer.save_state()

In [None]:
# trainer.model.save_pretrained("./")

## Homework: Train on the Full YelpReviewFull Dataset and See How High the Accuracy Can Go

In [22]:
# Train on the full dataset
train_dataset = tokenized_datasets["train"].shuffle(seed=42)
eval_dataset = tokenized_datasets["test"].shuffle(seed=42)

In [23]:
training_args = TrainingArguments(output_dir=model_dir,
                                  evaluation_strategy="steps", 
                                  per_device_train_batch_size=48,
                                  num_train_epochs=1,
                                  logging_steps=1000,
                                  save_strategy="no")

In [24]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

In [25]:
trainer.train()

Step,Training Loss,Validation Loss,Accuracy
1000,0.8835,0.825288,0.63172
2000,0.8159,0.782463,0.6582
3000,0.7885,0.775746,0.65878
4000,0.7671,0.76206,0.66452
5000,0.758,0.743907,0.67036
6000,0.7524,0.741269,0.67304
7000,0.737,0.724167,0.67962
8000,0.7295,0.716843,0.68332
9000,0.7215,0.729863,0.67796
10000,0.7082,0.708954,0.68738


TrainOutput(global_step=13542, training_loss=0.7488633543010864, metrics={'train_runtime': 24862.3949, 'train_samples_per_second': 26.144, 'train_steps_per_second': 0.545, 'total_flos': 1.710267926016e+17, 'train_loss': 0.7488633543010864, 'epoch': 1.0})

In [26]:
test_dataset = tokenized_datasets["test"].shuffle(seed=64)

In [27]:
trainer.evaluate(test_dataset)

{'eval_loss': 0.6902426481246948,
 'eval_accuracy': 0.69404,
 'eval_runtime': 525.1144,
 'eval_samples_per_second': 95.217,
 'eval_steps_per_second': 11.902,
 'epoch': 1.0}

In [28]:
trainer.save_model(model_dir)

In [29]:
trainer.save_state()