# Getting Started with Fine-Tuning in Hugging Face Transformers

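This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- A brief introduction to the Trainer
- Running the training
- Saving the model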

## Download the dataset

In [18]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")

In [19]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

## Inspect the data

In [20]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
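
The rejection loop in `show_random_elements` draws distinct row indices one at a time. As a minimal sketch, the same selection can be made with the standard library's `random.sample`. The helper name `pick_distinct_indices` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random

def pick_distinct_indices(dataset_len, num_examples=10, seed=None):
    """Return num_examples distinct row indices in [0, dataset_len)."""
    if num_examples > dataset_len:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    return rng.sample(range(dataset_len), num_examples)

# 650000 is the size of the train split shown earlier
picks = pick_distinct_indices(650000, num_examples=10, seed=0)
print(picks)
```

The resulting list can be passed to `dataset[picks]` in the same way as the list built by the loop.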

In [21]:
show_random_elements(dataset['train'])

Unnamed: 0,label,text
0,3 stars,"If you go to Cave Creek you must go to the Buffalo Chip, Harold's , hideaway etc. just great local restaurants in a great town north of Phoenix/Scottsdale. I like the Chip more for drinking dancing and rodeo than the food. But it is okay for a little grub But it is fun. So try it"
1,3 stars,"Chuy's was pretty good. Seems like all of their locations are pretty similar. I think they'd be best known for their cheap margaritas... $2 pints, $4 small pitchers. They aren't too strong and come from a mediocre mix, but if you're looking for something sweet and cheap its a good bet.\n\nFree serve-yourself chips and salsa are pretty good. \n\nAs for food... pretty decent prices and okay to good food. I wouldn't say its authentic Mexican, actually I think the menu is a bit confused. But the food is good overall. It's a place we'll go to every month or so."
2,3 stars,"I ended up eating at Taggia while staying at Firesky so it was a choice of convenience. I've had the food from here several times using room service and it's never anything to complain about. It was the same story the day I had lunch here. I had an organic greens salad and shared the margherita and goat cheese pizzas with my fellow lunchers. All of the food was good - the goat cheese pizza in particular with its thin, crispy crust.\n\nUnfortunately the day we ate here our service was MIA. We were told we could seat ourselves so we did. After about 10 minutes someone came by to take our drink order and maybe 10 minutes later our waters arrived. Well 2 out of 3 of them did anyway. Then we ordered two salads and two pizzas to share. One pizza came first. WTH? Where were the salads? Or the other pizza? The salads showed up a few minutes later and then our server realized that she had forgotten our second pizza. No biggie since we had salads and one pizza to eat. But the service was lackluster with a L. Like Andrea R says, I wouldn't go out of my way to eat here, but when in the area it's a good option to have."
3,2 star,"I recently had a work luncheon at Ricardo's, I had been before years ago and it was extremely unmemorable. This visit would be more memorable but for the wrong reasons. \n\nWhen given the choice, I prefer to order off the menu than choose a buffet. But the whole group went to the buffet and I didn't want to be the oddball. I had two carne asada tacos, cheese enchilada and chips & salsa. The enchilada was bland the only hint of flavor was the acidity from the tomatoes. The salsa, too, was bland and watery. The chips were pretty generic. The first taco was ok, a bit bland, but tender. The second was filled with grizzly meat. It really turned my stomach. Fortunately, the service was friendly and they were able to accomodate our large group."
4,4 stars,"We had a great time at this resort over the long weekend. The staff was super friendly, especially Adam, David and Cassie. Great job!!! And our suite was perfect to accommodate three women with lots of bags, make-up and shoes. The Hole in the Wall restaurant had a really good breakfast, friendly staff and an outdoors patio. Not so for the Rico Restaurant. They were a bit rude, overwhelmed and obviously didn't want our business. We also floated down the Lazy River, it was definitely Lazy...pretty slow but perfect temp. All in all, I'll be back."
5,1 star,"Im an owner with no kids, this place is not for my husband and I.. The element here is all about families and cooking in and playing in the pool from the moment it opens.\n\nThe restaurant bar is a bit of a joke, and the pressure to buy more points makes a relaxing vacation more stressful. We were an original owner and saw most of it built.\n\nWe noticed that they no longer offer a shuttle which is a mistake for those that want to go to the strip and not have to worry about driving. But after this weekend I see that they don't need to offer the shuttle because more than half the people there don't plan on leaving the facility at all.\n\nThe guests that we ran into all seemed to be there on a free vacation offer. They were tatted up and pushing baby strollers... and screaming to each other WAIT TIL YOU SEE THE POOL....\n\nMy hubby and myself both looked at each other and said OUT LOUD, we don't think we will be coming back here again ever to this location.\n\nWe came home and looked into selling it all together, but then thought maybe we would try another location that Diamond Resorts has to offer before we do so..\n\nSo bottom line, if you have kids and love the pool and slides and pool some more.. this is for you.. If your looking for a weekend with the hubby or friends in Vegas to relax and to enjoy what Vegas is all about... this resort is not for you.."
6,3 stars,"Booked a room here through Priceline for the Tuesday before Thanksgiving. Actually booked it on the drive in from Las Vegas through my cell phone, which was pretty sweet. Paid $25 + tax, so you can't beat the price. We had a hard time finding it as Google Maps was wrong about it's location, but you can't blame the hotel for that.\n\nWhat I can blame the hotel for is not giving me a king size bed. Priceline had booked me with 2 doubles, and in my experience I am always able to switch unless they're sold out. The front desk clerk told me they were indeed not sold out, but it was their policy not to let Priceline users switch rooms. So much for considering staying there the rest of Thanksgiving week.\n\nI was going to let it go, but then at 9:30am the maid knocked on our door and woke me up 2 hours earlier than I had planned (we got to sleep at 5am, give me a break). I ended up coming out of my room and saw that my do not disturb sign was on, so she must have chosen to ruin my day for fun. Tried to fall back asleep but was then kept up by the sounds of what looked to be a loud garbage truck parked right outside of our room.\n\nI give up on sleeping. At least they have solid free Internet so I can Yelp this hotel. Courtyard, you're lucky you're getting 3 stars from me."
7,2 star,"Been to 4 Cirques and this is the least favorite. Sets and costumes are absolutely amazing but the acts were very unimpressive compared to the older ones we've seen. \nThe only exception was the opening act with the two twin men on ropes that swung out into the audience. They weren't in the book at the shop so I think they added them in later to spice up the program. Breathtaking!\n\nPros- Art direction, sets, lights\n\nCons- Acts seen before and ANNOYING clowns"
8,2 star,"KOOLAID KID reminded me of home...nice touch..i know I know..this is suppose to be about the chicken & waffles, but I must say quenching my thirst is very important to me..so back to the food..it was just that chicken(no flavor) & waffles(nothing special)..mac & cheese was very nice...and the new building was very very nice..okay that's all"
9,1 star,Just called this location and I live 1.8 miles away. I asked them to deliver and they informed me that they would not deliver to my house because it was a couple hundred yards out of the map plan. They asked me to call the power and southern store. This store advised me that they could not deliver because jimmy johns has a two mile radius they can deliver to. Called this store back and they once again decided to tell me even though I was in the two mile radius they did not want to deliver to me and my only option was for pickup. I will never eat at this location. I know the owners at Firehouse Subs and they go out of the way and this location is just lazy. Not getting my money jimmy johns no matter how fast you are. Laziness is worse


## Preprocess the data: tokenize and pad to a uniform length

In [22]:
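from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    # Pad every review to the model's maximum length and truncate longer ones
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# batched=True hands the tokenizer many rows per call instead of one
tokenized_dataset = dataset.map(tokenize_function, batched=True)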


In [None]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 50000
    })
})

In [None]:
show_random_elements(tokenized_dataset['train'],1)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,2 star,"Food is passable when you can get it, but items shown online are frequently\nunavailable. The employees don't speak enough English to say if or when they will ever be available and do not hide their annoyance that you asked. The used plate gatherers are aggressive and won't leave diners alone. One gorda laughed rudely when we asked her to leave us alone","[101, 6702, 1110, 2789, 1895, 1165, 1128, 1169, 1243, 1122, 117, 1133, 4454, 2602, 3294, 1132, 3933, 165, 22108, 15677, 8009, 2165, 119, 1109, 4570, 1274, 112, 189, 2936, 1536, 1483, 1106, 1474, 1191, 1137, 1165, 1152, 1209, 1518, 1129, 1907, 1105, 1202, 1136, 4750, 1147, 19236, 1115, 1128, 1455, 119, 1109, 1215, 4885, 8422, 1468, 1132, 9233, 1105, 1281, 112, 189, 1817, 20162, 1116, 2041, 119, 1448, 1301, 18484, 3348, 14708, 1193, 1165, 1195, 1455, 1123, 1106, 1817, 1366, 2041, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]"
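
In the row above, `input_ids` ends in a run of zeros and `attention_mask` switches from 1 to 0 at the same position. That is the effect of `padding="max_length"`. The toy function below illustrates the pad-or-truncate step on a plain list of IDs. It is only a sketch: `pad_or_truncate`, the `max_length` of 8, and the sample IDs are illustrative assumptions, and a real BERT tokenizer also keeps the closing `[SEP]` token when it truncates, which this sketch does not model.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut token_ids to max_length, then right-pad with pad_id.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding.
    """
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask

# A short sequence is padded; a long one is cut to max_length
short_ids, short_mask = pad_or_truncate([101, 6702, 1110, 102], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(1, 13)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```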


## Sample a small subset

In [None]:
small_train_dataset=tokenized_dataset['train'].shuffle(seed=42).select(range(1000))
small_eval_dataset=tokenized_dataset['test'].shuffle(seed=42).select(range(1000))
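
`shuffle(seed=42).select(range(1000))` amounts to taking a reproducible random subset of 1,000 rows. The sketch below shows the idea on plain indices. It uses Python's `random` module rather than the generator that `datasets` uses internally, so the rows actually chosen will differ; `sample_subset` is a hypothetical helper.

```python
import random

def sample_subset(num_rows, subset_size, seed):
    """Shuffle row indices with a fixed seed and keep the first subset_size."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same order
    return indices[:subset_size]

# 50000 is the size of the test split shown earlier
first = sample_subset(50000, 1000, seed=42)
second = sample_subset(50000, 1000, seed=42)
print(first[:5], first == second)
```

Fixing the seed is what makes the small training and evaluation sets identical from one run to the next.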


## Load the model

In [None]:
from transformers import AutoModelForSequenceClassification

model=AutoModelForSequenceClassification.from_pretrained('bert-base-cased',num_labels=5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
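
Because `classifier.weight` and `classifier.bias` are freshly initialized, the model starts out close to guessing. A useful reference point, sketched below, is the cross-entropy of a uniform prediction over the 5 classes, ln 5 ≈ 1.609. The first losses logged during training in this notebook (about 1.66) sit near that value. `uniform_cross_entropy` is a hypothetical helper for illustration.

```python
import math

def uniform_cross_entropy(num_labels):
    """Cross-entropy when every class gets probability 1/num_labels."""
    prob = 1.0 / num_labels
    return -math.log(prob)

baseline = uniform_cross_entropy(5)
print(round(baseline, 4))  # 1.6094
```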


## Configure hyperparameters

In [None]:
from transformers import TrainingArguments

model_dir=r"E:\model\language\fine-tuning\bert-base-cased-by-yelp"

training_arg=TrainingArguments(
    output_dir=model_dir,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=100,
)
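
With these arguments, the number of optimizer steps follows from the dataset size and the batch size. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept (`dataloader_drop_last=False`). `total_training_steps` is a hypothetical helper, not a Transformers function.

```python
import math

def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps for a run that keeps the last partial batch."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * num_epochs

# 1,000 training rows, per_device_train_batch_size=4, num_train_epochs=1
steps = total_training_steps(1000, batch_size=4, num_epochs=1)
print(steps)  # 250
```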

In [None]:
print(training_arg)

TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_la

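### Evaluating metrics during training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** provides dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more), each available with a single line of code. **Full list of supported metrics: https://huggingface.co/evaluate-metric**

The Trainer does not automatically evaluate model performance during training, so we need to pass it a function that computes and reports metrics.

The Evaluate library provides a simple accuracy metric, which you can load with the `evaluate.load` function: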

In [None]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

In [None]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
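
To make the logic of `compute_metrics` concrete, here is a plain-Python restatement of the argmax-then-compare step on made-up logits. The values are invented for illustration, and `accuracy_from_logits` is a hypothetical helper; the notebook itself relies on `np.argmax` and the `accuracy` metric from Evaluate.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

toy_logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],  # highest score at class 1
    [1.5, 0.2, 0.1, 0.0, 0.0],   # highest score at class 0
    [0.0, 0.0, 0.1, 0.2, 3.0],   # highest score at class 4
    [0.3, 0.1, 0.9, 0.2, 0.0],   # highest score at class 2
]
toy_labels = [1, 0, 3, 2]  # the third row is misclassified

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```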

In [None]:
from transformers import TrainingArguments, Trainer

training_arg = TrainingArguments(output_dir=model_dir,
                                  evaluation_strategy="epoch", 
                                  per_device_train_batch_size=2,
                                  num_train_epochs=2,
                                  logging_steps=30)

## Train

In [None]:
trainer=Trainer(
    model=model,
    args=training_arg,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics
)

In [None]:
trainer.train()

  0%|          | 0/334 [00:00<?, ?it/s]

{'loss': 1.666, 'grad_norm': 9.737825393676758, 'learning_rate': 4.550898203592814e-05, 'epoch': 0.09}
{'loss': 1.6623, 'grad_norm': 13.326910972595215, 'learning_rate': 4.101796407185629e-05, 'epoch': 0.18}
{'loss': 1.6795, 'grad_norm': 11.941048622131348, 'learning_rate': 3.652694610778443e-05, 'epoch': 0.27}
{'loss': 1.6321, 'grad_norm': 8.127102851867676, 'learning_rate': 3.2035928143712576e-05, 'epoch': 0.36}
{'loss': 1.5899, 'grad_norm': 12.411124229431152, 'learning_rate': 2.754491017964072e-05, 'epoch': 0.45}
{'loss': 1.5403, 'grad_norm': 12.295513153076172, 'learning_rate': 2.3053892215568866e-05, 'epoch': 0.54}
{'loss': 1.3488, 'grad_norm': 18.60458755493164, 'learning_rate': 1.8562874251497005e-05, 'epoch': 0.63}
{'loss': 1.4204, 'grad_norm': 16.866500854492188, 'learning_rate': 1.407185628742515e-05, 'epoch': 0.72}
{'loss': 1.2988, 'grad_norm': 17.695011138916016, 'learning_rate': 9.580838323353295e-06, 'epoch': 0.81}
{'loss': 1.1339, 'grad_norm': 6.416609287261963, 'learni

  0%|          | 0/125 [00:00<?, ?it/s]

NameError: name 'predictions' is not defined
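
The `learning_rate` values in the log above are consistent with a linear decay to zero over the 334 steps shown in the progress bar. The sketch below reproduces the first two logged values under that assumption: an initial rate of 5e-5 and no warmup are assumed here, since the printed arguments are cut off before those fields. `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, total_steps, base_lr=5e-5):
    """Learning rate after `step` updates under linear decay to zero."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

# The log reports one entry every 30 steps
for step in (30, 60):
    print(step, linear_lr(step, total_steps=334))
```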

## Test

In [None]:
small_test_dataset = tokenized_dataset["test"].shuffle(seed=64).select(range(100))

In [None]:
trainer.evaluate(small_test_dataset)

In [None]:
trainer.save_model(model_dir)

In [None]:
trainer.save_state()