# 1. Pip install packages

In [1]:
# Transformers installation
# In huggingface: ! pip install transformers datasets
! pip install --upgrade transformers
! pip install --upgrade datasets
! pip install --upgrade accelerate
#
# evaluate: on Leap with Python 3.6, pip fails with "ERROR: No matching distribution found for evaluate",
#   because evaluate requires Python>=3.7 and huggingface>=0.7.0. See PyPI.
# Worked around it by downloading evaluate-0.4.1.zip to ~/Downloads and unzipping it,
#   editing setup.py in the unzipped directory (relaxed python_requires, originally >=3.7, and set HF to 0.4.0),
#   then running pip install evaluate-0.4.1/ from the ~/Downloads directory.
#   That installed evaluate into site-packages in the venv, but I still needed to edit
#   site-packages/evaluate/__init__.py and remove the line: from .hub import push_to_hub
#   After that, I restarted the kernel and 'evaluate' started to work.
! pip install --upgrade evaluate
! pip install --upgrade scikit-learn

# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git
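
The install notes above say that `evaluate` needs Python 3.7 or newer. As a minimal sketch, a guard like the following could be run before attempting the install. The helper name `can_install_evaluate` is hypothetical, and the 3.7 threshold is taken from the comment in the cell above rather than checked against any particular `evaluate` release.

```python
import sys

# Minimum Python version for `evaluate`, as stated in the install notes above.
MIN_PYTHON_FOR_EVALUATE = (3, 7)


def can_install_evaluate(version_info=None):
    """Return True if the given (or the running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON_FOR_EVALUATE


if not can_install_evaluate():
    # Only reached on old interpreters such as the Python 3.6 case described above.
    print("Python %d.%d is too old for a stock `pip install evaluate`." % sys.version_info[:2])
```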



# 2. Fine-tune a pretrained model on our dataset

There are significant benefits to using a pretrained model. It reduces computation costs and your carbon footprint, and it lets you use state-of-the-art models without having to train one from scratch. 🤗 Transformers provides access to thousands of pretrained models for a wide range of tasks. When you use a pretrained model, you train it on a dataset specific to your task. This is known as fine-tuning, an incredibly powerful training technique. In this tutorial, you will fine-tune a pretrained model with a deep learning framework of your choice:

* Fine-tune a pretrained model with 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer).
* Fine-tune a pretrained model in TensorFlow with Keras.
* Fine-tune a pretrained model in native PyTorch.

<a id='data-processing'></a>

## 2.1. Prepare our dataset: load the Yelp data with datasets.load_dataset, create a transformers.AutoTokenizer from the pretrained BERT model, create tokenized_datasets (slow), and from it create small tokenized datasets for training and test

In [2]:
#@title
from IPython.display import HTML

# HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/_BZearw7f0w?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

Before you can fine-tune a pretrained model, download a dataset and prepare it for training. The previous tutorial showed you how to process data for training, and now you get an opportunity to put those skills to the test!

Begin by loading the [Yelp Reviews](https://huggingface.co/datasets/yelp_review_full) dataset:

In [3]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")
# Show one item from the training split
dataset["train"][100]


  from .autonotebook import tqdm as notebook_tqdm
Reusing dataset yelp_review_full (/home/mzimmermann/.cache/huggingface/datasets/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf)
100%|██████████| 2/2 [00:00<00:00, 181.47it/s]


{'label': 0,
 'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. 

As you now know, you need a tokenizer to process the text and include a padding and truncation strategy to handle any variable sequence lengths. To process your dataset in one step, use the 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/process.html#map) method to apply a preprocessing function over the entire dataset. With the tiny model, we get a message similar to:

"Could not locate the tokenizer configuration file, will try to use the model config instead."



In [4]:
from transformers import AutoTokenizer

# Factory instantiate AutoTokenizer from pretrained model - tokenizer must be defined in the model (?).
# Note on naming on huggingface: 
#   - https://huggingface.co/bert-base-cased
#     redirects to https://huggingface.co/google-bert/bert-base-cased;
#     in code parameters it is just called "bert-base-cased".
#     But the same is not true for the TINY "bert_uncased_L-2_H-128_A-2":
#     it exists as https://huggingface.co/google/bert_uncased_L-2_H-128_A-2,
#     but there is no redirect, so the "google/" prefix is needed in code.
tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2") # "bert-base-cased"

# Function uses tokenizer from pretrained model.
# If the pretrained model has no predefined max_length, we MUST set it. Here we use max_length=128.
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)

# Timing:
#   - local/laptop/CPU
#     - with max_length=128 : 1m:56s
tokenized_datasets = dataset.map(tokenize_function, batched=True)


100%|██████████| 650/650 [01:56<00:00,  5.58ba/s]
Loading cached processed dataset at /home/mzimmermann/.cache/huggingface/datasets/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-68322ed0dbc89025.arrow
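
To make the effect of `padding='max_length', truncation=True, max_length=128` more concrete, here is a minimal pure-Python sketch of what those settings do to one sequence of token ids. It is an illustration only, not the tokenizer's implementation: the helper `pad_or_truncate` is hypothetical, the pad id of 0 is an assumption, the token ids are made up, and special tokens are ignored (the real tokenizer keeps the closing special token when it truncates).

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    """Force a list of token ids to exactly max_length entries.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    kept = token_ids[:max_length]            # truncation: drop anything past max_length
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad      # padding to max_length: fill up on the right
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


# A short sequence is padded, a long one is cut; both end up with the same length.
short_ids, short_mask = pad_or_truncate([101, 2023, 2003, 102], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(1, 201)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every row ends up with the same length, the rows can later be stacked into fixed-size batches without any further padding logic.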


Create a smaller subset of the full dataset to fine-tune on to reduce the time it takes:

In [5]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
print(small_eval_dataset)
print(small_eval_dataset.data[0][1:3]) # label column (values 0-4)
print(small_eval_dataset.data[1][1:3]) # text column (review text)
# print(small_eval_dataset.data[2][1:3])
# print(small_eval_dataset.data[3][1:3])
# print(small_eval_dataset.data[4][1:3])


Loading cached shuffled indices for dataset at /home/mzimmermann/.cache/huggingface/datasets/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-c45ce1b890a8dcca.arrow
Loading cached shuffled indices for dataset at /home/mzimmermann/.cache/huggingface/datasets/yelp_review_full/yelp_review_full/1.0.0/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf/cache-971705894e3ff669.arrow


Dataset({
    features: ['label', 'text', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 1000
})
[
  [
    0,
    0
  ]
]
[
  [
    "Don't waste your time.  We had two different people come to our house to give us estimates for a deck (one of them the OWNER).  Both times, we never heard from them.  Not a call, not the estimate, nothing.",
    "All I can say is the worst! We were the only 2 people in the place for lunch, the place was freezing and loaded with kids toys! 2 bicycles, a scooter, and an electronic keyboard graced the dining room. A fish tank with filthy, slimy fingerprints smeared all over it is there for your enjoyment.\n\nOur food came... no water to drink, no tea, medium temperature food. Of course its cold, just like the room, I never took my jacket off! The plates are too small, you food spills over onto some semi-clean tables as you sit in your completely worn out booth seat. The fried noodles were out of a box and nasty, the shrimp was mushy, the fri

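The `shuffle(seed=42).select(range(1000))` pattern used above picks a reproducible random subset: shuffle the row order with a fixed seed, then keep the first 1000 rows. The sketch below shows the same idea on plain row indices. It is only an analogy: the helper `shuffled_subset_indices` and the row count of 10,000 are hypothetical, and 🤗 Datasets uses its own random generator, so the actual indices it selects will differ.

```python
import random


def shuffled_subset_indices(num_rows, subset_size, seed=42):
    """Return subset_size row indices taken from a seeded shuffle of range(num_rows)."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)   # private RNG, so the global random state is untouched
    return indices[:subset_size]


first = shuffled_subset_indices(10_000, 1000)
second = shuffled_subset_indices(10_000, 1000)
print(first == second)     # same seed gives the same subset on every run
print(len(set(first)))     # every selected row is distinct
```

Fixing the seed is what makes timing and accuracy numbers comparable between runs, since each run trains and evaluates on the same 1000 rows.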
<a id='trainer'></a>

## 2.2. Train (fine-tune) model using PyTorch on our dataset


In [6]:
#@title
from IPython.display import HTML

# HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/nvBXf7s7vTI?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

### 2.2.1. Factory instantiate AutoModelForSequenceClassification model from the pretrained BERT model; during instantiation, set the number of expected labels (classes).



Hugging Face Transformers provides a [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) class optimized for training transformer models (no need to manually write our own training loop). The [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) API supports a wide range of training options and features such as logging, gradient accumulation, and mixed precision.

Start by loading your model and specify the number of expected labels (the `num_labels` argument). From the Yelp Review [dataset card](https://huggingface.co/datasets/yelp_review_full#data-fields), you know there are five labels:

In [7]:
from transformers import AutoModelForSequenceClassification

# Factory instantiate AutoModelForSequenceClassification model from the pretrained BERT checkpoint;
# num_labels=5 is the number of classes (label values 0-4), not the number of dataset columns.
model = AutoModelForSequenceClassification.from_pretrained("google/bert_uncased_L-2_H-128_A-2", num_labels=5) # "bert-base-cased"

Some weights of the model checkpoint at google/bert_uncased_L-2_H-128_A-2 were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification w

<Tip>

The warnings above are normal. Some pretrained weights (the `cls.predictions.*` and `cls.seq_relationship.*` pretraining head) are not used, and others (['classifier.bias', 'classifier.weight']) are newly initialized instead of being loaded from the checkpoint. The pretrained head of the BERT model is discarded and replaced with a randomly initialized classification head. You will fine-tune this new model head on your sequence classification task, transferring the knowledge of the pretrained model to it.

</Tip>
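
As a rough sanity check on how small the new head is, the sketch below counts the parameters behind `classifier.weight` and `classifier.bias`. It assumes the head is a single linear layer from the hidden size to `num_labels`, and it reads the hidden size (128) off the `H-128` part of the checkpoint name. The helper `classifier_head_params` is hypothetical.

```python
def classifier_head_params(hidden_size, num_labels):
    """Parameter count of one linear layer mapping hidden_size inputs to num_labels outputs."""
    weight = hidden_size * num_labels   # one weight per (hidden unit, label) pair
    bias = num_labels                   # one bias per label
    return weight + bias


# hidden_size=128 is taken from "H-128" in the checkpoint name; num_labels=5 from the cell above.
print(classifier_head_params(hidden_size=128, num_labels=5))  # 645
```

Under that assumption, only a few hundred parameters start from random values, while the rest of the network starts from the pretrained checkpoint.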

### 2.2.2. Evaluate: Use the evaluate package to create 'metric', define compute_metrics (accuracy), and create training_args for monitoring

[Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) does not automatically evaluate model performance during training. You'll need to pass [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) a function to compute and report metrics. The [🤗 Evaluate](https://huggingface.co/docs/evaluate/index) library provides a simple [`accuracy`](https://huggingface.co/spaces/evaluate-metric/accuracy) metric that you can load with the [evaluate.load](https://huggingface.co/docs/evaluate/main/en/package_reference/loading_methods#evaluate.load) function (see this [quicktour](https://huggingface.co/docs/evaluate/a_quick_tour) for more information):

In [8]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

Call `compute` on `metric` to calculate the accuracy of your predictions. Before passing your predictions to `compute`, you need to convert the logits to predictions (remember all 🤗 Transformers models return logits):

In [9]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
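
To see what `compute_metrics` does without loading a model, the following pure-Python sketch reproduces the two steps on a toy batch: take the index of the largest logit in each row, then compare with the reference labels. The helpers `argmax` and `accuracy` and the toy numbers are made up for illustration; the notebook itself uses `np.argmax` and the `evaluate` accuracy metric.

```python
def argmax(row):
    """Index of the largest value in a list (the first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Return (predictions, fraction of rows whose top-scoring class equals the label)."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return predictions, correct / len(labels)


# Four examples, five classes each (class ids 0-4, as in the Yelp labels).
toy_logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],
    [1.5, 0.2, 0.1, 0.0, 0.3],
    [0.0, 0.1, 0.2, 0.3, 3.0],
    [0.9, 0.8, 0.1, 0.0, 0.2],
]
toy_labels = [1, 0, 3, 0]

predictions, acc = accuracy(toy_logits, toy_labels)
print(predictions, acc)  # [1, 0, 4, 0] 0.75
```

Logits do not need to be turned into probabilities first, because softmax preserves the ordering of the scores and therefore the argmax.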

To monitor the evaluation metrics during fine-tuning, the `evaluation_strategy` parameter must be added to the training arguments. Below, we ask it to report at the end of each epoch:

In [10]:
from transformers import TrainingArguments, Trainer

# The training output will go to the "test_trainer" directory. Metrics will be evaluated after each epoch.
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

### 2.2.3. Trainer: Create trainer instance from model, training_args, train_dataset, eval_dataset, and compute_metrics, then call trainer.train()

Create a [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) object with your model, training arguments, training and test datasets, and evaluation function. 

TODO: there is a way to save checkpoints (intermediate training state) during training; this is set in the training arguments, with something like `save_strategy="epoch"` or `save_strategy="steps"`. See [purpose of save pretrained](https://discuss.huggingface.co/t/what-is-the-purpose-of-save-pretrained/9167). Add this.
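
One possible shape for the checkpointing setup mentioned in the note above, written as a plain dictionary of keyword arguments so the snippet runs without the library installed. `save_strategy` and `save_total_limit` are `TrainingArguments` parameters; the specific values are assumptions chosen for illustration, and newer Transformers releases name the evaluation argument `eval_strategy` instead of `evaluation_strategy`.

```python
# Keyword arguments intended for transformers.TrainingArguments.
checkpoint_kwargs = {
    "output_dir": "test_trainer",      # same directory as used in this notebook
    "evaluation_strategy": "epoch",    # as in the cell above (eval_strategy in newer releases)
    "save_strategy": "epoch",          # write a checkpoint at the end of every epoch
    "save_total_limit": 2,             # assumption: keep only the two most recent checkpoints
}

# With transformers installed, this would be used as:
#   training_args = TrainingArguments(**checkpoint_kwargs)
for name, value in sorted(checkpoint_kwargs.items()):
    print(f"{name}={value!r}")
```

Checkpoints written this way would land in subdirectories of `output_dir`, separate from the final model saved later with `trainer.save_model`.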

In [11]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

import torch
print(torch.cuda.is_available())

False


Then fine-tune your model by calling [train()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train):

In [12]:
# Timing:
#   - local/server/cuda=false(CPU): 
#   - local/laptop/cuda=false(CPU)/max_length=128, epoch=3 : 1m04s, using 1.3G of memory @50%CPU Utilization.

trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 1000
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 375


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 1.571663        | 0.292    |
| 2     | No log        | 1.546337        | 0.335    |
| 3     | No log        | 1.535603        | 0.341    |


The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8


Training co

TrainOutput(global_step=375, training_loss=1.5644772135416667, metrics={'train_runtime': 68.0035, 'train_samples_per_second': 44.115, 'train_steps_per_second': 5.514, 'total_flos': 953756928000.0, 'train_loss': 1.5644772135416667, 'epoch': 3.0})
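
The numbers in the training log can be cross-checked with a little arithmetic. The sketch below recomputes the step count and throughput from the logged values. The logging interval of 500 steps is an assumption about the Trainer default, which would explain the "No log" entries in the Training Loss column. The chance-level loss `ln(5)` is the cross-entropy of a uniform guess over five classes and is included only as a reference point.

```python
import math

num_examples = 1000       # "Num examples" in the log
batch_size = 8            # "Instantaneous batch size per device"
num_epochs = 3            # "Num Epochs"
train_runtime = 68.0035   # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 125 375, matching "Total optimization steps"

# Throughput; compare with train_steps_per_second and train_samples_per_second in TrainOutput.
steps_per_second = total_steps / train_runtime
samples_per_second = num_examples * num_epochs / train_runtime
print(round(steps_per_second, 3), round(samples_per_second, 3))

# Assumption: training loss is logged every 500 steps by default, which is never reached here.
assumed_logging_steps = 500
print(total_steps < assumed_logging_steps)

# Reference point: loss and accuracy of a uniform random guess over 5 classes.
chance_loss = math.log(5)
chance_accuracy = 1 / 5
print(round(chance_loss, 3), chance_accuracy)
```

Against that reference, the final validation loss of 1.535603 and accuracy of 0.341 show the tiny model has learned something from 1000 examples, though it is still not far from chance.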

### 2.2.4. Save my fine-tuned model

There are save methods on both the model (`save_pretrained`) and the Trainer (`save_model`).



In [13]:
# Save my re-trained model

trainer.save_model("HUGE-NO-BACKUP/model:tiny-bert_uncased_L-2_H-128_A-2-max_length-128-finetuned-on-yelp-1000.local-laptop-cpu")

Saving model checkpoint to HUGE-NO-BACKUP/model:tiny-bert_uncased_L-2_H-128_A-2-max_length-128-finetuned-on-yelp-1000.local-laptop-cpu
Configuration saved in HUGE-NO-BACKUP/model:tiny-bert_uncased_L-2_H-128_A-2-max_length-128-finetuned-on-yelp-1000.local-laptop-cpu/config.json
Model weights saved in HUGE-NO-BACKUP/model:tiny-bert_uncased_L-2_H-128_A-2-max_length-128-finetuned-on-yelp-1000.local-laptop-cpu/pytorch_model.bin


## Additional resources

For more fine-tuning examples, refer to:

- [🤗 Transformers Examples](https://github.com/huggingface/transformers/tree/main/examples) includes scripts
  to train common NLP tasks in PyTorch and TensorFlow.

- [🤗 Transformers Notebooks](https://huggingface.co/docs/transformers/main/en/notebooks) contains various notebooks on how to fine-tune a model for specific tasks in PyTorch and TensorFlow.