Notebook prepared by Henrique Lopes Cardoso (hlc@fe.up.pt).

# TRANSFORMERS

In this notebook we will explore [Hugging Face Transformers](https://huggingface.co/docs/transformers/index).
You may also want to check the [Hugging Face course](https://huggingface.co/course/), which explains how to use this technology in much greater depth.

Training transformer models from scratch is computationally expensive. Hugging Face makes several pretrained [models](https://huggingface.co/models) available, which can be used as is or fine-tuned for a specific NLP task, such as sentence classification. That's what we'll do in this notebook.

Hugging Face also provides many [datasets](https://huggingface.co/datasets) that can be used to train or fine-tune a model.

## Loading a dataset

In this notebook, we'll start by using a local dataset (instead of one hosted on the Hugging Face Hub).
Let's load data for our classification task.

In [1]:
import csv
import pandas as pd

# Importing the dataset: tab-separated, with quote characters treated as plain text
dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter='\t', quoting=csv.QUOTE_NONE)  # QUOTE_NONE == 3

# the Trainer's data collators expect the target column to be named 'label' (or 'labels')
dataset.rename(columns={'Liked': 'label'}, inplace=True)

dataset.head()

|   | Review | label |
|---|--------|-------|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |


For ease of use with Transformer models, we convert the dataset into a Hugging Face `Dataset` and split it into train, validation and test sets.

In [2]:
from datasets import Dataset

dataset_hf = Dataset.from_pandas(dataset)



In [3]:
from datasets import DatasetDict

# 90% train, 10% test+validation
train_test = dataset_hf.train_test_split(test_size=0.1)

# Split the 10% test+validation set in half test, half validation
valid_test = train_test['test'].train_test_split(test_size=0.5)

# gather all three splits into a single DatasetDict
train_valid_test_dataset = DatasetDict({
    'train': train_test['train'],
    'validation': valid_test['train'],
    'test': valid_test['test']
})

In [4]:
train_valid_test_dataset

DatasetDict({
    train: Dataset({
        features: ['Review', 'label'],
        num_rows: 900
    })
    validation: Dataset({
        features: ['Review', 'label'],
        num_rows: 50
    })
    test: Dataset({
        features: ['Review', 'label'],
        num_rows: 50
    })
})
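The two-stage split above can be mimicked in plain Python to see where the 900/50/50 sizes come from. This is a minimal sketch, not the `datasets` implementation: it assumes the test fraction is rounded up (which matches the sizes printed above), and `split_indices` is a hypothetical helper. If you need a reproducible split, `train_test_split` also accepts a `seed` argument.

```python
import math
import random


def split_indices(indices, test_size, rng):
    """Shuffle `indices` and split off a `test_size` fraction (rounded up) as the test part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(42)   # fixed seed so the split is reproducible
all_rows = range(1000)    # 900 + 50 + 50 rows, as in the DatasetDict above

# 90% train, 10% held out
train_idx, held_out_idx = split_indices(all_rows, 0.1, rng)
# split the held-out 10% in half: the "train" part becomes validation, the rest test
valid_idx, test_idx = split_indices(held_out_idx, 0.5, rng)

print(len(train_idx), len(valid_idx), len(test_idx))  # 900 50 50
```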

## Fine-tuning a pretrained model

As a starting example, we'll use DistilBERT, a lighter BERT-based model. We will need to load:
- the [tokenizer](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer) (which is used to [preprocess](https://huggingface.co/docs/transformers/preprocessing) the data before it can be used by the model)
- the [model](https://huggingface.co/docs/transformers/autoclass_tutorial#automodel) itself

In [5]:
model_name = "distilbert-base-uncased"

### Tokenizer

We first load the tokenizer for our model:

In [6]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)

Now we need to [preprocess](https://huggingface.co/docs/transformers/preprocessing) our data. We will do it for the three partitions (train, validation and test) in a single step. For that, we'll make use of [map](https://huggingface.co/docs/datasets/process#map) with the help of an auxiliary function.

In [7]:
def preprocess_function(sample):
    return tokenizer(sample["Review"], truncation=True)

In [8]:
tokenized_dataset = train_valid_test_dataset.map(preprocess_function, batched=True)



In [9]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['Review', 'label', 'input_ids', 'attention_mask'],
        num_rows: 900
    })
    validation: Dataset({
        features: ['Review', 'label', 'input_ids', 'attention_mask'],
        num_rows: 50
    })
    test: Dataset({
        features: ['Review', 'label', 'input_ids', 'attention_mask'],
        num_rows: 50
    })
})
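With `batched=True`, `map` calls the function on chunks of rows rather than one row at a time: the function receives a dict mapping each column name to a *list* of values and returns a dict of new columns, each as long as the input batch. The sketch below imitates that contract in plain Python; `fake_tokenizer`, `toy_preprocess` and `batched_map` are hypothetical stand-ins for the real tokenizer, our `preprocess_function` and `Dataset.map`, and the toy "ids" are just word lengths.

```python
def fake_tokenizer(texts, truncation=True, max_length=8):
    """Hypothetical tokenizer stand-in: one toy id (the word length) per whitespace-separated word."""
    input_ids = []
    for text in texts:
        ids = [len(word) for word in text.split()]  # toy ids, not a real vocabulary
        if truncation:
            ids = ids[:max_length]
        input_ids.append(ids)
    attention_mask = [[1] * len(ids) for ids in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def toy_preprocess(sample):
    # same shape as preprocess_function: with batched=True, sample["Review"] is a list
    return fake_tokenizer(sample["Review"], truncation=True)


def batched_map(columns, fn, batch_size=1000):
    """Hypothetical stand-in for Dataset.map(fn, batched=True): slice the columns into
    batches, call fn on each, and append the returned columns to the existing ones."""
    n_rows = len(next(iter(columns.values())))
    result = {name: list(values) for name, values in columns.items()}
    for start in range(0, n_rows, batch_size):
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            result.setdefault(name, []).extend(values)
    return result


data = {"Review": ["Crust is not good.", "Wow... Loved this place."], "label": [0, 1]}
mapped = batched_map(data, toy_preprocess, batch_size=1)
print(sorted(mapped))       # ['Review', 'attention_mask', 'input_ids', 'label']
print(mapped["input_ids"])  # [[5, 2, 3, 5], [6, 5, 4, 6]]
```

As in the real output above, the original columns are kept and the tokenizer's outputs are added as new columns.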

When preprocessing the text, we have converted it into numbers, a process known as [encoding](https://huggingface.co/course/chapter2/4?fw=pt#encoding).

In [10]:
tokenized_dataset['train'][321]

{'Review': 'They dropped more than the ball.',
 'label': 0,
 'input_ids': [101, 2027, 3333, 2062, 2084, 1996, 3608, 1012, 102],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}

Encoding is done in a two-step process: tokenization, followed by conversion to input IDs.

In [11]:
tokens = tokenizer.tokenize(tokenized_dataset['train'][321]['Review'])
print(tokens)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

['they', 'dropped', 'more', 'than', 'the', 'ball', '.']
[2027, 3333, 2062, 2084, 1996, 3608, 1012]


The tokenizer actually adds two special tokens when preprocessing: one at the beginning, and one at the end.

In [12]:
inputs = tokenizer(tokenized_dataset['train'][321]['Review'])
inputs['input_ids']   # or inputs.input_ids

[101, 2027, 3333, 2062, 2084, 1996, 3608, 1012, 102]
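Both encoding steps, plus the special tokens, can be mimicked with a toy lookup table. This is only an illustration under simplifying assumptions: the hypothetical `toy_tokenize` splits on words and punctuation instead of running WordPiece (which can also break rare words into `##`-prefixed sub-word pieces), and the vocabulary holds just the entries needed here, with ids copied from the outputs above.

```python
import re

# Toy vocabulary: ids copied from the outputs above (the real one has 30522 entries)
VOCAB = {"[CLS]": 101, "[SEP]": 102, "they": 2027, "dropped": 3333, "more": 2062,
         "than": 2084, "the": 1996, "ball": 3608, ".": 1012}


def toy_tokenize(text):
    """Step 1 (simplified): lowercase, then split into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_convert_tokens_to_ids(tokens):
    """Step 2: look each token up in the vocabulary."""
    return [VOCAB[token] for token in tokens]


toy_tokens = toy_tokenize("They dropped more than the ball.")
toy_ids = toy_convert_tokens_to_ids(toy_tokens)
# Calling the tokenizer directly also wraps the sequence in [CLS] ... [SEP]
toy_input_ids = [VOCAB["[CLS]"]] + toy_ids + [VOCAB["[SEP]"]]

print(toy_tokens)     # ['they', 'dropped', 'more', 'than', 'the', 'ball', '.']
print(toy_input_ids)  # [101, 2027, 3333, 2062, 2084, 1996, 3608, 1012, 102]
```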

We can [decode](https://huggingface.co/course/chapter2/4?fw=pt#decoding) the sequence to check what these tokens are:

In [13]:
tokenizer.decode(inputs['input_ids'])

'[CLS] they dropped more than the ball. [SEP]'

As with encoding, we can also decode in two separate steps:

In [14]:
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'])
print(tokens)
print(tokenizer.convert_tokens_to_string(tokens))

['[CLS]', 'they', 'dropped', 'more', 'than', 'the', 'ball', '.', '[SEP]']
[CLS] they dropped more than the ball. [SEP]
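The decoding steps can also be mimicked with a toy lookup table. This sketch assumes a tiny, hypothetical vocabulary (ids copied from the outputs above), and its punctuation clean-up only approximates what the real tokenizer does. If you don't want the special tokens in the decoded text, `tokenizer.decode` accepts `skip_special_tokens=True`.

```python
# Toy inverse vocabulary: ids copied from the outputs above
ID_TO_TOKEN = {101: "[CLS]", 102: "[SEP]", 2027: "they", 3333: "dropped", 2062: "more",
               2084: "than", 1996: "the", 3608: "ball", 1012: "."}


def toy_convert_ids_to_tokens(ids):
    """Step 1: map each id back to its token."""
    return [ID_TO_TOKEN[i] for i in ids]


def toy_convert_tokens_to_string(tokens):
    """Step 2 (simplified): join with spaces, glue '##' sub-word pieces back together,
    and drop the space before punctuation."""
    text = " ".join(tokens).replace(" ##", "")
    for mark in ".,!?":
        text = text.replace(" " + mark, mark)
    return text


decoded_tokens = toy_convert_ids_to_tokens([101, 2027, 3333, 2062, 2084, 1996, 3608, 1012, 102])
print(toy_convert_tokens_to_string(decoded_tokens))  # [CLS] they dropped more than the ball. [SEP]
```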


### Loading the model

We now load the pretrained model:

In [15]:
from transformers import AutoModel

model = AutoModel.from_pretrained(model_name)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading the model in this way only gives us the base Transformer module: given some inputs, it outputs hidden states, that is, one high-dimensional vector per input token, representing the Transformer model's "contextual understanding" of that token.

In other words, we are leaving out the *head* of the model, which is needed for whatever NLP task we want to address.

Let's look at a particular example:

In [16]:
inputs = tokenizer(train_valid_test_dataset['train'][321]['Review'], padding=True, truncation=True, return_tensors="pt")

print(train_valid_test_dataset['train'][321])
print(inputs['input_ids'])
print(inputs['input_ids'].shape)

outputs = model(**inputs)
print(outputs.last_hidden_state)   # or outputs["last_hidden_state"]

print(outputs.last_hidden_state.shape)

{'Review': 'They dropped more than the ball.', 'label': 0}
tensor([[ 101, 2027, 3333, 2062, 2084, 1996, 3608, 1012,  102]])
torch.Size([1, 9])
tensor([[[-0.1096, -0.0425, -0.0866,  ...,  0.0223,  0.3055,  0.2207],
         [ 0.0129,  0.1259, -0.2013,  ..., -0.0088,  0.5012, -0.2007],
         [ 0.0587,  0.0848, -0.1349,  ..., -0.1785,  0.1844, -0.5037],
         ...,
         [ 0.0858, -0.2218, -0.1071,  ...,  0.0398,  0.0567, -0.2378],
         [ 0.8239, -0.0150, -0.5019,  ...,  0.2271, -0.3998, -0.5345],
         [ 0.3378,  0.2211,  0.0639,  ...,  0.0901, -0.0363, -0.4727]]],
       grad_fn=<NativeLayerNormBackward0>)
torch.Size([1, 9, 768])


As you can see, the hidden state representation has three dimensions:
- the *batch size* (in this case we are passing the model a single input sequence)
- the *sequence length*, that is, the number of tokens created by the tokenizer when encoding each input sequence
- the *hidden state size*, which is the vector dimension of each token (768 in the case of this model)
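To make the notion of a *head* concrete, here is a NumPy sketch of what a sequence-classification head does with a hidden-state tensor of shape `[batch, seq_len, hidden]`. To our understanding, DistilBERT's classification head takes the hidden state of the first token (`[CLS]`), passes it through a dense layer (`pre_classifier`) with a ReLU, and then through a final linear layer (`classifier`) with one output per label. The weights below are random placeholders and dropout is omitted, so the numbers are meaningless; only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, hidden_size, num_labels = 1, 9, 768, 2

# Stand-in for outputs.last_hidden_state: one 768-dim vector per token
last_hidden_state = rng.standard_normal((batch_size, seq_len, hidden_size))

# Random placeholder weights (a fine-tuned model learns these)
W_pre = rng.standard_normal((hidden_size, hidden_size)) * 0.02
b_pre = np.zeros(hidden_size)
W_cls = rng.standard_normal((hidden_size, num_labels)) * 0.02
b_cls = np.zeros(num_labels)

cls_vector = last_hidden_state[:, 0]                  # [batch, hidden]: the [CLS] token
hidden = np.maximum(cls_vector @ W_pre + b_pre, 0.0)  # dense layer + ReLU
logits = hidden @ W_cls + b_cls                       # [batch, num_labels]

print(cls_vector.shape, logits.shape)  # (1, 768) (1, 2)
```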

Since we want to use the model for classification, we should load it with an appropriate classification head:

In [17]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

Now the model's outputs are quite different: we get *logits*, one score per class.

In [18]:
outputs = model(**inputs)
print(outputs.logits)
print(outputs.logits.shape)

tensor([[-0.1148, -0.0612]], grad_fn=<AddmmBackward0>)
torch.Size([1, 2])


Logits are the raw, unnormalized scores output by the last layer of the model. To convert them into probabilities, we apply a *softmax* function.

In [19]:
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

model.config.id2label

tensor([[0.4866, 0.5134]], grad_fn=<SoftmaxBackward0>)


{0: 'LABEL_0', 1: 'LABEL_1'}
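The softmax exponentiates each logit and normalizes the results so that they sum to 1. Recomputing it by hand from the (rounded) logits printed above gives the same probabilities; the index of the largest probability, looked up in `model.config.id2label`, is the predicted label. `softmax` here is our own minimal helper, not a library function.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtracting the max does not change the result."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "LABEL_0", 1: "LABEL_1"}  # as printed by model.config.id2label
probs = softmax([-0.1148, -0.0612])      # rounded logits printed above
best = probs.index(max(probs))
print([round(p, 4) for p in probs], id2label[best])  # [0.4866, 0.5134] LABEL_1
```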

We can now interpret the obtained values as probabilities and identify the class to which the model assigns the highest probability for the input example.

Note, however, that for now the model is just guessing: its classification head has been randomly initialized and hasn't been trained on our dataset yet. To see this behavior more clearly, ask the user for some input, feed it to the model, and check its predictions.

In [20]:
# your code here


### Fine-tuning

The next step is to [fine-tune](https://huggingface.co/docs/transformers/training) the model with our train data. To do so, we can make use of a [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer).
There are several aspects of training that you can specify via [TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments).

In [21]:
from transformers import TrainingArguments, Trainer
from transformers import DataCollatorWithPadding
import evaluate   # successor to datasets.load_metric, which recent datasets releases no longer provide
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # run validation at the end of each epoch (newer transformers versions call this eval_strategy)
    save_strategy="epoch",
    load_best_model_at_end=True,
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
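`DataCollatorWithPadding` pads each batch only up to the length of its longest sequence (dynamic padding), instead of padding the whole dataset to one fixed length. A plain-Python sketch of the idea, assuming a pad id of 0 (DistilBERT's `pad_token_id`, as shown in the model configuration printed later in this notebook); `pad_batch` is a hypothetical helper, and the real collator returns PyTorch tensors.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest one in the batch and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 = real token, 0 = padding the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_batch([
    [101, 2027, 3333, 2062, 2084, 1996, 3608, 1012, 102],  # "They dropped more than the ball."
    [101, 1045, 2293, 2023, 2833, 999, 102],               # "I love this food!"
])
for ids, mask in zip(padded["input_ids"], padded["attention_mask"]):
    print(ids, mask)
```

Padding per batch keeps short batches short, which saves computation compared with padding every review to the longest one in the dataset.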

In [22]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 900
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 171


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.244978 | 0.92 |
| 2 | No log | 0.19555 | 0.94 |
| 3 | No log | 0.203258 | 0.92 |


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 50
  Batch size = 16
Saving model checkpoint to ./results\checkpoint-57
Configuration saved in ./results\checkpoint-57\config.json
Model weights saved in ./results\checkpoint-57\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-57\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-57\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 50
  B

TrainOutput(global_step=171, training_loss=0.2458481370357045, metrics={'train_runtime': 185.2393, 'train_samples_per_second': 14.576, 'train_steps_per_second': 0.923, 'total_flos': 21860225482896.0, 'train_loss': 0.2458481370357045, 'epoch': 3.0})
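The step counts in the log follow directly from the training set size and the batch size. A quick check of the arithmetic, assuming a single device and no gradient accumulation (as the log reports):

```python
import math

num_examples = 900   # "Num examples" in the training log
batch_size = 16      # per_device_train_batch_size (single device, no gradient accumulation)
num_epochs = 3       # num_train_epochs

steps_per_epoch = math.ceil(num_examples / batch_size)  # 56 full batches + 1 partial batch of 4
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 57 171
```

This is also why the first checkpoint is saved as `checkpoint-57`: with `save_strategy="epoch"`, a checkpoint is written after each epoch's 57 steps. The "No log" entries in the Training Loss column are most likely because the training loss is only logged every `logging_steps` steps (500 by default), more than this whole run.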

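We can check the model's performance on the validation set. Since we set `load_best_model_at_end=True`, the Trainer has reloaded the checkpoint with the lowest validation loss (the default criterion when `metric_for_best_model` is not set), so the results below match epoch 2 rather than epoch 3.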

In [23]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 50
  Batch size = 16


{'eval_loss': 0.19555048644542694,
 'eval_accuracy': 0.94,
 'eval_runtime': 0.5494,
 'eval_samples_per_second': 91.015,
 'eval_steps_per_second': 7.281,
 'epoch': 3.0}

More importantly, we can check how the model fares on our test set.

In [24]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 50
  Batch size = 16


PredictionOutput(predictions=array([[-1.7962309 ,  2.0445058 ],
       [ 1.7875428 , -1.7000335 ],
       [ 0.62254703, -0.6566263 ],
       [-1.7917267 ,  2.0327897 ],
       [ 1.8473191 , -1.813443  ],
       [ 1.076417  , -1.1199075 ],
       [-1.360338  ,  1.6948786 ],
       [ 1.5854409 , -1.5754024 ],
       [-1.6931822 ,  1.9673436 ],
       [ 1.7634561 , -1.6175969 ],
       [ 1.810007  , -1.7003915 ],
       [ 1.8010676 , -1.7620264 ],
       [-1.7221167 ,  1.9492059 ],
       [-0.784635  ,  1.0255976 ],
       [-1.7573326 ,  1.9279591 ],
       [ 1.7725143 , -1.76555   ],
       [-0.9923752 ,  1.2876046 ],
       [ 1.8278309 , -1.7965221 ],
       [-1.6947728 ,  1.9910243 ],
       [ 1.8068326 , -1.7501447 ],
       [-1.7213193 ,  2.051017  ],
       [-1.6788559 ,  2.0276203 ],
       [ 1.7324837 , -1.6981806 ],
       [-1.6609968 ,  1.875215  ],
       [-1.5135205 ,  1.8086493 ],
       [-1.732873  ,  2.0078683 ],
       [-1.6888554 ,  1.8992101 ],
       [ 1.7647946 , -1.63
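`trainer.predict` returns raw logits (one row per test example) in `predictions`, together with the gold labels in `label_ids` and the computed `metrics`. Turning logits into class predictions is just an argmax per row, exactly as in `compute_metrics`. The sketch below uses the first five rows printed above; with the real object you would write `np.argmax(pred_output.predictions, axis=-1)`, where `pred_output` is a hypothetical name for the value returned by `trainer.predict`.

```python
import numpy as np

# First five rows of logits from the PredictionOutput printed above
logits = np.array([[-1.7962309, 2.0445058],
                   [1.7875428, -1.7000335],
                   [0.62254703, -0.6566263],
                   [-1.7917267, 2.0327897],
                   [1.8473191, -1.813443]])

predicted_classes = np.argmax(logits, axis=-1)  # index of the larger logit in each row
print(predicted_classes)  # [1 0 0 1 0]
```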

#### Saving the model

The model (together with its tokenizer, since we passed one to the Trainer) can be saved to `output_dir` for future loading.

In [25]:
trainer.save_model()

Saving model checkpoint to ./results
Configuration saved in ./results\config.json
Model weights saved in ./results\pytorch_model.bin
tokenizer config file saved in ./results\tokenizer_config.json
Special tokens file saved in ./results\special_tokens_map.json


#### Loading and using a saved model

In [26]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer2 = AutoTokenizer.from_pretrained("./results")
model2 = AutoModelForSequenceClassification.from_pretrained("./results", num_labels=2)

Didn't find file ./results\added_tokens.json. We won't load it.
loading file ./results\vocab.txt
loading file ./results\tokenizer.json
loading file None
loading file ./results\special_tokens_map.json
loading file ./results\tokenizer_config.json
loading configuration file ./results\config.json
Model config DistilBertConfig {
  "_name_or_path": "./results",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.19.2",
  "vocab_size": 30522
}

loading weights file ./results\pytorch_model.bin
All model checkpoint weig

To use the model for inference, we can wrap it in a pipeline.

In [27]:
from transformers import TextClassificationPipeline

pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

In [28]:
pipe("I love this food!")

[{'label': 'LABEL_1', 'score': 0.9771549105644226}]

We can also use the model in a step-by-step fashion, as follows.

In [29]:
import torch

inputs = "I love this food!"

# tokenize inputs
tokenized_inputs = tokenizer2(inputs, return_tensors="pt")
print(tokenized_inputs)

# obtain model outputs
outputs = model2(**tokenized_inputs)
print(outputs)

# get the most likely label
labels = ['NEGATIVE', 'POSITIVE']
prediction = torch.argmax(outputs.logits)
print(labels[prediction])

{'input_ids': tensor([[ 101, 1045, 2293, 2023, 2833,  999,  102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
SequenceClassifierOutput(loss=None, logits=tensor([[-1.7309,  2.0250]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
POSITIVE
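For a single-label classification model like this one, the pipeline applies a softmax to the logits by default, so its `score` is the probability of the winning class, and the label name comes from `model2.config.id2label` (still the generic `LABEL_0`/`LABEL_1`, since we never set label names). Recomputing the pipeline's answer from the rounded logits printed above:

```python
import math

logits = [-1.7309, 2.0250]               # rounded logits printed above
id2label = {0: "LABEL_0", 1: "LABEL_1"}  # generic names from the saved config

exps = [math.exp(x - max(logits)) for x in logits]  # stable softmax numerators
pipe_probs = [e / sum(exps) for e in exps]
best = max(range(len(pipe_probs)), key=pipe_probs.__getitem__)
print(id2label[best], round(pipe_probs[best], 4))  # LABEL_1 0.9772
```

This matches the pipeline's output above (`LABEL_1` with a score of about 0.9772), up to the rounding of the printed logits.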


Let's check the model's performance on the test set again, this time with additional metrics.

In [30]:
y_pred = []
with torch.no_grad():   # inference only: no need to track gradients
    for p in tokenized_dataset['test']['Review']:
        ti = tokenizer2(p, return_tensors="pt")
        out = model2(**ti)
        pred = torch.argmax(out.logits).item()   # plain int; our labels are already 0 and 1
        y_pred.append(pred)

In [31]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

y_test = tokenized_dataset['test']['label']

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='macro'))
print('Recall: ', recall_score(y_test, y_pred, average='macro'))
print('F1: ', f1_score(y_test, y_pred, average='macro'))

[[23  1]
 [ 2 24]]
Accuracy:  0.94
Precision:  0.94
Recall:  0.9407051282051282
F1:  0.9399759903961584
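With `average='macro'`, scikit-learn computes each metric separately for each class and then takes their unweighted mean. Recomputing the scores by hand from the confusion matrix above (rows are true classes, columns are predicted classes) reproduces the printed values; `per_class_scores` is a hypothetical helper, not a scikit-learn function.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
cm = [[23, 1],
      [2, 24]]


def per_class_scores(cm, k):
    """Precision, recall and F1 for class k, treating it as the positive class."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that truly is k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = [per_class_scores(cm, k) for k in range(len(cm))]
# macro average: unweighted mean over classes, for precision, recall and F1
macro_precision, macro_recall, macro_f1 = (sum(col) / len(scores) for col in zip(*scores))
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))

print(round(accuracy, 4), round(macro_precision, 4), round(macro_recall, 4), round(macro_f1, 4))
# 0.94 0.94 0.9407 0.94
```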


We can do the same using a Trainer, as before.

In [32]:
trainer2 = Trainer(
    model=model2,
    tokenizer=tokenizer2,
    compute_metrics=compute_metrics
)

No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [33]:
trainer2.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 50
  Batch size = 8


PredictionOutput(predictions=array([[-1.7962308 ,  2.044506  ],
       [ 1.7875426 , -1.7000337 ],
       [ 0.6225471 , -0.6566263 ],
       [-1.7917267 ,  2.0327897 ],
       [ 1.847319  , -1.8134431 ],
       [ 1.0764166 , -1.119907  ],
       [-1.360338  ,  1.6948786 ],
       [ 1.5854409 , -1.5754024 ],
       [-1.6931822 ,  1.9673437 ],
       [ 1.7634561 , -1.6175969 ],
       [ 1.8100071 , -1.7003918 ],
       [ 1.8010676 , -1.7620264 ],
       [-1.7221166 ,  1.9492058 ],
       [-0.78463507,  1.0255976 ],
       [-1.7573326 ,  1.9279591 ],
       [ 1.7725143 , -1.7655503 ],
       [-0.9923751 ,  1.2876042 ],
       [ 1.8278309 , -1.7965221 ],
       [-1.6947728 ,  1.9910243 ],
       [ 1.8068324 , -1.7501446 ],
       [-1.7213192 ,  2.0510168 ],
       [-1.6788558 ,  2.0276203 ],
       [ 1.7324837 , -1.6981807 ],
       [-1.6609969 ,  1.8752148 ],
       [-1.5135205 ,  1.8086493 ],
       [-1.732873  ,  2.0078683 ],
       [-1.6888554 ,  1.8992101 ],
       [ 1.7647946 , -1.63

## Using a task-related pretrained model

Since Hugging Face hosts many pretrained models, we can also directly use a model that has already been fine-tuned on similar data or for a similar task, such as sentiment analysis.

In [34]:
from transformers import pipeline

model_name = "siebert/sentiment-roberta-large-english"
# model_name = "distilbert-base-uncased-finetuned-sst-2-english"
sentiment_analysis = pipeline("sentiment-analysis", model=model_name)


Let's see how it performs without any fine-tuning (this time using the pipeline to predict the label of each test-set sample).

In [35]:
# map the pipeline's label names to our 0/1 labels
y_pred = []
for p in train_valid_test_dataset['test']['Review']:
    if sentiment_analysis(p)[0]['label'] == 'NEGATIVE':
        y_pred.append(0)
    else:
        y_pred.append(1)

In [36]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

y_test = train_valid_test_dataset['test']['label']

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='macro'))
print('Recall: ', recall_score(y_test, y_pred, average='macro'))
print('F1: ', f1_score(y_test, y_pred, average='macro'))

[[24  0]
 [ 0 26]]
Accuracy:  1.0
Precision:  1.0
Recall:  1.0
F1:  1.0


As before, we can do the same via a Trainer.

In [37]:
from transformers import Trainer

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
trainer = Trainer(model=model, compute_metrics=compute_metrics)

loading configuration file https://huggingface.co/siebert/sentiment-roberta-large-english/resolve/main/config.json from cache at C:\Users\up201806451/.cache\huggingface\transformers\228e83e1ade2247aebc5f0725e330fa58dedee3d9eec36c9249f25084a946130.1aece0680a18a95d51d6e1a5f83631412da37b87db65380c52052161354505ba
Model config RobertaConfig {
  "_name_or_path": "siebert/sentiment-roberta-large-english",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "NEGATIVE",
    "1": "POSITIVE"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "NEGATIVE": 0,
    "POSITIVE": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_

In [38]:
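def preprocess_function(sample):
    # this Trainer has no tokenizer or data collator, so pad here: with batched=True,
    # each map batch (up to 1000 rows, i.e. a whole split) is padded to its longest review
    return tokenizer(sample["Review"], truncation=True, padding=True)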

In [39]:
tokenized_dataset = train_valid_test_dataset.map(preprocess_function, batched=True)



In [40]:
trainer.predict(test_dataset=tokenized_dataset["test"])

The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: Review. If Review are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 50
  Batch size = 8


PredictionOutput(predictions=array([[-3.8258243,  3.0153737],
       [ 3.9576025, -3.65704  ],
       [-3.7286463,  2.8901403],
       [-3.7925448,  2.9612765],
       [ 3.783376 , -3.3045237],
       [ 3.8281064, -3.3748016],
       [ 3.630708 , -3.0900671],
       [ 3.9556363, -3.653799 ],
       [-3.6981673,  2.859723 ],
       [ 3.9423103, -3.5984879],
       [ 3.9537852, -3.6360152],
       [ 3.9561827, -3.6323886],
       [-3.7282338,  2.8893423],
       [-3.4794452,  2.6538527],
       [-3.798501 ,  2.9711885],
       [ 3.962435 , -3.6613283],
       [-3.8118863,  2.9890978],
       [ 3.9261298, -3.551795 ],
       [-3.819361 ,  3.000381 ],
       [ 3.956467 , -3.6563478],
       [-3.7488778,  2.910654 ],
       [-3.7555733,  2.9186604],
       [ 3.937712 , -3.5707688],
       [-3.7764068,  2.9404213],
       [-3.796835 ,  2.9664743],
       [-3.8126688,  2.9876044],
       [-3.8181229,  2.99839  ],
       [ 3.9505115, -3.6197758],
       [ 3.9555974, -3.6361768],
       [ 3.955

Note that we could still fine-tune this model on our training data, but it already performs remarkably well without any further training: it classifies all 50 test reviews correctly!