In [1]:
import pandas as pd
import numpy as np
import torch
import evaluate
from datasets import Dataset
# accuracy - does it match or not?
metric = evaluate.load("accuracy")

Using the Amazon reviews file from the Sentiment Labelled Sentences dataset: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

In [2]:
reviews_df = pd.read_csv('https://raw.githubusercontent.com/gmjen/llmdatasets/main/amazon_cells_labelled.txt', sep='\t',
                           names=["review", "label"])

In [3]:
reviews_df['review'] = reviews_df['review'].str.lower()

In [4]:
reviews_df[reviews_df['label'] == 1]

Unnamed: 0,review,label
1,"good case, excellent value.",1
2,great for the jawbone.,1
4,the mic is great.,1
7,if you are razr owner...you must have this!,1
10,and the sound quality is great.,1
...,...,...
971,excellent product.,1
975,it is the best charger i have seen on the mark...,1
976,sweetest phone!!!,1
977,":-)oh, the charger seems to work fine.",1


Create a Dataset object for the Transformers library to use.

In [5]:
dataset_reviews = Dataset.from_pandas(reviews_df)

# already has convenient train_test_split method
dataset = dataset_reviews.train_test_split(test_size=0.2, shuffle=True, seed=42)

In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['review', 'label'],
        num_rows: 800
    })
    test: Dataset({
        features: ['review', 'label'],
        num_rows: 200
    })
})

In [7]:
dataset["train"] = dataset["train"].shuffle(seed=42)
dataset["test"] = dataset["test"].shuffle(seed=42)

In [8]:
dataset["train"][0]

{'review': 'the biggest complaint i have is, the battery drains superfast.',
 'label': 0}

In [9]:
dataset["train"][1]

{'review': 'it fits my ear well and is comfortable on.', 'label': 1}

In [10]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_fn(examples):
    return tokenizer(examples["review"], padding="max_length", truncation=True, max_length = 128)

tokenized_datasets = dataset.map(tokenize_fn, batched=True)

In [11]:
train_dataset = tokenized_datasets["train"].shuffle(seed=42)
eval_dataset = tokenized_datasets["test"].shuffle(seed=42)

In [12]:
train_dataset[0]["review"]

'it worked very well.'

In [13]:
train_dataset[0]["label"]

1

In [14]:
torch.tensor(train_dataset[0]["input_ids"])

tensor([ 101, 2009, 2499, 2200, 2092, 1012,  102,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0])

[CLS] is a special token used in the BERT (Bidirectional Encoder Representations from Transformers) architecture. It stands for "classification" and is always the first token of the input sequence (ID 101 in the tensor above). The input text follows, closed by a [SEP] token (ID 102) and then [PAD] tokens (ID 0) up to max_length. The final hidden state at the [CLS] position serves as the representation of the whole sequence and is the input to the classifier for the task.
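The layout of the padded sequence can be sketched in plain Python. This is only an illustration: the IDs 101, 102 and 0 are read off the notebook output above, and pad_to_max_length is a hypothetical helper, not part of the tokenizer API (the real tokenizer also performs WordPiece splitting and returns token_type_ids).

```python
# Special-token IDs as they appear in the notebook output for bert-base-uncased.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def pad_to_max_length(token_ids, max_length=128):
    """Hypothetical helper: wrap token IDs in [CLS]/[SEP] and pad to max_length."""
    # Truncate so that the two special tokens still fit.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad

# IDs for "it worked very well." taken from the tensor shown above.
ids, mask = pad_to_max_length([2009, 2499, 2200, 2092, 1012])
print(ids[:8])    # [101, 2009, 2499, 2200, 2092, 1012, 102, 0]
print(sum(mask))  # 7 real tokens, the remaining 121 positions are padding
```

The attention mask is what lets the model ignore the 121 padding positions in this example.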

In [15]:
np.array(tokenizer.convert_ids_to_tokens(train_dataset[0]["input_ids"]))

array(['[CLS]', 'it', 'worked', 'very', 'well', '.', '[SEP]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]',
       '[PAD]', '[P

Transformers provides a Trainer class optimized for training Transformers models, which we'll use here.

num_labels specifies the number of labels in your classification task.

In the case of the BERT model with AutoModelForSequenceClassification, the last layer of the model is a linear layer that maps the hidden representation of the input sequence to a vector of size num_labels. The model returns this vector as raw logits. During training, the loss function applies the softmax internally; at inference time you apply softmax yourself if you want a probability distribution over the labels.
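A minimal NumPy sketch of that last step, with random values standing in for both the sequence representation and the classification head. The variable names are illustrative assumptions, not the model's internals; the only facts taken as given are the hidden size of 768 for bert-base models and num_labels=2.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 768   # hidden size of bert-base models
num_labels = 2      # negative / positive

# Stand-in for the hidden representation of one input sequence.
pooled = rng.normal(size=(1, hidden_size))

# Randomly initialized classification head: logits = pooled @ W + b.
W = rng.normal(scale=0.02, size=(hidden_size, num_labels))
b = np.zeros(num_labels)
logits = pooled @ W + b          # shape (1, num_labels)

def softmax(x):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

probs = softmax(logits)
print(logits.shape)        # (1, 2)
print(probs.sum(axis=-1))  # each row of probabilities sums to 1
```

Before fine-tuning, the head's weights are random, so these probabilities carry no information about sentiment yet.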

In [16]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

You will see a warning about some of the pretrained weights not being used and some weights being randomly initialized. Don't worry, this is completely normal! The pretrained head of the BERT model is discarded, and replaced with a randomly initialized classification head. You will fine-tune this new model head on your sequence classification task, transferring the knowledge of the pretrained model to it.

Training hyperparameters

Next, create a TrainingArguments object, which holds all the hyperparameters you can tune as well as flags for activating different training options. For this tutorial you can start with the default training hyperparameters, but feel free to experiment with them to find your optimal settings.

Specify where to save the checkpoints from your training:

In [17]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer")

The accuracy metric is a common evaluation metric used to measure the performance of a classification model. It measures the proportion of correctly classified samples out of the total number of samples in the test set.

In [18]:
metric = evaluate.load("accuracy")

The eval_pred argument unpacks into a pair: the predicted logits and the true labels. The logits are the raw, unnormalized scores the classifier outputs for each class and each example; a higher score means the model favors that class more. The true labels are the actual labels of the examples.

The np.argmax function is used to obtain the predicted class for each example. It takes the logits as input and gives us the index of the class with the highest confidence score.

Finally, the metric.compute function is used to compute the evaluation metric based on the predicted class and the true label for each example. The metric here refers to a specific evaluation metric object that was defined earlier in the code. The compute function takes two arguments - the predicted classes and the true labels - and returns the computed metric.

In [19]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
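To see what compute_metrics does, here is the same argmax-then-compare logic on a tiny hand-made input. The logits and labels are made-up values for illustration, and accuracy is computed directly with NumPy rather than through the evaluate metric object.

```python
import numpy as np

# Made-up logits for four examples and two classes (illustrative values only).
logits = np.array([
    [ 2.1, -1.3],
    [-0.4,  0.9],
    [ 0.2,  0.1],
    [-1.5,  1.7],
])
labels = np.array([0, 1, 1, 1])

# Index of the largest logit in each row is the predicted class.
predictions = np.argmax(logits, axis=-1)

# Accuracy is the fraction of predictions that match the labels.
accuracy = float((predictions == labels).mean())
print(predictions.tolist(), accuracy)  # [0, 1, 0, 1] 0.75
```

The third example is the only miss: its logits slightly favor class 0 while the label is 1.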

In [20]:
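from transformers import TrainingArguments, Trainer

# Evaluate on the test split at the end of every epoch and train for 4 epochs.
# Note: recent transformers releases rename evaluation_strategy to eval_strategy.
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch", num_train_epochs=4)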

Create a Trainer object with your model, training arguments, training and test datasets, and evaluation function:

In [21]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

Now call `.train()` to fine-tune the model:

In [22]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: review. If review are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 800
  Num Epochs = 4
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 400
  Number of trainable parameters = 109483778


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.164417,0.935
2,No log,0.284929,0.945
3,No log,0.27944,0.94
4,No log,0.285909,0.945


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: review. If review are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 200
  Batch size = 8

TrainOutput(global_step=400, training_loss=0.1590566062927246, metrics={'train_runtime': 80.4307, 'train_samples_per_second': 39.786, 'train_steps_per_second': 4.973, 'total_flos': 210488844288000.0, 'train_loss': 0.1590566062927246, 'epoch': 4.0})
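The numbers in the log can be checked with a little arithmetic. The sketch below uses only values printed above (800 training examples, batch size 8, 4 epochs, 200 evaluation examples, final accuracy 0.945).

```python
import math

num_train_examples = 800   # "Num examples" in the training log
batch_size = 8             # "Instantaneous batch size per device"
num_epochs = 4             # num_train_epochs passed to TrainingArguments

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 100 400

# Final-epoch accuracy of 0.945 on 200 evaluation examples.
num_eval_examples = 200
correct = round(0.945 * num_eval_examples)
print(correct, num_eval_examples - correct)  # 189 11
```

The "No log" entries in the Training Loss column are most likely a consequence of the same arithmetic: assuming the default logging interval (logging_steps) of 500 steps, a 400-step run never reaches the first logging point. Validation loss is lowest after epoch 1 and rises afterwards while accuracy stays roughly flat, so training for fewer epochs may be worth trying.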

In [23]:
train_dataset

Dataset({
    features: ['review', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 800
})

In PyTorch, the model.eval() method sets the model to evaluation mode. This is important because the behavior of certain layers, such as dropout and batch normalization, can differ between training and inference.

During training, dropout is used to randomly drop out units from the network to prevent overfitting. Batch normalization is used to normalize the activations of each layer to speed up training and improve performance.

During inference, we don't want to use dropout because we want to use the entire network for predictions. We also don't want to normalize the activations of each layer based on the statistics of the current mini-batch, because we are making predictions on individual samples. Therefore, we need to set the model to evaluation mode to ensure that these layers are applied correctly during inference.

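When you call model.eval(), it sets the training attribute of the model and all of its submodules to False. Dropout layers then pass their input through unchanged, and batch normalization layers switch to their stored running statistics instead of the statistics of the current batch. BERT itself uses layer normalization, which behaves the same in both modes, so dropout is the layer that matters here. model.eval() does not freeze parameters or turn off gradient tracking; that is the job of torch.no_grad(), used in the inference code below. Together they make predictions deterministic and avoid spending memory on gradients.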
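The difference between the two modes can be sketched with a small NumPy version of inverted dropout. This is an illustration of the idea, not PyTorch's implementation; the dropout function and its training flag are assumptions made for the example, and p=0.1 is just an illustrative rate.

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: active only when training is True."""
    if not training or p == 0.0:
        # Evaluation mode: the input passes through unchanged.
        return x
    # Keep each unit with probability 1 - p and rescale the survivors
    # so the expected activation stays the same.
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(10)

train_out = dropout(x, p=0.1, training=True, rng=rng)
eval_out = dropout(x, p=0.1, training=False, rng=rng)

print(train_out)  # entries are either 0.0 or 1 / (1 - p), depending on the seed
print(eval_out)   # identical to x
```

In training mode the output changes from call to call, which is why the same review could receive different scores if the model were left in that mode.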


In [24]:
# set the model to evaluation mode (disables dropout)
model.eval()

def eval_review(rev):
    # tokenize the review and return PyTorch tensors
    inputs = tokenizer(rev, return_tensors="pt")

    # move the input tensors to the same device as the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # run the forward pass without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # convert the logits to probabilities using a softmax function
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # get the predicted label by selecting the index with the highest probability
    predicted_label = torch.argmax(probs, dim=-1)

    # return the predicted label as a plain Python int
    return predicted_label.item()
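eval_review returns only the class index. As a possible extension, the sketch below maps a pair of logits to a readable label and a confidence value. describe_logits and ID2LABEL are hypothetical names introduced for this example; the mapping 0 = negative, 1 = positive follows the labels in the dataset, and the logits are made up rather than taken from the model.

```python
import math

# Label meaning in the dataset: 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}

def describe_logits(logits):
    """Hypothetical helper: map raw logits to (label name, probability)."""
    # Softmax with the maximum subtracted for numerical stability.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Softmax preserves ordering, so this index equals the argmax of the logits.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Illustrative logits, not actual model output.
label, confidence = describe_logits([-2.0, 3.0])
print(label, round(confidence, 3))  # positive 0.993
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly gives the same predicted label; the softmax is only needed when the probability itself is of interest.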

In [25]:
eval_review("I really don't like this thing at all.")

0

In [26]:
eval_review("It's amazing")

1

In [27]:
eval_review("I didn't know what to think but it turned out to be good")

1