<font size='6'> <b> Fine-tuning Twitter RoBERTa </b> </font>  
This notebook fine-tunes a RoBERTa model pre-trained on Twitter data (`cardiffnlp/twitter-roberta-base`) on our subset of manually annotated tweets. We then compare the results with the baseline obtained with logistic regression.


In [3]:
from google.colab import files
uploaded = files.upload()


Saving Training data.csv to Training data.csv


In [4]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [13]:
!pip install -q transformers datasets scikit-learn

# Using GPU if available
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

Using device: cuda


In [14]:
from datasets import load_dataset
dataset = load_dataset('csv', data_files='Training data.csv')

In [15]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Use a model trained on Twitter data
model_name = "cardiffnlp/twitter-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.to(device)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
         

We divide the data into three splits: 80% for training, 10% for validation, and 10% for testing.

In [16]:
def tokenize_function(example):
    return tokenizer(example['text'], padding='max_length', truncation=True, max_length=128)

tokenized_dataset = dataset['train'].map(tokenize_function, batched=True)
tokenized_dataset = tokenized_dataset.rename_column("label", "labels")
tokenized_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

# Split into train (80%) and a held-out portion (20%)
train_val_split = tokenized_dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = train_val_split['train']

# Further split the 20% into validation and test (10% for each)
val_test_split = train_val_split['test'].train_test_split(test_size=0.5, seed=42)

val_dataset = val_test_split['train']
test_dataset = val_test_split['test']
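
As a quick sanity check on the two-stage split above, the arithmetic can be sketched as follows. The helper `two_stage_split_fractions` is a hypothetical name introduced only for this illustration; it works on fractions of the full dataset and does not model how the library rounds row counts.

```python
from fractions import Fraction


def two_stage_split_fractions(first_test_size, second_test_size):
    """Return (train, validation, test) as fractions of the full dataset.

    Hypothetical helper: the first split holds out `first_test_size` of the
    data, the second split sends `second_test_size` of that held-out part
    to the test set and the remainder to validation.
    """
    held_out = first_test_size
    train = 1 - held_out
    test = held_out * second_test_size
    validation = held_out - test
    return train, validation, test


# Same settings as the notebook: test_size=0.2, then test_size=0.5
train_frac, val_frac, test_frac = two_stage_split_fractions(Fraction(1, 5), Fraction(1, 2))
print(train_frac, val_frac, test_frac)  # 4/5 1/10 1/10
```

Because both calls use a fixed `seed`, the same rows should land in each split on every run, which keeps the validation and test scores comparable across experiments.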



The model is evaluated with macro-averaged F1, i.e. the unweighted mean of the per-class F1 scores. Accuracy and the per-class F1 scores are reported alongside it.
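
To make the metric concrete, here is a minimal pure-Python sketch of macro F1 on a handful of made-up labels. The toy data and the helper names `per_class_f1` and `macro_f1` are assumptions for illustration only; the notebook itself relies on scikit-learn for the actual computation. The label order 0 = negative, 1 = neutral, 2 = positive follows the metric names used in this notebook.

```python
def per_class_f1(y_true, y_pred, labels):
    """Compute the F1 score of each class from true and predicted labels."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores) / len(scores)


# Toy, made-up example (0 = negative, 1 = neutral, 2 = positive)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

print(per_class_f1(y_true, y_pred, labels=[0, 1, 2]))  # per-class F1: 6/7, 1/2, 4/5
print(macro_f1(y_true, y_pred, labels=[0, 1, 2]))
```

Each class contributes equally to the average regardless of how many tweets it contains, so a weak minority class pulls the score down more than it would pull down accuracy.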

In [17]:
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

def compute_metrics(p):
    preds = torch.tensor(p.predictions).argmax(dim=1)
    labels = torch.tensor(p.label_ids)

    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average=None, labels=[0, 1, 2])
    f1_macro = f1.mean()
    acc = accuracy_score(labels, preds)

    return {
        "accuracy": acc,
        "f1_macro": f1_macro,
        "f1_negative": f1[0],
        "f1_neutral": f1[1],
        "f1_positive": f1[2],
    }

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=8,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=50,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    greater_is_better=True,
    fp16=torch.cuda.is_available(),
    logging_dir="./logs",
    logging_steps=10
)

from transformers import Trainer, EarlyStoppingCallback

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


In [18]:
trainer.train()

# Evaluate on the held-out test set
eval_result = trainer.evaluate(test_dataset)
print("Evaluation Results:", eval_result)


# Save the fine-tuned model and tokenizer to a timestamped directory
from datetime import datetime

model_name = "twitter-roberta-base"
timestamp = datetime.now().strftime("%Y%m%d_%H%M")
save_dir = f"./sentiment_model_{model_name}_{timestamp}"

model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

print(f"Model saved to: {save_dir}")

Epoch  Training Loss  Validation Loss  Accuracy  F1 Macro  F1 Negative  F1 Neutral  F1 Positive
1      0.955          0.896465         0.54321   0.521412  0.65         0.298851    0.615385
2      0.8651         0.842985         0.580247  0.568354  0.625        0.387755    0.692308
3      0.6092         0.890838         0.617284  0.611172  0.538462     0.575758    0.719298
4      0.3845         0.990598         0.648148  0.647989  0.625        0.568966    0.75
5      0.2191         1.238752         0.611111  0.606949  0.593407     0.522523    0.704918
6      0.1635         1.47875          0.611111  0.609166  0.58427      0.542373    0.700855


Evaluation Results: {'eval_loss': 1.0222506523132324, 'eval_accuracy': 0.6975308641975309, 'eval_f1_macro': 0.6934714590964591, 'eval_f1_negative': 0.703125, 'eval_f1_neutral': 0.6153846153846154, 'eval_f1_positive': 0.7619047619047619, 'eval_runtime': 0.3691, 'eval_samples_per_second': 438.885, 'eval_steps_per_second': 29.801, 'epoch': 6.0}
Model saved to: ./sentiment_model_twitter-roberta-base_20250510_1019
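
Training was configured for 8 epochs, but the log above ends after epoch 6. This is consistent with the early-stopping patience of 2: validation F1 macro peaks at epoch 4 and does not improve in the two epochs that follow. The sketch below is a simplified model of patience-based early stopping, not the library's implementation (it ignores any improvement threshold), and `simulate_early_stopping` is a hypothetical helper. It is run on the F1 macro column from the log.

```python
def simulate_early_stopping(metric_per_epoch, patience):
    """Return (best_epoch, stop_epoch), 1-indexed, for a higher-is-better metric.

    Simplified illustration: training stops once the metric has failed to
    beat its best value for `patience` consecutive evaluations.
    """
    best_value = None
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if best_value is None or value > best_value:
            best_value, best_epoch = value, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran through all epochs without triggering early stopping
    return best_epoch, len(metric_per_epoch)


# Validation F1 macro per epoch, copied from the training log
f1_macro_log = [0.521412, 0.568354, 0.611172, 0.647989, 0.606949, 0.609166]

best_epoch, stop_epoch = simulate_early_stopping(f1_macro_log, patience=2)
print(best_epoch, stop_epoch)  # 4 6
```

Since `load_best_model_at_end=True` is set, the test-set results reported above should correspond to the epoch 4 checkpoint rather than to the final epoch. Note also that validation loss rises from epoch 3 onward while training loss keeps falling, a typical sign of overfitting on a small dataset.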


In [19]:
# Zip the fine-tuned model directory and download it
import shutil
shutil.make_archive(save_dir, 'zip', save_dir)

files.download(f"{save_dir}.zip")
