# Sentiment DistilBERT (benchmark)

## Environment configuration


In [1]:
# Core libraries; datasets is pinned to 2.16.0 and numpy is kept below 2.x
!pip install -qU transformers accelerate datasets==2.16.0 watermark textattack
!pip install -q pyarrow
!pip install -q "numpy<2"
!pip install -q pandas tqdm

%reload_ext watermark
%watermark -vmp transformers,datasets,torch,numpy,pandas,tqdm

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 7.34.0

transformers: 4.57.1
datasets    : 2.16.0
torch       : 2.8.0+cu126
numpy       : 1.26.4
pandas      : 2.2.2
tqdm        : 4.67.1

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.6.105+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

Using device: cuda


## Dataset

[Financial PhraseBank](https://huggingface.co/datasets/takala/financial_phrasebank), loaded here with the `sentences_50agree` configuration.

In [2]:
from datasets import load_dataset, DatasetDict

# The dataset ships only a "train" split, so the splits are created manually
dataset = load_dataset("takala/financial_phrasebank", "sentences_50agree")
full_dataset = dataset["train"]

# Hold out 20% of the data, then halve it into validation and test (80/10/10 overall)
split_dataset = full_dataset.train_test_split(test_size=0.2, seed=42)
test_valid_split = split_dataset["test"].train_test_split(test_size=0.5, seed=42)

dataset = DatasetDict({
    "train": split_dataset["train"],
    "validation": test_valid_split["train"],
    "test": test_valid_split["test"],
})

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


Generating train split: 4846 examples
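The split sizes produced by the two `train_test_split` calls can be checked by hand. The sketch below is a rough calculation, assuming the held-out fraction is rounded up to a whole number of examples (the scikit-learn convention); the helper name `split_sizes` is hypothetical and not part of `datasets`.

```python
import math

def split_sizes(n_examples, test_fraction):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_fraction * n_examples)
    return n_examples - n_test, n_test

# 4846 sentences in the sentences_50agree configuration
n_train, n_holdout = split_sizes(4846, 0.2)
# The held-out part is halved into validation and test
n_validation, n_test = split_sizes(n_holdout, 0.5)

sizes = {"train": n_train, "validation": n_validation, "test": n_test}
print(sizes)
```

Under that assumption the test split has 485 sentences, which matches the number of predictions reported later in the notebook.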

In [3]:
label_names = dataset["train"].features["label"].names
label2id = {name: dataset["train"].features["label"].str2int(name) for name in label_names}
# Inverse mapping; the loop variables avoid shadowing the built-in id()
id2label = {idx: name for name, idx in label2id.items()}

print("Label names: ", label_names)
print("Label ids: ", label2id["negative"], label2id["neutral"], label2id["positive"])

Label names:  ['negative', 'neutral', 'positive']
Label ids:  0 1 2


## DistilBERT base uncased finetuned SST-2

[DistilBERT base uncased finetuned SST-2](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)
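This checkpoint is a binary classifier, while the dataset has three classes. The sketch below checks which dataset classes the model can reach at all. It assumes the model emits only the labels `NEGATIVE` and `POSITIVE` (the two labels used in the mapping later in this notebook) and hard-codes the label ids printed above.

```python
# Labels emitted by the SST-2 checkpoint (assumption: only these two)
model_labels = ["NEGATIVE", "POSITIVE"]
# Label ids printed earlier in this notebook
label2id = {"negative": 0, "neutral": 1, "positive": 2}

# Map each model label to the dataset id with the same (lower-cased) name
model_to_dataset = {name: label2id[name.lower()] for name in model_labels}

# Dataset classes that no model label maps onto
unreachable = sorted(set(label2id) - {name.lower() for name in model_labels})

print(model_to_dataset)
print(unreachable)
```

Any class listed as unreachable can never be predicted, so its precision, recall, and F1 will be zero regardless of the input.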

In [8]:
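from transformers import pipeline

# Name the task explicitly instead of relying on inference from the model config
sentiment_clf = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
    batch_size=32,
)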

Device set to use cuda


In [9]:
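from transformers.pipelines.pt_utils import KeyDataset

# With top_k=None the pipeline returns the scores for every label,
# sorted by score in descending order, so output[0] is the top prediction
test_outputs = []
for output in sentiment_clf(KeyDataset(dataset["test"], "sentence"), top_k=None):
    test_outputs.append(output[0])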

print(f"Inference complete. Total predictions: {len(test_outputs)}")

Inference complete. Total predictions: 485
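The loop above keeps `output[0]` for each sentence. As a sketch of what that element looks like, the example below uses a made-up output (the scores are illustrative, not taken from the model) and a hypothetical helper, `top_prediction`, that selects the best label without depending on the order of the list.

```python
# Hypothetical pipeline output for one sentence with top_k=None:
# one dict per label (scores are invented for illustration)
example_output = [
    {"label": "POSITIVE", "score": 0.91},
    {"label": "NEGATIVE", "score": 0.09},
]

def top_prediction(scores):
    """Return the highest-scoring label dict, independent of list order."""
    return max(scores, key=lambda item: item["score"])

best = top_prediction(example_output)
print(best["label"], best["score"])
```

Selecting by score rather than by position is slightly more defensive if the output ordering ever changes between library versions.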


## Metrics

In [17]:
from sklearn.metrics import classification_report

true_labels = dataset["test"]["label"]

# The SST-2 model only emits NEGATIVE/POSITIVE, so "neutral" is never predicted
sst2_label_map_alt = {
    "NEGATIVE": label2id["negative"],
    "POSITIVE": label2id["positive"],
}
predicted_labels = [sst2_label_map_alt[output["label"]] for output in test_outputs]

print("\n--- Final Test Results (Sentiment DistilBERT Zero-Shot Baseline via Pipeline) ---")

report = classification_report(
    y_true=true_labels,
    y_pred=predicted_labels,
    target_names=label_names,
    digits=4
)

print(report)


--- Final Test Results (Sentiment DistilBERT Zero-Shot Baseline via Pipeline) ---
              precision    recall  f1-score   support

    negative     0.2308    0.9500    0.3713        60
     neutral     0.0000    0.0000    0.0000       282
    positive     0.3697    0.6154    0.4619       143

    accuracy                         0.2990       485
   macro avg     0.2002    0.5218    0.2778       485
weighted avg     0.1376    0.2990    0.1821       485
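The summary rows of the report can be cross-checked from the per-class rows. The sketch below copies recall and support from the table; the counts of correct predictions are inferred as recall × support (rounded) and are not stated anywhere in the output.

```python
# Per-class values copied from the report above
support = {"negative": 60, "neutral": 282, "positive": 143}
recall = {"negative": 0.9500, "neutral": 0.0000, "positive": 0.6154}

# Inferred number of correctly classified sentences per class
correct = {name: round(recall[name] * support[name]) for name in support}

total = sum(support.values())
# Accuracy is the share of all sentences classified correctly
accuracy = sum(correct.values()) / total
# Macro recall is the unweighted mean of the per-class recalls
macro_recall = sum(correct[name] / support[name] for name in support) / len(support)

print(correct)
print(f"accuracy={accuracy:.4f} macro_recall={macro_recall:.4f}")
```

Because 282 of the 485 test sentences are neutral and none of them can be predicted correctly by a binary model, accuracy is capped at 203/485 (about 0.42) before any errors on the other two classes.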



