# Transformer Baseline – Sentiment Analysis (Single Dataset)

This notebook evaluates a pretrained Transformer model for binary sentiment classification on **one dataset only**.

- **Model:** `distilbert-base-uncased-finetuned-sst-2-english`
- **Architecture:** DistilBERT
- **Training data (original):** SST-2 (movie reviews)

## Purpose
- Run inference on `DATA_REVIEWS`
- Report metrics (classification report, confusion matrix)
- Provide optional error analysis

## How the Model Works (Short Technical Overview)
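1) **Tokenization**: text → WordPiece subword tokens plus the special tokens `[CLS]` and `[SEP]`.
2) **Transformer encoder**: self-attention layers produce a contextual embedding for each token.
3) **Classification head**: a small feed-forward head on the `[CLS]` position outputs two logits, which a softmax turns into a `POSITIVE` / `NEGATIVE` prediction with a score.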
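
Step 3 can be illustrated with a toy example. The sketch below does not load the model: the logit values are made up, and the index-to-label order in `ID2LABEL` is an assumption about the checkpoint's configuration. It only shows how two raw scores become a label and a confidence.

```python
import math

# Assumed label order for the classification head (index 0, index 1).
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Hypothetical logits for one review: [negative_score, positive_score]
label, score = decode([-1.2, 2.3])
print(label, round(score, 4))
```

The `label` and `score` pair mirrors the `{"label": ..., "score": ...}` dictionaries that the `pipeline` call returns for each input.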


In [1]:
# ============================================
# 1) Imports & setup
# ============================================
from __future__ import annotations

from dataclasses import dataclass
from typing import List

import pandas as pd
from transformers import pipeline
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_recall_fscore_support

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 120)




## 2) Dataset path
Edit the path below if your filename differs.

**Expected columns** in the TSV:
- `Review` (text)
- `Liked` (label: `1` for positive, `0` for negative)


In [2]:
# ============================================
# 2) Path (EDIT IF NEEDED)
# ============================================
DATA_REVIEWS_PATH = "data/reviews_dataset.tsv"  # DATA_REVIEWS
SEP = "\t"


## 3) Load the data and run quick sanity checks


In [3]:
# ============================================
# 3) Load dataset
# ============================================
def load_reviews_tsv(path: str, sep: str = "\t") -> pd.DataFrame:
    df = pd.read_csv(path, sep=sep)
    required = {"Review", "Liked"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns in {path}: {missing}. Expected at least {required}.")
    df = df.copy()
    df["Review"] = df["Review"].fillna("").astype(str)
    df["Liked"] = df["Liked"].astype(int)
    return df

df = load_reviews_tsv(DATA_REVIEWS_PATH, sep=SEP)

print("DATA_REVIEWS:", df.shape)
display(df.head(3))

print("\nLabel distribution (0/1):")
print(df["Liked"].value_counts().to_dict())


DATA_REVIEWS: (6000, 2)


| | Review | Liked |
|---|---|---|
| 0 | I expected confusing, not this: impressive fans. | 1 |
| 1 | Not impressive at all — the check-in was actua... | 0 |
| 2 | I absolutely liked the drinks; it was outstand... | 1 |



Label distribution (0/1):
{1: 3000, 0: 3000}


## 4) Create the pretrained pipeline
Notes:
- On CPU, lower `MAX_LENGTH` to speed up evaluation if needed; inputs longer than `MAX_LENGTH` tokens are truncated.
- If you have a CUDA GPU, set `device=0`.
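
The device choice can also be made automatically. The helper below is a minimal sketch: `pick_device` and `DEVICE` are hypothetical names, and it assumes PyTorch is the backend. If `torch` is not importable it falls back to CPU.

```python
def pick_device() -> int:
    """Return 0 when a CUDA GPU is visible to PyTorch, otherwise -1 (CPU)."""
    try:
        import torch  # assumed backend; may be missing in a minimal environment
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1

DEVICE = pick_device()
print("device =", DEVICE)
```

The result can then be passed as `device=DEVICE` when creating the pipeline instead of hard-coding `-1`.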


In [4]:
# ============================================
# 4) Pretrained pipeline
# ============================================
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

clf = pipeline(
    task="sentiment-analysis",
    model=MODEL_NAME,
    device=-1  # CPU (set 0 for CUDA GPU)
)

BATCH_SIZE = 32
MAX_LENGTH = 256  # max tokens per review (longer inputs are truncated); 512 is slower




## 5) Evaluation helpers
We compute:
- classification report (per-class precision, recall, F1)
- confusion matrix
- summary metrics: accuracy and macro-averaged precision, recall, and F1
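
For reference, the macro-averaged scores can be written out as below. This assumes scikit-learn's convention for `average="macro"`, an unweighted mean over the $K = 2$ classes; the symbols $TP_k$, $FP_k$, and $FN_k$ (true positives, false positives, and false negatives for class $k$) are introduced here for illustration.

$$
P_k = \frac{TP_k}{TP_k + FP_k}, \qquad
R_k = \frac{TP_k}{TP_k + FN_k}, \qquad
F_{1,k} = \frac{2\,P_k R_k}{P_k + R_k}
$$

$$
P_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} P_k, \qquad
R_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} R_k, \qquad
F_{1,\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} F_{1,k}
$$

Under this convention, macro F1 is the mean of the per-class F1 scores, not the harmonic mean of macro precision and macro recall. When the classes are perfectly balanced, macro recall equals accuracy.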


In [5]:
# ============================================
# 5) Evaluation helpers
# ============================================
@dataclass
class EvalResult:
    dataset_name: str
    n_rows: int
    accuracy: float
    precision_macro: float
    recall_macro: float
    f1_macro: float
    report_text: str
    conf_matrix: List[List[int]]
    y_true: List[int]
    y_pred: List[int]

def predict_labels(texts: List[str]) -> List[int]:
    preds = clf(
        texts,
        batch_size=BATCH_SIZE,
        truncation=True,
        max_length=MAX_LENGTH
    )
    return [1 if p["label"] == "POSITIVE" else 0 for p in preds]

def evaluate_dataset(df: pd.DataFrame, dataset_name: str) -> EvalResult:
    texts = df["Review"].fillna("").astype(str).tolist()
    y_true = df["Liked"].astype(int).tolist()
    y_pred = predict_labels(texts)

    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
    rep = classification_report(y_true, y_pred, digits=4)
    cm = confusion_matrix(y_true, y_pred).tolist()

    return EvalResult(
        dataset_name=dataset_name,
        n_rows=len(df),
        accuracy=float(acc),
        precision_macro=float(p),
        recall_macro=float(r),
        f1_macro=float(f1),
        report_text=rep,
        conf_matrix=cm,
        y_true=y_true,
        y_pred=y_pred
    )

def summarize_results(r: EvalResult) -> pd.DataFrame:
    return pd.DataFrame([{
        "dataset": r.dataset_name,
        "rows": r.n_rows,
        "accuracy": r.accuracy,
        "precision_macro": r.precision_macro,
        "recall_macro": r.recall_macro,
        "f1_macro": r.f1_macro
    }])


## 6) Run evaluation


In [6]:
# ============================================
# 6) Run evaluation
# ============================================
res = evaluate_dataset(df, "DATA_REVIEWS")
display(summarize_results(res))


| | dataset | rows | accuracy | precision_macro | recall_macro | f1_macro |
|---|---|---|---|---|---|---|
| 0 | DATA_REVIEWS | 6000 | 0.879333 | 0.889214 | 0.879333 | 0.878563 |


## 7) Detailed report


In [7]:
# ============================================
# 7) Print report
# ============================================
print("=== DATA_REVIEWS ===")
print(res.report_text)


=== DATA_REVIEWS ===
              precision    recall  f1-score   support

           0     0.9512    0.7997    0.8689      3000
           1     0.8272    0.9590    0.8882      3000

    accuracy                         0.8793      6000
   macro avg     0.8892    0.8793    0.8786      6000
weighted avg     0.8892    0.8793    0.8786      6000



## 8) Confusion matrix
Format:
- rows = true labels (0,1)
- cols = predicted labels (0,1)


In [8]:
# ============================================
# 8) Confusion matrix
# ============================================
def cm_to_df(cm: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(cm, index=["true_0", "true_1"], columns=["pred_0", "pred_1"])

display(cm_to_df(res.conf_matrix))


| | pred_0 | pred_1 |
|---|---|---|
| true_0 | 2399 | 601 |
| true_1 | 123 | 2877 |
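
The counts in the matrix above can be tied back to the classification report by hand. The sketch below recomputes per-class precision, recall, and F1 from the four cells. `per_class_metrics` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
# Confusion matrix from this run: rows = true labels, columns = predicted labels.
cm = [[2399, 601],
      [123, 2877]]

def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k, read directly off the matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything truly k
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
print(f"accuracy = {accuracy:.4f}")

for k in range(len(cm)):
    p, r, f1 = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Reading the matrix this way shows where the errors fall: 601 negative reviews were predicted positive, against 123 positive reviews predicted negative, which is why recall is lower for class 0 than for class 1.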


## 9) Error analysis (optional)
Lists the misclassified reviews, longest first (by character count).


In [9]:
# ============================================
# 9) Error analysis
# ============================================
def build_error_table(df: pd.DataFrame, res: EvalResult, n: int = 20) -> pd.DataFrame:
    out = df.copy()
    out["y_true"] = res.y_true
    out["y_pred"] = res.y_pred
    out["is_error"] = out["y_true"] != out["y_pred"]

    err = out[out["is_error"]].copy()
    err["review_len"] = err["Review"].astype(str).str.len()
    return err.sort_values("review_len", ascending=False).head(n)[["y_true","y_pred","review_len","Review"]]

errors = build_error_table(df, res, n=15)
print("Errors (DATA_REVIEWS):", (pd.Series(res.y_true) != pd.Series(res.y_pred)).sum())
display(errors)


Errors (DATA_REVIEWS): 724


| | y_true | y_pred | review_len | Review |
|---|---|---|---|---|
| 3509 | 0 | 1 | 144 | seriously disappointing at first, yet it becam... |
| 5137 | 0 | 1 | 141 | quite unreliable at first, however it became o... |
| 2062 | 0 | 1 | 138 | honestly forgettable at first, but it became p... |
| 3387 | 0 | 1 | 137 | frankly forgettable at first, but it became pl... |
| 4803 | 0 | 1 | 137 | I felt frustrated by the service, but the rese... |
| 632 | 0 | 1 | 135 | really confusing at first, however it became g... |
| 413 | 0 | 1 | 135 | really terrible at first, yet it became impres... |
| 1566 | 0 | 1 | 135 | The atmosphere started forgettable; yet the cr... |
| 5981 | 0 | 1 | 134 | I couldn't stand the update, but the notificat... |
| 3536 | 0 | 1 | 134 | The sound started unreliable; but the atmosphe... |
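
The rows shown are all false positives, and they contain a contrast word such as "but", "yet", or "however". One way to check how common that pattern is across all errors is sketched below. `error_breakdown` and the connective list are assumptions made for this illustration, and the example texts are invented rather than taken from the dataset.

```python
import re
from collections import Counter

# Assumed list of contrast connectives; extend as needed.
CONTRAST = re.compile(r"\b(but|yet|however)\b", re.IGNORECASE)

def error_breakdown(texts, y_true, y_pred):
    """Count errors by direction, and how many of them contain a contrast word."""
    counts = Counter()
    for text, t, p in zip(texts, y_true, y_pred):
        if t == p:
            continue  # correct prediction, not an error
        direction = "false_positive" if p == 1 else "false_negative"
        counts[direction] += 1
        if CONTRAST.search(text):
            counts[direction + "_with_contrast"] += 1
    return dict(counts)

# Invented examples, for illustration only.
toy_texts = [
    "bad at first, but it became great",
    "I loved it",
    "Awful. However, fine later",
    "nice",
]
toy_true = [0, 1, 0, 1]
toy_pred = [1, 1, 1, 0]
breakdown = error_breakdown(toy_texts, toy_true, toy_pred)
print(breakdown)
```

In the notebook, the same function could be called with `df["Review"]`, `res.y_true`, and `res.y_pred`.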
