# Contextual Transformer Model 

## Introduction:

In this notebook, I fine-tune a pretrained Transformer model to detect stress in social media posts. The goal is to compare a contextual model against the interpretable feature-based baseline, using the same train/test split and reporting accuracy and macro-F1.

In [2]:
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import kagglehub

RANDOM_STATE = 42

In [3]:
path = kagglehub.dataset_download("monishakant/dataset-for-stress-analysis-in-social-media")
csv_path = os.path.join(path, "dreaddit_StressAnalysis - Sheet1.csv")

df = pd.read_csv(csv_path)
df.head()


Unnamed: 0,id,subreddit,post_id,sentence_range,text,label,confidence,social_timestamp,social_karma,syntax_ari,...,lex_dal_min_pleasantness,lex_dal_min_activation,lex_dal_min_imagery,lex_dal_avg_activation,lex_dal_avg_imagery,lex_dal_avg_pleasantness,social_upvote_ratio,social_num_comments,syntax_fk_grade,sentiment
0,896,relationships,7nu7as,"[50, 55]","Its like that, if you want or not.“ ME: I have...",0,0.8,1514980773,22,-1.238793,...,1.0,1.2,1.0,1.65864,1.32245,1.80264,0.63,62,-0.148707,0.0
1,19059,anxiety,680i6d,"(5, 10)",I man the front desk and my title is HR Custom...,0,1.0,1493348050,5,7.684583,...,1.4,1.125,1.0,1.69133,1.6918,1.97249,1.0,2,7.398222,-0.065909
2,7977,ptsd,8eeu1t,"(5, 10)",We'd be saving so much money with this new hou...,1,1.0,1524516630,10,2.360408,...,1.1429,1.0,1.0,1.70974,1.52985,1.86108,1.0,8,3.149288,-0.036818
3,1214,ptsd,8d28vu,"[2, 7]","My ex used to shoot back with ""Do you want me ...",1,0.5,1524018289,5,5.997,...,1.0,1.3,1.0,1.72615,1.52,1.84909,1.0,7,6.606,-0.066667
4,1965,relationships,7r1e85,"[23, 28]",I haven’t said anything to him yet because I’m...,0,0.8,1516200171,138,4.649418,...,1.125,1.1429,1.0,1.75642,1.43582,1.91725,0.84,70,4.801869,0.141667


In [4]:
# X = raw text, y = labels; reuse the saved split indices when they exist
train_idx_path = "../data/train_idx.npy"
test_idx_path = "../data/test_idx.npy"

if os.path.exists(train_idx_path) and os.path.exists(test_idx_path):
    train_idx = np.load(train_idx_path)
    test_idx = np.load(test_idx_path)
else:
    # Fallback: deterministic split if index files are missing
    all_idx = df.index.to_numpy()
    labels = df["label"].astype(int)
    train_idx, test_idx = train_test_split(
        all_idx,
        test_size=0.2,
        random_state=RANDOM_STATE,
        stratify=labels,
    )
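    # Persist the split for reuse; create the data directory if it is missing
    os.makedirs(os.path.dirname(train_idx_path), exist_ok=True)
    np.save(train_idx_path, train_idx)
    np.save(test_idx_path, test_idx)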

X_train = df.loc[train_idx, "text"].astype(str).tolist()
X_test  = df.loc[test_idx, "text"].astype(str).tolist()

y_train = df.loc[train_idx, "label"].astype(int).tolist()
y_test  = df.loc[test_idx, "label"].astype(int).tolist()

len(X_train), len(X_test)


(500, 143)
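
The Trainer further down evaluates on the test split after every epoch and keeps the best checkpoint, so one option is to carve a validation subset out of the training indices and keep the test split for a single final evaluation. The sketch below is not part of the original notebook: `stratified_holdout` and `val_fraction` are hypothetical names, the data is a toy stand-in, and only the standard library is used so the logic stays visible. `train_test_split(..., stratify=...)` from the first cell would likely do the same job.

```python
import random
from collections import defaultdict


def stratified_holdout(indices, labels, val_fraction=0.1, seed=42):
    """Split `indices` into (fit, val) lists with similar label proportions.

    `labels[i]` must be the label of `indices[i]`.
    Hypothetical helper for illustration, not taken from the notebook.
    """
    by_label = defaultdict(list)
    for idx, lab in zip(indices, labels):
        by_label[lab].append(idx)

    rng = random.Random(seed)
    fit, val = [], []
    for lab in sorted(by_label):
        group = by_label[lab][:]
        rng.shuffle(group)
        # Hold out at least one example per class
        n_val = max(1, round(len(group) * val_fraction))
        val.extend(group[:n_val])
        fit.extend(group[n_val:])
    return fit, val


# Toy data standing in for train_idx / y_train: 60 negatives, 40 positives
toy_idx = list(range(100))
toy_labels = [0] * 60 + [1] * 40
fit_idx, val_idx = stratified_holdout(toy_idx, toy_labels, val_fraction=0.1)

print(len(fit_idx), len(val_idx))
```

With the notebook's variables this might look like `fit_idx, val_idx = stratified_holdout(list(train_idx), y_train)`, with the Trainer's `eval_dataset` built from `val_idx` instead of the test split.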

## Model Choice:

I start with DistilBERT (`distilbert-base-uncased`), a compact pretrained model, to keep training efficient while still benefiting from contextual representations.

## Tokenization + Dataset:

In [5]:
# Install dependencies once from terminal:
# pip install -r requirements.txt

import torch
from datasets import Dataset
from transformers import AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

train_ds = Dataset.from_dict({"text": X_train, "label": y_train})
test_ds  = Dataset.from_dict({"text": X_test,  "label": y_test})


def tokenize(batch):
    # Truncate or pad every post to exactly 128 tokens so all batches share one shape
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )

train_tok = train_ds.map(tokenize, batched=True)
test_tok  = test_ds.map(tokenize, batched=True)

train_tok = train_tok.remove_columns(["text"])
test_tok  = test_tok.remove_columns(["text"])

train_tok.set_format("torch")
test_tok.set_format("torch")


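
Since `max_length=128` cuts off longer posts, it can be worth checking how many posts are affected before settling on that value. A minimal sketch, assuming the token counts have already been collected: `truncation_stats` is a hypothetical helper, and the lengths below are made-up toy values, not measurements from this dataset.

```python
def truncation_stats(token_lengths, max_length=128):
    """Return (share of sequences longer than max_length, longest length)."""
    if not token_lengths:
        raise ValueError("token_lengths is empty")
    n_truncated = sum(1 for n in token_lengths if n > max_length)
    return n_truncated / len(token_lengths), max(token_lengths)


# Toy token counts (illustrative only)
toy_lengths = [45, 90, 128, 130, 210, 75, 60, 300]
share, longest = truncation_stats(toy_lengths, max_length=128)

print(f"{share:.1%} of sequences exceed 128 tokens; longest = {longest}")
```

For the real data, the lengths could come from `[len(ids) for ids in tokenizer(X_train, truncation=False)["input_ids"]]`, which counts the special tokens as well.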

## Fine-Tuning:

In [6]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "f1_macro": f1.compute(predictions=preds, references=labels, average="macro")["f1"],
    }

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="results/transformer_distilbert",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    seed=RANDOM_STATE,
)

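# NOTE: the test split doubles as the per-epoch evaluation set, so
# load_best_model_at_end selects the checkpoint on test data and the
# reported scores may be optimistic.
trainer = Trainer(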
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=test_tok,
    compute_metrics=compute_metrics,
)

trainer.train()




DistilBertForSequenceClassification LOAD REPORT from: distilbert-base-uncased
Key                     | Status     | 
------------------------+------------+-
vocab_layer_norm.bias   | UNEXPECTED | 
vocab_layer_norm.weight | UNEXPECTED | 
vocab_transform.weight  | UNEXPECTED | 
vocab_transform.bias    | UNEXPECTED | 
vocab_projector.bias    | UNEXPECTED | 
pre_classifier.weight   | MISSING    | 
classifier.bias         | MISSING    | 
pre_classifier.bias     | MISSING    | 
classifier.weight       | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


Epoch,Training Loss,Validation Loss,Accuracy,F1 Macro
1,No log,0.610272,0.636364,0.600215
2,0.614582,0.521046,0.734266,0.733209
3,0.614582,0.512206,0.741259,0.740446
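
The `F1 Macro` column is the unweighted mean of the per-class F1 scores, so it can sit below accuracy when one class is predicted much better than the other (as in epoch 1). The pure-Python sketch below illustrates the calculation on toy predictions. `macro_f1` is a hypothetical helper written for illustration; the notebook itself relies on `evaluate.load("f1")`.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: the classifier predicts class 1 too often
toy_true = [0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 1, 1, 1, 1]

toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_macro_f1 = macro_f1(toy_true, toy_pred)

print(f"accuracy = {toy_accuracy:.3f}, macro F1 = {toy_macro_f1:.3f}")
```

Here class 0 has an F1 of 0.4 and class 1 an F1 of 8/11, so the macro average falls below the accuracy of 0.625.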



There were missing keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.weight', 'distilbert.embeddings.LayerNorm.bias'].
There were unexpected keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.beta', 'distilbert.embeddings.LayerNorm.gamma'].


TrainOutput(global_step=96, training_loss=0.5310603181521097, metrics={'train_runtime': 110.5014, 'train_samples_per_second': 13.574, 'train_steps_per_second': 0.869, 'total_flos': 49675274496000.0, 'train_loss': 0.5310603181521097, 'epoch': 3.0})
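
The `global_step=96` above, the `No log` entry for epoch 1, and the identical training loss for epochs 2 and 3 all appear to follow from the batch size and `logging_steps=50`. A short check of that arithmetic, assuming a single device and no gradient accumulation (the notebook does not set either option explicitly):

```python
import math

n_train = 500         # training examples, from the split above
batch_size = 16       # per_device_train_batch_size
epochs = 3            # num_train_epochs
logging_steps = 50    # logging_steps

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

# Optimizer steps at which a training-loss entry is logged
logged_at = list(range(logging_steps, total_steps + 1, logging_steps))

# Number of logged loss entries available at the end of each epoch
logs_by_epoch = [
    sum(1 for s in logged_at if s <= steps_per_epoch * e)
    for e in range(1, epochs + 1)
]

print(steps_per_epoch, total_steps, logged_at, logs_by_epoch)
```

Epoch 1 ends at step 32, before the first logged step, which would explain `No log`. Epochs 2 and 3 both show the single value logged at step 50. A smaller `logging_steps` (for example 10) would give a training loss that changes from epoch to epoch.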