# **Failure Analysis of DistilBERT**

**Goal:** Examine the DistilBERT model’s worst prediction with SHAP when the training fraction is 0.7.

**Related Notebook:** [Text Classification by DistilBERT](06_text_classification_byDistilBERT.ipynb)

**Background:** \
In notebooks/06_text_classification_byDistilBERT.ipynb, we observed a sharp drop in performance when the training fraction was reduced to 0.7. To pinpoint the cause of this degradation, we performed a focused failure analysis.

**Objective:**
- Load the model and tokenizer from the local checkpoint to avoid retraining.
- Load and preprocess the AI-Label dataset.
- Reuse helper utilities to split the dataset, tokenize text, and compute class probabilities.
- Identify the worst prediction by ranking samples by the negative log-likelihood of their true class, highest first.
- Interpret that worst-case prediction with SHAP to understand each token’s contribution.
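
The ranking step in the list above can be sketched in isolation. This is a minimal NumPy illustration, not the notebook's own code: the `probs` and `labels` arrays are hypothetical values chosen for the example, not outputs of the fine-tuned model.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 2 classes (not real model output)
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.05, 0.95],
    [0.60, 0.40],
])
labels = np.array([0, 1, 0, 1])  # hypothetical true classes

# Negative log-likelihood of the true class for each sample
eps = 1e-12  # guards against log(0)
true_class_prob = probs[np.arange(len(labels)), labels]
nll = -np.log(np.clip(true_class_prob, eps, 1.0))

# Rank samples from highest to lowest loss; the first index is the worst prediction
ranking = np.argsort(-nll)
worst = int(ranking[0])
print(ranking.tolist(), worst)
```

Here the third sample (index 2) ranks first because the model assigned only 0.05 to its true class.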

**Result and Conclusion:**
- Almost all tokens pushed the model strongly toward the wrong class (high-anxious) while the true label was low-anxious. Tokens such as “disposable” (0.933) and “anymore” (0.934) heavily contributed to this false prediction.
- Possible causes:
    - Feature bias: With only 70% of the data, the model may over-rely on sentiment-laden words. Phrases expressing economic frustration can lie close to genuinely anxious language in embedding space, leading to misclassification.
    - Label noise: The text discusses money-related stress but was labeled low-anxious. Such borderline cases act as noisy negatives, especially with smaller training sets.
- We think this is a systematic error because the model appears to overweight economically negative terms (“disposable income,” “luxury items,” “dead”) as anxiety markers.
- To address this issue in the future, we recommend:
    - Increasing the training fraction (for example, ≥0.9) or adding more training data to improve representation and reduce feature bias; see notebooks/06_text_classification_byDistilBERT.ipynb, where a fraction of 0.9 yields better performance.
    - Adding rule-based or auxiliary features that detect genuine anxiety cues (e.g., physiological or emotional expressions) while down-weighting general complaints, which may improve model calibration.
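
One way to probe whether the error is systematic rather than a one-off is to compare misclassification rates across subreddits, since the dataset keeps a `subreddit` column. The sketch below uses only the standard library; the `records` list is made up for illustration, and `error_rate_by_group` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict

# Hypothetical (subreddit, true_label, predicted_label) records, for illustration only
records = [
    ("economy", 0, 1),
    ("economy", 0, 1),
    ("economy", 0, 0),
    ("Anxiety", 1, 1),
    ("Anxiety", 1, 1),
    ("GetMotivated", 0, 0),
    ("GetMotivated", 0, 1),
]

def error_rate_by_group(rows):
    """Return {group: fraction of rows where prediction != label}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, y_true, y_pred in rows:
        totals[group] += 1
        errors[group] += int(y_true != y_pred)
    return {g: errors[g] / totals[g] for g in totals}

rates = error_rate_by_group(records)
# Print groups from highest to lowest error rate
for group, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {rate:.2f}")
```

If one group, such as an economics-focused subreddit, shows a clearly higher rate on the real test split, that would support the feature-bias explanation over isolated label noise.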

In [11]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict
import pandas as pd
import numpy as np
from functools import partial
import torch
from transformers import TextClassificationPipeline
import torch.nn.functional as F
import shap

In [14]:
# Load the model and tokenizer from local checkpoint
path = "../notebooks/.ipynb_checkpoints/distilbert_hyperparam_0.7"
model = AutoModelForSequenceClassification.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

In [3]:
# Prepare data
ai_label = pd.read_csv('../data/processed/simple_ai_labels.csv')
# Binarize severity: ai_severity >= 3 -> high-anxious (1), otherwise low-anxious (0)
ai_label['anxiety_level'] = np.where(ai_label['ai_severity'] >= 3, 1, 0)

# Attach the post text to each label via post_id
raw_data = pd.read_parquet('../data/processed/reddit_anxiety_v1.parquet')
ai_label = ai_label.merge(raw_data[['post_id', 'text_all']], on='post_id', how='left')
ai_label = ai_label[['subreddit', 'text_all', 'anxiety_level']]
display(ai_label.info())
display(ai_label['subreddit'].value_counts())
ai_label.head(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   subreddit      1000 non-null   object
 1   text_all       1000 non-null   object
 2   anxiety_level  1000 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 23.6+ KB


None

subreddit
unpopularopinion    157
Anxiety             151
TrueOffMyChest      143
economy             140
mentalhealth        125
OpenAI              122
GetMotivated        109
HealthAnxiety        53
Name: count, dtype: int64

|   | subreddit | text_all | anxiety_level |
|---|-----------|----------|---------------|
| 0 | economy | cre: the other real estate debacle developing | 0 |
| 1 | Anxiety | scared i have dementia or early onset or somet... | 1 |


In [None]:
# Helper functions
def split_data(indf, target_col='anxiety_level', test_size=0.2, val_size=0.1, random_state=42):
    """
    Split the dataframe into stratified training, validation, and test sets.

    test_size is a fraction of the full dataframe; val_size is a fraction of
    the rows that remain after the test set is removed.
    """
    # First hold out the test set, then carve the validation set out of the remainder
    train_val_df, test_df = train_test_split(
        indf, test_size=test_size, random_state=random_state, stratify=indf[target_col]
    )
    train_df, val_df = train_test_split(
        train_val_df, test_size=val_size, random_state=random_state, stratify=train_val_df[target_col]
    )

    dataset = DatasetDict({
        "train": Dataset.from_pandas(train_df.reset_index(drop=True)),
        "validation": Dataset.from_pandas(val_df.reset_index(drop=True)),
        "test": Dataset.from_pandas(test_df.reset_index(drop=True)),
    })
    return dataset

def tokenize(batch, text_col):
    """Tokenize a batch of texts with the globally loaded tokenizer."""
    return tokenizer(batch[text_col], truncation=True, padding=True)

def predict_proba(raw_texts):
    """
    Predict the class probabilities for the given texts.

    Relies on the global `pipe`, which is created in a later cell, so that cell
    must run before this function is called.
    """
    # Accept a single string or any iterable of texts (numpy arrays, tuples, etc.)
    texts = [raw_texts] if isinstance(raw_texts, str) else list(raw_texts)

    # Replace missing values with empty strings and coerce everything else to str
    cleaned = []
    for t in texts:
        if t is None or (isinstance(t, float) and np.isnan(t)):
            cleaned.append("")
        else:
            cleaned.append(t if isinstance(t, str) else str(t))

    outputs = pipe(cleaned, truncation=True)
    if isinstance(outputs, dict):  # defensive: wrap a lone result dict in a list
        outputs = [outputs]

    # One row per text, one column per class score
    return np.array([[c["score"] for c in o] for o in outputs])

In [5]:
text_col = "text_all"

# split the data
dataset = split_data(ai_label, "anxiety_level", test_size=(1-0.7), val_size=0.1, random_state=42)
# tokenize the dataset
tokenized_dataset = dataset.map(partial(tokenize, text_col=text_col), batched=True)
tokenized_dataset = tokenized_dataset.rename_column("anxiety_level", "labels")
tokenized_dataset = tokenized_dataset.with_format("torch")
# Validation split; note that the next cell reassigns eval_dataset to the test split
eval_dataset = tokenized_dataset["validation"]

Map:   0%|          | 0/629 [00:00<?, ? examples/s]

Map:   0%|          | 0/70 [00:00<?, ? examples/s]

Map:   0%|          | 0/301 [00:00<?, ? examples/s]
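
The three `Map` progress lines above report 629, 70, and 301 examples for the train, validation, and test splits. The arithmetic behind those counts can be reproduced with a short sketch. It assumes that each held-out part is rounded up with `ceil` and the remainder goes to the other part; that is an assumption about how `train_test_split` sizes its splits, but it is consistent with the counts shown. `split_sizes` is a hypothetical helper written for this illustration.

```python
import math

def split_sizes(n_samples, test_size, val_size):
    """Sizes of (train, validation, test) for two chained hold-out splits.

    Assumes each held-out part is rounded up with ceil and the rest goes
    to the other part.
    """
    n_test = math.ceil(test_size * n_samples)
    n_remaining = n_samples - n_test
    n_val = math.ceil(val_size * n_remaining)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(1000, test_size=(1 - 0.7), val_size=0.1)
print(n_train, n_val, n_test)    # 629 70 301
print(round(n_train / 1000, 3))  # 0.629
```

Two details follow from this. First, `1 - 0.7` evaluates to `0.30000000000000004` in floating point, which is why the test split has 301 rows rather than 300. Second, the validation set is taken out of the 0.7 share, so the model is actually trained on about 63% of the rows.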

In [6]:
# set text-classification pipeline
device = 0 if torch.cuda.is_available() else -1
pipe = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    return_all_scores=True,
    function_to_apply="softmax",
    device=device
)

label_col = "labels"
# Analyze the held-out test split (this replaces the validation split assigned earlier)
eval_dataset = tokenized_dataset["test"]

texts  = np.array(eval_dataset[text_col], dtype=object)
labels = np.array(eval_dataset[label_col])

# Predict probabilities and classes
probs = np.array([[c["score"] for c in out] for out in pipe(texts.tolist(), truncation=True)])
yhat  = probs.argmax(axis=1)

# Per-sample negative log-likelihood of the true class (higher = model struggled more)
log_probs = torch.tensor(probs).clamp(min=1e-12).log()  # pipe returned softmax probs; clamp avoids log(0)
loss_per_sample = F.nll_loss(log_probs, torch.tensor(labels), reduction='none').numpy()

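# Failure sets
mis_idx   = np.where(yhat != labels)[0]       # misclassified samples, in dataset order
hi_loss   = loss_per_sample.argsort()[::-1]   # all samples, descending by loss
K = 20  # how many to inspect
# Note: np.unique returns the merged indices sorted ascending, so focus_idx[0] is the
# lowest dataset index in the focus set, not necessarily the highest-loss sample
# (that one is hi_loss[0]).
focus_idx = np.unique(np.concatenate([mis_idx[:K], hi_loss[:K]]))

focus_texts  = texts[focus_idx].tolist()
focus_labels = labels[focus_idx].tolist()
focus_preds  = yhat[focus_idx].tolist()
focus_losses = loss_per_sample[focus_idx].tolist()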

Device set to use cpu
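
Because `np.unique` returns its result in ascending order, the first element of a focus set built this way is the lowest dataset index, not the highest-loss sample. The sketch below shows the effect and one possible way to keep the focus set ranked by loss. All values are hypothetical and chosen only to make the ordering visible.

```python
import numpy as np

# Hypothetical per-sample losses and misclassified indices (illustration only)
loss_per_sample = np.array([0.1, 0.8, 0.3, 2.5, 0.05, 1.7])
mis_idx = np.array([1, 3, 5])
K = 2

hi_loss = loss_per_sample.argsort()[::-1]  # indices by descending loss
candidates = np.concatenate([mis_idx[:K], hi_loss[:K]])

# np.unique de-duplicates but returns the indices in ascending order
sorted_focus = np.unique(candidates)

# Re-rank the unique indices by loss so that position 0 is the worst sample
ranked_focus = sorted_focus[np.argsort(-loss_per_sample[sorted_focus])]

print(sorted_focus.tolist())  # [1, 3, 5]
print(ranked_focus.tolist())  # [3, 5, 1]
```

With the re-ranked array, indexing position 0 selects the sample with the highest loss, which matches the stated objective of explaining the worst prediction.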


In [7]:
# Use the tokenizer to define a text masker
masker = shap.maskers.Text(tokenizer)

# Create the explainer (uses model via predict_proba)
explainer = shap.Explainer(predict_proba, masker)

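# Take the first sample in the focus set and split it on whitespace.
# Note: each word then becomes a separate input, so SHAP explains every word in
# isolation; pass [focus_texts[0]] instead to attribute the full text in context.
batch = list(focus_texts[0].split())
print(batch)

# Generate SHAP values
sv = explainer(batch)

# Visualize the explanations
shap.plots.text(sv)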

['friday', 'night', 'at', 'the', 'state', 'fair', 'been', 'here', 'for', 'hours', 'and', 'it', 'is', 'dead.', 'no', 'one', 'has', 'disposable', 'income', 'anymore', 'yea', 'we', 'wanted', 'to', 'go,', 'family', 'of', '3.', 'but', 'tickets', '1', 'ride', 'band', 'and', 'food', 'and', '$300', 'later.', '<cmt>', 'these', 'things', 'have', 'become', 'luxury', 'items', 'in', "today's", 'economy.', 'better', 'to', 'sit', 'at', 'home', 'and', 'stare', 'at', 'the', 'walls', 'cuz', 'cable', 'and', 'internet', 'are', 'expenise', 'too.', '$340', 'a', 'month.', "i'm", 'getting', 'rid', 'of', 'both.', '<cmt>', 'which', 'state', 'fair?']


PartitionExplainer explainer: 78it [00:44,  1.42it/s]                        
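
To make "token contribution" concrete, the sketch below computes exact Shapley values for a three-token toy example by averaging each token's marginal effect over every ordering. It is an illustration of the underlying idea only: `toy_score` is a hypothetical stand-in for a class probability, its numbers are invented, and SHAP's Partition explainer uses a hierarchical approximation rather than this full enumeration.

```python
from itertools import permutations

tokens = ["no", "disposable", "income"]

def toy_score(present):
    """Hypothetical stand-in for a class probability given the unmasked tokens."""
    score = 0.2  # baseline with every token masked
    if "disposable" in present:
        score += 0.3
    if "income" in present:
        score += 0.1
    if "disposable" in present and "income" in present:
        score += 0.2  # interaction between the two words
    return score

def shapley_values(players, value_fn):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        present = set()
        for p in order:
            before = value_fn(present)
            present.add(p)
            totals[p] += value_fn(present) - before
    return {p: total / len(orderings) for p, total in totals.items()}

phi = shapley_values(tokens, toy_score)
for token, value in phi.items():
    print(f"{token}: {value:.2f}")
```

The interaction bonus is shared between "disposable" and "income", and the three values sum to the difference between the score with all tokens present and the fully masked baseline. Explaining words one at a time cannot capture such interactions, which is why attributing the full text in context is generally more informative.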
