# Mental‑Health Chatbot: Happy Brain

# Mental Health Chatbot: Training & Deployment Notebook

## Project Objective

This project aims to build an emotion-aware mental health chatbot that is both conversational and empathetic. The system leverages:
- **Emotion Classification:** Fine-tuned using `SamLowe/roberta-base-go_emotions` to detect multi-label emotions in user inputs.
- **Text Generation:** Two T5 models are fine-tuned for:
  - **Response Generation:** To generate supportive, therapist-style responses.
  - **Question-Answering (QA):** To provide factual answers when questions are asked.
- **Integrated Pipeline:** A routing mechanism that chooses the proper response mode (QA vs. supportive response) based on detected emotions and input type.
- **Deployment:** A Gradio interface enabling voice and text-based interactions.

## Notebook Structure and Sections

1. **Library Imports & Environment Setup**  
   Sets up all the necessary imports from standard libraries, data handling, PyTorch, Hugging Face, and Gradio.

2. **Data Loading & Preprocessing**  
   - Loads multiple mental health-related CSV datasets.
   - Performs cleaning and mapping to uniform "question"/"answer" columns.
   - Splits the combined dataset into training and testing sets.

3. **Multi-Label Emotion Annotation & Custom Trainer Setup**  
   - Processes multi-label emotion annotations using `MultiLabelBinarizer`.
   - Defines custom data collators and trainers to support binary cross-entropy loss for multi-label classification.
   - Tokenizes and prepares data for the emotion classifier model.

4. **Model Training: Emotion Classification**  
   - Fine-tunes the `SamLowe/roberta-base-go_emotions` model on the processed data.
   - Logs evaluation metrics such as micro F1, precision, recall, and subset accuracy.
   - Saves the fine-tuned emotion classifier model.

5. **Model Training: T5 for Response Generation**  
   - Constructs input/target pairs for response generation (user input -> supportive response).
   - Fine-tunes a T5 model (e.g., `t5-small`) with evaluation metrics like ROUGE and BERTScore.
   - Saves the fine-tuned T5 response generator model.

6. **Model Training: T5 for Question-Answering (QA)**  
   - Constructs QA pairs with a specific prompt format.
   - Fine-tunes a separate T5 model for QA tasks.
   - Saves the fine-tuned T5 QA model.

7. **Model Deployment Setup with Combined Metadata**  
   - Saves a combined metadata file (`combined_model_metadata.pt`) that stores the paths to all three fine-tuned models.
   - Updates the loading code to reference this metadata file, ensuring that future training runs build on previous improvements rather than overwriting models.

8. **Unified Pipeline & Gradio Interface**  
   - Integrates the emotion classifier, T5 response generator, and T5 QA models into a single pipeline.
   - Uses helper functions (for prompt formatting, emotion detection, audio transcription, etc.) to generate responses.
   - Provides a fully-functional Gradio interface supporting both text and voice inputs with options for language, chat history, and conversation logging.

9. **Evaluation**  
   - Evaluates the emotion classifier using metrics such as micro F1, precision, recall, and subset accuracy.
   - Evaluates the generation models (both response generation and QA) using metrics including ROUGE, BERTScore, perplexity, and loss.
   - Outputs formatted evaluation results for easy monitoring.

## Summary

This notebook offers a complete end-to-end pipeline—from data preprocessing and model fine-tuning to evaluation and interactive deployment via Gradio—for an emotion-aware mental health chatbot. The modular structure ensures that each component (emotion detection, response generation, QA) is individually optimized, and a combined metadata system supports incremental training and easy deployment.

## 1. Imports & Configuration

In [85]:
# ================================
# Standard Library Imports
# ================================
import os
import glob
import json
import time
import random
import itertools
import tempfile
import datetime
import logging
import warnings
from pathlib import Path
from dataclasses import dataclass
import pprint  # for pretty-printing outputs

# ================================
# Scientific & Data Libraries
# ================================
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# ================================
# Audio Processing
# ================================
from pydub import AudioSegment
import speech_recognition as sr

# ================================
# NLP & Transformers (Hugging Face)
# ================================
from datasets import load_dataset, Dataset, concatenate_datasets
import evaluate
import gradio as gr

# Tokenizers and Models
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    T5ForConditionalGeneration,
)

# Training Tools
from transformers import (
    Trainer,
    TrainingArguments,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrainerCallback,
    DataCollatorWithPadding,
    DataCollatorForSeq2Seq,
    default_data_collator,
)

# Logging
from transformers import logging as hf_logging

# Other
import math
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from torch.utils.data import DataLoader


In [86]:
# Set up logging

hf_logging.set_verbosity_info()                          
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.INFO)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")

# Reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if device == "cuda":
    torch.cuda.manual_seed_all(SEED)

# Ensure save directories exist
SAVE_ROOT = Path("./saved_models")
for sub in ["emotion_classifier", "t5_response_generator", "t5_qa", "final_combined"]:
    (SAVE_ROOT / sub).mkdir(parents=True, exist_ok=True)

Running on: cuda


## Data Loading and Preprocessing

In [87]:
# Toggle individual CSVs and provide their column mapping
# Format: name: (enabled, path, question_col, answer_col)
DATASETS = {
    "ds1": (True,  "./data/ds1_transformed_mental_health_chatbot_dataset.csv",  "question", "answer"),
    "ds2": (False, "./data/ds2_transformed_mental_health_chatbot.csv",         "question", "answer"),
    "ds3": (False, "./data/ds3_mental_health_faq_cleaned.csv",                 "Question", "Answer"),
    "ds4": (False, "./data/ds4_mental_health_chatbot_dataset_merged_modes.csv","prompt",   "response"),
    "ds5": (False, "./data/ds5_Mental_Health_FAQ.csv",                         "Question", "Answer"),
    "ds6": (False, "./data/ds6_mental_health_counseling.csv",                  "query",    "completion"),
}

In [88]:
# Robust cleaner that auto-maps columns to 'question' / 'answer'
def load_and_clean(path, q_col, a_col):
    df = pd.read_csv(path)

    # Normalize headers
    df.columns = [c.lower().strip() for c in df.columns]
    q_col = q_col.lower().strip()
    a_col = a_col.lower().strip()

    # Common renames
    rename_map = {
        "prompt": "question",
        "response": "answer",
        "questions": "question",
        "answers": "answer",
    }
    df = df.rename(columns=rename_map)

    # If provided cols exist, rename them to standard names
    if q_col in df.columns:
        df = df.rename(columns={q_col: "question"})
    if a_col in df.columns:
        df = df.rename(columns={a_col: "answer"})

    # Try to map 'context' -> 'question' if needed
    if "question" not in df.columns and "context" in df.columns:
        df = df.rename(columns={"context": "question"})

    # Verify if necessary columns exist
    if not {"question", "answer"}.issubset(df.columns):
        raise ValueError(f"Could not find 'question'/'answer' in {path}. Available columns: {list(df.columns)}")

    # Retain only required columns, drop missing values and duplicates
    df = df[["question", "answer"]].dropna()
    df["question"] = df["question"].astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    df["answer"]   = df["answer"].astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    df = df.drop_duplicates()

    # Convert to Hugging Face Dataset
    return Dataset.from_pandas(df.reset_index(drop=True))

In [89]:
# Load enabled datasets and create a unified dataset
datasets = []
for key, (enabled, path, q_col, a_col) in DATASETS.items():
    if enabled:
        print(f"Loading dataset '{key}' from {path} ...")
        ds = load_and_clean(path, q_col, a_col)
        print(f"Loaded {len(ds)} examples from '{key}'.")
        datasets.append(ds)
    else:
        print(f"Skipping dataset '{key}' as its toggle is off.")

Loading dataset 'ds1' from ./data/ds1_transformed_mental_health_chatbot_dataset.csv ...
Loaded 172 examples from 'ds1'.
Skipping dataset 'ds2' as its toggle is off.
Skipping dataset 'ds3' as its toggle is off.
Skipping dataset 'ds4' as its toggle is off.
Skipping dataset 'ds5' as its toggle is off.
Skipping dataset 'ds6' as its toggle is off.


In [90]:
if datasets:
    combined_dataset = concatenate_datasets(datasets)
    print(f"\nCombined dataset contains {len(combined_dataset)} examples.")
else:
    raise ValueError("No datasets enabled. Please enable at least one dataset in DATASETS.")

# Shuffle and split into training and testing datasets (e.g., 90% train, 10% test)
combined_dataset = combined_dataset.shuffle(seed=42)
split_dataset = combined_dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = split_dataset['train']
test_dataset = split_dataset['test']
print(f"Training set: {len(train_dataset)} examples, Testing set: {len(test_dataset)} examples.")


Combined dataset contains 172 examples.
Training set: 154 examples, Testing set: 18 examples.


### Multi-Label Emotion Annotation & Custom Trainer Setup

In [91]:
# ================================
# Multi-label Emotion Annotation & Trainer Setup
# ================================

# Set a seed if not already defined
SEED = 42

# -------------------------------
# Data Collator: casts labels to float32
# -------------------------------
def float_label_collator(features):
    """
    Wrap the default HF collator but cast the `labels` tensor to float32
    so BCEWithLogitsLoss gets the right dtype.
    """
    batch = default_data_collator(features)
    if "labels" in batch:
        batch["labels"] = batch["labels"].to(torch.float32)
    # Uncomment next line to print label info during debugging
    # print("collator labels dtype/shape:", batch["labels"].dtype, batch["labels"].shape)
    return batch

In [92]:
# -------------------------------
# Custom Trainer for Multi-Label Classification
# -------------------------------
class MultiLabelTrainer(Trainer):
    """
    Custom Trainer that computes loss using binary cross‑entropy with logits.
    This ensures multi‑label targets (e.g., emotions) are correctly processed.
    """
    def compute_loss(self, model, inputs, return_outputs: bool = False, **kwargs):
        labels = inputs.pop("labels").float()
        outputs = model(**inputs)
        logits = outputs.logits

        # If labels' shape does not match logits, reshape to match
        if labels.shape != logits.shape:
            labels = labels.view_as(logits)
        
        loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="mean")
        return (loss, outputs) if return_outputs else loss

In [93]:
# -------------------------------
# Callback: Print training progress (optional)
# -------------------------------
class StepPrinter(TrainerCallback):
    """
    A Trainer callback that prints step-wise loss and evaluation metrics
    while keeping the tqdm progress bar.
    """
    def on_log(self, args, state, control, logs=None, **kwargs):
        if not logs or not state.is_local_process_zero:
            return
        if "loss" in logs:
            print(f"Step {state.global_step:>6} • loss {logs['loss']:.4f}")
        if "eval_loss" in logs:
            metric = logs.get("micro_f1") or logs.get("bertscore_f1") or logs.get("rougeL")
            metric_str = f" • metric {metric:.4f}" if metric is not None else ""
            print(f"Epoch {int(state.epoch)}/{int(args.num_train_epochs)}"
                  f" • eval_loss {logs['eval_loss']:.4f}{metric_str}")

In [94]:
# -------------------------------
# Define Emotion Labels (28 total: 27 + neutral)
# -------------------------------
GO_EMOTION_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring',
    'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval',
    'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief',
    'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
    'remorse', 'sadness', 'surprise', 'neutral'
]
num_labels = len(GO_EMOTION_LABELS)

In [95]:
# -------------------------------
# Annotate each example with multi-label emotion annotations.
# For this demo, we simulate emotion annotations: if an example has no emotion,
# we default to "neutral". Replace this with your real emotion annotations if available.
# -------------------------------
def annotate_emotions(example):
    emos = example.get("emotions", [])
    
    # If no explicit emotion annotations, default to ["neutral"]
    if not emos:
        emos = ["neutral"]
    
    # Save original emotions for visibility (optional)
    example["emotions"] = emos
    
    # Create a binary label vector for each of the 28 emotions
    example["labels"] = [1.0 if lbl in emos else 0.0 for lbl in GO_EMOTION_LABELS]
    return example

In [96]:
# -------------------------------
# Helper to retrieve input text from an example (for debugging)
# -------------------------------
def get_input_text(example):
    return example.get("text") or example.get("question") or "[NO TEXT FOUND]"

# -------------------------------
# (Re‑)load datasets for emotion annotation if not already loaded.
# -------------------------------
if 'train_dataset' not in globals() or 'test_dataset' not in globals():
    print("Reloading datasets for emotion annotation ...")
    datasets_list = []
    for name, (enabled, path, q_col, a_col) in DATASETS.items():
        if not enabled:
            continue
        ds = load_and_clean(path=path, q_col=q_col, a_col=a_col)
        datasets_list.append(ds)
    if not datasets_list:
        print("No datasets were enabled, using a fallback test dataset.")
        fallback_data = {
            "text": [
                "How are you?",
                "I feel really down today.",
                "I'm so happy with my progress!",
                "Why does nobody understand me?",
                "I'm feeling anxious about school.",
                "Life is good lately.",
                "Sometimes I just want to cry.",
                "Everything is falling apart.",
                "I’m grateful for my therapist.",
                "Can someone please just listen?"
            ]
        }
        ds = Dataset.from_dict(fallback_data)
        datasets_list.append(ds)
    full_ds = concatenate_datasets(datasets_list) if len(datasets_list) > 1 else datasets_list[0]
    full_ds = full_ds.shuffle(seed=SEED)
    split = full_ds.train_test_split(test_size=0.1, seed=SEED)
    train_dataset, test_dataset = split["train"], split["test"]

print(f"Train dataset: {len(train_dataset):,} examples • Test dataset: {len(test_dataset):,} examples")

Train dataset: 154 examples • Test dataset: 18 examples


In [97]:
# -------------------------------
# Map emotion annotations onto training and testing datasets
# -------------------------------
emo_train = train_dataset.map(annotate_emotions)
emo_test  = test_dataset.map(annotate_emotions)

print("Sample annotation:")
print(get_input_text(emo_train[0]), "->", emo_train[0]["emotions"])

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Sample annotation:
How does someone acquire a mental illness? -> ['neutral']


In [98]:
# -------------------------------
# Filter out any examples without at least one positive label (shouldn't happen after defaulting to neutral)
# -------------------------------
def has_nonzero_labels(example):
    return sum(example["labels"]) > 0

emo_train = emo_train.filter(has_nonzero_labels)
emo_test = emo_test.filter(has_nonzero_labels)

# Check for any problematic labels
bad_labels = [ex for ex in emo_train if "labels" not in ex or sum(ex["labels"]) == 0]
print("Number of examples with problematic labels:", len(bad_labels))
if bad_labels:
    print("Example with problematic labels:", bad_labels[0])

Filter:   0%|          | 0/154 [00:00<?, ? examples/s]

Filter:   0%|          | 0/18 [00:00<?, ? examples/s]

Number of examples with problematic labels: 0


In [99]:
# -------------------------------
# Load the tokenizer for emotion classification.
# We'll be using the 'SamLowe/roberta-base-go_emotions' tokenizer.
# -------------------------------
emo_tokenizer = AutoTokenizer.from_pretrained("SamLowe/roberta-base-go_emotions")

# Print columns before renaming for clarity
print("Before column rename:", emo_train.column_names)
# Rename 'question' to 'text' if needed for tokenization
if "question" in emo_train.column_names:
    emo_train = emo_train.rename_column("question", "text")
if "question" in emo_test.column_names:
    emo_test = emo_test.rename_column("question", "text")
print("After column rename:", emo_train.column_names)

loading file vocab.json from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\vocab.json
loading file merges.txt from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\merges.txt
loading file tokenizer.json from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\special_tokens_map.json
loading file tokenizer_config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\tokenizer_config

Before column rename: ['question', 'answer', 'emotions', 'labels']
After column rename: ['text', 'answer', 'emotions', 'labels']


In [100]:
# -------------------------------
# Tokenization: tokenize using the 'text' column.
# -------------------------------
def emo_tokenize(batch):
    return emo_tokenizer(batch["text"], padding=True, truncation=True)

In [101]:
# -------------------------------
# Function to cast labels to float32 using numpy
# -------------------------------
def cast_to_float(example):
    example["labels"] = np.array(example["labels"], dtype=np.float32)
    return example

In [102]:
# Tokenize and cast labels for both training and testing sets (batched processing)
emo_train_tok = emo_train.map(emo_tokenize, batched=True).map(cast_to_float)
emo_test_tok  = emo_test.map(emo_tokenize, batched=True).map(cast_to_float)

# Set the dataset format for PyTorch
emo_train_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
emo_test_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print(f"Tokenized training examples: {len(emo_train_tok)} • Tokenized testing examples: {len(emo_test_tok)}")


Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Tokenized training examples: 154 • Tokenized testing examples: 18


## 4. Train Emotion Classifier (RoBERTa)

In [103]:
# Before training, check if a fine-tuned model exists and load it.

if os.path.isdir(SAVE_ROOT / "emotion_classifier"):
    print("Loading previously fine-tuned emotion model...")
    emo_model = AutoModelForSequenceClassification.from_pretrained(SAVE_ROOT / "emotion_classifier").to(device)
else:
    print("Loading base emotion model...")
    emo_model = AutoModelForSequenceClassification.from_pretrained(
        "SamLowe/roberta-base-go_emotions",
        problem_type="multi_label_classification",
        num_labels=len(GO_EMOTION_LABELS)
    ).to(device)

loading configuration file saved_models\emotion_classifier\config.json
Model config RobertaConfig {
  "_name_or_path": "saved_models\\emotion_classifier",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "admiration",
    "1": "amusement",
    "2": "anger",
    "3": "annoyance",
    "4": "approval",
    "5": "caring",
    "6": "confusion",
    "7": "curiosity",
    "8": "desire",
    "9": "disappointment",
    "10": "disapproval",
    "11": "disgust",
    "12": "embarrassment",
    "13": "excitement",
    "14": "fear",
    "15": "gratitude",
    "16": "grief",
    "17": "joy",
    "18": "love",
    "19": "nervousness",
    "20": "optimism",
    "21": "pride",
    "22": "realization",
    "23": "relief",
    "24": "remorse",
    "25": "sadness",
    "26": "surpri

Loading previously fine-tuned emotion model...


In [104]:
# ===============================================
# Model Training: Emotion Classification with RoBERTa
# ===============================================

# Define device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Training device:", device)

Training device: cuda


In [105]:
# -------------------------------
# Tokenization for Model Training: Use fixed max_length
# -------------------------------
def emo_tokenize(batch):
    return emo_tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

In [106]:
# Float conversion: ensures labels become float32 arrays.
def cast_to_float(example):
    example["labels"] = np.array(example["labels"], dtype=np.float32)
    return example

In [107]:
# Re-tokenize the datasets using the new tokenization function
emo_train_tok = emo_train.map(emo_tokenize, batched=True).map(cast_to_float)
emo_test_tok  = emo_test.map(emo_tokenize, batched=True).map(cast_to_float)

# Set format to PyTorch tensors
emo_train_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
emo_test_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print(f"After tokenization: {len(emo_train_tok)} training examples; {len(emo_test_tok)} testing examples.")

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

After tokenization: 154 training examples; 18 testing examples.


In [108]:
# -------------------------------
# Load the pre-trained emotion classification model
# -------------------------------
emo_model = AutoModelForSequenceClassification.from_pretrained(
    "SamLowe/roberta-base-go_emotions",
    problem_type="multi_label_classification",
    num_labels=num_labels,  # equals len(GO_EMOTION_LABELS)
).to(device)
print("Loaded model:", emo_model.config._name_or_path)

loading configuration file config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--SamLowe--roberta-base-go_emotions\snapshots\58b6c5b44a7a12093f782442969019c7e2982299\config.json
Model config RobertaConfig {
  "_name_or_path": "SamLowe/roberta-base-go_emotions",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "admiration",
    "1": "amusement",
    "2": "anger",
    "3": "annoyance",
    "4": "approval",
    "5": "caring",
    "6": "confusion",
    "7": "curiosity",
    "8": "desire",
    "9": "disappointment",
    "10": "disapproval",
    "11": "disgust",
    "12": "embarrassment",
    "13": "excitement",
    "14": "fear",
    "15": "gratitude",
    "16": "grief",
    "17": "joy",
    "18": "love",
    "19": "nervousness",
    "20": "optimism"

Loaded model: SamLowe/roberta-base-go_emotions


In [109]:
# -------------------------------
# Define Evaluation Metrics for Emotion Classification
# -------------------------------
def compute_emo_metrics(pred):
    logits, labels = pred
    # Compute probabilities using sigmoid on logits
    probs = torch.sigmoid(torch.tensor(logits))
    # Apply threshold (0.3) to decide positive labels
    preds = (probs > 0.3).int().numpy()
    labels = np.array(labels)

    # Defensive check: only consider rows with at least one positive label
    mask = labels.sum(axis=1) > 0
    if mask.sum() == 0:
        print("Warning: all evaluation labels are empty")
        return {"micro_f1": 0.0}

    try:
        f1 = f1_score(labels[mask], preds[mask], average="micro", zero_division=0)
    except ValueError as e:
        print("Metric error:", e)
        f1 = 0.0

    return {"micro_f1": f1}

In [110]:
# -------------------------------
# Set Up Training Arguments
# -------------------------------
# Define a root folder for saving model checkpoints if not defined already
from pathlib import Path
SAVE_ROOT = Path("./saved_models")
emo_args = TrainingArguments(
    output_dir=str(SAVE_ROOT / "emotion_classifier"),
    
    # Logging & Reporting
    logging_strategy="steps",
    logging_steps=10,
    logging_dir="./logs",
    report_to="none",
    
    # Training hyper‑parameters
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    
    # Evaluation and checkpointing settings
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="micro_f1",
    greater_is_better=True,
    
    seed=SEED,
)

PyTorch: setting up devices


In [111]:
# -------------------------------
# Instantiate the Custom Trainer for Multi-Label Classification
# -------------------------------
trainer_emo = MultiLabelTrainer(
    model=emo_model,
    args=emo_args,
    train_dataset=emo_train_tok,
    eval_dataset=emo_test_tok,
    tokenizer=emo_tokenizer,
    data_collator=float_label_collator,
    compute_metrics=compute_emo_metrics,
    callbacks=[StepPrinter],
)

In [112]:
# -------------------------------
# Start Training (uncomment the next line to train)
# -------------------------------
trainer_emo.train()

# -------------------------------
# Save the Best Model and Tokenizer
# -------------------------------
# Create the directory structure if it doesn't exist
SAVE_ROOT.mkdir(exist_ok=True, parents=True)
(SAVE_ROOT / "emotion_classifier").mkdir(exist_ok=True, parents=True)

emo_model.save_pretrained(SAVE_ROOT / "emotion_classifier")
emo_tokenizer.save_pretrained(SAVE_ROOT / "emotion_classifier")

# Also save the trainer's final model state
trainer_emo.save_model()
print("Emotion classifier model and tokenizer saved to", SAVE_ROOT / "emotion_classifier")


The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text, emotions, answer. If text, emotions, answer are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 154
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 30
  Number of trainable parameters = 124,667,164


Epoch,Training Loss,Validation Loss,Micro F1
1,0.0296,0.003451,1.0
2,0.0036,0.002816,1.0
3,0.0032,0.002669,1.0


The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text, emotions, answer. If text, emotions, answer are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 18
  Batch size = 16


Step     10 • loss 0.0296
Epoch 1/3 • eval_loss 0.0035


Saving model checkpoint to saved_models\emotion_classifier\checkpoint-10
Configuration saved in saved_models\emotion_classifier\checkpoint-10\config.json
Model weights saved in saved_models\emotion_classifier\checkpoint-10\model.safetensors
tokenizer config file saved in saved_models\emotion_classifier\checkpoint-10\tokenizer_config.json
Special tokens file saved in saved_models\emotion_classifier\checkpoint-10\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text, emotions, answer. If text, emotions, answer are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 18
  Batch size = 16


Step     20 • loss 0.0036
Epoch 2/3 • eval_loss 0.0028


Saving model checkpoint to saved_models\emotion_classifier\checkpoint-20
Configuration saved in saved_models\emotion_classifier\checkpoint-20\config.json
Model weights saved in saved_models\emotion_classifier\checkpoint-20\model.safetensors
tokenizer config file saved in saved_models\emotion_classifier\checkpoint-20\tokenizer_config.json
Special tokens file saved in saved_models\emotion_classifier\checkpoint-20\special_tokens_map.json
Saving model checkpoint to saved_models\emotion_classifier\checkpoint-30
Configuration saved in saved_models\emotion_classifier\checkpoint-30\config.json


Step     30 • loss 0.0032


Model weights saved in saved_models\emotion_classifier\checkpoint-30\model.safetensors
tokenizer config file saved in saved_models\emotion_classifier\checkpoint-30\tokenizer_config.json
Special tokens file saved in saved_models\emotion_classifier\checkpoint-30\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text, emotions, answer. If text, emotions, answer are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 18
  Batch size = 16


Epoch 3/3 • eval_loss 0.0027


Saving model checkpoint to saved_models\emotion_classifier\checkpoint-30
Configuration saved in saved_models\emotion_classifier\checkpoint-30\config.json
Model weights saved in saved_models\emotion_classifier\checkpoint-30\model.safetensors
tokenizer config file saved in saved_models\emotion_classifier\checkpoint-30\tokenizer_config.json
Special tokens file saved in saved_models\emotion_classifier\checkpoint-30\special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from saved_models\emotion_classifier\checkpoint-10 (score: 1.0).
Configuration saved in saved_models\emotion_classifier\config.json
Model weights saved in saved_models\emotion_classifier\model.safetensors
tokenizer config file saved in saved_models\emotion_classifier\tokenizer_config.json
Special tokens file saved in saved_models\emotion_classifier\special_tokens_map.json
Saving model checkpoint to saved_models\emotion_classifier
Configuration saved in

Emotion classifier model and tokenizer saved to saved_models\emotion_classifier


## 5. Train T5 for Response Generation

In [113]:
# Before training, check if a fine-tuned model exists and load it.

if os.path.isdir(SAVE_ROOT / "t5_response_generator"):
    print("Loading previously fine-tuned T5 response generator model...")
    resp_model = T5ForConditionalGeneration.from_pretrained(SAVE_ROOT / "t5_response_generator").to(device)
else:
    print("Loading base T5 response generator model...")
    resp_model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)

loading configuration file saved_models\t5_response_generator\config.json
Model config T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
   

Loading previously fine-tuned T5 response generator model...


All model checkpoint weights were used when initializing T5ForConditionalGeneration.

All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at saved_models\t5_response_generator.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
loading configuration file saved_models\t5_response_generator\generation_config.json
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}



In [114]:
# ===============================================
# Train T5 for Response Generation
# ===============================================
# Build input/target pairs: user text -> helpful response
# For now we use 'question' as input and 'answer' as target.
def build_t5_pairs(example):
    # Retrieve the question from either "question" or "text" columns,
    # and the corresponding answer from either "answer" or "response" columns.
    question = example.get("question") or example.get("text") or ""
    answer = example.get("answer") or example.get("response") or ""
    example["input_text"] = "respond: " + question
    example["target_text"] = answer
    return example

In [115]:
# Map the original training and testing datasets to build T5 pairs.
# We're using the original QA datasets (train_dataset and test_dataset) from our data-loading cell.
resp_train = train_dataset.map(build_t5_pairs)
resp_test  = test_dataset.map(build_t5_pairs)

# Load T5 tokenizer and model.
t5_resp_model_name = "t5-small"
tokenizer_t5 = AutoTokenizer.from_pretrained(t5_resp_model_name)

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

loading file spiece.model from cache at C:\Users\mward\.cache\huggingface\hub\models--t5-small\snapshots\df1b051c49625cf57a3d0d8d3863ed4d13564fe4\spiece.model
loading file tokenizer.json from cache at C:\Users\mward\.cache\huggingface\hub\models--t5-small\snapshots\df1b051c49625cf57a3d0d8d3863ed4d13564fe4\tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--t5-small\snapshots\df1b051c49625cf57a3d0d8d3863ed4d13564fe4\tokenizer_config.json
loading file chat_template.jinja from cache at None


In [116]:
# Tokenize the input and target texts for T5.
def t5_tokenize(batch):
    # Tokenize the input text.
    model_inputs = tokenizer_t5(batch["input_text"], max_length=128, truncation=True)
    # Prepare target text tokenization; this context manager sets the tokenizer into target mode.
    with tokenizer_t5.as_target_tokenizer():
        labels = tokenizer_t5(batch["target_text"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

In [117]:
# Tokenize the training and testing T5 pairs.
resp_train_tok = resp_train.map(t5_tokenize, batched=True, remove_columns=resp_train.column_names)
resp_test_tok  = resp_test.map(t5_tokenize, batched=True, remove_columns=resp_test.column_names)

# Set datasets' format to output PyTorch tensors.
resp_train_tok.set_format("torch")
resp_test_tok.set_format("torch")

print(f"Tokenized training examples: {len(resp_train_tok)}; Tokenized testing examples: {len(resp_test_tok)}")

# Instantiate the T5 model for conditional generation and send it to the appropriate device.
resp_model = T5ForConditionalGeneration.from_pretrained(t5_resp_model_name).to(device)
print("Loaded T5 model:", t5_resp_model_name)

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Tokenized training examples: 154; Tokenized testing examples: 18


loading configuration file config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--t5-small\snapshots\df1b051c49625cf57a3d0d8d3863ed4d13564fe4\config.json
Model config T5Config {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size":

Loaded T5 model: t5-small


In [118]:
# -----------------------------------------------
# Define evaluation metrics: ROUGE and BERTScore
# -----------------------------------------------
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

def compute_resp_metrics(eval_pred):
    preds, labels = eval_pred
    # Replace -100 in the labels (which are ignored) with the pad_token_id so they can be decoded.
    labels = np.where(labels != -100, labels, tokenizer_t5.pad_token_id)
    
    # Decode predictions and labels.
    decoded_preds = tokenizer_t5.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer_t5.batch_decode(labels, skip_special_tokens=True)
    
    # Compute ROUGE.
    r = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Compute BERTScore.
    b = bertscore.compute(predictions=decoded_preds, references=decoded_labels, lang="en")
    
    return {
        "rougeL": r["rougeL"],
        "bertscore_f1": np.mean(b["f1"])
    }

In [119]:
# -----------------------------------------------
# Set up the Seq2Seq training arguments.
# -----------------------------------------------
resp_args = Seq2SeqTrainingArguments(
    output_dir=str(SAVE_ROOT / "t5_response_generator"),
    
    # Logging configuration.
    logging_strategy="steps",
    logging_steps=10,
    logging_dir="./logs",
    report_to="none",
    
    # Core hyper‑parameters.
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-4,
    num_train_epochs=3,
    
    # Evaluation and checkpointing.
    evaluation_strategy="epoch",
    save_strategy="epoch",
    predict_with_generate=True,
    seed=SEED,
)

PyTorch: setting up devices


In [120]:
# -----------------------------------------------
# Create a data collator for T5 using Hugging Face's helper.
# -----------------------------------------------
from transformers import DataCollatorForSeq2Seq
data_collator = DataCollatorForSeq2Seq(tokenizer_t5, model=resp_model)

# Instantiate the Seq2Seq trainer with our model, tokenized data, metrics, and callbacks.
trainer_resp = Seq2SeqTrainer(
    model=resp_model,
    args=resp_args,
    train_dataset=resp_train_tok,
    eval_dataset=resp_test_tok,
    tokenizer=tokenizer_t5,
    data_collator=data_collator,
    compute_metrics=compute_resp_metrics,
    callbacks=[StepPrinter],
)

In [121]:
# -----------------------------------------------
# Begin training.
# -----------------------------------------------
# Uncomment the line below to start training. Note that training may take a while.
trainer_resp.train()

***** Running training *****
  Num examples = 154
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 60
  Number of trainable parameters = 60,506,624


Epoch,Training Loss,Validation Loss,Rougel,Bertscore F1
1,3.307,2.634834,0.036774,0.190253
2,3.0332,2.564415,0.0465,0.245034
3,3.0034,2.540509,0.039732,0.190298


Step     10 • loss 3.9551



***** Running Evaluation *****
  Num examples = 18
  Batch size = 8


Step     20 • loss 3.3070


INFO:absl:Using default tokenizer.
loading configuration file config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--roberta-large\snapshots\722cf37b1afa9454edce342e7895e588b6ff1d59\config.json
Model config RobertaConfig {
  "_name_or_path": "roberta-large",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.49.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

loading file vocab.json from cache at C:\Users\mward\.cache\huggingface\hub\models--roberta-large\snapshots\722cf37

Epoch 1/3 • eval_loss 2.6348


Saving model checkpoint to saved_models\t5_response_generator\checkpoint-20
Configuration saved in saved_models\t5_response_generator\checkpoint-20\config.json
Configuration saved in saved_models\t5_response_generator\checkpoint-20\generation_config.json
Model weights saved in saved_models\t5_response_generator\checkpoint-20\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\checkpoint-20\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\checkpoint-20\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\checkpoint-20\spiece.model


Step     30 • loss 3.0956



***** Running Evaluation *****
  Num examples = 18
  Batch size = 8


Step     40 • loss 3.0332


INFO:absl:Using default tokenizer.


Epoch 2/3 • eval_loss 2.5644


Saving model checkpoint to saved_models\t5_response_generator\checkpoint-40
Configuration saved in saved_models\t5_response_generator\checkpoint-40\config.json
Configuration saved in saved_models\t5_response_generator\checkpoint-40\generation_config.json
Model weights saved in saved_models\t5_response_generator\checkpoint-40\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\checkpoint-40\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\checkpoint-40\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\checkpoint-40\spiece.model


Step     50 • loss 2.9013


Saving model checkpoint to saved_models\t5_response_generator\checkpoint-60
Configuration saved in saved_models\t5_response_generator\checkpoint-60\config.json
Configuration saved in saved_models\t5_response_generator\checkpoint-60\generation_config.json


Step     60 • loss 3.0034


Model weights saved in saved_models\t5_response_generator\checkpoint-60\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\checkpoint-60\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\checkpoint-60\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\checkpoint-60\spiece.model

***** Running Evaluation *****
  Num examples = 18
  Batch size = 8
INFO:absl:Using default tokenizer.


Epoch 3/3 • eval_loss 2.5405


Saving model checkpoint to saved_models\t5_response_generator\checkpoint-60
Configuration saved in saved_models\t5_response_generator\checkpoint-60\config.json
Configuration saved in saved_models\t5_response_generator\checkpoint-60\generation_config.json
Model weights saved in saved_models\t5_response_generator\checkpoint-60\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\checkpoint-60\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\checkpoint-60\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\checkpoint-60\spiece.model


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=60, training_loss=3.215902900695801, metrics={'train_runtime': 21.3603, 'train_samples_per_second': 21.629, 'train_steps_per_second': 2.809, 'total_flos': 3241859088384.0, 'train_loss': 3.215902900695801, 'epoch': 3.0})

In [122]:
# -----------------------------------------------
# Robustly save the trained T5 response generation model and tokenizer.
# -----------------------------------------------
(SAVE_ROOT / "t5_response_generator").mkdir(exist_ok=True, parents=True)
resp_model.save_pretrained(SAVE_ROOT / "t5_response_generator")
tokenizer_t5.save_pretrained(SAVE_ROOT / "t5_response_generator")
trainer_resp.save_model()  # Save trainer's model state.

print("T5 response generator model and tokenizer saved to", SAVE_ROOT / "t5_response_generator")

Configuration saved in saved_models\t5_response_generator\config.json
Configuration saved in saved_models\t5_response_generator\generation_config.json
Model weights saved in saved_models\t5_response_generator\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\spiece.model
Saving model checkpoint to saved_models\t5_response_generator
Configuration saved in saved_models\t5_response_generator\config.json
Configuration saved in saved_models\t5_response_generator\generation_config.json
Model weights saved in saved_models\t5_response_generator\model.safetensors
tokenizer config file saved in saved_models\t5_response_generator\tokenizer_config.json
Special tokens file saved in saved_models\t5_response_generator\special_tokens_map.json
Copy vocab file to saved_models\t5_response_generator\spiece.mode

T5 response generator model and tokenizer saved to saved_models\t5_response_generator


## Train T5 for Question‑Answer

In [123]:
# Before training, check if a fine-tuned model exists and load it.

if os.path.isdir(SAVE_ROOT / "t5_qa"):
    print("Loading previously fine-tuned T5 QA model...")
    qa_model = T5ForConditionalGeneration.from_pretrained(SAVE_ROOT / "t5_qa").to(device)
else:
    print("Loading base T5 QA model...")
    qa_model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)

loading configuration file saved_models\t5_qa\config.json
Model config T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en

Loading previously fine-tuned T5 QA model...


All model checkpoint weights were used when initializing T5ForConditionalGeneration.

All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at saved_models\t5_qa.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
loading configuration file saved_models\t5_qa\generation_config.json
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}



In [124]:
# ===============================================
# Train T5 for Question‑Answer
# ===============================================
print("Training T5 for the Question‑Answer task.")

# Load (or initialize) the QA model and tokenizer.
# Here we fine-tune the base T5 model (t5-small) for QA.
qa_model = T5ForConditionalGeneration.from_pretrained(t5_resp_model_name).to(device)
# Note: We're reusing tokenizer_t5 from the response generation section.

# Create a DataCollator for Seq2Seq tasks.
qa_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer_t5,
    model=qa_model,
    padding=True
)

loading configuration file config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--t5-small\snapshots\df1b051c49625cf57a3d0d8d3863ed4d13564fe4\config.json


Training T5 for the Question‑Answer task.


Model config T5Config {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
 

In [125]:
# Build QA pairs: "question: <text>" -> answer.
def build_qa_pairs(example):
    # Retrieve the question from 'question' (or 'text') and answer from 'answer' (or 'response').
    question = example.get("question") or example.get("text") or ""
    answer = example.get("answer") or example.get("response") or ""
    example["input_text"] = "question: " + question
    example["target_text"] = answer
    return example

In [126]:
# Map the QA pair-building function onto the training and testing datasets.
qa_train = train_dataset.map(build_qa_pairs)
qa_test  = test_dataset.map(build_qa_pairs)

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

In [127]:
# Tokenize the QA pairs.
def qa_tokenize(batch):
    # Tokenize the input text.
    model_inputs = tokenizer_t5(batch["input_text"], max_length=128, truncation=True)
    # Tokenize the target (answer) text.
    with tokenizer_t5.as_target_tokenizer():
        labels = tokenizer_t5(batch["target_text"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

In [128]:
qa_train_tok = qa_train.map(qa_tokenize, batched=True, remove_columns=qa_train.column_names)
qa_test_tok  = qa_test.map(qa_tokenize, batched=True, remove_columns=qa_test.column_names)

# Set the format for PyTorch tensors.
qa_train_tok.set_format("torch")
qa_test_tok.set_format("torch")

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

In [129]:
print(f"Tokenized QA training examples: {len(qa_train_tok)}; Tokenized QA testing examples: {len(qa_test_tok)}")

# Define Seq2Seq training arguments for the QA task.
qa_args = Seq2SeqTrainingArguments(
    output_dir=str(SAVE_ROOT / "t5_qa"),
    logging_strategy="steps",
    logging_steps=10,
    logging_dir="./logs",
    report_to="none",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-4,
    num_train_epochs=3,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    predict_with_generate=True,
    seed=SEED,
)

PyTorch: setting up devices


Tokenized QA training examples: 154; Tokenized QA testing examples: 18


In [130]:
# Instantiate the Seq2Seq trainer for the QA task.
trainer_qa = Seq2SeqTrainer(
    model=qa_model,
    args=qa_args,
    train_dataset=qa_train_tok,
    eval_dataset=qa_test_tok,
    tokenizer=tokenizer_t5,
    data_collator=qa_collator,
    compute_metrics=compute_resp_metrics,  # Reusing the same metrics function as for response generation.
    callbacks=[StepPrinter],
)

In [131]:
# Uncomment the following line to start training (training may take some time):
trainer_qa.train()

# -------------------------------
# Save the trained QA model and tokenizer
# -------------------------------
(SAVE_ROOT / "t5_qa").mkdir(exist_ok=True, parents=True)
qa_model.save_pretrained(SAVE_ROOT / "t5_qa")
tokenizer_t5.save_pretrained(SAVE_ROOT / "t5_qa")
trainer_qa.save_model()
tokenizer_t5.save_pretrained(qa_args.output_dir)

print("T5 QA model and tokenizer saved to", SAVE_ROOT / "t5_qa")

***** Running training *****
  Num examples = 154
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 60
  Number of trainable parameters = 60,506,624


Epoch,Training Loss,Validation Loss,Rougel,Bertscore F1
1,3.259,2.643346,0.141225,0.588208
2,3.0177,2.552439,0.150305,0.634685
3,3.0205,2.539231,0.159643,0.68306


Step     10 • loss 3.6964



***** Running Evaluation *****
  Num examples = 18
  Batch size = 8


Step     20 • loss 3.2590


INFO:absl:Using default tokenizer.


Epoch 1/3 • eval_loss 2.6433


Saving model checkpoint to saved_models\t5_qa\checkpoint-20
Configuration saved in saved_models\t5_qa\checkpoint-20\config.json
Configuration saved in saved_models\t5_qa\checkpoint-20\generation_config.json
Model weights saved in saved_models\t5_qa\checkpoint-20\model.safetensors
tokenizer config file saved in saved_models\t5_qa\checkpoint-20\tokenizer_config.json
Special tokens file saved in saved_models\t5_qa\checkpoint-20\special_tokens_map.json
Copy vocab file to saved_models\t5_qa\checkpoint-20\spiece.model


Step     30 • loss 3.0657



***** Running Evaluation *****
  Num examples = 18
  Batch size = 8


Step     40 • loss 3.0177


INFO:absl:Using default tokenizer.


Epoch 2/3 • eval_loss 2.5524


Saving model checkpoint to saved_models\t5_qa\checkpoint-40
Configuration saved in saved_models\t5_qa\checkpoint-40\config.json
Configuration saved in saved_models\t5_qa\checkpoint-40\generation_config.json
Model weights saved in saved_models\t5_qa\checkpoint-40\model.safetensors
tokenizer config file saved in saved_models\t5_qa\checkpoint-40\tokenizer_config.json
Special tokens file saved in saved_models\t5_qa\checkpoint-40\special_tokens_map.json
Copy vocab file to saved_models\t5_qa\checkpoint-40\spiece.model


Step     50 • loss 2.8962


Saving model checkpoint to saved_models\t5_qa\checkpoint-60
Configuration saved in saved_models\t5_qa\checkpoint-60\config.json
Configuration saved in saved_models\t5_qa\checkpoint-60\generation_config.json


Step     60 • loss 3.0205


Model weights saved in saved_models\t5_qa\checkpoint-60\model.safetensors
tokenizer config file saved in saved_models\t5_qa\checkpoint-60\tokenizer_config.json
Special tokens file saved in saved_models\t5_qa\checkpoint-60\special_tokens_map.json
Copy vocab file to saved_models\t5_qa\checkpoint-60\spiece.model

***** Running Evaluation *****
  Num examples = 18
  Batch size = 8
INFO:absl:Using default tokenizer.


Epoch 3/3 • eval_loss 2.5392


Saving model checkpoint to saved_models\t5_qa\checkpoint-60
Configuration saved in saved_models\t5_qa\checkpoint-60\config.json
Configuration saved in saved_models\t5_qa\checkpoint-60\generation_config.json
Model weights saved in saved_models\t5_qa\checkpoint-60\model.safetensors
tokenizer config file saved in saved_models\t5_qa\checkpoint-60\tokenizer_config.json
Special tokens file saved in saved_models\t5_qa\checkpoint-60\special_tokens_map.json
Copy vocab file to saved_models\t5_qa\checkpoint-60\spiece.model


Training completed. Do not forget to share your model on huggingface.co/models =)


Configuration saved in saved_models\t5_qa\config.json
Configuration saved in saved_models\t5_qa\generation_config.json
Model weights saved in saved_models\t5_qa\model.safetensors
tokenizer config file saved in saved_models\t5_qa\tokenizer_config.json
Special tokens file saved in saved_models\t5_qa\special_tokens_map.json
Copy vocab file to saved_models\t5_qa\spiece.model
Saving model checkpoin

T5 QA model and tokenizer saved to saved_models\t5_qa


## Model Deployment Setup with Combined Metadata

In [132]:
# --------------------------------------------
# Save Combined Model Metadata
# --------------------------------------------
metadata = {
    "emotion_classifier": str(SAVE_ROOT / "emotion_classifier"),
    "t5_response_generator": str(SAVE_ROOT / "t5_response_generator"),
    "t5_qa": str(SAVE_ROOT / "t5_qa")
}

metadata_path = SAVE_ROOT / "combined_model_metadata.pt"
torch.save(metadata, metadata_path)
print("Saved combined model metadata to:", metadata_path)


Saved combined model metadata to: saved_models\combined_model_metadata.pt


In [133]:
# --------------------------------------------
# Load Models Using Combined Metadata
# --------------------------------------------
metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

# Load Emotion Classification Model & Tokenizer from metadata
emo_model = AutoModelForSequenceClassification.from_pretrained(model_paths["emotion_classifier"]).to(device)
emo_tokenizer = AutoTokenizer.from_pretrained(model_paths["emotion_classifier"])

# Load T5 Response Generation Model & Tokenizer from metadata
resp_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_response_generator"]).to(device)
resp_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_response_generator"])

# Load T5 QA Model & Tokenizer from metadata
qa_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_qa"]).to(device)
qa_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_qa"])

print("All models loaded from metadata.")


loading configuration file saved_models\emotion_classifier\config.json
Model config RobertaConfig {
  "_name_or_path": "saved_models\\emotion_classifier",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "admiration",
    "1": "amusement",
    "2": "anger",
    "3": "annoyance",
    "4": "approval",
    "5": "caring",
    "6": "confusion",
    "7": "curiosity",
    "8": "desire",
    "9": "disappointment",
    "10": "disapproval",
    "11": "disgust",
    "12": "embarrassment",
    "13": "excitement",
    "14": "fear",
    "15": "gratitude",
    "16": "grief",
    "17": "joy",
    "18": "love",
    "19": "nervousness",
    "20": "optimism",
    "21": "pride",
    "22": "realization",
    "23": "relief",
    "24": "remorse",
    "25": "sadness",
    "26": "surpri

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 't5_response_generator': 'saved_models\\t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}


loading file vocab.json
loading file merges.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading configuration file saved_models\t5_response_generator\config.json
Model config T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summariza

All models loaded from metadata.


## Unified Pipeline & Gradio for the Mental Health Chatbot

In [134]:
# ===============================================
# Unified Pipeline & Gradio for the Mental Health Chatbot
# ===============================================

from transformers import AutoTokenizer, AutoModelForSequenceClassification, T5ForConditionalGeneration

# ---- Load Fine‑Tuned Models Using Combined Metadata ----
# Define the path to the combined metadata file.
metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 't5_response_generator': 'saved_models\\t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}


In [135]:
# Load Emotion Classification Model & Tokenizer from metadata.
emo_model = AutoModelForSequenceClassification.from_pretrained(model_paths["emotion_classifier"]).to(device)
emo_tokenizer = AutoTokenizer.from_pretrained(model_paths["emotion_classifier"])
emo_model.eval()  # Set emotion classifier to evaluation mode

# Load T5 Response Generation Model & Tokenizer from metadata.
resp_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_response_generator"]).to(device)
resp_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_response_generator"])

loading configuration file saved_models\emotion_classifier\config.json
Model config RobertaConfig {
  "_name_or_path": "saved_models\\emotion_classifier",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "admiration",
    "1": "amusement",
    "2": "anger",
    "3": "annoyance",
    "4": "approval",
    "5": "caring",
    "6": "confusion",
    "7": "curiosity",
    "8": "desire",
    "9": "disappointment",
    "10": "disapproval",
    "11": "disgust",
    "12": "embarrassment",
    "13": "excitement",
    "14": "fear",
    "15": "gratitude",
    "16": "grief",
    "17": "joy",
    "18": "love",
    "19": "nervousness",
    "20": "optimism",
    "21": "pride",
    "22": "realization",
    "23": "relief",
    "24": "remorse",
    "25": "sadness",
    "26": "surpri

In [136]:
# Load T5 QA Model & Tokenizer from metadata.
qa_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_qa"]).to(device)
qa_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_qa"])

# ---- Define Emotion Labels ----
# These labels match those used in our emotion annotation step.
DEFAULT_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity',
    'desire', 'disappointment', 'disapproval', 'embarrassment', 'excitement', 'fear', 'gratitude',
    'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse',
    'sadness', 'surprise', 'neutral'
]
NUM_EMO_LABELS = emo_model.config.num_labels
EMOTION_LABELS = DEFAULT_LABELS[:NUM_EMO_LABELS]

loading configuration file saved_models\t5_qa\config.json
Model config T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en

In [137]:
# Optionally, define a subset for emotion-based routing.
emotion_router_labels = {'confusion', 'caring', 'nervousness', 'grief', 'sadness', 'fear', 'remorse', 'love', 'anger'}

In [138]:
# ---- Helper Functions ----

def detect_emotions(text):
    """
    Detects emotions in the provided text using the fine‑tuned emotion classifier.
    Returns a list of emotion labels whose corresponding probabilities exceed 0.3.
    """
    inputs = emo_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(emo_model.device)
    with torch.no_grad():
        logits = emo_model(**inputs).logits
    probs = torch.sigmoid(logits).cpu().numpy()[0]
    detected = [EMOTION_LABELS[i] for i, p in enumerate(probs) if p > 0.3]
    return detected if detected else ["neutral"]

In [139]:
def format_input_prompt(user_input, language="English", history=None):
    """
    Optionally formats a prompt incorporating conversational history.
    """
    if history:
        combined = "\n".join(history + [user_input])
        return f"You are a supportive mental health assistant. Respond in {language}. The conversation so far:\n{combined}"
    return f"You are a supportive mental health assistant. Respond in {language}. The user says: {user_input}"

In [140]:
# ---- Unified Chatbot Pipeline Class ----

class MentalHealthChatbotPipeline:
    def __init__(self, labels, device="cpu"):
        self.device = device
        self.labels = labels
        self.chat_history = []  # Stores tuples like (speaker, text)
        
        # Ensure models are in evaluation mode.
        self.emo_model = emo_model.eval()
        self.qa_model = qa_model.eval()
        self.resp_model = resp_model.eval()

    def __call__(self, text, max_length=128):
        """
        Processes user text, detects emotions, routes the input to the appropriate model,
        and returns a dictionary with detected emotions, the reply, and the updated conversation history.
        """
        self.chat_history.append(("User", text))
        
        # ---- Emotion Detection ----
        emotions = detect_emotions(text)
        
        # ---- Model Selection Logic ----
        # Simple decision: if the input contains a question mark, use the QA model; otherwise, use the response generation model.
        if "?" in text:
            model, tokenizer = self.qa_model, qa_tokenizer
        else:
            model, tokenizer = self.resp_model, resp_tokenizer

        # ---- Generate Response ----
        # (Optionally, you could use format_input_prompt to incorporate history.)
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_length=max_length)
        reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        self.chat_history.append(("Bot", reply))
        
        return {"Detected Emotions": emotions, "Response": reply, "History": self.chat_history}

In [141]:
# Instantiate the chatbot pipeline.
chatbot = MentalHealthChatbotPipeline(labels=EMOTION_LABELS, device=device)


## Gradio Interface for the Mental Health Chatbot (for testing)

In [142]:
# ===============================================
# Full-Functionality Gradio Interface for the Mental Health Chatbot
# ===============================================

# ---- Load Fine-Tuned Models Using Combined Metadata ----
metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 't5_response_generator': 'saved_models\\t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}


In [143]:
# Load Response Generation Model & Tokenizer
resp_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_response_generator"])
resp_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_response_generator"]).to(device)

# Load QA Model & Tokenizer
qa_tokenizer = AutoTokenizer.from_pretrained(model_paths["t5_qa"])
qa_model = T5ForConditionalGeneration.from_pretrained(model_paths["t5_qa"]).to(device)

# Load Emotion Classification Model & Tokenizer
emo_tokenizer = AutoTokenizer.from_pretrained(model_paths["emotion_classifier"])
emo_model = AutoModelForSequenceClassification.from_pretrained(model_paths["emotion_classifier"]).to(device)
emo_model.eval()  # Set emotion classifier to evaluation mode

loading file spiece.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading configuration file saved_models\t5_response_generator\config.json
Model config T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_

RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
         

In [144]:
# ---- Define Emotion Labels ----
# Define the complete list of 28 emotion labels (as in the GoEmotions baseline)
DEFAULT_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity',
    'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
    'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
    'remorse', 'sadness', 'surprise', 'neutral'
]

In [145]:
NUM_EMO_LABELS = emo_model.config.num_labels
EMOTION_LABELS = DEFAULT_LABELS[:NUM_EMO_LABELS]

# Define a subset of emotions for routing decisions (if needed)
emotion_router_labels = set(EMOTION_LABELS) & {'confusion', 'caring', 'nervousness', 'grief', 'sadness', 'fear', 'remorse', 'love', 'anger'}

In [146]:
# ---- Helper Functions ----

def format_input_prompt(user_input, language="English", history=None):
    """Formats a prompt that incorporates conversation history."""
    if history:
        combined = "\n".join(history + [user_input])
        return f"You are a supportive mental health assistant. Respond in {language}. The conversation so far:\n{combined}"
    return f"You are a supportive mental health assistant. Respond in {language}. The user says: {user_input}"

In [147]:
def detect_emotions(text):
    """
    Detects emotions in the provided text using the fine‑tuned emotion classifier.
    Returns a list of emotion labels whose corresponding probabilities exceed 0.3.
    """
    inputs = emo_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(emo_model.device)
    with torch.no_grad():
        logits = emo_model(**inputs).logits
    probs = torch.sigmoid(logits).cpu().numpy()[0]
    
    # Safety check: trim probabilities if there are more than expected.
    if len(probs) > len(EMOTION_LABELS):
        print(f"Warning: Received {len(probs)} probabilities; expected {len(EMOTION_LABELS)}. Trimming extra values.")
        probs = probs[:len(EMOTION_LABELS)]
    
    detected = [EMOTION_LABELS[i] for i, p in enumerate(probs) if p > 0.3]
    return detected if detected else ["neutral"]

In [148]:
def transcribe_audio(audio_file):
    """
    Converts an input audio file to text using speech recognition.
    Returns the transcribed text or an error message.
    """
    recognizer = sr.Recognizer()
    audio = AudioSegment.from_file(audio_file)
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
        audio.export(tmp.name, format="wav")
        with sr.AudioFile(tmp.name) as source:
            audio_data = recognizer.record(source)
            try:
                return recognizer.recognize_google(audio_data)
            except sr.UnknownValueError:
                return "[Unrecognized speech]"
            except sr.RequestError:
                return "[Speech recognition failed]"

In [149]:
def generate_chatbot_response(user_text, audio_input, mode, language, use_history, history, route_by_emotion, persist):
    """
    Main function to generate a chatbot response.
    
    Parameters:
      - user_text: Text input (if mode == "text")
      - audio_input: Audio file path (if mode == "voice")
      - mode: "text" or "voice" input mode
      - language: Desired language for the response
      - use_history: If True, include conversation history in the prompt
      - history: Current conversation history (as a list)
      - route_by_emotion: If True, route input based on detected emotion
      - persist: If True, save conversation to a log file
    
    Returns:
      A tuple (response, detected emotions as a string, updated history)
    """
    history = history or []
    user_input = user_text if mode == "text" else transcribe_audio(audio_input)
    emotions = detect_emotions(user_input)
    
    # Decide model based on detected emotions and routing flag.
    use_resp_model = any(e in emotion_router_labels for e in emotions) if route_by_emotion else False

    if use_resp_model:
        prompt = format_input_prompt(user_input, language, history if use_history else None)
        inputs = resp_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(resp_model.device)
        model, tokenizer = resp_model, resp_tokenizer
    else:
        prompt = "question: " + user_input
        inputs = qa_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(qa_model.device)
        model, tokenizer = qa_model, qa_tokenizer

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=64,
        num_beams=4,
        no_repeat_ngram_size=2,
        early_stopping=True
    )

    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    full_history = history + [f"User: {user_input}", f"Bot: {response}"]

    if persist:
        with open("chatlog.txt", "a", encoding="utf-8") as log:
            log.write(f"\n[{datetime.datetime.now()}]\n{full_history[-2]}\n{full_history[-1]}\nDetected emotions: {emotions}\n")

    return response, ", ".join(emotions), full_history

In [150]:
# ---- Build the Gradio Interface ----

demo = gr.Interface(
    fn=generate_chatbot_response,
    inputs=[
        gr.Textbox(label="Type your message here (if using text mode)"),
        gr.Audio(type="filepath", label="Or speak here (if using voice mode)"),
        gr.Radio(["text", "voice"], value="text", label="Input Mode"),
        gr.Dropdown(choices=["English", "German", "Spanish", "French"], value="English", label="Response Language"),
        gr.Checkbox(label="Include chat history in response", value=True),
        gr.State(value=[]),
        gr.Checkbox(label="Route by detected emotion", value=True),
        gr.Checkbox(label="Save conversation to chatlog.txt", value=True)
    ],
    outputs=[
        gr.Textbox(label="Therapist Response"),
        gr.Textbox(label="Detected Emotions"),
        gr.State()
    ],
    title="Voice + Text Enabled Emotion-Aware Mental Health Chatbot",
    description="You can type or speak your message. Emotion-aware routing decides between Q&A and therapist-style support."
)

# Launch the interface.
demo.launch()

INFO:httpx:HTTP Request: GET http://127.0.0.1:7862/gradio_api/startup-events "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: HEAD http://127.0.0.1:7862/ "HTTP/1.1 200 OK"


* Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.




INFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"


## Metrics and Evaluation

In [151]:
# ===============================================
# Model Evaluation Cell
# ===============================================

### Evaluation for Emotion Classification ###

def evaluate_emotion_classifier(model, tokenizer, dataset, batch_size=16):
    """
    Evaluate the emotion classifier over the provided dataset.
    Computes micro-averaged F1, Precision, Recall, and Subset Accuracy.
    Assumes that dataset is formatted with columns "input_ids", "attention_mask", "labels"
    and that labels is a multi-label binary vector.
    """
    model.eval()
    all_preds = []
    all_labels = []
    
    # Create a DataLoader for batch processing (if dataset is not huge, you can loop through it directly)
    dataloader = DataLoader(dataset, batch_size=batch_size)
    
    for batch in dataloader:
        # Move inputs and labels to device
        inputs = {k: v.to(device) for k, v in batch.items() if k != "labels"}
        labels = batch["labels"].numpy()
        
        with torch.no_grad():
            logits = model(**inputs).logits
        # Apply sigmoid for multi-label classification and threshold at 0.3
        preds = (torch.sigmoid(logits) > 0.3).cpu().numpy().astype(int)
        
        all_preds.append(preds)
        all_labels.append(labels)
    
    all_preds = np.concatenate(all_preds, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    
    # Compute micro-averaged metrics
    micro_f1 = f1_score(all_labels, all_preds, average="micro", zero_division=0)
    micro_precision = precision_score(all_labels, all_preds, average="micro", zero_division=0)
    micro_recall = recall_score(all_labels, all_preds, average="micro", zero_division=0)
    subset_acc = accuracy_score(all_labels, all_preds)  # subset accuracy is strict
    
    return {
        "Emotion Classifier Micro-F1": micro_f1,
        "Emotion Classifier Micro-Precision": micro_precision,
        "Emotion Classifier Micro-Recall": micro_recall,
        "Emotion Classifier Subset Accuracy": subset_acc,
    }

In [152]:
print("Evaluating Emotion Classifier...")
emo_metrics = evaluate_emotion_classifier(emo_model, emo_tokenizer, emo_test_tok)
for metric, value in emo_metrics.items():
    print(f"{metric}: {value:.4f}")

Evaluating Emotion Classifier...
Emotion Classifier Micro-F1: 1.0000
Emotion Classifier Micro-Precision: 1.0000
Emotion Classifier Micro-Recall: 1.0000
Emotion Classifier Subset Accuracy: 1.0000


In [153]:
### Evaluation for Generation Models (Response Generation and QA) ###

# We already defined a compute_resp_metrics function in the training cells.
# Here, we define a helper to compute additional perplexity based on the evaluation loss.

def evaluate_generation_model(trainer, test_dataset):
    """
    Uses the Seq2SeqTrainer to compute evaluation metrics over the given test dataset.
    Adds perplexity (exp(eval_loss)) to the standard metrics.
    """
    # Predict returns a dictionary with metrics: eval_loss, and any metrics computed in compute_metrics.
    result = trainer.predict(test_dataset)
    eval_loss = result.metrics.get("eval_loss")
    
    # Compute perplexity if loss is available. (If eval_loss is zero or not available, perplexity is undefined.)
    if eval_loss is not None and eval_loss > 0:
        perplexity = math.exp(eval_loss)
    else:
        perplexity = float("inf")
    
    # Add perplexity to the metrics dictionary.
    result.metrics["perplexity"] = perplexity
    return result.metrics

In [154]:
print("\nEvaluating T5 Response Generation Model...")
resp_metrics = evaluate_generation_model(trainer_resp, resp_test_tok)
for metric, value in resp_metrics.items():
    print(f"T5 Response Generation {metric}: {value:.4f}")

print("\nEvaluating T5 QA Model...")
qa_metrics = evaluate_generation_model(trainer_qa, qa_test_tok)
for metric, value in qa_metrics.items():
    print(f"T5 QA {metric}: {value:.4f}")


***** Running Prediction *****
  Num examples = 18
  Batch size = 8



Evaluating T5 Response Generation Model...


INFO:absl:Using default tokenizer.

***** Running Prediction *****
  Num examples = 18
  Batch size = 8


T5 Response Generation test_loss: 2.5405
T5 Response Generation test_rougeL: 0.0397
T5 Response Generation test_bertscore_f1: 0.1903
T5 Response Generation test_runtime: 1.8881
T5 Response Generation test_samples_per_second: 9.5330
T5 Response Generation test_steps_per_second: 1.5890
T5 Response Generation perplexity: inf

Evaluating T5 QA Model...


INFO:absl:Using default tokenizer.


T5 QA test_loss: 2.5392
T5 QA test_rougeL: 0.1596
T5 QA test_bertscore_f1: 0.6831
T5 QA test_runtime: 2.5008
T5 QA test_samples_per_second: 7.1980
T5 QA test_steps_per_second: 1.2000
T5 QA perplexity: inf


In [155]:
# Additionally, you can evaluate other metrics such as ROUGE and BERTScore separately using the `evaluate` library,
# if desired. For example:

rouge_metric = evaluate.load("rouge")
bertscore_metric = evaluate.load("bertscore")

def compute_generation_metrics(trainer, test_dataset, tokenizer):
    result = trainer.predict(test_dataset)
    predictions, labels = result.predictions, result.label_ids
    # Replace -100 in labels by the tokenizer pad id.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    r = rouge_metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    b = bertscore_metric.compute(predictions=decoded_preds, references=decoded_labels, lang="en")
    
    # Here, we only extract the ROUGE-L and average BERTScore F1.
    metrics = {
        "rougeL": r["rougeL"],
        "bertscore_f1": np.mean(b["f1"])
    }
    return metrics

In [156]:
print("\nAdditional Generation Metrics for T5 Response Generation:")
additional_resp_metrics = compute_generation_metrics(trainer_resp, resp_test_tok, tokenizer_t5)
for metric, value in additional_resp_metrics.items():
    print(f"T5 Response Generation {metric}: {value:.4f}")

print("\nAdditional Generation Metrics for T5 QA Model:")
additional_qa_metrics = compute_generation_metrics(trainer_qa, qa_test_tok, tokenizer_t5)
for metric, value in additional_qa_metrics.items():
    print(f"T5 QA {metric}: {value:.4f}")


***** Running Prediction *****
  Num examples = 18
  Batch size = 8



Additional Generation Metrics for T5 Response Generation:


INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
loading configuration file config.json from cache at C:\Users\mward\.cache\huggingface\hub\models--roberta-large\snapshots\722cf37b1afa9454edce342e7895e588b6ff1d59\config.json
Model config RobertaConfig {
  "_name_or_path": "roberta-large",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.49.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

loading file vocab.json from cache at C:\Users\mward\.cache\huggingface\hub\mode

T5 Response Generation rougeL: 0.0397
T5 Response Generation bertscore_f1: 0.1903

Additional Generation Metrics for T5 QA Model:


INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.


T5 QA rougeL: 0.1596
T5 QA bertscore_f1: 0.6831


