# Happy Brain: End-to-End Mental Health Chatbot

## 1. Project Overview

**Happy Brain** is an end-to-end mental health chatbot built from three independently fine-tuned NLP models. Each model serves a distinct purpose—emotion recognition, factual answering, or therapeutic text generation. These models are unified through a dynamic routing system that selects the most contextually appropriate response strategy based on the user's input and detected emotional tone.

---

## 2. Key Components and Workflow

### 2.1 Library Imports & Environment Setup
- Uses essential libraries: `torch`, `transformers`, `datasets`, `sklearn`, `gradio`, etc.
- Enforces CPU usage for compatibility and reproducibility.
- Sets consistent random seed.
- Creates a root directory (`SAVE_ROOT`) to store all models and metadata.

### 2.2 Data Loading & Preprocessing
- Loads and merges multiple mental health-related datasets.
- Cleans and standardizes into uniform `question → answer` format.
- Tokenizes input using the corresponding tokenizer (RoBERTa, T5).
- Splits datasets into training and testing sets.

---

## 3. Model Descriptions & Training

### 3.1 Emotion Classification — Model 1
- **Base Model:** `SamLowe/roberta-base-go_emotions`
- **Objective:** Multi-label emotion detection tailored for mental health.
- **Training:**
  - Classification head adjusted for multiple emotion labels.
  - Sigmoid activation with BCE loss.
  - Optimized using AdamW (LR=2e-5).
  - Implements early stopping and best-model checkpointing.
- **Saving:** Robust saving with retry logic to prevent Windows file-locking issues.

### 3.2 Supportive Response Generation — Model 2
- **Base Model:** `google/flan-t5-large`
- **Objective:** Generate emotionally supportive responses.
- **Tokenization & Training:**
  - Input: max 128 tokens; Output: max 64 tokens.
  - Beam search decoding with no-repeat n-grams (size 2).
- **Evaluation:** ROUGE-L and optional BERTScore.
- **Saving:** Uses temporary file saves for compatibility.

### 3.3 Factual Question Answering — Model 3
- **Base Model:** `google/flan-t5-base`
- **Objective:** Answer factual mental health-related questions.
- **Training & Saving:** Follows same protocol as Model 2.

---

## 4. Pipeline Integration & Routing Logic

### 4.1 Unified Metadata Loader
- Stores all model paths and configuration in `combined_model_metadata.pt`.

### 4.2 Dynamic Inference Pipeline
- **Emotion Detection:**  
  Uses RoBERTa model via `detect_emotions(text)` with threshold = 0.3.

- **Routing Strategy:**
  - If input includes a question mark → QA model (T5-base).
  - Otherwise, based on detected emotions → Supportive model (T5-large).

- **Prompt Construction:**  
  Leverages detected emotions and conversation history.

- **Standardized Generation Parameters:**
  ```python
  model.generate(
      input_ids=...,
      attention_mask=...,
      max_length=100,
      min_length=20,
      num_beams=4,
      no_repeat_ngram_size=2,
      early_stopping=True
  )

---

## 5. Optimization Techniques Used

Several optimization strategies were employed to enhance model performance and integration:

- **Consistent Tokenization:**  
  Uniform truncation and padding practices across all models to ensure coherent data processing.

- **Multi-Label Scoring:**  
  Use of Micro F1 score and threshold optimization for robust emotion classification.

- **Early Stopping:**  
  Retains the best-performing model checkpoint during training to prevent overfitting and improve generalization.

- **Temporary Save Logic:**  
  Implements a retry mechanism for file-saving to mitigate file-locking issues, particularly on Windows systems.

- **LoRA Compatibility (Planned):**  
  A modular design enabling future incorporation of Low-Rank Adaptation for efficient fine-tuning and memory optimization.

---

## 6. Evaluation Summary

A comprehensive evaluation framework ensures each model operates effectively within the chatbot pipeline:

### 6.1 Emotion Classification

Evaluated using micro-averaged metrics including:

- **F1 Score**  
- **Precision**  
- **Recall**  
- **Subset Accuracy**  

These metrics quantify the accuracy and reliability of multi-label emotion predictions.

### 6.2 Supportive Response Generation and Question-Answering

Evaluated through text-generation metrics including:

- **ROUGE-L:**  
  Assesses the fluency and accuracy by measuring textual overlap between generated and reference texts.

- **Perplexity:**  
  Reflects model coherence and confidence based on the evaluation loss.

- **BERTScore (Optional):**  
  Measures semantic similarity between predictions and references for nuanced quality assurance.

> Collectively, these evaluation techniques validate the chatbot's capability for emotionally intelligent and contextually appropriate interactions.

---

## 7. Summary

The **Happy Brain** chatbot integrates three specialized NLP models:

- `SamLowe/roberta-base-go_emotions` → Emotion detection  
- `google/flan-t5-base` → Factual QA  
- `google/flan-t5-large` → Supportive response generation

These are merged into a cohesive, modular pipeline designed for nuanced, empathetic, and factual mental health conversations. Each model is trained, optimized, evaluated, and integrated with precision—empowering the chatbot to offer meaningful and emotionally aware user support.

NOTE: The chatbot's responses are still mostly incorrect and require more refinement. Although emotion detection is generally accurate, Happy Brain is still a work in progress and needs much more training and refinement. This was a challenge considering the hardware and software limitations faced when creating a chatbot like this from (essentially) scratch in an educational setting. However, the chatbot is a great start and will provide a useful and empathetic platform for users seeking mental health support.






## 1. Imports & Configuration

In [427]:
# ================================
# Environment Setup
# ================================
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Force CPU use

import warnings
import logging
import gc
import shutil
import tempfile
import time
from pathlib import Path
from dataclasses import dataclass

# ================================
# Standard Library Imports
# ================================
import glob
import json
import random
import itertools
import datetime
import pprint  # Pretty-printing

# ================================
# Scientific & Data Libraries
# ================================
import math
import numpy as np
import pandas as pd

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

from sklearn.metrics import (
    f1_score,
    precision_score,
    recall_score,
    accuracy_score
)
from sklearn.preprocessing import MultiLabelBinarizer

# ================================
# Audio Processing
# ================================
from pydub import AudioSegment
import speech_recognition as sr

# ================================
# NLP & Transformers (Hugging Face)
# ================================
import evaluate
from evaluate import load
import gradio as gr
import streamlit as st
from datasets import load_dataset, Dataset, concatenate_datasets

# Transformers - Tokenizers & Models
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    T5ForConditionalGeneration
)

# Transformers - Training Tools
from transformers import (
    Trainer,
    TrainingArguments,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrainerCallback,
    DataCollatorWithPadding,
    DataCollatorForSeq2Seq,
    default_data_collator
)

# Transformers - Logging
from transformers import logging as hf_logging

# ================================
# PEFT / LoRA
# ================================
from peft import get_peft_model, LoraConfig, TaskType


In [428]:
# Environment Setup

USE_CPU = True  # Set to True to force CPU mode
device = torch.device('cpu' if USE_CPU else ('cuda' if torch.cuda.is_available() else 'cpu'))
print('Device in use:', device)

# FORCE CPU for the entire session

USE_CPU = True
device = torch.device("cpu")
print("FORCED device:", device)

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)


# Save directory setup

SAVE_ROOT = Path("./saved_models")
for sub in ["emotion_classifier", "flan_t5_response_generator", "t5_qa"]:
    (SAVE_ROOT / sub).mkdir(parents=True, exist_ok=True)

Device in use: cpu
FORCED device: cpu


## Data Loading and Preprocessing

In [429]:
# Toggle individual CSVs and provide their column mapping
# Format: name: (enabled, path, question_col, answer_col)

DATASETS = {
    "ds1": (True,  "./data/ds1_transformed_mental_health_chatbot_dataset.csv",  "question", "answer"),
    "ds2": (False, "./data/ds2_transformed_mental_health_chatbot.csv",         "question", "answer"),
    "ds3": (False, "./data/ds3_mental_health_faq_cleaned.csv",                 "Question", "Answer"),
    "ds4": (False, "./data/ds4_mental_health_chatbot_dataset_merged_modes.csv","prompt",   "response"),
    "ds5": (False, "./data/ds5_Mental_Health_FAQ.csv",                         "Question", "Answer"),
    "ds6": (False, "./data/ds6_mental_health_counseling.csv",                  "query",    "completion"),
}

In [430]:
# Cleaner that auto-maps columns to 'question' / 'answer'

def load_and_clean(path, q_col, a_col):
    df = pd.read_csv(path)

    # Normalize headers
    df.columns = [c.lower().strip() for c in df.columns]
    q_col = q_col.lower().strip()
    a_col = a_col.lower().strip()

    # Common renames
    rename_map = {
        "prompt": "question",
        "response": "answer",
        "questions": "question",
        "answers": "answer",
    }
    df = df.rename(columns=rename_map)

    # If provided cols exist, rename them to standard names
    
    if q_col in df.columns:
        df = df.rename(columns={q_col: "question"})
    if a_col in df.columns:
        df = df.rename(columns={a_col: "answer"})

    # Try to map 'context' -> 'question' if needed

    if "question" not in df.columns and "context" in df.columns:
        df = df.rename(columns={"context": "question"})

    # Verify if necessary columns exist

    if not {"question", "answer"}.issubset(df.columns):
        raise ValueError(f"Could not find 'question'/'answer' in {path}. Available columns: {list(df.columns)}")

    # Retain only required columns, drop missing values and duplicates

    df = df[["question", "answer"]].dropna()
    df["question"] = df["question"].astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    df["answer"]   = df["answer"].astype(str).str.strip().str.replace(r"\s+", " ", regex=True)
    df = df.drop_duplicates()

    # Convert to Hugging Face Dataset
    
    return Dataset.from_pandas(df.reset_index(drop=True))

In [431]:
# Load enabled datasets and create a unified dataset

datasets = []
for key, (enabled, path, q_col, a_col) in DATASETS.items():
    if enabled:
        print(f"Loading dataset '{key}' from {path} ...")
        ds = load_and_clean(path, q_col, a_col)
        print(f"Loaded {len(ds)} examples from '{key}'.")
        datasets.append(ds)
    else:
        print(f"Skipping dataset '{key}' as its toggle is off.")

Loading dataset 'ds1' from ./data/ds1_transformed_mental_health_chatbot_dataset.csv ...
Loaded 172 examples from 'ds1'.
Skipping dataset 'ds2' as its toggle is off.
Skipping dataset 'ds3' as its toggle is off.
Skipping dataset 'ds4' as its toggle is off.
Skipping dataset 'ds5' as its toggle is off.
Skipping dataset 'ds6' as its toggle is off.


In [432]:
# Create unified dataset

if datasets:
    combined_dataset = concatenate_datasets(datasets)
    print(f"\nCombined dataset contains {len(combined_dataset)} examples.")
else:
    raise ValueError("No datasets enabled. Please enable at least one dataset in DATASETS.")

# Shuffle and split into training and testing datasets (e.g., 90% train, 10% test)

combined_dataset = combined_dataset.shuffle(seed=42)
split_dataset = combined_dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = split_dataset['train']
test_dataset = split_dataset['test']
print(f"Training set: {len(train_dataset)} examples, Testing set: {len(test_dataset)} examples.")


Combined dataset contains 172 examples.
Training set: 154 examples, Testing set: 18 examples.


### Multi-Label Emotion Annotation & Custom Trainer Setup

In [433]:
# Set a seed if not already defined

SEED = 42

# Data Collator: casts labels to float32

def float_label_collator(features):
    """
    Wrap the default HF collator but cast the `labels` tensor to float32
    so BCEWithLogitsLoss gets the right dtype.
    """
    batch = default_data_collator(features)
    if "labels" in batch:
        batch["labels"] = batch["labels"].to(torch.float32)

    # Uncomment next line to print label info during debugging
    # print("collator labels dtype/shape:", batch["labels"].dtype, batch["labels"].shape)
    
    return batch

In [434]:
# Custom Trainer for Multi-Label Classification

class MultiLabelTrainer(Trainer):
    """
    Custom Trainer that computes loss using binary cross‑entropy with logits.
    This ensures multi‑label targets (e.g., emotions) are correctly processed.
    """
    def compute_loss(self, model, inputs, return_outputs: bool = False, **kwargs):
        labels = inputs.pop("labels").float()
        outputs = model(**inputs)
        logits = outputs.logits

        # If labels' shape does not match logits, reshape to match
        if labels.shape != logits.shape:
            labels = labels.view_as(logits)
        
        loss = F.binary_cross_entropy_with_logits(logits, labels, reduction="mean")
        return (loss, outputs) if return_outputs else loss

In [435]:
# Callback: Print training progress (optional)

class StepPrinter(TrainerCallback):
    """
    A Trainer callback that prints step-wise loss and evaluation metrics
    while keeping the tqdm progress bar.
    """
    def on_log(self, args, state, control, logs=None, **kwargs):
        if not logs or not state.is_local_process_zero:
            return
        if "loss" in logs:
            print(f"Step {state.global_step:>6} • loss {logs['loss']:.4f}")
        if "eval_loss" in logs:
            metric = logs.get("micro_f1") or logs.get("bertscore_f1") or logs.get("rougeL")
            metric_str = f" • metric {metric:.4f}" if metric is not None else ""
            print(f"Epoch {int(state.epoch)}/{int(args.num_train_epochs)}"
                  f" • eval_loss {logs['eval_loss']:.4f}{metric_str}")

In [436]:
# Define Emotion Labels (28 total: 27 + neutral)

GO_EMOTION_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring',
    'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval',
    'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief',
    'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
    'remorse', 'sadness', 'surprise', 'neutral'
]
num_labels = len(GO_EMOTION_LABELS)

In [437]:
# Annotate each example with multi-label emotion annotations.
# For this demo, we simulate emotion annotations: if an example has no emotion,
# we default to "neutral". Replace this with your real emotion annotations if available.

def annotate_emotions(example):
    emos = example.get("emotions", [])
    
    # If no explicit emotion annotations, default to ["neutral"]

    if not emos:
        emos = ["neutral"]
    
    # Save original emotions for visibility (optional)

    example["emotions"] = emos
    
    # Create a binary label vector for each of the 28 emotions
    
    example["labels"] = [1.0 if lbl in emos else 0.0 for lbl in GO_EMOTION_LABELS]
    return example

In [438]:
# Helper to retrieve input text from an example (for debugging)

def get_input_text(example):
    return example.get("text") or example.get("question") or "[NO TEXT FOUND]"

# (Re‑)load datasets for emotion annotation if not already loaded.

if 'train_dataset' not in globals() or 'test_dataset' not in globals():
    print("Reloading datasets for emotion annotation ...")
    datasets_list = []
    for name, (enabled, path, q_col, a_col) in DATASETS.items():
        if not enabled:
            continue
        ds = load_and_clean(path=path, q_col=q_col, a_col=a_col)
        datasets_list.append(ds)
    if not datasets_list:
        print("No datasets were enabled, using a fallback test dataset.")
        fallback_data = {
            "text": [
                "How are you?",
                "I feel really down today.",
                "I'm so happy with my progress!",
                "Why does nobody understand me?",
                "I'm feeling anxious about school.",
                "Life is good lately.",
                "Sometimes I just want to cry.",
                "Everything is falling apart.",
                "I’m grateful for my therapist.",
                "Can someone please just listen?"
            ]
        }
        ds = Dataset.from_dict(fallback_data)
        datasets_list.append(ds)
    full_ds = concatenate_datasets(datasets_list) if len(datasets_list) > 1 else datasets_list[0]
    full_ds = full_ds.shuffle(seed=SEED)
    split = full_ds.train_test_split(test_size=0.1, seed=SEED)
    train_dataset, test_dataset = split["train"], split["test"]

print(f"Train dataset: {len(train_dataset):,} examples • Test dataset: {len(test_dataset):,} examples")

Train dataset: 154 examples • Test dataset: 18 examples


In [439]:
# Map emotion annotations onto training and testing datasets

emo_train = train_dataset.map(annotate_emotions)
emo_test  = test_dataset.map(annotate_emotions)

print("Sample annotation:")
print(get_input_text(emo_train[0]), "->", emo_train[0]["emotions"])

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Sample annotation:
How does someone acquire a mental illness? -> ['neutral']


In [440]:
# Filter out any examples without at least one positive label (shouldn't happen after defaulting to neutral)

def has_nonzero_labels(example):
    return sum(example["labels"]) > 0

emo_train = emo_train.filter(has_nonzero_labels)
emo_test = emo_test.filter(has_nonzero_labels)

# Check for any problematic labels

bad_labels = [ex for ex in emo_train if "labels" not in ex or sum(ex["labels"]) == 0]
print("Number of examples with problematic labels:", len(bad_labels))
if bad_labels:
    print("Example with problematic labels:", bad_labels[0])

Filter:   0%|          | 0/154 [00:00<?, ? examples/s]

Filter:   0%|          | 0/18 [00:00<?, ? examples/s]

Number of examples with problematic labels: 0


In [441]:
# Load the tokenizer for emotion classification.
# We'll be using the 'SamLowe/roberta-base-go_emotions' tokenizer.

emo_tokenizer = AutoTokenizer.from_pretrained("SamLowe/roberta-base-go_emotions")

# Print columns before renaming for clarity

print("Before column rename:", emo_train.column_names)

# Rename 'question' to 'text' if needed for tokenization

if "question" in emo_train.column_names:
    emo_train = emo_train.rename_column("question", "text")
if "question" in emo_test.column_names:
    emo_test = emo_test.rename_column("question", "text")
print("After column rename:", emo_train.column_names)

Before column rename: ['question', 'answer', 'emotions', 'labels']
After column rename: ['text', 'answer', 'emotions', 'labels']


In [442]:
# Tokenization: tokenize using the 'text' column.

def emo_tokenize(batch):
    return emo_tokenizer(batch["text"], padding=True, truncation=True)

In [443]:
# Function to cast labels to float32 using numpy

def cast_to_float(example):
    example["labels"] = np.array(example["labels"], dtype=np.float32)
    return example

In [444]:
# Tokenize and cast labels for both training and testing sets (batched processing)

emo_train_tok = emo_train.map(emo_tokenize, batched=True).map(cast_to_float)
emo_test_tok  = emo_test.map(emo_tokenize, batched=True).map(cast_to_float)

# Set the dataset format for PyTorch

emo_train_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
emo_test_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print(f"Tokenized training examples: {len(emo_train_tok)} • Tokenized testing examples: {len(emo_test_tok)}")


Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Tokenized training examples: 154 • Tokenized testing examples: 18


In [445]:
# Define paths for saving and loading models

model_paths = {
    "t5_qa": "./saved_models/t5_qa",
    "emotion_classifier": "./saved_models/emotion_classifier",
    "flan_t5_response_generator": "./saved_models/flan_t5_response_generator"
}

## 4. Train Emotion Classifier (RoBERTa)

In [446]:
# Define paths for saving and loading models

emotion_model_path = SAVE_ROOT / "emotion_classifier"

try:
    if emotion_model_path.is_dir() and any(emotion_model_path.iterdir()):
        print("Loading previously fine-tuned emotion model...")
        emo_model = AutoModelForSequenceClassification.from_pretrained(emotion_model_path).to(device)
    else:
        raise FileNotFoundError("Fine-tuned model not found. Falling back to base model...")
except Exception as e:
    print(f"{e}")
    print("⬇Loading base Roberta model for multi-label classification...")
    
    # Fallback labels if not defined yet
    
    if "GO_EMOTION_LABELS" not in globals():
        GO_EMOTION_LABELS = [
            'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring',
            'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval',
            'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy',
            'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
            'remorse', 'sadness', 'surprise', 'neutral'
        ]

    emo_model = AutoModelForSequenceClassification.from_pretrained(
        "SamLowe/roberta-base-go_emotions",
        problem_type="multi_label_classification",
        num_labels=len(GO_EMOTION_LABELS)).to(device)

Loading previously fine-tuned emotion model...


In [447]:
# Define device

device = torch.device("cpu" if USE_CPU else ("cuda" if torch.cuda.is_available() else "cpu"))
print("Training device:", device)


Training device: cpu


In [448]:
# Tokenization for Model Training: Use fixed max_length

def emo_tokenize(batch):
    return emo_tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=64
    )

In [449]:
# Float conversion: ensures labels become float32 arrays.

def cast_to_float(example):
    example["labels"] = np.array(example["labels"], dtype=np.float32)
    return example

In [450]:
# Re-tokenize the datasets using the new tokenization function

emo_train_tok = emo_train.map(emo_tokenize, batched=True).map(cast_to_float)
emo_test_tok  = emo_test.map(emo_tokenize, batched=True).map(cast_to_float)

# Set format to PyTorch tensors

emo_train_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
emo_test_tok.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print(f"After tokenization: {len(emo_train_tok)} training examples; {len(emo_test_tok)} testing examples.")

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

After tokenization: 154 training examples; 18 testing examples.


In [451]:
# Load the pre-trained emotion classification model

emo_model = AutoModelForSequenceClassification.from_pretrained(
    "SamLowe/roberta-base-go_emotions",
    problem_type="multi_label_classification",
    num_labels=num_labels,  # equals len(GO_EMOTION_LABELS)
).to(device)
print("Loaded model:", emo_model.config._name_or_path)

Loaded model: SamLowe/roberta-base-go_emotions


In [452]:
# Define Evaluation Metrics for Emotion Classification

def compute_emo_metrics(pred):
    logits, labels = pred
    # Compute probabilities using sigmoid on logits

    probs = torch.sigmoid(torch.tensor(logits))
    # Apply threshold (0.3) to decide positive labels

    preds = (probs > 0.3).int().numpy()
    labels = np.array(labels)

    # Defensive check: only consider rows with at least one positive label
    
    mask = labels.sum(axis=1) > 0
    if mask.sum() == 0:
        print("Warning: all evaluation labels are empty")
        return {"micro_f1": 0.0}

    try:
        f1 = f1_score(labels[mask], preds[mask], average="micro", zero_division=0)
    except ValueError as e:
        print("Metric error:", e)
        f1 = 0.0

    return {"micro_f1": f1}

In [453]:
# Set Up Training Arguments

# Define a root folder for saving model checkpoints if not defined already

SAVE_ROOT = Path("./saved_models")
emo_args = TrainingArguments(
    output_dir=str(SAVE_ROOT / "emotion_classifier"),
    
    # Logging & Reporting
    logging_strategy="steps",
    logging_steps=10,
    logging_dir="./logs",
    report_to="none",
    
    # Training hyper‑parameters
    
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    
    # Evaluation and checkpointing settings

    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="micro_f1",
    greater_is_better=True,
    
    seed=SEED,
)



In [454]:
# Instantiate the Custom Trainer for Multi-Label Classification

trainer_emo = MultiLabelTrainer(
    model=emo_model,
    args=emo_args,
    train_dataset=emo_train_tok,
    eval_dataset=emo_test_tok,
    tokenizer=emo_tokenizer,
    data_collator=float_label_collator,
    compute_metrics=compute_emo_metrics,
    callbacks=[StepPrinter],
)

  trainer_emo = MultiLabelTrainer(


In [455]:
# Train Roberta Emotion Classifier and Save Safely

num_labels = len(emo_train_tok[0]['labels'])
emotion_model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=num_labels
).to(device)

# Training arguments

training_args = TrainingArguments(
    output_dir="./saved_models/emotion_classifier",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_strategy="steps",
    logging_steps=10,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1"
)

# Metrics

def compute_metrics(pred):
    logits, labels = pred
    preds = (logits > 0).astype(int)
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "precision": precision_score(labels, preds, average="micro"),
        "recall": recall_score(labels, preds, average="micro"),
        "accuracy": accuracy_score(labels, preds)
    }

# Trainer

trainer_emo = Trainer(
    model=emotion_model,
    args=training_args,
    train_dataset=emo_train_tok,
    eval_dataset=emo_test_tok,
    tokenizer=emo_tokenizer,
    compute_metrics=compute_metrics
)

# Train the model

# trainer_emo.train()

# Clean up memory before saving

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Safe saving using temp dir and retry logic

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = Path(tmpdir)

    # Save model via trainer

    for _ in range(3):
        try:
            trainer_emo.save_model(tmp_path)
            break
        except Exception as e:
            print(f"Retrying model save due to: {e}")
            time.sleep(1)

    # Save tokenizer

    for _ in range(3):
        try:
            emo_tokenizer.save_pretrained(tmp_path)
            break
        except Exception as e:
            print(f"Retrying tokenizer save due to: {e}")
            time.sleep(1)

    # Final destination path

    final_path = SAVE_ROOT / "emotion_classifier"
    final_path.mkdir(parents=True, exist_ok=True)

    # Move files to destination safely
    
    for item in tmp_path.iterdir():
        dest = final_path / item.name
        for _ in range(3):
            try:
                if dest.exists():
                    if dest.is_file():
                        try:
                            dest.unlink()
                        except PermissionError:
                            time.sleep(1)
                            dest.unlink()
                    elif dest.is_dir():
                        shutil.rmtree(dest)
                shutil.copy2(item, dest)
                break
            except Exception as e:
                print(f"Retrying copy of {item.name} due to error: {e}")
                time.sleep(1)

print("Emotion classifier model and tokenizer saved to:", final_path)



Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer_emo = Trainer(


Retrying copy of model.safetensors due to error: [WinError 5] Access is denied: 'saved_models\\emotion_classifier\\model.safetensors'
Retrying copy of model.safetensors due to error: [WinError 5] Access is denied: 'saved_models\\emotion_classifier\\model.safetensors'
Retrying copy of model.safetensors due to error: [WinError 5] Access is denied: 'saved_models\\emotion_classifier\\model.safetensors'
Emotion classifier model and tokenizer saved to: saved_models\emotion_classifier


## 5. Train T5 for Response Generation

In [456]:
# Load T5 for Response Generation

MODEL_NAME = "google/flan-t5-large"
SAVE_DIR = SAVE_ROOT / "flan_t5_response_generator"

try:
    if SAVE_DIR.is_dir() and any(SAVE_DIR.iterdir()):
        print("Loading previously fine-tuned flan-t5-large response generator...")
        resp_model = T5ForConditionalGeneration.from_pretrained(SAVE_DIR).to(device)
    else:
        raise FileNotFoundError
except Exception:
    print("Loading base FLAN-T5-XL model...")
    resp_model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).to(device)

Loading previously fine-tuned flan-t5-large response generator...


In [457]:
# Build input/target pairs: user text -> helpful response
# For now we use 'question' as input and 'answer' as target.

def build_t5_pairs(example):

    # Retrieve the question from either "question" or "text" columns,
    # and the corresponding answer from either "answer" or "response" columns.
    
    question = example.get("question") or example.get("text") or ""
    answer = example.get("answer") or example.get("response") or ""
    example["input_text"] = "respond: " + question
    example["target_text"] = answer
    return example

In [458]:
# Map the original training and testing datasets to build T5 pairs.
# We're using the original QA datasets (train_dataset and test_dataset) from our data-loading cell.

resp_train = train_dataset.map(build_t5_pairs)
resp_test  = test_dataset.map(build_t5_pairs)

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

In [459]:
# Load T5 tokenizer and model.

t5_resp_model_name = "google/flan-t5-large"
tokenizer_t5 = AutoTokenizer.from_pretrained(t5_resp_model_name)

In [460]:
# Tokenize the training and testing T5 pairs.

def t5_tokenize(batch):
    inputs = tokenizer_t5(batch["input_text"], padding="max_length", truncation=True, max_length=64)
    with tokenizer_t5.as_target_tokenizer():
        labels = tokenizer_t5(batch["target_text"], padding="max_length", truncation=True, max_length=64)
    inputs["labels"] = labels["input_ids"]
    return inputs

In [461]:
# Tokenize the training and testing T5 pairs.

resp_train_tok = resp_train.map(t5_tokenize, batched=True, remove_columns=resp_train.column_names)
resp_test_tok  = resp_test.map(t5_tokenize, batched=True, remove_columns=resp_test.column_names)

# Set datasets' format to output PyTorch tensors.

resp_train_tok.set_format("torch")
resp_test_tok.set_format("torch")

print(f"Tokenized training examples: {len(resp_train_tok)}; Tokenized testing examples: {len(resp_test_tok)}")

Map:   0%|          | 0/154 [00:00<?, ? examples/s]



Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Tokenized training examples: 154; Tokenized testing examples: 18


In [462]:
# Load FLAN-T5 Response Generation Model & Tokenizer from metadata

flan_resp_path = Path(model_paths["flan_t5_response_generator"])

try:
    if flan_resp_path.is_dir() and any(flan_resp_path.iterdir()):
        print("Loading fine-tuned FLAN-T5 model...")
        resp_model = T5ForConditionalGeneration.from_pretrained(flan_resp_path).to(device)
        resp_tokenizer = AutoTokenizer.from_pretrained(flan_resp_path)
    else:
        raise FileNotFoundError
except Exception:
    print("Loading base flan-t5-large model...")
    resp_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)
    resp_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

Loading fine-tuned FLAN-T5 model...


In [463]:
# Compute ROUGE metric

def compute_metrics(eval_preds):
    
    rouge = evaluate.load("rouge")

    preds, labels = eval_preds

    if isinstance(preds, tuple):
        preds = preds[0]

    preds = np.argmax(preds, axis=-1)

    if isinstance(labels, tuple):
        labels = labels[0]
    labels = np.where(labels != -100, labels, resp_tokenizer.pad_token_id)

    decoded_preds = resp_tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = resp_tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [label.strip() for label in decoded_labels]

    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

    return {
        "rougeL": result["rougeL"]
    }

In [464]:
# Set up the Seq2Seq training arguments.

resp_training_args = Seq2SeqTrainingArguments(
    output_dir="./saved_models/flan_t5_response_generator",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
    predict_with_generate=False,
    logging_dir="./logs",
    logging_strategy="steps",
    logging_steps=10,
    save_total_limit=2,
    load_best_model_at_end=True,
    save_safetensors=False,
    metric_for_best_model="rougeL"
)

# Assert that tokenized response data exists

assert 'resp_train_tok' in globals() and 'resp_test_tok' in globals(), "Tokenized response data is not defined."

# Create the Seq2SeqTrainer instance

resp_trainer = Seq2SeqTrainer(
    model=resp_model,  
    args=resp_training_args,
    train_dataset=resp_train_tok,  
    eval_dataset=resp_test_tok,    
    tokenizer=resp_tokenizer,      
    data_collator=DataCollatorForSeq2Seq(tokenizer=resp_tokenizer, model=resp_model),
    compute_metrics=compute_metrics  
)

# Begin training

# resp_trainer.train()

# Optional: Free up memory (especially useful on GPU systems)

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Save model and tokenizer safely using a temp directory

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = Path(tmpdir)

    # Save model and tokenizer to temp directory

    resp_model.save_pretrained(tmp_path, safe_serialization=False)
    resp_tokenizer.save_pretrained(tmp_path)

    # Define final destination

    final_path = SAVE_ROOT / "flan_t5_response_generator"
    final_path.mkdir(exist_ok=True, parents=True)

    # Move files, overwriting if needed (with retry logic for Windows locks)
    
    for item in tmp_path.iterdir():
        dest = final_path / item.name
        for _ in range(3):  # Try up to 3 times
            try:
                if dest.exists():
                    if dest.is_file():
                        try:
                            dest.unlink()
                        except PermissionError:
                            time.sleep(1)
                            dest.unlink()
                    elif dest.is_dir():
                        shutil.rmtree(dest)
                shutil.copy2(item, dest) 
                break  # success
            except Exception as e:
                print(f"Retrying copy of {item.name} due to error: {e}")
                time.sleep(1)

print("FLAN-T5 response generator model and tokenizer saved successfully.")



  resp_trainer = Seq2SeqTrainer(


Retrying copy of pytorch_model.bin due to error: [WinError 5] Access is denied: 'saved_models\\flan_t5_response_generator\\pytorch_model.bin'
Retrying copy of pytorch_model.bin due to error: [WinError 5] Access is denied: 'saved_models\\flan_t5_response_generator\\pytorch_model.bin'
Retrying copy of pytorch_model.bin due to error: [WinError 5] Access is denied: 'saved_models\\flan_t5_response_generator\\pytorch_model.bin'
FLAN-T5 response generator model and tokenizer saved successfully.


In [465]:
# Load the saved FLAN-T5 response generator model from disk

resp_model = T5ForConditionalGeneration.from_pretrained("./saved_models/flan_t5_response_generator").to(device)
print("Loaded response model: flan_t5_response_generator")

Loaded response model: flan_t5_response_generator


## Train T5 for Question‑Answer

In [466]:
# Load fine-tuned or base T5 QA model

qa_model_path = Path(model_paths["t5_qa"])

try:
    if qa_model_path.is_dir() and any(qa_model_path.iterdir()):
        print("Loading fine-tuned T5 QA model from:", qa_model_path)
        t5_qa_model = T5ForConditionalGeneration.from_pretrained(qa_model_path).to(device)
        t5_qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_path)
    else:
        raise FileNotFoundError("No fine-tuned QA model found.")
except Exception as e:
    print(f"{e}")
    print("Loading base T5 model (google/flan-t5-base)...")
    t5_qa_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base").to(device)
    t5_qa_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

Loading fine-tuned T5 QA model from: saved_models\t5_qa


In [467]:
# Build QA-style input/target pairs for T5 (question → answer)

def build_qa_pairs(example):
    question = example.get("question", "") or example.get("text", "")
    answer = example.get("answer", "") or example.get("response", "")

    return {
        "input_text": f"question: {question.strip()}",
        "target_text": answer.strip()
    }

In [468]:
# Load fresh T5 QA model and tokenizer (not from disk — training from scratch)

qa_model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)
qa_tokenizer = AutoTokenizer.from_pretrained("t5-base")

In [469]:
# Build input/target pairs for T5 (question → answer)

def qa_tokenize(batch):
    context = batch.get("context", [""] * len(batch["question"]))

    # Tokenize inputs

    inputs = qa_tokenizer(
        batch["question"],
        context,
        padding="max_length",
        truncation=True,
        max_length=64
    )

    # Tokenize targets

    labels = qa_tokenizer(
        batch["answer"],
        padding="max_length",
        truncation=True,
        max_length=64
    )["input_ids"]

    # IMPORTANT: convert list-of-lists into a tensor-friendly format

    inputs["labels"] = labels

    return inputs

In [470]:
# Build training and testing datasets

qa_train = train_dataset.map(build_qa_pairs)
qa_test  = test_dataset.map(build_qa_pairs)

qa_train_tok = qa_train.map(qa_tokenize, batched=True, remove_columns=qa_train.column_names)
qa_test_tok  = qa_test.map(qa_tokenize, batched=True, remove_columns=qa_test.column_names)

qa_train_tok.set_format("torch")
qa_test_tok.set_format("torch")

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Map:   0%|          | 0/18 [00:00<?, ? examples/s]

In [471]:
# Define evaluation metrics

def compute_metrics(eval_preds):
    
    rouge = evaluate.load("rouge")

    preds, labels = eval_preds

    if isinstance(preds, tuple):
        preds = preds[0]

    preds = np.argmax(preds, axis=-1)

    if isinstance(labels, tuple):
        labels = labels[0]
    labels = np.where(labels != -100, labels, t5_qa_tokenizer.pad_token_id)

    decoded_preds = t5_qa_tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = t5_qa_tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [label.strip() for label in decoded_labels]

    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

    return {
        "rougeL": result["rougeL"]
    }

In [472]:
# Set up training arguments

qa_training_args = Seq2SeqTrainingArguments(
    output_dir="./saved_models/t5_qa",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
    predict_with_generate=False,
    logging_dir="./logs",
    logging_strategy="steps",
    logging_steps=10,
    save_total_limit=2,
    load_best_model_at_end=True,
    save_safetensors=False,  
    metric_for_best_model="rougeL"
)

# Assert to ensure tokenized data is defined

assert 'qa_train_tok' in globals() and 'qa_test_tok' in globals(), "Tokenized QA data is not defined."

# Create the Seq2SeqTrainer instance

qa_trainer = Seq2SeqTrainer(
    model=t5_qa_model,
    args=qa_training_args,
    train_dataset=qa_train_tok,
    eval_dataset=qa_test_tok,
    tokenizer=t5_qa_tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer=t5_qa_tokenizer, model=t5_qa_model),
    compute_metrics=compute_metrics
)

# Train the QA model

# qa_trainer.train()

# Optional: Free up memory (especially useful on GPU systems)

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Save model and tokenizer safely using a temp directory

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = Path(tmpdir)

    # Save model + tokenizer to temp dir

    qa_trainer.save_model(tmp_path)
    t5_qa_tokenizer.save_pretrained(tmp_path)

    # Final destination

    final_path = SAVE_ROOT / "t5_qa"
    final_path.mkdir(parents=True, exist_ok=True)

    # Move files: overwrite by removing first if necessary

    for item in tmp_path.iterdir():
        dest = final_path / item.name
        try:
            if dest.exists():
                if dest.is_file():
                    dest.unlink()
                elif dest.is_dir():
                    shutil.rmtree(dest)
            shutil.copy2(item, dest)  
        except Exception as e:
            print(f"Failed to copy {item.name}: {e}")

print("T5 QA model and tokenizer saved safely.")



  qa_trainer = Seq2SeqTrainer(


Failed to copy pytorch_model.bin: [WinError 5] Access is denied: 'saved_models\\t5_qa\\pytorch_model.bin'
T5 QA model and tokenizer saved safely.


In [473]:
# Load the fine-tuned T5 QA model, or fall back to base if not found

qa_model_path = SAVE_ROOT / "t5_qa"

try:
    if qa_model_path.is_dir() and any(qa_model_path.iterdir()):
        print("Loading previously fine-tuned T5 QA model...")
        qa_model = T5ForConditionalGeneration.from_pretrained(qa_model_path).to(device)
    else:
        raise FileNotFoundError("T5 QA model directory is empty or missing.")
except Exception as e:
    print(f"Warning: {e}")
    print("Falling back to base T5 model (google/flan-t5-large)...")
    qa_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)

Loading previously fine-tuned T5 QA model...


## Model Deployment Setup with Combined Metadata

In [474]:
# Save Combined Model Metadata

metadata = {
    "emotion_classifier": str(SAVE_ROOT / "emotion_classifier"),
    "flan_t5_response_generator": str(SAVE_ROOT / "flan_t5_response_generator"), 
    "t5_qa": str(SAVE_ROOT / "t5_qa")
}

metadata_path = SAVE_ROOT / "combined_model_metadata.pt"
torch.save(metadata, metadata_path)
print("Saved combined model metadata to:", metadata_path)


Saved combined model metadata to: saved_models\combined_model_metadata.pt


In [475]:
# Load Models Using Combined Metadata

metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path, map_location=device)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

# Load Emotion Classification Model & Tokenizer from metadata

emo_model = AutoModelForSequenceClassification.from_pretrained(model_paths["emotion_classifier"]).to(device)
emo_tokenizer = AutoTokenizer.from_pretrained(model_paths["emotion_classifier"])

print("All models loaded from metadata.")

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 'flan_t5_response_generator': 'saved_models\\flan_t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}
All models loaded from metadata.


## Unified Pipeline

In [476]:
# Load Fine‑Tuned Models Using Combined Metadata. Define the path to the combined metadata file.

metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path, map_location=device,)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 'flan_t5_response_generator': 'saved_models\\flan_t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}


In [477]:
# Load Emotion Classification Model & Tokenizer from metadata.

emo_model = AutoModelForSequenceClassification.from_pretrained(model_paths["emotion_classifier"]).to(device)
emo_tokenizer = AutoTokenizer.from_pretrained(model_paths["emotion_classifier"])
emo_model.eval()  

# Load FLAN-T5 Response Generation Model & Tokenizer from metadata.

resp_model = T5ForConditionalGeneration.from_pretrained(model_paths["flan_t5_response_generator"]).to(device)
resp_tokenizer = AutoTokenizer.from_pretrained(model_paths["flan_t5_response_generator"])

In [478]:
# Define Emotion Labels. These labels match those used in our emotion annotation step.

DEFAULT_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity',
    'desire', 'disappointment', 'disapproval', 'embarrassment', 'excitement', 'fear', 'gratitude',
    'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse',
    'sadness', 'surprise', 'neutral'
]
NUM_EMO_LABELS = emo_model.config.num_labels
EMOTION_LABELS = DEFAULT_LABELS[:NUM_EMO_LABELS]

In [479]:
# Optionally, define a subset for emotion-based routing.

emotion_router_labels = {'confusion', 'caring', 'nervousness', 'grief', 'sadness', 'fear', 'remorse', 'love', 'anger'}

In [480]:
# Helper Functions

def detect_emotions(text):
    """
    Detects emotions in the provided text using the fine‑tuned emotion classifier.
    Returns a list of emotion labels whose corresponding probabilities exceed 0.3.
    """
    inputs = emo_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(emo_model.device)
    with torch.no_grad():
        logits = emo_model(**inputs).logits
    probs = torch.sigmoid(logits).cpu().numpy()[0]
    detected = [EMOTION_LABELS[i] for i, p in enumerate(probs) if p > 0.3]
    return detected if detected else ["neutral"]

In [481]:
# Helper Functions

def format_input_prompt(user_input, language="English", history=None, emotions=None):
    emotion_note = ""
    if emotions and emotions != ["neutral"]:
        emotion_note = f"The user seems to feel {' and '.join(emotions)}. "
    if history:
        combined = "\n".join(history + [user_input])
        return (f"respond: {emotion_note}The conversation so far:\n{combined}")
    return (f"respond: {emotion_note}{user_input}")

In [482]:
# Unified Chatbot Pipeline Class

class MentalHealthChatbotPipeline:
    def __init__(self, labels, device="cpu"):
        self.device = device
        self.labels = labels
        self.chat_history = []

        # Load models (assumed already loaded globally from metadata)

        self.emo_model = emo_model.eval()
        self.qa_model = qa_model.eval()
        self.resp_model = resp_model.eval()

    def __call__(self, text, max_length=64):
        self.chat_history.append(("User", text))

        # Detect emotions

        emotions = detect_emotions(text)

        # Decide which model to use

        if "?" in text:
            model, tokenizer = self.qa_model, qa_tokenizer
        else:
            model, tokenizer = self.resp_model, resp_tokenizer

        try:
            # Prepare input
            
            inputs = tokenizer(
                text,
                return_tensors="pt",
                truncation=True,
                padding=True
            ).to(self.device)

            # Generate output

            with torch.no_grad():
                output_ids = model.generate(**inputs, max_length=max_length)
            reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        except Exception as e:
            reply = "Sorry, something went wrong."

        self.chat_history.append(("Bot", reply))

        return {
            "Detected Emotions": emotions,
            "Response": reply,
            "History": self.chat_history
        }

In [483]:
# Unified Chatbot Pipeline Class

def generate_chatbot_response(user_text, language, use_history, history, route_by_emotion, persist):
    history = history or []
    emotions = detect_emotions(user_text)

    use_resp_model = any(e in emotion_router_labels for e in emotions) if route_by_emotion else False

    if use_resp_model:
        prompt = format_input_prompt(
            user_text,
            language=language,
            history=history if use_history else None,
            emotions=emotions
        )
        inputs = resp_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(resp_model.device)
        model, tokenizer = resp_model, resp_tokenizer
    else:
        prompt = "question: " + user_text
        inputs = t5_qa_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(qa_model.device)
        model, tokenizer = qa_model, t5_qa_tokenizer

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=64,
            num_beams=4,
            no_repeat_ngram_size=2,
            early_stopping=True
        )

    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    full_history = history + [f"User: {user_text}", f"Bot: {response}"]

    if persist:
        with open("chatlog.txt", "a", encoding="utf-8") as log:
            log.write(f"\n[{datetime.datetime.now()}]\n{full_history[-2]}\n{full_history[-1]}\nDetected emotions: {emotions}\n")

    return response, ", ".join(emotions), full_history

In [484]:
# Instantiate the chatbot pipeline.

chatbot = MentalHealthChatbotPipeline(labels=EMOTION_LABELS, device=device)

## Gradio Interface for the Mental Health Chatbot (for testing)

In [485]:
# Load Fine-Tuned Models Using Combined Metadata

metadata_path = SAVE_ROOT / "combined_model_metadata.pt"

if metadata_path.exists():
    model_paths = torch.load(metadata_path, map_location=device)
    print("Loaded model metadata:", model_paths)
else:
    raise FileNotFoundError(f"Metadata file not found at {metadata_path}. Please ensure it exists.")

Loaded model metadata: {'emotion_classifier': 'saved_models\\emotion_classifier', 'flan_t5_response_generator': 'saved_models\\flan_t5_response_generator', 't5_qa': 'saved_models\\t5_qa'}


In [486]:
# Load Response Generation Model & Tokenizer

resp_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
base_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)

# Apply LoRA

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q", "v"], 
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

resp_model = get_peft_model(base_model, lora_config)
resp_model.print_trainable_parameters() 

trainable params: 2,359,296 || all params: 785,509,376 || trainable%: 0.3004


In [487]:
# Define the complete list of 28 emotion labels (as in the GoEmotions baseline)

DEFAULT_LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity',
    'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear',
    'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
    'remorse', 'sadness', 'surprise', 'neutral'
]

In [488]:
# Load Fine-Tuned Emotion Classifier

NUM_EMO_LABELS = emo_model.config.num_labels
EMOTION_LABELS = DEFAULT_LABELS[:NUM_EMO_LABELS]

# Define a subset of emotions for routing decisions (if needed)

emotion_router_labels = set(EMOTION_LABELS) & {'confusion', 'caring', 'nervousness', 'grief', 'sadness', 'fear', 'remorse', 'love', 'anger'}

In [489]:
# Helper Functions

def format_input_prompt(user_input, language="English", history=None, emotions=None):
    emotion_note = ""
    if emotions and emotions != ["neutral"]:
        emotion_note = f"The user seems to feel {' and '.join(emotions)}. "
    if history:
        combined = "\n".join(history + [user_input])
        return (f"respond: {emotion_note}The conversation so far:\n{combined}")
    return (f"respond: {emotion_note}{user_input}")

In [490]:
# Helper Functions

def detect_emotions(text):
    """
    Detects emotions in the provided text using the fine‑tuned emotion classifier.
    Returns a list of emotion labels whose corresponding probabilities exceed 0.3.
    """
    inputs = emo_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(emo_model.device)
    with torch.no_grad():
        logits = emo_model(**inputs).logits
    probs = torch.sigmoid(logits).cpu().numpy()[0]
    
    # Safety check: trim probabilities if there are more than expected.
    
    if len(probs) > len(EMOTION_LABELS):
        print(f"Warning: Received {len(probs)} probabilities; expected {len(EMOTION_LABELS)}. Trimming extra values.")
        probs = probs[:len(EMOTION_LABELS)]
    
    detected = [EMOTION_LABELS[i] for i, p in enumerate(probs) if p > 0.3]
    return detected if detected else ["neutral"]

In [491]:
def generate_chatbot_response(user_input, language="English", use_history=False, history=None, route_by_emotion=False, persist=False):
    history = history or []
    emotions = detect_emotions(user_input)

    use_resp_model = any(e in emotion_router_labels for e in emotions) if route_by_emotion else False

    if use_resp_model:
        
        # Add emotion and history into the prompt

        emotion_note = f"The user seems to feel {', '.join(emotions)}. " if emotions and emotions != ["neutral"] else ""
        if use_history and history:
            conversation = "\n".join(history + [user_input])
            prompt = f"respond: {emotion_note}The conversation so far:\n{conversation}"
        else:
            prompt = f"respond: {emotion_note}{user_input}"
        inputs = resp_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(resp_model.device)
        model, tokenizer = resp_model, resp_tokenizer
    else:
        prompt = "question: " + user_input
        inputs = qa_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(qa_model.device)
        model, tokenizer = qa_model, qa_tokenizer

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=100,
            min_length=20,
            num_beams=4,
            no_repeat_ngram_size=2,
            early_stopping=True
        )

    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    full_history = history + [f"User: {user_input}", f"Bot: {response}"]

    if persist:
        with open("chatlog.txt", "a", encoding="utf-8") as log:
            log.write(f"\n[{datetime.datetime.now()}]\n{full_history[-2]}\n{full_history[-1]}\nDetected emotions: {emotions}\n")

    return response, emotions, full_history



In [492]:
# Generate a chatbot response (emotions will be detected inside the function)

def chatbot_interface(user_input):
    
    response, emotions, _ = generate_chatbot_response(
        user_input=user_input,
        language="English",
        use_history=False,
        history=[],
        route_by_emotion=True,
        persist=False
    )

    return f"Detected Emotions: {', '.join(emotions)}\n\nBot: {response}"

gr.Interface(
    fn=chatbot_interface,
    inputs=gr.Textbox(label="User Input"),
    outputs=gr.Textbox(label="Chatbot Response"),
    title="🧠 Happy Brain Mental Health Chatbot",
    description="Emotion-aware supportive chatbot (text only)"
).launch()

* Running on local URL:  http://127.0.0.1:7867

To create a public link, set `share=True` in `launch()`.




## Metrics and Evaluation

In [493]:
# Evaluation for Emotion Classification

def evaluate_emotion_classifier(model, tokenizer, dataset, batch_size=16):
    """
    Evaluate the emotion classifier over the provided dataset.
    Computes micro-averaged F1, Precision, Recall, and Subset Accuracy.
    Assumes that dataset is formatted with columns "input_ids", "attention_mask", "labels"
    and that labels is a multi-label binary vector.
    """
    model.eval()
    all_preds = []
    all_labels = []
    
    # Create a DataLoader for batch processing (if dataset is not huge, you can loop through it directly)

    dataloader = DataLoader(dataset, batch_size=batch_size)
    
    for batch in dataloader:
        
        # Move inputs and labels to device

        inputs = {k: v.to(device) for k, v in batch.items() if k != "labels"}
        labels = batch["labels"].numpy()
        
        with torch.no_grad():
            logits = model(**inputs).logits

        # Apply sigmoid for multi-label classification and threshold at 0.3

        preds = (torch.sigmoid(logits) > 0.3).cpu().numpy().astype(int)
        
        all_preds.append(preds)
        all_labels.append(labels)
    
    all_preds = np.concatenate(all_preds, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    
    # Compute micro-averaged metrics
    
    micro_f1 = f1_score(all_labels, all_preds, average="micro", zero_division=0)
    micro_precision = precision_score(all_labels, all_preds, average="micro", zero_division=0)
    micro_recall = recall_score(all_labels, all_preds, average="micro", zero_division=0)
    subset_acc = accuracy_score(all_labels, all_preds)  # subset accuracy is strict
    
    return {
        "Emotion Classifier Micro-F1": micro_f1,
        "Emotion Classifier Micro-Precision": micro_precision,
        "Emotion Classifier Micro-Recall": micro_recall,
        "Emotion Classifier Subset Accuracy": subset_acc,
    }

In [494]:
# Evaluation

print("Evaluating Emotion Classifier...")
emo_metrics = evaluate_emotion_classifier(emo_model, emo_tokenizer, emo_test_tok)
for metric, value in emo_metrics.items():
    print(f"{metric}: {value:.4f}")

Evaluating Emotion Classifier...
Emotion Classifier Micro-F1: 0.2000
Emotion Classifier Micro-Precision: 0.1562
Emotion Classifier Micro-Recall: 0.2778
Emotion Classifier Subset Accuracy: 0.0000


In [495]:
# Evaluation for Generation Models (Response Generation and QA). We already defined a compute_resp_metrics function in the training cells.
# Here, we define a helper to compute additional perplexity based on the evaluation loss.

def evaluate_generation_model(trainer, test_dataset):
    """
    Uses the Seq2SeqTrainer to compute evaluation metrics over the given test dataset.
    Adds perplexity (exp(eval_loss)) to the standard metrics.
    """
    # Predict returns a dictionary with metrics: eval_loss, and any metrics computed in compute_metrics.

    result = trainer.predict(test_dataset)
    eval_loss = result.metrics.get("eval_loss")
    
    # Compute perplexity if loss is available. (If eval_loss is zero or not available, perplexity is undefined.)
    
    if eval_loss is not None and eval_loss > 0:
        perplexity = math.exp(eval_loss)
    else:
        perplexity = float("inf")
    
    # Add perplexity to the metrics dictionary.
    
    result.metrics["perplexity"] = perplexity
    return result.metrics

In [496]:
# Evaluation

print("\nEvaluating T5 Response Generation Model...")
resp_metrics = evaluate_generation_model(resp_trainer, resp_test_tok)
for metric, value in resp_metrics.items():
    print(f"T5 Response Generation {metric}: {value:.4f}")

print("\nEvaluating T5 QA Model...")
qa_metrics = evaluate_generation_model(qa_trainer, qa_test_tok)
for metric, value in qa_metrics.items():
    print(f"T5 QA {metric}: {value:.4f}")


Evaluating T5 Response Generation Model...


T5 Response Generation test_loss: 14.2457
T5 Response Generation test_model_preparation_time: 0.0240
T5 Response Generation test_rougeL: 0.3537
T5 Response Generation test_runtime: 10.7264
T5 Response Generation test_samples_per_second: 1.6780
T5 Response Generation test_steps_per_second: 0.4660
T5 Response Generation perplexity: inf

Evaluating T5 QA Model...


T5 QA test_loss: 2.7052
T5 QA test_model_preparation_time: 0.0080
T5 QA test_rougeL: 0.4148
T5 QA test_runtime: 3.7120
T5 QA test_samples_per_second: 4.8490
T5 QA test_steps_per_second: 1.3470
T5 QA perplexity: inf


In [497]:
# Evaluation

def compute_generation_metrics(trainer, dataset, tokenizer):
    rouge = load("rouge")
    bertscore = load("bertscore")
    
    results = trainer.predict(dataset)
    predictions, labels = results.predictions, results.label_ids

    # If predictions come as a tuple (e.g., logits), extract the actual token IDs

    if isinstance(predictions, tuple):
        predictions = predictions[0]
    
    # Convert logits to token IDs if needed (argmax across vocab dim)
    
    if predictions.ndim == 3:
        predictions = np.argmax(predictions, axis=-1)
    
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [label.strip() for label in decoded_labels]

    r = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    b = bertscore.compute(predictions=decoded_preds, references=decoded_labels, lang="en")

    return {
        "rougeL": r["rougeL"],
        "bertscore_f1": np.mean(b["f1"])
    }