# TabLLM Fine-Tuning for Postpartum Depression Classification

This notebook implements the fine-tuning approach from the TabLLM paper using **BigScience T0** models.

**Paper**: [TabLLM: Few-shot Classification of Tabular Data with Large Language Models](https://arxiv.org/pdf/2210.10723)

**GitHub**: [https://github.com/clinicalml/TabLLM](https://github.com/clinicalml/TabLLM)

## Overview

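This notebook fine-tunes the **bigscience/T0_3B** model (as per the TabLLM paper) with the Hugging Face `Trainer`. Note that the code below updates all model weights; the parameter-efficient T-Few recipe used in the paper is not implemented here.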

**Model Options**:
- `bigscience/T0_3B` (3 billion parameters) - Recommended for Colab
- `bigscience/T0` or `bigscience/T0pp` (11 billion parameters) - Requires A100 GPU

## Requirements

- **Runtime**: GPU (A100 recommended for T0_3B; the BF16 settings used below require a GPU with bfloat16 support, which the V100 lacks)
- **RAM**: High-RAM runtime **required**
- **Estimated Time**: 45-90 minutes for full training

## Setup Instructions

1. **Enable High-RAM GPU**: Runtime → Change runtime type → GPU + High-RAM
2. Upload dataset files or mount Google Drive
3. Run all cells in order


## 1. Environment Setup

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages (Trainer in transformers 4.30 needs accelerate>=0.20.1)
!pip install -q transformers==4.30.0 datasets==2.14.0 accelerate==0.20.3
!pip install -q sentencepiece protobuf scikit-learn pandas numpy
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

In [None]:
import os
import re
import json
import numpy as np
import pandas as pd
from pathlib import Path
from typing import List, Dict
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    Trainer, TrainingArguments, DataCollatorForSeq2Seq
)
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report
)
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## 2. Data Upload

**Option 1**: Upload files directly
- Click the folder icon in the left sidebar
- Upload `train_postpartum_depression.csv` and `test_postpartum_depression.csv`

**Option 2**: Mount Google Drive

In [None]:
# Option 2: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set data directory (adjust path as needed)
DATA_DIR = '/content'  # If uploaded directly
# DATA_DIR = '/content/drive/MyDrive/your_folder'  # If using Google Drive

## 3. Template Definition

Define the TabLLM-style list template that serializes each table row into text, with one `- feature: value` line per column.

In [None]:
# Define feature names for template
postpartum_feature_names = [
    ('age', 'age range'),
    ('feeling_sad_or_tearful', 'feeling sad or tearful'),
    ('irritable_towards_baby_partner', 'irritable towards baby and partner'),
    ('trouble_sleeping_at_night', 'trouble sleeping at night'),
    ('problems_concentrating_or_making_decision', 'problems concentrating or making decision'),
    ('overeating_or_loss_of_appetite', 'overeating or loss of appetite'),
    ('feeling_of_guilt', 'feeling of guilt'),
    ('problems_of_bonding_with_baby', 'problems of bonding with baby'),
    ('suicide_attempt', 'suicide attempt history'),
]

# Create template
template_parts = [f"- {desc}: {{{var}}}" for var, desc in postpartum_feature_names]
template = "\n".join(template_parts)

print("Template:")
print(template)
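
To see what the template produces before touching the real data, the sketch below fills it for a single made-up row. The two-feature list and the `example_row` values are illustrative assumptions, not rows from the dataset.

```python
# Minimal sketch: fill a TabLLM-style list template for one row.
# The feature subset and the row values are hypothetical.
feature_names = [
    ("age", "age range"),
    ("trouble_sleeping_at_night", "trouble sleeping at night"),
]

# One "- description: {column}" line per feature, built the same way as above
demo_template = "\n".join(f"- {desc}: {{{var}}}" for var, desc in feature_names)

example_row = {"age": "30-35", "trouble_sleeping_at_night": "Yes"}

# str.format_map substitutes each {column} placeholder by name
serialized = demo_template.format_map(example_row)
print(serialized)
```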

In [None]:
def clean_note(note):
    """Clean generated note text"""
    note = re.sub(r"[ \t]+", " ", note)
    note = re.sub("\n\n\n+", "\n\n", note)
    note = re.sub(r"^[ \t]+", "", note)
    note = re.sub(r"\n[ \t]+", "\n", note)
    note = re.sub(r"[ \t]$", "", note)
    note = re.sub(r"[ \t]\n", "\n", note)
    return note.strip()


def serialize_row(row, template_str):
    """Serialize a single row to text"""
    text = template_str
    for var, _ in postpartum_feature_names:
        value = str(row[var]) if var in row else "N/A"
        text = text.replace(f"{{{var}}}", value)
    return clean_note(text)


def load_and_preprocess_data(data_dir):
    """Load and preprocess postpartum depression dataset"""
    print("Loading dataset...")
    
    train_df = pd.read_csv(f"{data_dir}/train_postpartum_depression.csv")
    test_df = pd.read_csv(f"{data_dir}/test_postpartum_depression.csv")
    
    print(f"Train samples: {len(train_df)}, Test samples: {len(test_df)}")
    
    # Column mapping
    column_mapping = {
        'timestamp': 'timestamp',
        'age': 'age',
        'feeling sad or tearful': 'feeling_sad_or_tearful',
        'irritable towards baby & partner': 'irritable_towards_baby_partner',
        'trouble sleeping at night': 'trouble_sleeping_at_night',
        'problems concentrating or making decision': 'problems_concentrating_or_making_decision',
        'overeating or loss of appetite': 'overeating_or_loss_of_appetite',
        'feeling anxious': 'label',
        'feeling of guilt': 'feeling_of_guilt',
        'problems of bonding with baby': 'problems_of_bonding_with_baby',
        'suicide attempt': 'suicide_attempt'
    }
    
    train_df = train_df.rename(columns=column_mapping)
    test_df = test_df.rename(columns=column_mapping)
    
    # Drop timestamp
    train_df = train_df.drop(columns=['timestamp'])
    test_df = test_df.drop(columns=['timestamp'])
    
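    # Normalize labels to 'Yes'/'No'. Any other or missing value becomes 'No',
    # so check the raw label column first if unexpected values are possible.
    label_mapping = {'Yes': 'Yes', 'No': 'No'}
    train_df['label'] = train_df['label'].map(label_mapping).fillna('No')
    test_df['label'] = test_df['label'].map(label_mapping).fillna('No')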
    
    print(f"Train label distribution:\n{train_df['label'].value_counts()}")
    print(f"Test label distribution:\n{test_df['label'].value_counts()}")
    
    return train_df, test_df

## 4. Data Loading and Serialization

In [None]:
# Load data
train_df, test_df = load_and_preprocess_data(DATA_DIR)

# Serialize to text
print("\nSerializing data to text...")
train_texts = [serialize_row(row, template) for _, row in train_df.iterrows()]
test_texts = [serialize_row(row, template) for _, row in test_df.iterrows()]

print(f"\nExample serialized text:\n{train_texts[0]}")
print(f"\nLabel: {train_df.iloc[0]['label']}")

## 5. Dataset Preparation

In [None]:
class PostpartumDataset(Dataset):
    """Dataset for postpartum depression classification"""
    
    def __init__(self, texts, labels, tokenizer, max_length=512):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        # T0 is instruction-tuned, so the serialized row is used as the
        # prompt without an explicit task prefix.
        inputs = self.tokenizer(
            self.texts[idx],
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        
        # Tokenize the target ("Yes" / "No")
        targets = self.tokenizer(
            self.labels[idx],
            max_length=10,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        
        # Replace padding in the target with -100 so the loss ignores it
        label_ids = targets['input_ids'].squeeze(0)
        label_ids[label_ids == self.tokenizer.pad_token_id] = -100
        
        return {
            'input_ids': inputs['input_ids'].squeeze(0),
            'attention_mask': inputs['attention_mask'].squeeze(0),
            'labels': label_ids
        }


# Initialize tokenizer and model
print("Loading tokenizer and model...")
print("⚠️ This will download ~12GB for T0_3B. Please wait...")

# MODEL OPTIONS (uncomment one):
MODEL_NAME = "bigscience/T0_3B"  # 3B params - A100 with High-RAM (BF16 needs Ampere or newer)
# MODEL_NAME = "bigscience/T0pp"   # 11B params - Requires A100 40GB+
# MODEL_NAME = "bigscience/T0"     # 11B params - Requires A100 40GB+

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Create datasets
train_dataset = PostpartumDataset(
    train_texts,
    train_df['label'].tolist(),
    tokenizer
)

test_dataset = PostpartumDataset(
    test_texts,
    test_df['label'].tolist(),
    tokenizer
)

print(f"Train dataset size: {len(train_dataset)}")
print(f"Test dataset size: {len(test_dataset)}")
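
One detail worth checking in seq2seq fine-tuning is how padded label positions are treated: PyTorch's cross-entropy loss skips targets equal to `-100` by default, so padding in the labels is commonly replaced with that value. The sketch below illustrates the idea on plain Python lists. The token ids are made up, `PAD_ID = 0` is assumed for illustration (T5-family tokenizers use 0 as the pad id), and `mask_label_padding` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_ID = 0           # assumed pad id, for illustration

def mask_label_padding(label_ids, pad_id=PAD_ID):
    """Replace padding ids with IGNORE_INDEX so they do not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in label_ids]

# Hypothetical target ids: one answer token, EOS (1), then padding to length 6
padded_labels = [2163, 1, 0, 0, 0, 0]
masked = mask_label_padding(padded_labels)
print(masked)
```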

## 6. Model Fine-Tuning

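This section fine-tunes the **BigScience T0** model, as in the TabLLM paper.

**Note**: The T0_3B weights alone take roughly 6GB of GPU memory in BF16; full fine-tuning needs several times that for gradients and optimizer state, so use a High-RAM runtime with a large-memory GPU.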
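
As a rough sanity check on memory, the sketch below estimates the footprint of weights, gradients, and AdamW state for full fine-tuning. It assumes a round 3 billion parameters and that gradients and both optimizer moments are kept in the same dtype as the weights, and it ignores activations and framework overhead. The figures depend heavily on those assumptions, so treat them as a back-of-the-envelope lower bound rather than a measurement.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}

def estimate_training_memory_gb(num_params, dtype="bf16"):
    """Lower-bound estimate in GB: weights + gradients + two AdamW moment buffers."""
    per_param = BYTES_PER_PARAM[dtype]
    weights = num_params * per_param
    gradients = num_params * per_param
    adam_state = 2 * num_params * per_param  # first and second moments
    return {
        "weights_gb": weights / 1e9,
        "total_gb": (weights + gradients + adam_state) / 1e9,
    }

# 3e9 parameters is an assumed round figure, not the exact model size
estimate = estimate_training_memory_gb(3_000_000_000, dtype="bf16")
print(estimate)
```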

In [None]:
# Load model
print("Loading T0 model (this may take a few minutes)...")
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # Use bf16 as per TabLLM paper config
    device_map="auto"  # Automatically distribute across available devices
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Model loaded on {device}")
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# Training arguments (adjusted for T0_3B)
OUTPUT_DIR = "/content/tabllm_t0_finetuned"

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=5,  # Fewer epochs for large model
    per_device_train_batch_size=4,  # Smaller batch for T0_3B
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # Effective batch size = 4*4=16
    learning_rate=3e-5,  # Lower LR for large pre-trained model
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir=f"{OUTPUT_DIR}/logs",
    logging_steps=20,
    eval_steps=100,
    save_steps=200,
    evaluation_strategy="steps",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,  # Use BF16 as per TabLLM config
    report_to="none",
    remove_unused_columns=False
)

print("Training configuration:")
print(f"Model: {MODEL_NAME}")
print(f"Epochs: {training_args.num_train_epochs}")
print(f"Batch size: {training_args.per_device_train_batch_size}")
print(f"Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"Learning rate: {training_args.learning_rate}")
print(f"BF16: {training_args.bf16}")
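
The arguments above determine how many optimizer updates the run performs, which in turn decides how `warmup_steps`, `eval_steps`, and `save_steps` relate to the total. The sketch below works through that arithmetic. The training-set size of 1,200 is a hypothetical value, and the per-epoch step count is approximated with integer division, which may differ slightly from what `Trainer` reports.

```python
import math

def training_schedule(num_examples, per_device_batch, grad_accum, epochs):
    """Approximate optimizer-step counts for a single-GPU run."""
    effective_batch = per_device_batch * grad_accum
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    # One optimizer update per `grad_accum` batches (at least one per epoch)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }

# 1200 training rows is an assumed figure for illustration
schedule = training_schedule(num_examples=1200, per_device_batch=4, grad_accum=4, epochs=5)
print(schedule)
```

With these hypothetical numbers, a 100-step warmup would cover roughly a quarter of training (100 of 375 updates).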

In [None]:
# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    padding=True
)

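# Create trainer
# NOTE: the test set doubles as the evaluation set here, so
# load_best_model_at_end picks the checkpoint using test data. Use a separate
# validation split if the final test metrics need to be unbiased.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
)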

print("Trainer initialized")
print("\n⚠️ Training is estimated to take 45-90 minutes on an A100...")

In [None]:
# Train model
print("Starting training with T0_3B model...")
train_result = trainer.train()

print("\nTraining completed!")
print(f"Training loss: {train_result.training_loss:.4f}")

# Save model
trainer.save_model(f"{OUTPUT_DIR}/final_model")
tokenizer.save_pretrained(f"{OUTPUT_DIR}/final_model")
print(f"Model saved to {OUTPUT_DIR}/final_model")

## 7. Evaluation

In [None]:
def predict_batch(model, tokenizer, texts, batch_size=8, device='cuda'):
    """Make predictions on a batch of texts"""
    model.eval()
    predictions = []
    
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        
        # Prepare inputs (no task prefix needed for T0)
        encoded = tokenizer(
            batch_texts,
            padding=True,
            truncation=True,
            max_length=512,
            return_tensors='pt'
        ).to(device)
        
        # Generate predictions
        with torch.no_grad():
            outputs = model.generate(
                **encoded,
                max_length=10,
                num_beams=4,
                early_stopping=True
            )
        
        # Decode predictions
        batch_preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        predictions.extend(batch_preds)
        
        if (i // batch_size + 1) % 10 == 0:
            print(f"Processed {i+len(batch_texts)}/{len(texts)} samples")
    
    return predictions


# Make predictions on test set
print("Making predictions on test set...")
predictions = predict_batch(model, tokenizer, test_texts, device=device)

# Convert predictions to binary
def parse_prediction(pred):
    pred = pred.strip().lower()
    return 1 if 'yes' in pred else 0

pred_binary = [parse_prediction(p) for p in predictions]
true_binary = [1 if label == 'Yes' else 0 for label in test_df['label'].tolist()]

print(f"\nExample predictions:")
for i in range(5):
    print(f"True: {test_df.iloc[i]['label']}, Predicted: {predictions[i]}")
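
The substring check in `parse_prediction` treats any output containing `yes` as positive, including words such as `eyes`. A stricter variant, sketched below, matches whole words only and reports outputs it cannot interpret instead of silently mapping them to 0. The function name `parse_prediction_strict` and the sample strings are illustrative assumptions.

```python
import re

def parse_prediction_strict(pred):
    """Return 1 for 'yes', 0 for 'no', or None if neither appears as a whole word."""
    match = re.search(r"\b(yes|no)\b", pred.strip().lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0

# Hypothetical model outputs
samples = ["Yes", " no ", "eyes", "Yes, she does"]
parsed = [parse_prediction_strict(s) for s in samples]
print(parsed)
```

Counting the `None` results separately makes it visible when the model generates something other than the two expected answers.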

In [None]:
# Calculate metrics
accuracy = accuracy_score(true_binary, pred_binary)
precision = precision_score(true_binary, pred_binary, average='binary', zero_division=0)
recall = recall_score(true_binary, pred_binary, average='binary', zero_division=0)
f1 = f1_score(true_binary, pred_binary, average='binary', zero_division=0)

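# roc_auc_score raises ValueError when the true labels contain a single class.
# Note: this AUC is computed from hard 0/1 predictions, not from scores.
try:
    auc_roc = roc_auc_score(true_binary, pred_binary)
except ValueError:
    auc_roc = 0.5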

cm = confusion_matrix(true_binary, pred_binary)

print("="*80)
print("TABLLM FINE-TUNING RESULTS (T0_3B) - Postpartum Depression")
print("="*80)
print(f"Model: {MODEL_NAME}")
print(f"Accuracy:  {accuracy:.4f} ({accuracy*100:.2f}%)")
print(f"Precision: {precision:.4f} ({precision*100:.2f}%)")
print(f"Recall:    {recall:.4f} ({recall*100:.2f}%)")
print(f"F1-Score:  {f1:.4f} ({f1*100:.2f}%)")
print(f"AUC-ROC:   {auc_roc:.4f} ({auc_roc*100:.2f}%)")
print(f"\nConfusion Matrix:")
print(f"TN: {cm[0][0]}, FP: {cm[0][1]}")
print(f"FN: {cm[1][0]}, TP: {cm[1][1]}")
print("="*80)

print("\nDetailed Classification Report:")
print(classification_report(true_binary, pred_binary, target_names=['No Anxiety', 'Anxiety']))
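
The reported precision, recall, and F1 follow directly from the four confusion-matrix counts. The sketch below recomputes them by hand, which can help when cross-checking the numbers printed above. The counts are hypothetical, not results from this notebook, and `binary_metrics` is an illustrative helper.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class from raw counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for illustration only
metrics = binary_metrics(tn=50, fp=10, fn=5, tp=35)
print(metrics)
```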

## 8. Save Results

In [None]:
# Save metrics
results = {
    'model': MODEL_NAME,
    'metrics': {
        'accuracy': float(accuracy),
        'precision': float(precision),
        'recall': float(recall),
        'f1_score': float(f1),
        'auc_roc': float(auc_roc)
    },
    'confusion_matrix': cm.tolist(),
    'classification_report': classification_report(true_binary, pred_binary, output_dict=True)
}

with open(f"{OUTPUT_DIR}/finetuning_metrics.json", 'w') as f:
    json.dump(results, f, indent=2)

# Save predictions
predictions_df = pd.DataFrame({
    'true_label': test_df['label'].tolist(),
    'predicted_label': predictions,
    'true_binary': true_binary,
    'predicted_binary': pred_binary
})
predictions_df.to_csv(f"{OUTPUT_DIR}/finetuning_predictions.csv", index=False)

print(f"Results saved to {OUTPUT_DIR}")
print(f"\nFiles created:")
print(f"- {OUTPUT_DIR}/final_model/ (trained model)")
print(f"- {OUTPUT_DIR}/finetuning_metrics.json")
print(f"- {OUTPUT_DIR}/finetuning_predictions.csv")

## 9. Download Results

⚠️ **Note**: Model is ~12GB. Consider saving only metrics/predictions if space is limited.

In [None]:
# Option 1: Zip only metrics and predictions (small)
!zip -r /content/tabllm_results_metrics_only.zip {OUTPUT_DIR}/*.json {OUTPUT_DIR}/*.csv

print("Metrics and predictions zipped!")
print("Download 'tabllm_results_metrics_only.zip' from the files panel")

# Option 2: Zip everything including model (WARNING: ~12GB)
# !zip -r /content/tabllm_results_full.zip {OUTPUT_DIR}

# Option 3: Copy to Google Drive
# !cp -r {OUTPUT_DIR} /content/drive/MyDrive/tabllm_results

## Model Information

### BigScience T0 Models

The TabLLM paper uses **T0** models, which are T5 variants fine-tuned on a multitask mixture of prompted datasets:

| Model | Parameters | GPU Required | Training Time |
|-------|-----------|--------------|---------------|
| `bigscience/T0_3B` | 3B | A100 + High-RAM (BF16 support needed) | 45-90 min |
| `bigscience/T0pp` | 11B | A100 40GB+ | 2-4 hours |
| `bigscience/T0` | 11B | A100 40GB+ | 2-4 hours |

### T0 vs T5

**T0 advantages**:
- Instruction-tuned on diverse NLP tasks
- Better zero/few-shot performance
- As used in TabLLM paper

**T5 advantages** (for comparison):
- Smaller models available (t5-small, t5-base)
- Faster training on limited hardware
- Lower memory requirements

### Alternative: T5 Models

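If T0_3B doesn't fit in memory, you can try a smaller T5 model (uncomment one):
```python
MODEL_NAME = "t5-base"     # 220M params - Works on T4
# MODEL_NAME = "t5-large"  # 770M params - Works on V100
```

On GPUs without bfloat16 support, such as the T4 and V100, also set `bf16=False` and remove `torch_dtype=torch.bfloat16`.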

Accuracy may be somewhat lower than with T0; the 2-5% gap is a rough expectation rather than a result measured in this notebook.

## Troubleshooting

**Out of Memory with T0_3B**:
- Enable the High-RAM runtime
- Reduce `per_device_train_batch_size` to 2
- Increase `gradient_accumulation_steps` to keep the effective batch size unchanged
- Try t5-large instead
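
When the per-device batch size is reduced to save memory, the gradient accumulation steps can be raised so that the effective batch size stays the same. A minimal sketch of that calculation follows; `accumulation_steps_for` is a hypothetical helper, and the target of 16 matches the effective batch size configured earlier.

```python
def accumulation_steps_for(target_effective_batch, per_device_batch, num_devices=1):
    """Smallest gradient_accumulation_steps that reaches the target effective batch size."""
    per_step = per_device_batch * num_devices
    # Ceiling division without floats
    return -(-target_effective_batch // per_step)

for batch_size in (4, 2, 1):
    steps = accumulation_steps_for(16, batch_size)
    print(f"per_device_train_batch_size={batch_size} -> gradient_accumulation_steps={steps}")
```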

**Slow Training**:
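- Verify BF16 is enabled (`bf16=True` in `TrainingArguments`)
- Check GPU utilization with `nvidia-smi`
- Reduce `max_length`: inputs are padded to 512 tokens, so a smaller value speeds up each step if the serialized rows still fit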

## References

- [TabLLM Paper](https://arxiv.org/pdf/2210.10723)
- [T0 Paper](https://arxiv.org/abs/2110.08207)
- [BigScience T0 Models](https://huggingface.co/bigscience/T0)
