# Lab 5: Neural Network Classification with scikit-learn

---
## 1. Notebook Overview

### 1.1 Objective
- Re-use the most frequent words (optional: per class) you found for your Naive Bayes classifier last week.

- Construct binary vectors for your whole dataset. Each dimension states whether the word is part of the sample or not.

- Create a small neural network using scikit-learn: https://scikit-learn.org/stable/modules/neural_networks_supervised.html. Start with three hidden layers of 128/64/128 neurons. Consider what your input and output layers should look like.

- Train your network on your training set and test it on your test set. Calculate evaluation measures and compare with your previous classifier.

- Optional: Experiment with different network sizes.

### 1.2 Prerequisites
This notebook assumes you have already executed:
- **Lab 2**: Data preprocessing → `../Data/tweets_preprocessed_train.parquet`, `../Data/tweets_preprocessed_test.parquet`, `../Data/tweets_preprocessed_validation.parquet`
- **Lab 3**: Language modeling
- **Lab 4**: Feature extraction → `../Data/top_1000_vocabulary.json`

### 1.3 Architecture
We implement neural networks with:
- **Input layer**: 1000 features (Top 1000 vocabulary from Lab 4)
- **Hidden layers**: 128 → 64 → 128 neurons (as specified)
- **Output layer**:
  - Multi-label: `OneVsRestClassifier` trains 19 separate networks (one per topic class), each with its own copy of the hidden layers and a single sigmoid output
  - Single-label: one network with 19 output neurons and Softmax activation
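To make the layer sizes concrete, the sketch below counts trainable parameters for fully connected networks of the sizes listed above. The helper `count_parameters` is a hypothetical name introduced here for illustration; it assumes one weight per input-output pair plus one bias per output neuron.

```python
from typing import Sequence


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# One network with 19 softmax outputs (single-label setting)
single_label_params = count_parameters([1000, 128, 64, 128, 19])

# 19 independent networks with one sigmoid output each (one-vs-rest setting)
one_vs_rest_params = 19 * count_parameters([1000, 128, 64, 128, 1])

print(single_label_params)  # 147155
print(one_vs_rest_params)   # 2751827
```

Under these assumptions the one-vs-rest setup trains roughly 19 times as many parameters, because the hidden layers are not shared between classes.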

### 1.4 Neural Network Fundamentals (From Lecture)
- A single neuron computes: ŷ = g(w₀ + Σ xᵢwᵢ) where g is a non-linear activation function
- **Activation functions are critical** - they introduce non-linearities that make multi-layer networks powerful (universal approximators)
- Common activations: ReLU (g(z) = max(0,z)), Sigmoid, Tanh
- For multi-class (single-label): use **Softmax** to convert outputs to probabilities
- For multi-label: use **Sigmoid** per class via OneVsRestClassifier
- **Loss function for classification**: Cross-entropy loss
- Weights should NOT all be initialized to zero: identical weights give every neuron in a layer the same gradient, so the symmetry between neurons is never broken
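The formulas above can be traced by hand with a few lines of NumPy. This is a minimal sketch with made-up inputs and weights (the values of `x`, `w`, `w0`, and `logits` are illustrative assumptions, not taken from the lab data); it computes a single ReLU neuron and then turns three raw output scores into Softmax probabilities and a cross-entropy loss.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Single neuron: y_hat = g(w0 + sum_i x_i * w_i)
x = np.array([1.0, 0.0, 1.0])    # binary input features
w = np.array([0.5, -0.3, 0.2])   # illustrative weights
w0 = -0.1                        # illustrative bias
z = w0 + x @ w
neuron_output = relu(z)

# Multi-class output: Softmax turns raw scores into a probability distribution
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

# Cross-entropy loss for a sample whose true class is index 0
true_class = 0
cross_entropy = -np.log(probs[true_class])

# Multi-label output: each score gets its own independent Sigmoid
label_probs = sigmoid(logits)

print(z, neuron_output)
print(probs, probs.sum())
print(cross_entropy)
print(label_probs)
```

Note that the Softmax probabilities sum to 1 across classes, while the Sigmoid outputs are independent per class and need not sum to 1, which is why the latter suits multi-label targets.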

---
## 2. Task 1: Establish Context

### 2.1 Review Preprocessing from Lab 2
In Lab 2, we preprocessed tweets with the following pipeline:
- Remove RT indicators, URLs, usernames, and mentions
- Convert emojis to text descriptions
- Extract hashtag text and segment CamelCase words
- Normalize whitespace and lowercase
- Tokenize with SpaCy and filter/lemmatize tokens
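As a reminder of what these steps do, here is a rough regex-only sketch of the cleaning stages. It is not the Lab 2 implementation: the function name `clean_tweet` and the exact patterns are assumptions made for illustration, and the emoji conversion and SpaCy tokenization/lemmatization steps are left out.

```python
import re


def split_camel_case(match: re.Match) -> str:
    """Return the hashtag text with a space inserted at lower-to-upper boundaries."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", match.group(1))


def clean_tweet(text: str) -> str:
    text = re.sub(r"^RT\s+", "", text)               # drop a leading retweet marker
    text = re.sub(r"https?://\S+", " ", text)        # drop URLs
    text = re.sub(r"@\w+", " ", text)                # drop @mentions
    text = re.sub(r"#(\w+)", split_camel_case, text)  # keep hashtag text, split CamelCase
    text = re.sub(r"\s+", " ", text).strip()         # normalize whitespace
    return text.lower()


print(clean_tweet("RT @user Loving the #WorldCup final https://t.co/abc"))
# loving the world cup final
```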

The output is stored in parquet files with columns: `text`, `label_name`, `label`

Two approaches for label handling are supported:
- Parse `label_name` (string list format) into Python lists
- Use `label` column directly (pre-computed binary vectors)

In [None]:
# Import required libraries
import json
import ast
from typing import List

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder
from sklearn.metrics import (
    accuracy_score, 
    f1_score, 
    precision_score, 
    recall_score, 
    hamming_loss,
    classification_report
)
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier

# Constants
TRAIN_DATA_PATH = "../Data/tweets_preprocessed_train.parquet"
TEST_DATA_PATH = "../Data/tweets_preprocessed_test.parquet"
VALIDATION_DATA_PATH = "../Data/tweets_preprocessed_validation.parquet"
VOCABULARY_PATH = "../Data/top_1000_vocabulary.json"
RANDOM_STATE = 42

print("✓ Libraries imported successfully")

### 2.2 Load and Verify Vocabulary from Lab 4

In [None]:
# Load the top 1000 vocabulary from Lab 4
with open(VOCABULARY_PATH, 'r', encoding='utf-8') as f:
    vocab_data = json.load(f)

VOCABULARY = vocab_data['tokens']
vocab_set = set(VOCABULARY)

print(f"✓ Loaded vocabulary from: {VOCABULARY_PATH}")
print(f"✓ Description: {vocab_data['description']}")
print(f"✓ Vocabulary size: {len(VOCABULARY)}")
print(f"✓ First 20 tokens: {VOCABULARY[:20]}")
print(f"✓ Last 10 tokens: {VOCABULARY[-10:]}")

### 2.3 Load Preprocessed Datasets

In [None]:
def parse_labels(value) -> List[str]:
    """Parse label_name column into consistent Python lists."""
    if isinstance(value, (list, np.ndarray)):
        return [str(v) for v in value]
    if isinstance(value, tuple):
        return [str(v) for v in value]
    if isinstance(value, str):
        value = value.strip()
        if value.startswith('[') and value.endswith(']'):
            # Remove brackets
            inner = value[1:-1].strip()
            if not inner:
                return []
            # Drop quotes and commas, then split on whitespace; this handles both
            # "['a', 'b']" (Python list repr) and "['a' 'b']" (NumPy array repr)
            inner = inner.replace("'", " ").replace('"', " ").replace(",", " ")
            labels = inner.split()
            return labels
        try:
            parsed = ast.literal_eval(value)
            if isinstance(parsed, (list, tuple)):
                return [str(v) for v in parsed]
        except (ValueError, SyntaxError):
            pass
        return [value] if value else []
    return [str(value)] if value else []

def parse_binary_label(value) -> np.ndarray:
    """Parse binary label array from string representation."""
    if isinstance(value, np.ndarray):
        return value
    if isinstance(value, str):
        # Parse "[0 0 1 0 ...]" format
        inner = value.strip()[1:-1]
        return np.array([int(x) for x in inner.split()])
    return np.array(value)

def load_dataset(path: str) -> pd.DataFrame:
    """Load tweets from parquet and normalize the label columns."""
    df = pd.read_parquet(path)
    df = df.copy()
    df["labels"] = df["label_name"].apply(parse_labels)
    df["label_binary"] = df["label"].apply(parse_binary_label)
    return df

# Load all datasets
df_train = load_dataset(TRAIN_DATA_PATH)
df_test = load_dataset(TEST_DATA_PATH)
df_validation = load_dataset(VALIDATION_DATA_PATH)

print(f"✓ Training set: {len(df_train):,} samples")
print(f"✓ Test set: {len(df_test):,} samples")
print(f"✓ Validation set: {len(df_validation):,} samples")
print(f"\nSample preprocessed text:")
print(f"  {df_train['text'].iloc[0][:80]}...")
print(f"  Labels: {df_train['labels'].iloc[0]}")

### 2.4 Convert Multi-Label to Single-Label

For comparison with a single-label classifier, we convert multi-label samples to single-label by keeping only the first listed label, which we treat as the primary one. The conversion adds two columns to each DataFrame: `single_label` (the class name) and `single_label_binary` (the matching one-hot vector).

In [None]:
# Define the 19 topic classes (from the original dataset)
TOPIC_CLASSES = [
    'arts_&_culture', 'business_&_entrepreneurs', 'celebrity_&_pop_culture',
    'diaries_&_daily_life', 'family', 'fashion_&_style', 'film_tv_&_video',
    'fitness_&_health', 'food_&_dining', 'gaming', 'learning_&_educational',
    'music', 'news_&_social_concern', 'other_hobbies', 'relationships',
    'science_&_technology', 'sports', 'travel_&_adventure', 'youth_&_student_life'
]

def convert_to_single_label(df: pd.DataFrame, topic_classes: List[str]) -> pd.DataFrame:
    """
    Convert multi-label dataset to single-label by keeping only the primary (first) label.
    
    Parameters:
    -----------
    df : pd.DataFrame
        DataFrame with 'labels' (list of label names) and 'label_binary' (binary vector) columns
    topic_classes : List[str]
        List of all possible class names in order
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with added 'single_label' (string) and 'single_label_binary' (binary vector) columns
    """
    df = df.copy()
    
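    # Extract primary (first) label from the labels list.
    # Note: 'unknown' is not in TOPIC_CLASSES, so a sample without labels would make
    # LabelEncoder.transform raise a ValueError later; filter such rows first if any exist.
    df['single_label'] = df['labels'].apply(lambda x: x[0] if x else 'unknown')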
    
    # Create single-label binary vector (one-hot encoding)
    def create_single_label_binary(label: str) -> np.ndarray:
        binary = np.zeros(len(topic_classes), dtype=int)
        if label in topic_classes:
            binary[topic_classes.index(label)] = 1
        return binary
    
    df['single_label_binary'] = df['single_label'].apply(create_single_label_binary)
    
    return df

# Convert all datasets to single-label
df_train = convert_to_single_label(df_train, TOPIC_CLASSES)
df_test = convert_to_single_label(df_test, TOPIC_CLASSES)
df_validation = convert_to_single_label(df_validation, TOPIC_CLASSES)

# Show conversion statistics
multi_label_count = (df_train['labels'].apply(len) > 1).sum()
print(f"✓ Converted datasets to single-label format")
print(f"\nMulti-label statistics (training set):")
print(f"  Samples with multiple labels: {multi_label_count:,} ({100*multi_label_count/len(df_train):.1f}%)")
print(f"  Samples with single label: {len(df_train) - multi_label_count:,} ({100*(len(df_train) - multi_label_count)/len(df_train):.1f}%)")

print(f"\nSingle-label distribution (training set):")
print(df_train['single_label'].value_counts())

print(f"\nExample conversion:")
print(f"  Original labels: {df_train['labels'].iloc[0]}")
print(f"  Single label: {df_train['single_label'].iloc[0]}")

---
## 3. Task 2: Implementation Plan

### 3.1 Binary Feature Vector Construction
For each sample, we create a binary vector of size 1000 (vocabulary size):
- For each word in the vocabulary, set dimension to 1 if word is present in sample, 0 otherwise
- This is a Bag-of-Words style encoding (word order is lost)
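The same encoding can be cross-checked with scikit-learn's `CountVectorizer`. The sketch below uses a toy vocabulary and toy texts (both made up for illustration) and assumes whitespace tokenization, matching the description above; `binary=True` caps every count at 1 and a fixed `vocabulary` pins the column order.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_vocabulary = ["game", "music", "team", "win"]
toy_texts = ["team win win game", "music", "unknown words only"]

vectorizer = CountVectorizer(
    vocabulary=toy_vocabulary,  # fixed column order, nothing is learned from the data
    analyzer=str.split,         # whitespace tokenization
    binary=True,                # presence/absence instead of counts
)
X_toy = vectorizer.transform(toy_texts).toarray()

print(X_toy)
# [[1 0 1 1]
#  [0 1 0 0]
#  [0 0 0 0]]
```

The repeated word `win` still yields a 1, and the third text maps to an all-zero row because none of its words are in the vocabulary.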

### 3.2 MLPClassifier Configuration
- **hidden_layer_sizes**: (128, 64, 128) - three hidden layers as specified
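- **activation**: 'relu' - ReLU activation for the hidden layers (scikit-learn's default)
- **solver**: 'adam' - Adam, a stochastic gradient-based optimizer that updates the weights on mini-batches
- **max_iter**: 300 - upper bound on the number of training epochs; training stops earlier once the loss stops improving, and scikit-learn emits a ConvergenceWarning if the limit is reached first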
- **random_state**: 42 - for reproducibility
- **early_stopping**: Disabled for multi-label (some classes have few samples), enabled for single-label
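As a side note to this configuration, `MLPClassifier` also accepts a binary indicator matrix directly, in which case it trains one network with a logistic (sigmoid) output per label instead of one network per class. The sketch below illustrates this on synthetic data from `make_multilabel_classification` (not the tweet dataset); the small layer size and iteration limit are arbitrary choices to keep it fast.

```python
import warnings

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 200 samples, 20 features, 4 labels
X_demo, Y_demo = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)
# Synthetic multi-class target with 3 mutually exclusive classes
y_multiclass = np.random.default_rng(0).integers(0, 3, size=200)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    multilabel_net = MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=50, random_state=0
    ).fit(X_demo, Y_demo)
    multiclass_net = MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=50, random_state=0
    ).fit(X_demo, y_multiclass)

print(multilabel_net.out_activation_)   # logistic
print(multiclass_net.out_activation_)   # softmax
print(multilabel_net.coefs_[-1].shape)  # (16, 4): one output neuron per label
```

Whether a shared network or the one-vs-rest setup used in this notebook works better on the tweet data is an empirical question that this sketch does not answer.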

### 3.3 Evaluation Metrics
For multi-label classification:
- Subset Accuracy (exact match)
- Hamming Loss
- Micro/Macro F1-Score

For single-label classification:
- Accuracy
- Macro/Weighted F1-Score
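The multi-label metrics are easiest to read on a tiny example. The arrays below are made up for illustration: three samples, three classes, with two wrong cells in the prediction matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

y_true_toy = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
y_pred_toy = np.array([[1, 0, 0],   # misses class 2
                       [0, 1, 0],   # exact match
                       [1, 1, 1]])  # adds class 2 wrongly

subset_acc = accuracy_score(y_true_toy, y_pred_toy)  # 1 of 3 rows match exactly
hamming = hamming_loss(y_true_toy, y_pred_toy)       # 2 of 9 cells are wrong
micro_f1 = f1_score(y_true_toy, y_pred_toy, average="micro", zero_division=0)
macro_f1 = f1_score(y_true_toy, y_pred_toy, average="macro", zero_division=0)

print(f"Subset accuracy: {subset_acc:.4f}")  # 0.3333
print(f"Hamming loss:    {hamming:.4f}")     # 0.2222
print(f"Micro F1:        {micro_f1:.4f}")    # 0.8000
print(f"Macro F1:        {macro_f1:.4f}")    # 0.6667
```

Micro F1 pools all cells (TP=4, FP=1, FN=1), whereas Macro F1 averages the per-class scores (1.0, 1.0, 0.0), so a single poorly predicted class pulls Macro F1 down much more.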

---
## 4. Task 3: Multi-Label Classification

### 4.1 Feature Engineering: Binary Vector Construction

In [None]:
def create_binary_features(texts: pd.Series, vocabulary: List[str]) -> np.ndarray:
    """
    Create binary feature vectors for text samples.
    
    Each dimension represents whether a word from the vocabulary
    is present (1) or absent (0) in the sample.
    
    Parameters:
    -----------
    texts : pd.Series
        Series of preprocessed text strings (whitespace-tokenized)
    vocabulary : List[str]
        List of vocabulary words (top 1000 from Lab 4)
    
    Returns:
    --------
    np.ndarray
        Binary feature matrix of shape (n_samples, vocab_size)
    """
    # Map each vocabulary word to its column index
    vocab_to_idx = {word: idx for idx, word in enumerate(vocabulary)}
    
    n_samples = len(texts)
    n_features = len(vocabulary)
    
    # Initialize feature matrix with zeros
    features = np.zeros((n_samples, n_features), dtype=np.int8)
    
    # Fill in binary features
    for i, text in enumerate(texts):
        if isinstance(text, str):
            words = set(text.split())
            for word in words:
                if word in vocab_to_idx:
                    features[i, vocab_to_idx[word]] = 1
    
    return features

# Create binary feature vectors for all datasets
print("Creating binary feature vectors...")
X_train = create_binary_features(df_train['text'], VOCABULARY)
X_test = create_binary_features(df_test['text'], VOCABULARY)
X_validation = create_binary_features(df_validation['text'], VOCABULARY)

print(f"\n✓ Feature matrix shapes:")
print(f"  X_train: {X_train.shape}")
print(f"  X_test: {X_test.shape}")
print(f"  X_validation: {X_validation.shape}")

# Show sample feature statistics
print(f"\nFeature statistics (training set):")
print(f"  Average features per sample: {X_train.sum(axis=1).mean():.2f}")
print(f"  Max features in a sample: {X_train.sum(axis=1).max()}")
print(f"  Min features in a sample: {X_train.sum(axis=1).min()}")

### 4.2 Label Encoding (Multi-Label Binarization)

In [None]:
# Use the pre-parsed binary labels directly from the label column
y_train_multi = np.vstack(df_train['label_binary'].values)
y_test_multi = np.vstack(df_test['label_binary'].values)
y_validation_multi = np.vstack(df_validation['label_binary'].values)

# Create MultiLabelBinarizer for inverse_transform (label names)
mlb = MultiLabelBinarizer(classes=TOPIC_CLASSES)
mlb.fit([TOPIC_CLASSES])  # Fit with all classes

print(f"✓ Number of classes: {len(TOPIC_CLASSES)}")
print(f"✓ Classes: {TOPIC_CLASSES}")
print(f"\n✓ Multi-label matrix shapes:")
print(f"  y_train_multi: {y_train_multi.shape}")
print(f"  y_test_multi: {y_test_multi.shape}")
print(f"  y_validation_multi: {y_validation_multi.shape}")

# Verify label distribution
print(f"\n✓ Label distribution (training set):")
print(f"  Average labels per sample: {y_train_multi.sum(axis=1).mean():.2f}")
print(f"  Samples per class: {y_train_multi.sum(axis=0)}")
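For reference, this is how `MultiLabelBinarizer` maps between label names and indicator rows when the class order is fixed. The three class names below are a toy subset chosen for illustration; passing `classes=` pins the column order in the same way the cell above does with `TOPIC_CLASSES`.

```python
from sklearn.preprocessing import MultiLabelBinarizer

toy_classes = ["gaming", "music", "sports"]
toy_labels = [["music"], ["sports", "music"], []]

toy_mlb = MultiLabelBinarizer(classes=toy_classes)
Y_toy = toy_mlb.fit_transform(toy_labels)
print(Y_toy)
# [[0 1 0]
#  [0 1 1]
#  [0 0 0]]

# inverse_transform returns one tuple of class names per row, in column order
recovered = toy_mlb.inverse_transform(Y_toy)
print(recovered)  # [('music',), ('music', 'sports'), ()]
```

The inverse mapping is only meaningful if the indicator columns really follow the order given in `classes`, which the notebook assumes for the precomputed `label` column.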

### 4.3 Multi-Label Neural Network Training

In [None]:
# Create MLPClassifier with specified architecture
# OneVsRestClassifier clones this estimator once per class, so 19 independent
# networks are trained, each on a binary (class vs. rest) target
# Note: early_stopping is disabled because some classes have very few samples
# which causes issues with the validation split in OneVsRest multi-label setting
mlp_base = MLPClassifier(
    hidden_layer_sizes=(128, 64, 128),  # Three hidden layers as specified
    activation='relu',                   # ReLU activation function
    solver='adam',                       # Adam optimizer (mini-batch gradient descent)
    max_iter=300,                        # Maximum iterations
    random_state=RANDOM_STATE,           # For reproducibility
    early_stopping=False,                # Disabled for multi-label compatibility
    verbose=True                         # Show training progress
)

# Wrap with OneVsRestClassifier for multi-label support
mlp_clf_multi = OneVsRestClassifier(mlp_base, n_jobs=-1)

print("="*60)
print("MULTI-LABEL NEURAL NETWORK ARCHITECTURE")
print("="*60)
print(f"Input layer:  {X_train.shape[1]} neurons (vocabulary size)")
print(f"Hidden layer 1: 128 neurons (ReLU activation)")
print(f"Hidden layer 2: 64 neurons (ReLU activation)")
print(f"Hidden layer 3: 128 neurons (ReLU activation)")
print(f"Output layer: 1 sigmoid neuron per network ({len(TOPIC_CLASSES)} one-vs-rest networks)")
print("="*60)

print("\nTraining Multi-Label Neural Network...")
mlp_clf_multi.fit(X_train, y_train_multi)
print("\n✓ Multi-Label Neural Network training complete!")

### 4.4 Multi-Label Neural Network Evaluation

In [None]:
# Make predictions
y_pred_nn_multi = mlp_clf_multi.predict(X_test)

# Calculate metrics
nn_multi_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_nn_multi),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_nn_multi),
    'Micro F1': f1_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_nn_multi, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0)
}

print("="*60)
print("MULTI-LABEL NEURAL NETWORK EVALUATION (Test Set)")
print("="*60)
for metric, value in nn_multi_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

In [None]:
# Show sample predictions
y_pred_labels = mlb.inverse_transform(y_pred_nn_multi)
y_true_labels = mlb.inverse_transform(y_test_multi)

print("\nSample Multi-Label Neural Network Predictions:")
print("-" * 60)
for i in range(5):
    text = df_test['text'].iloc[i][:60]
    true = y_true_labels[i] if y_true_labels[i] else ('none',)
    pred = y_pred_labels[i] if y_pred_labels[i] else ('none',)
    match = "✓" if set(true) == set(pred) else "✗"
    print(f"\n{match} Sample {i+1}:")
    print(f"   Text: {text}...")
    print(f"   True: {true}")
    print(f"   Pred: {pred}")

### 4.5 Naive Bayes Classifier (for Comparison)

In [None]:
# Train Naive Bayes classifier with same features
nb_clf = OneVsRestClassifier(MultinomialNB(alpha=1.0))
nb_clf.fit(X_train, y_train_multi)

# Make predictions
y_pred_nb = nb_clf.predict(X_test)

# Calculate metrics
nb_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_nb),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_nb),
    'Micro F1': f1_score(y_test_multi, y_pred_nb, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_nb, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_multi, y_pred_nb, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_multi, y_pred_nb, average='micro', zero_division=0)
}

print("="*60)
print("NAIVE BAYES EVALUATION (Test Set)")
print("="*60)
for metric, value in nb_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

---
## 5. Task 4: Single-Label Classification

For comparison, we train a neural network using single-label classification. Each tweet is assigned only its primary (first) label, converting the multi-label problem to a standard multi-class classification problem.

### 5.1 Single-Label Encoding

In [None]:
# Create label encoder for single-label classification
le = LabelEncoder()
le.fit(TOPIC_CLASSES)

# Encode single labels as integers
y_train_single = le.transform(df_train['single_label'])
y_test_single = le.transform(df_test['single_label'])
y_validation_single = le.transform(df_validation['single_label'])

print(f"✓ Single-label encoding complete")
print(f"\n✓ Label shapes:")
print(f"  y_train_single: {y_train_single.shape}")
print(f"  y_test_single: {y_test_single.shape}")
print(f"  y_validation_single: {y_validation_single.shape}")

print(f"\n✓ Class mapping:")
for i, cls in enumerate(le.classes_):
    count = (y_train_single == i).sum()
    print(f"  {i}: {cls} ({count} samples)")
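Two `LabelEncoder` behaviours matter for the steps above: it orders classes alphabetically regardless of the order passed to `fit`, and `transform` rejects labels it has not seen. The toy class list below is deliberately unsorted to make both visible.

```python
from sklearn.preprocessing import LabelEncoder

toy_classes = ["sports", "music", "gaming"]  # deliberately unsorted
toy_encoder = LabelEncoder().fit(toy_classes)

print(list(toy_encoder.classes_))  # ['gaming', 'music', 'sports']

encoded = toy_encoder.transform(["sports", "gaming"])
print(encoded.tolist())  # [2, 0]

# Labels that were not seen during fit raise a ValueError
try:
    toy_encoder.transform(["unknown"])
    unseen_raises = False
except ValueError:
    unseen_raises = True
print(unseen_raises)  # True
```

The integer codes therefore only coincide with positions in a hand-written class list when that list is already in alphabetical order.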

### 5.2 Single-Label Neural Network Training

In [None]:
# Create MLPClassifier for single-label classification
# For single-label, MLPClassifier uses softmax output automatically
mlp_clf_single = MLPClassifier(
    hidden_layer_sizes=(128, 64, 128),  # Same architecture as multi-label
    activation='relu',                   # ReLU activation function
    solver='adam',                       # Adam optimizer
    max_iter=300,                        # Maximum iterations
    random_state=RANDOM_STATE,           # For reproducibility
    early_stopping=True,                 # Enable early stopping for single-label
    validation_fraction=0.1,             # Use 10% for validation
    verbose=True                         # Show training progress
)

print("="*60)
print("SINGLE-LABEL NEURAL NETWORK ARCHITECTURE")
print("="*60)
print(f"Input layer:  {X_train.shape[1]} neurons (vocabulary size)")
print(f"Hidden layer 1: 128 neurons (ReLU activation)")
print(f"Hidden layer 2: 64 neurons (ReLU activation)")
print(f"Hidden layer 3: 128 neurons (ReLU activation)")
print(f"Output layer: {len(TOPIC_CLASSES)} neurons (Softmax activation)")
print("="*60)

print("\nTraining Single-Label Neural Network...")
mlp_clf_single.fit(X_train, y_train_single)
print("\n✓ Single-Label Neural Network training complete!")

### 5.3 Single-Label Neural Network Evaluation

In [None]:
# Make predictions
y_pred_nn_single = mlp_clf_single.predict(X_test)

# Calculate metrics
nn_single_metrics = {
    'Accuracy': accuracy_score(y_test_single, y_pred_nn_single),
    'Macro F1': f1_score(y_test_single, y_pred_nn_single, average='macro', zero_division=0),
    'Weighted F1': f1_score(y_test_single, y_pred_nn_single, average='weighted', zero_division=0),
    'Macro Precision': precision_score(y_test_single, y_pred_nn_single, average='macro', zero_division=0),
    'Macro Recall': recall_score(y_test_single, y_pred_nn_single, average='macro', zero_division=0)
}

print("="*60)
print("SINGLE-LABEL NEURAL NETWORK EVALUATION (Test Set)")
print("="*60)
for metric, value in nn_single_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

In [None]:
# Show sample predictions for single-label
print("\nSample Single-Label Neural Network Predictions:")
print("-" * 60)
for i in range(5):
    text = df_test['text'].iloc[i][:60]
    true_label = le.inverse_transform([y_test_single[i]])[0]
    pred_label = le.inverse_transform([y_pred_nn_single[i]])[0]
    original_labels = df_test['labels'].iloc[i]
    match = "✓" if true_label == pred_label else "✗"
    print(f"\n{match} Sample {i+1}:")
    print(f"   Text: {text}...")
    print(f"   Original labels: {original_labels}")
    print(f"   Single label (true): {true_label}")
    print(f"   Single label (pred): {pred_label}")

---
## 6. Model Comparison

### 6.1 Multi-Label Models Comparison

In [None]:
# Create comparison table for multi-label models
comparison_df = pd.DataFrame({
    'Metric': list(nn_multi_metrics.keys()),
    'Neural Network (Multi-Label)': list(nn_multi_metrics.values()),
    'Naive Bayes (Multi-Label)': list(nb_metrics.values())
})

# Calculate improvement
comparison_df['Difference'] = comparison_df['Neural Network (Multi-Label)'] - comparison_df['Naive Bayes (Multi-Label)']
comparison_df['Better Model'] = comparison_df.apply(
    lambda row: 'Neural Network' if (row['Difference'] > 0 and row['Metric'] != 'Hamming Loss') 
                or (row['Difference'] < 0 and row['Metric'] == 'Hamming Loss')
                else 'Naive Bayes' if row['Difference'] != 0 else 'Tie',
    axis=1
)

print("="*80)
print("MULTI-LABEL MODEL COMPARISON: Neural Network vs Naive Bayes")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)
print("\nNote: For Hamming Loss, lower is better. For all other metrics, higher is better.")

### 6.2 Single-Label vs Multi-Label Comparison

In [None]:
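# Compare single-label NN predictions against multi-label ground truth
# Convert single-label predictions to multi-label (one-hot) format for comparison.
# LabelEncoder orders classes alphabetically, so map each prediction back to its
# column in TOPIC_CLASSES instead of assuming both orderings coincide.
class_to_col = {cls: idx for idx, cls in enumerate(TOPIC_CLASSES)}
pred_cols = [class_to_col[cls] for cls in le.inverse_transform(y_pred_nn_single)]
y_pred_single_as_multi = np.zeros((len(pred_cols), len(TOPIC_CLASSES)), dtype=int)
y_pred_single_as_multi[np.arange(len(pred_cols)), pred_cols] = 1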

# Calculate metrics for single-label NN on multi-label test set
single_on_multi_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_single_as_multi),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_single_as_multi),
    'Micro F1': f1_score(y_test_multi, y_pred_single_as_multi, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_single_as_multi, average='macro', zero_division=0),
}

print("="*80)
print("SINGLE-LABEL VS MULTI-LABEL NEURAL NETWORK COMPARISON")
print("="*80)
print(f"\n{'Metric':<20} {'Multi-Label NN':<18} {'Single-Label NN':<18} {'Difference':<12}")
print("-"*70)
for metric in ['Subset Accuracy', 'Hamming Loss', 'Micro F1', 'Macro F1']:
    multi = nn_multi_metrics[metric]
    single = single_on_multi_metrics[metric]
    diff = multi - single
    print(f"{metric:<20} {multi:<18.4f} {single:<18.4f} {diff:+.4f}")

print("\n" + "="*80)
print("\nNote: Single-label NN can only predict one class per sample.")
print("Comparison is made against the original multi-label ground truth.")

### 6.3 All Models Summary

In [None]:
# Create comprehensive summary table
print("="*90)
print("COMPREHENSIVE MODEL COMPARISON SUMMARY")
print("="*90)

print("\n" + "-"*90)
print("MULTI-LABEL CLASSIFICATION RESULTS (evaluated on multi-label test set)")
print("-"*90)
print(f"{'Model':<35} {'Subset Acc.':<12} {'Micro F1':<12} {'Macro F1':<12} {'Hamming Loss':<12}")
print("-"*90)
print(f"{'Multi-Label Neural Network':<35} {nn_multi_metrics['Subset Accuracy']:<12.4f} {nn_multi_metrics['Micro F1']:<12.4f} {nn_multi_metrics['Macro F1']:<12.4f} {nn_multi_metrics['Hamming Loss']:<12.4f}")
print(f"{'Naive Bayes (Multi-Label)':<35} {nb_metrics['Subset Accuracy']:<12.4f} {nb_metrics['Micro F1']:<12.4f} {nb_metrics['Macro F1']:<12.4f} {nb_metrics['Hamming Loss']:<12.4f}")
print(f"{'Single-Label NN (on multi-label)':<35} {single_on_multi_metrics['Subset Accuracy']:<12.4f} {single_on_multi_metrics['Micro F1']:<12.4f} {single_on_multi_metrics['Macro F1']:<12.4f} {single_on_multi_metrics['Hamming Loss']:<12.4f}")

print("\n" + "-"*90)
print("SINGLE-LABEL CLASSIFICATION RESULTS (evaluated on single-label test set)")
print("-"*90)
print(f"{'Model':<35} {'Accuracy':<12} {'Weighted F1':<12} {'Macro F1':<12}")
print("-"*90)
print(f"{'Single-Label Neural Network':<35} {nn_single_metrics['Accuracy']:<12.4f} {nn_single_metrics['Weighted F1']:<12.4f} {nn_single_metrics['Macro F1']:<12.4f}")

print("\n" + "="*90)

---
## 7. Optional: Experiment with Different Network Sizes

In [None]:
# Define different architectures to test
architectures = {
    'Small (64-32-64)': (64, 32, 64),
    'Medium (128-64-128)': (128, 64, 128),  # Original
    'Large (256-128-256)': (256, 128, 256),
    'Deep (128-128-64-64-128-128)': (128, 128, 64, 64, 128, 128),
    'Wide (512-256-512)': (512, 256, 512)
}

results = []

print("Experimenting with different network architectures (Multi-Label)...")
print("="*60)

for name, layers in architectures.items():
    print(f"\nTraining: {name}...")
    
    # Create and train model
    mlp = MLPClassifier(
        hidden_layer_sizes=layers,
        activation='relu',
        solver='adam',
        max_iter=200,
        random_state=RANDOM_STATE,
        early_stopping=False,  # Disabled for multi-label compatibility
        verbose=False
    )
    
    clf = OneVsRestClassifier(mlp, n_jobs=-1)
    clf.fit(X_train, y_train_multi)
    
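    # Evaluate on the test set. Note: choosing an architecture from these scores
    # tunes on test data; X_validation / y_validation_multi are available for that.
    y_pred = clf.predict(X_test)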
    
    results.append({
        'Architecture': name,
        'Layers': str(layers),
        'Accuracy': accuracy_score(y_test_multi, y_pred),
        'Micro F1': f1_score(y_test_multi, y_pred, average='micro', zero_division=0),
        'Macro F1': f1_score(y_test_multi, y_pred, average='macro', zero_division=0)
    })
    
    print(f"  Accuracy: {results[-1]['Accuracy']:.4f}, Micro F1: {results[-1]['Micro F1']:.4f}")

# Display results
results_df = pd.DataFrame(results)
print("\n" + "="*80)
print("ARCHITECTURE COMPARISON RESULTS")
print("="*80)
print(results_df.to_string(index=False))
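If the goal is to pick one architecture, a common practice is to compare candidates on the validation split and touch the test split only once at the end. The sketch below shows that pattern on synthetic data; the candidate sizes, the split ratios, and the use of `MLPClassifier`'s native multi-label mode (instead of `OneVsRestClassifier`) are simplifying assumptions made to keep the example short and fast.

```python
import warnings

from sklearn.datasets import make_multilabel_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data, split 60/20/20 into train/validation/test
X_all, Y_all = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=4, random_state=0
)
X_tr, X_rest, Y_tr, Y_rest = train_test_split(X_all, Y_all, test_size=0.4, random_state=0)
X_val, X_te, Y_val, Y_te = train_test_split(X_rest, Y_rest, test_size=0.5, random_state=0)

candidates = {"small": (8,), "medium": (16, 8), "large": (32, 16, 32)}
val_scores, fitted = {}, {}

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for name, layers in candidates.items():
        net = MLPClassifier(hidden_layer_sizes=layers, max_iter=100, random_state=0)
        fitted[name] = net.fit(X_tr, Y_tr)
        # Model selection uses the validation split only
        val_scores[name] = f1_score(
            Y_val, net.predict(X_val), average="micro", zero_division=0
        )

best_name = max(val_scores, key=val_scores.get)

# The test split is used once, for the selected model
test_micro_f1 = f1_score(
    Y_te, fitted[best_name].predict(X_te), average="micro", zero_division=0
)
print(f"Selected on validation: {best_name}")
print(f"Test Micro F1 of selected model: {test_micro_f1:.4f}")
```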

---
## 8. Summary

### What was accomplished
1. Loaded preprocessed data from Lab 2 and vocabulary from Lab 4
2. Created binary feature vectors (Bag-of-Words encoding) for all samples
3. Trained a Multi-Label Neural Network with 128→64→128 hidden layers using MLPClassifier and OneVsRestClassifier
4. Converted multi-label data to single-label by keeping only the primary label
5. Trained a Single-Label Neural Network with the same architecture
6. Compared Multi-Label NN, Single-Label NN, and Naive Bayes classifiers
7. Experimented with different network architectures

### Key Findings
- Multi-label classification allows predicting multiple topics per tweet
- Single-label classification simplifies the problem but loses information about secondary topics
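- Neural networks can capture non-linear interactions between word features, which Naive Bayes cannot because it treats features as conditionally independent
- How the MLPClassifier (ReLU activation, Adam optimizer) compares with Naive Bayes on each metric is shown in the tables of Section 6
- For multi-label tasks, OneVsRestClassifier trains a separate binary network per class
- For single-label tasks, MLPClassifier uses a softmax output to produce a probability distribution over the classes
- Network architecture affects performance; Section 7 compares several sizes, and a larger network does not necessarily score higher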

In [None]:
print("="*60)
print("LAB 5 SUMMARY")
print("="*60)
print(f"Input vocabulary: {VOCABULARY_PATH}")
print(f"Training samples: {len(df_train):,}")
print(f"Test samples: {len(df_test):,}")
print(f"Feature vector size: {X_train.shape[1]}")
print(f"Number of classes: {len(TOPIC_CLASSES)}")
print(f"\nMulti-Label Neural Network Metrics:")
print(f"  Subset Accuracy: {nn_multi_metrics['Subset Accuracy']:.4f}")
print(f"  Micro F1: {nn_multi_metrics['Micro F1']:.4f}")
print(f"  Macro F1: {nn_multi_metrics['Macro F1']:.4f}")
print(f"\nSingle-Label Neural Network Metrics:")
print(f"  Accuracy: {nn_single_metrics['Accuracy']:.4f}")
print(f"  Weighted F1: {nn_single_metrics['Weighted F1']:.4f}")
print(f"  Macro F1: {nn_single_metrics['Macro F1']:.4f}")
print("="*60)