# Lab 5: Neural Network Classification with scikit-learn

---
## 1. Notebook Overview

### 1.1 Objective
- Re-use the most frequent words (optional: per class) you found for
your Naive Bayes classifier last week.

- Construct binary vectors for your whole dataset. Each dimension states
whether the word is part of the sample or not.

- Create a small neural network using scikit-learn: https://scikit-learn.org/stable/modules/neural_networks_supervised.html. Start with three
hidden layers of 128/64/128 neurons. Consider what your input and
output layers should look like.

- Train your network on your training set and test it on your test set.
Calculate evaluation measures and compare with your previous
classifier.

- Optional: Experiment with different network sizes.

### 1.2 Prerequisites
This notebook assumes you have already executed:
- **Lab 2**: Data preprocessing → `../Data/multi_label/tweets_preprocessed_*.parquet`
- **Lab 3**: Language modeling
- **Lab 4**: Feature extraction → `../Data/top_1000_vocabulary.json`
- **Single-Label**: `../Data/single_label/tweets_single_label_*.parquet`

### 1.3 Architecture
We implement neural networks with:
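- **Input layer**: 1000 features (top-1000 vocabulary from Lab 4)
- **Hidden layers**: 128 → 64 → 128 neurons (as specified)
- **Output layer**: one output per class; the class count is read from the data rather than hard-coded
  - Multi-label: one binary classifier per topic class via `OneVsRestClassifier` (6 classes in the multi-label data loaded in Section 2.3)
  - Single-label: one class per distinct `single_label` value with a softmax output (18 classes in the single-label data loaded in Section 2.4)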
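
To make the multi-label output layer concrete, the following sketch fits a `OneVsRestClassifier` around a small `MLPClassifier` on toy data. The toy features, the toy labels, and the tiny hidden layer are assumptions chosen only to keep the example fast; they are not the lab's data or architecture. The point is that the wrapper trains one binary network per label column, each with a single logistic (sigmoid) output.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy binary features: 60 samples, 10 "vocabulary" dimensions (illustrative only)
X = rng.integers(0, 2, size=(60, 10))

# Toy multi-label targets: label k is simply a copy of feature k (3 labels)
Y = X[:, :3].copy()

# A deliberately small network so the sketch runs quickly
base = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
ovr = OneVsRestClassifier(base).fit(X, Y)

# One fitted binary network per label column
n_estimators = len(ovr.estimators_)

# Predictions come back as an indicator matrix with one column per label
pred_shape = ovr.predict(X).shape

# Each binary network ends in a single logistic (sigmoid) unit
out_activation = ovr.estimators_[0].out_activation_

print(n_estimators, pred_shape, out_activation)
```

With the lab's data, the same wrapper would therefore hold one network per column of the multi-label target matrix.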

### 1.4 Neural Network Fundamentals (From Lecture)
- A single neuron computes: ŷ = g(w₀ + Σ xᵢwᵢ) where g is a non-linear activation function
- **Activation functions are critical** - they introduce non-linearities that make multi-layer networks powerful (universal approximators)
- Common activations: ReLU (g(z) = max(0,z)), Sigmoid, Tanh
- For multi-class (single-label): use **Softmax** to convert outputs to probabilities
- For multi-label: use **Sigmoid** per class via OneVsRestClassifier
- **Loss function for classification**: Cross-entropy loss
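- Weights should NOT be initialized to all zeros: every neuron in a layer would then compute the same output and receive the same gradient, so the symmetry between neurons is never broken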
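
The formulas above can be sketched in a few lines of NumPy. The weights, bias, and logits below are made-up illustrative values, not parameters from a trained model. The sketch shows a single neuron with two different activations, and the difference between softmax (probabilities that sum to 1, one winner) and per-class sigmoid (independent probabilities, several labels possible).

```python
import numpy as np

def relu(z):
    """ReLU activation: g(z) = max(0, z)."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over a vector of logits (shifted for numerical stability)."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

def neuron(x, w, w0, g):
    """Single neuron: y_hat = g(w0 + sum_i x_i * w_i)."""
    return g(w0 + np.dot(x, w))

x = np.array([1.0, 0.0, 1.0])     # binary input features
w = np.array([0.5, -0.3, -1.0])   # illustrative weights
w0 = 0.2                          # illustrative bias

# Pre-activation: 0.2 + 0.5 - 1.0 = -0.3
y_relu = neuron(x, w, w0, relu)        # ReLU clips the negative value to 0
y_sigmoid = neuron(x, w, w0, sigmoid)  # sigmoid maps it to a value below 0.5

logits = np.array([2.0, 1.0, 0.1])
probs_softmax = softmax(logits)   # single-label: probabilities sum to 1
probs_sigmoid = sigmoid(logits)   # multi-label: each class scored independently

print(y_relu, y_sigmoid)
print(probs_softmax, probs_softmax.sum())
print(probs_sigmoid, probs_sigmoid.sum())
```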

---
## 2. Task 1: Establish Context

### 2.1 Review Preprocessing from Lab 2
In Lab 2, we preprocessed tweets with the following pipeline:
- Remove RT indicators, URLs, usernames, and mentions
- Convert emojis to text descriptions
- Extract hashtag text and segment CamelCase words
- Normalize whitespace and lowercase
- Tokenize with SpaCy and filter/lemmatize tokens

The output is stored in parquet files with columns: `text`, `label_name`, `label`

Two approaches for label handling are supported:
- Parse `label_name` (string list format) into Python lists
- Use `label` column directly (pre-computed binary vectors)

In [374]:
# Import required libraries
import json
import ast
import os
import hashlib
import time
from typing import List
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder
from sklearn.metrics import (
    accuracy_score, 
    f1_score, 
    precision_score, 
    recall_score, 
    hamming_loss,
    classification_report
)
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from tqdm import tqdm

# Constants - Updated paths to new folder structure
TRAIN_DATA_PATH = "../Data/multi_label/tweets_preprocessed_train.parquet"
TEST_DATA_PATH = "../Data/multi_label/tweets_preprocessed_test.parquet"
VALIDATION_DATA_PATH = "../Data/multi_label/tweets_preprocessed_validation.parquet"
VOCABULARY_PATH = "../Data/top_1000_vocabulary.json"
RANDOM_STATE = 42

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


### 2.2 Load and Verify Vocabulary from Lab 4

In [375]:
# Load the top 1000 vocabulary from Lab 4
with open(VOCABULARY_PATH, 'r', encoding='utf-8') as f:
    vocab_data = json.load(f)

VOCABULARY = vocab_data['tokens']
vocab_set = set(VOCABULARY)

print(f"✓ Loaded vocabulary from: {VOCABULARY_PATH}")
print(f"✓ Description: {vocab_data['description']}")
print(f"✓ Vocabulary size: {len(VOCABULARY)}")
print(f"✓ First 20 tokens: {VOCABULARY[:20]}")
print(f"✓ Last 10 tokens: {VOCABULARY[-10:]}")

✓ Loaded vocabulary from: ../Data/top_1000_vocabulary.json
✓ Description: Top 1000 most frequent tokens from preprocessed tweets (Lab 4)
✓ Vocabulary size: 1000
✓ First 20 tokens: ['new', 'game', 'day', 'good', 'year', 'love', 'time', 'win', 'come', 'happy', 'like', 'watch', 'go', 'world', 'live', 'today', 'red', 'team', 'great', 'heart']
✓ Last 10 tokens: ['straight', 'google', 'december', 'thankful', 'oklahoma', 'donald', 'army', 'beverage', 'education', 'titan']


### 2.3 Load Preprocessed Datasets

In [376]:
def parse_labels(value) -> List[str]:
    """Parse label_name column into consistent Python lists."""
    if isinstance(value, (list, np.ndarray)):
        return [str(v) for v in value]
    if isinstance(value, tuple):
        return [str(v) for v in value]
    if isinstance(value, str):
        value = value.strip()
        if value.startswith('[') and value.endswith(']'):
            # Remove brackets
            inner = value[1:-1].strip()
            if not inner:
                return []
            # Remove quotes and commas, then split on whitespace; this handles both
            # the NumPy-style "['a' 'b']" and the Python-style "['a', 'b']" formats
            inner = inner.replace("'", "").replace('"', '').replace(",", " ")
            labels = [l.strip() for l in inner.split() if l.strip()]
            return labels
        try:
            parsed = ast.literal_eval(value)
            if isinstance(parsed, (list, tuple)):
                return [str(v) for v in parsed]
        except (ValueError, SyntaxError):
            pass
        return [value] if value else []
    return [str(value)] if value else []

def parse_binary_label(value) -> np.ndarray:
    """Parse binary label array from string representation."""
    if isinstance(value, np.ndarray):
        return value
    if isinstance(value, str):
        # Parse "[0 0 1 0 ...]" format
        inner = value.strip()[1:-1]
        return np.array([int(x) for x in inner.split()])
    return np.array(value)

def load_dataset(path: str) -> pd.DataFrame:
    """Load tweets from parquet and normalize the label columns."""
    df = pd.read_parquet(path)
    df = df.copy()
    df["labels"] = df["label_name"].apply(parse_labels)
    df["label_binary"] = df["label"].apply(parse_binary_label)
    return df

# ============================================================
# LOAD MULTI-LABEL DATASETS
# ============================================================
# These are used for the multi-label NN, for Naive Bayes training, and for
# the multi-label evaluation of ALL models.

print("="*70)
print("LOAD MULTI-LABEL DATASETS")
print("="*70)

df_train_multi = load_dataset(TRAIN_DATA_PATH)
df_test_multi = load_dataset(TEST_DATA_PATH)
df_validation_multi = load_dataset(VALIDATION_DATA_PATH)

print(f"✓ Training set: {len(df_train_multi):,} samples")
print(f"✓ Test set: {len(df_test_multi):,} samples")
print(f"✓ Validation set: {len(df_validation_multi):,} samples")
print(f"\nSample preprocessed text:")
print(f"  {df_train_multi['text'].iloc[0][:80]}...")
print(f"  Labels: {df_train_multi['labels'].iloc[0]}")

LOAD MULTI-LABEL DATASETS
✓ Training set: 5,465 samples
✓ Test set: 1,511 samples
✓ Validation set: 178 samples

Sample preprocessed text:
  lumber beat rapid game western division final evan edwards hit hr wp josh robers...
  Labels: ['sports']


In [377]:
# ============================================================
# DYNAMIC CLASS DETECTION FROM THE MULTI-LABEL DATA
# ============================================================
# This cell adapts to the data automatically,
# regardless of how many classes remain after preprocessing.

print("="*60)
print("AUTOMATIC CLASS DETECTION")
print("="*60)

# 1. Determine the number of classes from the binary label vectors
num_classes = len(df_train_multi['label_binary'].iloc[0])
print(f"\n✓ Number of classes (from label_binary): {num_classes}")

# 2. Extract all unique class names from label_name
all_class_names = set()
for df in [df_train_multi, df_test_multi, df_validation_multi]:
    for labels in df['labels']:
        all_class_names.update(labels)

TOPIC_CLASSES = sorted(list(all_class_names))
print(f"✓ Class names extracted from data: {len(TOPIC_CLASSES)}")
print(f"✓ Classes: {TOPIC_CLASSES}")

# 3. Verify consistency
if len(TOPIC_CLASSES) != num_classes:
    print(f"\n⚠️ WARNING: number of class names ({len(TOPIC_CLASSES)}) != number of columns in label_binary ({num_classes})")
    print("   This can happen when label_name and label are out of sync.")
    print("   Using the count from label_binary as authoritative.")
    
# 4. Show sample data
print(f"\n✓ Sample data:")
print(f"  Text: {df_train_multi['text'].iloc[0][:60]}...")
print(f"  Labels (names): {df_train_multi['labels'].iloc[0]}")
print(f"  Labels (binary): {df_train_multi['label_binary'].iloc[0]}")

# 5. Statistics
print(f"\n✓ Dataset statistics:")
print(f"  Training: {len(df_train_multi):,} Samples")
print(f"  Test: {len(df_test_multi):,} Samples")
print(f"  Validation: {len(df_validation_multi):,} Samples")
print(f"  Total: {len(df_train_multi) + len(df_test_multi) + len(df_validation_multi):,} Samples")

print("\n" + "="*60)

AUTOMATIC CLASS DETECTION

✓ Number of classes (from label_binary): 6
✓ Class names extracted from data: 6
✓ Classes: ['celebrity_&_pop_culture', 'diaries_&_daily_life', 'film_tv_&_video', 'music', 'news_&_social_concern', 'sports']

✓ Sample data:
  Text: lumber beat rapid game western division final evan edwards h...
  Labels (names): ['sports']
  Labels (binary): [0 0 0 0 0 1]

✓ Dataset statistics:
  Training: 5,465 Samples
  Test: 1,511 Samples
  Validation: 178 Samples
  Total: 7,154 Samples



### 2.4 Load Single-Label Datasets (from Lab 2)

The single-label datasets were created in **Lab 2** using Claude Haiku for intelligent label assignment.
We simply load them here for use in the Single-Label Neural Network classifier.

In [378]:
# ============================================================
# LOAD SINGLE-LABEL DATA (from Lab 2)
# ============================================================
# The single-label datasets were created in Lab 2 with Claude Haiku.
# They are used ONLY for training the single-label NN.
# For TESTING, both the multi-label AND the single-label test sets are used.

# Define paths
SINGLE_LABEL_TRAIN_PATH = "../Data/single_label/tweets_single_label_train.parquet"
SINGLE_LABEL_TEST_PATH = "../Data/single_label/tweets_single_label_test.parquet"
SINGLE_LABEL_VALIDATION_PATH = "../Data/single_label/tweets_single_label_validation.parquet"

print("=" * 70)
print("LOAD SINGLE-LABEL DATA (from Lab 2)")
print("=" * 70)

# Check that all files exist
train_exists = Path(SINGLE_LABEL_TRAIN_PATH).exists()
test_exists = Path(SINGLE_LABEL_TEST_PATH).exists()
val_exists = Path(SINGLE_LABEL_VALIDATION_PATH).exists()

if not (train_exists and test_exists and val_exists):
    missing = []
    if not train_exists: missing.append("train")
    if not test_exists: missing.append("test")
    if not val_exists: missing.append("validation")
    print(f"\n❌ ERROR: Single-label files not found: {missing}")
    print("   Please run Lab 2 first to create the single-label datasets!")
    raise FileNotFoundError("The single-label datasets must be created in Lab 2 first.")

# Load the single-label datasets (SEPARATE variables - do not overwrite!)
print("\n📂 Loading single-label files...")

df_train_single = pd.read_parquet(SINGLE_LABEL_TRAIN_PATH)
df_test_single = pd.read_parquet(SINGLE_LABEL_TEST_PATH)
df_validation_single = pd.read_parquet(SINGLE_LABEL_VALIDATION_PATH)

# Parse labels if needed (for compatibility with existing code)
for df in [df_train_single, df_test_single, df_validation_single]:
    if 'labels' not in df.columns and 'label_name' in df.columns:
        df['labels'] = df['label_name'].apply(parse_labels)

print(f"✓ Training Set: {len(df_train_single):,} Samples")
print(f"✓ Test Set: {len(df_test_single):,} Samples")
print(f"✓ Validation Set: {len(df_validation_single):,} Samples")

# Show the single-label distribution
print(f"\n✓ Single-label distribution (training):")
print(df_train_single['single_label'].value_counts())

print("\n" + "=" * 70)
print("✓ Single-label data loaded!")
print("=" * 70)
print(f"\n📊 Data structure overview:")
print(f"  MULTI-LABEL data (for multi-label NN, Naive Bayes, testing):")
print(f"    df_train_multi:      {len(df_train_multi):,} Samples")
print(f"    df_test_multi:       {len(df_test_multi):,} Samples")
print(f"    df_validation_multi: {len(df_validation_multi):,} Samples")
print(f"\n  SINGLE-LABEL data (ONLY for single-label NN training):")
print(f"    df_train_single:      {len(df_train_single):,} Samples")
print(f"    df_test_single:       {len(df_test_single):,} Samples")
print(f"    df_validation_single: {len(df_validation_single):,} Samples")
print(f"\n💡 The single-label assignment was done in Lab 2 with Claude Haiku")

LOAD SINGLE-LABEL DATA (from Lab 2)

📂 Loading single-label files...
✓ Training Set: 5,465 Samples
✓ Test Set: 1,511 Samples
✓ Validation Set: 178 Samples

✓ Single-label distribution (training):
single_label
sports                      1587
news_&_social_concern       1487
music                       1000
diaries_&_daily_life         553
film_tv_&_video              502
celebrity_&_pop_culture      281
family                         9
arts_&_culture                 9
gaming                         7
business_&_entrepreneurs       6
science_&_technology           5
relationships                  5
food_&_dining                  4
youth_&_student_life           3
other_hobbies                  3
fitness_&_health               2
learning_&_educational         1
fashion_&_style                1
Name: count, dtype: int64

✓ Single-label data loaded!

📊 Data structure overview:
  MULTI-LABEL data (for multi-label NN, Naive Bayes, testing):
    df_train_multi:      5,465 S

---
## 3. Task 2: Implementation Plan

### 3.1 Binary Feature Vector Construction
For each sample, we create a binary vector of size 1000 (vocabulary size):
- For each word in the vocabulary, set dimension to 1 if word is present in sample, 0 otherwise
- This is a Bag-of-Words style encoding (word order is lost)
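
As a small worked example of this encoding, the sketch below builds the binary vector for one sample against a four-word toy vocabulary. The helper name `binary_vector` and the toy vocabulary are hypothetical and used only for illustration; the lab itself uses the 1000-token vocabulary from Lab 4.

```python
import numpy as np

def binary_vector(text, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in `text`."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary), dtype=np.int8)
    # A set removes repeated tokens: presence matters, not frequency
    for token in set(text.split()):
        if token in index:
            vec[index[token]] = 1
    return vec

toy_vocab = ["game", "win", "love", "music"]   # illustrative vocabulary
vec = binary_vector("win game game tonight", toy_vocab)
print(vec)  # [1 1 0 0]
```

The repeated word "game" still yields a single 1, and the out-of-vocabulary word "tonight" is ignored, which is exactly the information loss a binary Bag-of-Words encoding accepts.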

### 3.2 MLPClassifier Configuration
- **hidden_layer_sizes**: (128, 64, 128) - three hidden layers as specified
- **activation**: 'relu' - ReLU activation (most commonly used)
- **solver**: 'adam' - Adam optimizer (a stochastic, mini-batch gradient-based optimizer)
- **max_iter**: 300 - upper bound on the number of training epochs for `adam`; training stops earlier once the loss stops improving
- **random_state**: 42 - for reproducibility
- **early_stopping**: Disabled for multi-label (some classes have few samples), enabled for single-label
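
To see how this configuration translates into layer sizes, the following sketch fits an `MLPClassifier` with the same hidden layers on random toy data and inspects the fitted weight matrices. The 20 input features, the 3 classes, and the short `max_iter` are assumptions made only to keep the example fast; the resulting model is not meaningful.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X_toy = rng.integers(0, 2, size=(90, 20))   # 20 toy binary features
y_toy = np.arange(90) % 3                   # 3 balanced toy classes

clf = MLPClassifier(
    hidden_layer_sizes=(128, 64, 128),
    activation="relu",
    solver="adam",
    max_iter=20,          # kept short on purpose; not a tuned value
    random_state=42,
)

# The short run is expected to stop before convergence; silence that warning
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf.fit(X_toy, y_toy)

# One weight matrix per connection between consecutive layers
layer_shapes = [w.shape for w in clf.coefs_]
print(layer_shapes)         # [(20, 128), (128, 64), (64, 128), (128, 3)]
print(clf.out_activation_)  # softmax
```

The input and output sizes are never passed explicitly: scikit-learn infers them from the shape of `X` and the classes found in `y`, so only the hidden layers need to be configured.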

### 3.3 Evaluation Metrics
For multi-label classification:
- Subset Accuracy (exact match)
- Hamming Loss
- Micro/Macro F1-Score

For single-label classification:
- Accuracy
- Macro/Weighted F1-Score
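
The multi-label metrics can be checked by hand on a tiny example. The two indicator matrices below are invented for illustration (4 samples, 3 labels); the values in the comments follow from counting matches and errors directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy ground truth and predictions: rows are samples, columns are labels
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Subset accuracy: only rows 2 and 4 match exactly -> 2/4 = 0.5
subset_acc = accuracy_score(y_true, y_pred)

# Hamming loss: 2 wrong cells out of 12 -> about 0.167
ham = hamming_loss(y_true, y_pred)

# Micro F1 pools all cells: TP=4, FP=0, FN=2 -> 0.8
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)

# Macro F1 averages per-label F1: (1 + 2/3 + 2/3) / 3 -> about 0.778
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"{subset_acc:.3f} {ham:.3f} {micro:.3f} {macro:.3f}")
```

Subset accuracy is the strictest of the four because a single missed label makes the whole sample count as wrong, which is why it is usually much lower than the F1 scores.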

---
## 4. Task 3: Multi-Label Classification

### 4.1 Feature Engineering: Binary Vector Construction

In [379]:
from sklearn.feature_extraction.text import CountVectorizer

def create_binary_features(texts: pd.Series, vocabulary: List[str]) -> np.ndarray:
    """
    Create binary feature vectors for text samples.
    
    Each dimension represents whether a word from the vocabulary
    is present (1) or absent (0) in the sample.
    
    Parameters:
    -----------
    texts : pd.Series
        Series of preprocessed text strings (whitespace-tokenized)
    vocabulary : List[str]
        List of vocabulary words (top 1000 from Lab 4)
    
    Returns:
    --------
    np.ndarray
        Binary feature matrix of shape (n_samples, vocab_size)
    """
    # Map each vocabulary word to its column index
    vocab_to_idx = {word: idx for idx, word in enumerate(vocabulary)}
    
    n_samples = len(texts)
    n_features = len(vocabulary)
    
    # Initialize feature matrix with zeros
    features = np.zeros((n_samples, n_features), dtype=np.int8)
    
    # Fill in binary features
    for i, text in enumerate(texts):
        if isinstance(text, str):
            words = set(text.split())
            for word in words:
                if word in vocab_to_idx:
                    features[i, vocab_to_idx[word]] = 1
    
    return features

# ============================================================
# FEATURE EXTRACTION: separate feature sets for multi-label and single-label
# ============================================================

print("="*70)
print("FEATURE EXTRACTION")
print("="*70)

# ============================================================
# 1. MULTI-LABEL features (for multi-label NN, Naive Bayes, testing)
# ============================================================
print("\n📊 MULTI-LABEL features (from df_train_multi, df_test_multi, df_validation_multi):")

# Binary features for the multi-label neural network
X_train_multi = create_binary_features(df_train_multi['text'], VOCABULARY)
X_test_multi = create_binary_features(df_test_multi['text'], VOCABULARY)
X_validation_multi = create_binary_features(df_validation_multi['text'], VOCABULARY)

print(f"  X_train_multi: {X_train_multi.shape}")
print(f"  X_test_multi: {X_test_multi.shape}")
print(f"  X_validation_multi: {X_validation_multi.shape}")
print(f"  Average features per sample: {X_train_multi.sum(axis=1).mean():.2f}")

# Count features for Naive Bayes (as in Lab 4)
vectorizer_multi = CountVectorizer(
    vocabulary=VOCABULARY,
    lowercase=True,
    token_pattern=r"(?u)\b\w+\b"
)

X_train_bow_multi = vectorizer_multi.fit_transform(df_train_multi['text'])
X_test_bow_multi = vectorizer_multi.transform(df_test_multi['text'])
X_validation_bow_multi = vectorizer_multi.transform(df_validation_multi['text'])

print(f"\n  X_train_bow_multi (Count): {X_train_bow_multi.shape}")
print(f"  X_test_bow_multi (Count): {X_test_bow_multi.shape}")

# ============================================================
# 2. SINGLE-LABEL features (ONLY for single-label NN training)
# ============================================================
print("\n📊 SINGLE-LABEL features (from df_train_single, df_test_single, df_validation_single):")

# Binary features for the single-label neural network
X_train_single = create_binary_features(df_train_single['text'], VOCABULARY)
X_test_single = create_binary_features(df_test_single['text'], VOCABULARY)
X_validation_single = create_binary_features(df_validation_single['text'], VOCABULARY)

print(f"  X_train_single: {X_train_single.shape}")
print(f"  X_test_single: {X_test_single.shape}")
print(f"  X_validation_single: {X_validation_single.shape}")
print(f"  Average features per sample: {X_train_single.sum(axis=1).mean():.2f}")

print("\n" + "="*70)
print("✓ Feature extraction complete!")
print("="*70)
print("\n📋 Overview:")
print("  Multi-Label NN:    X_train_multi (binary)    → Training")
print("  Naive Bayes:       X_train_bow_multi (count) → Training")
print("  Single-Label NN:   X_train_single (binary)   → Training")
print("\n  Testing (Multi-Label):  X_test_multi")
print("  Testing (Single-Label): X_test_single")

FEATURE EXTRACTION

📊 MULTI-LABEL features (from df_train_multi, df_test_multi, df_validation_multi):
  X_train_multi: (5465, 1000)
  X_test_multi: (1511, 1000)
  X_validation_multi: (178, 1000)
  Average features per sample: 7.71

  X_train_bow_multi (Count): (5465, 1000)
  X_test_bow_multi (Count): (1511, 1000)

📊 SINGLE-LABEL features (from df_train_single, df_test_single, df_validation_single):
  X_train_single: (5465, 1000)
  X_test_single: (1511, 1000)
  X_validation_single: (178, 1000)
  Average features per sample: 7.71

✓ Feature extraction complete!

📋 Overview:
  Multi-Label NN:    X_train_multi (binary)    → Training
  Naive Bayes:       X_train_bow_multi (count) → Training
  Single-Label NN:   X_train_single (binary)   → Training

  Testing (Multi-Label):  X_test_multi
  Testing (Single-Label): X_test_single


### 4.2 Label Encoding (Multi-Label Binarization)

In [380]:
# ============================================================
# MULTI-LABEL ENCODING (from multi-label data)
# ============================================================

# Use the precomputed binary labels from the multi-label data
y_train_multi = np.vstack(df_train_multi['label_binary'].values)
y_test_multi = np.vstack(df_test_multi['label_binary'].values)
y_validation_multi = np.vstack(df_validation_multi['label_binary'].values)

# Determine the actual number of classes from the data
NUM_CLASSES = y_train_multi.shape[1]

# Create a MultiLabelBinarizer for inverse_transform
# If TOPIC_CLASSES does not have the right length, create generic names
if len(TOPIC_CLASSES) != NUM_CLASSES:
    print(f"⚠️ TOPIC_CLASSES has {len(TOPIC_CLASSES)} entries, but the data has {NUM_CLASSES} classes")
    print("   Creating generic class names...")
    TOPIC_CLASSES = [f"class_{i}" for i in range(NUM_CLASSES)]

mlb = MultiLabelBinarizer(classes=TOPIC_CLASSES)
mlb.fit([TOPIC_CLASSES])

print(f"✓ Number of classes: {NUM_CLASSES}")
print(f"✓ Class names: {TOPIC_CLASSES}")
print(f"\n✓ Multi-Label Matrix Shapes (from multi-label data):")
print(f"  y_train_multi: {y_train_multi.shape}")
print(f"  y_test_multi: {y_test_multi.shape}")
print(f"  y_validation_multi: {y_validation_multi.shape}")

# Label distribution
print(f"\n✓ Label distribution (multi-label training):")
print(f"  Average labels per sample: {y_train_multi.sum(axis=1).mean():.2f}")
print(f"  Samples per class:")
for i, class_name in enumerate(TOPIC_CLASSES):
    count = y_train_multi[:, i].sum()
    print(f"    {class_name}: {count}")

✓ Number of classes: 6
✓ Class names: ['celebrity_&_pop_culture', 'diaries_&_daily_life', 'film_tv_&_video', 'music', 'news_&_social_concern', 'sports']

✓ Multi-Label Matrix Shapes (from multi-label data):
  y_train_multi: (5465, 6)
  y_test_multi: (1511, 6)
  y_validation_multi: (178, 6)

✓ Label distribution (multi-label training):
  Average labels per sample: 1.34
  Samples per class:
    celebrity_&_pop_culture: 924
    diaries_&_daily_life: 866
    film_tv_&_video: 953
    music: 1131
    news_&_social_concern: 1782
    sports: 1683


### 4.2.1 Single-Label Encoding

In [381]:
# ============================================================
# SINGLE-LABEL ENCODING (from single-label data, for the single-label NN)
# ============================================================
# Use the single labels created in Lab 2 with Claude Haiku

# Use the dynamically detected TOPIC_CLASSES
# Make sure all labels occur in TOPIC_CLASSES
unique_single_labels = set(df_train_single['single_label'].unique()) | \
                       set(df_test_single['single_label'].unique()) | \
                       set(df_validation_single['single_label'].unique())

# Check whether all labels are known
# Note: this rebinds TOPIC_CLASSES; the multi-label matrices keep NUM_CLASSES columns
unknown_labels = unique_single_labels - set(TOPIC_CLASSES)
if unknown_labels:
    print(f"⚠️ Unknown labels found: {unknown_labels}")
    print(f"   Adding them to TOPIC_CLASSES...")
    TOPIC_CLASSES = sorted(list(set(TOPIC_CLASSES) | unknown_labels))

# Create label encoder for single-label classification
le = LabelEncoder()
le.fit(TOPIC_CLASSES)

# Encode single labels as integers (from single-label data)
y_train_single_enc = le.transform(df_train_single['single_label'])
y_test_single_enc = le.transform(df_test_single['single_label'])
y_validation_single_enc = le.transform(df_validation_single['single_label'])

print(f"✓ Single-label encoding complete (from single-label data)")
print(f"\n✓ Label shapes:")
print(f"  y_train_single_enc: {y_train_single_enc.shape}")
print(f"  y_test_single_enc: {y_test_single_enc.shape}")
print(f"  y_validation_single_enc: {y_validation_single_enc.shape}")

print(f"\n✓ Class mapping (dynamically detected):")
for i, cls in enumerate(le.classes_):
    count = (y_train_single_enc == i).sum()
    print(f"  {i}: {cls} ({count} samples)")

⚠️ Unknown labels found: {'arts_&_culture', 'youth_&_student_life', 'science_&_technology', 'fitness_&_health', 'family', 'learning_&_educational', 'fashion_&_style', 'gaming', 'food_&_dining', 'business_&_entrepreneurs', 'relationships', 'other_hobbies'}
   Adding them to TOPIC_CLASSES...
✓ Single-label encoding complete (from single-label data)

✓ Label shapes:
  y_train_single_enc: (5465,)
  y_test_single_enc: (1511,)
  y_validation_single_enc: (178,)

✓ Class mapping (dynamically detected):
  0: arts_&_culture (9 samples)
  1: business_&_entrepreneurs (6 samples)
  2: celebrity_&_pop_culture (281 samples)
  3: diaries_&_daily_life (553 samples)
  4: family (9 samples)
  5: fashion_&_style (1 samples)
  6: film_tv_&_video (502 samples)
  7: fitness_&_health (2 samples)
  8: food_&_dining (4 samples)
  9: gaming (7 samples)
  10: learning_&_educational (1 samples)
  11: music (1000 samples)
  12: news_&_social_concern (1487 samples)
  13: other_hobbies (3 samples)
  

### 4.3 Multi-Label Neural Network Training

In [382]:
# ============================================================
# MULTI-LABEL NEURAL NETWORK TRAINING
# ============================================================
# Trained on the multi-label data (X_train_multi, y_train_multi)

# Create MLPClassifier with specified architecture
# Using OneVsRestClassifier for multi-label classification
mlp_base = MLPClassifier(
    hidden_layer_sizes=(128, 64, 128),  # Three hidden layers as specified
    activation='relu',                   # ReLU activation function
    solver='adam',                       # Adam optimizer (mini-batch gradient descent)
    max_iter=300,                        # Maximum iterations
    random_state=RANDOM_STATE,           # For reproducibility
    early_stopping=False,                # Disabled for multi-label compatibility
    verbose=True                         # Show training progress
)

# Wrap with OneVsRestClassifier for multi-label support
mlp_clf_multi = OneVsRestClassifier(mlp_base, n_jobs=-1)

print("="*60)
print("MULTI-LABEL NEURAL NETWORK ARCHITECTURE")
print("="*60)
print(f"Input layer:  {X_train_multi.shape[1]} neurons (vocabulary size)")
print(f"Hidden layer 1: 128 neurons (ReLU activation)")
print(f"Hidden layer 2: 64 neurons (ReLU activation)")
print(f"Hidden layer 3: 128 neurons (ReLU activation)")
# Report the width of the label matrix: TOPIC_CLASSES was extended with
# single-label-only classes in the single-label encoding cell
print(f"Output layer: {NUM_CLASSES} neurons ({NUM_CLASSES} binary classifiers)")
print("="*60)
print(f"\n📊 Training on MULTI-LABEL data:")
print(f"   X_train_multi: {X_train_multi.shape}")
print(f"   y_train_multi: {y_train_multi.shape}")

print("\nTraining Multi-Label Neural Network...")
mlp_clf_multi.fit(X_train_multi, y_train_multi)
print("\n✓ Multi-Label Neural Network training complete!")

MULTI-LABEL NEURAL NETWORK ARCHITECTURE
Input layer:  1000 neurons (vocabulary size)
Hidden layer 1: 128 neurons (ReLU activation)
Hidden layer 2: 64 neurons (ReLU activation)
Hidden layer 3: 128 neurons (ReLU activation)
Output layer: 6 neurons (6 binary classifiers)

📊 Training on MULTI-LABEL data:
   X_train_multi: (5465, 1000)
   y_train_multi: (5465, 6)

Training Multi-Label Neural Network...
Iteration 1, loss = 0.53642916
Iteration 1, loss = 0.55208228
Iteration 1, loss = 0.63143498
Iteration 1, loss = 0.54292549
Iteration 1, loss = 0.60748537
Iteration 1, loss = 0.53747628
Iteration 2, loss = 0.38432390
Iteration 2, loss = 0.42907287
Iteration 2, loss = 0.35055481
Iteration 2, loss = 0.40901103
Iteration 2, loss = 0.38086755
Iteration 2, loss = 0.32566156
Iteration 3, loss = 0.29974619
Iteration 3, loss = 0.21221044
Iteration 3, loss = 0.25179557
Iteration 3, loss = 0.30495521
Iteration 3, loss = 0.35494303
Iteration 3, loss = 0.13805326
Iteration 4, loss = 0.23095439
Ite

### 4.4 Multi-Label Neural Network Evaluation

In [383]:
# ============================================================
# MULTI-LABEL NEURAL NETWORK EVALUATION (on the multi-label test set)
# ============================================================

# Make predictions on the multi-label test set
y_pred_nn_multi = mlp_clf_multi.predict(X_test_multi)

# Calculate metrics
nn_multi_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_nn_multi),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_nn_multi),
    'Micro F1': f1_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_nn_multi, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_multi, y_pred_nn_multi, average='micro', zero_division=0)
}

print("="*60)
print("MULTI-LABEL NEURAL NETWORK EVALUATION")
print("(Evaluated on the multi-label test set: X_test_multi, y_test_multi)")
print("="*60)
for metric, value in nn_multi_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

MULTI-LABEL NEURAL NETWORK EVALUATION
(Evaluated on the multi-label test set: X_test_multi, y_test_multi)
Subset Accuracy     : 0.4527
Hamming Loss        : 0.1442
Micro F1            : 0.6447
Macro F1            : 0.5636
Micro Precision     : 0.6985
Micro Recall        : 0.5987


In [384]:
# Show sample predictions
y_pred_labels = mlb.inverse_transform(y_pred_nn_multi)
y_true_labels = mlb.inverse_transform(y_test_multi)

print("\nSample Multi-Label Neural Network Predictions:")
print("-" * 60)
for i in range(5):
    text = df_test_multi['text'].iloc[i][:60]
    true = y_true_labels[i] if y_true_labels[i] else ('none',)
    pred = y_pred_labels[i] if y_pred_labels[i] else ('none',)
    match = "✓" if set(true) == set(pred) else "✗"
    print(f"\n{match} Sample {i+1}:")
    print(f"   Text: {text}...")
    print(f"   True: {true}")
    print(f"   Pred: {pred}")


Sample Multi-Label Neural Network Predictions:
------------------------------------------------------------

✗ Sample 1:
   Text: philadelphia clearly page game playbook fire net oppose goal...
   True: ('news_&_social_concern', 'sports')
   Pred: ('sports',)

✗ Sample 2:
   Text: sure bay face flyer man experience versus blue jacket year h...
   True: ('sports',)
   Pred: ('none',)

✗ Sample 3:
   Text: tizamagician put cherry kentucky derby day winner pie take d...
   True: ('news_&_social_concern', 'sports')
   Pred: ('sports',)

✗ Sample 4:
   Text: flyer give false hope absolutely destroy islander go to dest...
   True: ('news_&_social_concern', 'sports')
   Pred: ('sports',)

✗ Sample 5:
   Text: flyer tremendous season face excited season go to well thank...
   True: ('news_&_social_concern', 'sports')
   Pred: ('sports',)


### 4.5 Naive Bayes Classifier (for Comparison)

In [385]:
# ============================================================
# NAIVE BAYES TRAINING AND EVALUATION
# ============================================================
# Trained on the multi-label data (as in Lab 4)

print("="*60)
print("NAIVE BAYES (Multi-Label) - as in Lab 4")
print("="*60)
print(f"\n📊 Training on MULTI-LABEL data:")
print(f"   X_train_bow_multi: {X_train_bow_multi.shape}")

# Multi-label binarizer for the labels
# (MultiLabelBinarizer is already imported in the first cell)
mlb_nb = MultiLabelBinarizer()
y_train_nb = mlb_nb.fit_transform(df_train_multi["labels"])
y_test_nb = mlb_nb.transform(df_test_multi["labels"])

print(f"   y_train_nb: {y_train_nb.shape}")
print(f"   Classes: {len(mlb_nb.classes_)}")

# Train Naive Bayes
nb_clf = OneVsRestClassifier(MultinomialNB(alpha=1.0))
nb_clf.fit(X_train_bow_multi, y_train_nb)

# Predictions on the multi-label test set
y_pred_nb = nb_clf.predict(X_test_bow_multi)

# Metrics
nb_metrics = {
    'Subset Accuracy': accuracy_score(y_test_nb, y_pred_nb),
    'Hamming Loss': hamming_loss(y_test_nb, y_pred_nb),
    'Micro F1': f1_score(y_test_nb, y_pred_nb, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_nb, y_pred_nb, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_nb, y_pred_nb, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_nb, y_pred_nb, average='micro', zero_division=0)
}

print("\n" + "="*60)
print("NAIVE BAYES EVALUATION")
print("(Test auf Multi-Label Test-Set)")
print("="*60)
for metric, value in nb_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

NAIVE BAYES (Multi-Label) - wie Lab 4

üìä Training auf MULTI-LABEL Daten:
   X_train_bow_multi: (5465, 1000)
   y_train_nb: (5465, 6)
   Classes: 6

NAIVE BAYES EVALUATION
(Test auf Multi-Label Test-Set)
Subset Accuracy     : 0.4818
Hamming Loss        : 0.1370
Micro F1            : 0.6802
Macro F1            : 0.6237
Micro Precision     : 0.6942
Micro Recall        : 0.6668
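
The gap between Subset Accuracy and Hamming Loss is easier to read with a hand-computed example. The following is a minimal sketch on hypothetical toy data (the matrices are invented for illustration, not taken from the tweets): subset accuracy counts a sample as correct only if every label in the row matches, while Hamming loss counts individual wrong label cells.

```python
# Toy multi-label data: 4 samples, 3 classes (hypothetical values).
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
y_pred = [
    [1, 0, 0],  # one of two labels missed
    [0, 1, 0],  # exact match
    [1, 1, 0],  # exact match
    [0, 1, 1],  # one extra label
]

n_samples = len(y_true)
n_classes = len(y_true[0])

# Subset accuracy: fraction of rows that are predicted exactly right.
exact_matches = sum(t_row == p_row for t_row, p_row in zip(y_true, y_pred))
subset_accuracy = exact_matches / n_samples

# Hamming loss: fraction of individual label cells that are wrong.
wrong_cells = sum(
    t != p
    for t_row, p_row in zip(y_true, y_pred)
    for t, p in zip(t_row, p_row)
)
hamming = wrong_cells / (n_samples * n_classes)

print(f"Subset accuracy: {subset_accuracy:.4f}")  # 0.5000
print(f"Hamming loss:    {hamming:.4f}")          # 0.1667
```

Two of the four rows contain a single wrong cell, which halves the subset accuracy but costs only 2 of 12 cells in Hamming loss. This is why a model can have a low Hamming loss and still a modest subset accuracy.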


---
## 5. Task 4: Single-Label Classification

For comparison, we train a neural network using single-label classification. Each tweet is assigned only its primary (first) label, converting the multi-label problem to a standard multi-class classification problem.
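
As a rough illustration of this conversion, the sketch below takes the first entry of each label list and integer-encodes the result. The label lists are hypothetical, and the sketch assumes the lists are stored in their original annotation order; the sorted class order mirrors how scikit-learn's `LabelEncoder` orders its `classes_`.

```python
# Hypothetical multi-label annotations (one list of topics per tweet).
multi_labels = [
    ["sports", "news_&_social_concern"],
    ["music"],
    ["film_tv_&_video", "celebrity_&_pop_culture"],
    ["sports"],
]

# Primary label = first entry of each label list.
single_labels = [labels[0] for labels in multi_labels]

# Integer-encode the class names in sorted order.
classes = sorted(set(single_labels))
class_to_id = {name: idx for idx, name in enumerate(classes)}
y_single = [class_to_id[name] for name in single_labels]

print(classes)   # ['film_tv_&_video', 'music', 'sports']
print(y_single)  # [2, 1, 0, 2]
```

The secondary labels are discarded by this step, so any evaluation against `y_single` treats a prediction of a secondary topic as an error.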

### 5.1 Single-Label Encoding

In [386]:
# ============================================================
# SINGLE-LABEL ENCODING - preparation for the single-label NN
# ============================================================
# The labels were already encoded in cell 17 (y_train_single_enc, etc.)
# This cell only prints a summary for clarity

print("="*60)
print("SINGLE-LABEL DATA SUMMARY")
print("="*60)
print(f"\n📊 Single-label features (for training):")
print(f"  X_train_single: {X_train_single.shape}")
print(f"  y_train_single_enc: {y_train_single_enc.shape}")

print(f"\n📊 Single-label features (for testing):")
print(f"  X_test_single: {X_test_single.shape}")
print(f"  y_test_single_enc: {y_test_single_enc.shape}")

print(f"\n✓ Class mapping:")
for i, cls in enumerate(le.classes_):
    train_count = (y_train_single_enc == i).sum()
    test_count = (y_test_single_enc == i).sum()
    print(f"  {i}: {cls} (train: {train_count}, test: {test_count})")

SINGLE-LABEL DATA SUMMARY

📊 Single-label features (for training):
  X_train_single: (5465, 1000)
  y_train_single_enc: (5465,)

📊 Single-label features (for testing):
  X_test_single: (1511, 1000)
  y_test_single_enc: (1511,)

✓ Class mapping:
  0: arts_&_culture (train: 9, test: 0)
  1: business_&_entrepreneurs (train: 6, test: 0)
  2: celebrity_&_pop_culture (train: 281, test: 81)
  3: diaries_&_daily_life (train: 553, test: 111)
  4: family (train: 9, test: 0)
  5: fashion_&_style (train: 1, test: 4)
  6: film_tv_&_video (train: 502, test: 164)
  7: fitness_&_health (train: 2, test: 1)
  8: food_&_dining (train: 4, test: 1)
  9: gaming (train: 7, test: 0)
  10: learning_&_educational (train: 1, test: 0)
  11: music (train: 1000, test: 340)
  12: news_&_social_concern (train: 1487, test: 241)
  13: other_hobbies (train: 3, test: 2)
  14: relationships (train: 5, test: 1)
  15: science_&_technology (train: 5, test: 0)
  16: sports (train: 1587, test: 565)
  17: you
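
The class mapping is heavily imbalanced, so a useful reference point is the accuracy of a trivial classifier that always predicts the most frequent test class. The sketch below computes that baseline from the test counts printed above (only classes with at least one test sample are listed; the variable names are illustrative).

```python
# Test-set class counts copied from the class mapping output above.
test_counts = {
    "celebrity_&_pop_culture": 81,
    "diaries_&_daily_life": 111,
    "fashion_&_style": 4,
    "film_tv_&_video": 164,
    "fitness_&_health": 1,
    "food_&_dining": 1,
    "music": 340,
    "news_&_social_concern": 241,
    "other_hobbies": 2,
    "relationships": 1,
    "sports": 565,
}

n_test = sum(test_counts.values())
majority_class = max(test_counts, key=test_counts.get)
baseline_accuracy = test_counts[majority_class] / n_test

print(f"Test samples:      {n_test}")                 # 1511
print(f"Majority class:    {majority_class}")         # sports
print(f"Baseline accuracy: {baseline_accuracy:.4f}")  # 0.3739
```

Any single-label model should be judged against this roughly 37% baseline rather than against chance over all classes.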

### 5.2 Single-Label Neural Network Training

In [387]:
# ============================================================
# SINGLE-LABEL NEURAL NETWORK TRAINING
# ============================================================
# Trained on the SINGLE-LABEL data (X_train_single, y_train_single_enc)

# Create MLPClassifier for single-label classification
mlp_clf_single = MLPClassifier(
    hidden_layer_sizes=(128, 64, 128),  # Same architecture as multi-label
    activation='relu',                   # ReLU activation function
    solver='adam',                       # Adam optimizer
    max_iter=300,                        # Maximum iterations
    random_state=RANDOM_STATE,           # For reproducibility
    early_stopping=True,                 # Enable early stopping for single-label
    validation_fraction=0.1,             # Use 10% for validation
    verbose=True                         # Show training progress
)

print("="*60)
print("SINGLE-LABEL NEURAL NETWORK ARCHITECTURE")
print("="*60)
print(f"Input layer:  {X_train_single.shape[1]} neurons (vocabulary size)")
print(f"Hidden layer 1: 128 neurons (ReLU activation)")
print(f"Hidden layer 2: 64 neurons (ReLU activation)")
print(f"Hidden layer 3: 128 neurons (ReLU activation)")
print(f"Output layer: {len(le.classes_)} neurons (Softmax activation)")
print("="*60)
print(f"\nüìä Training auf SINGLE-LABEL Daten:")
print(f"   X_train_single: {X_train_single.shape}")
print(f"   y_train_single_enc: {y_train_single_enc.shape}")

print("\nTraining Single-Label Neural Network...")
mlp_clf_single.fit(X_train_single, y_train_single_enc)
print("\n‚úì Single-Label Neural Network training complete!")

SINGLE-LABEL NEURAL NETWORK ARCHITECTURE
Input layer:  1000 neurons (vocabulary size)
Hidden layer 1: 128 neurons (ReLU activation)
Hidden layer 2: 64 neurons (ReLU activation)
Hidden layer 3: 128 neurons (ReLU activation)
Output layer: 18 neurons (Softmax activation)

üìä Training auf SINGLE-LABEL Daten:
   X_train_single: (5465, 1000)
   y_train_single_enc: (5465,)

Training Single-Label Neural Network...
Iteration 1, loss = 2.41015208
Validation score: 0.323583
Iteration 2, loss = 1.52025911
Validation score: 0.627057
Iteration 3, loss = 0.97154871
Validation score: 0.702011
Iteration 4, loss = 0.67134946
Validation score: 0.727605
Iteration 5, loss = 0.52192845
Validation score: 0.742230
Iteration 6, loss = 0.42555599
Validation score: 0.733090
Iteration 7, loss = 0.34532969
Validation score: 0.733090
Iteration 8, loss = 0.28407572
Validation score: 0.723949
Iteration 9, loss = 0.22955961
Validation score: 0.729433
Iteration 10, loss = 0.18512835
Validation score: 0.727605
Iterati
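
The training log shows the validation score peaking early while the training loss keeps falling, which is the situation `early_stopping=True` is meant to handle. The sketch below is a simplified, patience-based illustration of that idea, not scikit-learn's actual implementation. It assumes the documented `MLPClassifier` defaults `n_iter_no_change=10` and `tol=1e-4` and uses the ten validation scores visible in the log above; the function name is hypothetical.

```python
def epoch_of_early_stop(val_scores, patience=10, tol=1e-4):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: stop once the validation score has failed to beat
    the best score so far by at least `tol` for more than `patience`
    consecutive epochs.
    """
    best = float("-inf")
    stale_epochs = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score < best + tol:
            stale_epochs += 1
        else:
            stale_epochs = 0
        best = max(best, score)
        if stale_epochs > patience:
            return epoch
    return None


# Validation scores copied from the (truncated) training log above.
val_scores = [0.323583, 0.627057, 0.702011, 0.727605, 0.742230,
              0.733090, 0.733090, 0.723949, 0.729433, 0.727605]

best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i]) + 1
print(best_epoch)                                   # 5
print(epoch_of_early_stop(val_scores))              # None
print(epoch_of_early_stop(val_scores, patience=3))  # 9
```

With a patience of 10, the ten logged epochs are not yet enough to trigger a stop, which is consistent with the log continuing past iteration 10. A smaller patience would have stopped earlier at the cost of being more sensitive to noise in the 10% validation split.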

### 5.3 Single-Label Neural Network Evaluation

In [388]:
# ============================================================
# SINGLE-LABEL NEURAL NETWORK EVALUATION (on the single-label test set)
# ============================================================

# Make predictions on the single-label test set
y_pred_nn_single = mlp_clf_single.predict(X_test_single)

# Calculate metrics
nn_single_metrics = {
    'Accuracy': accuracy_score(y_test_single_enc, y_pred_nn_single),
    'Macro F1': f1_score(y_test_single_enc, y_pred_nn_single, average='macro', zero_division=0),
    'Weighted F1': f1_score(y_test_single_enc, y_pred_nn_single, average='weighted', zero_division=0),
    'Macro Precision': precision_score(y_test_single_enc, y_pred_nn_single, average='macro', zero_division=0),
    'Macro Recall': recall_score(y_test_single_enc, y_pred_nn_single, average='macro', zero_division=0)
}

print("="*60)
print("SINGLE-LABEL NEURAL NETWORK EVALUATION")
print("(Evaluated on the single-label test set: X_test_single, y_test_single_enc)")
print("="*60)
for metric, value in nn_single_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

SINGLE-LABEL NEURAL NETWORK EVALUATION
(Evaluated on the single-label test set: X_test_single, y_test_single_enc)
Accuracy            : 0.7101
Macro F1            : 0.3004
Weighted F1         : 0.6906
Macro Precision     : 0.2962
Macro Recall        : 0.3091
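
The large gap between Accuracy or Weighted F1 and Macro F1 is typical for imbalanced data: macro averaging gives every class the same weight, so classes with only a handful of samples pull the average down. The sketch below reproduces the effect on hypothetical toy labels; the data and the helper `per_class_f1` are illustrative and not part of the notebook.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score of one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced data: 8 "sports", 1 "gaming", 1 "family".
y_true = ["sports"] * 8 + ["gaming", "family"]
# The model gets the frequent class right and misses both rare classes.
y_pred = ["sports"] * 10

labels = sorted(set(y_true))
f1 = {c: per_class_f1(y_true, y_pred, c) for c in labels}
support = {c: y_true.count(c) for c in labels}

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1.values()) / len(labels)
weighted_f1 = sum(f1[c] * support[c] for c in labels) / len(y_true)

print(f"Accuracy:    {accuracy:.4f}")     # 0.8000
print(f"Macro F1:    {macro_f1:.4f}")     # 0.2963
print(f"Weighted F1: {weighted_f1:.4f}")  # 0.7111
```

The two rare classes each contribute an F1 of 0 to the macro average, while they barely affect accuracy or the support-weighted average.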


In [389]:
# ============================================================
# PARTIAL MATCH EVALUATION: Single-Label NN vs Original Multi-Labels
# ============================================================
# Checks whether the single-label prediction matches ANY of the original multi-labels

def calculate_partial_match_accuracy(y_pred_single: np.ndarray, 
                                      original_labels_list: pd.Series,
                                      label_encoder: LabelEncoder) -> dict:
    """
    Calculate partial match accuracy for single-label predictions.
    
    A prediction is considered a 'hit' if the predicted label matches
    at least one of the original multi-labels.
    """
    # Convert predictions to label names
    pred_labels = label_encoder.inverse_transform(y_pred_single)
    
    # Count matches
    total = len(pred_labels)
    hits = 0
    
    for pred, original_labels in zip(pred_labels, original_labels_list):
        # Check if prediction matches ANY of the original labels
        if pred in original_labels:
            hits += 1
    
    partial_match_accuracy = hits / total if total > 0 else 0.0
    
    return {
        'total_samples': total,
        'hits': hits,
        'misses': total - hits,
        'partial_match_accuracy': partial_match_accuracy
    }

# Calculate partial match accuracy for the Single-Label NN (on the single-label test set)
partial_match_results = calculate_partial_match_accuracy(
    y_pred_nn_single,
    df_test_single['labels'],  # original multi-labels from the single-label test set
    le
)

print("=" * 70)
print("PARTIAL MATCH EVALUATION: Single-Label NN vs Original Multi-Labels")
print("=" * 70)
print(f"\nA 'hit' occurs when the predicted single label matches ANY of the")
print(f"original multi-labels (not just the first/primary label).")
print("-" * 70)
print(f"Total test samples:         {partial_match_results['total_samples']:,}")
print(f"Hits (partial matches):     {partial_match_results['hits']:,}")
print(f"Misses:                     {partial_match_results['misses']:,}")
print(f"\nPartial Match Accuracy:     {partial_match_results['partial_match_accuracy']:.4f} ({partial_match_results['partial_match_accuracy']*100:.2f}%)")
print("-" * 70)

# Compare with exact single-label accuracy
print(f"\nComparison:")
print(f"  Exact Single-Label Accuracy:    {nn_single_metrics['Accuracy']:.4f}")
print(f"  Partial Match Accuracy:         {partial_match_results['partial_match_accuracy']:.4f}")
print(f"  Improvement:                    {(partial_match_results['partial_match_accuracy'] - nn_single_metrics['Accuracy']):+.4f}")
print("=" * 70)

# Show examples of partial matches
print("\nExamples of Partial Matches (pred matches non-primary label):")
print("-" * 70)
example_count = 0
for i in range(len(y_pred_nn_single)):
    pred_label = le.inverse_transform([y_pred_nn_single[i]])[0]
    true_primary = df_test_single['single_label'].iloc[i]
    original_labels = df_test_single['labels'].iloc[i]
    
    # Show cases where prediction doesn't match primary but matches another label
    if pred_label != true_primary and pred_label in original_labels:
        example_count += 1
        if example_count <= 5:
            text = df_test_single['text'].iloc[i][:50]
            print(f"\n✓ Sample {i+1}:")
            print(f"   Text: {text}...")
            print(f"   Original labels: {original_labels}")
            print(f"   Primary label: {true_primary}")
            print(f"   Predicted: {pred_label} (matches secondary label!)")

print(f"\n" + "-" * 70)
print(f"Total samples where prediction matched a secondary label: {example_count}")

PARTIAL MATCH EVALUATION: Single-Label NN vs Original Multi-Labels

A 'hit' occurs when the predicted single label matches ANY of the
original multi-labels (not just the first/primary label).
----------------------------------------------------------------------
Total test samples:         1,511
Hits (partial matches):     1,138
Misses:                     373

Partial Match Accuracy:     0.7531 (75.31%)
----------------------------------------------------------------------

Comparison:
  Exact Single-Label Accuracy:    0.7101
  Partial Match Accuracy:         0.7531
  Improvement:                    +0.0430

Examples of Partial Matches (pred matches non-primary label):
----------------------------------------------------------------------

✓ Sample 11:
   Text: min st belt plenty positive thank have finish litt...
   Original labels: ['news_&_social_concern' 'sports']
   Primary label: sports
   Predicted: news_&_social_concern (matches secondary label!)

✓ Sample 40:
   Text: go 

In [390]:
# Show sample predictions for single-label
print("\nSample Single-Label Neural Network Predictions:")
print("-" * 60)
for i in range(5):
    text = df_test_single['text'].iloc[i][:60]
    true_label = le.inverse_transform([y_test_single_enc[i]])[0]
    pred_label = le.inverse_transform([y_pred_nn_single[i]])[0]
    original_labels = df_test_single['labels'].iloc[i]
    match = "✓" if true_label == pred_label else "✗"
    print(f"\n{match} Sample {i+1}:")
    print(f"   Text: {text}...")
    print(f"   Original labels: {original_labels}")
    print(f"   Single label (true): {true_label}")
    print(f"   Single label (pred): {pred_label}")


Sample Single-Label Neural Network Predictions:
------------------------------------------------------------

✓ Sample 1:
   Text: philadelphia clearly page game playbook fire net oppose goal...
   Original labels: ['news_&_social_concern' 'sports']
   Single label (true): sports
   Single label (pred): sports

✓ Sample 2:
   Text: sure bay face flyer man experience versus blue jacket year h...
   Original labels: ['sports']
   Single label (true): sports
   Single label (pred): sports

✓ Sample 3:
   Text: tizamagician put cherry kentucky derby day winner pie take d...
   Original labels: ['news_&_social_concern' 'sports']
   Single label (true): sports
   Single label (pred): sports

✓ Sample 4:
   Text: flyer give false hope absolutely destroy islander go to dest...
   Original labels: ['news_&_social_concern' 'sports']
   Single label (true): sports
   Single label (pred): sports

✓ Sample 5:
   Text: flyer tremendous season face excited season go to well thank...
   Ori

---
## 6. Model Comparison

### 6.1 Multi-Label Models Comparison

In [391]:
# Create comparison table for multi-label models
comparison_df = pd.DataFrame({
    'Metric': list(nn_multi_metrics.keys()),
    'Neural Network (Multi-Label)': list(nn_multi_metrics.values()),
    # Look up by metric name so the rows stay aligned even if the dict orders differ
    'Naive Bayes (Multi-Label)': [nb_metrics[m] for m in nn_multi_metrics]
})

# Calculate the difference (Neural Network minus Naive Bayes)
comparison_df['Difference'] = comparison_df['Neural Network (Multi-Label)'] - comparison_df['Naive Bayes (Multi-Label)']
comparison_df['Better Model'] = comparison_df.apply(
    lambda row: 'Neural Network' if (row['Difference'] > 0 and row['Metric'] != 'Hamming Loss') 
                or (row['Difference'] < 0 and row['Metric'] == 'Hamming Loss')
                else 'Naive Bayes' if row['Difference'] != 0 else 'Tie',
    axis=1
)

print("="*80)
print("MULTI-LABEL MODEL COMPARISON: Neural Network vs Naive Bayes")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)
print("\nNote: For Hamming Loss, lower is better. For all other metrics, higher is better.")

MULTI-LABEL MODEL COMPARISON: Neural Network vs Naive Bayes
         Metric  Neural Network (Multi-Label)  Naive Bayes (Multi-Label)  Difference   Better Model
Subset Accuracy                      0.452680                   0.481800   -0.029120    Naive Bayes
   Hamming Loss                      0.144165                   0.136995    0.007170    Naive Bayes
       Micro F1                      0.644740                   0.680227   -0.035486    Naive Bayes
       Macro F1                      0.563573                   0.623651   -0.060079    Naive Bayes
Micro Precision                      0.698469                   0.694167    0.004302 Neural Network
   Micro Recall                      0.598688                   0.666835   -0.068147    Naive Bayes

Note: For Hamming Loss, lower is better. For all other metrics, higher is better.


### 6.2 Single-Label vs Multi-Label Comparison

In [392]:
# ============================================================
# CROSS-EVALUATION: all models on the multi-label test set
# ============================================================
# All models are evaluated on the same multi-label test set
# so that the comparison is fair.

print("="*80)
print("CROSS-EVALUATION: all models on the multi-label test set")
print("="*80)

# 1. Multi-label NN on the multi-label test set → already in y_pred_nn_multi
print("\n✓ Multi-Label NN: already evaluated (nn_multi_metrics)")

# 2. Naive Bayes on the multi-label test set → already in y_pred_nb
print("✓ Naive Bayes: already evaluated (nb_metrics)")

# 3. Single-label NN on the multi-label test set → NEW predictions required
print("\n📊 Single-Label NN on the multi-label test set:")
print(f"   Predicting on X_test_multi with the Single-Label NN...")

# Predictions of the single-label NN on the multi-label test features
y_pred_single_on_multi = mlp_clf_single.predict(X_test_multi)

# Convert the single-label predictions to multi-label (multi-hot) format
num_multi_classes = y_test_multi.shape[1]
num_single_classes = len(le.classes_)

# Mapping: single-label index → multi-label index
single_to_multi_idx = {}
for sl_idx, sl_class in enumerate(le.classes_):
    if sl_class in mlb.classes_:
        ml_idx = list(mlb.classes_).index(sl_class)
        single_to_multi_idx[sl_idx] = ml_idx

print(f"   Class mapping: {len(single_to_multi_idx)}/{num_single_classes} classes mapped")

# Build the multi-hot matrix; predictions of unmapped classes stay all-zero
y_pred_single_as_multi = np.zeros((len(y_pred_single_on_multi), num_multi_classes), dtype=int)
for i, pred in enumerate(y_pred_single_on_multi):
    if pred in single_to_multi_idx:
        y_pred_single_as_multi[i, single_to_multi_idx[pred]] = 1

# Compute metrics for the single-label NN on the multi-label test set
single_on_multi_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_single_as_multi),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_single_as_multi),
    'Micro F1': f1_score(y_test_multi, y_pred_single_as_multi, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_single_as_multi, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_multi, y_pred_single_as_multi, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_multi, y_pred_single_as_multi, average='micro', zero_division=0)
}

print(f"\n{'='*80}")
print("COMPARISON ON THE MULTI-LABEL TEST SET")
print("="*80)
print(f"\n{'Metric':<20} {'Multi-Label NN':<18} {'Naive Bayes':<18} {'Single-Label NN':<18}")
print("-"*76)
for metric in ['Subset Accuracy', 'Hamming Loss', 'Micro F1', 'Macro F1']:
    multi = nn_multi_metrics[metric]
    nb = nb_metrics[metric]
    single = single_on_multi_metrics[metric]
    print(f"{metric:<20} {multi:<18.4f} {nb:<18.4f} {single:<18.4f}")

print("\n💡 Interpretation:")
print("  - The Single-Label NN can predict only ONE class per sample")
print("  - The multi-label ground truth can have SEVERAL classes per sample")
print("  - Hence the Multi-Label NN and Naive Bayes have an advantage on multi-label data in principle")

CROSS-EVALUATION: all models on the multi-label test set

✓ Multi-Label NN: already evaluated (nn_multi_metrics)
✓ Naive Bayes: already evaluated (nb_metrics)

📊 Single-Label NN on the multi-label test set:
   Predicting on X_test_multi with the Single-Label NN...
   Class mapping: 6/18 classes mapped

COMPARISON ON THE MULTI-LABEL TEST SET

Metric               Multi-Label NN     Naive Bayes        Single-Label NN   
----------------------------------------------------------------------------
Subset Accuracy      0.4527             0.4818             0.5453            
Hamming Loss         0.1442             0.1370             0.1341            
Micro F1             0.6447             0.6802             0.6518            
Macro F1             0.5636             0.6237             0.5257            

💡 Interpretation:
  - The Single-Label NN can predict only ONE class per sample
  - The multi-label ground truth can have SEVERAL classes per sample
  - Hence the Multi-Label NN and Naiv

### 6.3 Hybrid Model: Combining Single-Label and Multi-Label Networks

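**Idea:** Use the best of both worlds:
- Tweets with **exactly 1 label** → Single-Label NN (optimized for unambiguous classification)
- Tweets with **2+ labels** → Multi-Label NN (can predict several labels)

**Why this makes sense:**
- The Single-Label NN specializes in unambiguous decisions
- The Multi-Label NN can capture complex overlaps between topics
- Combining the two exploits the strengths of both approaches

**Caveat:** The routing in the cell below uses the number of ground-truth labels of each test sample (`y_test_multi.sum(axis=1)`), which is not available at prediction time. The hybrid scores are therefore an optimistic, oracle-routed estimate rather than the performance of a deployable model.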

In [393]:
# ============================================================
# HYBRID MODEL: combination of Single-Label NN and Multi-Label NN
# ============================================================
# Strategy:
# - Tweets with exactly 1 label → Single-Label NN prediction
# - Tweets with 2+ labels → Multi-Label NN prediction
#
# NOTE: the hybrid model is evaluated ONLY on the multi-label test set.
# The routing uses the ground-truth label count, so it acts as an oracle.

print("="*80)
print("HYBRID MODEL EVALUATION (multi-label test set only)")
print("="*80)

# 1. Identify single-label vs multi-label samples in the multi-label test set
num_labels_per_sample = y_test_multi.sum(axis=1)
is_single_label = num_labels_per_sample == 1
is_multi_label = num_labels_per_sample > 1

n_single = is_single_label.sum()
n_multi = is_multi_label.sum()
n_total = len(y_test_multi)

print(f"\n📊 Multi-label test set composition:")
print(f"  Single-label samples (1 label):  {n_single:,} ({100*n_single/n_total:.1f}%)")
print(f"  Multi-label samples (2+ labels): {n_multi:,} ({100*n_multi/n_total:.1f}%)")
print(f"  Total:                           {n_total:,}")

# 2. Build the hybrid predictions
# Single-label samples: use the Single-Label NN (on the multi-label test features)
# Multi-label samples: use the Multi-Label NN

y_pred_hybrid = np.zeros_like(y_test_multi)

# Single-label samples → Single-Label NN prediction (converted to multi-label format)
single_label_indices = np.where(is_single_label)[0]
for idx in single_label_indices:
    pred_class = y_pred_single_on_multi[idx]  # Single-Label NN predictions on the multi-label test set
    # Map the single-label index to the multi-label index
    if pred_class in single_to_multi_idx:
        y_pred_hybrid[idx, single_to_multi_idx[pred_class]] = 1

# Multi-label samples → Multi-Label NN prediction
multi_label_indices = np.where(is_multi_label)[0]
y_pred_hybrid[multi_label_indices] = y_pred_nn_multi[multi_label_indices]

print(f"\n✓ Hybrid predictions created:")
print(f"  {n_single} samples → Single-Label NN")
print(f"  {n_multi} samples → Multi-Label NN")

# 3. Evaluate the hybrid model
hybrid_metrics = {
    'Subset Accuracy': accuracy_score(y_test_multi, y_pred_hybrid),
    'Hamming Loss': hamming_loss(y_test_multi, y_pred_hybrid),
    'Micro F1': f1_score(y_test_multi, y_pred_hybrid, average='micro', zero_division=0),
    'Macro F1': f1_score(y_test_multi, y_pred_hybrid, average='macro', zero_division=0),
    'Micro Precision': precision_score(y_test_multi, y_pred_hybrid, average='micro', zero_division=0),
    'Micro Recall': recall_score(y_test_multi, y_pred_hybrid, average='micro', zero_division=0)
}

print(f"\n{'='*80}")
print("HYBRID MODEL METRICS (Multi-Label Test-Set)")
print("="*80)
for metric, value in hybrid_metrics.items():
    print(f"{metric:<20}: {value:.4f}")

# 4. Compare all models on the multi-label test set
print(f"\n{'='*80}")
print("COMPARISON ON THE MULTI-LABEL TEST SET (all models)")
print("="*80)
print(f"\n{'Metric':<20} {'Multi-Label NN':<16} {'Hybrid':<16} {'Naive Bayes':<16} {'Single-Label NN':<16}")
print("-"*84)
for metric in ['Subset Accuracy', 'Hamming Loss', 'Micro F1', 'Macro F1']:
    multi = nn_multi_metrics[metric]
    hybrid = hybrid_metrics[metric]
    nb = nb_metrics[metric]
    single = single_on_multi_metrics[metric]
    
    # Mark the best model
    if metric == 'Hamming Loss':
        best = min(multi, hybrid, nb, single)
    else:
        best = max(multi, hybrid, nb, single)
    
    multi_str = f"{multi:.4f}" + (" ★" if multi == best else "")
    hybrid_str = f"{hybrid:.4f}" + (" ★" if hybrid == best else "")
    nb_str = f"{nb:.4f}" + (" ★" if nb == best else "")
    single_str = f"{single:.4f}" + (" ★" if single == best else "")
    
    print(f"{metric:<20} {multi_str:<16} {hybrid_str:<16} {nb_str:<16} {single_str:<16}")

print(f"\n★ = best model for this metric")
print(f"(For Hamming Loss, lower is better)")

# 5. Detailed analysis: performance on the subsets
print(f"\n{'='*80}")
print("DETAILED ANALYSIS: performance on single-label vs multi-label subsets")
print("="*80)

# Single-label subset
if n_single > 0:
    print(f"\n📌 Single-label subset ({n_single} samples):")
    
    # Hybrid on the single-label subset
    hybrid_single_acc = accuracy_score(y_test_multi[is_single_label], y_pred_hybrid[is_single_label])
    hybrid_single_f1 = f1_score(y_test_multi[is_single_label], y_pred_hybrid[is_single_label], average='micro', zero_division=0)
    
    # Multi-Label NN on the single-label subset
    multi_single_acc = accuracy_score(y_test_multi[is_single_label], y_pred_nn_multi[is_single_label])
    multi_single_f1 = f1_score(y_test_multi[is_single_label], y_pred_nn_multi[is_single_label], average='micro', zero_division=0)
    
    print(f"  Hybrid Model (Single-Label NN):  Acc={hybrid_single_acc:.4f}, F1={hybrid_single_f1:.4f}")
    print(f"  Multi-Label NN:                  Acc={multi_single_acc:.4f}, F1={multi_single_f1:.4f}")
    diff_acc = hybrid_single_acc - multi_single_acc
    print(f"  → Hybrid is {'better' if diff_acc > 0 else 'worse'} by {abs(diff_acc):.4f}")

# Multi-label subset
if n_multi > 0:
    print(f"\n📌 Multi-label subset ({n_multi} samples):")
    
    # Hybrid on the multi-label subset (uses the Multi-Label NN)
    hybrid_multi_acc = accuracy_score(y_test_multi[is_multi_label], y_pred_hybrid[is_multi_label])
    hybrid_multi_f1 = f1_score(y_test_multi[is_multi_label], y_pred_hybrid[is_multi_label], average='micro', zero_division=0)
    
    # Single-Label NN on the multi-label subset
    single_as_multi_subset = y_pred_single_as_multi[is_multi_label]
    single_multi_acc = accuracy_score(y_test_multi[is_multi_label], single_as_multi_subset)
    single_multi_f1 = f1_score(y_test_multi[is_multi_label], single_as_multi_subset, average='micro', zero_division=0)
    
    print(f"  Hybrid Model (Multi-Label NN):   Acc={hybrid_multi_acc:.4f}, F1={hybrid_multi_f1:.4f}")
    print(f"  Single-Label NN:                 Acc={single_multi_acc:.4f}, F1={single_multi_f1:.4f}")
    diff_f1 = hybrid_multi_f1 - single_multi_f1
    print(f"  → Hybrid is {'better' if diff_f1 > 0 else 'worse'} by {abs(diff_f1):.4f}")

print("\n" + "="*80)
print("\n💡 Interpretation:")
print("  The hybrid model combines the strengths of both approaches:")
print("  - Single-Label NN for tweets with only one topic")
print("  - Multi-Label NN for tweets with several topics")

HYBRID MODEL EVALUATION (multi-label test set only)

📊 Multi-label test set composition:
  Single-label samples (1 label):  1,104 (73.1%)
  Multi-label samples (2+ labels): 407 (26.9%)
  Total:                           1,511

✓ Hybrid predictions created:
  1104 samples → Single-Label NN
  407 samples → Multi-Label NN

HYBRID MODEL METRICS (Multi-Label Test-Set)
Subset Accuracy     : 0.5758
Hamming Loss        : 0.1296
Micro F1            : 0.6741
Macro F1            : 0.5896
Micro Precision     : 0.7482
Micro Recall        : 0.6133

COMPARISON ON THE MULTI-LABEL TEST SET (all models)

Metric               Multi-Label NN   Hybrid           Naive Bayes      Single-Label NN 
------------------------------------------------------------------------------------
Subset Accuracy      0.4527           0.5758 ★         0.4818           0.5453          
Hamming Loss         0.1442           0.1296 ★         0.1370           0.1341          
Micro F1             0.6447           0
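
The test-set composition printed above also gives a hard ceiling for any model that emits exactly one label per sample: it can never match a ground-truth row with two or more labels exactly. The short calculation below makes that ceiling explicit, using the two counts from the output; the variable names are illustrative.

```python
# Counts copied from the test-set composition in the output above.
n_single_label_samples = 1104  # samples with exactly one true label
n_multi_label_samples = 407    # samples with two or more true labels
n_total_samples = n_single_label_samples + n_multi_label_samples

# A one-label-per-sample predictor can be exactly right at most on the
# single-label samples, so this ratio bounds its subset accuracy.
max_subset_accuracy = n_single_label_samples / n_total_samples

print(f"Total samples:                {n_total_samples}")           # 1511
print(f"Max subset accuracy (1 label): {max_subset_accuracy:.4f}")  # 0.7306
```

The Single-Label NN's subset accuracy of 0.5453 should therefore be read against a maximum of about 0.73, not against 1.0.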

### 6.4 All Models Summary

In [394]:
# ============================================================
# COMPREHENSIVE MODEL COMPARISON SUMMARY
# ============================================================
# Evaluiere ALLE Modelle auf BEIDEN Test-Sets f√ºr fairen Vergleich
# (Hybrid Model nur auf Multi-Label Test-Set)

print("="*95)
print("COMPREHENSIVE MODEL COMPARISON SUMMARY")
print("="*95)

# ============================================================
# 1. Evaluiere Multi-Label NN und Naive Bayes auf Single-Label Test-Set
# ============================================================
# Daf√ºr m√ºssen wir Predictions auf Single-Label Test-Set machen und zu Single-Label konvertieren

print("\nüìä Cross-Evaluation: Multi-Label Modelle auf Single-Label Test-Set")

# Multi-Label NN auf Single-Label Test-Features predicten
y_pred_multi_nn_on_single_test = mlp_clf_multi.predict(X_test_single)
# Konvertiere zu Single-Label (argmax)
y_pred_multi_nn_as_single = np.array([np.argmax(row) if row.sum() > 0 else 0 for row in y_pred_multi_nn_on_single_test])

# Naive Bayes auf Single-Label Test-Features predicten
vectorizer_single = CountVectorizer(vocabulary=VOCABULARY, lowercase=True, token_pattern=r"(?u)\b\w+\b")
X_test_single_bow = vectorizer_single.fit_transform(df_test_single['text'])
y_pred_nb_on_single_test = nb_clf.predict(X_test_single_bow)
# Konvertiere zu Single-Label (argmax)
y_pred_nb_as_single = np.array([np.argmax(row) if row.sum() > 0 else 0 for row in y_pred_nb_on_single_test])

# Berechne Single-Label Metriken f√ºr Multi-Label NN
multi_nn_single_metrics = {
    'Accuracy': accuracy_score(y_test_single_enc, y_pred_multi_nn_as_single),
    'Weighted F1': f1_score(y_test_single_enc, y_pred_multi_nn_as_single, average='weighted', zero_division=0),
    'Macro F1': f1_score(y_test_single_enc, y_pred_multi_nn_as_single, average='macro', zero_division=0),
}

# Berechne Single-Label Metriken f√ºr Naive Bayes
nb_single_metrics = {
    'Accuracy': accuracy_score(y_test_single_enc, y_pred_nb_as_single),
    'Weighted F1': f1_score(y_test_single_enc, y_pred_nb_as_single, average='weighted', zero_division=0),
    'Macro F1': f1_score(y_test_single_enc, y_pred_nb_as_single, average='macro', zero_division=0),
}

print("‚úì Cross-Evaluation abgeschlossen")

# ============================================================
# 2. MULTI-LABEL TEST-SET ERGEBNISSE
# ============================================================
print("\n" + "-"*95)
print("MULTI-LABEL TEST-SET EVALUATION (Ground Truth hat 1 oder mehr Labels)")
print("-"*95)
print(f"{'Model':<35} {'Subset Acc':<12} {'Micro F1':<12} {'Macro F1':<12} {'Hamming Loss':<12}")
print("-"*95)

# Sortiere nach Micro F1 (absteigend)
multi_results = [
    ('Multi-Label Neural Network', nn_multi_metrics),
    ('Hybrid Model (Single+Multi NN)', hybrid_metrics),
    ('Naive Bayes (Multi-Label)', nb_metrics),
    ('Single-Label NN', single_on_multi_metrics),
]

for name, metrics in sorted(multi_results, key=lambda x: x[1]['Micro F1'], reverse=True):
    print(f"{name:<35} {metrics['Subset Accuracy']:<12.4f} {metrics['Micro F1']:<12.4f} {metrics['Macro F1']:<12.4f} {metrics['Hamming Loss']:<12.4f}")

print("-"*95)
print("(Sortiert nach Micro F1, absteigend)")

# ============================================================
# 3. SINGLE-LABEL TEST-SET ERGEBNISSE
# ============================================================
print("\n" + "-"*95)
print("SINGLE-LABEL TEST-SET EVALUATION (Ground Truth hat genau 1 Label)")
print("-"*95)
print(f"{'Model':<35} {'Accuracy':<12} {'Weighted F1':<12} {'Macro F1':<12}")
print("-"*95)

# Sortiere nach Accuracy (absteigend)
single_results = [
    ('Single-Label Neural Network', nn_single_metrics),
    ('Multi-Label Neural Network', multi_nn_single_metrics),
    ('Naive Bayes (Multi-Label)', nb_single_metrics),
]

for name, metrics in sorted(single_results, key=lambda x: x[1]['Accuracy'], reverse=True):
    print(f"{name:<35} {metrics['Accuracy']:<12.4f} {metrics['Weighted F1']:<12.4f} {metrics['Macro F1']:<12.4f}")

print("-"*95)
print("(Sortiert nach Accuracy, absteigend)")
print("(Hinweis: Hybrid-Modell wird nur auf Multi-Label Test-Set evaluiert)")

# ============================================================
# 4. ZUSAMMENFASSUNG: Bestes Modell pro Metrik
# ============================================================
print("\n" + "="*95)
print("BEST MODEL PER METRIC")
print("="*95)

print("\n📊 Multi-Label Test-Set:")
metrics_multi = ['Subset Accuracy', 'Micro F1', 'Macro F1', 'Hamming Loss']
for metric in metrics_multi:
    if metric == 'Hamming Loss':
        best_model = min(multi_results, key=lambda x: x[1][metric])
        print(f"  {metric:<20}: {best_model[0]} ({best_model[1][metric]:.4f}) ← lower is better")
    else:
        best_model = max(multi_results, key=lambda x: x[1][metric])
        print(f"  {metric:<20}: {best_model[0]} ({best_model[1][metric]:.4f})")

print("\n📊 Single-Label Test-Set:")
metrics_single = ['Accuracy', 'Weighted F1', 'Macro F1']
for metric in metrics_single:
    best_model = max(single_results, key=lambda x: x[1][metric])
    print(f"  {metric:<20}: {best_model[0]} ({best_model[1][metric]:.4f})")

print("\n" + "="*95)

COMPREHENSIVE MODEL COMPARISON SUMMARY

📊 Cross-evaluation: multi-label models on the single-label test set
✓ Cross-evaluation complete

-----------------------------------------------------------------------------------------------
MULTI-LABEL TEST-SET EVALUATION (ground truth has 1 or more labels)
-----------------------------------------------------------------------------------------------
Model                               Subset Acc   Micro F1     Macro F1     Hamming Loss
-----------------------------------------------------------------------------------------------
Naive Bayes (Multi-Label)           0.4818       0.6802       0.6237       0.1370      
Hybrid Model (Single+Multi NN)      0.5758       0.6741       0.5896       0.1296      
Single-Label NN                     0.5453       0.6518       0.5257       0.1341      
Multi-Label Neural Network          0.4527       0.6447       0.5636       0.1442      
---------------------------------------------------------
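
The four columns of the multi-label table can be hard to tell apart from the numbers alone. The following is a minimal sketch, in plain Python on a hypothetical toy example with three classes, of how Subset Accuracy, Hamming Loss, Micro F1 and Macro F1 are defined. The data and the helper `f1` are illustrative assumptions and not part of the lab code, which presumably relies on the corresponding `sklearn.metrics` functions.

```python
# Toy multi-label data: rows are samples, columns are classes (1 = label present).
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
y_pred = [
    [1, 0, 1],   # exact match
    [0, 1, 1],   # one extra label
    [1, 0, 0],   # one missed label
    [0, 0, 1],   # exact match
]

n_samples = len(y_true)
n_classes = len(y_true[0])


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Subset accuracy: a sample counts only if every label matches.
subset_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_samples

# Hamming loss: fraction of individual label decisions that are wrong.
wrong = sum(t_i != p_i for t, p in zip(y_true, y_pred) for t_i, p_i in zip(t, p))
hamming_loss = wrong / (n_samples * n_classes)

# Per-class counts of true positives, false positives and false negatives.
counts = []
for c in range(n_classes):
    tp = sum(t[c] == 1 and p[c] == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t[c] == 0 and p[c] == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t[c] == 1 and p[c] == 0 for t, p in zip(y_true, y_pred))
    counts.append((tp, fp, fn))

# Micro F1 pools the counts over all classes; Macro F1 averages the per-class F1 values.
micro_f1 = f1(*(sum(col) for col in zip(*counts)))
macro_f1 = sum(f1(*c) for c in counts) / n_classes

print(f"Subset Accuracy: {subset_accuracy:.4f}")
print(f"Hamming Loss:    {hamming_loss:.4f}")
print(f"Micro F1:        {micro_f1:.4f}")
print(f"Macro F1:        {macro_f1:.4f}")
```

In this toy case half of the samples are matched exactly, yet only 2 of the 12 individual label decisions are wrong. This is why a model can rank differently on Subset Accuracy than on Hamming Loss or Micro F1.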

---
## 7. Optional: Experiment with Different Network Sizes

In [395]:
# Define different architectures to test
architectures = {
    'Small (64-32-64)': (64, 32, 64),
    'Medium (128-64-128)': (128, 64, 128),  # Original
    'Large (256-128-256)': (256, 128, 256),
    'Deep (128-128-64-64-128-128)': (128, 128, 64, 64, 128, 128),
    'Wide (512-256-512)': (512, 256, 512)
}

results = []

print("Experimenting with different network architectures (Multi-Label)...")
print("="*60)

for name, layers in architectures.items():
    print(f"\nTraining: {name}...")
    
    # Create and train model
    mlp = MLPClassifier(
        hidden_layer_sizes=layers,
        activation='relu',
        solver='adam',
        max_iter=200,
        random_state=RANDOM_STATE,
        early_stopping=False,  # Disabled for multi-label compatibility
        verbose=False
    )
    
    clf = OneVsRestClassifier(mlp, n_jobs=-1)
    clf.fit(X_train_multi, y_train_multi)  # Train on the multi-label data
    
    # Evaluate on the multi-label test set
    y_pred = clf.predict(X_test_multi)
    
    results.append({
        'Architecture': name,
        'Layers': str(layers),
        'Accuracy': accuracy_score(y_test_multi, y_pred),  # subset accuracy (exact match) for multi-label targets
        'Micro F1': f1_score(y_test_multi, y_pred, average='micro', zero_division=0),
        'Macro F1': f1_score(y_test_multi, y_pred, average='macro', zero_division=0)
    })
    
    print(f"  Accuracy: {results[-1]['Accuracy']:.4f}, Micro F1: {results[-1]['Micro F1']:.4f}")

# Display results
results_df = pd.DataFrame(results)
print("\n" + "="*80)
print("ARCHITECTURE COMPARISON RESULTS (Multi-Label Test-Set)")
print("="*80)
print(results_df.to_string(index=False))

Experimenting with different network architectures (Multi-Label)...

Training: Small (64-32-64)...
  Accuracy: 0.4520, Micro F1: 0.6409

Training: Medium (128-64-128)...
  Accuracy: 0.4527, Micro F1: 0.6447

Training: Large (256-128-256)...
  Accuracy: 0.4573, Micro F1: 0.6455

Training: Deep (128-128-64-64-128-128)...
  Accuracy: 0.4725, Micro F1: 0.6535

Training: Wide (512-256-512)...
  Accuracy: 0.4626, Micro F1: 0.6485

ARCHITECTURE COMPARISON RESULTS (Multi-Label Test-Set)
                Architecture                       Layers  Accuracy  Micro F1  Macro F1
            Small (64-32-64)                 (64, 32, 64)  0.452019  0.640916  0.569563
         Medium (128-64-128)               (128, 64, 128)  0.452680  0.644740  0.563573
         Large (256-128-256)              (256, 128, 256)  0.457313  0.645541  0.567340
Deep (128-128-64-64-128-128) (128, 128, 64, 64, 128, 128)  0.472535  0.653532  0.573370
          Wide (512-256-512)              (512, 256, 512)  0.462608  0.64847
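
To put the architecture names into perspective, the sketch below counts the trainable parameters (weights plus biases) of each configuration. It assumes 1000 input features, as reported in the lab summary, and one output unit per binary estimator inside `OneVsRestClassifier`. The helper `mlp_parameter_count` is hypothetical and not part of the notebook.

```python
def mlp_parameter_count(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer has fan_in * fan_out weights plus one bias per output unit.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


architectures = {
    'Small (64-32-64)': (64, 32, 64),
    'Medium (128-64-128)': (128, 64, 128),
    'Large (256-128-256)': (256, 128, 256),
    'Deep (128-128-64-64-128-128)': (128, 128, 64, 64, 128, 128),
    'Wide (512-256-512)': (512, 256, 512),
}

N_INPUTS = 1000   # feature vector size reported in the lab summary
N_OUTPUTS = 1     # assumption: one output unit per binary one-vs-rest estimator

param_counts = {
    name: mlp_parameter_count(N_INPUTS, layers, N_OUTPUTS)
    for name, layers in architectures.items()
}

for name, count in param_counts.items():
    print(f"{name:<30} {count:>10,} parameters per binary classifier")
```

Under these assumptions the Deep configuration has fewer parameters than the Large and Wide ones, because most parameters sit in the first layer, whose size is the 1000 inputs times the first hidden width.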

---
## 8. Summary

### What was accomplished
1. Loaded preprocessed data from Lab 2 and vocabulary from Lab 4
2. Created binary feature vectors (Bag-of-Words encoding) for all samples
3. Trained a Multi-Label Neural Network with 128→64→128 hidden layers using MLPClassifier and OneVsRestClassifier
4. Converted multi-label data to single-label by keeping only the primary label
5. Trained a Single-Label Neural Network with the same architecture
6. Compared Multi-Label NN, Single-Label NN, and Naive Bayes classifiers
7. Experimented with different network architectures
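
Steps 2 and 4 can be sketched in a few lines of plain Python. The vocabulary, tokens and label names below are hypothetical stand-ins, and treating the first listed label as the "primary" one is an assumption. The notebook's own helpers may differ.

```python
# Hypothetical stand-in for the top-1000 vocabulary loaded from Lab 4.
vocabulary = ["goal", "match", "vote", "election", "album"]
word_index = {word: i for i, word in enumerate(vocabulary)}


def to_binary_vector(tokens):
    """Return a 0/1 vector marking which vocabulary words occur in the sample."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        idx = word_index.get(token)
        if idx is not None:      # out-of-vocabulary tokens are ignored
            vector[idx] = 1      # presence only; repeated words still give 1
    return vector


def to_single_label(labels):
    """Keep only the first listed label (assumption: it is the primary one)."""
    return labels[0]


tweet_tokens = ["great", "goal", "in", "the", "match", "goal"]
tweet_labels = ["sports", "news"]   # hypothetical label names

features = to_binary_vector(tweet_tokens)
primary = to_single_label(tweet_labels)

print(features)   # [1, 1, 0, 0, 0]
print(primary)    # sports
```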

### Key Findings
- Multi-label classification allows predicting multiple topics per tweet
- Single-label classification simplifies the problem but loses information about secondary topics
- Neural networks can capture non-linear relationships in text classification
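- The Multi-Label MLPClassifier (ReLU activation, Adam optimizer) reaches a Micro F1 of 0.6447 on the multi-label test set, below Naive Bayes (0.6802); the Hybrid Model has the best Subset Accuracy (0.5758) and the lowest Hamming Loss (0.1296)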
- For multi-label tasks, OneVsRestClassifier trains separate binary classifiers per class
- For single-label tasks, MLPClassifier uses softmax output for probability distribution
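- Network architecture affects performance only modestly: Micro F1 ranges from 0.6409 (Small) to 0.6535 (Deep), and the Deep network outperforms the Wide one (0.6485), so larger isn't always better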
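
The difference between the two output layers can be illustrated with a short sketch. It uses hypothetical raw scores for three classes and assumes a decision threshold of 0.5 for the per-class sigmoid outputs. It is meant as an illustration of the idea, not as a reproduction of scikit-learn's internals.

```python
import math


def softmax(scores):
    """Turn raw scores into one probability distribution over all classes."""
    top = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Turn one raw score into an independent probability for one class."""
    return 1.0 / (1.0 + math.exp(-score))


scores = [2.0, 1.0, -1.0]   # hypothetical raw outputs for three classes

single_label_probs = softmax(scores)                 # probabilities compete and sum to 1
multi_label_probs = [sigmoid(s) for s in scores]     # each class is judged on its own

# Single-label: pick the one most probable class.
predicted_class = max(range(len(scores)), key=lambda i: single_label_probs[i])

# Multi-label: predict every class whose probability exceeds the threshold.
predicted_labels = [i for i, p in enumerate(multi_label_probs) if p > 0.5]

print(predicted_class)    # 0
print(predicted_labels)   # [0, 1]
```

With the same scores, the softmax output commits to class 0 only, whereas the independent sigmoids accept both class 0 and class 1. This is the information about secondary topics that the single-label setup discards.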

In [396]:
print("="*70)
print("LAB 5 SUMMARY")
print("="*70)
print(f"Input vocabulary: {VOCABULARY_PATH}")
print(f"\n📊 DATA STRUCTURE:")
print(f"  Multi-Label Training:  {len(df_train_multi):,} Samples")
print(f"  Multi-Label Test:      {len(df_test_multi):,} Samples")
print(f"  Single-Label Training: {len(df_train_single):,} Samples")
print(f"  Single-Label Test:     {len(df_test_single):,} Samples")
print(f"  Feature vector size:   {X_train_multi.shape[1]}")
print(f"  Number of classes:     {len(TOPIC_CLASSES)}")

print(f"\n📊 TRAINING:")
print(f"  Multi-Label NN:  trained on multi-label data ({len(df_train_multi):,} Samples)")
print(f"  Naive Bayes:     trained on multi-label data ({len(df_train_multi):,} Samples)")
print(f"  Single-Label NN: trained on single-label data ({len(df_train_single):,} Samples)")

print(f"\n📊 TESTING:")
print(f"  Multi-Label Test-Set:  all models (+ Hybrid)")
print(f"  Single-Label Test-Set: all except Hybrid")

print(f"\n📊 MULTI-LABEL NN METRICS (Multi-Label Test-Set):")
print(f"  Subset Accuracy: {nn_multi_metrics['Subset Accuracy']:.4f}")
print(f"  Micro F1: {nn_multi_metrics['Micro F1']:.4f}")
print(f"  Macro F1: {nn_multi_metrics['Macro F1']:.4f}")

print(f"\n📊 SINGLE-LABEL NN METRICS (Single-Label Test-Set):")
print(f"  Accuracy: {nn_single_metrics['Accuracy']:.4f}")
print(f"  Weighted F1: {nn_single_metrics['Weighted F1']:.4f}")
print(f"  Macro F1: {nn_single_metrics['Macro F1']:.4f}")
print("="*70)

LAB 5 SUMMARY
Input vocabulary: ../Data/top_1000_vocabulary.json

📊 DATA STRUCTURE:
  Multi-Label Training:  5,465 Samples
  Multi-Label Test:      1,511 Samples
  Single-Label Training: 5,465 Samples
  Single-Label Test:     1,511 Samples
  Feature vector size:   1000
  Number of classes:     18

📊 TRAINING:
  Multi-Label NN:  trained on multi-label data (5,465 Samples)
  Naive Bayes:     trained on multi-label data (5,465 Samples)
  Single-Label NN: trained on single-label data (5,465 Samples)

📊 TESTING:
  Multi-Label Test-Set:  all models (+ Hybrid)
  Single-Label Test-Set: all except Hybrid

📊 MULTI-LABEL NN METRICS (Multi-Label Test-Set):
  Subset Accuracy: 0.4527
  Micro F1: 0.6447
  Macro F1: 0.5636

📊 SINGLE-LABEL NN METRICS (Single-Label Test-Set):
  Accuracy: 0.7101
  Weighted F1: 0.6906
  Macro F1: 0.3004
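
The single-label numbers show a large gap between Weighted F1 and Macro F1. One possible reading is class imbalance: Macro F1 gives every class the same weight, so rare classes that are never predicted pull it down, while Accuracy and Weighted F1 are dominated by the frequent classes. The toy example below is a sketch of that effect on hypothetical data, not an analysis of the lab's actual predictions.

```python
from collections import Counter

# Toy single-label data: class "a" dominates, "b" and "c" are rare.
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10               # the model always predicts the majority class

classes = sorted(set(y_true))
support = Counter(y_true)         # number of true samples per class


def class_f1(label):
    """F1 for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


per_class = {c: class_f1(c) for c in classes}

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(per_class.values()) / len(classes)                              # every class counts equally
weighted_f1 = sum(per_class[c] * support[c] for c in classes) / len(y_true)    # weighted by class frequency

print(f"Accuracy:    {accuracy:.4f}")
print(f"Weighted F1: {weighted_f1:.4f}")
print(f"Macro F1:    {macro_f1:.4f}")
```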

