### 1. Data Loading and Preprocessing

This cell handles the initial setup: mounting Google Drive, loading the dataset, rebalancing it, mining genre association rules, and preparing labels for multi-label classification.

- **Drive Mount:** Mounts Google Drive to access the dataset file.
- **Data Loading:** Loads the movie data from `train.csv` into a pandas DataFrame.
- **Minority-Aware Subsampling:**
    - The `expanded-genres` column, containing comma-separated genres, is split into a list of genres for each movie (`genre_list`).
    - Genres with fewer than 200 samples are treated as minority genres. Every row containing a minority genre is kept, and 10% of the remaining rows are sampled.
- **Association Rule Mining:**
    - `TransactionEncoder` converts the genre lists into a one-hot encoded format suitable for association rule mining.
    - `fpgrowth` finds frequent itemsets of genres (`min_support=0.01`).
    - `association_rules` generates rules with a lift of at least 1, which are then filtered to confidence above 0.25 and support above 0.001.
- **Multi-Label Classification Preprocessing:**
    - The `description` column is used as the input text for each movie.
    - The `expanded-genres` column is converted into a list of genre labels (`Output-Label`).
    - `MultiLabelBinarizer` transforms these genre lists into a binary matrix, the standard target format for multi-label classification.
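
To make the rule metrics concrete, here is a minimal sketch that computes support, confidence, and lift by hand for a single rule. The toy transactions and the helper names `support` and `rule_metrics` are illustrative assumptions, not part of the notebook; `mlxtend` reports the same quantities in its rules table.

```python
from fractions import Fraction  # exact arithmetic keeps the toy numbers readable

# Hypothetical toy transactions: one set of genres per movie.
transactions = [
    {"Horror", "Thriller"},
    {"Horror", "Thriller", "Mystery"},
    {"Horror"},
    {"Drama", "Thriller"},
    {"Drama"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_rule = support(antecedent | consequent, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_rule / s_ante   # how often the consequent follows the antecedent
    lift = confidence / s_cons     # confidence relative to the consequent's base rate
    return s_rule, confidence, lift

s, conf, lift = rule_metrics({"Horror"}, {"Thriller"}, transactions)
print(f"support={float(s):.2f} confidence={float(conf):.2f} lift={float(lift):.2f}")
```

A lift above 1 means the two genres co-occur more often than they would if they were independent, which is why the notebook requires a lift of at least 1 before filtering on confidence and support.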

In [1]:
!pip install mlxtend
!pip install ltntorch

Collecting ltntorch
  Downloading LTNtorch-1.0.2-py3-none-any.whl.metadata (13 kB)

In [2]:
from google.colab import drive
import pandas as pd
from collections import Counter
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules
from sklearn.preprocessing import MultiLabelBinarizer

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Load the dataset
df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/train.csv')

# --- Step 1: Parse genre labels ---
# Assumes genres are comma-separated strings
df["genre_list"] = df["expanded-genres"].fillna("").apply(lambda x: [genre.strip() for genre in x.split(", ") if genre.strip()])

# --- Step 2: Count frequency of each genre ---
genre_counter = Counter()
for genres in df["genre_list"]:
    genre_counter.update(genres)

# Display number of samples per genre
print("Samples per genre:")
for genre, count in genre_counter.items():
    print(f"{genre:<15} {count}")

# --- Step 3: Identify minority genres ---
# Set threshold for minority genre (e.g., fewer than 200 samples)
MINORITY_THRESHOLD = 200
minority_genres = {genre for genre, count in genre_counter.items() if count < MINORITY_THRESHOLD}

print(f"\nMinority genres (< {MINORITY_THRESHOLD} samples): {sorted(minority_genres)}")

# --- Step 4: Split dataset ---
# Mark rows that contain any minority genre
df["contains_minority"] = df["genre_list"].apply(lambda genres: any(g in minority_genres for g in genres))

# Keep all minority rows
minority_df = df[df["contains_minority"]]

# Sample 10% of the remaining data
non_minority_df = df[~df["contains_minority"]].sample(frac=0.10, random_state=42)

print(f"\nOriginal dataset size: {len(df)}")
# Combine both
df = pd.concat([minority_df, non_minority_df]).reset_index(drop=True)

print(f"Minority rows kept: {len(minority_df)}")
print(f"Non-minority rows sampled: {len(non_minority_df)}")
print(f"Total train set size: {len(df)}")

# Association rule mining (reuse the parsed genre lists so missing genres become empty transactions)
transactions = df['genre_list'].tolist()
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = fpgrowth(df_encoded, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
high_confidence_rules = rules[(rules['confidence'] > 0.25) & (rules['support'] > 0.001)]

# Data preprocessing for multi-label classification
# The dataset's `description` column is used as-is as the model input text.
df['Output-Label'] = df['genre_list']
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Output-Label'])

# Display results
display(df.head())
display(high_confidence_rules)
print("Descriptions:")
display(df['description'].head())
print("\nBinary Labels (y):")
display(y[:5])

Mounted at /content/drive
Samples per genre:
Fantasy         20868
Sci-Fi          17567
Comedy          46119
Drama           96184
Romance         39076
Thriller        50464
Animation       13018
Adventure       41307
Family          17086
Biography       9489
Action          67419
Horror          41873
Mystery         25737
Musical         3288
Music           3653
History         9576
Crime           48370
War             7359
Film-Noir       2066
Western         1632
Sport           3518
Game-Show       6
Reality-TV      22
Adult           4
News            16
Talk-Show       4
Short           1

Minority genres (< 200 samples): ['Adult', 'Game-Show', 'News', 'Reality-TV', 'Short', 'Talk-Show']

Original dataset size: 238256
Minority rows kept: 51
Non-minority rows sampled: 23820
Total train set size: 23871


Unnamed: 0,movie title - year,genre,expanded-genres,rating,description,genre_list,contains_minority,Output-Label
0,Outrageous! - 1998,Family,"Family, Game-Show",,Two teams try to entice unsuspecting people to...,"[Family, Game-Show]",True,"[Family, Game-Show]"
1,Vancouver Remembers - 2018,War,"History, Reality-TV, War",,The coverage of the day's Remembrance Day cere...,"[History, Reality-TV, War]",True,"[History, Reality-TV, War]"
2,Cocaine Wars - 1985,Action,"Action, Adult, Drama",4.3,A DEA undercover agent who works for the bigge...,"[Action, Adult, Drama]",True,"[Action, Adult, Drama]"
3,Affected - 2010,Thriller,"Drama, News, Thriller",,Natural resources are running out and a myster...,"[Drama, News, Thriller]",True,"[Drama, News, Thriller]"
4,"Oh, Youth! - 1995",Romance,"Comedy, Drama, Reality-TV",5.4,A North Korean romantic comedy that involves m...,"[Comedy, Drama, Reality-TV]",True,"[Comedy, Drama, Reality-TV]"


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Family),(Adventure),0.070546,0.173432,0.020653,0.292755,1.68801,1.0,0.008418,1.168716,0.438523,0.092478,0.14436,0.205919
3,(Family),(Comedy),0.070546,0.196556,0.020904,0.296318,1.507548,1.0,0.007038,1.141771,0.362225,0.084907,0.124168,0.201335
5,(History),(Drama),0.037577,0.400276,0.025386,0.675585,1.687797,1.0,0.010345,1.848632,0.423422,0.061548,0.45906,0.369504
6,(War),(Drama),0.028612,0.400276,0.018558,0.648609,1.620403,1.0,0.007105,1.706713,0.394147,0.045227,0.414078,0.347486
9,"(Comedy, Crime)",(Action),0.033053,0.283314,0.010054,0.304183,1.073657,1.0,0.00069,1.029991,0.070949,0.032823,0.029117,0.169835
12,(Thriller),(Horror),0.211638,0.17825,0.056135,0.265241,1.488033,1.0,0.018411,1.118395,0.416017,0.168194,0.105862,0.290083
13,(Horror),(Thriller),0.17825,0.211638,0.056135,0.314924,1.488033,1.0,0.018411,1.150766,0.399114,0.168194,0.131013,0.290083
18,(Sci-Fi),(Action),0.074777,0.283314,0.021574,0.288515,1.018357,1.0,0.000389,1.00731,0.019483,0.064111,0.007257,0.182333
20,(Drama),(Crime),0.400276,0.207574,0.10762,0.268864,1.29527,1.0,0.024533,1.083829,0.380109,0.215141,0.077345,0.393665
21,(Crime),(Drama),0.207574,0.400276,0.10762,0.518466,1.29527,1.0,0.024533,1.245444,0.287674,0.215141,0.197074,0.393665


Descriptions:


Unnamed: 0,description
0,Two teams try to entice unsuspecting people to...
1,The coverage of the day's Remembrance Day cere...
2,A DEA undercover agent who works for the bigge...
3,Natural resources are running out and a myster...
4,A North Korean romantic comedy that involves m...



Binary Labels (y):


array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 0, 0, 1, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 0, 0, 0, 0]])

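### 1.1. Tokenization and Train+Val / Test Split

This cell tokenizes the movie descriptions, splits the data into training, validation, and test sets, and computes class weights for the loss function.

- **Device Configuration:** Sets the device to "cuda" if a GPU is available, otherwise "cpu".
- **Tokenizer Loading:** Loads the "distilbert-base-uncased" tokenizer from the Hugging Face library. The DistilBERT model itself is loaded in the training cells.
- **Tokenization:** Descriptions are padded or truncated to 128 tokens.
- **Splitting:** 20% of the data is held out for testing; 12.5% of the remainder (10% of the full dataset) is used for validation, leaving 70% for training.
- **Class Weights:** A per-genre `pos_weight` (negatives divided by positives) is computed on the training labels and clipped to the range [0.1, 10].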
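
The code cell for this section also computes a clamped `pos_weight` and performs a nested split. The sketch below reproduces both calculations in plain Python on a toy label matrix; the function name `clamped_pos_weights` and the toy data are assumptions made for illustration, while the bounds, epsilon, and split fractions are the ones used in the cell.

```python
def clamped_pos_weights(label_rows, low=0.1, high=10.0, epsilon=1e-5):
    """Per-label negatives/positives ratio, clipped to [low, high]."""
    n_rows = len(label_rows)
    n_labels = len(label_rows[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_rows)
        negatives = n_rows - positives
        raw = negatives / (positives + epsilon)  # epsilon avoids division by zero
        weights.append(min(max(raw, low), high))
    return weights

# Hypothetical toy label matrix: 4 samples, 3 labels.
y_toy = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
weights = clamped_pos_weights(y_toy)
# label 0: 0 negatives / 4 positives -> 0.0, raised to the lower bound 0.1
# label 1: 3 negatives / 1 positive  -> about 3.0, kept as is
# label 2: 4 negatives / 0 positives -> very large, capped at the upper bound 10.0

# Share of the full dataset that ends up in each split.
test_share = 0.2
val_share = (1 - test_share) * 0.125       # 12.5% of the remaining 80%
train_share = 1 - test_share - val_share
```

The clipping matters for the rarest genres: without the upper bound, a label with only a handful of positives would receive a weight in the thousands and could dominate the loss.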

In [6]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from sklearn.model_selection import train_test_split
import numpy as np

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenization
X_tok = tokenizer(df['description'].tolist(), padding='max_length', truncation=True,
                  max_length=128, return_tensors='pt', return_attention_mask=True)

input_ids, attention_mask = X_tok['input_ids'], X_tok['attention_mask']

# Split train+val/test
X_train_val_ids, X_test_ids, y_train_val, y_test, X_train_val_mask, X_test_mask = train_test_split(
    input_ids, y, attention_mask, test_size=0.2, random_state=42
)

# Further split train+val into train and val (0.125 of the remaining 80% = 10% of the full dataset)
X_train_ids, X_val_ids, y_train, y_val, X_train_mask, X_val_mask = train_test_split(
    X_train_val_ids, y_train_val, X_train_val_mask, test_size=0.125, random_state=42
)

# Calculate and clamp pos_weight
positive_counts = np.sum(y_train, axis=0)
total_counts = y_train.shape[0]
negative_counts = total_counts - positive_counts
epsilon = 1e-5
pos_weights_np = negative_counts / (positive_counts + epsilon)
pos_weights_np = np.clip(pos_weights_np, 0.1, 10.0)
pos_weights = torch.tensor(pos_weights_np, dtype=torch.float32).to(device)

print("initial setup completed...")

initial setup completed...


### 2. Baseline Model Training

This cell defines and trains a baseline multi-label classification model on top of a pre-trained DistilBERT encoder.

- **Model Definition:**
    - A `BaselineMovieClassifier` class is defined, which wraps the DistilBERT model and adds dropout and a linear classifier layer on the first-token (CLS position) embedding.
    - The model takes tokenized input and produces logits for each genre.
- **Training Setup:**
    - The model, loss function (`BCEWithLogitsLoss` with the clamped `pos_weight`), and optimizer (Adam, learning rate 3e-5) are initialized, together with a linear warmup schedule and gradient clipping.
- **Data Preparation:**
    - The tokenized tensors and splits from the previous cell are reused.
    - DataLoaders are created for the training data (shuffled) and the validation data.
- **Training Loop:**
    - The model is trained for up to 10 epochs, with early stopping once validation micro-F1 has not improved for 4 consecutive epochs.
    - In each epoch, the model processes batches of data, calculates the loss, and updates its weights. The encoder is fine-tuned as well, since it is not frozen.
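
The early-stopping rule used in the training loop can be replayed on its own. The sketch below is a simplified stand-alone version (the function `run_early_stopping` is a hypothetical helper, not notebook code) fed with the validation micro-F1 values from this section's training log. It also illustrates one detail worth keeping in mind when storing the best model: in PyTorch, `state_dict()` returns references to the live parameter tensors, so a snapshot needs to be copied if it should survive later updates. A plain dictionary stands in for the model here.

```python
import copy

def run_early_stopping(val_scores, patience):
    """Replay per-epoch validation scores through the early-stopping rule.

    Returns (best_score, best_epoch, stop_epoch), with epochs counted from 1.
    """
    best_score, best_epoch = 0.0, None
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_score, best_epoch, epoch
    return best_score, best_epoch, len(val_scores)

# Validation micro-F1 values copied from the baseline training log.
val_f1_log = [0.4547, 0.4450, 0.4583, 0.4765, 0.4712, 0.4559, 0.4726, 0.4718]
best_score, best_epoch, stop_epoch = run_early_stopping(val_f1_log, patience=4)

# Why a snapshot needs a copy: a plain reference keeps following later updates.
weights = {"w": [1.0, 2.0]}        # stand-in for a model's parameters
by_reference = weights             # analogous to keeping the live state
snapshot = copy.deepcopy(weights)  # independent copy of the values
weights["w"][0] = 99.0             # stand-in for a later optimizer step
```

After the simulated update, `by_reference` reflects the new value while `snapshot` still holds the old one, which is the behavior wanted from a "best model" checkpoint.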

In [None]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import torch.optim as optim
import numpy as np

transformer = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

# Classifier model
class BaselineMovieClassifier(nn.Module):
    def __init__(self, transformer_model, num_labels, dropout=0.3):
        super(BaselineMovieClassifier, self).__init__()
        self.transformer = transformer_model  # transformer parameters are also updated unless explicitly frozen
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(transformer_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
        x = self.dropout(embeddings)
        logits = self.classifier(x)
        return logits

# Prepare data and labels (assumes mlb and df already defined)
num_genres = len(mlb.classes_)
baseline_model = BaselineMovieClassifier(transformer, num_genres).to(device)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weights)

# Hyperparams
epochs = 10
batch_size = 32
optimizer = optim.Adam(baseline_model.parameters(), lr=3e-5, weight_decay=0.01)
total_steps = (len(X_train_ids) // batch_size + 1) * epochs
warmup_steps = int(0.1 * total_steps)
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)
max_grad_norm = 1.0

# DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device),
    X_train_mask.to(device),
    torch.tensor(y_train, dtype=torch.float32).to(device)
)
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device),
    X_val_mask.to(device),
    torch.tensor(y_val, dtype=torch.float32).to(device)
)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

def evaluate(model, loader):
    model.eval()
    losses = []
    preds = []
    targets = []
    with torch.no_grad():
        for batch_input_ids, batch_attention_mask, batch_y_true in loader:
            logits = model(batch_input_ids, attention_mask=batch_attention_mask)
            loss = criterion(logits, batch_y_true)
            losses.append(loss.item())

            y_pred = torch.sigmoid(logits).cpu().numpy()
            preds.append(y_pred)
            targets.append(batch_y_true.cpu().numpy())

    avg_loss = np.mean(losses)
    preds = np.vstack(preds)
    targets = np.vstack(targets)
    # Binarize preds with 0.5 threshold for metric
    preds_binary = (preds > 0.5).astype(int)

    f1 = f1_score(targets, preds_binary, average='micro', zero_division=0)
    return avg_loss, f1

# early-stopping
best_val_f1 = 0.0
patience = 4  # Number of epochs to wait before stopping
epochs_without_improvement = 0
best_model_state = None  # To store best model

# Training loop with validation
for epoch in range(epochs):
    baseline_model.train()
    total_loss = 0
    for batch_input_ids, batch_attention_mask, batch_y_true in train_loader:
        optimizer.zero_grad()

        logits = baseline_model(batch_input_ids, attention_mask=batch_attention_mask)
        loss = criterion(logits, batch_y_true)

        loss.backward()
        torch.nn.utils.clip_grad_norm_(baseline_model.parameters(), max_grad_norm)

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

    train_loss = total_loss / len(train_loader)
    val_loss, val_f1 = evaluate(baseline_model, val_loader)
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Micro F1: {val_f1:.4f}")

    # --- Early Stopping Logic ---
    if val_f1 > best_val_f1:
        best_val_f1 = val_f1
        epochs_without_improvement = 0
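        # Clone the tensors: state_dict() returns references that later training steps would overwrite
        best_model_state = {k: v.detach().clone() for k, v in baseline_model.state_dict().items()}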
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"\nEarly stopping triggered. Best Val F1: {best_val_f1:.4f}")
            break

if best_model_state:
    baseline_model.load_state_dict(best_model_state)


Epoch 1/10 | Train Loss: 0.7323 | Val Loss: 0.5842 | Val Micro F1: 0.4547
Epoch 2/10 | Train Loss: 0.5539 | Val Loss: 0.5388 | Val Micro F1: 0.4450
Epoch 3/10 | Train Loss: 0.5388 | Val Loss: 0.5357 | Val Micro F1: 0.4583
Epoch 4/10 | Train Loss: 0.5348 | Val Loss: 0.5384 | Val Micro F1: 0.4765
Epoch 5/10 | Train Loss: 0.5312 | Val Loss: 0.5350 | Val Micro F1: 0.4712
Epoch 6/10 | Train Loss: 0.5276 | Val Loss: 0.5402 | Val Micro F1: 0.4559
Epoch 7/10 | Train Loss: 0.5214 | Val Loss: 0.5360 | Val Micro F1: 0.4726
Epoch 8/10 | Train Loss: 0.5162 | Val Loss: 0.5384 | Val Micro F1: 0.4718

Early stopping triggered. Best Val F1: 0.4765


### 3. Baseline Model Evaluation

This cell evaluates the performance of the trained baseline model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `baseline_model.eval()`.
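- **Prediction:** The model makes predictions on the test data. Sigmoid outputs above 0.6 count as positive here, whereas validation during training used a threshold of 0.5.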
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.
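
Because the test cell binarizes probabilities at 0.6 while the validation loop uses 0.5, it helps to see how the threshold alone changes a micro-averaged F1 score. The following sketch uses made-up probabilities and two hypothetical helpers, `binarize` and `micro_f1`; it mirrors the idea behind `f1_score(..., average='micro')` rather than reproducing scikit-learn's implementation.

```python
def binarize(probabilities, threshold):
    """Turn a matrix of probabilities into 0/1 predictions."""
    return [[int(p > threshold) for p in row] for row in probabilities]

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Hypothetical probabilities for 3 samples and 3 labels.
probs = [
    [0.90, 0.55, 0.10],
    [0.58, 0.30, 0.70],
    [0.20, 0.65, 0.52],
]
y_true = [
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

f1_at_05 = micro_f1(y_true, binarize(probs, 0.5))
f1_at_06 = micro_f1(y_true, binarize(probs, 0.6))
```

In this toy case the higher threshold removes two false positives at the cost of one missed label, so the two scores differ even though the underlying probabilities are identical.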

In [None]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset

# Create test dataset with attention mask
test_dataset = TensorDataset(
    X_test_ids.to(device),
    X_test_mask.to(device),
    torch.tensor(y_test, dtype=torch.float32).to(device)
)

test_loader = DataLoader(test_dataset, batch_size=32)

# Evaluation
baseline_model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for batch_input_ids, batch_attention_mask, batch_y_true in test_loader:
        # Pass attention mask to model
        logits = baseline_model(batch_input_ids, attention_mask=batch_attention_mask)
        probs = torch.sigmoid(logits)
        preds = (probs > 0.6).cpu().numpy()

        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

# Concatenate predictions and labels
import numpy as np
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Generate classification report
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))
print("Avg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())


              precision    recall  f1-score   support

      Action       0.57      0.51      0.54      1373
       Adult       0.00      0.00      0.00         0
   Adventure       0.47      0.50      0.48       858
   Animation       0.34      0.52      0.41       276
   Biography       0.26      0.56      0.35       179
      Comedy       0.42      0.26      0.32       895
       Crime       0.48      0.70      0.57       981
       Drama       0.59      0.48      0.53      1894
      Family       0.30      0.42      0.35       334
     Fantasy       0.25      0.51      0.33       423
   Film-Noir       0.00      0.00      0.00        42
   Game-Show       0.00      0.00      0.00         0
     History       0.31      0.48      0.38       208
      Horror       0.49      0.75      0.59       864
       Music       0.20      0.01      0.02        78
     Musical       0.00      0.00      0.00        73
     Mystery       0.22      0.67      0.33       523
        News       0.00    

### 4. Baseline Model Prediction on Evaluation Set

This cell uses the trained baseline model to make predictions on a separate evaluation dataset.

- **Load Evaluation Data:** Loads the evaluation dataset from a CSV file.
- **Preprocess Evaluation Data:** The descriptions from the evaluation data are tokenized.
- **Make Predictions:** The model predicts genres for the evaluation data.
- **Store Predictions:** The predicted genres are added as a new column to the evaluation DataFrame.
- **Classification Report:** A classification report is generated to evaluate the model's performance on this new data.
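
As a rough illustration of the "Store Predictions" step, the sketch below shows what mapping binary rows back to genre names amounts to. The helper `labels_from_rows` and the four-class list are assumptions for the example; in the notebook this conversion is done by `mlb.inverse_transform`.

```python
def labels_from_rows(binary_rows, classes):
    """Map each 0/1 row to a tuple of the class names that are switched on."""
    return [
        tuple(name for name, flag in zip(classes, row) if flag)
        for row in binary_rows
    ]

# Hypothetical subset of classes, in sorted order.
classes = ["Action", "Comedy", "Drama", "Horror"]
binary_predictions = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],   # no probability cleared the threshold
    [0, 1, 0, 1],
]
decoded = labels_from_rows(binary_predictions, classes)
```

A row of all zeros decodes to an empty tuple, so some movies may end up with no predicted genre at all when every probability falls below the threshold.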

In [None]:
# Load the evaluation data
eval_df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/test.csv')

# Preprocess the evaluation data
eval_descriptions = eval_df['description'].tolist()
eval_X = tokenizer(
    text=eval_descriptions,
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

# Keep the tensors on the CPU here; each batch is moved to the device inside the prediction loop
eval_input_ids = eval_X['input_ids']
eval_attention_mask = eval_X['attention_mask']

# Create dataset and loader
eval_dataset = TensorDataset(eval_input_ids, eval_attention_mask)
eval_loader = DataLoader(eval_dataset, batch_size=32)  # use smaller batch_size if needed

# Predict in batches
baseline_model.eval()
all_preds = []

with torch.no_grad():
    for batch_ids, batch_mask in eval_loader:
        batch_ids = batch_ids.to(device)
        batch_mask = batch_mask.to(device)

        logits = baseline_model(batch_ids, attention_mask=batch_mask)
        probs = torch.sigmoid(logits)
        batch_preds = (probs > 0.5).cpu().numpy()
        all_preds.append(batch_preds)

# Final predictions
import numpy as np
predicted_labels_binary = np.vstack(all_preds)

# Convert binary predictions to genre labels
predicted_labels = mlb.inverse_transform(predicted_labels_binary)

# Attach predictions to dataframe
eval_df['predicted_genres_baseline'] = predicted_labels

# Get true labels from CSV
y_true_eval = mlb.transform(eval_df['expanded-genres'].str.split(', '))

# Print classification report
print("Classification Report for baseline model on the evaluation set:")
print(classification_report(y_true_eval, predicted_labels_binary, target_names=mlb.classes_, zero_division=0))

# Optional: View predictions
display(eval_df.head())

Classification Report for baseline model on the evaluation set:
              precision    recall  f1-score   support

      Action       0.51      0.71      0.59      8550
       Adult       0.00      0.00      0.00         2
   Adventure       0.40      0.62      0.48      5112
   Animation       0.28      0.57      0.37      1591
   Biography       0.21      0.67      0.32      1117
      Comedy       0.38      0.48      0.42      5738
       Crime       0.39      0.81      0.53      6069
       Drama       0.53      0.73      0.61     11998
      Family       0.27      0.58      0.36      2102
     Fantasy       0.21      0.63      0.32      2603
   Film-Noir       0.00      0.00      0.00       237
   Game-Show       0.00      0.00      0.00         1
     History       0.29      0.63      0.39      1255
      Horror       0.40      0.83      0.54      5159
       Music       0.26      0.05      0.08       452
     Musical       0.15      0.02      0.03       407
     Mystery     

Unnamed: 0,movie title - year,genre,expanded-genres,rating,description,predicted_genres_baseline
0,Son of the Wolf - nan,Adventure,Adventure,,"Set in 1800'2 Yukon, The Malamute Kid takes on...","(Action, Adventure, Crime)"
1,Firstborn - 2003,Action,"Action, Adventure, Fantasy",6.1,Sorcerers fight against themselves for ultimat...,"(Action, Adventure, Animation, Fantasy, Sci-Fi)"
2,13 Cameras - 2015,Thriller,"Crime, Drama, Horror",5.2,"A newlywed couple, move into a new house acros...","(Comedy, Horror, Mystery, Romance, Thriller)"
3,"Straight Up, Now Tell Me... - nan",Romance,Romance,,When a gay man brings his fiancee home to meet...,"(Comedy, Crime, Drama, Mystery, Romance, Thril..."
4,The Ugly Duckling - 1959,Crime,"Comedy, Crime, Sci-Fi",5.5,"Henry Jeckle was always the outsider, a bungli...","(Drama, Romance)"


### 5. LTN Model Definition

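This cell defines and trains the LTN-enhanced movie classifier.

- **Rule Extraction:** The antecedent and consequent genres of each high-confidence association rule are converted into pairs of label indices, which serve as implication axioms. Rules with several antecedent genres are split into one pair per genre.
- **Model Definition:**
    - An `LTNMultiLabelClassifier` class is defined, which, like the baseline, uses a DistilBERT model for embeddings followed by dropout and a single linear layer.
    - The difference lies in the loss: the weighted BCE loss (weight 0.95) is combined with a fuzzy-logic term (weight 0.05) built from Łukasiewicz operators in `ltn.fuzzy_ops`. The logic term measures how well predictions agree with the ground truth and how well they satisfy the implication axioms.
- **Model Instantiation and Training:** The LTN model is instantiated and trained with the same optimizer, schedule, and early-stopping settings as the baseline.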
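
The training code for this section relies on Łukasiewicz fuzzy operators and a p-mean aggregator from `ltn.fuzzy_ops`. The plain-Python sketch below writes out the textbook formulas for scalar inputs so the numbers can be checked by hand. It is an approximation for intuition only: the library versions operate on tensors and may apply numerical stabilisation, and all input values here are invented.

```python
def and_luk(a, b):
    """Lukasiewicz t-norm (fuzzy AND)."""
    return max(0.0, a + b - 1.0)

def implies_luk(a, b):
    """Lukasiewicz implication: fully true whenever b >= a."""
    return min(1.0, 1.0 - a + b)

def p_mean(values, p=2):
    """Generalized mean, used as a soft aggregate over truth values."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

# Hypothetical predicted probabilities for one movie.
p_horror, p_thriller = 0.9, 0.4

rule_truth = implies_luk(p_horror, p_thriller)   # Horror -> Thriller, partly violated
satisfied = implies_luk(0.3, 0.8)                # consequent >= antecedent, fully true
batch_score = p_mean([rule_truth, satisfied])    # aggregate over the two rule values

# Combine a data-fit score and a rule score with the 0.95 / 0.05 weighting.
sat_gt, sat_axiom = 0.8, 0.95                    # illustrative satisfaction levels
logic_loss = 1.0 - and_luk(sat_gt, sat_axiom)
bce_loss = 0.53                                  # illustrative classification loss
total_loss = 0.95 * bce_loss + 0.05 * logic_loss
```

With these formulas an implication is penalised only when the antecedent score exceeds the consequent score, so the logic term nudges the consequent genre upward (or the antecedent downward) rather than forcing both to be high.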

In [None]:
import ltn  # needed here because this cell runs before the main import cell
print(dir(ltn.fuzzy_ops))

['AggregMean', 'AggregMin', 'AggregPMean', 'AggregPMeanError', 'AggregationOperator', 'AndLuk', 'AndMin', 'AndProd', 'BinaryConnectiveOperator', 'ConnectiveOperator', 'Equiv', 'ImpliesGodel', 'ImpliesGoguen', 'ImpliesKleeneDienes', 'ImpliesLuk', 'ImpliesReichenbach', 'LTNObject', 'NotGodel', 'NotStandard', 'OrLuk', 'OrMax', 'OrProbSum', 'SatAgg', 'UnaryConnectiveOperator', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'check_mask', 'check_values', 'eps', 'pi_0', 'pi_1', 'torch']


In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import ltn  # ltntorch for fuzzy logic
from ltn.fuzzy_ops import Equiv, AndLuk, ImpliesLuk, AggregPMean

# Label setup
ALL_LABELS = list(mlb.classes_)
NUM_LABELS = len(ALL_LABELS)
LABEL_TO_IDX = {label: i for i, label in enumerate(ALL_LABELS)}

# Fuzzy logic operators
and_op = AndLuk()
imp_op = ImpliesLuk()
equiv_op = Equiv(and_op=and_op, implies_op=imp_op)
aggregator = AggregPMean(p=2)

# Build implication rules
implication_pairs = []
for _, row in high_confidence_rules.iterrows():
    for a in list(row['antecedents']):
        for c in list(row['consequents']):
            if a in LABEL_TO_IDX and c in LABEL_TO_IDX:
                implication_pairs.append((LABEL_TO_IDX[a], LABEL_TO_IDX[c]))
implication_pairs = list(set(implication_pairs))
print(f"Loaded {len(implication_pairs)} implication rules from assoc rules.")

# LTN model definition
class LTNMultiLabelClassifier(nn.Module):
    def __init__(self, transformer_model, num_labels, implication_pairs, pos_weights=None):
        super().__init__()
        self.transformer = transformer_model  # transformer parameters are also updated unless explicitly frozen
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(transformer_model.config.hidden_size, num_labels)
        self.implication_pairs = implication_pairs
        self.pos_weights = pos_weights
        self.loss_fn = nn.BCEWithLogitsLoss(pos_weight=self.pos_weights)

    def forward(self, input_ids, attention_mask):
        outputs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = self.dropout(outputs.last_hidden_state[:, 0, :])
        logits = self.fc(embeddings)
        return logits  # raw logits, no sigmoid

    def compute_loss(self, logits, true_labels):
        pred_probs = torch.sigmoid(logits)
        bce_loss = self.loss_fn(logits, true_labels)

        equiv_values = equiv_op(pred_probs, true_labels)
        sat_gt = aggregator(aggregator(equiv_values))

        axiom_values = [imp_op(pred_probs[:, a], pred_probs[:, c]) for a, c in self.implication_pairs]
        if axiom_values:
            sat_axiom = aggregator(aggregator(torch.stack(axiom_values, dim=1)))
        else:
            sat_axiom = torch.tensor(1.0, device=logits.device)

        logic_loss = 1 - and_op(sat_gt, sat_axiom)
        total_loss = 0.95 * bce_loss + 0.05 * logic_loss
        return total_loss, sat_gt.item(), sat_axiom.item()

transformer = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

# DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device), X_train_mask.to(device), torch.tensor(y_train, dtype=torch.float32).to(device))
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device), X_val_mask.to(device), torch.tensor(y_val, dtype=torch.float32).to(device))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=32)

model = LTNMultiLabelClassifier(transformer, NUM_LABELS, implication_pairs, pos_weights=pos_weights).to(device)

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.01)
total_steps = len(train_loader) * 10
scheduler = get_linear_schedule_with_warmup(optimizer, int(0.1 * total_steps), total_steps)

# Training with early stopping
best_val_f1 = 0.0
patience = 4
patience_counter = 0
best_model_state = None

for epoch in range(10):
    model.train()
    total_loss = 0
    for batch_ids, batch_mask, batch_labels in train_loader:
        optimizer.zero_grad()
        logits = model(batch_ids, batch_mask)
        loss, sat_gt, sat_axiom = model.compute_loss(logits, batch_labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        total_loss += loss.item()

    avg_train_loss = total_loss / len(train_loader)

    # Validation
    model.eval()
    val_preds, val_labels = [], []
    val_loss = 0
    with torch.no_grad():
        for batch_ids, batch_mask, batch_labels in val_loader:
            logits = model(batch_ids, batch_mask)
            loss, _, _ = model.compute_loss(logits, batch_labels)
            val_loss += loss.item()
            probs = torch.sigmoid(logits)  # apply sigmoid at eval time
            val_preds.append(probs.cpu().numpy())
            val_labels.append(batch_labels.cpu().numpy())

    val_loss /= len(val_loader)
    y_pred = np.vstack(val_preds)
    y_true = np.vstack(val_labels)
    y_pred_binary = (y_pred > 0.5).astype(int)
    val_f1_micro = f1_score(y_true, y_pred_binary, average='micro', zero_division=0)

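    # Note: GT Sat and Axiom Sat are taken from the last training batch of the epoch, not from validation.
    print(f"Epoch {epoch+1}/10 - Train Loss: {avg_train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Micro F1: {val_f1_micro:.4f} | GT Sat: {sat_gt:.4f} | Axiom Sat: {sat_axiom:.4f}")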

    if val_f1_micro > best_val_f1:
        best_val_f1 = val_f1_micro
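        # Clone the tensors: state_dict() returns references that later training steps would overwrite
        best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}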
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch+1}")
            break

if best_model_state:
    model.load_state_dict(best_model_state)


Loaded 30 implication rules from assoc rules.


model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Epoch 1/10 - Train Loss: 0.7118 | Val Loss: 0.5607 | Val Micro F1: 0.4340 | GT Sat: 0.7707 | Axiom Sat: 0.9376
Epoch 2/10 - Train Loss: 0.5395 | Val Loss: 0.5302 | Val Micro F1: 0.4813 | GT Sat: 0.8029 | Axiom Sat: 0.9405
Epoch 3/10 - Train Loss: 0.5234 | Val Loss: 0.5277 | Val Micro F1: 0.4502 | GT Sat: 0.7861 | Axiom Sat: 0.9324
Epoch 4/10 - Train Loss: 0.5228 | Val Loss: 0.5280 | Val Micro F1: 0.4797 | GT Sat: 0.8231 | Axiom Sat: 0.9550
Epoch 5/10 - Train Loss: 0.5216 | Val Loss: 0.5436 | Val Micro F1: 0.4706 | GT Sat: 0.8208 | Axiom Sat: 0.9472
Epoch 6/10 - Train Loss: 0.5171 | Val Loss: 0.5390 | Val Micro F1: 0.4397 | GT Sat: 0.7864 | Axiom Sat: 0.9477
Early stopping at epoch 6


### 6. LTN Model Evaluation

This cell evaluates the performance of the trained LTN model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `model.eval()`.
- **Prediction:** The model makes predictions on the test data.
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.

In [8]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset
from ltn.fuzzy_ops import ImpliesLuk, AggregPMean
import torch
import numpy as np

# Prepare axiom operators
imp_op = ImpliesLuk()
aggregator = AggregPMean(p=2)

# Test DataLoader with attention_mask
test_dataset = TensorDataset(
    X_test_ids.to(device),
    X_test_mask.to(device),
    torch.tensor(y_test, dtype=torch.float32).to(device)
)
test_loader = DataLoader(test_dataset, batch_size=32)

# Switch to eval mode
model.eval()
all_preds = []
all_labels = []
all_axioms = []

with torch.no_grad():
    for batch_input_ids, batch_attention_mask, batch_y_true in test_loader:
        # Forward pass
        logits = model(input_ids=batch_input_ids, attention_mask=batch_attention_mask)
        probs = torch.sigmoid(logits)

        # Binary predictions
        preds = (probs > 0.5).cpu().numpy()
        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

        # Axiom satisfaction
        if hasattr(model, "implication_pairs"):
            axiom_vals = []
            for a_idx, c_idx in model.implication_pairs:
                premise = probs[:, a_idx]
                conclusion = probs[:, c_idx]
                val = imp_op(premise, conclusion)
                axiom_vals.append(val)
            if axiom_vals:
                stacked_axioms = torch.stack(axiom_vals, dim=1)
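                # Called without a `dim` argument, the aggregator appears to reduce over the whole
                # batch, so this is one satisfaction score per batch rather than per example.
                sat_per_batch = aggregator(stacked_axioms)
                all_axioms.append(sat_per_batch.cpu().numpy())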

# Concatenate results
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Classification report
print("\nMulti-label classification report:")
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))

# Axiom satisfaction report
if all_axioms:
    axiom_scores = np.stack(all_axioms)
    print(f"\nAverage axiom satisfaction on test set: {axiom_scores.mean():.4f}")
    print(f"Min: {axiom_scores.min():.4f}, Max: {axiom_scores.max():.4f}")
else:
    print("\nNo implication rules found in model for axiom satisfaction.")

print("\nAvg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())



Multi-label classification report:
              precision    recall  f1-score   support

      Action       0.46      0.77      0.58      1373
       Adult       0.00      0.00      0.00         0
   Adventure       0.31      0.81      0.45       858
   Animation       0.26      0.72      0.38       276
   Biography       0.33      0.37      0.35       179
      Comedy       0.37      0.37      0.37       895
       Crime       0.44      0.75      0.56       981
       Drama       0.58      0.50      0.53      1894
      Family       0.26      0.50      0.34       334
     Fantasy       0.20      0.74      0.31       423
   Film-Noir       0.00      0.00      0.00        42
   Game-Show       0.00      0.00      0.00         0
     History       0.33      0.47      0.39       208
      Horror       0.33      0.92      0.49       864
       Music       0.25      0.01      0.02        78
     Musical       0.00      0.00      0.00        73
     Mystery       0.19      0.82      0.30  

### 7. Model Performance Comparison

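The LTN loss encourages the model to satisfy logical constraints, which in multi-label classification often boosts recall at the cost of precision. The test-set reports above show this pattern for several genres: for Horror, recall rises from 0.75 (baseline) to 0.92 (LTN) while precision drops from 0.49 to 0.33, and for Adventure, recall rises from 0.50 to 0.81 while precision drops from 0.47 to 0.31. Part of this gap may come from the decision threshold rather than the logic term, because the baseline test evaluation binarizes at 0.6 and the LTN evaluation at 0.5.

For example: given the mined rule Horror → Thriller, the model is pushed to raise its Thriller score whenever its Horror score is high, even when the description gives little evidence for Thriller. Hence more recall, less precision.

The next step is threshold calibration, because with LTN the "predict more" tendency overshoots.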
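
One possible way to approach threshold calibration is a per-genre sweep over candidate thresholds on the validation set, keeping the value with the best F1. The sketch below shows the idea for a single genre with invented data; the helpers `f1_for_label` and `best_threshold` and the candidate grid are assumptions, not something the notebook implements.

```python
def f1_for_label(y_true, y_prob, threshold):
    """Binary F1 for one label at a given probability threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p > threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p > threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p <= threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def best_threshold(y_true, y_prob, candidates):
    """Pick the candidate with the highest F1; ties keep the earliest one."""
    return max(candidates, key=lambda th: f1_for_label(y_true, y_prob, th))

# Hypothetical validation labels and probabilities for a single genre.
val_true = [1, 0, 1, 0, 0, 1]
val_prob = [0.92, 0.61, 0.74, 0.55, 0.30, 0.68]

candidates = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
chosen = best_threshold(val_true, val_prob, candidates)
f1_default = f1_for_label(val_true, val_prob, 0.5)
f1_chosen = f1_for_label(val_true, val_prob, chosen)
```

Thresholds should be chosen on the validation split and then applied unchanged to the test split, so that the reported test metrics are not tuned on the data they are measured on.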