### 1. Data Loading and Preprocessing

This cell handles the initial setup, including mounting Google Drive, loading the dataset, and performing essential preprocessing steps.

- **Drive Mount:** Mounts the Google Drive to access the dataset file.
- **Data Loading:** Loads the movie data from a CSV file into a pandas DataFrame.
- **Association Rule Mining:**
    - The `Output` column, containing comma-separated genres, is split into a list of genres for each movie.
    - `TransactionEncoder` converts this list into a one-hot encoded format suitable for association rule mining.
    - `fpgrowth` is used to find frequent itemsets of genres.
    - `association_rules` generates rules based on these itemsets, which are then filtered for high confidence and support.
- **Multi-Label Classification Preprocessing:**
    - The `description` for each movie is extracted from the `Input` column.
    - The `Output` column is converted into a list of genre labels.
    - `MultiLabelBinarizer` transforms these genre lists into a binary matrix format, which is the standard for multi-label classification tasks.

In [1]:
!pip install mlxtend
!pip install ltntorch

Collecting ltntorch
  Downloading LTNtorch-1.0.2-py3-none-any.whl.metadata (13 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch->ltntorch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch->ltntorch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch->ltntorch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3

In [2]:
from google.colab import drive
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules
from sklearn.preprocessing import MultiLabelBinarizer

drive.mount('/content/drive', force_remount=True)

# Load the data
df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/train.csv')

# Use only 25% of the dataset
df = df.sample(frac=0.10, random_state=42).reset_index(drop=True)

# Association rule mining
transactions = df['expanded-genres'].str.split(', ').tolist()
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = fpgrowth(df_encoded, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
high_confidence_rules = rules[(rules['confidence'] > 0.25) & (rules['support'] > 0.001)]

# Data preprocessing for multi-label classification
#df['description'] = df['Input'].apply(lambda x: x.split('\n\n', 1)[1] if '\n\n' in x else '')
df['Output-Label'] = df['expanded-genres'].str.split(', ')
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Output-Label'])

# Display results
display(df.head())
display(high_confidence_rules)
print("Descriptions:")
display(df['description'].head())
print("\nBinary Labels (y):")
display(y[:5])

Mounted at /content/drive


Unnamed: 0,movie title - year,genre,expanded-genres,rating,description,Output-Label
0,Mei shan shou qi guai - 1973,Fantasy,"Action, Adventure, Fantasy",5.4,Na Cha is sent to the land of the dead to figh...,"[Action, Adventure, Fantasy]"
1,Money Fight - 2012,Action,"Action, Drama",3.9,"This full-contact action drama, loaded with au...","[Action, Drama]"
2,Dui Prithibi - 2010,Romance,"Drama, Romance",6.4,"Rahul, the son of a very rich man who has lost...","[Drama, Romance]"
3,The Barbarians - 1987,Fantasy,"Action, Adventure, Fantasy",4.9,Two twin barbarians seek revenge from the warl...,"[Action, Adventure, Fantasy]"
4,Bridge of Birds - nan,Fantasy,"Action, Adventure, Fantasy",,When a farm boy's village is cursed by a myste...,"[Action, Adventure, Fantasy]"


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Adventure),(Action),0.174641,0.28662,0.084278,0.482576,1.683682,1.0,0.034222,1.378716,0.491984,0.223558,0.274687,0.388308
1,(Action),(Adventure),0.28662,0.174641,0.084278,0.29404,1.683682,1.0,0.034222,1.16913,0.56921,0.223558,0.144663,0.388308
4,"(Adventure, Comedy)",(Action),0.040418,0.28662,0.013179,0.326064,1.13762,1.0,0.001594,1.058529,0.126068,0.04199,0.055293,0.186022
5,"(Action, Comedy)",(Adventure),0.044237,0.174641,0.013179,0.297913,1.705856,1.0,0.005453,1.175579,0.432936,0.064069,0.149355,0.186688
8,"(Drama, Adventure)",(Action),0.041803,0.28662,0.014438,0.345382,1.205017,1.0,0.002456,1.089765,0.177559,0.045983,0.082371,0.197877
10,(Fantasy),(Adventure),0.085621,0.174641,0.023,0.268627,1.538168,1.0,0.008047,1.128507,0.382638,0.09694,0.113873,0.200163
17,(Romance),(Drama),0.160035,0.395786,0.091245,0.570155,1.440563,1.0,0.027905,1.405654,0.364095,0.196404,0.288588,0.400348
18,(Comedy),(Romance),0.191975,0.160035,0.049148,0.256012,1.599724,1.0,0.018425,1.129003,0.463961,0.162278,0.114263,0.28156
19,(Romance),(Comedy),0.160035,0.191975,0.049148,0.307107,1.599724,1.0,0.018425,1.166162,0.446319,0.162278,0.142486,0.28156
20,"(Drama, Comedy)",(Romance),0.050617,0.160035,0.016201,0.320066,1.999974,1.0,0.0081,1.235363,0.526651,0.083315,0.190521,0.210649


Descriptions:


Unnamed: 0,description
0,Na Cha is sent to the land of the dead to figh...
1,"This full-contact action drama, loaded with au..."
2,"Rahul, the son of a very rich man who has lost..."
3,Two twin barbarians seek revenge from the warl...
4,When a farm boy's village is cursed by a myste...



Binary Labels (y):


array([[1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0]])

### 2. Baseline Model Training

This cell defines and trains a baseline multi-label classification model using a pre-trained DistilBERT model.

- **Device Configuration:** Sets the device to "cuda" if a GPU is available, otherwise "cpu".
- **Tokenizer and Model Loading:** Loads the "distilbert-base-uncased" tokenizer and model from the Hugging Face library.
- **Model Definition:**
    - A `BaselineMovieClassifier` class is defined, which includes the DistilBERT model and a linear classifier layer.
    - The model takes tokenized input and produces logits for each genre.
- **Training Setup:**
    - The model, loss function (BCEWithLogitsLoss), and optimizer (Adam) are initialized.
- **Data Preparation:**
    - The movie descriptions are tokenized using the DistilBERT tokenizer.
    - The data is split into training and testing sets.
    - A DataLoader is created for the training data to handle batching and shuffling.
- **Training Loop:**
    - The model is trained for 10 epochs.
    - In each epoch, the model processes batches of data, calculates the loss, and updates its weights.

In [4]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import torch.optim as optim
import numpy as np

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
transformer_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

# Classifier model
class BaselineMovieClassifier(nn.Module):
    def __init__(self, transformer_model, num_labels, dropout=0.3):
        super(BaselineMovieClassifier, self).__init__()
        self.transformer = transformer_model
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(transformer_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
        x = self.dropout(embeddings)
        logits = self.classifier(x)
        return logits

# Prepare data and labels (assumes mlb and df already defined)
num_genres = len(mlb.classes_)
baseline_model = BaselineMovieClassifier(transformer_model, num_genres).to(device)

X = tokenizer(
    text=df['description'].tolist(),
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids=False,
    return_attention_mask=True,
    verbose=True
)
input_ids = X['input_ids']
attention_mask = X['attention_mask']

# Split train+val/test
X_train_val_ids, X_test_ids, y_train_val, y_test, X_train_val_mask, X_test_mask = train_test_split(
    input_ids, y, attention_mask, test_size=0.2, random_state=42
)

# Further split train into train and val (10% val)
X_train_ids, X_val_ids, y_train, y_val, X_train_mask, X_val_mask = train_test_split(
    X_train_val_ids, y_train_val, X_train_val_mask, test_size=0.125, random_state=42
)

# Calculate pos_weight on training labels
positive_counts = np.sum(y_train, axis=0)
total_counts = y_train.shape[0]
negative_counts = total_counts - positive_counts
epsilon = 1e-5
pos_weights = torch.tensor(negative_counts / (positive_counts + epsilon), dtype=torch.float32).to(device)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weights)

# Hyperparams
epochs = 10
batch_size = 32
optimizer = optim.Adam(baseline_model.parameters(), lr=3e-5, weight_decay=0.01)
total_steps = (len(X_train_ids) // batch_size + 1) * epochs
warmup_steps = int(0.1 * total_steps)
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)
max_grad_norm = 1.0

# DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device),
    X_train_mask.to(device),
    torch.tensor(y_train, dtype=torch.float32).to(device)
)
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device),
    X_val_mask.to(device),
    torch.tensor(y_val, dtype=torch.float32).to(device)
)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

def evaluate(model, loader):
    model.eval()
    losses = []
    preds = []
    targets = []
    with torch.no_grad():
        for batch_input_ids, batch_attention_mask, batch_y_true in loader:
            logits = model(batch_input_ids, attention_mask=batch_attention_mask)
            loss = criterion(logits, batch_y_true)
            losses.append(loss.item())

            y_pred = torch.sigmoid(logits).cpu().numpy()
            preds.append(y_pred)
            targets.append(batch_y_true.cpu().numpy())

    avg_loss = np.mean(losses)
    preds = np.vstack(preds)
    targets = np.vstack(targets)
    # Binarize preds with 0.5 threshold for metric
    preds_binary = (preds > 0.5).astype(int)

    f1 = f1_score(targets, preds_binary, average='micro', zero_division=0)
    return avg_loss, f1

# early-stopping
best_val_f1 = 0.0
patience = 3  # Number of epochs to wait before stopping
epochs_without_improvement = 0
best_model_state = None  # To store best model

# Training loop with validation
for epoch in range(epochs):
    baseline_model.train()
    total_loss = 0
    for batch_input_ids, batch_attention_mask, batch_y_true in train_loader:
        optimizer.zero_grad()

        logits = baseline_model(batch_input_ids, attention_mask=batch_attention_mask)
        loss = criterion(logits, batch_y_true)

        loss.backward()
        torch.nn.utils.clip_grad_norm_(baseline_model.parameters(), max_grad_norm)

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

    train_loss = total_loss / len(train_loader)
    val_loss, val_f1 = evaluate(baseline_model, val_loader)
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Micro F1: {val_f1:.4f}")

    # --- Early Stopping Logic ---
    if val_f1 > best_val_f1:
        best_val_f1 = val_f1
        epochs_without_improvement = 0
        best_model_state = baseline_model.state_dict()  # Save best model
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"\nEarly stopping triggered. Best Val F1: {best_val_f1:.4f}")
            break

if best_model_state:
    baseline_model.load_state_dict(best_model_state)


Using device: cuda
Epoch 1/10 | Train Loss: 1.2341 | Val Loss: 0.8683 | Val Micro F1: 0.3810
Epoch 2/10 | Train Loss: 1.1064 | Val Loss: 0.7957 | Val Micro F1: 0.3924
Epoch 3/10 | Train Loss: 1.0793 | Val Loss: 0.8321 | Val Micro F1: 0.3986
Epoch 4/10 | Train Loss: 1.0711 | Val Loss: 0.7957 | Val Micro F1: 0.3997
Epoch 5/10 | Train Loss: 1.0920 | Val Loss: 0.8161 | Val Micro F1: 0.4024
Epoch 6/10 | Train Loss: 1.0346 | Val Loss: 0.8071 | Val Micro F1: 0.3961
Epoch 7/10 | Train Loss: 0.9823 | Val Loss: 0.8181 | Val Micro F1: 0.3882
Epoch 8/10 | Train Loss: 0.9515 | Val Loss: 0.8388 | Val Micro F1: 0.4240
Epoch 9/10 | Train Loss: 0.9759 | Val Loss: 0.8206 | Val Micro F1: 0.4163
Epoch 10/10 | Train Loss: 0.9171 | Val Loss: 0.8242 | Val Micro F1: 0.4225


### 3. Baseline Model Evaluation

This cell evaluates the performance of the trained baseline model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `baseline_model.eval()`.
- **Prediction:** The model makes predictions on the test data.
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.

In [6]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset

# Create test dataset with attention mask
test_dataset = TensorDataset(
    X_test_ids.to(device),
    X_test_mask.to(device),
    torch.tensor(y_test, dtype=torch.float32).to(device)
)

test_loader = DataLoader(test_dataset, batch_size=32)

# Evaluation
baseline_model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for batch_input_ids, batch_attention_mask, batch_y_true in test_loader:
        # Pass attention mask to model
        logits = baseline_model(batch_input_ids, attention_mask=batch_attention_mask)
        probs = torch.sigmoid(logits)
        preds = (probs > 0.6).cpu().numpy()

        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

# Concatenate predictions and labels
import numpy as np
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Generate classification report
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))
print("Avg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())


              precision    recall  f1-score   support

      Action       0.55      0.56      0.56      1368
   Adventure       0.42      0.55      0.47       837
   Animation       0.22      0.68      0.34       262
   Biography       0.22      0.69      0.33       170
      Comedy       0.40      0.33      0.36       934
       Crime       0.50      0.64      0.56      1017
       Drama       0.60      0.33      0.42      1869
      Family       0.24      0.57      0.33       340
     Fantasy       0.23      0.57      0.32       437
   Film-Noir       0.05      0.56      0.09        32
     History       0.26      0.71      0.38       194
      Horror       0.45      0.72      0.56       877
       Music       0.11      0.47      0.18        88
     Musical       0.05      0.44      0.10        48
     Mystery       0.24      0.60      0.35       524
  Reality-TV       0.00      0.00      0.00         0
     Romance       0.44      0.54      0.48       756
      Sci-Fi       0.24    

### 4. Baseline Model Prediction on Evaluation Set

This cell uses the trained baseline model to make predictions on a separate evaluation dataset.

- **Load Evaluation Data:** Loads the evaluation dataset from a CSV file.
- **Preprocess Evaluation Data:** The descriptions from the evaluation data are tokenized.
- **Make Predictions:** The model predicts genres for the evaluation data.
- **Store Predictions:** The predicted genres are added as a new column to the evaluation DataFrame.
- **Classification Report:** A classification report is generated to evaluate the model's performance on this new data.

In [9]:
# Load the evaluation data
eval_df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/test.csv')

# Preprocess the evaluation data
eval_descriptions = eval_df['description'].tolist()
eval_X = tokenizer(
    text=eval_descriptions,
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

# Move data to device
eval_input_ids = eval_X['input_ids']
eval_attention_mask = eval_X['attention_mask']

# Create dataset and loader
eval_dataset = TensorDataset(eval_input_ids, eval_attention_mask)
eval_loader = DataLoader(eval_dataset, batch_size=32)  # use smaller batch_size if needed

# Predict in batches
baseline_model.eval()
all_preds = []

with torch.no_grad():
    for batch_ids, batch_mask in eval_loader:
        batch_ids = batch_ids.to(device)
        batch_mask = batch_mask.to(device)

        logits = baseline_model(batch_ids, attention_mask=batch_mask)
        probs = torch.sigmoid(logits)
        batch_preds = (probs > 0.5).cpu().numpy()
        all_preds.append(batch_preds)

# Final predictions
import numpy as np
predicted_labels_binary = np.vstack(all_preds)

# Convert binary predictions to genre labels
predicted_labels = mlb.inverse_transform(predicted_labels_binary)

# Attach predictions to dataframe
eval_df['predicted_genres_baseline'] = predicted_labels

# Get true labels from CSV
y_true_eval = mlb.transform(eval_df['expanded-genres'].str.split(', '))

# Print classification report
print("Classification Report for baseline model on the evaluation set:")
print(classification_report(y_true_eval, predicted_labels_binary, target_names=mlb.classes_, zero_division=0))

# Optional: View predictions
display(eval_df.head())

Classification Report for baseline model on the evaluation set:
              precision    recall  f1-score   support

      Action       0.46      0.78      0.58      8550
   Adventure       0.33      0.71      0.45      5112
   Animation       0.18      0.78      0.29      1591
   Biography       0.17      0.72      0.27      1117
      Comedy       0.32      0.57      0.41      5738
       Crime       0.40      0.77      0.53      6069
       Drama       0.55      0.59      0.57     11998
      Family       0.19      0.73      0.30      2102
     Fantasy       0.18      0.73      0.29      2603
   Film-Noir       0.05      0.67      0.09       237
     History       0.22      0.75      0.33      1255
      Horror       0.35      0.83      0.50      5159
       Music       0.07      0.56      0.12       452
     Musical       0.06      0.57      0.11       407
     Mystery       0.19      0.74      0.31      3233
  Reality-TV       0.00      0.00      0.00         4
     Romance     



Unnamed: 0,movie title - year,genre,expanded-genres,rating,description,predicted_genres_baseline
0,Son of the Wolf - nan,Adventure,Adventure,,"Set in 1800'2 Yukon, The Malamute Kid takes on...","(Action, Adventure, Animation, Biography, Hist..."
1,Firstborn - 2003,Action,"Action, Adventure, Fantasy",6.1,Sorcerers fight against themselves for ultimat...,"(Action, Adventure, Animation, Family, Fantasy..."
2,13 Cameras - 2015,Thriller,"Crime, Drama, Horror",5.2,"A newlywed couple, move into a new house acros...","(Comedy, Crime, Horror, Mystery, Romance, Thri..."
3,"Straight Up, Now Tell Me... - nan",Romance,Romance,,When a gay man brings his fiancee home to meet...,"(Comedy, Drama, Family, Musical, Romance)"
4,The Ugly Duckling - 1959,Crime,"Comedy, Crime, Sci-Fi",5.5,"Henry Jeckle was always the outsider, a bungli...","(Biography, Comedy, Drama, Family, Fantasy, Ro..."


### 6. LTN Model Definition

This cell defines the LTN-enhanced movie classifier.

- **Model Definition:**
    - An `LTNMovieClassifier` class is defined, which, like the baseline, uses a DistilBERT model for embeddings.
    - Instead of a single classifier, it uses a dictionary of `ltn.Predicate` modules, one for each genre. Each predicate is a small neural network that learns a truth value for a movie belonging to a genre.
- **Model Instantiation:** The LTN model is instantiated.

In [None]:
print(dir(ltn.fuzzy_ops))

['AggregMean', 'AggregMin', 'AggregPMean', 'AggregPMeanError', 'AggregationOperator', 'AndLuk', 'AndMin', 'AndProd', 'BinaryConnectiveOperator', 'ConnectiveOperator', 'Equiv', 'ImpliesGodel', 'ImpliesGoguen', 'ImpliesKleeneDienes', 'ImpliesLuk', 'ImpliesReichenbach', 'LTNObject', 'NotGodel', 'NotStandard', 'OrLuk', 'OrMax', 'OrProbSum', 'SatAgg', 'UnaryConnectiveOperator', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'check_mask', 'check_values', 'eps', 'pi_0', 'pi_1', 'torch']


In [18]:
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import ltn  # ltntorch for fuzzy logic
from ltn.fuzzy_ops import Equiv, AndLuk, ImpliesLuk, AggregPMean

# Label setup
ALL_LABELS = list(mlb.classes_)
NUM_LABELS = len(ALL_LABELS)
LABEL_TO_IDX = {label: i for i, label in enumerate(ALL_LABELS)}

# Fuzzy logic operators
and_op = AndLuk()
imp_op = ImpliesLuk()
equiv_op = Equiv(and_op=and_op, implies_op=imp_op)
aggregator = AggregPMean(p=2)

# Build implication rules
implication_pairs = []
for _, row in high_confidence_rules.iterrows():
    for a in list(row['antecedents']):
        for c in list(row['consequents']):
            if a in LABEL_TO_IDX and c in LABEL_TO_IDX:
                implication_pairs.append((LABEL_TO_IDX[a], LABEL_TO_IDX[c]))
implication_pairs = list(set(implication_pairs))
print(f"Loaded {len(implication_pairs)} implication rules from assoc rules.")

# Class imbalance weights
pos_counts = y_train.sum(axis=0)
neg_counts = y_train.shape[0] - pos_counts
epsilon = 1e-5
pos_weights = torch.tensor(neg_counts / (pos_counts + epsilon), dtype=torch.float32).to("cuda")

# LTN model definition
class LTNMultiLabelClassifier(nn.Module):
    def __init__(self, transformer_model, num_labels, implication_pairs, pos_weights=None):
        super().__init__()
        self.transformer = transformer_model
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(transformer_model.config.hidden_size, num_labels)
        self.sigmoid = nn.Sigmoid()
        self.implication_pairs = implication_pairs
        self.pos_weights = pos_weights

    def forward(self, input_ids, attention_mask):
        outputs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = self.dropout(outputs.last_hidden_state[:, 0, :])
        return self.sigmoid(self.fc(embeddings))

    def compute_loss(self, pred_truth, true_labels):
        eps = 1e-6
        pred_clamped = pred_truth.clamp(min=eps, max=1 - eps)

        if self.pos_weights is not None:
            weights = self.pos_weights.unsqueeze(0).expand_as(true_labels)
            bce_loss = -(weights * true_labels * torch.log(pred_clamped) +
                         (1 - true_labels) * torch.log(1 - pred_clamped)).mean()
        else:
            bce_loss = -(true_labels * torch.log(pred_clamped) + (1 - true_labels) * torch.log(1 - pred_clamped)).mean()

        equiv_values = equiv_op(pred_truth, true_labels)
        sat_gt = aggregator(aggregator(equiv_values))

        axiom_values = [imp_op(pred_truth[:, a], pred_truth[:, c]) for a, c in self.implication_pairs]
        if axiom_values:
            sat_axiom = aggregator(aggregator(torch.stack(axiom_values, dim=1)))
        else:
            sat_axiom = torch.tensor(1.0, device=pred_truth.device)

        logic_loss = 1 - and_op(sat_gt, sat_axiom)
        return 0.95 * bce_loss + 0.05 * logic_loss, sat_gt.item(), sat_axiom.item()

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
transformer = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

model = LTNMultiLabelClassifier(transformer, NUM_LABELS, implication_pairs, pos_weights=pos_weights).to(device)

# Tokenization
X_tok = tokenizer(df['description'].tolist(), padding='max_length', truncation=True,
                  max_length=128, return_tensors='pt', return_attention_mask=True)
input_ids, attention_mask = X_tok['input_ids'], X_tok['attention_mask']

# Train/val split
X_train_val_ids, X_test_ids, y_train_val, y_test, X_train_val_mask, X_test_mask = train_test_split(
    input_ids, y, attention_mask, test_size=0.2, random_state=42)
X_train_ids, X_val_ids, y_train, y_val, X_train_mask, X_val_mask = train_test_split(
    X_train_val_ids, y_train_val, X_train_val_mask, test_size=0.125, random_state=42)

# DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device), X_train_mask.to(device), torch.tensor(y_train, dtype=torch.float32).to(device))
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device), X_val_mask.to(device), torch.tensor(y_val, dtype=torch.float32).to(device))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=32)

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.01)
total_steps = len(train_loader) * 10
scheduler = get_linear_schedule_with_warmup(optimizer, int(0.1 * total_steps), total_steps)

# Training with early stopping
best_val_f1 = 0.0
patience = 3
patience_counter = 0
best_model_state = None

for epoch in range(10):
    model.train()
    total_loss = 0
    for batch_ids, batch_mask, batch_labels in train_loader:
        optimizer.zero_grad()
        preds = model(batch_ids, batch_mask)
        loss, sat_gt, sat_axiom = model.compute_loss(preds, batch_labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        total_loss += loss.item()

    avg_train_loss = total_loss / len(train_loader)

    # Validation
    model.eval()
    val_preds, val_labels = [], []
    val_loss = 0
    with torch.no_grad():
        for batch_ids, batch_mask, batch_labels in val_loader:
            preds = model(batch_ids, batch_mask)
            loss, _, _ = model.compute_loss(preds, batch_labels)
            val_loss += loss.item()
            val_preds.append(preds.cpu().numpy())
            val_labels.append(batch_labels.cpu().numpy())

    val_loss /= len(val_loader)
    y_pred = np.vstack(val_preds)
    y_true = np.vstack(val_labels)
    y_pred_binary = (y_pred > 0.5).astype(int)
    val_f1_micro = f1_score(y_true, y_pred_binary, average='micro', zero_division=0)

    print(f"Epoch {epoch+1}/10 - Train Loss: {avg_train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Micro F1: {val_f1_micro:.4f} | GT Sat: {sat_gt:.4f} | Axiom Sat: {sat_axiom:.4f}")

    if val_f1_micro > best_val_f1:
        best_val_f1 = val_f1_micro
        best_model_state = model.state_dict()
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch+1}")
            break

if best_model_state:
    model.load_state_dict(best_model_state)

Loaded 31 implication rules from assoc rules.
Epoch 1/10 - Train Loss: 1.1996 | Val Loss: 0.8389 | Val Micro F1: 0.3691 | GT Sat: 0.6675 | Axiom Sat: 0.9454
Epoch 2/10 - Train Loss: 1.0723 | Val Loss: 0.7947 | Val Micro F1: 0.4089 | GT Sat: 0.7345 | Axiom Sat: 0.9404
Epoch 3/10 - Train Loss: 1.0969 | Val Loss: 0.8022 | Val Micro F1: 0.3893 | GT Sat: 0.6828 | Axiom Sat: 0.9367
Epoch 4/10 - Train Loss: 1.1739 | Val Loss: 0.8075 | Val Micro F1: 0.3942 | GT Sat: 0.7084 | Axiom Sat: 0.9308
Epoch 5/10 - Train Loss: 0.9963 | Val Loss: 0.7785 | Val Micro F1: 0.3858 | GT Sat: 0.6678 | Axiom Sat: 0.9393
Early stopping at epoch 5


### 3. LTN Model Evaluation

This cell evaluates the performance of the trained LTN model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `model.eval()`.
- **Prediction:** The model makes predictions on the test data.
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.

In [19]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset
from ltn.fuzzy_ops import ImpliesLuk, AggregPMean
import torch
import numpy as np

# Prepare axiom operators
imp_op = ImpliesLuk()
aggregator = AggregPMean(p=2)

# Test DataLoader with attention_mask
test_dataset = TensorDataset(
    X_test_ids.to(device),
    X_test_mask.to(device),
    torch.tensor(y_test, dtype=torch.float32).to(device)
)
test_loader = DataLoader(test_dataset, batch_size=32)

# Switch to eval mode
model.eval()
all_preds = []
all_labels = []
all_axioms = []

with torch.no_grad():
    for batch_input_ids, batch_attention_mask, batch_y_true in test_loader:
        # Forward pass
        logits = model(input_ids=batch_input_ids, attention_mask=batch_attention_mask)
        probs = torch.sigmoid(logits)

        # Binary predictions
        preds = (probs > 0.5).cpu().numpy()
        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

        # Axiom satisfaction
        if hasattr(model, "implication_pairs"):
            axiom_vals = []
            for a_idx, c_idx in model.implication_pairs:
                premise = probs[:, a_idx]
                conclusion = probs[:, c_idx]
                val = imp_op(premise, conclusion)
                axiom_vals.append(val)
            if axiom_vals:
                stacked_axioms = torch.stack(axiom_vals, dim=1)
                sat_per_example = aggregator(stacked_axioms)
                all_axioms.append(sat_per_example.cpu().numpy())

# Concatenate results
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Classification report
print("\nMulti-label classification report:")
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))

# Axiom satisfaction report
if all_axioms:
    axiom_scores = np.stack(all_axioms)
    print(f"\nAverage axiom satisfaction on test set: {axiom_scores.mean():.4f}")
    print(f"Min: {axiom_scores.min():.4f}, Max: {axiom_scores.max():.4f}")
else:
    print("\nNo implication rules found in model for axiom satisfaction.")

print("\nAvg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())



Multi-label classification report:
              precision    recall  f1-score   support

      Action       0.29      1.00      0.45      1368
   Adventure       0.18      1.00      0.30       837
   Animation       0.05      1.00      0.10       262
   Biography       0.04      1.00      0.07       170
      Comedy       0.20      1.00      0.33       934
       Crime       0.21      1.00      0.35      1017
       Drama       0.39      1.00      0.56      1869
      Family       0.07      1.00      0.13       340
     Fantasy       0.09      1.00      0.17       437
   Film-Noir       0.01      1.00      0.01        32
     History       0.04      1.00      0.08       194
      Horror       0.18      1.00      0.31       877
       Music       0.02      1.00      0.04        88
     Musical       0.01      1.00      0.02        48
     Mystery       0.11      1.00      0.20       524
  Reality-TV       0.00      0.00      0.00         0
     Romance       0.16      1.00      0.27  

### 9. Model Performance Comparison

This cell provides a summary and comparison of the performance of both the baseline and the LTN-enhanced models. It discusses the poor performance of both models and suggests potential reasons and next steps for improvement.