### 1. Data Loading and Preprocessing

This cell handles the initial setup, including mounting Google Drive, loading the dataset, and performing essential preprocessing steps.

- **Drive Mount:** Mounts the Google Drive to access the dataset file.
- **Data Loading:** Loads the movie data from a CSV file into a pandas DataFrame.
- **Association Rule Mining:**
    - The `Output` column, containing comma-separated genres, is split into a list of genres for each movie.
    - `TransactionEncoder` converts this list into a one-hot encoded format suitable for association rule mining.
    - `fpgrowth` is used to find frequent itemsets of genres.
    - `association_rules` generates rules based on these itemsets, which are then filtered for high confidence and support.
- **Multi-Label Classification Preprocessing:**
    - The `description` for each movie is extracted from the `Input` column.
    - The `Output` column is converted into a list of genre labels.
    - `MultiLabelBinarizer` transforms these genre lists into a binary matrix format, which is the standard for multi-label classification tasks.

In [3]:
!pip install mlxtend
!pip install ltntorch

Collecting mlxtend
  Downloading mlxtend-0.23.4-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.4-py3-none-any.whl (1.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m18.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: mlxtend
Successfully installed mlxtend-0.23.4
Collecting ltntorch
  Downloading LTNtorch-1.0.2-py3-none-any.whl.metadata (13 kB)
Downloading LTNtorch-1.0.2-py3-none-any.whl (29 kB)
Installing collected packages: ltntorch
Successfully installed ltntorch-1.0.2


In [1]:
from google.colab import drive
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules
from sklearn.preprocessing import MultiLabelBinarizer

drive.mount('/content/drive', force_remount=True)

# Load the data
df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/train.csv')

# Use only 25% of the dataset
df = df.sample(frac=0.10, random_state=42).reset_index(drop=True)

# Association rule mining
transactions = df['expanded-genres'].str.split(', ').tolist()
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = fpgrowth(df_encoded, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
high_confidence_rules = rules[(rules['confidence'] > 0.25) & (rules['support'] > 0.001)]

# Data preprocessing for multi-label classification
#df['description'] = df['Input'].apply(lambda x: x.split('\n\n', 1)[1] if '\n\n' in x else '')
df['Output-Label'] = df['expanded-genres'].str.split(', ')
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Output-Label'])

# Display results
display(df.head())
display(high_confidence_rules)
print("Descriptions:")
display(df['description'].head())
print("\nBinary Labels (y):")
display(y[:5])

Mounted at /content/drive


Unnamed: 0,movie title - year,genre,expanded-genres,rating,description,Output-Label
0,Mei shan shou qi guai - 1973,Fantasy,"Action, Adventure, Fantasy",5.4,Na Cha is sent to the land of the dead to figh...,"[Action, Adventure, Fantasy]"
1,Money Fight - 2012,Action,"Action, Drama",3.9,"This full-contact action drama, loaded with au...","[Action, Drama]"
2,Dui Prithibi - 2010,Romance,"Drama, Romance",6.4,"Rahul, the son of a very rich man who has lost...","[Drama, Romance]"
3,The Barbarians - 1987,Fantasy,"Action, Adventure, Fantasy",4.9,Two twin barbarians seek revenge from the warl...,"[Action, Adventure, Fantasy]"
4,Bridge of Birds - nan,Fantasy,"Action, Adventure, Fantasy",,When a farm boy's village is cursed by a myste...,"[Action, Adventure, Fantasy]"


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Adventure),(Action),0.174641,0.28662,0.084278,0.482576,1.683682,1.0,0.034222,1.378716,0.491984,0.223558,0.274687,0.388308
1,(Action),(Adventure),0.28662,0.174641,0.084278,0.29404,1.683682,1.0,0.034222,1.16913,0.56921,0.223558,0.144663,0.388308
4,"(Adventure, Comedy)",(Action),0.040418,0.28662,0.013179,0.326064,1.13762,1.0,0.001594,1.058529,0.126068,0.04199,0.055293,0.186022
5,"(Comedy, Action)",(Adventure),0.044237,0.174641,0.013179,0.297913,1.705856,1.0,0.005453,1.175579,0.432936,0.064069,0.149355,0.186688
8,"(Adventure, Drama)",(Action),0.041803,0.28662,0.014438,0.345382,1.205017,1.0,0.002456,1.089765,0.177559,0.045983,0.082371,0.197877
10,(Fantasy),(Adventure),0.085621,0.174641,0.023,0.268627,1.538168,1.0,0.008047,1.128507,0.382638,0.09694,0.113873,0.200163
17,(Romance),(Drama),0.160035,0.395786,0.091245,0.570155,1.440563,1.0,0.027905,1.405654,0.364095,0.196404,0.288588,0.400348
18,(Comedy),(Romance),0.191975,0.160035,0.049148,0.256012,1.599724,1.0,0.018425,1.129003,0.463961,0.162278,0.114263,0.28156
19,(Romance),(Comedy),0.160035,0.191975,0.049148,0.307107,1.599724,1.0,0.018425,1.166162,0.446319,0.162278,0.142486,0.28156
20,"(Comedy, Drama)",(Romance),0.050617,0.160035,0.016201,0.320066,1.999974,1.0,0.0081,1.235363,0.526651,0.083315,0.190521,0.210649


Descriptions:


Unnamed: 0,description
0,Na Cha is sent to the land of the dead to figh...
1,"This full-contact action drama, loaded with au..."
2,"Rahul, the son of a very rich man who has lost..."
3,Two twin barbarians seek revenge from the warl...
4,When a farm boy's village is cursed by a myste...



Binary Labels (y):


array([[1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0]])

### 2. Baseline Model Training

This cell defines and trains a baseline multi-label classification model using a pre-trained DistilBERT model.

- **Device Configuration:** Sets the device to "cuda" if a GPU is available, otherwise "cpu".
- **Tokenizer and Model Loading:** Loads the "distilbert-base-uncased" tokenizer and model from the Hugging Face library.
- **Model Definition:**
    - A `BaselineMovieClassifier` class is defined, which includes the DistilBERT model and a linear classifier layer.
    - The model takes tokenized input and produces logits for each genre.
- **Training Setup:**
    - The model, loss function (BCEWithLogitsLoss), and optimizer (Adam) are initialized.
- **Data Preparation:**
    - The movie descriptions are tokenized using the DistilBERT tokenizer.
    - The data is split into training and testing sets.
    - A DataLoader is created for the training data to handle batching and shuffling.
- **Training Loop:**
    - The model is trained for 10 epochs.
    - In each epoch, the model processes batches of data, calculates the loss, and updates its weights.

In [16]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import torch.optim as optim
import numpy as np

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
transformer_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)

# Classifier model
class BaselineMovieClassifier(nn.Module):
    def __init__(self, transformer_model, num_labels, dropout=0.3):
        super(BaselineMovieClassifier, self).__init__()
        self.transformer = transformer_model
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(transformer_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token
        x = self.dropout(embeddings)
        logits = self.classifier(x)
        return logits

# Prepare data and labels (assumes mlb and df already defined)
num_genres = len(mlb.classes_)
baseline_model = BaselineMovieClassifier(transformer_model, num_genres).to(device)

X = tokenizer(
    text=df['description'].tolist(),
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids=False,
    return_attention_mask=True,
    verbose=True
)
input_ids = X['input_ids']
attention_mask = X['attention_mask']

# Split train+val/test
X_train_val_ids, X_test_ids, y_train_val, y_test, X_train_val_mask, X_test_mask = train_test_split(
    input_ids, y, attention_mask, test_size=0.2, random_state=42
)

# Further split train into train and val (10% val)
X_train_ids, X_val_ids, y_train, y_val, X_train_mask, X_val_mask = train_test_split(
    X_train_val_ids, y_train_val, X_train_val_mask, test_size=0.125, random_state=42
)

# Calculate pos_weight on training labels
positive_counts = np.sum(y_train, axis=0)
total_counts = y_train.shape[0]
negative_counts = total_counts - positive_counts
epsilon = 1e-5
pos_weights = torch.tensor(negative_counts / (positive_counts + epsilon), dtype=torch.float32).to(device)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weights)

# Hyperparams
epochs = 10
batch_size = 32
optimizer = optim.Adam(baseline_model.parameters(), lr=3e-5)
total_steps = (len(X_train_ids) // batch_size + 1) * epochs
warmup_steps = int(0.1 * total_steps)
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)
max_grad_norm = 1.0

# DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device),
    X_train_mask.to(device),
    torch.tensor(y_train, dtype=torch.float32).to(device)
)
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device),
    X_val_mask.to(device),
    torch.tensor(y_val, dtype=torch.float32).to(device)
)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

def evaluate(model, loader):
    model.eval()
    losses = []
    preds = []
    targets = []
    with torch.no_grad():
        for batch_input_ids, batch_attention_mask, batch_y_true in loader:
            logits = model(batch_input_ids, attention_mask=batch_attention_mask)
            loss = criterion(logits, batch_y_true)
            losses.append(loss.item())

            y_pred = torch.sigmoid(logits).cpu().numpy()
            preds.append(y_pred)
            targets.append(batch_y_true.cpu().numpy())

    avg_loss = np.mean(losses)
    preds = np.vstack(preds)
    targets = np.vstack(targets)
    # Binarize preds with 0.5 threshold for metric
    preds_binary = (preds > 0.5).astype(int)

    f1 = f1_score(targets, preds_binary, average='micro', zero_division=0)
    return avg_loss, f1

# Training loop with validation
for epoch in range(epochs):
    baseline_model.train()
    total_loss = 0
    for batch_input_ids, batch_attention_mask, batch_y_true in train_loader:
        optimizer.zero_grad()

        logits = baseline_model(batch_input_ids, attention_mask=batch_attention_mask)
        loss = criterion(logits, batch_y_true)

        loss.backward()
        torch.nn.utils.clip_grad_norm_(baseline_model.parameters(), max_grad_norm)

        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

    train_loss = total_loss / len(train_loader)
    val_loss, val_f1 = evaluate(baseline_model, val_loader)
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Micro F1: {val_f1:.4f}")


Using device: cuda
Epoch 1/10 | Train Loss: 1.0907 | Val Loss: 0.7906 | Val Micro F1: 0.4191
Epoch 2/10 | Train Loss: 1.0073 | Val Loss: 0.7345 | Val Micro F1: 0.4729
Epoch 3/10 | Train Loss: 0.8137 | Val Loss: 0.7402 | Val Micro F1: 0.4967
Epoch 4/10 | Train Loss: 0.5914 | Val Loss: 0.7944 | Val Micro F1: 0.5043
Epoch 5/10 | Train Loss: 0.4885 | Val Loss: 0.8616 | Val Micro F1: 0.5215
Epoch 6/10 | Train Loss: 0.3777 | Val Loss: 0.9652 | Val Micro F1: 0.5292
Epoch 7/10 | Train Loss: 0.3333 | Val Loss: 1.0109 | Val Micro F1: 0.5434
Epoch 8/10 | Train Loss: 0.2824 | Val Loss: 1.1378 | Val Micro F1: 0.5394
Epoch 9/10 | Train Loss: 0.2582 | Val Loss: 1.1218 | Val Micro F1: 0.5461
Epoch 10/10 | Train Loss: 0.2433 | Val Loss: 1.1376 | Val Micro F1: 0.5455


### 3. Baseline Model Evaluation

This cell evaluates the performance of the trained baseline model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `baseline_model.eval()`.
- **Prediction:** The model makes predictions on the test data.
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.

In [17]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset

# Create a DataLoader for the test set
test_dataset = TensorDataset(X_test_ids.to(device), torch.tensor(y_test, dtype=torch.float32).to(device))
test_loader = DataLoader(test_dataset, batch_size=32)  # Adjust batch_size as needed

# Evaluation
baseline_model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for batch_input_ids, batch_y_true in test_loader:
        logits = baseline_model(batch_input_ids)
        probs = torch.sigmoid(logits)
        preds = (probs > 0.6).cpu().numpy()
        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

# Concatenate results
import numpy as np
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Generate the classification report
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))

print("Avg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())



              precision    recall  f1-score   support

      Action       0.49      0.78      0.60      1368
   Adventure       0.35      0.69      0.47       837
   Animation       0.43      0.41      0.42       262
   Biography       0.71      0.41      0.52       170
      Comedy       0.32      0.72      0.45       934
       Crime       0.70      0.19      0.30      1017
       Drama       0.67      0.11      0.19      1869
      Family       0.27      0.57      0.37       340
     Fantasy       0.29      0.52      0.37       437
   Film-Noir       0.00      0.00      0.00        32
     History       0.61      0.21      0.31       194
      Horror       0.84      0.21      0.34       877
       Music       0.31      0.22      0.25        88
     Musical       0.11      0.08      0.09        48
     Mystery       0.26      0.64      0.37       524
  Reality-TV       0.00      0.00      0.00         0
     Romance       0.51      0.54      0.52       756
      Sci-Fi       0.64    

### 4. Baseline Model Prediction on Evaluation Set

This cell uses the trained baseline model to make predictions on a separate evaluation dataset.

- **Load Evaluation Data:** Loads the evaluation dataset from a CSV file.
- **Preprocess Evaluation Data:** The descriptions from the evaluation data are tokenized.
- **Make Predictions:** The model predicts genres for the evaluation data.
- **Store Predictions:** The predicted genres are added as a new column to the evaluation DataFrame.
- **Classification Report:** A classification report is generated to evaluate the model's performance on this new data.

In [None]:
# Load the evaluation data
eval_df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/evaluation_set.csv')

# Preprocess the evaluation data
eval_descriptions = eval_df['Input'].apply(lambda x: x.split('\n\n', 1)[1] if '\n' in x else '').tolist()
eval_X = tokenizer(
    text=eval_descriptions,
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

# Make predictions on the evaluation data
baseline_model.eval()
with torch.no_grad():
    eval_X_tensor = eval_X['input_ids'].to(device)

    logits = baseline_model(eval_X_tensor)
    y_pred = torch.sigmoid(logits)
    predicted_labels_binary = (y_pred > 0.5).cpu().numpy()

# Convert the binary predictions to labels
predicted_labels = mlb.inverse_transform(predicted_labels_binary)

# Add the predicted labels to the evaluation dataframe
eval_df['predicted_genres_baseline'] = predicted_labels

# Transform the true and predicted labels using the same binarizer for a fair comparison
y_true_eval = mlb.transform(eval_df['expected_output'].str.split(', '))

# Generate the classification report
print("Classification Report for baseline model on the evaluation set:")
print(classification_report(y_true_eval, predicted_labels_binary, target_names=mlb.classes_, zero_division=0))

display(eval_df.head())

Classification Report for baseline model on the evaluation set:
                 precision    recall  f1-score   support

         Action       0.00      0.00      0.00        12
      Adventure       0.00      0.00      0.00         3
      Animation       0.00      0.00      0.00         5
         Comedy       0.00      0.00      0.00        36
          Crime       0.00      0.00      0.00         8
    Documentary       0.00      0.00      0.00        28
          Drama       0.00      0.00      0.00        49
         Family       0.00      0.00      0.00         8
        Fantasy       0.00      0.00      0.00         3
        History       0.00      0.00      0.00        10
         Horror       0.00      0.00      0.00        18
          Music       0.00      0.00      0.00         2
        Mystery       0.00      0.00      0.00         0
        Romance       0.00      0.00      0.00        17
Science Fiction       0.00      0.00      0.00         7
       TV Movie       0

Unnamed: 0,user_interaction_id,Input,Output,Vote Average,Vote Count,Annotation,expected_output,predicted_genres_baseline
0,e74dbc6c-36df-4822-b4df-913ae6c7a8bc,Spirit of a Denture\n\nDr. Middling is a denti...,"Adventure, Comedy",5.7,7,good,"Adventure, Comedy",()
1,f37f14a1-a013-4b01-88bc-9338c5a7c44a,We Ate the Children Last\n\nResearchers discov...,Comedy,6.2,6,good,Comedy,()
2,79edf183-8880-4141-b91a-475b429fc230,Castle Freak\n\nAfter she’s permanently blinde...,Horror,4.8,43,good,Horror,()
3,1e83532c-5a4c-40d9-a2cf-cc0d93d205a9,"My Man Is a Loser\n\nWhen it comes to women, p...",Comedy,4.5,29,good,Comedy,()
4,02d57491-c75e-4f1b-901b-64db45e9d78c,"Chirakodinja Kinavukal\n\nSumathi, a village g...",Comedy,7.2,11,good,Comedy,()


### 5. Installing LTNtorch

This cell installs the `ltntorch` library, which is required for building and training Logic Tensor Networks.

In [19]:
!pip install ltntorch

Collecting ltntorch
  Downloading LTNtorch-1.0.2-py3-none-any.whl.metadata (13 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->ltntorch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch->ltntorch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch->ltntorch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch->ltntorch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3

### 6. LTN Model Definition

This cell defines the LTN-enhanced movie classifier.

- **Model Definition:**
    - An `LTNMovieClassifier` class is defined, which, like the baseline, uses a DistilBERT model for embeddings.
    - Instead of a single classifier, it uses a dictionary of `ltn.Predicate` modules, one for each genre. Each predicate is a small neural network that learns a truth value for a movie belonging to a genre.
- **Model Instantiation:** The LTN model is instantiated.

In [10]:
print(dir(ltn.fuzzy_ops))

['AggregMean', 'AggregMin', 'AggregPMean', 'AggregPMeanError', 'AggregationOperator', 'AndLuk', 'AndMin', 'AndProd', 'BinaryConnectiveOperator', 'ConnectiveOperator', 'Equiv', 'ImpliesGodel', 'ImpliesGoguen', 'ImpliesKleeneDienes', 'ImpliesLuk', 'ImpliesReichenbach', 'LTNObject', 'NotGodel', 'NotStandard', 'OrLuk', 'OrMax', 'OrProbSum', 'SatAgg', 'UnaryConnectiveOperator', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'check_mask', 'check_values', 'eps', 'pi_0', 'pi_1', 'torch']


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import AutoTokenizer, AutoModel, get_linear_schedule_with_warmup
import numpy as np
from sklearn.model_selection import train_test_split

import ltn  # ltntorch for fuzzy logic
from ltn.fuzzy_ops import Equiv, AndLuk, ImpliesLuk, AggregPMean

# Your label list and mappings (replace with your actual mlb.classes_)
ALL_LABELS = list(mlb.classes_)
NUM_LABELS = len(ALL_LABELS)
LABEL_TO_IDX = {label: i for i, label in enumerate(ALL_LABELS)}

# Instantiate fuzzy logic ops
and_op = AndLuk()
imp_op = ImpliesLuk()
equiv_op = Equiv(and_op=and_op, implies_op=imp_op)
aggregator = AggregPMean(p=2)

# Helper to parse frozensets in assoc rules
def frozenset_to_list(fs):
    return list(fs)

# Build implication pairs from association rules
implication_pairs = []
for _, row in high_confidence_rules.iterrows():
    antecedents = frozenset_to_list(row['antecedents'])
    consequents = frozenset_to_list(row['consequents'])
    for a in antecedents:
        for c in consequents:
            if a in LABEL_TO_IDX and c in LABEL_TO_IDX:
                implication_pairs.append((LABEL_TO_IDX[a], LABEL_TO_IDX[c]))
implication_pairs = list(set(implication_pairs))
print(f"Loaded {len(implication_pairs)} implication rules from assoc rules.")

# y_train is numpy array with shape [num_samples, num_labels]
pos_counts = y_train.sum(axis=0)
neg_counts = y_train.shape[0] - pos_counts
epsilon = 1e-5
pos_weights = torch.tensor(neg_counts / (pos_counts + epsilon), dtype=torch.float32).to(device)

# Text Encoder with transformer
class TextEncoder(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

    def forward(self, texts):
        inputs = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=128)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        outputs = self.model(**inputs)
        cls_embeddings = outputs.last_hidden_state[:, 0, :]
        return cls_embeddings

# Grounding network maps embeddings to fuzzy truth values
class GroundingNetwork(nn.Module):
    def __init__(self, input_dim, num_labels, hidden_dim=256):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, num_labels)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return self.sigmoid(x)

class MultiLabelPredicate(nn.Module):
    def __init__(self, grounding_network):
        super().__init__()
        self.grounding_network = grounding_network

    def forward(self, embeddings):
        return self.grounding_network(embeddings)

class LTNMultiLabelClassifier(nn.Module):
    def __init__(self, model_name, num_labels, implication_pairs, pos_weights=None):
        super().__init__()
        self.text_encoder = TextEncoder(model_name)
        self.grounding_nn = GroundingNetwork(self.text_encoder.model.config.hidden_size, num_labels)
        self.P = MultiLabelPredicate(self.grounding_nn)
        self.implication_pairs = implication_pairs
        self.pos_weights = pos_weights  # tensor or None

    def forward(self, texts):
        embeddings = self.text_encoder(texts)
        pred_truth = self.P(embeddings)
        return pred_truth

    def compute_loss(self, pred_truth, true_labels):
        eps = 1e-6
        pred_clamped = pred_truth.clamp(min=eps, max=1 - eps)

        # Weighted BCE loss with pos_weights if provided
        if self.pos_weights is not None:
            # Expand pos_weights to batch size
            weights = self.pos_weights.unsqueeze(0).expand_as(true_labels)
            bce_loss = -(weights * true_labels * torch.log(pred_clamped) +
                         (1 - true_labels) * torch.log(1 - pred_clamped)).mean()
        else:
            bce_loss = -(true_labels * torch.log(pred_clamped) + (1 - true_labels) * torch.log(1 - pred_clamped)).mean()

        # Logical equivalence truth values
        equiv_values = equiv_op(pred_truth, true_labels)
        sat_per_example = aggregator(equiv_values)
        sat_gt = aggregator(sat_per_example)

        # Axiom satisfaction (implications)
        axiom_values = []
        for a_idx, c_idx in self.implication_pairs:
            a_truth = pred_truth[:, a_idx]
            c_truth = pred_truth[:, c_idx]
            impl_val = imp_op(a_truth, c_truth)
            axiom_values.append(impl_val)

        if axiom_values:
            axiom_stack = torch.stack(axiom_values, dim=1)
            axiom_per_example = aggregator(axiom_stack)
            sat_axiom = aggregator(axiom_per_example)
        else:
            sat_axiom = torch.tensor(1.0, device=pred_truth.device)

        overall_sat = and_op(sat_gt, sat_axiom)
        logic_loss = 1 - overall_sat

        lambda_logic = 0.25
        loss = (1 - lambda_logic) * bce_loss + lambda_logic * logic_loss

        return loss, sat_gt.item(), sat_axiom.item()


# Device and tokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Instantiate model
LLM_MODEL_NAME = "distilbert-base-uncased"
#model = LTNMultiLabelClassifier(LLM_MODEL_NAME, NUM_LABELS, implication_pairs).to(device)
pos_weights = torch.tensor(neg_counts / (pos_counts + 1e-5), dtype=torch.float32).to(device)

model = LTNMultiLabelClassifier(
    LLM_MODEL_NAME,
    NUM_LABELS,
    implication_pairs,
    pos_weights=pos_weights
).to(device)


# Prepare data
X = tokenizer(
    text=df['description'].tolist(),
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids=False,
    return_attention_mask=True,
)

input_ids = X['input_ids']
attention_mask = X['attention_mask']

# Train/val/test splits
X_train_val_ids, X_test_ids, y_train_val, y_test, X_train_val_mask, X_test_mask = train_test_split(
    input_ids, y, attention_mask, test_size=0.2, random_state=42
)
X_train_ids, X_val_ids, y_train, y_val, X_train_mask, X_val_mask = train_test_split(
    X_train_val_ids, y_train_val, X_train_val_mask, test_size=0.125, random_state=42
)

# Create DataLoaders
train_dataset = torch.utils.data.TensorDataset(
    X_train_ids.to(device),
    X_train_mask.to(device),
    torch.tensor(y_train, dtype=torch.float32).to(device)
)
val_dataset = torch.utils.data.TensorDataset(
    X_val_ids.to(device),
    X_val_mask.to(device),
    torch.tensor(y_val, dtype=torch.float32).to(device)
)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=32)

# Optimizer and scheduler
optimizer = optim.Adam(model.parameters(), lr=3e-5)
total_steps = len(train_loader) * 10  # 10 epochs
warmup_steps = int(0.1 * total_steps)
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

# Training loop
epochs = 10
model.train()
for epoch in range(epochs):
    total_loss = 0
    for batch_ids, batch_mask, batch_labels in train_loader:
        optimizer.zero_grad()
        texts = tokenizer.batch_decode(batch_ids, skip_special_tokens=True)
        preds = model(texts)  # sigmoid outputs
        loss, sat_gt, sat_axiom = model.compute_loss(preds, batch_labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.4f} GT Sat: {sat_gt:.4f} Axiom Sat: {sat_axiom:.4f}")

    # Validation
    model.eval()
    with torch.no_grad():
        val_loss = 0
        for batch_ids, batch_mask, batch_labels in val_loader:
            texts = tokenizer.batch_decode(batch_ids, skip_special_tokens=True)
            preds = model(texts)
            loss, _, _ = model.compute_loss(preds, batch_labels)
            val_loss += loss.item()
        print(f"Validation Loss: {val_loss / len(val_loader):.4f}")
    model.train()


Loaded 31 implication rules from assoc rules.
Epoch 1/10 - Loss: 1.0311 GT Sat: 0.7059 Axiom Sat: 0.9648
Validation Loss: 0.7109
Epoch 2/10 - Loss: 0.8577 GT Sat: 0.7916 Axiom Sat: 0.9553
Validation Loss: 0.6601
Epoch 3/10 - Loss: 0.7556 GT Sat: 0.8227 Axiom Sat: 0.9484
Validation Loss: 0.6677
Epoch 4/10 - Loss: 0.6103 GT Sat: 0.8466 Axiom Sat: 0.9257
Validation Loss: 0.7129
Epoch 5/10 - Loss: 0.4976 GT Sat: 0.8725 Axiom Sat: 0.9355
Validation Loss: 0.7999
Epoch 6/10 - Loss: 0.4307 GT Sat: 0.8848 Axiom Sat: 0.9285
Validation Loss: 0.8770
Epoch 7/10 - Loss: 0.3636 GT Sat: 0.8905 Axiom Sat: 0.9418
Validation Loss: 0.9595


### 3. LTN Model Evaluation

This cell evaluates the performance of the trained LTN model on the test set.

- **Evaluation Mode:** The model is set to evaluation mode using `model.eval()`.
- **Prediction:** The model makes predictions on the test data.
- **Classification Report:** A classification report is printed, showing precision, recall, and F1-score for each genre.

In [30]:
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader, TensorDataset
from ltn.fuzzy_ops import ImpliesLuk, AggregPMean
import torch
import numpy as np

# Initialize axiom operators
imp_op = ImpliesLuk()
aggregator = AggregPMean(p=2)

# Create a DataLoader for the test set
test_dataset = TensorDataset(X_test_ids.to(device), torch.tensor(y_test, dtype=torch.float32).to(device))
test_loader = DataLoader(test_dataset, batch_size=32)

# Evaluation
model.eval()
all_preds = []
all_labels = []
all_axioms = []

with torch.no_grad():
    for batch_input_ids, batch_y_true in test_loader:
        # Decode token ids back to list of strings
        texts = tokenizer.batch_decode(batch_input_ids, skip_special_tokens=True)

        # Pass texts to the model
        logits = model(texts)

        #logits = model(batch_input_ids)
        probs = torch.sigmoid(logits)

        # Binary predictions
        preds = (probs > 0.5).cpu().numpy()
        all_preds.append(preds)
        all_labels.append(batch_y_true.cpu().numpy())

        # Axiom satisfaction per batch
        if hasattr(model, "implication_pairs"):
            axiom_vals = []
            for a_idx, c_idx in model.implication_pairs:
                premise = probs[:, a_idx]
                conclusion = probs[:, c_idx]
                val = imp_op(premise, conclusion)
                axiom_vals.append(val)
            if axiom_vals:
                stacked_axioms = torch.stack(axiom_vals, dim=1)
                sat_per_example = aggregator(stacked_axioms)
                all_axioms.append(sat_per_example.cpu().numpy())

# Concatenate results
y_pred_binary = np.vstack(all_preds)
y_true = np.vstack(all_labels)

# Generate the classification report
print("\nMulti-label classification report:")
print(classification_report(y_true, y_pred_binary, target_names=mlb.classes_, zero_division=0))

# Axiom satisfaction reporting
if all_axioms:
    axiom_scores = np.stack(all_axioms)
    print(f"\nAverage axiom satisfaction on test set: {axiom_scores.mean():.4f}")
    print(f"Min: {axiom_scores.min():.4f}, Max: {axiom_scores.max():.4f}")
else:
    print("\nNo implication rules found in model for axiom satisfaction.")

print("\nAvg predicted labels per sample:", y_pred_binary.sum(axis=1).mean())



Multi-label classification report:
              precision    recall  f1-score   support

      Action       0.29      1.00      0.45      1368
   Adventure       0.18      1.00      0.30       837
   Animation       0.05      1.00      0.10       262
   Biography       0.04      1.00      0.07       170
      Comedy       0.20      1.00      0.33       934
       Crime       0.21      1.00      0.35      1017
       Drama       0.39      1.00      0.56      1869
      Family       0.07      1.00      0.13       340
     Fantasy       0.09      1.00      0.17       437
   Film-Noir       0.01      1.00      0.01        32
     History       0.04      1.00      0.08       194
      Horror       0.18      1.00      0.31       877
       Music       0.02      1.00      0.04        88
     Musical       0.01      1.00      0.02        48
     Mystery       0.11      1.00      0.20       524
  Reality-TV       0.00      0.00      0.00         0
     Romance       0.16      1.00      0.27  

### 7. LTN Model Training

This cell trains the LTN-enhanced model, incorporating logical axioms.

- **Fuzzy Operators:** Defines the fuzzy logic operators (And, Or, Implies, etc.) that will be used to construct the axioms.
- **Custom Loss Function:** A custom loss function `ltn_loss` is defined, which combines the standard binary cross-entropy loss with a loss term for the logical axioms. This axiom loss encourages the model to satisfy the predefined rules.
- **Training Setup:** The data is tokenized and prepared for training, similar to the baseline model.
- **Training Loop:**
    - The model is trained for 10 epochs.
    - In each training step, logical axioms are created from the high-confidence association rules.
    - The total loss is calculated and used to update the model's weights.

In [2]:
# === Training setup ===
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LTNMultiLabelClassifier("distilbert-base-uncased", NUM_LABELS, AXIOMS).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

texts = df['description'].tolist()
true_labels = torch.tensor(y, dtype=torch.float32)

batch_size = 32
num_epochs = 5
num_samples = len(texts)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for i in range(0, num_samples, batch_size):
        batch_texts = texts[i:i+batch_size]
        batch_labels = true_labels[i:i+batch_size].to(device)

        optimizer.zero_grad()
        preds = model(batch_texts)
        loss, gt_sat, axiom_sat = model.compute_loss(preds, batch_labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / (num_samples // batch_size)
    print(f"Epoch {epoch+1}/{num_epochs} | Loss: {avg_loss:.4f} | GT Sat: {gt_sat:.4f} | Axiom Sat: {axiom_sat:.4f}")

"""
# === Inference example ===
model.eval()
with torch.no_grad():
    sample_texts = [
        "An exciting action drama with crime and thriller elements.",
        "A romantic comedy with family and animation themes.",
        "A science fiction fantasy movie."
    ]
    preds = model(sample_texts).cpu().numpy()
    for i, text in enumerate(sample_texts):
        print(f"\nText: {text}")
        for j, g in enumerate(GENRES):
            print(f"  {g}: {preds[i, j]:.4f}")
        for ant_idx, cons_idx in AXIOMS:
            val = ltn.functions.Lukasiewicz_implicator(
                torch.tensor(preds[i, ant_idx]),
                torch.tensor(preds[i, cons_idx])
            ).item()
            print(f"  Axiom: If {GENRES[ant_idx]} then {GENRES[cons_idx]} truth: {val:.4f}")
"""

NameError: name 'tokenizer' is not defined

### 8. LTN Model Evaluation

This cell evaluates the trained LTN-enhanced model on the evaluation set.

- **Prediction:** The model predicts genres for the evaluation data.
- **Store Predictions:** The predicted genres are added as a new column to the evaluation DataFrame.
- **Classification Report:** A classification report is generated to evaluate the model's performance.

In [None]:
from sklearn.metrics import classification_report

# Load the evaluation data
eval_df = pd.read_csv('/content/drive/My Drive/movie-genre-prediction/evaluation_set.csv')

# Preprocess the evaluation data
eval_descriptions = eval_df['Input'].apply(lambda x: x.split('\n\n', 1)[1] if '\n' in x else '').tolist()
eval_X = tokenizer(
    text=eval_descriptions,
    add_special_tokens=True,
    max_length=128,
    truncation=True,
    padding='max_length',
    return_tensors='pt',
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)

# Make predictions on the evaluation data
ltn_model.eval()
with torch.no_grad():
    eval_X_tensor = eval_X['input_ids'].to(device)

    embeddings = ltn_model(eval_X_tensor)
    x_embeddings = ltn.Variable("x_embeddings", embeddings)

    y_pred_list = [ltn_model.predicates[genre](x_embeddings).value.unsqueeze(1) for genre in genres]
    y_pred = torch.cat(y_pred_list, dim=1)

    predicted_labels_binary = (y_pred > 0.5).cpu().numpy()

# Convert the binary predictions to labels
predicted_labels = mlb.inverse_transform(predicted_labels_binary)

# Add the predicted labels to the evaluation dataframe
eval_df['predicted_genres_ltn'] = predicted_labels

# Transform the true and predicted labels using the same binarizer for a fair comparison
y_true_eval = mlb.transform(eval_df['expected_output'].str.split(', '))

# Generate the classification report
print("Classification Report for LTN-enhanced model on the evaluation set:")
print(classification_report(y_true_eval, predicted_labels_binary, target_names=genres, zero_division=0))

display(eval_df.head())

Classification Report for LTN-enhanced model on the evaluation set:
                 precision    recall  f1-score   support

         Action       0.00      0.00      0.00        12
      Adventure       0.00      0.00      0.00         3
      Animation       0.00      0.00      0.00         5
         Comedy       0.00      0.00      0.00        36
          Crime       0.00      0.00      0.00         8
    Documentary       0.00      0.00      0.00        28
          Drama       0.00      0.00      0.00        49
         Family       0.00      0.00      0.00         8
        Fantasy       0.00      0.00      0.00         3
        History       0.00      0.00      0.00        10
         Horror       0.00      0.00      0.00        18
          Music       0.00      0.00      0.00         2
        Mystery       0.00      0.00      0.00         0
        Romance       0.00      0.00      0.00        17
Science Fiction       0.00      0.00      0.00         7
       TV Movie    

Unnamed: 0,user_interaction_id,Input,Output,Vote Average,Vote Count,Annotation,expected_output,predicted_genres_ltn
0,e74dbc6c-36df-4822-b4df-913ae6c7a8bc,Spirit of a Denture\n\nDr. Middling is a denti...,"Adventure, Comedy",5.7,7,good,"Adventure, Comedy",()
1,f37f14a1-a013-4b01-88bc-9338c5a7c44a,We Ate the Children Last\n\nResearchers discov...,Comedy,6.2,6,good,Comedy,()
2,79edf183-8880-4141-b91a-475b429fc230,Castle Freak\n\nAfter she’s permanently blinde...,Horror,4.8,43,good,Horror,()
3,1e83532c-5a4c-40d9-a2cf-cc0d93d205a9,"My Man Is a Loser\n\nWhen it comes to women, p...",Comedy,4.5,29,good,Comedy,()
4,02d57491-c75e-4f1b-901b-64db45e9d78c,"Chirakodinja Kinavukal\n\nSumathi, a village g...",Comedy,7.2,11,good,Comedy,()


### 9. Model Performance Comparison

This cell provides a summary and comparison of the performance of both the baseline and the LTN-enhanced models. It discusses the poor performance of both models and suggests potential reasons and next steps for improvement.

In [None]:
## Model Performance Comparison

### Baseline Model:

The baseline model, a standard multi-label classifier using a pre-trained DistilBERT model, performs poorly on the evaluation set. The classification report shows precision, recall, and F1-scores of 0.00 for all genres. This indicates that the model fails to correctly predict any of the genres in the evaluation data. The `predicted_genres_baseline` column in the `eval_df` DataFrame is empty for all samples, confirming that the model did not make any positive predictions.

### LTN-enhanced Model:

Similarly, the LTN-enhanced model, which incorporates logical axioms derived from association rule mining, also shows no improvement in performance. The classification report for this model is identical to the baseline model, with all metrics at 0.00. The `predicted_genres_ltn` column is also empty, indicating a failure to predict any genres.

### Conclusion:

Both the baseline and the LTN-enhanced models completely fail to generalize to the evaluation set. Several factors could contribute to this poor performance:

- **Data Quality:** The descriptions might not contain enough information to distinguish between genres.
- **Model Complexity:** The models might be too complex for the given data, leading to overfitting on the training set.
- **Hyperparameter Tuning:** The learning rate, batch size, and number of epochs might not be optimal.
- **Axiom Quality:** The association rules used as axioms in the LTN model might not be strong enough or might not generalize well to unseen data.

Further investigation is needed to diagnose the root cause of the issue. This could involve:

- **Error Analysis:** Manually inspecting the model's predictions on the training set to understand where it is failing.
- **Data Augmentation:** Increasing the size and diversity of the training data.
- **Hyperparameter Optimization:** Systematically tuning the model's hyperparameters.
- **Feature Engineering:** Exploring different ways to represent the input text.