## **Solution B:** Deep learning-based approaches that do not employ transformer architectures  

<!-- Move to README
-START-
**Task A:** Natural Language Inference (NLI)  

*Given a premise and a hypothesis, determine if the hypothesis is true based on the premise. You will be given more than 26K premise-hypothesis pairs as training data, and more than 6K pairs as validation data.*

---
-END- -->


Our final model uses a BiMPM-inspired architecture on top of frozen RoBERTa embeddings. A custom multi-perspective matching layer compares the BiLSTM-encoded premise and hypothesis from several weighted "perspectives". The RoBERTa embeddings are computed once and used as static inputs, so RoBERTa itself is never fine-tuned. Hyperparameters were tuned with Optuna, and the final model was trained on the full training set, with the dev set used for validation and early stopping.

---

<!-- **Group 33:** Joudi Saeidan & Ghayadah Alsaadi   -->

<!-- --- -->

### **Notebook Overview**

This notebook:
- Tunes hyperparameters using Optuna  
- Trains the final BiMPM model with the best parameters, or loads a previously saved model if one exists  
- Saves the model to:  
  `savedModels/best_bimpm_model.keras`  
- Evaluates the model on the dev set  
<!-- - Saves predictions to `.predict` and `.zip` files for submission   -->

>  *Demo code for loading and using this model is provided in a separate notebook.*

## Setup and Install Packages

In [33]:
# from google.colab import drive
# drive.mount('/content/drive')

!pip install transformers tensorflow scikit-learn optuna --quiet

import os, re, string, zipfile, gc
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense, Dropout, Concatenate, Lambda, GlobalAveragePooling1D, GlobalMaxPooling1D, Layer
from transformers import AutoTokenizer, TFAutoModel
from sklearn.metrics import classification_report
import optuna
from keras.saving import register_keras_serializable


# model_path = "/content/drive/MyDrive/dataset/training_data/NLI/best_bimpm_model.keras"
# train_path = "/content/drive/MyDrive/dataset/training_data/NLI/train.csv"
# dev_path   = "/content/drive/MyDrive/dataset/training_data/NLI/dev.csv"

model_path = "savedModels/best_bimpm_model.keras"
train_path = "../Data/train.csv"
dev_path   = "../Data/dev.csv"

MODEL_NAME = "roberta-base"



## Load and Preprocess Data

In [34]:
def clean_text(text):
    """Remove ASCII punctuation and collapse runs of whitespace."""
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

def load_and_preprocess_data(train_path, dev_path):
    """Load the train/dev CSVs, clean both text columns, and return the labels as int32."""
    train_df = pd.read_csv(train_path)
    dev_df   = pd.read_csv(dev_path)

    for df in (train_df, dev_df):
        # Replace missing values with empty strings so clean_text never receives NaN.
        df['premise'] = df['premise'].fillna("").apply(clean_text)
        df['hypothesis'] = df['hypothesis'].fillna("").apply(clean_text)

    train_labels = train_df.label.values.astype("int32")
    dev_labels   = dev_df.label.values.astype("int32")

    return train_df, dev_df, train_labels, dev_labels

train_df, dev_df, train_labels, dev_labels = load_and_preprocess_data(train_path, dev_path)
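
To make the effect of `clean_text` concrete, here is a small standalone check. The function is repeated from the cell above so the snippet runs on its own, and the sample sentence is made up for illustration.

```python
import re
import string

def clean_text(text):
    # Drop every ASCII punctuation character, then collapse runs of whitespace.
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r'\s+', ' ', text).strip()

sample = "  The man's dog -- a poodle --  isn't barking!  "  # made-up example
cleaned = clean_text(sample)
print(cleaned)
```

Note that apostrophes count as punctuation, so contractions and possessives are merged into a single token-like string (`isn't` becomes `isnt`).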

## Extract RoBERTa Embeddings


In [35]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
transformer = TFAutoModel.from_pretrained(MODEL_NAME)
transformer.trainable = False

def encode(df, max_len=50):
    """Tokenize each premise/hypothesis pair as one joint sequence of exactly max_len tokens."""
    return tokenizer(
        df['premise'].tolist(),
        df['hypothesis'].tolist(),
        padding="max_length", truncation=True, max_length=max_len,
        return_tensors="tf"
    )

def compute_embeddings(input_ids, attention_mask, batch_size=256):
    """Run the frozen transformer in batches; returns an array of shape (n_examples, max_len, 768)."""
    embeddings = []
    dataset = tf.data.Dataset.from_tensor_slices((input_ids, attention_mask)).batch(batch_size)
    for batch_ids, batch_mask in dataset:
        # last_hidden_state holds one contextual vector per token position.
        output = transformer(batch_ids, attention_mask=batch_mask).last_hidden_state
        embeddings.append(output.numpy())
    return np.vstack(embeddings)

train_encodings = encode(train_df)
dev_encodings   = encode(dev_df)

train_embeddings = compute_embeddings(train_encodings['input_ids'], train_encodings['attention_mask'])
dev_embeddings   = compute_embeddings(dev_encodings['input_ids'], dev_encodings['attention_mask'])

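
Because the embeddings are static, they could in principle be cached on disk so that later runs skip the RoBERTa forward pass. The notebook does not do this; the following is only a minimal sketch, in which `load_or_compute`, `fake_compute`, and the file name are hypothetical, and `fake_compute` stands in for the real `compute_embeddings` call.

```python
import os
import tempfile
import numpy as np

def load_or_compute(cache_path, compute_fn):
    """Return the array cached at cache_path, computing and saving it on first use.

    Hypothetical helper, not part of the original notebook.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    array = compute_fn()
    np.save(cache_path, array)
    return array

calls = []

def fake_compute():
    # Stand-in for compute_embeddings(...): shape (n_examples, max_len, hidden_dim).
    calls.append(1)
    return np.zeros((4, 50, 768), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_embeddings.npy")  # hypothetical file name
    first = load_or_compute(path, fake_compute)   # computes and writes the cache
    second = load_or_compute(path, fake_compute)  # reads the cache, no recompute

print(first.shape, second.shape, len(calls))
```

At float32, each example occupies 50 × 768 × 4 bytes, roughly 150 KB, so a cache for tens of thousands of pairs runs to several gigabytes.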

## Build BiMPM Model

### Custom BiMPM Matching Layer

> Note:
`BiMPMMatching` (defined below) and `split_premise_and_hypothesis` (defined in the Model Builder cell) are custom objects. The latter is used inside a Lambda layer to split the input into the premise and hypothesis embeddings.
Because the model was saved with these custom objects, they must be **redefined exactly as they were during training** so that Keras can correctly rebuild the model when loading it.


In [36]:

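class BiMPMMatching(Layer):
    """Multi-perspective cosine matching between two encoded sequences."""

    def __init__(self, hidden_size, num_perspectives, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.num_perspectives = num_perspectives
        # One weight vector per perspective; hidden_size * 2 matches the BiLSTM output width.
        self.W = self.add_weight(
            shape=(num_perspectives, hidden_size * 2),
            initializer="random_normal",
            trainable=True,
            name="W_bimpm"
        )

    def get_config(self):
        # Include the constructor arguments so the layer can be rebuilt when the model is loaded.
        config = super().get_config()
        config.update({"hidden_size": self.hidden_size, "num_perspectives": self.num_perspectives})
        return config
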
    def call(self, inputs):
        u, v = inputs

        def cosine_similarity(tensor_a, tensor_b):
            # Weight both inputs by each perspective: (batch, steps, perspectives, 2 * hidden_size).
            weights = tf.reshape(self.W, (1, 1, self.num_perspectives, self.hidden_size * 2))
            a_expanded = tf.expand_dims(tensor_a, axis=2) * weights
            b_expanded = tf.expand_dims(tensor_b, axis=2) * weights
            # The Keras loss returns the negative cosine similarity, so negate it to get the similarity.
            return -tf.keras.losses.cosine_similarity(a_expanded, b_expanded, axis=-1)

        def full_match(sequence, last_step_other_sequence):
            # Compare every time step of `sequence` with the final time step of the other sequence.
            last_step_other_sequence_expanded = tf.repeat(tf.expand_dims(last_step_other_sequence, 1), tf.shape(sequence)[1], axis=1)
            return cosine_similarity(sequence, last_step_other_sequence_expanded)

        def maxpool_match(sequence_a, sequence_b):
            # For each step of sequence_a, keep the best score per perspective over all steps of sequence_b.
            # The Python loop is unrolled over the static sequence length.
            pooled_similarities = []
            for i in range(sequence_a.shape[1]):
                sequence_a_i = tf.repeat(tf.expand_dims(sequence_a[:, i, :], 1), sequence_b.shape[1], axis=1)
                similarity_scores = cosine_similarity(sequence_a_i, sequence_b)
                pooled_similarities.append(tf.reduce_max(similarity_scores, axis=1))
            return tf.stack(pooled_similarities, axis=1)

        full_match_premise = full_match(u, v[:, -1, :])
        full_match_hypothesis = full_match(v, u[:, -1, :])
        maxpool_premise = maxpool_match(u, v)
        maxpool_hypothesis = maxpool_match(v, u)

        return tf.concat([full_match_premise, full_match_hypothesis, maxpool_premise, maxpool_hypothesis], axis=-1)
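
The core operation inside `BiMPMMatching` is a weighted cosine similarity: for each perspective, both vectors are scaled element-wise by one row of `W` before the cosine is taken. The NumPy sketch below shows this for a single pair of vectors. `multi_perspective_cosine` is an illustrative name and the numbers are toy values, not weights from the trained model.

```python
import numpy as np

def multi_perspective_cosine(a, b, W, eps=1e-8):
    """Cosine similarity between a and b under each perspective (row) of W.

    a, b: vectors of shape (d,); W: matrix of shape (num_perspectives, d).
    Returns a vector of shape (num_perspectives,).
    """
    wa = W * a  # broadcast: row k holds W[k] * a
    wb = W * b
    dots = np.sum(wa * wb, axis=-1)
    norms = np.linalg.norm(wa, axis=-1) * np.linalg.norm(wb, axis=-1)
    return dots / np.maximum(norms, eps)

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
W = np.array([
    [1.0, 1.0],  # perspective 0: plain cosine similarity
    [1.0, 0.0],  # perspective 1: only the first dimension counts
])
scores = multi_perspective_cosine(a, b, W)
print(scores)
```

Under the first perspective the vectors are 45 degrees apart, while the second perspective ignores the dimension in which they differ and scores them as identical. In the layer itself the same computation is applied at every time step: full matching compares each step with the other sequence's final step, and max-pooled matching keeps the maximum score over all steps of the other sequence.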

### Model Builder

In [37]:
@register_keras_serializable()
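def split_premise_and_hypothesis(x):
    """Split the sequence into two equal halves along the time axis (axis=1).

    With the joint tokenization in `encode`, the halves are positional (the first
    and last 25 token positions) and need not coincide with the premise/hypothesis boundary.
    """
    return tf.split(x, num_or_size_splits=2, axis=1)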


def build_bimpm_model(hidden_size=128, num_perspectives=20, dropout_rate=0.4, learning_rate=3e-4):
    model_input = Input(shape=(50, 768))
    premise, hypothesis = Lambda(split_premise_and_hypothesis, name="split_input")(model_input)

    encode = Bidirectional(LSTM(hidden_size, return_sequences=True))
    premise_encoded = encode(premise)
    hypothesis_encoded = encode(hypothesis)

    matching_layer = BiMPMMatching(hidden_size, num_perspectives)
    matching_output = matching_layer([premise_encoded, hypothesis_encoded])

    aggregation = Bidirectional(LSTM(hidden_size, return_sequences=True))(matching_output)
    average_pooling = GlobalAveragePooling1D()(aggregation)
    max_pooling = GlobalMaxPooling1D()(aggregation)
    pooled = Concatenate()([average_pooling, max_pooling])

    features = Dropout(dropout_rate)(pooled)
    features = Dense(hidden_size, activation='relu')(features)
    features = Dropout(dropout_rate)(features)

    output = Dense(2, activation='softmax')(features)

    model = Model(inputs=model_input, outputs=output)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
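
To follow how tensor shapes evolve through `build_bimpm_model`, the helper below traces them with plain arithmetic, without building the network. `trace_shapes` is a hypothetical helper written for this illustration; its defaults mirror the builder's defaults and the hard-coded `(50, 768)` input.

```python
def trace_shapes(seq_len=50, embed_dim=768, hidden_size=128, num_perspectives=20):
    """Return (stage, per-example shape) pairs for the BiMPM model above."""
    half = seq_len // 2  # tf.split with 2 splits along the time axis
    return [
        ("input", (seq_len, embed_dim)),
        ("premise / hypothesis halves", (half, embed_dim)),
        # A Bidirectional LSTM concatenates forward and backward states.
        ("BiLSTM encoding", (half, 2 * hidden_size)),
        # Four matching outputs (2 full-match, 2 max-pool), one score per perspective each.
        ("BiMPM matching", (half, 4 * num_perspectives)),
        ("BiLSTM aggregation", (half, 2 * hidden_size)),
        # Average pooling and max pooling over time, concatenated.
        ("pooled features", (4 * hidden_size,)),
        ("dense", (hidden_size,)),
        ("softmax output", (2,)),
    ]

for stage, shape in trace_shapes():
    print(f"{stage:30s} {shape}")
```

Calling `model.summary()` on the built model should report the same shapes with a leading batch dimension.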

## Hyperparameter Tuning

In [38]:
def optimize_bimpm(trial):
    # Hyperparameters to optimize
    hidden_size = trial.suggest_categorical("hidden_size", [64, 128])
    num_perspectives = trial.suggest_categorical("num_perspectives", [10, 20])
    dropout_rate = trial.suggest_float("dropout_rate", 0.2, 0.5)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32])

    model = build_bimpm_model(hidden_size, num_perspectives, dropout_rate, learning_rate)


    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)

    history = model.fit(
        train_embeddings, train_labels,
        validation_data=(dev_embeddings, dev_labels),
        epochs=10,
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=0
    )

    # Get the best validation accuracy
    best_validation_accuracy = max(history.history["val_accuracy"])

    # free memory between trials
    tf.keras.backend.clear_session()
    gc.collect()

    return best_validation_accuracy

## Train and Save Best Model

In [39]:
if os.path.exists(model_path):
    print("Found existing model. Loading...")

    model = tf.keras.models.load_model(
        model_path,
        custom_objects={
            'BiMPMMatching': BiMPMMatching,
            'split_premise_and_hypothesis': split_premise_and_hypothesis
        }
    )
else:
    print("No saved model found. Running Optuna and training a new model...")

    study = optuna.create_study(direction="maximize")
    study.optimize(optimize_bimpm, n_trials=15)

    print("Best Hyperparameters:", study.best_params)

    best_hyperparameters = study.best_params
    model = build_bimpm_model(
        hidden_size=best_hyperparameters["hidden_size"],
        num_perspectives=best_hyperparameters["num_perspectives"],
        dropout_rate=best_hyperparameters["dropout_rate"],
        learning_rate=best_hyperparameters["learning_rate"]
    )

    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
    model.fit(train_embeddings, train_labels, validation_data=(dev_embeddings, dev_labels),
              epochs=15, batch_size=best_hyperparameters["batch_size"], callbacks=[early_stop])
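    # Make sure the target directory exists before saving.
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    model.save(model_path)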
    print("Model trained and saved to:", model_path)

Found existing model. Loading...


## Evaluation on Dev Set

In [40]:
preds = model.predict(dev_embeddings).argmax(axis=1)
print(classification_report(dev_labels, preds, digits=4))

[1m211/211[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 38ms/step
              precision    recall  f1-score   support

           0     0.7466    0.7397    0.7431      3258
           1     0.7583    0.7648    0.7615      3478

    accuracy                         0.7527      6736
   macro avg     0.7524    0.7523    0.7523      6736
weighted avg     0.7526    0.7527    0.7526      6736
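
As a sanity check on the report above, the F1 scores and aggregate rows can be reproduced from the per-class precision, recall, and support with a few lines of arithmetic. The numbers are copied from the table; since they are already rounded to four decimals, agreement is only expected to that precision.

```python
# Per-class figures copied from the classification report above.
report = {
    0: {"precision": 0.7466, "recall": 0.7397, "support": 3258},
    1: {"precision": 0.7583, "recall": 0.7648, "support": 3478},
}

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

f1_scores = {c: f1(r["precision"], r["recall"]) for c, r in report.items()}
total = sum(r["support"] for r in report.values())

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: mean over classes weighted by support.
weighted_f1 = sum(f1_scores[c] * report[c]["support"] for c in report) / total
# Accuracy equals the support-weighted mean of per-class recall.
accuracy = sum(r["recall"] * r["support"] for r in report.values()) / total

print({c: round(v, 4) for c, v in f1_scores.items()})
print(round(macro_f1, 4), round(weighted_f1, 4), round(accuracy, 4))
```

The macro and weighted averages are close because the two classes have similar support (3258 versus 3478 examples).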

