The code aims to load a text dataset, preprocess it, and compare the performance of four different deep learning models by evaluating their F1 scores. The models compared are:

Bidirectional LSTM
GRU
1D Convolutional LSTM
Attention-based LSTM
Each model is trained for 2 epochs on the training split of a 10,000-record sample, and its performance is evaluated using the F1 score on the held-out test split.

Step-by-Step Breakdown
1. Importing Necessary Libraries
The code starts by importing libraries for data processing, model building, training, and evaluation. These include pandas, numpy, scikit-learn, and TensorFlow with Keras.

2. Loading and Preprocessing the Data
Loading the Dataset: The dataset is loaded from a CSV file.
Handling Missing Values: Missing values in the 'comment' column are filled with an empty string to ensure no null values are present.
Reducing Dataset Size: To manage memory constraints, a random sample of 10,000 records is taken from the dataset to make the computations more feasible.
Encoding Target Labels: The target labels are encoded using LabelEncoder to convert categorical labels into numerical format.
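Text Vectorization: The text data in the 'comment' column is vectorized using TF-IDF (Term Frequency-Inverse Document Frequency), keeping at most 1,000 features, to convert each comment into a fixed-length numerical vector. The vectorizer is fitted on the full 10,000-record sample before the split, so its vocabulary and document frequencies also reflect the test comments.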
Splitting the Data: The dataset is split into training and testing sets to evaluate the models' performance on unseen data.
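Standardizing the Data: The feature data is scaled using StandardScaler with with_mean=False, which divides each feature by its standard deviation without subtracting the mean. The result has unit variance but not zero mean, and zero TF-IDF entries stay at zero. The scaler is fitted on the training set only and then applied to the test set. Scaling of this kind can help training converge faster.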
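The effect of scaling without centering can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation: the helper name scale_without_centering and the toy matrices are hypothetical, and the sketch assumes the population standard deviation (ddof=0) with constant columns left unchanged.

```python
import numpy as np

def scale_without_centering(train, test):
    """Divide each column by its training-set standard deviation; no mean is subtracted."""
    std = train.std(axis=0)      # population standard deviation (ddof=0), per column
    std[std == 0.0] = 1.0        # constant columns are left unchanged
    return train / std, test / std

# Toy stand-in for a TF-IDF matrix: rows are documents, columns are terms.
train = np.array([[0.0, 0.5, 0.0],
                  [0.2, 0.0, 0.0],
                  [0.0, 0.1, 0.0]])
test = np.array([[0.1, 0.0, 0.0]])

train_scaled, test_scaled = scale_without_centering(train, test)
print(train_scaled.std(axis=0))    # non-constant columns now have unit standard deviation
print((train_scaled == 0).sum())   # entries that were zero are still zero
```

The statistics come from the training rows only and are reused for the test rows, which mirrors the fit_transform / transform pairing in the notebook.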

3. Defining the Models
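Bidirectional LSTM: This model wraps a 64-unit LSTM layer in Bidirectional, so the input is read both forward and backward and the two outputs are concatenated. It is particularly useful for sequence data where context from both directions can be beneficial.
GRU (Gated Recurrent Unit): This model uses a single 64-unit GRU layer. A GRU is an alternative to the LSTM with fewer parameters, which usually makes it faster to train.
1D Convolutional LSTM: Despite the name, this model is built on a ConvLSTM2D layer (32 filters, 3x3 kernel), followed by Flatten and Dense layers. ConvLSTM2D replaces the matrix multiplications inside an LSTM with convolutions, so it can capture spatial and temporal dependencies. Here the input has a single time step, so the layer acts mainly as a gated convolution over the feature axis.
Attention-based LSTM: This model uses a 64-unit LSTM layer that returns its full output sequence, followed by a custom attention layer and global average pooling. The attention layer helps the model focus on important parts of the sequence by assigning different weights to different positions of the input.
All four models end in a single sigmoid unit and are compiled with the Adam optimizer and binary cross-entropy loss, so they assume a two-class label.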
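The arithmetic inside the custom attention layer can be sketched in NumPy to make the tensor shapes visible. This is an illustrative re-implementation under stated assumptions, not the Keras layer itself: the function names, the random inputs, and the small sizes (2 samples, 5 time steps, 4 units) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_forward(inputs, W, b):
    """inputs: (batch, timesteps, units); W: (units, units); b: (units,)"""
    q = inputs @ W + b                           # queries: a learned projection of the inputs
    scores = q @ np.swapaxes(inputs, -1, -2)     # (batch, timesteps, timesteps); keys are the raw inputs
    weights = softmax(scores, axis=-1)           # each row becomes a set of attention weights
    context = weights @ inputs                   # weighted sum of the values (also the raw inputs)
    return context, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 4))
W = rng.normal(scale=0.05, size=(4, 4))
b = np.zeros(4)

context, weights = attention_forward(x, W, b)
print(context.shape, weights.shape)   # (2, 5, 4) (2, 5, 5)
```

The score matrix is timesteps by timesteps for every sample, so with 1,000 time steps each sample produces a 1000 x 1000 weight matrix.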

4. Reshaping Data for Models
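ConvLSTM Data Reshaping: The data is reshaped to (samples, 1, features, 1, 1), that is, one time step, one row per TF-IDF feature, one column, and one channel, to match the 5D input that ConvLSTM2D expects.
LSTM, GRU, and Attention Data Reshaping: The data is reshaped to (samples, features, 1), so each TF-IDF feature becomes one time step carrying a single value. The time axis therefore runs over feature indices, not over word positions in the comment.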
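The two reshapes can be checked on a small placeholder array. The sample count of 4 and the all-zeros matrix are assumptions for illustration; only the resulting shapes matter.

```python
import numpy as np

n_samples, n_features = 4, 1000
X = np.zeros((n_samples, n_features), dtype=np.float32)   # placeholder for the scaled TF-IDF matrix

# Recurrent layers expect (batch, timesteps, features).
X_rnn = X.reshape(n_samples, n_features, 1)

# ConvLSTM2D expects (batch, time, rows, cols, channels) with the default channels_last layout.
X_conv_lstm = X.reshape(n_samples, 1, n_features, 1, 1)

print(X_rnn.shape)         # (4, 1000, 1)
print(X_conv_lstm.shape)   # (4, 1, 1000, 1, 1)
```

Neither reshape copies or reorders values; both only add axes of length 1 around the same 1,000 numbers per sample.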

5. Training and Evaluating Models
Model Training: Each model is trained on the training data for 2 epochs. Training involves feeding the data to the model, calculating the loss, and updating the model weights to minimize the loss.
Model Evaluation: After training, each model predicts a probability for every test sample; probabilities above 0.5 are mapped to class 1 and the rest to class 0. These predictions are then compared to the actual labels to calculate the F1 score, the harmonic mean of precision and recall. The F1 score is particularly useful for imbalanced datasets.
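The thresholding step can be sketched on its own. The probabilities below are made-up values shaped like the (samples, 1) array that model.predict returns for a single sigmoid output; they are not results from the notebook.

```python
import numpy as np

# Hypothetical sigmoid outputs, one row per test sample.
probs = np.array([[0.91], [0.48], [0.50], [0.07]])

# Strictly greater than 0.5 maps to class 1; ravel() flattens (samples, 1) to (samples,).
y_pred = (probs > 0.5).astype("int32").ravel()
print(y_pred)   # [1 0 0 0]
```

A probability of exactly 0.5 falls into class 0 because the comparison is strict.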

6. Displaying F1 Scores
The F1 scores of all models are printed and compared to evaluate their performance. The model with the highest F1 score is considered the best performing model for the given task.
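Picking the top scorer from a dictionary like f1_scores can be sketched as follows. The dictionary here holds only the two scores that are fully visible in the run output further down; the remaining entries are not reproduced, so the result is illustrative and not the final ranking.

```python
# Scores copied from the run output; the other two models are omitted here.
f1_scores = {
    "Bidirectional LSTM": 0.6293706293706294,
    "GRU": 0.6574882471457354,
}

# Sort from highest to lowest F1 and report the leader.
ranked = sorted(f1_scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.4f}")

best_model = ranked[0][0]
print("Best of the listed models:", best_model)
```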


The models are compared based on their F1 scores. The F1 score is the harmonic mean of precision (the number of true positives divided by the number of samples predicted positive, including those predicted positive incorrectly) and recall (the number of true positives divided by the number of samples that are actually positive). This metric is particularly useful for evaluating models on imbalanced datasets, where the number of positive and negative samples may not be equal and accuracy alone can be misleading. By comparing the F1 scores, we can determine which model best balances precision and recall.
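A small worked example makes the definitions concrete. The helper below is a hand-rolled sketch for binary labels, not the scikit-learn function, and the label lists are invented for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]           # balanced toy labels

cautious = [1, 1, 0, 0, 0, 0]         # misses one positive, no false alarms
p1, r1, f1_cautious = precision_recall_f1(y_true, cautious)
print(round(p1, 3), round(r1, 3), round(f1_cautious, 3))    # 1.0 0.667 0.8

always_positive = [1] * 6             # predicts class 1 for everything
p2, r2, f1_always = precision_recall_f1(y_true, always_positive)
print(round(p2, 3), round(r2, 3), round(f1_always, 3))      # 0.5 1.0 0.667
```

On a balanced set, a classifier that labels every sample positive reaches an F1 of about 0.667, so F1 values near that figure are worth reading alongside accuracy.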

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import f1_score
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Conv1D, Flatten, LSTM, GRU, Bidirectional, ConvLSTM2D, Layer, Activation
from tensorflow.keras.layers import LayerNormalization, MultiHeadAttention, GlobalAveragePooling1D, Reshape, Add

# Load the dataset
data = pd.read_csv('cleaned_balanced_dataset_FINAL.csv')

# Handle missing values in the 'comment' column
data['comment'] = data['comment'].fillna('')  # assign back; in-place fillna on a column selection is deprecated in recent pandas

# Reduce dataset size for memory efficiency (sample 10,000 records)
data = data.sample(n=10000, random_state=42)

# Encode target labels if necessary
label_column = 'label'
label_encoder = LabelEncoder()
data[label_column] = label_encoder.fit_transform(data[label_column])

# Text Vectorization using TF-IDF with fewer features
tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(data['comment']).toarray()

# Target vector (the feature matrix X was built above)
y = data[label_column]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features to unit variance; with_mean=False skips centering, so zero entries stay zero
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define a custom attention layer
class AttentionLayer(Layer):
    def __init__(self, **kwargs):
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], input_shape[-1]), initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[-1],), initializer='random_normal', trainable=True)
        super(AttentionLayer, self).build(input_shape)

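    def call(self, inputs):
        # inputs: (batch, timesteps, units)
        q = tf.matmul(inputs, self.W) + self.b     # queries: learned projection of the inputs
        k = inputs                                 # keys are the raw inputs
        v = inputs                                 # values are the raw inputs
        score = tf.matmul(q, k, transpose_b=True)  # (batch, timesteps, timesteps) similarity scores
        score = tf.nn.softmax(score, axis=-1)      # normalize each row into attention weights
        context = tf.matmul(score, v)              # weighted sum of values for each time step
        return context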

# Define the models
def build_bilstm_model(input_shape):
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(Bidirectional(LSTM(64)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

def build_gru_model(input_shape):
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(GRU(64))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

def build_conv_lstm_model(input_shape):
    model = Sequential()
    model.add(Input(shape=(None, input_shape[0], 1, 1)))
    model.add(ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same', return_sequences=False))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

def build_attention_lstm_model(input_shape):
    inputs = Input(shape=input_shape)
    lstm_out = LSTM(64, return_sequences=True)(inputs)
    attention_out = AttentionLayer()(lstm_out)
    attention_out = GlobalAveragePooling1D()(attention_out)
    outputs = Dense(1, activation='sigmoid')(attention_out)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Reshape data for ConvLSTM
X_train_conv_lstm = X_train.reshape((X_train.shape[0], 1, X_train.shape[1], 1, 1))
X_test_conv_lstm = X_test.reshape((X_test.shape[0], 1, X_test.shape[1], 1, 1))

# Reshape data for LSTM, GRU, and Attention models
X_train_rnn = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test_rnn = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Define input shapes for different models
input_shape_rnn = (X_train.shape[1], 1)
input_shape_conv_lstm = (X_train.shape[1],)

# Initialize models
models = {
    'Bidirectional LSTM': build_bilstm_model(input_shape_rnn),
    'GRU': build_gru_model(input_shape_rnn),
    '1D Convolutional LSTM': build_conv_lstm_model(input_shape_conv_lstm),
    'Attention-based LSTM': build_attention_lstm_model(input_shape_rnn)
}

# Train and evaluate models
f1_scores = {}
for model_name, model in models.items():
    print(f"Training {model_name} model...")
    if model_name == '1D Convolutional LSTM':
        model.fit(X_train_conv_lstm, y_train, epochs=2, batch_size=32, verbose=1)
        y_pred = (model.predict(X_test_conv_lstm) > 0.5).astype("int32")
    else:
        model.fit(X_train_rnn, y_train, epochs=2, batch_size=32, verbose=1)
        y_pred = (model.predict(X_test_rnn) > 0.5).astype("int32")
    f1 = f1_score(y_test, y_pred.ravel())  # flatten (samples, 1) predictions; binary F1 on label 1
    f1_scores[model_name] = f1
    print(f"{model_name} F1 Score: {f1}")

# Display F1 scores
print("F1 Scores for different models:")
for model_name, score in f1_scores.items():
    print(f"{model_name}: {score}")



Training Bidirectional LSTM model...
Epoch 1/2
250/250 ━━━━━━━━━━━━━━━━━━━━ 193s 708ms/step - accuracy: 0.5107 - loss: 0.6932
Epoch 2/2
250/250 ━━━━━━━━━━━━━━━━━━━━ 174s 696ms/step - accuracy: 0.5195 - loss: 0.6917
63/63 ━━━━━━━━━━━━━━━━━━━━ 20s 280ms/step
Bidirectional LSTM F1 Score: 0.6293706293706294
Training GRU model...
Epoch 1/2
250/250 ━━━━━━━━━━━━━━━━━━━━ 152s 568ms/step - accuracy: 0.4965 - loss: 0.6938
Epoch 2/2
250/250 ━━━━━━━━━━━━━━━━━━━━ 201s 562ms/step - accuracy: 0.5015 - loss: 0.6935
63/63 ━━━━━━━━━━━━━━━━━━━━ 14s 203ms/step
GRU F1 Score: 0.6574882471457354
Training 1D Convolutional LSTM model...
Epoch 1/2
250/250 ━━━━━━━━━━━━━━━━━━━━ 53s 161ms/step - accuracy: 0.5643 - loss: 0.6789
Epoch 2/2
[1m250/250[0m [32m━━━━━━━━━━━━━━━━━━━━[