Models Used:
Multi-Layer Perceptron (MLP)
Convolutional Neural Network (CNN)
Long Short-Term Memory (LSTM)
Recurrent Neural Network (RNN)
Steps:
Data Loading and Preprocessing:

Load the dataset.
Handle missing values in the 'comment' column by filling them with an empty string.
Encode the target labels using LabelEncoder.
Vectorize the text data using TF-IDF with a maximum of 5000 features.
Split the data into training and testing sets.
Scale the features to unit variance with StandardScaler (with_mean=False, so the values are not centered).
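The preprocessing steps above can be sketched on a toy corpus. The texts and labels here are invented for illustration only. One deliberate difference from the list above: this sketch splits the data before fitting TF-IDF, so the vocabulary and IDF weights are learned from the training rows alone. That ordering is an assumption about good practice, not what the notebook does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data, invented for this example only
texts = ["great product", "terrible service", None, "great service",
         "terrible product", "really great", "really terrible", "great great"]
labels = ["pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos"]

# Fill missing comments with an empty string
texts = [t if t is not None else "" for t in texts]

# Encode string labels as integers (classes are sorted, so neg=0 and pos=1)
y = LabelEncoder().fit_transform(labels)

# Split first, then fit the vectorizer on the training texts only
train_txt, test_txt, y_train, y_test = train_test_split(
    texts, y, test_size=0.25, random_state=42
)

tfidf = TfidfVectorizer(max_features=5000)
X_train = tfidf.fit_transform(train_txt)  # sparse matrix
X_test = tfidf.transform(test_txt)

# Scale to unit variance; with_mean=False keeps sparse input sparse
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train).toarray()
X_test = scaler.transform(X_test).toarray()

print(X_train.shape, X_test.shape)
```

Because the test texts are only passed through `transform`, any word that appears solely in the test split is ignored rather than added to the vocabulary.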
Model Building:

Define a function that builds and compiles a model of the specified type (MLP, CNN, LSTM, or RNN).
Add the layers for that type (e.g., Dense layers for MLP, Conv1D for CNN), followed by a single sigmoid output unit for binary classification.
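The model types differ in the input shape they expect: Dense layers take 2D input of shape (samples, features), while Conv1D, LSTM, and SimpleRNN take 3D input of shape (samples, steps, channels). A minimal NumPy sketch of the reshape, using a small hypothetical feature matrix in place of the TF-IDF output:

```python
import numpy as np

# Hypothetical batch: 4 samples with 6 features each
X = np.arange(24, dtype="float32").reshape(4, 6)

# Dense layers consume 2D input: (samples, features)
X_mlp = X

# Conv1D, LSTM and SimpleRNN consume 3D input: (samples, steps, channels).
# Here each feature becomes one step with a single channel.
X_seq = X[..., np.newaxis]

print(X_mlp.shape)  # (4, 6)
print(X_seq.shape)  # (4, 6, 1)
```

Note that TF-IDF columns are ordered by vocabulary term, not by word position in the comment, so the "steps" axis here does not represent the original word order.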
Model Training and Evaluation:

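Search over the model types and their hyperparameters with Keras Tuner's Hyperband, training each trial for at most 10 epochs with the default batch size of 32.
Retrain the best configuration with early stopping, predict on the test data, and compute the accuracy score.
Print the test accuracy and the best hyperparameters.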
Comparison:
Models are compared based on their accuracy scores, which measure the proportion of correctly classified instances out of the total instances.
The performance of each model is evaluated to determine which one achieves the highest classification accuracy.
Key Differences Between F1 Score and Accuracy:

F1 Score: The harmonic mean of precision and recall, making it more informative than accuracy on imbalanced datasets.
Accuracy: Measures the overall proportion of correct predictions, making it suitable for balanced datasets where false positives and false negatives are equally costly.
Both metrics provide valuable insights into model performance, and choosing between them depends on the specific requirements and characteristics of the classification task.
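A small worked example may make the difference concrete. The labels below are made up: 9 negatives and 1 positive. The helper functions `accuracy` and `f1` are written out by hand for illustration; in practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Imbalanced toy labels: 9 negatives, 1 positive
y_true = [0] * 9 + [1]

# Classifier A always predicts the majority class
pred_a = [0] * 10
# Classifier B finds the positive but also raises one false alarm
pred_b = [0] * 8 + [1, 1]

print(accuracy(y_true, pred_a), f1(y_true, pred_a))  # 0.9 0.0
print(accuracy(y_true, pred_b), round(f1(y_true, pred_b), 3))  # 0.9 0.667
```

Both classifiers reach 90% accuracy, but only the F1 score reveals that the first one never detects the positive class.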

In [1]:
pip install keras-tuner --upgrade





In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Conv1D, Flatten, LSTM, SimpleRNN
import keras_tuner as kt

# Load the dataset
data = pd.read_csv('cleaned_balanced_dataset_FINAL.csv')

# Handle missing values in the 'comment' column
data['comment'] = data['comment'].fillna('')

# Encode target labels if necessary
label_column = 'label'
label_encoder = LabelEncoder()
data[label_column] = label_encoder.fit_transform(data[label_column])

# Text Vectorization using TF-IDF
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(data['comment']).toarray()

# Split data into features and target
y = data[label_column]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the hypermodel
def build_hypermodel(hp):
    model = Sequential()
    model_type = hp.Choice('model_type', ['MLP', 'CNN', 'LSTM', 'RNN'])
    if model_type == 'CNN':
        model.add(Input(shape=(X_train.shape[1], 1)))
        model.add(Conv1D(filters=hp.Int('filters', min_value=32, max_value=128, step=32), kernel_size=hp.Choice('kernel_size', [3, 5, 7]), activation='relu'))
        model.add(Flatten())
    else:
        model.add(Input(shape=(X_train.shape[1],)))
        if model_type == 'MLP':
            model.add(Dense(units=hp.Int('units_mlp', min_value=32, max_value=512, step=32), activation='relu'))
            model.add(Dense(units=hp.Int('units_mlp2', min_value=32, max_value=512, step=32), activation='relu'))
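        elif model_type == 'LSTM':
            # Recurrent layers need 3D input (batch, steps, channels), so add a channel axis
            model.add(tf.keras.layers.Reshape((X_train.shape[1], 1)))
            model.add(LSTM(units=hp.Int('units_lstm', min_value=32, max_value=256, step=32)))
        elif model_type == 'RNN':
            model.add(tf.keras.layers.Reshape((X_train.shape[1], 1)))
            model.add(SimpleRNN(units=hp.Int('units_rnn', min_value=32, max_value=256, step=32)))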
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Reshape data for CNN
X_train_cnn = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test_cnn = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

# Initialize the tuner
tuner = kt.Hyperband(
    build_hypermodel,
    objective='val_accuracy',
    max_epochs=10,
    factor=3,
    directory='hyperband',
    project_name='text_classification'
)

# Define a callback to stop training early if the validation loss does not improve
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

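# Run the tuner search
# Note: Hyperband sets the epoch count of each trial itself (at most max_epochs=10),
# so the epochs argument below does not control trial length
tuner.search(X_train, y_train, epochs=50, validation_split=0.2, callbacks=[stop_early])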

# Get the best hyperparameters and model
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.hypermodel.build(best_hps)

# Train the best model (the CNN variant is fed the reshaped 3D arrays)
is_cnn = best_hps.get('model_type') == 'CNN'
X_fit, X_eval = (X_train_cnn, X_test_cnn) if is_cnn else (X_train, X_test)
best_model.fit(X_fit, y_train, epochs=50, validation_split=0.2, callbacks=[stop_early])

# Evaluate the model: threshold the sigmoid outputs at 0.5 and flatten to 1D
y_pred = (best_model.predict(X_eval) > 0.5).astype("int32").ravel()

accuracy = accuracy_score(y_test, y_pred)
print(f"Best Model Accuracy: {accuracy}")

# Display the best hyperparameters
print(f"Best hyperparameters: {best_hps.values}")


Trial 9 Complete [00h 00m 51s]

Best val_accuracy So Far: 0.6554674506187439
Total elapsed time: 00h 19m 22s

Search: Running Trial #10

Value             |Best Value So Far |Hyperparameter
CNN               |CNN               |model_type
192               |32                |units_mlp
64                |384               |units_mlp2
64                |224               |units_rnn
32                |128               |filters
3                 |5                 |kernel_size
2                 |2                 |tuner/epochs
0                 |0                 |tuner/initial_epoch
2                 |2                 |tuner/bracket
0                 |0                 |tuner/round
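
The `tuner/epochs`, `tuner/bracket`, and `tuner/round` rows in the log are bookkeeping values that Hyperband adds to each trial. A rough sketch of how the epoch budget grows across rounds, assuming the successive-halving rule epochs = ceil(max_epochs / factor^(bracket - round)). This formula and the helper name `epochs_for` are assumptions made for illustration, not taken from the notebook, although the result for bracket 2, round 0 matches the logged value of 2 epochs.

```python
import math


def epochs_for(max_epochs, factor, bracket, round_):
    """Epoch budget for one round of a bracket (assumed schedule)."""
    return math.ceil(max_epochs / factor ** (bracket - round_))


# Settings used when the tuner was initialized
max_epochs, factor = 10, 3

# Budgets for the three rounds of bracket 2
for round_ in range(3):
    print(round_, epochs_for(max_epochs, factor, 2, round_))
# 0 2
# 1 4
# 2 10
```

Under this schedule, early rounds train many configurations briefly and only the best ones are carried forward to the longer rounds.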

