<a href="https://www.kaggle.com/code/mennatullaheisawy/spam-mail-detection-using-rnn-lstm-gru-99-2?scriptVersionId=191789892" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 1. Introduction
In this project, we build and compare three recurrent architectures, a simple Recurrent Neural Network (RNN), an LSTM (Long Short-Term Memory), and a GRU (Gated Recurrent Unit), to classify messages as spam or ham (not spam). The dataset contains SMS messages labeled 'spam' or 'ham'.

# 2. Import Libraries
In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense, Dropout

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

We begin by importing the necessary libraries for data processing, model building, and evaluation. Libraries like TensorFlow are used for building RNN models, while scikit-learn is used for preprocessing and evaluation.

**Educational Content**:

**Tokenization**: This is the process of converting text into tokens, which are the smallest units of text (like words or characters). In neural networks, tokens are then converted into sequences that can be fed into the models.
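
As a rough illustration of what word-level tokenization produces, the sketch below builds a vocabulary in plain Python and maps each message to a list of integer ids. It is a simplified stand-in for the Keras `Tokenizer` used later, not its actual implementation; the helper names (`split_words`, `build_word_index`, `texts_to_ids`) and the two sample messages are made up for this example.

```python
import re
from collections import Counter


def split_words(text):
    # Lowercase the text and keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(texts):
    # More frequent words get smaller ids; id 0 is left free for padding.
    counts = Counter(word for text in texts for word in split_words(text))
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}


def texts_to_ids(texts, word_index):
    # Words missing from the vocabulary are skipped.
    return [[word_index[w] for w in split_words(t) if w in word_index] for t in texts]


messages = ["Free entry to win", "Ok, call me to win"]  # hypothetical sample messages
word_index = build_word_index(messages)
sequences = texts_to_ids(messages, word_index)
print(word_index)
print(sequences)  # [[5, 4, 1, 2], [7, 3, 6, 1, 2]]
```

The two words that occur twice ("to" and "win") receive the smallest ids, which is the same frequency-based ordering idea the real tokenizer relies on.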

**RNNs, LSTM, and GRU**: RNNs are neural networks well-suited for sequential data like text. LSTMs and GRUs are variants of RNNs designed to mitigate the vanishing gradient problem, allowing them to capture longer-term dependencies.
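
To make the vanishing gradient problem a little more concrete, here is a toy sketch with a single-unit recurrent cell. It is not one of the models trained in this notebook; the function name `gradient_through_time` and the values of `w`, `u`, and `x` are assumptions chosen only for illustration. The gradient of the last hidden state with respect to the first is a product of one factor per time step, so it shrinks quickly when each factor is smaller than 1.

```python
import math


def gradient_through_time(w, steps, x=1.0, u=1.0):
    # Scalar RNN: h_t = tanh(w * h_{t-1} + u * x).
    # By the chain rule, d h_T / d h_0 is the product of w * (1 - h_t^2) over all steps.
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)
    return grad


for steps in (1, 10, 50):
    print(steps, abs(gradient_through_time(w=0.5, steps=steps)))
```

The printed magnitude drops by many orders of magnitude as the number of steps grows. The gates in LSTM and GRU cells are designed to ease this effect.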

# 3. Loading and Exploring the Data

In [15]:
df = pd.read_csv('/kaggle/input/spam-emails/spam.csv')
df.head()

| | Category | Message |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


We start by loading the dataset and displaying the first few rows to understand its structure.

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Category  5572 non-null   object
 1   Message   5572 non-null   object
dtypes: object(2)
memory usage: 87.2+ KB


In [17]:
df.describe()

| | Category | Message |
|---|---|---|
| count | 5572 | 5572 |
| unique | 2 | 5157 |
| top | ham | Sorry, I'll call later |
| freq | 4825 | 30 |


**Data Understanding**: 

The dataset contains two columns: Category, which is the target variable (ham or spam), and Message, which contains the text data to be classified.
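
The `df.describe()` output also shows that the classes are imbalanced. The short calculation below, which uses only the counts reported there (before duplicates are removed), gives a rough baseline: the accuracy a model would reach by always predicting ham. The variable names are illustrative.

```python
total_messages = 5572  # "count" in df.describe()
ham_messages = 4825    # "freq" of the most common category, ham

spam_messages = total_messages - ham_messages
majority_baseline = ham_messages / total_messages

print(spam_messages)                 # 747
print(f"{majority_baseline:.1%}")    # share of ham messages
```

Because ham makes up roughly 87% of the raw data, accuracy alone can look high even for a weak model, which is why precision and recall on the spam class are reported later.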

# 4. Data Cleaning and Preprocessing

### 4.1 Checking for Duplicates

In [18]:
df.duplicated().sum()

415

### 4.2 Removing Duplicates

In [19]:
df.drop_duplicates(inplace=True)

### 4.3 Label Encoding

In [20]:
encoder = LabelEncoder()
df['Category'] = encoder.fit_transform(df['Category'])
df.head()

| | Category | Message |
|---|---|---|
| 0 | 0 | Go until jurong point, crazy.. Available only ... |
| 1 | 0 | Ok lar... Joking wif u oni... |
| 2 | 1 | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | 0 | U dun say so early hor... U c already then say... |
| 4 | 0 | Nah I don't think he goes to usf, he lives aro... |


Here, we convert the categorical labels (ham and spam) into numeric form using Label Encoding. This step is necessary because most machine learning models work with numerical data.

**Educational Content**:

**Label Encoding**: Label encoding converts categorical labels into integers by assigning a unique number to each class. `LabelEncoder` numbers the classes in sorted order, so here ham becomes 0 and spam becomes 1, as the output above shows.
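
A minimal sketch of the same idea in plain Python is shown below. It mirrors the sorted-class ordering that produces ham = 0 and spam = 1 in the output above, but it is only an illustration; the helper name `fit_label_mapping` and the sample labels are hypothetical.

```python
def fit_label_mapping(labels):
    # Sort the distinct class names and number them from 0.
    classes = sorted(set(labels))
    return {name: code for code, name in enumerate(classes)}


labels = ["ham", "ham", "spam", "ham"]  # made-up sample labels
mapping = fit_label_mapping(labels)
encoded = [mapping[name] for name in labels]

print(mapping)  # {'ham': 0, 'spam': 1}
print(encoded)  # [0, 0, 1, 0]
```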

# 5. Data Preparation for Model Training

In [21]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df['Message'])
sequences = tokenizer.texts_to_sequences(df['Message'])

max_len = max([len(seq) for seq in sequences])
X = pad_sequences(sequences, maxlen= max_len)
y = df['Category']

vocabs = len(tokenizer.word_index)+1

In this step, we convert the text data into sequences of tokens and then pad them to ensure that all sequences have the same length. This preprocessing step is crucial for feeding text data into RNN models.


**Educational Content**

**Padding**: Padding sequences ensures that all input sequences are of the same length, which is necessary for batch processing in neural networks.
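
The sketch below shows the effect of padding on two short id sequences. It assumes the default behavior of `pad_sequences` (zeros added at the front, and over-long sequences trimmed from the front); `pad_left` is a hypothetical helper written for this example, not the Keras function itself.

```python
def pad_left(sequences, maxlen, value=0):
    # Left-pad with `value`; when a sequence is too long, keep its last `maxlen` ids.
    padded = []
    for seq in sequences:
        trimmed = seq[-maxlen:]
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded


print(pad_left([[5, 4, 1, 2], [7, 3]], maxlen=4))  # [[5, 4, 1, 2], [0, 0, 7, 3]]
print(pad_left([[1, 2, 3, 4, 5]], maxlen=3))       # [[3, 4, 5]]
```

In this notebook `max_len` is the length of the longest message, so nothing is trimmed and shorter messages are simply filled with zeros.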

In [22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)

# 6. Building and Training the Models

## Simple RNN Model

In [32]:
rnn_model= Sequential([
    Embedding(vocabs, 64, input_length = max_len ),
    SimpleRNN(128),
    Dense(1, activation='sigmoid')
])

rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])



## LSTM Model

In [33]:
lstm_model= Sequential([
    Embedding(vocabs, 64, input_length = max_len ),
    LSTM(128),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

## GRU Model

In [34]:
gru_model= Sequential([
    Embedding(vocabs, 64, input_length = max_len ),
    GRU(128),
    Dropout(0.4),
    Dense(1, activation='sigmoid')
])

gru_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 7. Model Evaluation

## Training and Testing the Models

In [35]:
from tensorflow.keras.callbacks import EarlyStopping

rnn_cb = EarlyStopping(patience=5, restore_best_weights=True)
lstm_cb = EarlyStopping(patience=5, restore_best_weights=True)
gru_cb = EarlyStopping(patience=5, restore_best_weights=True)

In [36]:
# `callbacks` is documented as a list, so each callback is wrapped in one.
history_rnn = rnn_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), batch_size=16, callbacks=[rnn_cb])
history_lstm = lstm_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), batch_size=16, callbacks=[lstm_cb])
history_gru = gru_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), batch_size=16, callbacks=[gru_cb])

Epoch 1/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 15s 49ms/step - accuracy: 0.8651 - loss: 0.3688 - val_accuracy: 0.9767 - val_loss: 0.0858
Epoch 2/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 12s 48ms/step - accuracy: 0.9870 - loss: 0.0479 - val_accuracy: 0.9884 - val_loss: 0.0457
Epoch 3/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.9977 - loss: 0.0087 - val_accuracy: 0.9903 - val_loss: 0.0416
Epoch 4/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 12s 47ms/step - accuracy: 0.9996 - loss: 0.0021 - val_accuracy: 0.9835 - val_loss: 0.0520
Epoch 5/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 13s 48ms/step - accuracy: 0.9999 - loss: 8.8069e-04 - val_accuracy: 0.9816 - val_loss: 0.0540
Epoch 6/10
258/258 ━━━━━━━━━━━━━━━━━━━━ 12s 46ms/step - accuracy: 0.9997 - loss: 7.7973e-04 - val_accuracy: 0.9893 - val_loss: 0.0526
Epoch 7/

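We train each model on the training data and monitor it on the test data, which is passed as `validation_data`. Training runs for up to 10 epochs with a batch size of 16. Each `EarlyStopping` callback halts training if the validation loss fails to improve for 5 consecutive epochs and then restores the weights from the best epoch.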
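
The `EarlyStopping` callbacks defined earlier use `patience=5` and `restore_best_weights=True`. The sketch below imitates that stopping rule in plain Python on the validation losses from the six complete epochs in the training log shown above. It is a simplified model of the callback, not its implementation, and `early_stopping_trace` is a hypothetical helper.

```python
def early_stopping_trace(val_losses, patience):
    # Track the best epoch so far and stop once `patience` epochs pass without improvement.
    best_epoch, best_loss, waited = None, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch  # training would stop at `epoch`
    return best_epoch, None  # the stopping rule never triggered


# Validation losses of epochs 1 to 6 from the training log.
logged = [0.0858, 0.0457, 0.0416, 0.0520, 0.0540, 0.0526]

print(early_stopping_trace(logged, patience=5))  # (3, None)
print(early_stopping_trace(logged, patience=2))  # (3, 5)
```

With a patience of 5, these six epochs are not enough to trigger a stop, but epoch 3 is the best one so far, and its weights are the ones that would be restored if no later epoch improves on it.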



**Educational Content**

**Epochs and Batch Size**: An epoch is one complete pass through the entire training dataset. Batch size is the number of training examples processed in one iteration, that is, in one weight update.
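
The `258/258` shown in the training log can be reproduced from numbers reported earlier in the notebook. The calculation below is a sketch that assumes the test share is rounded up to a whole number of rows, which agrees with the 1032 test messages in the classification reports.

```python
import math

total_rows = 5572      # rows reported by df.info()
duplicate_rows = 415   # rows reported by df.duplicated().sum()
test_fraction = 0.2    # test_size passed to train_test_split
batch_size = 16

rows_after_dedup = total_rows - duplicate_rows
test_rows = math.ceil(rows_after_dedup * test_fraction)  # assumed rounding rule
train_rows = rows_after_dedup - test_rows
steps_per_epoch = math.ceil(train_rows / batch_size)

print(train_rows, test_rows, steps_per_epoch)  # 4125 1032 258
```

Each epoch therefore consists of 258 weight updates, the last of which uses a smaller, partial batch.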

### Performance Metrics

In [37]:
y_pred_rnn = rnn_model.predict(X_test)
y_pred_lstm = lstm_model.predict(X_test)
y_pred_gru = gru_model.predict(X_test)

print("RNN Model Accuracy: ", accuracy_score(y_test, y_pred_rnn.round()))
print("LSTM Model Accuracy: ", accuracy_score(y_test, y_pred_lstm.round()))
print("GRU Model Accuracy: ", accuracy_score(y_test, y_pred_gru.round()))


33/33 ━━━━━━━━━━━━━━━━━━━━ 1s 29ms/step
33/33 ━━━━━━━━━━━━━━━━━━━━ 3s 74ms/step
33/33 ━━━━━━━━━━━━━━━━━━━━ 2s 52ms/step
RNN Model Accuracy:  0.9903100775193798
LSTM Model Accuracy:  0.9922480620155039
GRU Model Accuracy:  0.9903100775193798


Here, we predict the test data using the trained models and calculate the accuracy. The performance of each model is compared to identify the best model for this task.
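
Each model ends in a sigmoid unit, so `predict` returns a probability between 0 and 1, and calling `.round()` on it amounts to using a decision threshold of 0.5. The sketch below makes that threshold explicit; `to_labels` and the probabilities are made up for illustration.

```python
def to_labels(probabilities, threshold=0.5):
    # Label a message as spam (1) when its predicted probability exceeds the threshold.
    return [1 if p > threshold else 0 for p in probabilities]


probs = [0.02, 0.49, 0.51, 0.97]  # hypothetical sigmoid outputs

print(to_labels(probs))                 # [0, 0, 1, 1]
print(to_labels(probs, threshold=0.9))  # [0, 0, 0, 1]
```

Raising the threshold makes the classifier more conservative about flagging spam, which trades recall for precision.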

### Confusion Matrix and Classification Report


#### 1. RNN

In [38]:
print(confusion_matrix(y_test, y_pred_rnn.round()))
print(classification_report(y_test, y_pred_rnn.round()))

[[896   0]
 [ 10 126]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       896
           1       1.00      0.93      0.96       136

    accuracy                           0.99      1032
   macro avg       0.99      0.96      0.98      1032
weighted avg       0.99      0.99      0.99      1032



#### 2. LSTM

In [39]:
print(confusion_matrix(y_test, y_pred_lstm.round()))
print(classification_report(y_test, y_pred_lstm.round()))

[[893   3]
 [  5 131]]
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       896
           1       0.98      0.96      0.97       136

    accuracy                           0.99      1032
   macro avg       0.99      0.98      0.98      1032
weighted avg       0.99      0.99      0.99      1032



#### 3. GRU

In [40]:
print(confusion_matrix(y_test, y_pred_gru.round()))
print(classification_report(y_test, y_pred_gru.round()))

[[893   3]
 [  7 129]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       896
           1       0.98      0.95      0.96       136

    accuracy                           0.99      1032
   macro avg       0.98      0.97      0.98      1032
weighted avg       0.99      0.99      0.99      1032



We evaluate the models using confusion matrices and classification reports, which provide more insights into the model performance by showing precision, recall, and F1-score.
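
The spam-class figures in the reports can be recomputed directly from the three confusion matrices printed above, whose rows are the true classes and whose columns are the predicted classes. The helper `spam_metrics` below is a small sketch written for this comparison.

```python
def spam_metrics(matrix):
    # matrix = [[TN, FP], [FN, TP]] with ham as class 0 and spam as class 1.
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    return precision, recall, f1, accuracy


# Confusion matrices copied from the outputs above.
matrices = {
    "RNN": [[896, 0], [10, 126]],
    "LSTM": [[893, 3], [5, 131]],
    "GRU": [[893, 3], [7, 129]],
}

for name, matrix in matrices.items():
    precision, recall, f1, accuracy = spam_metrics(matrix)
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} "
          f"f1={f1:.3f} accuracy={accuracy:.4f}")
```

The Simple RNN never flags a ham message as spam but misses 10 spam messages, while the LSTM misses only 5 at the cost of 3 false alarms.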

# 8. Conclusion

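In this notebook, we built and compared three RNN-based models for spam detection: Simple RNN, LSTM, and GRU. The **LSTM** achieved the highest test accuracy, **99.2%**, compared with 99.0% for both the Simple RNN and the GRU. It also had the best spam recall (0.96) and F1-score (0.97), while the Simple RNN had the highest spam precision (1.00), with no ham messages flagged as spam.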
