# Orange SPAM detector

One of the main pain points AT\&T users face is constant exposure to spam messages.

AT\&T has been able to flag spam messages manually for some time, but it is now looking for an automated way to detect spam and protect its users.

Your goal is to build a spam detector that can automatically flag spam messages as they arrive, based solely on the SMS content.

## Summary

- PREPROCESSING

- MODELS : 
  - Simple RNN
  - GRU
  - LSTM

- CLASSIFICATION EVALUATION

- TRANSFER LEARNING

In [33]:
# Import TensorFlow, pathlib and the other libraries
import tensorflow as tf 
import pathlib 
import pandas as pd 
import os
import io
import warnings
warnings.filterwarnings('ignore')

In [34]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [35]:
# Import dataset with Pandas 
dataset = pd.read_csv("spam.csv", on_bad_lines="skip", encoding="ISO-8859-1")  # on_bad_lines replaces error_bad_lines in pandas >= 1.3
dataset.head()

| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| --- | --- | --- | --- | --- | --- |
| 0 | ham | Go until jurong point, crazy.. Available only ... | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | | | |


In [36]:
# Show the total number of rows, the dataset description and the number of missing values

print("Number of rows : {}".format(dataset.shape[0]))
print()

print("Percentage of missing values: ")
display(100*dataset.isnull().sum()/dataset.shape[0])

dataset.describe(include="all")

Number of rows : 5572

Percentage of missing values: 


v1             0.000000
v2             0.000000
Unnamed: 2    99.102656
Unnamed: 3    99.784637
Unnamed: 4    99.892319
dtype: float64

| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| --- | --- | --- | --- | --- | --- |
| count | 5572 | 5572 | 50 | 12 | 6 |
| unique | 2 | 5169 | 43 | 10 | 5 |
| top | ham | Sorry, I'll call later | bt not his girlfrnd... G o o d n i g h t . . .@" | MK17 92H. 450Ppw 16" | GNT:-)" |
| freq | 4825 | 30 | 3 | 2 | 2 |


In [37]:
# show the distribution between the classes to predict
dataset.v1.groupby(dataset.v1).count()

v1
ham     4825
spam     747
Name: v1, dtype: int64

In [38]:
# Let's take the columns we're interested in
dataset = dataset.loc[:,["v1", "v2"]]
dataset.head()

| | v1 | v2 |
| --- | --- | --- |
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


In [39]:
# transform the target names into numerical classes (ham = 0, spam = 1)
dataset['v1_transformed'] = dataset.v1.apply(lambda x : 0 if x == 'ham' else 1)
dataset.head()

| | v1 | v2 | v1_transformed |
| --- | --- | --- | --- |
| 0 | ham | Go until jurong point, crazy.. Available only ... | 0 |
| 1 | ham | Ok lar... Joking wif u oni... | 0 |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | 1 |
| 3 | ham | U dun say so early hor... U c already then say... | 0 |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | 0 |


In [40]:
# download all language elements related to the English language
!python -m spacy download en_core_web_sm -q

2023-01-14 09:57:59.108696: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
12.8/12.8 MB 78.3 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [41]:
# Import spaCy's English model and load the pipeline
import en_core_web_sm
nlp = en_core_web_sm.load()

In [42]:
# Import Stop words to remove useless words
from spacy.lang.en.stop_words import STOP_WORDS

In [43]:
# keep only alphanumeric characters and spaces
dataset["v2_clean"] = dataset["v2"].apply(lambda x: "".join(ch for ch in x if ch.isalnum() or ch == " "))

# collapse repeated whitespace, convert to lowercase and trim leading and trailing whitespace
# (str.replace(" +", " ") only matches the literal string " +", so split/join is used instead)
dataset["v2_clean"] = dataset["v2_clean"].apply(lambda x: " ".join(x.split()).lower())

# use spaCy to replace each token with its lemma_ and drop the stop words
dataset["v2_clean"] = dataset["v2_clean"].apply(lambda x: " ".join([token.lemma_ for token in nlp(x) if token.lemma_ not in STOP_WORDS and token.text not in STOP_WORDS]))

dataset.head()


| | v1 | v2 | v1_transformed | v2_clean |
| --- | --- | --- | --- | --- |
| 0 | ham | Go until jurong point, crazy.. Available only ... | 0 | jurong point crazy available bugis n great wor... |
| 1 | ham | Ok lar... Joking wif u oni... | 0 | ok lar joking wif u oni |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | 1 | free entry 2 wkly comp win fa cup final tkts 2... |
| 3 | ham | U dun say so early hor... U c already then say... | 0 | u dun early hor u c |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | 0 | nah think usf live |
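One detail of the cleaning step is worth checking by hand: `str.replace` treats its first argument as a literal string, not as a regular expression, so the pattern `" +"` never matches a run of spaces. The sketch below uses a made-up message (an assumption, not a row from the dataset) to compare the literal replacement with a regex-based one.

```python
import re

# Made-up message, used only for illustration.
raw = "Free  entry!!   Text WIN to 80086 "

# Keep alphanumeric characters and spaces, as in the cleaning cell.
kept = "".join(ch for ch in raw if ch.isalnum() or ch == " ")

# Literal replacement: looks for the two-character string " +", which is absent.
literal = kept.replace(" +", " ").lower().strip()

# Regex replacement: collapses every run of spaces into a single space.
collapsed = re.sub(r" +", " ", kept).lower().strip()

print(repr(literal))
print(repr(collapsed))
```

Only the regex version removes the repeated spaces; `" ".join(kept.split())` would give the same result without importing `re`.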


In [44]:
# instantiate the tokenizer and fit it on the text
import numpy as np
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000) # keep only the most common words (indices below 1000)
tokenizer.fit_on_texts(dataset["v2_clean"])
dataset["v2_encoded"] = tokenizer.texts_to_sequences(dataset.v2_clean)

# create a new column with the length of each encoded sentence and drop the empty ones
dataset["len_v2"] = dataset["v2_encoded"].apply(len)
dataset = dataset[dataset["len_v2"]!=0]

# display the first 5 rows of the new dataset
dataset.head()

| | v1 | v2 | v1_transformed | v2_clean | v2_encoded | len_v2 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | ham | Go until jurong point, crazy.. Available only ... | 0 | jurong point crazy available bugis n great wor... | [230, 444, 460, 943, 35, 51, 204, 944, 79, 945... | 11 |
| 1 | ham | Ok lar... Joking wif u oni... | 0 | ok lar joking wif u oni | [9, 193, 289, 1] | 4 |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | 1 | free entry 2 wkly comp win fa cup final tkts 2... | [11, 300, 3, 532, 655, 33, 849, 420, 20, 157, ... | 13 |
| 3 | ham | U dun say so early hor... U c already then say... | 0 | u dun early hor u c | [1, 124, 149, 1, 84] | 5 |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | 0 | nah think usf live | [705, 22, 656, 127] | 4 |
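To make the encoding step more concrete, here is a rough pure-Python sketch of what a frequency-ranked tokenizer with a `num_words` cap does. It is a simplified stand-in, not the Keras implementation: the helper names and the three sample texts are assumptions made for illustration. Words are ranked by frequency starting at index 1, and only indices strictly below `num_words` survive encoding, which is why some messages end up empty and are dropped.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by descending frequency, starting at index 1."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, word_index, num_words):
    """Encode each text, keeping only words whose index is below num_words."""
    return [
        [word_index[w] for w in text.split() if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


# Made-up, already cleaned messages.
texts = ["free entry win", "ok lar", "free win win"]
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index, num_words=3)

print(word_index)
print(sequences)
```

With `num_words=3`, only the two most frequent words are kept, so the second message encodes to an empty list, the same situation the `len_v2 != 0` filter handles.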


In [45]:
# add zero padding at the end of the sequences so they all have the same length
v2_pad = tf.keras.preprocessing.sequence.pad_sequences(dataset.v2_encoded, padding="post")
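As a minimal sketch of what post-padding does, the hypothetical helper below (`pad_post` is not part of Keras) right-pads every sequence with zeros up to the length of the longest one. The three input sequences are the encoded rows 1, 3 and 4 shown in the table above.

```python
def pad_post(sequences, value=0):
    """Right-pad every sequence with `value` up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]


# Encoded rows 1, 3 and 4 from the dataset preview.
encoded = [[9, 193, 289, 1], [1, 124, 149, 1, 84], [705, 22, 656, 127]]
padded = pad_post(encoded)

for row in padded:
    print(row)
```

Every row now has five entries, and the zeros sit at the end, which matches `padding="post"`.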

In [46]:
# create the tensor dataset
full_ds = tf.data.Dataset.from_tensor_slices((v2_pad, dataset.v1_transformed.values))

In [47]:
# Train Test Split
TAKE_SIZE = int(0.7*dataset.shape[0])

# use .shuffle only on the train set, and .batch on both sets to organise them by batches of 64 observations
train_data = full_ds.take(TAKE_SIZE).shuffle(TAKE_SIZE)
train_data = train_data.batch(64)

test_data = full_ds.skip(TAKE_SIZE)
test_data = test_data.batch(64)
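The arithmetic behind this split can be checked by hand. The sketch below assumes the class counts reported further down in the notebook (4667 ham and 745 spam once empty sequences are dropped). Note that `take` and `skip` follow the original row order, because shuffling is applied only after the split.

```python
import math

BATCH_SIZE = 64

n_rows = 4667 + 745                  # ham + spam rows remaining in the dataset
take_size = int(0.7 * n_rows)        # rows kept by .take() for training
test_size = n_rows - take_size       # rows left by .skip() for testing

# The last batch of each set is smaller than BATCH_SIZE, hence the ceiling.
train_batches = math.ceil(take_size / BATCH_SIZE)
test_batches = math.ceil(test_size / BATCH_SIZE)

print(take_size, test_size, train_batches, test_batches)
```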

In [48]:
# Take a look at one batch
for v2, v1 in train_data.take(1):
  print(v2, v1)

tf.Tensor(
[[ 17  44   0 ...   0   0   0]
 [ 11  77   0 ...   0   0   0]
 [106   0   0 ...   0   0   0]
 ...
 [  5 920   0 ...   0   0   0]
 [  9   0   0 ...   0   0   0]
 [  9  27   0 ...   0   0   0]], shape=(64, 47), dtype=int32) tf.Tensor(
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0], shape=(64,), dtype=int64)


# Simple RNN

In [49]:
# Let's try with Simple RNN method
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

# vocab size is the length of the tokenizer's word index
vocab_size = len(tokenizer.word_index)
model = tf.keras.Sequential([
                  # Word Embedding layer           
                  Embedding(vocab_size, 64,name="embedding"),
                  # Recurrent layers
                  SimpleRNN(units=64, return_sequences=True), # maintains the sequential nature
                  SimpleRNN(units=32, return_sequences=False), # returns the last output
                  # Dense layers once the data is flat
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  # output layer with a single neuron and sigmoid activation,
                  # which returns the probability that the message is spam
                  Dense(1, activation="sigmoid")
])

In [50]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 64)          524032    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, None, 64)          8256      
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, 32)                3104      
                                                                 
 dense_6 (Dense)             (None, 16)                528       
                                                                 
 dense_7 (Dense)             (None, 8)                 136       
                                                                 
 dense_8 (Dense)             (None, 1)                 9         
                                                                 
Total params: 536,065
Trainable params: 536,065
Non-tr
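The parameter counts in this summary can be reproduced with the standard formulas for each layer type. In the sketch below, the vocabulary size of 8188 is inferred from the embedding row (524032 / 64) rather than stated in the notebook, and the helper names are made up for illustration.

```python
def embedding_params(vocab_size, dim):
    # One vector of size `dim` per vocabulary entry.
    return vocab_size * dim


def simple_rnn_params(input_dim, units):
    # Input kernel + recurrent kernel + bias.
    return units * (input_dim + units + 1)


def dense_params(input_dim, units):
    # Weight matrix + bias.
    return units * (input_dim + 1)


vocab_size = 524032 // 64  # inferred from the summary

counts = [
    embedding_params(vocab_size, 64),
    simple_rnn_params(64, 64),
    simple_rnn_params(64, 32),
    dense_params(32, 16),
    dense_params(16, 8),
    dense_params(8, 1),
]
total = sum(counts)

print(counts, total)
```

Almost all of the parameters sit in the embedding layer, which is sized for the full word index even though the tokenizer only emits indices below 1000.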

In [51]:
# instantiate the optimizer and compile the model
optimizer= tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

In [52]:
(dataset["v1_transformed"]-1).value_counts()

-1    4667
 0     745
Name: v1_transformed, dtype: int64

In [53]:
# create a dictionary that assigns to each value of the target variable a weight
# that is inversely proportional to its frequency in the dataset
weights = 1/(dataset.v1_transformed).value_counts()
weights = weights * len(dataset)/2
weights = dict(zip(weights.index, weights.values))
weights

{0: 0.5798157274480394, 1: 3.632214765100671}
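These weights follow the usual "balanced" formula, weight = n_samples / (n_classes × class_count). The sketch below recomputes them from the class counts shown earlier (4667 ham, 745 spam); the helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Rebuild the label column from the class counts reported above.
labels = [0] * 4667 + [1] * 745
sketch_weights = balanced_class_weights(labels)

print(sketch_weights)
```

With these weights, each spam message counts about 6.3 times as much as a ham message in the loss, so both classes contribute the same total weight.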

In [54]:
# Model training 
history = model.fit(train_data,
                    epochs=100, 
                    validation_data=test_data,
                    class_weight=weights)

Epoch 1/100
...
Epoch 78

Scores after 100 epochs :


    loss : 0.0188 - val_loss : 0.1407
    binary_accuracy : 0.9945 - val_binary_accuracy : 0.9667


In [55]:
# Save the model
model.save("model_simpleRNN.h5")

In [56]:
import json
# write the training history to disk and close the file afterwards
with open("/content/simpleRNN_history.json", "w") as f:
    json.dump(model.history.history, f)

# GRU

In [57]:
# Let's try GRU now 
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = len(tokenizer.word_index)
model_gru = tf.keras.Sequential([
                  Embedding(vocab_size, 64,name="embedding"),
                  GRU(units=64, return_sequences=True), # maintains the sequential nature
                  GRU(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="sigmoid")
])

In [58]:
model_gru.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 64)          524032    
                                                                 
 gru_2 (GRU)                 (None, None, 64)          24960     
                                                                 
 gru_3 (GRU)                 (None, 32)                9408      
                                                                 
 dense_9 (Dense)             (None, 16)                528       
                                                                 
 dense_10 (Dense)            (None, 8)                 136       
                                                                 
 dense_11 (Dense)            (None, 1)                 9         
                                                                 
Total params: 559,073
Trainable params: 559,073
Non-tr

In [59]:
# instantiate the optimizer
optimizer= tf.keras.optimizers.Adam()

model_gru.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

In [60]:
# fit the model
model_gru.fit(train_data,
              epochs=100, 
              validation_data=test_data,
              class_weight=weights)

Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7fd4ca4175e0>

Scores after 100 epochs :


    loss : 0.6921 - val_loss : 0.6940
    binary_accuracy : 0.1381 - val_binary_accuracy : 0.1392

In [61]:
model_gru.save("model_gru.h5")

In [62]:
with open("/content/GRU_history.json", "w") as f:
    json.dump(model_gru.history.history, f)

# LSTM

In [63]:
# We try LSTM now
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM

vocab_size = len(tokenizer.word_index)
model_lstm = tf.keras.Sequential([
                  Embedding(vocab_size, 64,name="embedding"),
                  LSTM(units=64, return_sequences=True), # maintains the sequential nature
                  LSTM(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="sigmoid")
])

In [64]:
model_lstm.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 64)          524032    
                                                                 
 lstm (LSTM)                 (None, None, 64)          33024     
                                                                 
 lstm_1 (LSTM)               (None, 32)                12416     
                                                                 
 dense_12 (Dense)            (None, 16)                528       
                                                                 
 dense_13 (Dense)            (None, 8)                 136       
                                                                 
 dense_14 (Dense)            (None, 1)                 9         
                                                                 
Total params: 570,145
Trainable params: 570,145
Non-tr
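The GRU and LSTM summaries can be checked the same way as the SimpleRNN one. This sketch assumes the TensorFlow 2 default for GRU (`reset_after=True`, which adds a second bias vector per gate) and an inferred vocabulary size of 8188 (524032 / 64); the helper names are made up for illustration.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input kernel + recurrent kernel + bias.
    return 4 * units * (input_dim + units + 1)


def gru_params(input_dim, units):
    # Three gates, each with input kernel + recurrent kernel + two biases
    # (assumes reset_after=True).
    return 3 * units * (input_dim + units + 2)


def dense_params(input_dim, units):
    return units * (input_dim + 1)


embedding = 524032  # taken from the summaries
head = dense_params(32, 16) + dense_params(16, 8) + dense_params(8, 1)

gru_total = embedding + gru_params(64, 64) + gru_params(64, 32) + head
lstm_total = embedding + lstm_params(64, 64) + lstm_params(64, 32) + head

print(gru_total, lstm_total)
```

The recurrent layers account for the whole difference between the three models: a GRU layer has roughly three times, and an LSTM layer four times, the parameters of a SimpleRNN layer of the same size.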

In [65]:
optimizer= tf.keras.optimizers.Adam()

model_lstm.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

In [66]:
model_lstm.fit(train_data,
              epochs=100, 
              validation_data=test_data,
               class_weight=weights)

Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7fd4c6f808e0>

Scores after 100 epochs :


    loss : 0.2952 - val_loss : 0.4043
    binary_accuracy : 0.8279 - val_binary_accuracy : 0.8042

In [67]:
model_lstm.save("model_lstm.h5")

In [68]:
with open("/content/LSTM_history.json", "w") as f:
    json.dump(model_lstm.history.history, f)

## Classification Evaluation

This part will focus on visualizing the training process and interpreting the results for our predictive models.

### SimpleRNN

In [69]:
# load Simple RNN model history
simpleRNN_history = json.load(open("/content/simpleRNN_history.json", 'r'))

In [70]:
model_simpleRNN = tf.keras.models.load_model("/content/model_simpleRNN.h5")

In [71]:
# Create a graph showing your loss and validation loss in relation to 
# the number of epochs for the simpleRNN model 
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(y=simpleRNN_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=simpleRNN_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()


The validation loss (0.1407) is clearly greater than the training loss (0.0188). This suggests that the model is overfitting: it fits the training data very closely but generalizes less well to messages it has not seen.

### GRU

In [72]:
GRU_history = json.load(open("/content/GRU_history.json", 'r'))
model_gru = tf.keras.models.load_model("/content/model_gru.h5")

In [73]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=GRU_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=GRU_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()

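The GRU scores suggest that this model has not learned to separate the two classes: the loss stays close to 0.69 on both sets and the binary accuracy stays near 0.14, which is roughly the share of spam in the dataset. A model that outputs almost the same probability for every message and labels nearly everything as spam would produce these numbers.

The validation set holds 30% of the data, so its size alone is unlikely to explain this result.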
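As a sanity check on the GRU scores reported earlier (loss ≈ 0.69, binary accuracy ≈ 0.14), the sketch below computes two reference values: the binary cross-entropy of a model that always predicts 0.5, and the share of spam in the dataset, using the class counts shown in the notebook (4667 ham, 745 spam). It is only arithmetic, not a diagnosis of the training run.

```python
import math


def binary_cross_entropy(y_true, p):
    """Cross-entropy of one prediction p for a true label y_true in {0, 1}."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# A model that always answers 0.5 gets the same loss for both labels: ln(2).
chance_loss = binary_cross_entropy(1, 0.5)

# Accuracy obtained by labelling every message as spam.
spam_share = 745 / (4667 + 745)

print(round(chance_loss, 4), round(spam_share, 4))
```

Both values are very close to the reported GRU loss and accuracy, which is consistent with a model that has stayed at chance level.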

### LSTM

In [74]:
LSTM_history = json.load(open("/content/LSTM_history.json", 'r'))
model_lstm = tf.keras.models.load_model("/content/model_lstm.h5")

In [75]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=LSTM_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=LSTM_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()

The LSTM curves suggest an unrepresentative training set: the training data may not provide enough information to learn the problem, relative to the validation data used to evaluate it.

This can happen when the training set has too few examples compared with the validation set, although here the training set holds 70% of the data.

# Transfer Learning

Now we use a pre-trained text embedding model that was trained on the English Google News 7B corpus.


In [104]:
# import the dataset
dataset = pd.read_csv("spam.csv", on_bad_lines="skip", encoding="ISO-8859-1")  # on_bad_lines replaces error_bad_lines in pandas >= 1.3

In [105]:
from sklearn.preprocessing import LabelEncoder

# instantiate the LabelEncoder and fit it on the target column
label = LabelEncoder()
dataset['v1_encoded'] = label.fit_transform(dataset['v1'])

In [106]:
from sklearn.model_selection import train_test_split

# train test split
x_train, x_test, y_train, y_test = train_test_split(dataset["v2"],dataset["v1_encoded"], test_size=0.3)

# create the tensor train and test sets
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test))

In [107]:
import tensorflow_hub as hub

# import the pre-trained embedding model
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)

In [122]:
from tensorflow.keras.layers import Dropout

# create a sequential model with new trainable parameters on top of the pre-trained embedding
model = tf.keras.Sequential()
model.add(hub_layer)                 
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2)) # dropout layer to reduce overfitting
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.2)) # dropout layer to reduce overfitting
model.add(Dense(1, activation="sigmoid"))

In [123]:
model.summary()

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 keras_layer_3 (KerasLayer)  (None, 50)                48190600  
                                                                 
 dense_35 (Dense)            (None, 32)                1632      
                                                                 
 dropout_13 (Dropout)        (None, 32)                0         
                                                                 
 dense_36 (Dense)            (None, 16)                528       
                                                                 
 dropout_14 (Dropout)        (None, 16)                0         
                                                                 
 dense_37 (Dense)            (None, 1)                 17        
                                                                 
Total params: 48,192,777
Trainable params: 48,192,777

In [124]:
# the output layer already applies a sigmoid, so the loss receives probabilities, not logits
model.compile(optimizer='adam',
                    loss=tf.keras.losses.BinaryCrossentropy(),
                    metrics=['accuracy'])
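Because the last layer of this model already applies a sigmoid, the loss should be given probabilities rather than logits. Some Keras versions detect a mismatch and emit a warning; the sketch below only shows the arithmetic of what happens if a probability is treated as a logit and passed through a second sigmoid. The value 0.99 is a made-up prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p):
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


p = 0.99  # made-up model output for a message that really is spam

# Loss when the output is correctly treated as a probability.
loss_as_probability = binary_cross_entropy(1, p)

# Loss when the same output is treated as a logit and squashed again.
loss_as_logit = binary_cross_entropy(1, sigmoid(p))

print(round(loss_as_probability, 4), round(loss_as_logit, 4))
```

A second sigmoid maps every probability in [0, 1] into roughly [0.5, 0.73], so even a confident and correct prediction is scored as if it were uncertain.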

In [125]:
model.fit(train_data.shuffle(10000).batch(512),
              epochs=30, 
              validation_data=test_data.batch(512),
              class_weight=weights,
              verbose = 1)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x7fd4be95e040>

Scores after 30 epochs :


    loss : 0.0048 - val_loss : 0.1072
    binary_accuracy : 0.9997 - val_binary_accuracy : 0.9749

In [126]:
# save the transfer learning model
model.save("model.h5")

In [127]:
with open("/content/model_history.json", "w") as f:
    json.dump(model.history.history, f)

# Model Classification

In [128]:
model_history = json.load(open("/content/model_history.json", 'r'))
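# the hub layer is a custom object, so it has to be declared when loading the model
model = tf.keras.models.load_model("/content/model.h5", custom_objects={"KerasLayer": hub.KerasLayer})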

In [129]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=model_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=model_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()

## GOOD FIT

In the plot above, the training loss and the validation loss both decrease and stabilize around the 7th epoch.

After the 7th epoch, however, the model starts to overfit: the training loss keeps falling while the gap between the two curves widens.
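One way to act on this observation is to find the epoch with the lowest validation loss in the saved history and stop training there. The sketch below shows the idea on a made-up history dictionary (the values are not taken from this notebook); `best_epoch` is a hypothetical helper.

```python
def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss and that loss."""
    val_loss = history["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best_index + 1, val_loss[best_index]


# Made-up curves: training loss keeps falling, validation loss turns upward.
example_history = {
    "loss": [0.60, 0.30, 0.15, 0.08, 0.04, 0.02],
    "val_loss": [0.55, 0.32, 0.20, 0.16, 0.18, 0.22],
}

epoch, loss = best_epoch(example_history)
print(epoch, loss)
```

In Keras, the `tf.keras.callbacks.EarlyStopping` callback with `monitor="val_loss"` and `restore_best_weights=True` applies the same idea during training; the right `patience` value would have to be chosen by experiment.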