<a href="https://colab.research.google.com/github/snps-erwinc/Training/blob/main/SeqClassification_LSTM_Dropout.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Dropout is a regularization technique for neural networks that reduces overfitting by randomly "dropping out" (setting to zero) a fraction of units during training, which pushes the network to learn robust features that do not depend on any specific unit. This notebook applies dropout to an LSTM that classifies IMDB movie reviews as positive or negative. See Footnote 1 for more details.
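To make the mechanism concrete, here is a minimal pure-Python sketch of "inverted" dropout, the variant the Keras `Dropout` layer documents: during training each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The function name `dropout` and its arguments are illustrative assumptions, not part of any library.

```python
import random

def dropout(values, rate, training, rng=None):
    """Illustrative inverted dropout over a list of floats (hypothetical helper)."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(values)
    if rng is None:
        rng = random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    # Zero each value with probability `rate`; scale the rest so the
    # expected sum of the outputs matches the sum of the inputs.
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

activations = [1.0] * 10
print(dropout(activations, rate=0.2, training=True, rng=random.Random(7)))
print(dropout(activations, rate=0.2, training=False))
```

With `rate=0.2`, every surviving value becomes 1.25 during training, and the second call returns the activations untouched.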

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
# fix random seed for reproducibility
tf.random.set_seed(7)

In [None]:
# load the dataset, keeping only the top_words most frequent words;
# rarer words are replaced by the out-of-vocabulary index (2 by default)
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [None]:
# truncate and pad input sequences
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
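By default `pad_sequences` both pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`), so a long review keeps its last 500 tokens and a short one is left-padded with zeros. The pure-Python helper below, `pad_pre`, is a hypothetical stand-in that sketches this default behaviour for a single sequence; it is not the Keras implementation and assumes `maxlen >= 1`.

```python
def pad_pre(seq, maxlen, value=0):
    """Sketch of pad_sequences defaults for one sequence (assumes maxlen >= 1)."""
    seq = list(seq)[-maxlen:]                    # truncating='pre': keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # padding='pre': fill with `value` at the front

print(pad_pre([5, 6, 7], maxlen=5))              # [0, 0, 5, 6, 7]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```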

In [None]:
# create the model
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)



None
Epoch 1/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 20s 31ms/step - accuracy: 0.6827 - loss: 0.5691 - val_accuracy: 0.8690 - val_loss: 0.3231
Epoch 2/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 11s 29ms/step - accuracy: 0.8615 - loss: 0.3290 - val_accuracy: 0.8663 - val_loss: 0.3251
Epoch 3/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 21s 31ms/step - accuracy: 0.8905 - loss: 0.2730 - val_accuracy: 0.8580 - val_loss: 0.3337


<keras.src.callbacks.history.History at 0x7db29fddaf20>

None
Epoch 1/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 14s 31ms/step - accuracy: 0.6735 - loss: 0.5799 - val_accuracy: 0.8229 - val_loss: 0.4022
Epoch 2/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 20s 31ms/step - accuracy: 0.8632 - loss: 0.3377 - val_accuracy: 0.8762 - val_loss: 0.3008
Epoch 3/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 20s 30ms/step - accuracy: 0.8941 - loss: 0.2692 - val_accuracy: 0.8596 - val_loss: 0.3507


<keras.src.callbacks.history.History at 0x7db29ff16aa0>
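
The summary table is not reproduced in the captured output above, so the trainable parameter counts can be worked out by hand instead. The sketch below uses the standard formulas for an embedding layer, an LSTM layer with a single bias vector per gate, and a dense layer; the helper names are hypothetical. `Dropout` layers add no parameters.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

total = (
    embedding_params(5000, 32)   # 160000
    + lstm_params(32, 100)       # 53200
    + dense_params(100, 1)       # 101
)
print(total)  # 213301
```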

In [None]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 85.80%
Accuracy: 85.96%
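
The `accuracy` metric reported here compares thresholded sigmoid outputs with the 0/1 labels. The pure-Python sketch below shows that calculation on a handful of made-up probabilities; `binary_accuracy` is an illustrative helper and the numbers are not taken from the model above.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(predictions, y_true))
    return correct / len(y_true)

# Made-up labels and sigmoid outputs, for illustration only.
labels = [1, 0, 1, 1, 0]
probs = [0.91, 0.20, 0.45, 0.77, 0.63]
print("Accuracy: %.2f%%" % (binary_accuracy(labels, probs) * 100))  # Accuracy: 60.00%
```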


In [None]:
from tensorflow.keras.preprocessing.text import text_to_word_sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Function to preprocess and predict sentiment of a new review
def predict_review(review):
    # Encode the text the way imdb.load_data() does by default:
    # 1 = start of sequence, 2 = out-of-vocabulary, word ranks offset by 3
    word_index = imdb.get_word_index()
    words = text_to_word_sequence(review)
    encoded = [1]
    for word in words:
        index = word_index.get(word)
        if index is not None and index + 3 < top_words:
            encoded.append(index + 3)
        else:
            encoded.append(2)
    encoded = pad_sequences([encoded], maxlen=max_review_length)

    # Predict sentiment: the sigmoid output estimates P(review is positive)
    prediction = model.predict(encoded)
    print("Positive" if prediction[0][0] > 0.5 else "Negative")

# Example usage
new_review = "It was a terrible and not useful movie"
predict_review(new_review)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1641221/1641221 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 105ms/step
Negative
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 123ms/step
Negative
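
For new text to match the training data, it has to follow the same index convention as `imdb.load_data()` with its default arguments: index 1 marks the start of a sequence, index 2 stands for out-of-vocabulary words, and each word's frequency rank from `imdb.get_word_index()` is shifted by 3. The sketch below illustrates this with a toy vocabulary; `encode_review` and `toy_index` are hypothetical and stand in for the real word index.

```python
START, OOV, OFFSET = 1, 2, 3

def encode_review(words, word_index, num_words):
    """Encode a tokenized review using the load_data() default convention."""
    encoded = [START]
    for word in words:
        rank = word_index.get(word)
        if rank is not None and rank + OFFSET < num_words:
            encoded.append(rank + OFFSET)
        else:
            # Unknown words and words outside the kept vocabulary share one index.
            encoded.append(OOV)
    return encoded

toy_index = {"the": 1, "movie": 2, "terrible": 3, "was": 4}
print(encode_review(["the", "movie", "was", "terrible"], toy_index, num_words=7))
# [1, 4, 5, 2, 6]
```

Here "was" has rank 4, which shifts to 7 and falls outside `num_words=7`, so it is encoded as the out-of-vocabulary index.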


**FOOTNOTE 1**

Using dropout in an LSTM (Long Short-Term Memory) network has several advantages, chiefly reducing overfitting and improving the model's generalization. The key points are:

  Preventing Overfitting:
        Dropout randomly sets a fraction of a layer's input units to 0 at each update during training; in Keras the remaining units are scaled up by 1/(1 - rate), and the layer is inactive at inference. This forces the network to learn robust features that are less dependent on specific neurons, thereby improving the model's ability to generalize to new, unseen data.

  Regularization:
        Dropout acts as a form of regularization that reduces the effective complexity of the model. Because units are dropped at random, the network is less likely to memorize the training data and is instead encouraged to learn more general patterns.

  Improving Generalization:
        By adding noise to the model (in the form of dropped units), dropout usually helps the network generalize better to new data, so the gap between training performance and validation or test performance tends to shrink.

  A Note on the Vanishing Gradient Problem:
        In deep networks, and particularly in RNNs, the vanishing gradient problem can hinder learning. Dropout is not a remedy for it: the LSTM's gated cell state is what helps gradients propagate across many time steps. Dropout's contribution is regularization, as described in the points above.

REF: https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/