<h1>IMDB Review Sentiment Prediction Using LSTM</h1>

In [1]:
import os
import numpy as np
import tempfile
import tensorflow as tf
from tensorflow.keras import layers  # public Keras API instead of the private tensorflow.python path
import fastestimator as fe
from fastestimator.dataset.data import imdb_review
from fastestimator.op.numpyop.univariate.reshape import Reshape
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy

In [2]:
MAX_WORDS = 10000
MAX_LEN = 500
batch_size = 64
epochs = 10

<h2>Step 1: Prepare training and evaluation data and define the Pipeline</h2>

We load the dataset from tf.keras.datasets.imdb, which contains movie reviews and their sentiment labels. Each word has been replaced with an integer that indicates how frequent the word is in the corpus. To ensure that all sequences have the same length, the input sequences are padded before the Pipeline is defined.
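To make the padding step concrete, here is a minimal sketch in plain Python. It assumes that extra tokens are cut from the end and that zeros are appended at the end; the helper name `pad_or_truncate` and the sample reviews are hypothetical, and the convention used inside `imdb_review.load_data` may differ.

```python
from typing import List

MAX_LEN = 500  # same value as in the configuration cell


def pad_or_truncate(seq: List[int], max_len: int, pad_value: int = 0) -> List[int]:
    """Return a copy of `seq` with exactly `max_len` entries.

    Assumption: truncation and padding both happen at the end of the
    sequence; the library may use a different convention.
    """
    clipped = seq[:max_len]
    return clipped + [pad_value] * (max_len - len(clipped))


short_review = [1, 14, 22, 16]     # hypothetical word indices
long_review = list(range(1, 601))  # hypothetical 600-token review

print(len(pad_or_truncate(short_review, MAX_LEN)))
print(len(pad_or_truncate(long_review, MAX_LEN)))
```

Both calls return a list of length MAX_LEN, so the reviews can be stacked into one fixed-size array.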

In [3]:
train_data, eval_data = imdb_review.load_data(MAX_LEN, MAX_WORDS)
pipeline = fe.Pipeline(train_data=train_data,
                       eval_data=eval_data,
                       batch_size=batch_size,
                       ops=Reshape(1, inputs="y", outputs="y"))

<h2>Step 2: Create model and FastEstimator network</h2>

First, we define the network architecture. The function that builds the architecture is then passed to fe.build together with an optimizer; a model name can also be supplied, although this example relies on the default.

In [4]:
def create_lstm():
    model = tf.keras.Sequential()
    model.add(layers.Embedding(MAX_WORDS, 64, input_length=MAX_LEN))
    model.add(layers.Conv1D(32, 3, padding='same', activation='relu'))
    model.add(layers.MaxPooling1D(pool_size=4))
    model.add(layers.LSTM(64))
    model.add(layers.Dense(250, activation='relu'))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model
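It can help to trace how the tensor shape (excluding the batch dimension) changes through this stack. The sketch below does the arithmetic by hand, assuming the Keras defaults used above: Conv1D with padding='same' and stride 1 keeps the number of time steps, MaxPooling1D uses a stride equal to its pool size, and LSTM returns only its final output. The helper names are hypothetical.

```python
MAX_LEN = 500  # sequence length from the configuration cell


def conv1d_same_steps(steps: int) -> int:
    """Time steps after Conv1D with padding='same' and stride 1 (unchanged)."""
    return steps


def max_pool_steps(steps: int, pool_size: int) -> int:
    """Time steps after MaxPooling1D with stride == pool_size and no padding."""
    return (steps - pool_size) // pool_size + 1


steps = MAX_LEN
shapes = [("Embedding", (steps, 64))]          # one 64-d vector per token
steps = conv1d_same_steps(steps)
shapes.append(("Conv1D", (steps, 32)))         # 32 filters
steps = max_pool_steps(steps, pool_size=4)
shapes.append(("MaxPooling1D", (steps, 32)))   # sequence shortened 4x
shapes.append(("LSTM", (64,)))                 # final hidden state only
shapes.append(("Dense(250)", (250,)))
shapes.append(("Dense(1)", (1,)))              # single sigmoid output

for name, shape in shapes:
    print(f"{name:<14} -> {shape}")
```

The pooling layer cuts the sequence from 500 to 125 steps, so the LSTM processes a quarter as many time steps as it would on the raw embedding.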

Network is the object that defines the whole logic of the neural network, including models, loss functions, and optimizers. A Network can contain several models and loss functions (as in a GAN). <b>fe.Network</b> takes a series of operators; here we wrap our model in a ModelOp and specify its inputs and outputs. Note that y_pred is the key in the data dictionary under which the predictions are stored.
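The idea of operators that read from and write to a shared data dictionary can be illustrated with a toy sketch. This is not FastEstimator's implementation: `run_ops`, `toy_model`, and `toy_loss` are hypothetical stand-ins that only show how keys such as "x", "y", and "y_pred" link the steps together.

```python
def run_ops(batch, ops):
    """Apply each (function, input_keys, output_key) triple in order."""
    for fn, input_keys, output_key in ops:
        args = [batch[key] for key in input_keys]
        batch[output_key] = fn(*args)  # the result is stored under the output key
    return batch


def toy_model(x):
    """Hypothetical model: predicts 0.5 for every sample."""
    return [0.5 for _ in x]


def toy_loss(y_pred, y):
    """Hypothetical loss: mean absolute error."""
    return sum(abs(p - t) for p, t in zip(y_pred, y)) / len(y)


batch = {"x": [[1, 2], [3, 4]], "y": [0, 1]}
ops = [
    (toy_model, ("x",), "y_pred"),       # reads "x", writes "y_pred"
    (toy_loss, ("y_pred", "y"), "loss"),  # reads "y_pred" and "y", writes "loss"
]
result = run_ops(batch, ops)
print(sorted(result.keys()))
```

Because each step only refers to keys, a later operator can consume whatever an earlier one produced without the two being coupled directly.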

In [5]:
model = fe.build(model_fn=create_lstm, optimizer_fn="adam")
network = fe.Network(ops=[
    ModelOp(model=model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="loss"),
    UpdateOp(model=model, loss_name="loss")
])
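With a single sigmoid output and binary labels, the loss is expected to behave like binary cross-entropy. The sketch below computes that quantity by hand, assuming this is the form the CrossEntropy op reduces to here; the function `binary_cross_entropy` is a hypothetical helper, not the library's code.

```python
import math


def binary_cross_entropy(y_true: int, y_pred: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample, with clipping to avoid log(0)."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# A model that outputs 0.5 for every review scores ln(2) regardless of the label.
chance_loss = binary_cross_entropy(1, 0.5)
print(round(chance_loss, 4))

# A confident, correct prediction gives a much smaller loss.
print(binary_cross_entropy(1, 0.9) < chance_loss)
```

A loss close to ln(2), roughly 0.693, is therefore a useful reference point: it is what a model scores when it predicts 0.5 for every sample.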

<h2>Step 3: Prepare estimator and configure the training loop</h2>

<b>Estimator</b> is the API that wraps the Pipeline, Network, and other training metadata together. It takes four main arguments: network, pipeline, epochs, and traces. The Network and Pipeline objects are passed in as arguments. Traces are similar to Keras callbacks: each trace object is called at specific points during training.

In the training loop, we want to measure the validation loss and accuracy and keep the model that has the minimum loss. The Accuracy trace reports the accuracy, and the BestModelSaver trace saves the model whenever the loss improves.
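The "keep the best model" logic can be sketched in a few lines. This is an illustration only, not the BestModelSaver source: the class `BestLossTracker` and the loss values are hypothetical, and the attribute names simply mirror the `min_loss` and `since_best` fields that appear in the training log.

```python
class BestLossTracker:
    """Tracks the lowest loss seen so far and how long ago it occurred."""

    def __init__(self):
        self.min_loss = float("inf")
        self.since_best = 0

    def update(self, loss: float) -> bool:
        """Record a new evaluation loss; return True if a save should happen."""
        if loss < self.min_loss:
            self.min_loss = loss
            self.since_best = 0
            return True
        self.since_best += 1
        return False


tracker = BestLossTracker()
eval_losses = [0.69, 0.60, 0.65, 0.62, 0.55]  # hypothetical per-epoch losses
save_flags = [tracker.update(loss) for loss in eval_losses]
print(save_flags, tracker.min_loss, tracker.since_best)
```

Only the epochs that set a new minimum trigger a save, so the file on disk always corresponds to the lowest evaluation loss seen so far.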

In [6]:
model_dir = tempfile.mkdtemp()  # temporary directory for the saved model
traces = [Accuracy(true_key="y", pred_key="y_pred"), BestModelSaver(model=model, save_dir=model_dir)]
estimator = fe.Estimator(network=network,
                         pipeline=pipeline,
                         epochs=epochs,
                         traces=traces)

In [7]:
estimator.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Start: step: 0; model_lr: 0.001; 
FastEstimator-Train: step: 0; loss: 0.6933472; 
FastEstimator-Train: step: 100; loss: 0.694617; steps/sec: 4.96; 
FastEstimator-Train: step: 200; loss: 0.6934054; steps/sec: 4.95; 
FastEstimator-Train: step: 300; loss: 0.6937516; steps/sec: 4.91; 
FastEstimator-Train: step: 391; epoch: 0; epoch_time: 84.15 sec; 
Saved model to /tmp/tmpkgn3_al0/model_best_loss.h5
FastEstimator-Eval: step: 391; epoch: 0; loss: 0.6932595; min_loss: 0.6932595; since_best: 0; accuracy: 0.5007399627631641; 
FastEstimator-Train: st

<h2>Inference</h2>

In [8]:
model_name = 'model_best_loss.h5'
model_path = os.path.join(model_dir, model_name)
print(model_path)
trained_model = create_lstm()
trained_model.load_weights(model_path)

/tmp/tmpkgn3_al0/model_best_loss.h5


Pick a random sequence from the evaluation data and compare the model's prediction with the ground truth.

In [9]:
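# Sample an index from the whole evaluation set rather than a hard-coded range
selected_idx = np.random.randint(len(eval_data))
print("Ground truth is: ", eval_data[selected_idx]['y'])
# Add a batch dimension so the input has shape (1, MAX_LEN)
padded_seq = np.array([eval_data[selected_idx]['x']])
prediction = trained_model.predict(padded_seq)
print("Prediction for the input sequence: ", prediction)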

Ground truth is:  0
Prediction for the input sequence:  [[0.35348406]]
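The model returns a sigmoid probability rather than a class label. A minimal sketch of converting it into a label follows, assuming a decision threshold of 0.5; the threshold and the helper `to_label` are assumptions for illustration, not something the notebook specifies.

```python
def to_label(probability: float, threshold: float = 0.5) -> int:
    """Map a sigmoid output to 1 (positive) or 0 (negative)."""
    return int(probability >= threshold)


prediction = [[0.35348406]]  # value copied from the output above
ground_truth = 0

predicted_label = to_label(prediction[0][0])
print("Predicted label:", predicted_label)
print("Matches ground truth:", predicted_label == ground_truth)
```

Under this threshold, the sample above is classified as negative, which agrees with its ground-truth label of 0.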
