<h1>IMDB Review Sentiment Prediction Using LSTM</h1>

In [1]:
import tempfile
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as fn
import fastestimator as fe
from fastestimator.dataset.data import imdb_review
from fastestimator.op.numpyop.univariate.reshape import Reshape
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy
from fastestimator.backend import feed_forward

In [2]:
MAX_WORDS = 10000
MAX_LEN = 500
batch_size = 64
epochs = 10

<h2>Building components</h2>

<h3>Step 1: Prepare training & evaluation data and define Pipeline</h3>

We load the dataset from tf.keras.datasets.imdb, which contains movie reviews and their sentiment labels. Every word has been replaced with an integer that gives its frequency rank in the corpus. To ensure that all sequences have the same length, the input sequences must be padded before the Pipeline is defined.

In [3]:
train_data, eval_data = imdb_review.load_data(MAX_LEN, MAX_WORDS)
pipeline = fe.Pipeline(train_data=train_data,
                       eval_data=eval_data,
                       batch_size=batch_size,
                       ops=Reshape(1, inputs="y", outputs="y"))
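
The padding described above appears to be handled by `load_data`, which receives `MAX_LEN`, while `Reshape(1, ...)` turns each scalar label into a one-element array so that a batch of labels has shape (batch_size, 1). The sketch below illustrates both ideas in plain NumPy. `pad_or_truncate` is a hypothetical helper, and padding with zeros at the end of the sequence is an assumption; the loader's actual scheme may differ.

```python
import numpy as np

MAX_LEN = 8  # small value for illustration only; the notebook uses 500


def pad_or_truncate(tokens, max_len, pad_value=0):
    """Return a fixed-length int array: cut long sequences, pad short ones."""
    tokens = list(tokens)[:max_len]
    padding = [pad_value] * (max_len - len(tokens))
    return np.array(tokens + padding, dtype=np.int64)


short_review = [14, 22, 16]        # made-up word indices
long_review = list(range(1, 13))   # 12 made-up word indices

padded = pad_or_truncate(short_review, MAX_LEN)
truncated = pad_or_truncate(long_review, MAX_LEN)
print(padded.shape, truncated.shape)  # (8,) (8,)

# A scalar label becomes a one-element array, as a reshape to shape 1 would do.
label = np.array(1)
reshaped = np.reshape(label, 1)
print(label.shape, reshaped.shape)  # () (1,)
```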

<h3>Step 2: Create model and FastEstimator network</h3>

First, we define the network architecture. We then pass the architecture definition, its associated model name, and an optimizer to fe.build.

In [4]:
class ReviewSentiment(nn.Module):
    def __init__(self, embedding_size=64, hidden_units=64):
        super().__init__()
        self.embedding = nn.Embedding(MAX_WORDS, embedding_size)
        # in_channels must match the embedding dimension
        self.conv1d = nn.Conv1d(in_channels=embedding_size, out_channels=32, kernel_size=3, padding=1)
        self.maxpool1d = nn.MaxPool1d(kernel_size=4)
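        # batch_first=True so that dim 0 of the input is treated as the batch, not the sequence
        self.lstm = nn.LSTM(input_size=125, hidden_size=hidden_units, num_layers=1, batch_first=True)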
        self.fc1 = nn.Linear(in_features=hidden_units, out_features=250)
        self.fc2 = nn.Linear(in_features=250, out_features=1)

    def forward(self, x):
        x = self.embedding(x)
        x = x.permute((0, 2, 1))
        x = self.conv1d(x)
        x = fn.relu(x)
        x = self.maxpool1d(x)
        output, _ = self.lstm(x)
        x = output[:, -1]  # sequence output of only the last time step
        x = torch.tanh(x)  # fn.tanh is deprecated
        x = self.fc1(x)
        x = fn.relu(x)
        x = self.fc2(x)
        x = torch.sigmoid(x)  # fn.sigmoid is deprecated
        return x
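
The `input_size=125` passed to the LSTM follows from the layer arithmetic: the convolution with `kernel_size=3` and `padding=1` keeps the length at 500, and max pooling with `kernel_size=4` divides it by four. A small sketch of that calculation, using the standard output-length formulas; the two helper functions are hypothetical and not part of PyTorch or FastEstimator:

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1):
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def maxpool1d_out_len(length, kernel_size):
    """Output length of 1-D max pooling whose stride equals its kernel size."""
    return (length - kernel_size) // kernel_size + 1


MAX_LEN = 500  # tokens per review, as set at the top of the notebook

# (batch, 500) -> embedding -> (batch, 500, 64) -> permute -> (batch, 64, 500)
after_conv = conv1d_out_len(MAX_LEN, kernel_size=3, padding=1)
after_pool = maxpool1d_out_len(after_conv, kernel_size=4)
print(after_conv, after_pool)  # 500 125
```

If `MAX_LEN` or the pooling size changes, `input_size` has to change with it.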

Network is the object that defines the whole logic of the neural network, including models, loss functions, and optimizers. A Network can contain several different models and loss functions (as in a GAN). <b>fe.Network</b> takes a series of operators; here we feed our model to ModelOp together with its inputs and outputs. Note that y_pred is the key in the data dictionary under which the predictions are stored.

In [5]:
model = fe.build(model_fn=lambda: ReviewSentiment(), optimizer_fn="adam")
network = fe.Network(ops=[
    ModelOp(model=model, inputs="x", outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="loss"),
    UpdateOp(model=model, loss_name="loss")
])
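
Because the model ends in a sigmoid and emits a single value per review, the loss computed here is presumably binary cross-entropy. The NumPy sketch below assumes that; `binary_cross_entropy` is a hypothetical helper, not the FastEstimator implementation. It also shows why the first training loss in the log further down is close to 0.69: a model that predicts about 0.5 for every review has a loss of ln 2.

```python
import numpy as np


def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    y_true = np.asarray(y_true, dtype=np.float64)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(losses.mean())


# Made-up labels and predictions with shape (batch, 1), like "y" and "y_pred".
y_true = np.array([[1], [0], [1], [0]])
uncertain = np.full((4, 1), 0.5)
confident = np.array([[0.9], [0.1], [0.8], [0.2]])

loss_uncertain = binary_cross_entropy(uncertain, y_true)
loss_confident = binary_cross_entropy(confident, y_true)
print(round(loss_uncertain, 4))          # 0.6931
print(loss_confident < loss_uncertain)   # True
```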

<h3>Step 3: Prepare estimator and configure the training loop</h3>

<b>Estimator</b> is the API that wraps the Pipeline, Network, and other training metadata together. Estimator takes four main arguments: network, pipeline, epochs, and traces. The Network and Pipeline objects are passed in as arguments. Traces are similar to callbacks in Keras: a trace object is called at specific points during training.

In the training loop, we want to measure the validation loss and save the model that has the minimum loss. The BestModelSaver trace stores the model whenever the loss reaches a new minimum, and the Accuracy trace reports the accuracy.

In [6]:
model_dir = tempfile.mkdtemp()
traces = [Accuracy(true_key="y", pred_key="y_pred"), BestModelSaver(model=model, save_dir=model_dir)]
estimator = fe.Estimator(network=network,
                         pipeline=pipeline,
                         epochs=epochs,
                         traces=traces)
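
The "save on minimum loss" behaviour can be pictured as a small piece of bookkeeping that runs after each evaluation. The class below is a hypothetical stand-in written for illustration, not the BestModelSaver source; its `min_loss` and `since_best` fields mirror the names printed in the evaluation log. The first two loss values are taken from that log, and the third is invented to show a non-improving epoch.

```python
class BestLossTracker:
    """Hypothetical tracker that records when the eval loss improves."""

    def __init__(self):
        self.min_loss = float("inf")
        self.since_best = 0
        self.saved_epochs = []

    def on_epoch_end(self, epoch, eval_loss):
        if eval_loss < self.min_loss:
            self.min_loss = eval_loss
            self.since_best = 0
            # A real trace would write the model weights to disk here.
            self.saved_epochs.append(epoch)
        else:
            self.since_best += 1


tracker = BestLossTracker()
# 0.552492 and 0.42843726 come from the log; 0.45 is an invented value.
for epoch, loss in enumerate([0.552492, 0.42843726, 0.45]):
    tracker.on_epoch_end(epoch, loss)

print(tracker.saved_epochs, tracker.min_loss, tracker.since_best)
# [0, 1] 0.42843726 1
```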

<h2>Training</h2>

In [7]:
estimator.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Start: step: 0; model_lr: 0.001; 




FastEstimator-Train: step: 0; loss: 0.69042003; 
FastEstimator-Train: step: 100; loss: 0.69947135; steps/sec: 47.4; 
FastEstimator-Train: step: 200; loss: 0.6544732; steps/sec: 48.28; 
FastEstimator-Train: step: 300; loss: 0.5703044; steps/sec: 49.8; 
FastEstimator-Train: step: 391; epoch: 0; epoch_time: 8.02 sec; 
Saved model to /tmp/tmp8vx7_h3t/model_best_loss.pt
FastEstimator-Eval: step: 391; epoch: 0; loss: 0.552492; min_loss: 0.552492; since_best: 0; accuracy: 0.7149949873490238; 
FastEstimator-Train: step: 400; loss: 0.4926735; steps/sec: 50.53; 
FastEstimator-Train: step: 500; loss: 0.51988935; steps/sec: 58.38; 
FastEstimator-Train: step: 600; loss: 0.4267825; steps/sec: 61.37; 
FastEstimator-Train: step: 700; loss: 0.39433068; steps/sec: 61.34; 
FastEstimator-Train: step: 782; epoch: 1; epoch_time: 6.57 sec; 
Saved model to /tmp/tmp8vx7_h3t/model_best_loss.pt
FastEstimator-Eval: step: 782; epoch: 1; loss: 0.42843726; min_loss: 0.42843726; since_best: 0; accuracy: 0.80479304912

<h2>Inferencing</h2>

For inference, we first load the trained model weights. We saved the weights with the minimum loss, and we load them using <i>fe.build()</i>.

In [8]:
model_name = 'model_best_loss.pt'
model_path = os.path.join(model_dir, model_name)
trained_model = fe.build(model_fn=lambda: ReviewSentiment(), weights_path=model_path, optimizer_fn="adam")

Loaded model weights from /tmp/tmp8vx7_h3t/model_best_loss.pt


Pick a random sequence and compare the prediction with the ground truth.

In [9]:
selected_idx = np.random.randint(10000)
print("Ground truth is: ",eval_data[selected_idx]['y'])

Ground truth is:  0


Create a data dictionary for inference. The <i>transform()</i> function in Pipeline and Network applies all of their operations to the given data.

In [10]:
infer_data = {"x":eval_data[selected_idx]['x'], "y":eval_data[selected_idx]['y']}
data = pipeline.transform(infer_data, mode="infer")
data = network.transform(data, mode="infer")
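
Conceptually, each operator reads some keys from a data dictionary and writes its results under other keys, and a transform applies the operators in order. The following toy sketch shows that idea in plain Python. It is not FastEstimator's implementation; `ToyOp` and `apply_ops` are hypothetical names, and the sample values are made up.

```python
import numpy as np


class ToyOp:
    """Hypothetical operator: reads one key, writes one key."""

    def __init__(self, fn, inputs, outputs):
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs


def apply_ops(ops, data):
    """Apply each op in order and return a new dictionary."""
    data = dict(data)  # shallow copy so the caller's dict is left untouched
    for op in ops:
        data[op.outputs] = op.fn(data[op.inputs])
    return data


ops = [
    # Same idea as Reshape(1, inputs="y", outputs="y") in the Pipeline.
    ToyOp(lambda y: np.reshape(y, 1), inputs="y", outputs="y"),
    # Add a leading batch dimension so a single sample looks like a batch of one.
    ToyOp(lambda x: np.expand_dims(x, 0), inputs="x", outputs="x_batched"),
]

sample = {"x": np.array([5, 8, 2, 0]), "y": np.array(0)}
result = apply_ops(ops, sample)
print(result["y"].shape, result["x_batched"].shape)  # (1,) (1, 4)
```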

Finally, pass the model and data to <i>feed_forward</i> to run inference.

In [11]:
prediction = feed_forward(trained_model, data['x'], training=False)
# .cpu() makes the conversion to NumPy work even when the model runs on a GPU
print("Prediction for the input sequence: ", prediction.detach().cpu().numpy()[0][0])

Prediction for the input sequence:  0.47310668
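
The printed value is the sigmoid output, a score between 0 and 1. A common convention, assumed here rather than taken from the notebook, is to threshold it at 0.5 to obtain a class label. `to_label` is a hypothetical helper; the two values are the ones printed in the cells above and will differ from run to run because the index is random.

```python
import numpy as np


def to_label(probability, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a 0/1 class label."""
    return int(probability >= threshold)


prediction = np.array([[0.47310668]])  # score printed above
ground_truth = 0                       # label printed for the selected review

predicted_label = to_label(prediction[0][0])
print(predicted_label, predicted_label == ground_truth)  # 0 True
```

Under this convention the prediction matches the ground truth, although a score this close to 0.5 indicates low confidence.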
