# NLP Modeling: Classification Models

This notebook demonstrates how to develop neural networks for NLP tasks. Using the pre-trained embeddings and the data pipeline from the first two notebooks, we train a binary classification model for sentiment analysis on IMDb movie reviews.

From this notebook, you will understand:

- how to develop models in Gluon.
- how to develop training pipelines.

You will learn the following about developing a model in Gluon:

- how to implement the continuous bag-of-words model in Gluon using the [`Block`](https://mxnet.apache.org/api/python/docs/api/gluon/block.html) API.
- how to switch to [`HybridBlock`](https://mxnet.apache.org/api/python/docs/api/gluon/hybrid_block.html) and its benefits.
- how to use the simplified [`Sequential`](https://mxnet.apache.org/api/python/docs/api/gluon/nn/index.html#mxnet.gluon.nn.Sequential) API for building the same model.

You will learn the following about developing a training pipeline:

- how to set up [`Loss`](https://mxnet.apache.org/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.Loss), [`Optimizer`](https://mxnet.apache.org/api/python/docs/api/optimizer/index.html#mxnet.optimizer.Optimizer), and [`EvalMetric`](https://mxnet.apache.org/api/python/docs/api/metric/index.html#mxnet.metric.EvalMetric).
- how to enable single/multi-GPU training by specifying the [`Context`](https://mxnet.apache.org/api/python/docs/api/mxnet/context/index.html#mxnet.context.Context).
- how to put everything together in a modular way with the new [`Estimator`](https://mxnet.apache.org/api/python/docs/api/gluon/contrib/index.html#mxnet.gluon.contrib.estimator.Estimator) API.

In [None]:
import mxnet as mx
from mxnet import gluon, nd, metric
from mxnet.gluon import nn, rnn
from mxnet.gluon.contrib import estimator
import gluonnlp as nlp

In [None]:
import utils

batch_size = 64
train_dataloader, test_dataloader, vocab = utils.load_data_imdb(batch_size) # see notebook 02
emb = nlp.embedding.create('fasttext', source='wiki.en', load_ngrams=True)

## Continuous Bag of Words (CBoW): Block and HybridBlock

In [None]:
class ContinuousBagOfWords(gluon.Block):
    def __init__(self, vocab_size, embed_size, **kwargs):
        super(ContinuousBagOfWords, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.decoder = nn.Dense(2)

    def forward(self, inputs):
        # The shape of inputs is (batch size, number of words).
        embeddings = self.embedding(inputs)
        encoding = embeddings.mean(axis=1)
        outputs = self.decoder(encoding)
        return outputs
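
To make the forward pass above concrete, here is a minimal framework-free sketch of the same three steps (embedding lookup, mean over words, dense layer) for a single sentence. The function name `cbow_forward` and all parameter values are hypothetical toy choices for illustration, not trained weights or part of the Gluon API.

```python
# Framework-free sketch of the CBoW forward pass (toy values, not trained weights).

def cbow_forward(token_ids, embedding, weight, bias):
    """Embed one sentence, mean-pool over its words, then apply a dense layer."""
    vectors = [embedding[i] for i in token_ids]  # (number of words, embed size)
    dim = len(vectors[0])
    # Mean over the word axis, the single-sentence analogue of mean(axis=1).
    encoding = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    # Dense layer: logits[k] = weight[k] . encoding + bias[k], one logit per class.
    return [sum(w * e for w, e in zip(row, encoding)) + b
            for row, b in zip(weight, bias)]

# Hypothetical toy parameters: 3-word vocabulary, 2-dimensional embeddings, 2 classes.
embedding = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.5]

logits = cbow_forward([1, 2], embedding, weight, bias)
print(logits)
```

Because the encoding is an average, it ignores word order: any permutation of the same token ids yields the same logits.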

### Initialize Model with Pre-trained Embedding

In [None]:
emb_vocab_size, dim = emb.idx_to_vec.shape
print('Pre-trained embedding vocabulary size: {}, dimension: {}'.format(emb_vocab_size, dim))
print('IMDb training set vocabulary size: {}'.format(len(vocab)))

In [None]:
vocab.set_embedding(emb)
print('Shuffled embedding vocabulary size: {}, dimension: {}'.format(*vocab.embedding.idx_to_vec.shape))
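
After `set_embedding`, the embedding matrix has one row per token of the IMDb vocabulary, in vocabulary order. The sketch below illustrates that re-indexing idea on toy data. The helper `align_embedding` is hypothetical, and the zero-vector fallback for tokens without a pre-trained vector is an assumption made for illustration; the actual fallback depends on how the embedding handles unknown tokens.

```python
# Sketch of re-indexing pre-trained vectors to a task vocabulary (toy data).

def align_embedding(idx_to_token, pretrained, dim):
    """Return a matrix whose row i is the vector of idx_to_token[i]."""
    zero = [0.0] * dim  # assumed fallback for tokens without a pre-trained vector
    return [list(pretrained.get(token, zero)) for token in idx_to_token]

# Hypothetical pre-trained table and task vocabulary.
pretrained = {'movie': [0.1, 0.2], 'great': [0.9, 0.8], 'zebra': [0.5, 0.5]}
idx_to_token = ['<unk>', 'great', 'movie']

matrix = align_embedding(idx_to_token, pretrained, dim=2)
print(len(matrix), len(matrix[0]))  # one row per vocabulary token
```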

In [None]:
embed_size, ctx = 300, mx.gpu(0)
net = ContinuousBagOfWords(len(vocab), embed_size)

In [None]:
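# Load the pre-trained vectors into the embedding layer and freeze them,
# so that no gradient is computed for the embedding weights.
net.embedding.initialize(mx.init.Constant(vocab.embedding.idx_to_vec), ctx=ctx)
net.embedding.weight.grad_req = 'null'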

In [None]:
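# Initialize the remaining parameters (the dense decoder); the embedding
# is already initialized and is left unchanged.
net.initialize(mx.init.Xavier(), ctx=ctx)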

### HybridBlock

In [None]:
class HybridCBOW(gluon.HybridBlock):
    def __init__(self, vocab_size, embed_size, **kwargs):
        super(HybridCBOW, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.decoder = nn.Dense(2)

    def hybrid_forward(self, F, inputs):
        # The shape of inputs is (batch size, number of words).
        embeddings = self.embedding(inputs)
        encoding = embeddings.mean(axis=1)
        outputs = self.decoder(encoding)
        return outputs

In [None]:
hybrid_net = HybridCBOW(len(vocab), embed_size)

### Simplified Modeling with Sequential

In [None]:
hybrid_sequential_net = nn.HybridSequential()
hybrid_sequential_net.add(hybrid_net.embedding,
                          nn.HybridLambda(lambda F, x: x.mean(axis=1)),
                          hybrid_net.decoder)

## Training Pipeline with Estimator

### Loss

In [None]:
loss = gluon.loss.SoftmaxCrossEntropyLoss()
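
With its default settings, `SoftmaxCrossEntropyLoss` takes raw logits and integer class labels. As a rough sketch of the per-example quantity it computes, the hypothetical helper below evaluates the negative log-probability of the true class in plain Python; it is for intuition only and is not the library's implementation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    shift = max(logits)  # subtract the max for numerical stability
    log_norm = math.log(sum(math.exp(z - shift) for z in logits))
    return -(logits[label] - shift - log_norm)

# Equal logits mean a 50/50 prediction, so the loss is log(2).
print(softmax_cross_entropy([0.0, 0.0], label=1))
```

The loss shrinks as the logit of the correct class grows relative to the other one, which is what training pushes the decoder toward.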

### Trainer

In [None]:
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 0.01})

### Metrics

In [None]:
metrics = [metric.Loss(), metric.Accuracy()]
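
`metric.Loss` tracks the average loss, while `metric.Accuracy` tracks the fraction of correct predictions. A minimal sketch of the accuracy computation on raw logits is shown below; the helper `accuracy` and the toy logits are hypothetical.

```python
def accuracy(batch_logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical logits for three reviews (column 0: negative, column 1: positive).
batch_logits = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.2]]
labels = [0, 1, 1]
print(accuracy(batch_logits, labels))
```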

### Estimator

In [None]:
est = estimator.Estimator(net=net, loss=loss, metrics=metrics, trainer=trainer, context=ctx)

In [None]:
est.fit(train_data=train_dataloader, val_data=test_dataloader, epochs=5)

### Try Out the Model

In [None]:
def predict_sentiment(net, vocab, sentence):
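    # Map whitespace-separated tokens to vocabulary indices on the model's device.
    sentence = nd.array(vocab[sentence.split()], ctx=ctx)
    # Add a batch axis, run the model, and pick the class with the higher score.
    label = nd.argmax(net(sentence.reshape((1, -1))), axis=1)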
    return 'positive' if label.asscalar() == 1 else 'negative'
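
The preprocessing inside `predict_sentiment` is a whitespace split followed by a vocabulary lookup. The sketch below mimics that step with a plain dictionary, assuming that out-of-vocabulary tokens map to a reserved unknown index. The helper `tokens_to_indices`, the toy vocabulary, and the choice of index 0 for unknown tokens are hypothetical.

```python
def tokens_to_indices(sentence, token_to_idx, unk_idx=0):
    """Whitespace-tokenize and map each token to its index (unknowns -> unk_idx)."""
    return [token_to_idx.get(token, unk_idx) for token in sentence.split()]

# Hypothetical toy vocabulary; index 0 is reserved for unknown tokens.
token_to_idx = {'<unk>': 0, 'this': 1, 'movie': 2, 'is': 3, 'great': 4}

indices = tokens_to_indices('this movie is so great', token_to_idx)
batch = [indices]  # shape (1, number of words), like reshape((1, -1))
print(batch)
```

Since the split is on whitespace only, punctuation attached to a word (for example `great!`) would be looked up as a different token than `great`.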

In [None]:
predict_sentiment(net, vocab, 'this movie is so great')

In [None]:
predict_sentiment(net, vocab, 'this movie is so bad')

### API Docs

- [gluon.Block](https://mxnet.apache.org/api/python/docs/api/gluon/block.html) and [gluon.HybridBlock](https://mxnet.apache.org/api/python/docs/api/gluon/hybrid_block.html) classes.
- [D2L Hybridize Tutorial](https://en.d2l.ai/chapter_computational-performance/hybridize.html)
- [gluon.nn](https://mxnet.apache.org/api/python/docs/api/gluon/nn/index.html) and [gluon.rnn](https://mxnet.apache.org/api/python/docs/api/gluon/rnn/index.html) modules
- [Sequential](https://mxnet.apache.org/api/python/docs/api/gluon/nn/index.html#mxnet.gluon.nn.Sequential) and [HybridSequential](https://mxnet.apache.org/api/python/docs/api/gluon/nn/index.html#mxnet.gluon.nn.HybridSequential)
- [gluon.loss.SoftmaxCrossEntropyLoss](https://mxnet.apache.org/api/python/docs/api/gluon/loss/index.html#mxnet.gluon.loss.SoftmaxCrossEntropyLoss) and other [losses](https://mxnet.apache.org/api/python/docs/api/gluon/loss/index.html).
- [gluon.Trainer](https://mxnet.apache.org/api/python/docs/api/gluon/trainer.html) class.
- [metric.Loss](https://mxnet.apache.org/api/python/docs/api/metric/index.html#mxnet.metric.Loss), [metric.Accuracy](https://mxnet.apache.org/api/python/docs/api/metric/index.html#mxnet.metric.Accuracy), and other [metrics](https://mxnet.apache.org/api/python/docs/api/metric/index.html#module-mxnet.metric)
- [gluon.contrib.estimator.Estimator](https://mxnet.apache.org/api/python/docs/api/gluon/contrib/index.html#mxnet.gluon.contrib.estimator.Estimator) class and [handlers](https://mxnet.apache.org/api/python/docs/api/gluon/contrib/index.html#event-handler).

## Exercise: Train a Bi-directional LSTM Model

In this exercise, we will implement a bi-directional LSTM model for the sentiment analysis task. As an enhancement to the previous model, we will replace the mean pooling operation in CBoW with bi-directional LSTM layers. This model should consist of:

- an embedding layer with pre-trained word embeddings (same as CBoW).
- bi-directional LSTM layers for encoding.
- concatenation of the last layer's output on the first and the last time-steps.
- a dense layer for the binary classification output.
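
The third item can be pictured with nested lists. The sketch below assumes a time-major layout of (time steps, batch size, features) and uses a hypothetical helper `first_last_encoding` on toy data; it only illustrates the shape bookkeeping and is not the Gluon code for the exercise.

```python
def first_last_encoding(encoded_sequence):
    """Concatenate the first and last time-step outputs for every example.

    encoded_sequence has layout (time steps, batch size, features).
    """
    first_out = encoded_sequence[0]   # (batch size, features)
    last_out = encoded_sequence[-1]   # (batch size, features)
    # Join the two feature vectors of each example: (batch size, 2 * features).
    return [f + l for f, l in zip(first_out, last_out)]

# Hypothetical encoder output: 3 time steps, batch of 2, 2 features per step.
encoded_sequence = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
encoding = first_last_encoding(encoded_sequence)
print(encoding)
```

Each example ends up with a single fixed-size vector regardless of sentence length, which is what the dense output layer needs.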

Complete the implementation of the class below:

In [None]:
class BiLSTMClassifier(nn.HybridBlock):
    """A standard embedding-bilstm-dense architecture for binary classification.
    
    Parameters
    ----------
    vocab_size: int
        Vocabulary size.
    embed_size: int
        Embedding dimension.
    num_hiddens: int
        Hidden state size of LSTM.
    num_layers: int
        Number of LSTM layers.
    """
    def __init__(self, vocab_size, embed_size, num_hiddens, num_layers, **kwargs):
        super(BiLSTMClassifier, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.encoder = rnn.LSTM(...)
        self.decoder = nn.Dense(2)

    def hybrid_forward(self, F, inputs):
        embeddings = self.embedding(F.transpose(inputs))
        encoded_sequence = self.encoder(embeddings)

        first_out = F.slice_axis(encoded_sequence, 0, 0, 1)
        last_out = F.slice_axis(encoded_sequence, 0, -1, None)

        encoding = F.concat(...).reshape((-3, -1))
        outs = self.decoder(encoding)
        return outs

In [None]:
num_hiddens, num_layers = 100, 2
net = BiLSTMClassifier(len(vocab), embed_size, num_hiddens, num_layers)

In [None]:
net.embedding.initialize(mx.init.Constant(vocab.embedding.idx_to_vec), ctx=ctx)
net.embedding.weight.grad_req = 'null'
net.initialize(mx.init.Xavier(), ctx=ctx)
net.hybridize(static_alloc=True)

### Train and Evaluate the Model

Use the same setting as CBoW for training and evaluation with the following addition:

- enable checkpointing with [`estimator.CheckpointHandler`](https://mxnet.apache.org/api/python/docs/api/gluon/contrib/index.html#mxnet.gluon.contrib.estimator.CheckpointHandler) and save the model parameters and trainer state to `data/` after every epoch.

In [None]:
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 0.01})
est = estimator.Estimator(net=net, loss=loss,
                          metrics=metrics,
                          trainer=trainer,
                          context=ctx)

In [None]:
# Save model parameters and trainer state to data/ during training.
checkpoint = estimator.CheckpointHandler('data/')
est.fit(train_data=train_dataloader,
        val_data=test_dataloader,
        event_handlers=checkpoint,
        epochs=5)

In [None]:
predict_sentiment(net, vocab, 'this movie is so great')

In [None]:
predict_sentiment(net, vocab, 'this movie is so bad')