# Hands-on: Training and deploying GluonNLP models on AWS SageMaker

You will learn the following:

- fine-tuning BERT for sentiment classification
- exporting models in a self-contained way
- creating a SageMaker endpoint to serve your model

In [1]:
import argparse, time
import numpy as np
import mxnet as mx
import gluonnlp as nlp

# Hyperparameters
batch_size = 32
num_epochs = 1
lr = 5e-5

### Get Pre-trained BERT Model

We can load the pre-trained BERT easily using the model API in GluonNLP, which returns the vocabulary along with the model. We include the pooler layer of the pre-trained model by setting `use_pooler` to `True`.
The list of pre-trained BERT models available in GluonNLP can be found [here](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).

In [2]:
ctx = mx.gpu(0)
bert, vocabulary = nlp.model.get_model('bert_12_768_12',  # the 12-layer BERT Base model
                                       dataset_name='book_corpus_wiki_en_uncased',
                                       # use pre-trained weights
                                       pretrained=True, ctx=ctx,
                                       # keep the pooler layer; the classifier consumes its output
                                       use_pooler=True,
                                       # decoder and classifier are for pre-training only
                                       use_decoder=False, use_classifier=False)

Vocab file is not found. Downloading.
Downloading /home/ec2-user/SageMaker/models/1572691992.7365024book_corpus_wiki_en_uncased-a6607397.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/vocab/book_corpus_wiki_en_uncased-a6607397.zip...
Downloading /home/ec2-user/SageMaker/models/bert_12_768_12_book_corpus_wiki_en_uncased-75cc780f.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/bert_12_768_12_book_corpus_wiki_en_uncased-75cc780f.zip...


Now that we have loaded the BERT model, we only need to attach an additional layer for classification.
The `BERTClassifier` class uses a BERT base model to encode the sentence representation, followed by a `nn.Dense` layer for classification. We only need to initialize the classification layer; the encoding layers are already initialized with pre-trained weights.
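To make the division of labor concrete, here is a minimal pure-Python sketch of the classification head, assuming the usual BERT design: the pre-trained pooler applies a `tanh` dense layer to the hidden state of the `[CLS]` token, and the newly initialized classifier maps the pooled vector to one logit per class. The helper names, the hidden size of 3 (BERT Base uses 768), and all weights are made up for illustration, and dropout is omitted because it is inactive at inference time.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return w @ x + b, computed with plain Python lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(w, b)]


def classify_cls(h_cls: Vector, w_pool: Matrix, b_pool: Vector,
                 w_cls: Matrix, b_cls: Vector) -> Vector:
    """Toy pooler + dense classifier over the [CLS] hidden state (dropout omitted)."""
    pooled = [math.tanh(v) for v in affine(w_pool, h_cls, b_pool)]  # pre-trained pooler
    return affine(w_cls, pooled, b_cls)  # new classifier layer: one logit per class


# Hidden size 3 instead of 768; two classes (negative, positive). Weights are made up.
h_cls = [0.5, -1.0, 2.0]
w_pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_pool = [0.0, 0.0, 0.0]
w_cls = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
b_cls = [0.0, 0.0]
logits = classify_cls(h_cls, w_pool, b_pool, w_cls, b_cls)
print(logits)
```

Only `w_cls` and `b_cls` correspond to what `net.classifier.initialize()` creates from scratch; everything upstream, including the pooler, arrives pre-trained.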

In [3]:
net = nlp.model.BERTClassifier(bert, num_classes=2)
net.classifier.initialize(ctx=ctx)  # only initialize the classification layer from scratch
net.hybridize()  # build a static graph; required for exporting the model with net.export()

## Data Preprocessing

To use the pre-trained BERT model, we need to:
- tokenize the inputs into WordPiece tokens,
- insert `[CLS]` at the beginning of a sentence,
- insert `[SEP]` at the end of a sentence,
- map the tokens to vocabulary indices, and
- generate segment IDs.
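The steps above can be sketched in plain Python for a single sentence. This is an illustration under stated assumptions, not GluonNLP's implementation: the input is assumed to be already WordPiece-tokenized, the toy vocabulary and its indices are hypothetical, and unknown tokens fall back to `[UNK]`. Padding up to `max_seq_length` mirrors what a fixed-length transform produces.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary; the real BERT vocabulary has ~30k entries.
TOY_VOCAB = {'[UNK]': 0, '[PAD]': 1, '[CLS]': 2, '[SEP]': 3,
             'this': 10, 'movie': 11, 'rocks': 12}


def bert_single_sentence(tokens: List[str], vocab: Dict[str, int],
                         max_seq_length: int) -> Tuple[List[int], int, List[int]]:
    """Build (token ids, valid length, segment ids) for one pre-tokenized sentence."""
    tokens = tokens[:max_seq_length - 2]           # leave room for [CLS] and [SEP]
    tokens = ['[CLS]'] + tokens + ['[SEP]']
    unk = vocab['[UNK]']
    ids = [vocab.get(t, unk) for t in tokens]      # vocabulary lookup
    valid_length = len(ids)                        # number of real (non-padding) tokens
    segment_ids = [0] * valid_length               # a single sentence is all segment 0
    num_pad = max_seq_length - valid_length
    ids += [vocab['[PAD]']] * num_pad              # pad ids up to the fixed length
    segment_ids += [0] * num_pad
    return ids, valid_length, segment_ids


example_ids, example_len, example_segments = bert_single_sentence(
    ['this', 'movie', 'rocks'], TOY_VOCAB, max_seq_length=8)
print(example_ids, example_len, example_segments)
```

The valid length is kept separately so the model can ignore the padding positions.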

### Data Transformations

We again use the IMDB dataset, this time downloaded through the GluonNLP data API. We then use the `transform` API to map the raw review scores to positive and negative labels.
To add the BERT-style `[CLS]` and `[SEP]` tokens to sentences, use the `nlp.data.BERTSentenceTransform` API.

In [4]:
train_dataset_raw = nlp.data.IMDB('train')
test_dataset_raw = nlp.data.IMDB('test')
# tokenize texts into words
tokenizer = nlp.data.BERTTokenizer(vocabulary)
# add begin-of-sentence, end-of-sentence tokens and perform vocabulary lookup
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128, pair=False)

def transform_fn(data):
    # transform texts to tensors
    text, label = data
    # map the 1-10 review score to a label: positive (1) / negative (0)
    label = 1 if label >= 5 else 0
    data, length, segment_type = transform([text])
    return data.astype('float32'), length.astype('float32'), segment_type.astype('float32'), label

Downloading /home/ec2-user/SageMaker/datasets/imdb/train.json from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/imdb/train.json...
Downloading /home/ec2-user/SageMaker/datasets/imdb/test.json from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/imdb/test.json...


In [5]:
train_dataset = train_dataset_raw.transform(transform_fn)
test_dataset = test_dataset_raw.transform(transform_fn)

data, length, _, label = train_dataset[0]
print('original sentence = \n{}'.format(train_dataset_raw[0][0]))
print('\nword indices = \n{}'.format(data.astype('int32')))

original sentence = 
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!

word indices = 
[    2 22953  2213  4381  2152  2003  1037  9476  4038  1012  2009  2743
  2012  1996  2168  2051  2004  2070  2060  3454  2055  2082  2166  1010
  2107  2

### Let's Train the Model

Now we have all the pieces in place and can start fine-tuning the model.
To keep the runtime short, we train for a single epoch (`num_epochs = 1`).

In [6]:
padding_id = vocabulary[vocabulary.padding_token]
batchify_fn = nlp.data.batchify.Tuple(
        nlp.data.batchify.Pad(axis=0, pad_val=padding_id), # words
        nlp.data.batchify.Stack(), # valid length
        nlp.data.batchify.Pad(axis=0, pad_val=0), # segment type
        nlp.data.batchify.Stack(np.float32)) # label

train_data = mx.gluon.data.DataLoader(train_dataset,
                               batchify_fn=batchify_fn, shuffle=True,
                               batch_size=batch_size, num_workers=4)
test_data = mx.gluon.data.DataLoader(test_dataset,
                              batchify_fn=batchify_fn,
                              shuffle=False, batch_size=batch_size, num_workers=4)
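`batchify.Tuple` applies one batchify function per field of a sample. The pure-Python sketch below mimics `Tuple(Pad, Stack, Pad, Stack)` on two made-up samples: token ids are right-padded with the padding index, valid lengths and labels are stacked, and segment ids are padded with 0. The padding index 1 and the sample contents are assumptions for illustration. Because `BERTSentenceTransform` pads each sample to `max_seq_length` (its default behavior, as far as we know), `Pad` here mostly acts as a safety net.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[int], int, List[int], int]


def pad_field(seqs: Sequence[List[int]], pad_val: int) -> List[List[int]]:
    """Right-pad every sequence to the longest one in the batch."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad_val] * (max_len - len(s)) for s in seqs]


def batchify(samples: Sequence[Sample], padding_id: int):
    """Combine samples field by field, like Tuple(Pad, Stack, Pad, Stack)."""
    words, lengths, segments, labels = zip(*samples)
    return (pad_field(words, padding_id),   # word indices, padded with the [PAD] index
            list(lengths),                  # valid lengths, simply stacked
            pad_field(segments, 0),         # segment ids, padded with 0
            [float(y) for y in labels])     # labels stacked as floats


batch = batchify([([2, 10, 3], 3, [0, 0, 0], 1),
                  ([2, 10, 11, 12, 3], 5, [0, 0, 0, 0, 0], 0)],
                 padding_id=1)
print(batch)
```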

In [7]:
from mxnet.gluon.contrib.estimator import TrainBegin, BatchBegin


class MyLearningRateHandler(TrainBegin, BatchBegin):
    """Warm-up learning rate handler.

    Parameters
    ----------
    trainer: gluon.Trainer
        Trainer object to adjust the learning rate on.
    num_warmup_steps: int
        Number of initial steps during which the learning rate is linearly
        increased to its target.
    num_train_steps: int
        Total number of steps to be taken during training. Should be equal to
        the number of batches * number of epochs.
    lr: float
        Base learning rate to reach after warmup.
    """

    def __init__(self, trainer, num_warmup_steps, num_train_steps, lr):
        self.trainer = trainer
        self.num_warmup_steps = num_warmup_steps
        self.num_train_steps = num_train_steps
        self.lr = lr

        self.step_num = 0

    def train_begin(self, estimator, *args, **kwargs):
        self.step_num = 0

    def batch_begin(self, estimator, *args, **kwargs):
        self.step_num += 1
        if self.step_num < self.num_warmup_steps:
            new_lr = self.lr * self.step_num / self.num_warmup_steps
        else:
            non_warmup_steps = self.step_num - self.num_warmup_steps
            offset = non_warmup_steps / (self.num_train_steps - self.num_warmup_steps)
            new_lr = self.lr - offset * self.lr
        self.trainer.set_learning_rate(new_lr)
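The handler's schedule is a linear warm-up followed by a linear decay to zero. The standalone function below reproduces the arithmetic of `batch_begin` so the schedule can be inspected without MXNet; the function name is ours. For one epoch over the 25,000 IMDB training reviews with `batch_size = 32`, the data loader yields ceil(25000 / 32) = 782 batches, which is what `len(train_data) * num_epochs` should evaluate to if the last partial batch is kept.

```python
import math


def warmup_linear_lr(step_num: int, num_warmup_steps: int,
                     num_train_steps: int, lr: float) -> float:
    """Learning rate at a 1-based step, mirroring MyLearningRateHandler.batch_begin."""
    if step_num < num_warmup_steps:
        # linear warm-up towards lr
        return lr * step_num / num_warmup_steps
    # linear decay from lr (at num_warmup_steps) down to 0 (at num_train_steps)
    offset = (step_num - num_warmup_steps) / (num_train_steps - num_warmup_steps)
    return lr - offset * lr


num_train_steps = math.ceil(25000 / 32)  # one epoch of IMDB training data
for step in (1, 25, 50, 416, num_train_steps):
    print(step, warmup_linear_lr(step, 50, num_train_steps, 5e-5))
```

Because the handler increments `step_num` before computing the rate, the very first batch already trains with `lr / num_warmup_steps` rather than 0, and the peak `lr` is reached at step `num_warmup_steps`.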

In [8]:
from mxnet.gluon.contrib import estimator
from mxnet.gluon.utils import split_and_load

class MyEstimator(estimator.Estimator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # params for grad clipping
        self.params = [p for p in self.net.collect_params().values() if p.grad_req != 'null']
        
    def fit_batch(self, train_batch, batch_axis=0):
        train_batch = [split_and_load(x, ctx_list=self.context, batch_axis=batch_axis) for x in train_batch]
        with mx.autograd.record():
            pred = [self.net(inp, token_type, seq_len) for inp, seq_len, token_type, _ in zip(*train_batch)]
            loss = [self.loss(out, label.astype('float32')) for out, _, _, _, label in zip(pred, *train_batch)]
        mx.autograd.backward(loss)

        # Gradient clipping: all-reduce the gradients, clip their global norm to 1, then update
        self.trainer.allreduce_grads()
        nlp.utils.clip_grad_global_norm(self.params, 1)
        self.trainer.update(1)
        
        return train_batch[:3], train_batch[3], pred, loss
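Clipping by *global* norm treats all trainable parameters' gradients as one long vector: if its L2 norm exceeds the threshold, every gradient is scaled by the same factor, so the update direction is preserved and only its size shrinks. The sketch below shows the idea with plain lists standing in for gradient arrays; it illustrates the technique and does not reproduce the exact numerics (for example, any epsilon term) of `nlp.utils.clip_grad_global_norm`.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> float:
    """Scale all gradient arrays in place so their joint L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for arr in grads for g in arr))
    if total_norm > max_norm:
        scale = max_norm / total_norm  # one shared factor keeps the update direction
        for arr in grads:
            for i in range(len(arr)):
                arr[i] *= scale
    return total_norm


grads = [[3.0], [4.0]]  # two "parameters" whose joint norm is 5
norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before, grads)
```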

In [9]:
trainer = mx.gluon.Trainer(net.collect_params(), 'bertadam',
                           {'learning_rate': lr, 'wd': 0.01})
loss_fn = mx.gluon.loss.SoftmaxCELoss()
metrics = [mx.metric.Loss(), mx.metric.Accuracy()]
event_handlers = [MyLearningRateHandler(trainer=trainer, num_warmup_steps=50, lr=lr,
                                        num_train_steps=len(train_data) * num_epochs)]

est = MyEstimator(net=net, loss=loss_fn, metrics=metrics, trainer=trainer, context=ctx)
est.fit(train_data=train_data, epochs=num_epochs, event_handlers=event_handlers)

Training begin: using optimizer BERTAdam with current learning rate 0.0001 
Train for 1 epochs.
[Epoch 0] Begin, current learning rate: 0.0001
[Epoch 0] Finished in 344.070s, training loss: 0.3652, training accuracy: 0.8371
Train finished using total 344s with 1 epochs. training loss: 0.3652, training accuracy: 0.8371


### Inference

In [10]:
def predict_sentiment(net, ctx, vocabulary, bert_tokenizer, sentence):
    ctx = ctx[0] if isinstance(ctx, list) else ctx
    max_len = 128
    # truncate so that [CLS] + tokens + [SEP] fits into max_len
    tokens = ['[CLS]'] + bert_tokenizer(sentence)[:max_len - 2] + ['[SEP]']
    inputs = mx.nd.array([vocabulary[tokens]], ctx=ctx)
    seq_len = mx.nd.array([inputs.shape[1]], ctx=ctx)
    token_types = mx.nd.zeros_like(inputs)  # single sentence: every token is segment 0

    out = net(inputs, token_types, seq_len)
    label = mx.nd.argmax(out, axis=1)
    return 'positive' if label.asscalar() == 1 else 'negative'

In [11]:
predict_sentiment(net, ctx, vocabulary, tokenizer, 'this movie is so great')

'positive'
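`predict_sentiment` returns only the arg-max label. If you also want a confidence score, one option is to apply a softmax to the two logits in `out` (in MXNet, `mx.nd.softmax(out, axis=1)`). The pure-Python sketch below shows that arithmetic on made-up logits, assuming index 0 is the negative class and index 1 the positive class, as in `transform_fn` above.

```python
import math
from typing import List, Tuple


def logits_to_sentiment(logits: List[float]) -> Tuple[str, float]:
    """Map two-class logits [negative, positive] to a label and its softmax probability."""
    m = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # same index as argmax of the logits
    return ('positive' if best == 1 else 'negative'), probs[best]


label, confidence = logits_to_sentiment([-1.0, 2.0])  # made-up logits
print(label, round(confidence, 4))
```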

## Deploy on SageMaker

Deploying the model on SageMaker requires four pieces:

1. the model parameters,
2. code for data pre-processing and model inference,
3. a Docker container with the dependencies installed, and
4. a serving endpoint launched with the SageMaker SDK.

### 1. Save Model Parameters

In [12]:
import tarfile

# save the model definition, parameters and vocabulary in a gzipped tar archive
net.export('checkpoint')  # writes checkpoint-symbol.json and checkpoint-0000.params
with open('vocab.json', 'w') as f:
    f.write(vocabulary.to_json())
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("checkpoint-0000.params")
    tar.add("checkpoint-symbol.json")
    tar.add("vocab.json")
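SageMaker extracts `model.tar.gz` into the model directory that it passes to `model_fn`, and `model_fn` in `serve.py` reads each file directly from that directory. A quick check that the archive contains the three files at its root can save a failed deployment. The sketch below, with hypothetical helper names, builds a throwaway archive from placeholder files so it runs anywhere; on the real archive you would call `check_model_archive('model.tar.gz')`.

```python
import os
import tarfile
import tempfile

# File names produced by net.export('checkpoint') and the vocabulary dump above.
EXPECTED_FILES = {'checkpoint-symbol.json', 'checkpoint-0000.params', 'vocab.json'}


def check_model_archive(path: str) -> set:
    """Return the file names in a model.tar.gz, failing if an expected file is absent."""
    with tarfile.open(path, 'r:gz') as tar:
        names = {m.name for m in tar.getmembers() if m.isfile()}
    missing = EXPECTED_FILES - names  # files must sit at the archive root, not in a subfolder
    if missing:
        raise ValueError('model archive is missing: {}'.format(sorted(missing)))
    return names


# Self-contained demo with placeholder files standing in for the real artifacts.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES:
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('placeholder')
    archive = os.path.join(tmp, 'model.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        for name in sorted(EXPECTED_FILES):
            tar.add(os.path.join(tmp, name), arcname=name)  # store at the archive root
    archive_names = check_model_archive(archive)
print(sorted(archive_names))
```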

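### 2. Write the Code for Inference

The entry-point script `serve.py` defines two functions:
1. `model_fn()` loads the model parameters; it is called once when the hosting service starts.
2. `transform_fn()` runs model inference on an input; it is called once per request.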
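The request/response contract of `transform_fn` can be smoke-tested locally without MXNet or SageMaker by swapping the model for a stub. In the sketch below, `stub_predict` and `local_transform_fn` are hypothetical stand-ins: the stub replaces `predict_sentiment`, while the JSON handling follows the `transform_fn` in `serve.py`, where the request body is a JSON-encoded string and the response is the JSON-encoded label.

```python
import json


def stub_predict(sentence: str) -> str:
    """Hypothetical stand-in for predict_sentiment; no model involved."""
    return 'positive' if 'great' in sentence.lower() else 'negative'


def local_transform_fn(predict, data, input_content_type, output_content_type):
    """Same JSON contract as serve.py's transform_fn, with the model replaced by `predict`."""
    sentence = json.loads(data)  # request body: a JSON-encoded string (bytes are accepted)
    response_body = json.dumps(predict(sentence))
    return response_body, output_content_type


request = json.dumps('The model is deployed. Great!').encode('utf-8')
body, content_type = local_transform_fn(stub_predict, request,
                                        'application/json', 'application/json')
print(body, content_type)
```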

In [13]:
%%writefile serve.py
import json, logging, warnings
import gluonnlp as nlp
import mxnet as mx


def model_fn(model_dir):
    """
    Load the gluon model. Called once when hosting service starts.
    :param: model_dir The directory where model files are stored.
    :return: a (net, vocabulary, tokenizer) tuple
    """
    prefix = '%s/checkpoint' % model_dir
    # load the exported graph and its parameters from model_dir onto the CPU
    net = mx.gluon.nn.SymbolBlock.imports(prefix + '-symbol.json',
                                          ['data0', 'data1', 'data2'],
                                          prefix + '-0000.params',
                                          ctx=mx.cpu())
    with open('%s/vocab.json' % model_dir) as f:
        vocab = nlp.Vocab.from_json(f.read())
    tokenizer = nlp.data.BERTTokenizer(vocab)
    return net, vocab, tokenizer


def transform_fn(model, data, input_content_type, output_content_type):
    """
    Transform a request using the Gluon model. Called once per request.
    :param model: The (net, vocabulary, tokenizer) tuple returned by model_fn.
    :param data: The request payload.
    :param input_content_type: The request content type.
    :param output_content_type: The (desired) response content type.
    :return: response payload and content type.
    """
    # we can use content types to vary input/output handling, but
    # here we just assume json for both
    net, vocabulary, tokenizer = model
    sentence = json.loads(data)
    result = predict_sentiment(net, mx.cpu(), vocabulary, tokenizer, sentence)
    response_body = json.dumps(result)
    return response_body, output_content_type


def predict_sentiment(net, ctx, vocabulary, bert_tokenizer, sentence):
    ctx = ctx[0] if isinstance(ctx, list) else ctx
    max_len = 128
    # truncate so that [CLS] + tokens + [SEP] fits into max_len
    tokens = ['[CLS]'] + bert_tokenizer(sentence)[:max_len - 2] + ['[SEP]']
    inputs = mx.nd.array([vocabulary[tokens]], ctx=ctx)
    seq_len = mx.nd.array([inputs.shape[1]], ctx=ctx)
    token_types = mx.nd.zeros_like(inputs)  # single sentence: every token is segment 0

    out = net(inputs, token_types, seq_len)
    label = mx.nd.argmax(out, axis=1)
    return 'positive' if label.asscalar() == 1 else 'negative'

Writing serve.py


### 3. Build a Docker Container for Serving

Next, we prepare a Docker container with all the dependencies required for model inference. We build it on top of the SageMaker MXNet inference container; the list of all available inference containers is at https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

Here we use local mode for demonstration purposes. To deploy on actual instances, you need to log in to Amazon Elastic Container Registry (ECR) and push the container there:

```
docker build -t $YOUR_ECR_DOCKER_TAG . -f Dockerfile
$(aws ecr get-login --no-include-email --region $YOUR_REGION)
docker push $YOUR_ECR_DOCKER_TAG
```

In [18]:
!cat Dockerfile
!docker build --no-cache -t my-docker:inference . -f Dockerfile -q 

FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/mxnet-inference:1.4.1-gpu-py3
# If running outside of us-west-2, change us-west-2 in above URL to the region you're running from.

RUN pip install --upgrade --user --pre 'mxnet-cu100==1.6.0b20191101' 'git+https://github.com/dmlc/gluon-nlp.git#egg=gluonnlp[extras]'

COPY *.py /opt/ml/model/code/

sha256:04226266c940566398559f9c09fad2cd48fc33c65a67184e43c64d53c5605326


### 4. Use the SageMaker SDK to Deploy the Model

We create an MXNet model that can be deployed later by specifying the Docker image and the entry point for the inference code. If `serve.py` does not work, use `dummy_hosting_module.py` for debugging purposes.

In [19]:
import sagemaker
from sagemaker.mxnet.model import MXNetModel
sagemaker_model = MXNetModel(model_data='file:///home/ec2-user/SageMaker/EMNLP19-D2L/06_deployment/model.tar.gz',
                             image='my-docker:inference', # docker images
                             role=sagemaker.get_execution_role(), 
                             py_version='py3',            # python version
                             entry_point='serve.py',
                             source_dir='.')

We use `local` mode to test our deployment code; in this mode, inference runs on the current instance.
When you are ready to deploy the model on a new instance, change the `instance_type` argument to a value such as `ml.c4.xlarge`.

In [20]:
# Here we use 'local' mode for testing; for real instances use ml.c5.2xlarge, ml.p2.xlarge, etc.
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='local')



Attaching to tmpkdyy0cn2_algo-1-smsj1_1
algo-1-smsj1_1  | 2019-11-02 11:11:58,324 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-smsj1_1  | MMS Home: /usr/local/lib/python3.6/site-packages
algo-1-smsj1_1  | Current directory: /
algo-1-smsj1_1  | Temp directory: /home/model-server/tmp
algo-1-smsj1_1  | Number of GPUs: 0
algo-1-smsj1_1  | Number of CPUs: 8
algo-1-smsj1_1  | Max heap size: 13646 M
algo-1-smsj1_1  | Python executable: /usr/local/bin/python3.6
algo-1-smsj1_1  | Config file: /etc/sagemaker-mms.properties
algo-1-smsj1_1  | Inference address: http://0.0.0.0:8080
algo-1-smsj1_1  | Management address: http://127.0.0.1:8081
algo-1-smsj1_1  | Model Store: /.sagemaker/mms/models
algo-1-smsj1_1  | Initial Models: ALL
algo-1-smsj1_1  | Log dir: /logs
algo-1-smsj1_1  | Metrics dir: /logs
algo-1-smsj1_1  | Netty threads: 0
algo-1-

In [21]:
output = predictor.predict('The model is deployed. Great!')
print('\nPrediction output: {}\n\n'.format(output))

algo-1-smsj1_1  | 2019-11-02 11:12:01,546 [INFO ] W-9006-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2879
algo-1-smsj1_1  | 2019-11-02 11:12:01,547 [INFO ] W-9001-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2892
algo-1-smsj1_1  | 2019-11-02 11:12:01,574 [WARN ] W-9006-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 	data0: None
algo-1-smsj1_1  | 2019-11-02 11:12:01,575 [WARN ] W-9006-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   input_sym_arg_type = in_param.infer_type()[0]
algo-1-smsj1_1  | 2019-11-02 11:12:01,599 [INFO ] W-9004-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2932
algo-1-smsj1_1  | 2019-11-02 11:12:01,603 [INFO ] W-9002-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2949
algo-1-smsj1_1  | 2019-11-02 11:12:01,606 [INFO ] W-9003-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 

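`predictor.predict` sends the sentence as a JSON request body. A deployed (non-local) endpoint can also be called without the SageMaker SDK, for example through the `invoke_endpoint` operation of boto3's `sagemaker-runtime` client. The sketch below only builds the request arguments and parses a response body, so it runs without AWS access; the endpoint name is a hypothetical placeholder, and we assume the JSON contract implemented by `serve.py`.

```python
import json


def build_invoke_request(endpoint_name: str, sentence: str) -> dict:
    """Keyword arguments for sagemaker-runtime invoke_endpoint, matching serve.py's JSON contract."""
    return {
        'EndpointName': endpoint_name,
        'ContentType': 'application/json',
        'Accept': 'application/json',
        'Body': json.dumps(sentence).encode('utf-8'),  # a JSON-encoded string, as serve.py expects
    }


def parse_invoke_response(body: bytes) -> str:
    """serve.py returns the label as a JSON string, e.g. b'"positive"'."""
    return json.loads(body)


request_kwargs = build_invoke_request('my-bert-endpoint',  # hypothetical endpoint name
                                      'The model is deployed. Great!')

# With AWS credentials and a real (non-local) endpoint, one would then call:
#   import boto3
#   runtime = boto3.client('sagemaker-runtime')
#   response = runtime.invoke_endpoint(**request_kwargs)
#   print(parse_invoke_response(response['Body'].read()))
```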
### Clean Up

Delete the endpoint when you are done; an endpoint running on a real instance keeps incurring charges until it is deleted.

In [22]:
predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


## Resources
- Amazon SageMaker https://aws.amazon.com/sagemaker/
- Amazon SageMaker Python SDK https://sagemaker.readthedocs.io/
- GluonNLP http://gluon-nlp.mxnet.io/
- GluonCV http://gluon-cv.mxnet.io/
- GluonTS https://gluon-ts.mxnet.io/
- Dive into Deep Learning http://d2l.ai/
- MXNet Forum https://discuss.mxnet.io/

For more fine-tuning scripts, visit the [BERT model zoo webpage](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).
