# Train and Deploy Your BERT Model with GluonNLP on SageMaker

## Fine-tuning BERT for Sentiment Analysis

In this section, we fine-tune the BERT Base model for sentiment analysis on the IMDB dataset.

## Preparation

First, let's install the necessary dependencies.

In [1]:
!pip install mxnet-cu100mkl d2l https://github.com/dmlc/gluon-nlp/tarball/master -U -q
!pip install sagemaker-containers -U -q
import argparse, time, os, tarfile
import d2l
import numpy as np
import mxnet as mx
import gluonnlp as nlp
import utils
import sagemaker

mxnet-cu100mkl 1.5.1.post0 has requirement numpy<2.0.0,>1.16.0, but you'll have numpy 1.14.5 which is incompatible.
You are using pip version 10.0.1, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
parser = argparse.ArgumentParser(description='BERT sentiment analysis fine-tune example.')
parser.add_argument('--batch_size', type=int, default=32,
                    help='batch size per GPU.')
parser.add_argument('--num_epochs', type=int, default=1, 
                    help='The number of epochs to train')
parser.add_argument('--lr', type=float, default=5e-5,
                    help='Learning rate')

args = parser.parse_args([])
print(args)

Namespace(batch_size=32, lr=5e-05, num_epochs=1)


### Get Pre-trained BERT Model

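We can load the pre-trained BERT model easily using the model API in GluonNLP, which returns the vocabulary along with the model. We keep the pooler layer of the pre-trained model (`use_pooler` defaults to `True`) and drop the decoder and classifier heads used for pre-training by setting `use_decoder` and `use_classifier` to `False`.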
The list of pre-trained BERT models available in GluonNLP can be found [here](../../model_zoo/bert/index.rst).

Once the BERT model is loaded, we only need to attach an additional layer for classification.
The `BERTClassifier` class uses a BERT base model to encode the sentence representation, followed by an `nn.Dense` layer for classification. Only the classification layer needs to be initialized; the encoding layers are already initialized with the pre-trained weights.
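To give a feel for how small the newly initialized part is, here is a back-of-the-envelope sketch. It assumes the classification layer is a single dense layer that maps the 768-dimensional BERT output to 2 classes; the helper name `dense_param_count` is made up for this illustration and is not part of GluonNLP.

```python
def dense_param_count(in_units, out_units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return in_units * out_units + (out_units if use_bias else 0)

hidden_size = 768   # output width of bert_12_768_12
num_classes = 2     # positive / negative

# parameters that are initialized from scratch (assumed single dense layer)
classifier_params = dense_param_count(hidden_size, num_classes)
print(classifier_params)  # 1538

# for comparison: one 768 -> 3072 projection inside a single encoder cell
ffn_params = dense_param_count(hidden_size, 3072)
print(ffn_params)  # 2362368
```

Under these assumptions the new layer is tiny next to any single pre-trained projection, which is one reason fine-tuning converges within a short training run.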

In [3]:
ctx = d2l.try_all_gpus()
bert_base, vocabulary = nlp.model.get_model('bert_12_768_12',
                                            dataset_name='book_corpus_wiki_en_uncased',
                                            pretrained=True, ctx=ctx,
                                            use_decoder=False, use_classifier=False)
loss_fn = mx.gluon.loss.SoftmaxCELoss()
net = nlp.model.BERTClassifier(bert_base, 2)
net.classifier.initialize(ctx=ctx)
net.hybridize()
print(net)

BERTClassifier(
  (bert): BERTModel(
    (encoder): BERTEncoder(
      (dropout_layer): Dropout(p = 0.1, axes=())
      (layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
      (transformer_cells): HybridSequential(
        (0): BERTEncoderCell(
          (dropout_layer): Dropout(p = 0.1, axes=())
          (attention_cell): MultiHeadAttentionCell(
            (_base_cell): DotProductAttentionCell(
              (_dropout_layer): Dropout(p = 0.1, axes=())
            )
            (proj_query): Dense(768 -> 768, linear)
            (proj_key): Dense(768 -> 768, linear)
            (proj_value): Dense(768 -> 768, linear)
          )
          (proj): Dense(768 -> 768, linear)
          (ffn): BERTPositionwiseFFN(
            (ffn_1): Dense(768 -> 3072, linear)
            (activation): GELU()
            (ffn_2): Dense(3072 -> 768, linear)
            (dropout_layer): Dropout(p = 0.1, axes=())
            (layer_norm): BERTLayerNorm(eps=1e-12, axis

## Data Preprocessing

To use the pre-trained BERT model, we need to:
- tokenize the inputs into word pieces,
- insert `[CLS]` at the beginning of a sentence,
- insert `[SEP]` at the end of a sentence, and
- generate segment ids.
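The steps above can be sketched in plain Python. This is only an illustration: whitespace splitting stands in for the real BERT tokenizer, `toy_vocab` is a made-up vocabulary, and `bert_style_transform` is a hypothetical helper rather than a GluonNLP function.

```python
def bert_style_transform(text, vocab, max_seq_length=128):
    """Toy version of the preprocessing steps for a single sentence."""
    # 1. tokenize (whitespace split stands in for the BERT tokenizer)
    tokens = text.lower().split()
    # keep room for the two special tokens
    tokens = tokens[:max_seq_length - 2]
    # 2. and 3. add [CLS] at the start and [SEP] at the end
    tokens = ['[CLS]'] + tokens + ['[SEP]']
    # map tokens to ids, falling back to [UNK] for unknown words
    token_ids = [vocab.get(t, vocab['[UNK]']) for t in tokens]
    valid_length = len(token_ids)
    # 4. a single sentence uses segment id 0 for every position
    segment_ids = [0] * valid_length
    return token_ids, valid_length, segment_ids

# made-up vocabulary, for illustration only
toy_vocab = {'[UNK]': 0, '[CLS]': 1, '[SEP]': 2,
             'this': 3, 'movie': 4, 'is': 5, 'great': 6}

ids, length, segments = bert_style_transform('This movie is so great', toy_vocab)
print(ids)       # [1, 3, 4, 5, 0, 6, 2]
print(length)    # 7
print(segments)  # [0, 0, 0, 0, 0, 0, 0]
```

The word "so" is missing from the toy vocabulary, so it maps to the `[UNK]` id.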

### Data Transformations

We use the IMDB dataset again, this time downloading it with the GluonNLP data API. We then use the transform API to convert the raw review scores into positive and negative labels.
To process sentences with the BERT-style `[CLS]` and `[SEP]` tokens, you can use the `data.BERTSentenceTransform` API.

In [4]:
train_dataset_raw = nlp.data.IMDB('train')
test_dataset_raw = nlp.data.IMDB('test')

tokenizer = nlp.data.BERTTokenizer(vocabulary)
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128, pad=False, pair=False)

def transform_fn(data):
    text, label = data
    # transform label into positive / negative
    label = 1 if label >= 5 else 0
    data, length, segment_type = transform([text])
    return data.astype('float32'), length.astype('float32'), segment_type.astype('float32'), label

In [5]:
train_dataset = train_dataset_raw.transform(transform_fn)
test_dataset = test_dataset_raw.transform(transform_fn)

data, length, _, label = train_dataset[0]
print('original sentence = \n{}'.format(train_dataset_raw[0][0]))
print('word indices = \n{}'.format(data.astype('int32')))

original sentence = 
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!
word indices = 
[    2 22953  2213  4381  2152  2003  1037  9476  4038  1012  2009  2743
  2012  1996  2168  2051  2004  2070  2060  3454  2055  2082  2166  1010
  2107  20

### Let's Train the Model!

Now that we have all the pieces in place, we can finally start fine-tuning the model. With the default arguments above, we train for a single epoch.

In [6]:
batch_size = args.batch_size * len(ctx)
train_data, test_data = utils.get_dataloader(batch_size, vocabulary, train_dataset, test_dataset)
tick = time.time()
utils.fit(net, train_data, test_data, args.num_epochs, args.lr, ctx, loss_fn)
tock = time.time()
print('Elapsed time (sec): ', tock-tick)

Batch 0, Train Acc 0.5546875, Train Loss 0.7107145041227341
Batch 25, Train Acc 0.6682692307692307, Train Loss 0.5863065511847918
Batch 50, Train Acc 0.7225796568627451, Train Loss 0.5350419924977947
Batch 75, Train Acc 0.7552425986842105, Train Loss 0.49032151713771255
Batch 100, Train Acc 0.773128094059406, Train Loss 0.46394537477800163
Batch 125, Train Acc 0.7899925595238095, Train Loss 0.43633691333825625
Batch 150, Train Acc 0.8017901490066225, Train Loss 0.4163205347560494
Batch 175, Train Acc 0.8115678267045454, Train Loss 0.3999007060615854
Epoch 0, Train Acc 0.81832, Train Loss 0.38640599939687065
Test Acc 0.8814174885652504,
Elapsed time (sec):  245.78527092933655
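As a sanity check on the log above, the following sketch works out the effective batch size and the number of batches per epoch. The GPU count is an assumption (the tutorial does not state it); 4 GPUs is chosen because it makes the first logged accuracy correspond to a whole number of correct predictions. The 25,000 figure is the size of the IMDB training split.

```python
import math

per_gpu_batch_size = 32      # default of --batch_size
num_gpus = 4                 # assumption, not stated in the tutorial
num_train_examples = 25000   # size of the IMDB training split

# mirrors batch_size = args.batch_size * len(ctx)
batch_size = per_gpu_batch_size * num_gpus
# assumes the final, smaller batch is kept rather than dropped
batches_per_epoch = math.ceil(num_train_examples / batch_size)
print(batch_size, batches_per_epoch)  # 128 196

# Batch 0 reports Train Acc 0.5546875; with 128 examples that is 71 correct
correct_in_first_batch = 0.5546875 * batch_size
print(correct_in_first_batch)  # 71.0
```

With roughly 196 batches per epoch, a progress line every 25 batches ends at batch 175, which matches the log.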



### Inference

In [7]:
utils.predict_sentiment(net, ctx, vocabulary, tokenizer, 'this movie is so great')

'positive'
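The source of `utils.predict_sentiment` is not shown here, so the following is only a guess at its last step: turning the two output scores of the classifier into a label. It assumes index 1 means positive, matching the labels produced by `transform_fn`; `logits_to_sentiment` and the example scores are made up.

```python
import math

def logits_to_sentiment(logits):
    """Map two raw scores (negative, positive) to a label and its probability."""
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # pick the class with the highest probability
    label_id = probs.index(max(probs))
    return ('negative', 'positive')[label_id], probs[label_id]

label, prob = logits_to_sentiment([-1.2, 2.3])  # made-up scores
print(label)  # positive
```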

## Deploy on SageMaker
### Save Model Checkpoint and Upload to S3

In [8]:
net.export('checkpoint')
with open('vocab.json', 'w') as f:
    f.write(vocabulary.to_json())
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("checkpoint-0000.params") # parameters
    tar.add("checkpoint-symbol.json") # model definition
    tar.add("vocab.json")             # vocabulary

session = sagemaker.Session()
uploaded_model = session.upload_data(path='model.tar.gz', key_prefix='model')
s3_path = 's3://' + session.default_bucket() + '/model/model.tar.gz'
print("Model was uploaded to", s3_path)

Model was uploaded to s3://sagemaker-us-east-1-397262719838/model/model.tar.gz
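Before uploading, it can help to confirm that the archive contains the three files the serving code expects at its root. The sketch below uses only the standard library and placeholder files, so it runs without a trained model; `pack_model` and `list_archive` are hypothetical helper names.

```python
import os
import tarfile
import tempfile

def pack_model(archive_path, file_paths):
    """Write the given files into a gzip-compressed tar archive, flat at the root."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))

def list_archive(archive_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        return sorted(tar.getnames())

# placeholder files standing in for the exported checkpoint and vocabulary
workdir = tempfile.mkdtemp()
names = ['checkpoint-0000.params', 'checkpoint-symbol.json', 'vocab.json']
paths = []
for name in names:
    path = os.path.join(workdir, name)
    with open(path, 'w') as f:
        f.write('placeholder')
    paths.append(path)

archive = os.path.join(workdir, 'model.tar.gz')
pack_model(archive, paths)
print(list_archive(archive))
```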


## serve.py - the Code for Inference

In [9]:
import inspect
from serve import model_fn, transform_fn

### model_fn to Deserialize Checkpoints

In [10]:
print(inspect.getsource(model_fn))

def model_fn(model_dir):
    """
    Load the gluon model. Called once when hosting service starts.
    :param model_dir: The directory where model files are stored.
    :return: a Gluon model and the vocabulary
    """
    prefix = '%s/checkpoint' % model_dir
    # load the symbol and parameters from model_dir, not the working directory
    net = mx.gluon.nn.SymbolBlock.imports(prefix + '-symbol.json',
                                          ['data0', 'data1', 'data2'],
                                          prefix + '-0000.params',
                                          ctx=mx.cpu())
    # close the vocabulary file once it has been read
    with open('%s/vocab.json' % model_dir) as f:
        vocab = nlp.vocab.BERTVocab.from_json(f.read())
    return net, vocab



### transform_fn to Run Model Inference for an Input

In [11]:
print(inspect.getsource(transform_fn))

def transform_fn(model, data, input_content_type, output_content_type):
    """
    Transform a request using the Gluon model. Called once per request.
    :param model: The Gluon model and the vocab
    :param data: The request payload.
    :param input_content_type: The request content type.
    :param output_content_type: The (desired) response content type.
    :return: response payload and content type.
    """
    net, vocabulary = model
    sentence = json.loads(data)
    tokenizer = nlp.data.BERTTokenizer(vocabulary)
    result = predict_sentiment(net, mx.cpu(), vocabulary, tokenizer, sentence)
    response_body = json.dumps(result)
    return response_body, output_content_type
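The request and response bodies are both JSON strings. The sketch below traces that round trip with a stand-in for the model call, so it runs without MXNet; `fake_predict_sentiment` and `handle_request` are hypothetical names, and the default content type is an assumption.

```python
import json

def fake_predict_sentiment(sentence):
    """Stand-in for the real model call; always answers 'positive'."""
    return 'positive'

def handle_request(data, output_content_type='application/json'):
    """Decode the JSON request, run the stand-in model, encode the JSON response."""
    sentence = json.loads(data)               # request body is a JSON string
    result = fake_predict_sentiment(sentence)
    return json.dumps(result), output_content_type

# what a client would send
request_body = json.dumps('The model is deployed. Great!')
response_body, content_type = handle_request(request_body)
print(response_body)              # "positive"
print(json.loads(response_body))  # positive
```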



### Build a Docker Container for Serving

Let's prepare a Docker container with all the dependencies required for model inference. We build it on top of the SageMaker MXNet inference container; the list of all available inference containers is at https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

In [12]:
!cat Dockerfile

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3

RUN pip install mxnet-mkl d2l https://github.com/dmlc/gluon-nlp/tarball/master -U --user

COPY *.py /opt/ml/model/code/

Next, log in to the Elastic Container Registry (ECR) service, then build the image and push it to the registry.

In [13]:
!$(aws ecr get-login --no-include-email --region us-east-1)
!docker build -t 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference . -f Dockerfile
!docker push 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference

Login Succeeded
Sending build context to Docker daemon  845.5MB
Step 1/3 : FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3
 ---> 9a2aa1c6893e
Step 2/3 : RUN pip install mxnet-mkl d2l https://github.com/dmlc/gluon-nlp/tarball/master -U --user
 ---> Using cache
 ---> c268be8e81fb
Step 3/3 : COPY *.py /opt/ml/model/code/
 ---> Using cache
 ---> eb923b16f903
Successfully built eb923b16f903
Successfully tagged 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference
The push refers to repository [397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test]

1c1c1537: Preparing
96fe5fd5: Preparing
7d751557: Preparing
513002ec: Preparing
b180e24b: Preparing
32742db8: Preparing
32c12889: Preparing
f7480aac: Preparing
e6c202db: Preparing
937fef50: Preparing
b1acf2ed: Preparing
204a31d2: Preparing
32742db8: Waiting
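The image reference used in the build and push commands follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A small helper makes it easier to adapt to your own account. The account ID and repository name below are placeholders, and `ecr_image_uri` is a hypothetical function, not part of any AWS SDK.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Build an ECR image reference for use with docker build and docker push."""
    return '{}.dkr.ecr.{}.amazonaws.com/{}:{}'.format(
        account_id, region, repository, tag)

# placeholder values; replace them with your own account and repository
uri = ecr_image_uri('123456789012', 'us-east-1', 'my-bert-serving', 'inference')
print(uri)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-bert-serving:inference
```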

## Use SageMaker SDK to Deploy the Model

We create an MXNet model that can be deployed later by specifying the Docker image and the entry point for the inference code. If `serve.py` does not work, use `dummy_hosting_module.py` for debugging.

In [18]:
from sagemaker.mxnet.model import MXNetModel
sagemaker_model = MXNetModel(model_data=s3_path,
                             image='397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference',
                             role=sagemaker.get_execution_role(),
                             py_version='py3',
                             framework_version='1.4.1',
                             entry_point='serve.py',
                             source_dir='.')

We use `local` mode to test our deployment code, where the inference happens on the current instance.
If you are ready to deploy the model on a new instance, change the `instance_type` argument to a value such as `ml.c4.xlarge`.

In [19]:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='local')

Attaching to tmpqsvetadr_algo-1-fsz17_1
algo-1-fsz17_1  | 2019-10-09 18:31:15,836 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-fsz17_1  | MMS Home: /usr/local/lib/python3.6/site-packages
algo-1-fsz17_1  | Current directory: /
algo-1-fsz17_1  | Temp directory: /home/model-server/tmp
algo-1-fsz17_1  | Number of GPUs: 0
algo-1-fsz17_1  | Number of CPUs: 32
algo-1-fsz17_1  | Max heap size: 27305 M
algo-1-fsz17_1  | Python executable: /usr/local/bin/python3.6
algo-1-fsz17_1  | Config file: /etc/sagemaker-mms.properties
algo-1-fsz17_1  | Inference address: http://0.0.0.0:8080
algo-1-fsz17_1  | Management address: http://127.0.0.1:8081
algo-1-fsz17_1  | Model Store: /.sagemaker/mms/models
algo-1-fsz17_1  | Initial Models: ALL
algo-1-fsz17_1  | Log dir: /logs
algo-1-fsz17_1  | Metrics dir: /logs
algo-1-fsz17_1  | Netty threads: 0
algo-1

algo-1-fsz17_1  | 2019-10-09 18:31:18,130 [INFO ] pool-1-thread-33 ACCESS_LOG - /172.18.0.1:35958 "GET /ping HTTP/1.1" 200 57
!

In [20]:
output = predictor.predict('The model is deployed. Great!')
print('\nPrediction output: {}\n\n'.format(output))

algo-1-fsz17_1  | 2019-10-09 18:31:18,264 [INFO ] W-9003-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2073
algo-1-fsz17_1  | 2019-10-09 18:31:18,271 [INFO ] W-9011-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2075
algo-1-fsz17_1  | 2019-10-09 18:31:18,278 [INFO ] W-9029-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2083
algo-1-fsz17_1  | 2019-10-09 18:31:18,292 [INFO ] W-9002-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2100
algo-1-fsz17_1  | 2019-10-09 18:31:18,297 [INFO ] W-9028-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2101
algo-1-fsz17_1  | 2019-10-09 18:31:18,302 [INFO ] W-9026-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2111
algo-1-fsz17_1  | 2019-10-09 18:31:18,309 [WARN ] W-9003-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 	data0: None
algo-1-fsz17_1  |

### Clean Up

Remove the endpoint after we are done. 

In [21]:
predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


## Conclusion

In this tutorial, we showed how to fine-tune a sentiment analysis model starting from pre-trained BERT parameters. In GluonNLP, this takes just a few simple steps. All we did was apply a BERT-style data transformation to pre-process the data, automatically download the pre-trained model, and feed the transformed data into the model, all within 50 lines of code!

For more fine-tuning scripts, visit the [BERT model zoo webpage](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).

