# Fine-tuning BERT for Sentiment Analysis

## Preparation

First, let's install the required packages and import the necessary modules.

In [1]:
!pip install mxnet-cu100mkl d2l https://github.com/dmlc/gluon-nlp/tarball/master sagemaker-containers -U -q

mxnet-cu100mkl 1.5.1.post0 has requirement numpy<2.0.0,>1.16.0, but you'll have numpy 1.14.5 which is incompatible.
You are using pip version 10.0.1, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
import argparse, time, os
import d2l
import numpy as np
import mxnet as mx
from mxnet import gluon
import gluonnlp as nlp
from utils import train_loop, predict_sentiment

parser = argparse.ArgumentParser(description='BERT sentiment analysis fine-tune example.')
parser.add_argument('--batch_size', type=int, default=32,
                    help='batch size per GPU. total_batch_size = batch_size_per_gpu * num_gpus')
parser.add_argument('--num_epochs', type=int, default=1, help='The number of epochs to train')
parser.add_argument('--lr', type=float, default=5e-5, help='Learning rate')

args = parser.parse_args([])
print(args)

Namespace(batch_size=32, lr=5e-05, num_epochs=1)


In this section, we fine-tune the BERT Base model for sentiment analysis on the IMDB dataset.

### BERT for Sentiment Analysis

### Get Pre-trained BERT Model

We can load the pre-trained BERT model through the model API in GluonNLP, which returns the model together with its vocabulary. The call below does not pass `use_pooler`, so the pooler layer of the pre-trained model is included by default, while the pre-training decoder and classifier are dropped by setting `use_decoder` and `use_classifier` to `False`.
The list of pre-trained BERT models available in GluonNLP can be found [here](../../model_zoo/bert/index.rst).

Now that we have loaded the BERT model, we only need to attach an additional layer for classification.
The `BERTClassifier` class uses a BERT base model to encode the sentence representation, followed by an `nn.Dense` layer for classification. Only the classification layer needs to be initialized; the encoding layers already carry the pre-trained weights.
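
To make the role of the added layer concrete, here is a minimal pure-Python sketch of what a dense classification head computes on top of a pooled sentence vector: an affine map to two logits, followed by a softmax. The 4-dimensional vector and the weights are made-up illustrative values (BERT Base uses 768 dimensions), and `dense` and `softmax` are hypothetical helper names, not GluonNLP APIs.

```python
import math

def dense(x, weight, bias):
    """Affine layer: one output per row of `weight`."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative stand-in for the pooled sentence representation.
pooled = [0.5, -1.0, 0.25, 2.0]
weight = [[0.1, 0.2, -0.3, 0.0],   # row for class 0 (negative)
          [-0.2, 0.1, 0.4, 0.3]]   # row for class 1 (positive)
bias = [0.0, 0.1]

logits = dense(pooled, weight, bias)
probs = softmax(logits)
predicted = probs.index(max(probs))   # index of the most likely class
print(logits, probs, predicted)
```

During fine-tuning, only `weight` and `bias` in this picture start from random values; everything that produces `pooled` starts from the pre-trained parameters.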

In [3]:
ctx = d2l.try_all_gpus()
bert_base, vocabulary = nlp.model.get_model('bert_12_768_12',
                                            dataset_name='book_corpus_wiki_en_uncased',
                                            pretrained=True, ctx=ctx,
                                            use_decoder=False, use_classifier=False)
loss_fn = mx.gluon.loss.SoftmaxCELoss()
net = nlp.model.BERTClassifier(bert_base, 2)
net.classifier.initialize(ctx=ctx)
net.hybridize()
print(net)

Vocab file is not found. Downloading.
Downloading /home/ec2-user/.mxnet/models/1570161310.8147109book_corpus_wiki_en_uncased-a6607397.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/vocab/book_corpus_wiki_en_uncased-a6607397.zip...
Downloading /home/ec2-user/.mxnet/models/bert_12_768_12_book_corpus_wiki_en_uncased-75cc780f.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/bert_12_768_12_book_corpus_wiki_en_uncased-75cc780f.zip...
BERTClassifier(
  (bert): BERTModel(
    (encoder): BERTEncoder(
      (dropout_layer): Dropout(p = 0.1, axes=())
      (layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
      (transformer_cells): HybridSequential(
        (0): BERTEncoderCell(
          (dropout_layer): Dropout(p = 0.1, axes=())
          (attention_cell): MultiHeadAttentionCell(
            (_base_cell): DotProductAttentionCell(
              (_dropout_layer): Dropout(p = 0.1, axes=())
  

## Data Preprocessing

To use the pre-trained BERT model, we need to:
- tokenize the inputs into word pieces,
- insert `[CLS]` at the beginning of a sentence,
- insert `[SEP]` at the end of a sentence, and
- generate segment ids.
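
The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the toy vocabulary and the whitespace split stand in for the real word-piece tokenizer, and `bert_style_inputs` is a hypothetical helper, not part of GluonNLP.

```python
def bert_style_inputs(tokens, vocab, max_seq_length=8):
    """Wrap tokens in [CLS]/[SEP], map them to ids, and build segment ids."""
    # Reserve two positions for the special tokens, truncating if needed.
    tokens = tokens[:max_seq_length - 2]
    wrapped = ['[CLS]'] + tokens + ['[SEP]']
    unk = vocab['[UNK]']
    token_ids = [vocab.get(t, unk) for t in wrapped]
    valid_length = len(token_ids)
    # A single sentence belongs entirely to segment 0.
    segment_ids = [0] * valid_length
    return token_ids, valid_length, segment_ids

# Toy vocabulary (assumed indices, for illustration only).
toy_vocab = {'[UNK]': 0, '[PAD]': 1, '[CLS]': 2, '[SEP]': 3,
             'this': 4, 'movie': 5, 'is': 6, 'great': 7}

ids, length, segments = bert_style_inputs(
    'this movie is so great'.split(), toy_vocab)
print(ids, length, segments)
```

The word `so` is missing from the toy vocabulary, so it maps to the `[UNK]` id; a real word-piece tokenizer would instead split unknown words into known sub-word units where possible.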

### Data Transformations

We again use the IMDB dataset, this time downloaded through the GluonNLP data API. To keep the example fast, only the first 100 training and test samples are used. We then use the transform API to map the raw review scores to positive and negative labels.
To process sentences with the BERT-style `[CLS]` and `[SEP]` tokens, you can use the `data.BERTSentenceTransform` API.

In [4]:
train_dataset_raw = nlp.data.IMDB('train')
train_dataset_raw = mx.gluon.data.SimpleDataset(train_dataset_raw[:100])
test_dataset_raw = nlp.data.IMDB('test')
test_dataset_raw = mx.gluon.data.SimpleDataset(test_dataset_raw[:100])

tokenizer = nlp.data.BERTTokenizer(vocabulary)

def transform_fn(data):
    text, label = data
    # Transform label into positive / negative
    label = 1 if label >= 5 else 0
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128,
                                               pad=False, pair=False)
    data, length, segment_type = transform([text])
    data = data.astype('float32')
    length = length.astype('float32')
    segment_type = segment_type.astype('float32')
    return data, length, segment_type, label

Downloading /home/ec2-user/.mxnet/datasets/imdb/train.json from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/imdb/train.json...
Downloading /home/ec2-user/.mxnet/datasets/imdb/test.json from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/imdb/test.json...


In [5]:
train_dataset = train_dataset_raw.transform(transform_fn)
test_dataset = test_dataset_raw.transform(transform_fn)

print(vocabulary)
print('Index for [CLS] = ', vocabulary['[CLS]'])
print('Index for [SEP] = ', vocabulary['[SEP]'])

data, length, segment_type, label = train_dataset[0]
print('words = ', data.astype('int32'))

Vocab(size=30522, unk="[UNK]", reserved="['[CLS]', '[SEP]', '[MASK]', '[PAD]']")
Index for [CLS] =  2
Index for [SEP] =  3
words =  [    2 22953  2213  4381  2152  2003  1037  9476  4038  1012  2009  2743
  2012  1996  2168  2051  2004  2070  2060  3454  2055  2082  2166  1010
  2107  2004  1000  5089  1000  1012  2026  3486  2086  1999  1996  4252
  9518  2599  2033  2000  2903  2008 22953  2213  4381  2152  1005  1055
 18312  2003  2172  3553  2000  4507  2084  2003  1000  5089  1000  1012
  1996 25740  2000  5788 13732  1010  1996 12369  3993  2493  2040  2064
  2156  2157  2083  2037 17203  5089  1005 13433  8737  1010  1996  9004
 10196  4757  1997  1996  2878  3663  1010  2035 10825  2033  1997  1996
  2816  1045  2354  1998  2037  2493  1012  2043  1045  2387  1996  2792
  1999  2029  1037  3076  8385  2699  2000  6402  2091  1996  2082  1010
  1045  3202  7383  1012  1012  1012  1012     3]


### Batchify and Data Loader

In [6]:
padding_id = vocabulary[vocabulary.padding_token]
batchify_fn = nlp.data.batchify.Tuple(
        # words: the first dimension is the batch dimension
        nlp.data.batchify.Pad(axis=0, pad_val=padding_id),
        # valid length
        nlp.data.batchify.Stack(),
        # segment type : the first dimension is the batch dimension
        nlp.data.batchify.Pad(axis=0, pad_val=padding_id),
        # label
        nlp.data.batchify.Stack(np.float32))

batch_size = args.batch_size * len(ctx)
train_data = mx.gluon.data.DataLoader(train_dataset,
                                   batchify_fn=batchify_fn, shuffle=True,
                                   batch_size=batch_size, num_workers=4)
test_data = mx.gluon.data.DataLoader(test_dataset,
                                  batchify_fn=batchify_fn,
                                  shuffle=False, batch_size=batch_size, num_workers=4)
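
As a rough illustration of what the batchify function does, the sketch below pads variable-length sequences to the longest one in the batch and stacks the scalar fields. It is plain Python under stated assumptions: `pad` and `batchify` are hypothetical helpers that only mimic the idea behind `Pad` and `Stack`, and the value `1` stands in for `padding_id`.

```python
def pad(sequences, pad_val):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [list(s) + [pad_val] * (max_len - len(s)) for s in sequences]

def batchify(samples, pad_id):
    """Combine (words, valid_length, segments, label) samples into one batch."""
    words, lengths, segments, labels = zip(*samples)
    return (pad(words, pad_id),            # batch of token ids
            list(lengths),                 # one valid length per sample
            pad(segments, pad_id),         # same pad value as in the cell above
            [float(l) for l in labels])    # labels as floats

samples = [([2, 4, 3], 3, [0, 0, 0], 1),
           ([2, 4, 5, 6, 3], 5, [0, 0, 0, 0, 0], 0)]
words, lengths, segments, labels = batchify(samples, pad_id=1)
print(words, lengths, segments, labels)
```

The valid lengths are kept alongside the padded ids so that padded positions can be told apart from real tokens downstream.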

### Training Loop

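Now we have all the pieces in place and can finally start fine-tuning the model.
The `train_loop` helper is imported from the local `utils` module, and the number of epochs is set by `--num_epochs` (1 by default).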

In [7]:
tick = time.time()
train_loop(net, train_data, test_data, args.num_epochs, args.lr, ctx, loss_fn)
tock = time.time()
print('Elapsed time (sec): ', tock-tick)

Batch 0, Train Acc 0.01, Train Loss 1.0298542380332947
Epoch 0, Train Acc ('accuracy', 0.01), Train Loss 1.0298542380332947
Test Acc 0.0,
Elapsed time (sec):  4.168322801589966


### Save model checkpoint 

In [12]:
net.export('checkpoint')
with open('vocab.json', 'w') as f:
    f.write(vocabulary.to_json())

### Inference

In [31]:
predict_sentiment(net, ctx, vocabulary, tokenizer, 'this movie is so great')

'positive'

## Deploy on SageMaker
### Upload checkpoint to S3

In [None]:
import sagemaker, tarfile

# Bundle the exported symbol, parameters, and vocabulary into one archive.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("checkpoint-0000.params")
    tar.add("checkpoint-symbol.json")
    tar.add("vocab.json")

session = sagemaker.Session()
uploaded_model = session.upload_data(path='model.tar.gz', key_prefix='model')
s3_path = 's3://' + session.default_bucket() + '/model/model.tar.gz'
print("Model was uploaded to", s3_path)
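
Before uploading, it can be worth checking that the archive really contains the three files at its root. The sketch below uses only the standard library; `missing_members` is a hypothetical helper, and the demo runs on placeholder files in a temporary directory rather than on the real checkpoint.

```python
import os
import tarfile
import tempfile

EXPECTED = {'checkpoint-0000.params', 'checkpoint-symbol.json', 'vocab.json'}

def missing_members(archive_path, expected=EXPECTED):
    """Return the expected file names that are absent from the archive root."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        return set(expected) - set(tar.getnames())

# Demo: build an archive that deliberately omits vocab.json.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'model.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        for name in sorted(EXPECTED - {'vocab.json'}):
            path = os.path.join(tmp, name)
            with open(path, 'w') as f:
                f.write('placeholder')
            # arcname keeps the file at the archive root.
            tar.add(path, arcname=name)
    missing = missing_members(archive)

print(missing)
```

On the real archive, calling `missing_members('model.tar.gz')` should return an empty set if all three files were added.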

Log in to the Amazon Elastic Container Registry (ECR) service, then build the inference image and push it to the registry.

In [163]:
!$(aws ecr get-login --no-include-email --region us-east-1)
!docker build -t 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference . -f Dockerfile
!docker push 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference

Sending build context to Docker daemon    848MB
Step 1/3 : FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3
 ---> 9a2aa1c6893e
Step 2/3 : RUN pip install mxnet-mkl d2l https://github.com/dmlc/gluon-nlp/tarball/master -U --user
 ---> Running in 9ee594a0d5cd
Collecting https://github.com/dmlc/gluon-nlp/tarball/master
  Downloading https://github.com/dmlc/gluon-nlp/tarball/master
Collecting mxnet-mkl
  Downloading https://files.pythonhosted.org/packages/8d/d2/94dbb66de069ae47dcad4773c2968e9e950cff042c508bdd0e14f509230c/mxnet_mkl-1.5.1.post0-py2.py3-none-manylinux1_x86_64.whl (61.8MB)
Collecting d2l
  Downloading https://files.pythonhosted.org/packages/c9/d3/e9b92d14359953524609ed21f41508cfbba3814313668b125c904c7cc6e2/d2l-0.10.1.tar.gz
Collecting jupyter (from d2l)
  Downloading https://files.pythonhosted.org/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting matplotlib (from d2l)
  Downlo

Collecting pandocfilters>=1.4.1 (from nbconvert->jupyter->d2l)
  Downloading https://files.pythonhosted.org/packages/4c/ea/236e2584af67bb6df960832731a6e5325fd4441de001767da328c33368ce/pandocfilters-1.4.2.tar.gz
Collecting bleach (from nbconvert->jupyter->d2l)
  Downloading https://files.pythonhosted.org/packages/ab/05/27e1466475e816d3001efb6e0a85a819be17411420494a1e602c36f8299d/bleach-3.1.0-py2.py3-none-any.whl (157kB)
Collecting mistune<2,>=0.8.1 (from nbconvert->jupyter->d2l)
  Downloading https://files.pythonhosted.org/packages/09/ec/4b43dae793655b7d8a25f76119624350b4d65eb663459eb9603d7f1f0345/mistune-0.8.4-py2.py3-none-any.whl
Collecting defusedxml (from nbconvert->jupyter->d2l)
  Downloading https://files.pythonhosted.org/packages/06/74/9b387472866358ebc08732de3da6dc48e44b0aacd2ddaa5cb85ab7e986a2/defusedxml-0.6.0-py2.py3-none-any.whl
Collecting jsonschema!=2.5.0,>=2.4 (from nbformat>=4.2.0->ipywidgets->jupyter->d2l)
  Downloading https://files.pythonhosted.org/packages/54/48/f5f11

You should consider upgrading via the 'pip install --upgrade pip' command.
Removing intermediate container 9ee594a0d5cd
 ---> c268be8e81fb
Step 3/3 : COPY *.py /opt/ml/model/code/
 ---> 3f65c2d3f1c0
Successfully built 3f65c2d3f1c0
Successfully tagged 397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference


In [183]:
from sagemaker.mxnet.model import MXNetModel

sagemaker_model = MXNetModel(model_data=s3_path,
                             image='397262719838.dkr.ecr.us-east-1.amazonaws.com/haibin-test:inference',
                             role=sagemaker.get_execution_role(),
                             py_version='py3',
                             framework_version='1.4.1',
                             entry_point='serve.py',
                             source_dir='.')
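
The `entry_point` script `serve.py` is not shown in this notebook. The serving logs further down mention `model_fn` and `transform_fn`, so a script along the following lines is one plausible shape for it. Treat it as a sketch under assumptions, not the author's actual file: the input names `data0`, `data1`, `data2`, the label order, the JSON request format, and the helpers `parse_request` and `format_response` are all assumed for illustration.

```python
import json
import os

LABELS = ('negative', 'positive')  # assumed order: 0 = negative, 1 = positive

def parse_request(body):
    """Decode a JSON request body into the review text."""
    text = json.loads(body)
    if not isinstance(text, str):
        raise ValueError('expected a JSON string containing the review text')
    return text

def format_response(class_index):
    """Encode the predicted class index as a JSON string label."""
    return json.dumps(LABELS[class_index])

def model_fn(model_dir):
    """Load the exported network and the vocabulary once at server start."""
    # Imported lazily so the helpers above work without MXNet installed.
    import mxnet as mx
    import gluonnlp as nlp
    net = mx.gluon.SymbolBlock.imports(
        os.path.join(model_dir, 'checkpoint-symbol.json'),
        ['data0', 'data1', 'data2'],   # assumed input names of the exported graph
        os.path.join(model_dir, 'checkpoint-0000.params'))
    with open(os.path.join(model_dir, 'vocab.json')) as f:
        vocab = nlp.vocab.BERTVocab.from_json(f.read())
    tokenizer = nlp.data.BERTTokenizer(vocab)
    transform = nlp.data.BERTSentenceTransform(
        tokenizer, max_seq_length=128, pad=False, pair=False)
    return net, transform

def transform_fn(model, data, input_content_type, output_content_type):
    """Turn one request into a sentiment label."""
    import mxnet as mx
    net, transform = model
    token_ids, valid_length, segment_ids = transform([parse_request(data)])
    # Add a batch dimension of size 1 to every input.
    out = net(mx.nd.array([token_ids]),
              mx.nd.array([segment_ids]),
              mx.nd.array([valid_length]))
    class_index = int(out.argmax(axis=1).asscalar())
    return format_response(class_index), output_content_type
```

Keeping the request and response handling in small pure functions makes that part easy to check locally, for example `format_response(1)` yields the JSON string `"positive"`.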

In [186]:
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='local')

Attaching to tmppfeynr4a_algo-1-n3hmx_1
algo-1-n3hmx_1  | 2019-10-04 06:57:54,888 [INFO ] main com.amazonaws.ml.mms.ModelServer -
algo-1-n3hmx_1  | MMS Home: /usr/local/lib/python3.6/site-packages
algo-1-n3hmx_1  | Current directory: /
algo-1-n3hmx_1  | Temp directory: /home/model-server/tmp
algo-1-n3hmx_1  | Number of GPUs: 0
algo-1-n3hmx_1  | Number of CPUs: 32
algo-1-n3hmx_1  | Max heap size: 27305 M
algo-1-n3hmx_1  | Python executable: /usr/local/bin/python3.6
algo-1-n3hmx_1  | Config file: /etc/sagemaker-mms.properties
algo-1-n3hmx_1  | Inference address: http://0.0.0.0:8080
algo-1-n3hmx_1  | Management address: http://127.0.0.1:8081
algo-1-n3hmx_1  | Model Store: /.sagemaker/mms/models
algo-1-n3hmx_1  | Initial Models: ALL
algo-1-n3hmx_1  | Log dir: /logs
algo-1-n3hmx_1  | Metrics dir: /logs
algo-1-n3hmx_1  | Netty threads: 0
algo-1

algo-1-n3hmx_1  | 2019-10-04 06:57:56,230 [INFO ] pool-1-thread-33 ACCESS_LOG - /172.18.0.1:60632 "GET /ping HTTP/1.1" 200 49
algo-1-n3hmx_1  | 2019-10-04 06:57:57,306 [INFO ] W-9012-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - test
algo-1-n3hmx_1  | 2019-10-04 06:57:57,309 [INFO ] W-9012-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2064
algo-1-n3hmx_1  | 2019-10-04 06:57:57,312 [INFO ] W-9001-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - test
algo-1-n3hmx_1  | 2019-10-04 06:57:57,314 [INFO ] W-9001-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2069
algo-1-n3hmx_1  | 2019-10-04 06:57:57,325 [INFO ] W-9004-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - test
algo-1-n3hmx_1  | 2019-10-04 06:57:57,325 [INFO ] W-9019-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - test
algo-1-n3hmx_1  | 2019-10-04 06:57:57,326 [INFO ] W-9004-model com.

In [195]:
output = predictor.predict('The model is deployed. Great!')
print('\nPrediction output: {}\n\n'.format(output))

algo-1-n3hmx_1  | 2019-10-04 16:49:24,992 [INFO ] W-9026-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_fn
algo-1-n3hmx_1  | 2019-10-04 16:49:25,014 [WARN ] W-9026-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 	data0: None
algo-1-n3hmx_1  | 2019-10-04 16:49:25,014 [WARN ] W-9026-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   input_sym_arg_type = in_param.infer_type()[0]
algo-1-n3hmx_1  | 2019-10-04 16:49:25,873 [INFO ] W-9026-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - transform_fn
algo-1-n3hmx_1  | 2019-10-04 16:49:26,009 [INFO ] W-9026-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1018
algo-1-n3hmx_1  | 2019-10-04 16:49:26,009 [INFO ] W-9026-model ACCESS_LOG - /172.18.0.1:32840 "POST /invocations HTTP/1.1" 200 1018

Prediction output: positive




## Conclusion

In this tutorial, we showed how to fine-tune a sentiment analysis model starting from pre-trained BERT parameters and how to deploy it on SageMaker. In GluonNLP, fine-tuning takes just a few simple steps. All we did was apply a BERT-style data transformation to pre-process the data, automatically download the pre-trained model, and feed the transformed data into the model, all within 50 lines of code!

For more fine-tuning scripts, visit the [BERT model zoo webpage](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).


For fine-tuning, we only need to initialize the last classifier layer from scratch. The other layers are already initialized from the pre-trained model weights.