# Finetuning PyTorch BERT with NGC

Pre-trained language representations have been shown to improve many downstream NLP tasks, such as question answering and natural language inference. Devlin et al. proposed [BERT](https://arxiv.org/abs/1810.04805) (Bidirectional Encoder Representations from Transformers), which fine-tunes deep bidirectional representations on a wide range of tasks with minimal task-specific parameters and obtains state-of-the-art results.

In this tutorial, we will focus on adapting the BERT model for the question answering task on the SQuAD dataset. Specifically, we will:

- learn how to pre-process the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) to leverage the learnt representation in BERT;
- adapt the BERT model to the question answering task;
- load a pretrained BERT model and finetune it;
- run inference on the SQuAD test dataset.

[**NGC**](https://www.nvidia.com/en-us/gpu-cloud/) is the hub for GPU-optimized software for deep learning and high-performance computing (HPC) that takes care of all the plumbing, so that researchers can focus on building solutions, gathering insights, and delivering business value. We will be finetuning on the pretrained BERT model provided by NGC.

Let's get started by importing the packages for this tutorial.

In [5]:
import collections, datetime, json, math, os, tarfile, time
from io import StringIO

import boto3
import numpy as np
import torch
import sagemaker
from sagemaker.pytorch import estimator, PyTorch, PyTorchModel, PyTorchPredictor
from sagemaker.utils import name_from_base

from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from helper_funcs import *
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() # can replace with your own S3 bucket

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')
with open('hyperparameters.json', 'r') as f:
    params = json.load(f)
params['save_to_s3'] = bucket
with open('hyperparameters.json', 'w') as f:
    json.dump(params, f)

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


## I. Create training docker container

### Step 1. Build the docker container
Now we create a custom docker container based on the NGC BERT container.

In [6]:
%%sh

# The name of our algorithm
algorithm_name=bert-torch-train

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

Login Succeeded
Sending build context to Docker daemon  4.488MB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
 ---> be021446e08c
Step 3/15 : RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract nginx wget
 ---> Using cache
 ---> 20e5df32afd9
Step 4/15 : ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 ---> Using cache
 ---> d1c956a64d99
Step 5/15 : WORKDIR /workspace
 ---> Using cache
 ---> 8a9198ec2aef
Step 6/15 : RUN git clone https://github.com/attardi/wikiextractor.git
 ---> Using cache
 ---> f4e8f72275d7
Step 7/15 : RUN git clone https://github.com/soskek/bookcorpus.git
 ---> Using cache
 ---> adfa1eb810fc
Step 8/15 : WORKDIR /workspace/bert
 ---> Using cache
 ---> b9ab3f70fe46
Step 9/15 : RUN pip install --upgrade --no-cache-dir pip  && pip install --no-cache-dir  gevent flask pathlib gunicorn tqdm boto3 requests six ipdb h5py html2text nltk progressbar onnxruntime git+https://github.com/NVIDIA/dllogger
 ---> 

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



### Step 2. Edit ECR permission

Once the container is built, we can push it to the Amazon Elastic Container Registry (ECR). For security reasons, however, you will first need to open the repository in ECR and change its permissions. To do that, create your own **unique** JSON policy, similar to the one below:

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "All-Allow",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::363160369090:role/TeamRole"
        ]
      },
      "Action": "*"
    }
  ]
}
```

Then, [open this link](https://console.aws.amazon.com/ecr/repositories/bert-torch-train/permissions?region=us-east-1), click `Edit Policy JSON`, paste the JSON text above, and click `Save`.



(**NOTE:** You only need to replace the role ARN under `Principal` in the sample JSON, i.e. the `arn:aws:iam::...` entry, with your own, which you can retrieve as shown below.)

In [3]:
role = sagemaker.get_execution_role()
role

'383827541835:role/service-role/AmazonSageMaker-ExecutionRole-20200409T103675'
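
The policy document from this step can also be generated from the role ARN instead of being edited by hand. The following is a minimal sketch under stated assumptions: `build_ecr_policy` is a hypothetical helper that is not part of the notebook, and the ARN passed to it is a placeholder standing in for the value of `role`.

```python
import json


def build_ecr_policy(role_arn):
    """Return the Step 2 repository policy with `role_arn` as the only principal."""
    policy = {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "All-Allow",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn]},
                "Action": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN for illustration; in the notebook this would be the value of `role`.
policy_text = build_ecr_policy("arn:aws:iam::123456789012:role/ExampleRole")
print(policy_text)
```

The printed text can be pasted into the `Edit Policy JSON` dialog. The boto3 ECR client also offers `set_repository_policy(repositoryName=..., policyText=...)`, which is not exercised here.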

### Step 3. Push the docker container

With the ECR permission, let's push our built docker container to ECR.

In [7]:
%%sh

algorithm_name=bert-torch-train
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
# region=${region:-us-east-1}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

docker push ${fullname}

The push refers to repository [383827541835.dkr.ecr.us-east-1.amazonaws.com/bert-torch-train]
ad39165090e3: Preparing
8a981c83dc4b: Preparing
41f8d376def7: Preparing
ba3235a79382: Preparing
24d1efc3cf77: Preparing
51a87510e77a: Preparing
f7af15bc1593: Preparing
4ca33f39c33d: Preparing
356caf7061e4: Preparing
73765adce069: Preparing
fddcdacefca4: Preparing
6c9a5f2bcdc9: Preparing
97f77c2bf551: Preparing
0ad644802067: Preparing
0988452d60ad: Preparing
22b2247d543c: Preparing
5130ccfce7b2: Preparing
005c189102b1: Preparing
b2541313126e: Preparing
7a631d1de8a8: Preparing
563ea1e7989f: Preparing
05b737b70379: Preparing
461c94146b25: Preparing
89f14d452cdc: Preparing
221c639fb572: Preparing
094a55ed8561: Preparing
c727ce4f07f0: Preparing
51a87510e77a: Waiting
5f4f32dbd55d: Preparing
577dd6013185: Preparing
78c62c90c01c: Preparing
6d5f1e49ad99: Preparing
73765adce069: Waiting
96a6eb08694f: Preparing
fddcdacefca4: Waiting
b0404397b1f6: Preparing
b2541313126e: Waiting
6c9a5f2bcdc9: Waiting
c558

## II. Instantiate the model

It is time to instantiate our BERT model. We first specify the training hyperparameters below. Note that `save_to_s3` is the S3 bucket where the finetuned model will be stored, i.e., "s3://{`bucket`}/model.tar.gz".

In [20]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                   'output_dir': '/opt/ml/model',
                   'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                   'num_gpus':4, 
                   'num_train_epochs': 2, 
                   'train_batch_size':16, 
                   'max_seq_length':512, 
                   'doc_stride':128, 
                   'seed':1,
                   'learning_rate':3e-5,
                   'save_to_s3':bucket}
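
To give a feel for how `max_seq_length` and `doc_stride` interact, here is a minimal sketch of the sliding-window split used by the original BERT SQuAD preprocessing. It assumes the container's `run_squad.py` follows that scheme; `split_into_spans` and the token counts are hypothetical and only illustrate the arithmetic.

```python
import collections

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])


def split_into_spans(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Return overlapping (start, length) windows that cover all document tokens."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append(DocSpan(start=start, length=length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        start += min(length, doc_stride)
    return spans


# Assumed example: a 12-token question and a 1000-token context.
max_seq_length, doc_stride, num_query_tokens = 512, 128, 12
# Each window must leave room for the question plus [CLS], [SEP], [SEP].
max_tokens_for_doc = max_seq_length - num_query_tokens - 3
spans = split_into_spans(1000, max_tokens_for_doc, doc_stride)
print(len(spans), spans[0], spans[-1])
```

A smaller `doc_stride` produces more, heavily overlapping windows per context, which increases the number of training features.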


Once we have set our hyperparameters, we instantiate a SageMaker `PyTorch` estimator that we will use to run our training job. Here are some other parameters we need to specify:

- The training instance (`train_instance_type`) will be the AWS [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/). It has four NVIDIA V100 (Volta) GPUs, which makes it well suited to this heavy-duty deep learning training.


- We specify the Docker image we just pushed to ECR with `image_name`. Our Docker container has two commands, `train` and `serve`. When we start a training job, SageMaker runs our Docker container behind the scenes and tells it to run the `train` command.


- We specify the entry-point script with `entry_point`. This file tells the container which operations to perform when it starts up.


- We use AWS Deep Learning Containers for PyTorch 1.4.0 for the `framework_version`.

In [59]:
account=!aws sts get-caller-identity --query Account --output text
region=!aws configure get region
algoname = 'bert-torch-train'

# instantiate model
torch_model = PyTorch(role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.8xlarge',
                      entry_point='transform_script.py',
                      image_name="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname),
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters
                     )

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
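
The `image_name` string above is assembled from the output of two shell commands, which IPython captures as lists of lines (hence `account[0]` and `region[0]`). The sketch below shows one way to build the same URI while falling back to a default region when none is configured, mirroring the shell cell in Step 1. `first_or_default` and `ecr_image_uri` are hypothetical helpers, and the account ID is a placeholder.

```python
def first_or_default(lines, default):
    """Return the first non-empty line of captured shell output, else `default`."""
    for line in lines:
        if line.strip():
            return line.strip()
    return default


def ecr_image_uri(account, region, repository, tag="latest"):
    """Compose an ECR image URI in the form used by the build and push cells."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Stand-ins for the captured output of the two `aws` commands.
account_lines = ["123456789012"]  # placeholder account ID
region_lines = []                 # simulates `aws configure get region` printing nothing

image_uri = ecr_image_uri(
    first_or_default(account_lines, ""),
    first_or_default(region_lines, "us-east-1"),
    "bert-torch-train",
)
print(image_uri)
```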


## III. Fine-tune the model

To reuse the pretrained parameters from NGC, we first download the `bert_base` model. The download may take a little while.

In [10]:
!wget -q https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt
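
Because `wget -q` prints nothing, a failed download can go unnoticed and may leave an empty or partial file behind. A quick check before starting a long training job can help. This is a minimal sketch: `checkpoint_ready` is a hypothetical helper, and the demo uses a temporary file rather than the real checkpoint.

```python
import os
import tempfile


def checkpoint_ready(path, min_bytes=1):
    """True if `path` is an existing regular file of at least `min_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


# Demonstration with a temporary stand-in for bert_base.pt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "bert_base.pt")
    missing = checkpoint_ready(demo_path)  # file not written yet
    with open(demo_path, "wb") as f:
        f.write(b"\x00" * 16)
    present = checkpoint_ready(demo_path, min_bytes=16)

print(missing, present)
```

In the notebook, the equivalent call would be `checkpoint_ready('bert_base.pt')`, optionally with a larger `min_bytes` threshold.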

If you use an [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/) with 4 GPUs and a batch size of 16, this finetuning task takes around 15 minutes to complete for 1 epoch. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with one of the `ml.p3.16xlarge` or `ml.p3dn.24xlarge` instances.

Let's start the training!

In [22]:
training_start = time.time()
torch_model.fit()

2020-08-20 21:13:44 Starting - Starting the training job...
2020-08-20 21:13:47 Starting - Launching requested ML instances......
2020-08-20 21:15:01 Starting - Preparing the instances for training......
2020-08-20 21:16:14 Downloading - Downloading input data
== PyTorch ==

NVIDIA Release 19.12 (build 9142930)
PyTorch Version 1.4.0a0+a5b4d78

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Beng


2020-08-20 21:19:31 Training - Training image download completed. Training in progress.
HTTP request sent, awaiting response... 200 OK
Syntax error in Set-Cookie: codalab_session=""; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=-1; Path=/ at position 70.
Length: unspecified [text/x-python]
Saving to: ‘v1.1/evaluate-v1.1.py’

     0K ...                                                     283M=0s

2020-08-20 21:19:35 (283 MB/s) - ‘v1.1/evaluate-v1.1.py’ saved [3419]

--2020-08-20 21:19:35--  https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42123633 (40M) [application/json]
Saving to: ‘v2.0/train-v2.0.json’

     0K .......

Traceback (most recent call last):
  File "run_squad.py", line 1248, in <module>
    main()
  File "run_squad.py", line 1008, in main
    model.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu')["model"], strict=False)
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 481, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 210, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/bert/bert_base.pt'
Traceback (most recent call last):
  File "run_squad.py", line 1248, in <module>
    main()
  File "run_squad.py", line 1008, in main
    model.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu')["model

In [23]:
training_end = time.time()
print("Time for training : {} s".format(training_end - training_start))

Time for training : 405.2912292480469 s


## IV. Deploy our trained model

After finetuning the BERT base model, we are ready to deploy the trained model to a SageMaker endpoint and test it with some question answering test data.

Let's first deploy the model to an `ml.g4dn.4xlarge` inference instance. Since deployment has to launch a new instance and upload the model, it takes about 10-12 minutes.

In [31]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(JSONPredictor, self).__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)


In [65]:
deploy_start = time.time()

## endpoint name must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*
endpoint_name = 'bert-kdd' 
model_data = f's3://{bucket}/model.tar.gz'
        
        
torch_model = PyTorchModel(model_data=model_data,
                           role=role,
                           entry_point='transform_script.py',
                           framework_version='1.4.0',
                           predictor_cls=JSONPredictor)

bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', 
                              initial_instance_count=1, 
                              endpoint_name=endpoint_name)

print("Time for deploying '{}' : {}".format(endpoint_name, time.time() - deploy_start))

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


---------------!Time for deploying 'bert-kdd' : 635.0143537521362
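
Since a deployment takes several minutes, it can be worth validating the endpoint name locally first. The sketch below applies the pattern quoted in the deploy cell's comment. `is_valid_endpoint_name` is a hypothetical helper, and the 63-character limit is an assumption about the SageMaker API that is not stated in this tutorial.

```python
import re

# Pattern taken from the comment in the deploy cell; fullmatch() anchors both ends.
ENDPOINT_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_ENDPOINT_NAME_LEN = 63  # assumed limit, not stated in this tutorial


def is_valid_endpoint_name(name):
    """Check an endpoint name against the pattern and the assumed length limit."""
    if len(name) > MAX_ENDPOINT_NAME_LEN:
        return False
    return ENDPOINT_NAME_RE.fullmatch(name) is not None


for candidate in ["bert-kdd", "bert_kdd", "-bert", "bert-"]:
    print(candidate, is_valid_endpoint_name(candidate))
```

Names containing underscores or starting or ending with a hyphen are rejected by this check.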


Now that our endpoint has been deployed, let's send it some requests! For the question answering setting, we need to pass the model both a "question" and a "context" (from which the question is answered), packed in JSON as below:

```json
{
  "context": "Danielle is a girl who really loves her cat, Steve. Steve is a large cat with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.",

  "question": "who loves Steve?"
}
```
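
The request cell that follows refers to a variable named `pass_in_data`, whose definition does not appear in this tutorial. A minimal sketch, assuming it is simply a Python dict with the two keys shown above:

```python
import json

# Assumed definition of `pass_in_data`: a plain dict mirroring the JSON above.
pass_in_data = {
    "context": (
        "Danielle is a girl who really loves her cat, Steve. "
        "Steve is a large cat with a very furry belly. "
        "He gets very excited by the prospect of eating chicken covered in gravy."
    ),
    "question": "who loves Steve?",
}

# Serializing locally confirms the payload is valid JSON before sending it.
payload = json.dumps(pass_in_data)
print(payload)
```

The predictor defined earlier is configured with a JSON serializer, so under this assumption the dict can be passed to `predict` directly.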

In [66]:
response = bert_end.predict(pass_in_data, initial_args={'ContentType':'application/json'}) 
response

'Danielle'

In [64]:
# !rm bert_base.pt
# !rm s3_bucket.txt
bert_end.delete_endpoint()