# Finetuning PyTorch BERT with NGC

Pre-trained language representations have been shown to improve many downstream NLP tasks, such as question answering and natural language inference. Devlin et al. proposed [BERT](https://arxiv.org/abs/1810.04805) (Bidirectional Encoder Representations from Transformers), which fine-tunes deep bidirectional representations on a wide range of tasks with minimal task-specific parameters and obtained state-of-the-art results.

In this tutorial, we will focus on adapting the BERT model for the question answering task on the SQuAD dataset. Specifically, we will:

- learn how to pre-process the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) to leverage the learnt representation in BERT;
- adapt the BERT model to the question answering task;
- load a pretrained BERT model and finetune it;
- run inference on the SQuAD test dataset.
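
As a rough orientation for the pre-processing step, the sketch below flattens a document in the SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`) into one record per question. The `sample` content is made up for illustration and `flatten_squad` is a hypothetical helper, not a function from this notebook's scripts.

```python
import json

# A tiny in-memory document in the SQuAD v1.1 layout (made-up content).
sample = {
    "version": "1.1",
    "data": [{
        "title": "Kimi",
        "paragraphs": [{
            "context": "Rachel is a girl who really loves her tiger, Kimi.",
            "qas": [{
                "id": "q1",
                "question": "who loves Kimi?",
                "answers": [{"text": "Rachel", "answer_start": 0}],
            }],
        }],
    }],
}


def flatten_squad(squad):
    """Yield one dict per question with its context and first answer."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answer = qa["answers"][0]
                start = answer["answer_start"]  # character offset into context
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answer_text": answer["text"],
                    "answer_start": start,
                    "answer_end": start + len(answer["text"]),
                }


examples = list(flatten_squad(sample))
print(json.dumps(examples[0], indent=2))
```

The character offsets let a tokenizer later map the answer onto token positions, which is what the question answering head is trained to predict.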

[**NGC**](https://www.nvidia.com/en-us/gpu-cloud/) is the hub for GPU-optimized software for deep learning and high-performance computing (HPC) that takes care of all the plumbing, so that researchers can focus on building solutions, gathering insights, and delivering business value. We will be finetuning on the pretrained BERT model provided by NGC.

Let's get started by importing the packages for this tutorial.

In [1]:
import collections, datetime, json, math, os, tarfile, time
from io import StringIO

import boto3
import numpy as np
import torch
import sagemaker
from sagemaker.pytorch import estimator, PyTorch, PyTorchModel, PyTorchPredictor
from sagemaker.utils import name_from_base

from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from helper_funcs import *
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() # can replace with your own S3 bucket

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')
with open('hyperparameters.json', 'r') as f:
    params = json.load(f)
params['save_to_s3'] = bucket
with open('hyperparameters.json', 'w') as f:
    json.dump(params, f)

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


Building and training a BERT model from scratch is expensive; most of the time we only need to finetune a pretrained model. To reuse the pretrained parameters from NGC, we first download the `bert_base` model. The download might take a little while.

In [None]:
!wget -q https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

## I. Create training docker container

### Step 1. Build the docker container

To build an end-to-end training environment, we first build a docker container. The build installs packages, sets up `PATH`, and copies the scripts (such as `train` and `serve`) and the BERT base model into the container. The explicit steps are written in the `Dockerfile` in the same directory as this notebook.

Now we create a custom docker container based on the NGC BERT container.

In [35]:
%%sh

# The name of our algorithm
algorithm_name=bert-torch-train

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then tag it with
# the full name. The push to ECR happens in Step 3.
docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

Login Succeeded
Sending build context to Docker daemon  3.379GB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
 ---> be021446e08c
Step 3/15 : RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract nginx wget
 ---> Using cache
 ---> 531aa5803c25
Step 4/15 : ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 ---> Using cache
 ---> e3e869e1e19f
Step 5/15 : WORKDIR /workspace
 ---> Using cache
 ---> ed3f4b37d228
Step 6/15 : RUN git clone https://github.com/attardi/wikiextractor.git
 ---> Using cache
 ---> 002b698feeb7
Step 7/15 : RUN git clone https://github.com/soskek/bookcorpus.git
 ---> Using cache
 ---> a95b8103c1db
Step 8/15 : WORKDIR /workspace/bert
 ---> Using cache
 ---> 8e1f9b1d1336
Step 9/15 : RUN pip install --upgrade --no-cache-dir pip  && pip install --no-cache-dir  gevent flask pathlib gunicorn tqdm boto3 requests six ipdb h5py html2text nltk progressbar onnxruntime git+https://github.com/NVIDIA/dllogger
 ---> 




### Step 2. Edit ECR permission

Once the container is built, we can push it to the AWS Elastic Container Registry (ECR). For security reasons, however, you will first need to open the repository in ECR and change its permissions. To do that, we first create a JSON policy that is **unique** to your account, similar to the one below:

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "All-Allow",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:sts::383827541835:assumed-role/AmazonSageMaker-ExecutionRole-20200409T103675/SageMaker"
        ]
      },
      "Action": "*"
    }
  ]
}
```

Then, we [access this link](https://console.aws.amazon.com/ecr/repositories/bert-torch-train/permissions?region=us-east-1), click `Edit Policy JSON`, paste the above JSON text, and click `Save`.

(**NOTE: You only need to replace** the `arn:aws:sts` entry in the sample JSON with your own. Your execution role is shown below:)

In [39]:
role = sagemaker.get_execution_role()
role

'arn:aws:iam::363160369090:role/TeamRole'
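
To avoid hand-editing the policy, it can be generated from a principal ARN. The following is a minimal sketch: `build_ecr_policy` is a hypothetical helper, and the ARN passed in is a placeholder. Whether you pass the IAM role ARN printed above or an assumed-role `arn:aws:sts` ARN like the one in the sample depends on your setup; the sketch only fills in whichever one you supply.

```python
import json


def build_ecr_policy(principal_arn):
    """Return a repository policy JSON string shaped like the sample above."""
    policy = {
        "Version": "2008-10-17",
        "Statement": [{
            "Sid": "All-Allow",
            "Effect": "Allow",
            "Principal": {"AWS": [principal_arn]},
            "Action": "*",
        }],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN for illustration only; substitute your own.
policy_text = build_ecr_policy("arn:aws:iam::123456789012:role/ExampleRole")
print(policy_text)
```

The printed text can be pasted into the `Edit Policy JSON` dialog described above.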

### Step 3. Push the docker container

With the ECR permission, let's push our built docker container to ECR.

In [40]:
%%sh

algorithm_name=bert-torch-train
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
# region=${region:-us-east-1}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

docker push ${fullname}

The push refers to repository [363160369090.dkr.ecr.us-east-1.amazonaws.com/bert-torch-train]
cc911bb77b9b: Preparing
3500da1e240d: Preparing
f3f0cff4d51a: Preparing
9e83dcd13292: Preparing
e4a6b04222e2: Preparing
9d626438992c: Preparing
859f67215f0f: Preparing
5c1235c1f6aa: Preparing
37a1e383fbcb: Preparing
3da47e0590c2: Preparing
fddcdacefca4: Preparing
6c9a5f2bcdc9: Preparing
97f77c2bf551: Preparing
0ad644802067: Preparing
0988452d60ad: Preparing
22b2247d543c: Preparing
5130ccfce7b2: Preparing
005c189102b1: Preparing
b2541313126e: Preparing
7a631d1de8a8: Preparing
563ea1e7989f: Preparing
05b737b70379: Preparing
461c94146b25: Preparing
89f14d452cdc: Preparing
221c639fb572: Preparing
094a55ed8561: Preparing
c727ce4f07f0: Preparing
5f4f32dbd55d: Preparing
577dd6013185: Preparing
78c62c90c01c: Preparing
6d5f1e49ad99: Preparing
96a6eb08694f: Preparing
b0404397b1f6: Preparing
c558708f95ac: Preparing
c003f7d80d34: Preparing
9f34fb2ba40d: Preparing
a9268194e7cd: Preparing
8e893f1677ca: Prep

## II. Instantiate the model

It's time to instantiate our BERT model. We first specify our hyperparameters for training as below. Note that `save_to_s3` is the bucket where the finetuned model will be stored, i.e., "s3://{`bucket`}/model.tar.gz".

In [43]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                   'output_dir': 'opt/ml/model',
                   'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                   'num_gpus':4, 
                   'num_train_epochs': 2, 
                   'train_batch_size':16, 
                   'max_seq_length':512, 
                   'doc_stride':128, 
                   'seed':1,
                   'learning_rate':3e-5,
                   'save_to_s3':bucket}
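
To make `max_seq_length` and `doc_stride` more concrete, here is a sketch of the sliding-window splitting commonly used in BERT SQuAD preprocessing. It is patterned on the public BERT reference scripts; the training script used by this notebook is not shown here, so treat the details as assumptions. `doc_token_budget` and `sliding_windows` are hypothetical names.

```python
def doc_token_budget(max_seq_length, num_query_tokens):
    """Tokens left for the context after the query and [CLS], [SEP], [SEP]."""
    return max_seq_length - num_query_tokens - 3


def sliding_windows(num_doc_tokens, max_len, stride):
    """Return (start, length) spans covering the document with overlap."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(max_len, num_doc_tokens - start)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the document
        start += min(length, stride)
    return spans


# Hypothetical example: a 12-token question and a 1000-token context.
budget = doc_token_budget(max_seq_length=512, num_query_tokens=12)
for start, length in sliding_windows(1000, budget, stride=128):
    print(f"window starts at token {start}, covers {length} tokens")
```

A context that fits in the budget yields a single window, while longer contexts yield several overlapping ones. This is presumably why the training log reports slightly more features (87748) than samples (87599).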


Once we have set our hyperparameters, we instantiate a SageMaker Estimator `PyTorch` that we will use to run our training job. Here are some other parameters we need to specify:

- The instance type (`train_instance_type`) we are going to use is the AWS [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/). It contains four *V100* Volta GPUs, making it ideal for this heavy-duty deep learning training.

- We specify the Docker image we just pushed to ECR with `image_name`. Our Docker container has two commands, `train` and `serve`. When we launch a training job, SageMaker runs our Docker container behind the scenes and tells it to run the `train` command.

- We specify the entry-point script with `entry_point`. This file tells the container which operations to perform when it starts up.

- We use AWS Deep Learning Containers for PyTorch 1.4.0 for the `framework_version`.
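
The value passed as `image_name` follows the same registry naming used when the image was tagged in Step 1. As a small sketch, the helper below assembles that URI and falls back to a default region when none is configured. `ecr_image_uri` is a hypothetical helper (not part of the SageMaker SDK), and the account ID is a placeholder.

```python
def ecr_image_uri(account, region, repository, tag="latest",
                  default_region="us-east-1"):
    """Build '<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>'."""
    region = region or default_region  # an empty region falls back to the default
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID; the empty region mimics an unset `aws configure get region`.
print(ecr_image_uri("123456789012", "", "bert-torch-train"))
```

Making the fallback explicit avoids building a malformed URI such as `....dkr.ecr..amazonaws.com/...` when no region is configured.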

In [42]:
account=!aws sts get-caller-identity --query Account --output text
region=!aws configure get region
algoname = 'bert-torch-train'

# instantiate model
torch_model = PyTorch(role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.8xlarge',
                      entry_point='transform_script.py',
                      image_name="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname),
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters,
#                       model_dir=f's3://{bucket}/model.tar.gz'
                     )

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
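
The hyperparameter dump near the end of the training log in Section III shows that the container receives every value as a string, with string values carrying an extra layer of JSON quoting (for example `'bert_model': '"bert-base-uncased"'`). A script running inside the container therefore has to decode them before use. The sketch below shows one way to do that; `decode_hyperparameters` is a hypothetical helper, not the code in this notebook's `train` script.

```python
import json


def decode_hyperparameters(raw):
    """Decode JSON-encoded string values; leave undecodable ones as-is."""
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, ValueError):
            decoded[key] = value
    return decoded


# Values copied from the hyperparameter dump in the training log.
raw = {
    "seed": "1",
    "num_gpus": "4",
    "bert_model": '"bert-base-uncased"',
    "train_batch_size": "16",
    "num_train_epochs": "2",
}
params = decode_hyperparameters(raw)
print(params)
```

After decoding, numeric values are Python integers again and string values lose their extra quotes.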


## III. Fine-tune the model


If you use an [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/) with 4 GPUs and a batch size of 16, this particular finetuning task takes around 15 minutes for 1 epoch. The run recorded below uses 2 epochs and took about 26 minutes end to end, including instance launch. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with one of the `ml.p3.16xlarge` or `ml.p3dn.24xlarge` instances.

Let's start the training!

In [45]:
training_start = time.time()
torch_model.fit()

2020-08-22 02:05:11 Starting - Starting the training job...
2020-08-22 02:05:16 Starting - Launching requested ML instances......
2020-08-22 02:06:39 Starting - Preparing the instances for training.........
2020-08-22 02:07:57 Downloading - Downloading input data
== PyTorch ==

NVIDIA Release 19.12 (build 9142930)
PyTorch Version 1.4.0a0+a5b4d78

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy B

device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: True
device: cuda:2 n_gpu: 1, distributed training: True, 16-bits training: True
device: cuda:3 n_gpu: 1, distributed training: True, 16-bits training: True
Not sure whats up here
Not sure whats up here
device: cuda:0 n_gpu: 1, distributed training: True, 16-bits training: True
DLL 2020-08-22 02:12:11.821183 - PARAMETER Config : ["Namespace(bert_model='bert-base-uncased', config_file='/workspace/bert/bert_config.json', do_eval=False, do_lower_case=True, do_predict=False, do_train=True, doc_stride=128, eval_script='evaluate.py', fp16=True, gradient_accumulation_steps=1, init_checkpoint='/workspace/bert/bert_base.pt', json_summary='dllogger.json', learning_rate=3e-05, local_rank=0, log_freq=50, loss_scale=0, max_answer_length=30, max_query_length=64, max_seq_length=512, max_steps=-1.0, n_best_size=20, no_cuda=False, null_score_diff_threshold=0.0, num_train_epo


2020-08-22 02:12:58 Training - Training image download completed. Training in progress.
DLL 2020-08-22 02:18:30.566837 - PARAMETER Cached_train features_file : /workspace/bert/data/squad/v1.1/train-v1.1.json_bert-base-uncased_512_128_64
DLL 2020-08-22 02:18:40.247406 - PARAMETER train_start : True
DLL 2020-08-22 02:18:40.247552 - PARAMETER training_samples : 87599
DLL 2020-08-22 02:18:40.247599 - PARAMETER training_features : 87748
DLL 2020-08-22 02:18:40.247635 - PARAMETER train_batch_size : 16
DLL 2020-08-22 02:18:40.247674 - PARAMETER steps : 2737.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
DLL 2020-08-22 02:18

DLL 2020-08-22 02:19:36.761207 - Training Epoch: 0 Training Iteration: 301  step_loss : 2.208984375  learning_rate : 2.966751918158568e-05
DLL 2020-08-22 02:19:45.583239 - Training Epoch: 0 Training Iteration: 351  step_loss : 1.328125  learning_rate : 2.9058579953720615e-05
DLL 2020-08-22 02:19:54.408013 - Training Epoch: 0 Training Iteration: 401  step_loss : 2.12109375  learning_rate : 2.844964072585556e-05
DLL 2020-08-22 02:20:03.244668 - Training Epoch: 0 Training Iteration: 451  step_loss : 0.92529296875  learning_rate : 2.7840701497990502e-05
Iteration:  19%|█▉        | 260/1372 [00:46<03:15,  5.68it/s]

DLL 2020-08-22 02:20:20.949700 - Training Epoch: 0 Training Iteration: 551  step_loss : 1.4619140625  learning_rate : 2.6622823042260386e-05
DLL 2020-08-22 02:20:29.823383 - Training Epoch: 0 Training Iteration: 601  step_loss : 1.0791015625  learning_rate : 2.6013883814395323e-05
DLL 2020-08-22 02:20:38.678832 - Training Epoch: 0 Training Iteration: 651  step_loss : 1.642578125  learning_rate : 2.5404944586530263e-05
DLL 2020-08-22 02:20:47.510642 - Training Epoch: 0 Training Iteration: 701  step_loss : 1.083984375  learning_rate : 2.4796005358665203e-05
Iteration:  36%|███▋      | 500/1372 [01:29<02:34,  5.65it/s]

DLL 2020-08-22 02:20:56.341983 - Training Epoch: 0 Training Iteration: 751  step_loss : 0.92431640625  learning_rate : 2.4187066130800147e-05
DLL 2020-08-22 02:21:05.166635 - Training Epoch: 0 Training Iteration: 801  step_loss : 1.2890625  learning_rate : 2.3578126902935087e-05
DLL 2020-08-22 02:21:13.988273 - Training Epoch: 0 Training Iteration: 851  step_loss : 1.150390625  learning_rate : 2.2969187675070027e-05
DLL 2020-08-22 02:21:22.806336 - Training Epoch: 0 Training Iteration: 901  step_loss : 0.72705078125  learning_rate : 2.2360248447204967e-05
Iteration:  53%|█████▎    | 729/1372 [02:09<01:53,  5.66it/s]

DLL 2020-08-22 02:21:40.441068 - Training Epoch: 0 Training Iteration: 1001  step_loss : 1.4287109375  learning_rate : 2.114236999147485e-05
DLL 2020-08-22 02:21:49.256964 - Training Epoch: 0 Training Iteration: 1051  step_loss : 0.8369140625  learning_rate : 2.0533430763609788e-05
DLL 2020-08-22 02:21:58.076392 - Training Epoch: 0 Training Iteration: 1101  step_loss : 1.375  learning_rate : 1.9924491535744735e-05
DLL 2020-08-22 02:22:06.900645 - Training Epoch: 0 Training Iteration: 1151  step_loss : 1.095703125  learning_rate : 1.9315552307879675e-05
Iteration:  69%|██████▉   | 948/1372 [02:48<01:14,  5.67it/s]

DLL 2020-08-22 02:22:24.549998 - Training Epoch: 0 Training Iteration: 1251  step_loss : 1.15234375  learning_rate : 1.8097673852149556e-05
DLL 2020-08-22 02:22:33.375513 - Training Epoch: 0 Training Iteration: 1301  step_loss : 0.8125  learning_rate : 1.7488734624284496e-05
Iteration:  85%|████████▍ | 1160/1372 [03:25<00:37,  5.67it/s]

DLL 2020-08-22 02:22:54.899810 - Training Epoch: 1 Training Iteration: 1423  step_loss : 0.209716796875  learning_rate : 1.600292290829375e-05
DLL 2020-08-22 02:23:03.713938 - Training Epoch: 1 Training Iteration: 1473  step_loss : 0.56787109375  learning_rate : 1.5393983680428697e-05
DLL 2020-08-22 02:23:12.529916 - Training Epoch: 1 Training Iteration: 1523  step_loss : 0.806640625  learning_rate : 1.4785044452563634e-05
DLL 2020-08-22 02:23:21.346360 - Training Epoch: 1 Training Iteration: 1573  step_loss : 1.0302734375  learning_rate : 1.4176105224698576e-05
DLL 2020-08-22 02:23:30.165609 - Training Epoch: 1 Training Iteration: 1623  step_loss : 0.673828125  learning_rate : 1.3567165996833516e-05
Iteration:   0%|          | 2/1372 [00:00<04:05,  5.58it/s]

DLL 2020-08-22 02:23:38.989607 - Training Epoch: 1 Training Iteration: 1673  step_loss : 1.4140625  learning_rate : 1.2958226768968458e-05
DLL 2020-08-22 02:23:47.820509 - Training Epoch: 1 Training Iteration: 1723  step_loss : 0.4833984375  learning_rate : 1.2349287541103397e-05
DLL 2020-08-22 02:23:56.646180 - Training Epoch: 1 Training Iteration: 1773  step_loss : 0.80859375  learning_rate : 1.174034831323834e-05
DLL 2020-08-22 02:24:05.472068 - Training Epoch: 1 Training Iteration: 1823  step_loss : 0.84521484375  learning_rate : 1.1131409085373279e-05
Iteration:  19%|█▉        | 260/1372 [00:45<03:16,  5.67it/s]

DLL 2020-08-22 02:24:23.123816 - Training Epoch: 1 Training Iteration: 1923  step_loss : 0.6611328125  learning_rate : 9.913530629643159e-06
DLL 2020-08-22 02:24:31.954845 - Training Epoch: 1 Training Iteration: 1973  step_loss : 0.611328125  learning_rate : 9.304591401778103e-06
DLL 2020-08-22 02:24:40.780129 - Training Epoch: 1 Training Iteration: 2023  step_loss : 1.0546875  learning_rate : 8.695652173913046e-06
DLL 2020-08-22 02:24:49.617118 - Training Epoch: 1 Training Iteration: 2073  step_loss : 0.6953125  learning_rate : 8.086712946047985e-06
Iteration:  36%|███▋      | 500/1372 [01:28<02:34,  5.66it/s]

DLL 2020-08-22 02:24:58.449433 - Training Epoch: 1 Training Iteration: 2123  step_loss : 0.47900390625  learning_rate : 7.477773718182927e-06
DLL 2020-08-22 02:25:07.284058 - Training Epoch: 1 Training Iteration: 2173  step_loss : 0.8583984375  learning_rate : 6.868834490317865e-06
DLL 2020-08-22 02:25:16.120209 - Training Epoch: 1 Training Iteration: 2223  step_loss : 0.64990234375  learning_rate : 6.2598952624528085e-06
DLL 2020-08-22 02:25:24.954612 - Training Epoch: 1 Training Iteration: 2273  step_loss : 0.400146484375  learning_rate : 5.650956034587747e-06
Iteration:  53%|█████▎    | 729/1372 [02:08<01:53,  5.66it/s]

DLL 2020-08-22 02:25:42.617480 - Training Epoch: 1 Training Iteration: 2373  step_loss : 1.0302734375  learning_rate : 4.433077578857628e-06
DLL 2020-08-22 02:25:51.450395 - Training Epoch: 1 Training Iteration: 2423  step_loss : 0.50048828125  learning_rate : 3.824138350992571e-06
DLL 2020-08-22 02:26:00.285427 - Training Epoch: 1 Training Iteration: 2473  step_loss : 0.978515625  learning_rate : 3.21519912312751e-06
DLL 2020-08-22 02:26:09.118910 - Training Epoch: 1 Training Iteration: 2523  step_loss : 0.5302734375  learning_rate : 2.6062598952624526e-06
Iteration:  69%|██████▉   | 947/1372 [02:47<01:15,  5.66it/s]

DLL 2020-08-22 02:26:26.786738 - Training Epoch: 1 Training Iteration: 2623  step_loss : 0.8134765625  learning_rate : 1.3883814395323342e-06
DLL 2020-08-22 02:26:35.618882 - Training Epoch: 1 Training Iteration: 2673  step_loss : 0.7412109375  learning_rate : 7.794422116672769e-07
Iteration:  85%|████████▍ | 1160/1372 [03:24<00:37,  5.66it/s]


2020-08-22 02:29:50 Uploading - Uploading generated training model
DLL 2020-08-22 02:29:47.297628 -  e2e_train_time : 485.5270609855652  training_sequences_per_second : 361.4546213835392  final_loss : 0.0433349609375
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
start squad_download.sh downloading ... ...
{'sagemaker_program': '"transform_script.py"', 'seed': '1', 'num_gpus': '4', 'bert_model': '"bert-base-uncased"', 'sagemaker_region': '"us-east-1"', 'vocab_file': '"/workspace/bert/data/bert_vocab.txt"', 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-363160369090/bert-torch-train-2020-08-22-02-05-11-407/source/sourcedir.tar.gz"', 'train_batch_size': '16', 'num_train_epochs': '2', '

In [23]:
!docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES


In [46]:
training_end = time.time()
print("Time for training : {} s".format(training_end - training_start))

Time for training : 1557.979591846466 s
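
The figures in the training log can be cross-checked with a little arithmetic. The sketch below reproduces the logged `steps`, the per-epoch iteration count shown in the progress bars (1372), and `training_sequences_per_second`. The formulas are assumptions chosen because they match the logged numbers; the training script itself is not shown in this notebook.

```python
import math

training_samples = 87599     # "training_samples" in the log
training_features = 87748    # "training_features" in the log
train_batch_size = 16
num_gpus = 4
num_train_epochs = 2
e2e_train_time = 485.5270609855652  # seconds, "e2e_train_time" in the log

# Assumed: floor(samples / batch size) per epoch, times epochs, split across GPUs.
optimization_steps = (training_samples // train_batch_size) * num_train_epochs // num_gpus

# Assumed: features are sharded evenly across GPUs, then batched on each GPU.
features_per_gpu = math.ceil(training_features / num_gpus)
iterations_per_epoch = math.ceil(features_per_gpu / train_batch_size)

# Assumed: throughput counts every feature once per epoch.
sequences_per_second = training_features * num_train_epochs / e2e_train_time

print(optimization_steps, iterations_per_epoch, sequences_per_second)
```

Note that `e2e_train_time` (about 486 s) covers only the training loop, while the wall-clock time measured in the notebook (about 1558 s) also includes launching the instance, downloading the image, and preparing the data.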


## IV. Deploy our trained model

After finetuning the BERT base model, we are ready to deploy the trained model to a SageMaker endpoint and test it with some question answering test data.

Let's first deploy the model to the inference instance `'ml.g4dn.4xlarge'`. Since deployment needs to launch a new instance and upload the model, it will take about 10-12 minutes.

In [11]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(JSONPredictor, self).__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)


In [None]:
## TODO

# save the model as a tarball
# with tarfile.open('bert.tar.gz', 'w:gz') as f:
#     f.add('bert_base.pt')

In [21]:
# ## TODO

# # upload model data to S3
# prefix = 'bert_pytorch'
# model_data = sagemaker_session.upload_data(path='bert.tar.gz',
#                                            bucket=bucket, # f's3://{bucket}/model.tar.gz',
#                                            key_prefix =os.path.join(prefix, 'model')
#                                           )
# model_data

's3://sagemaker-us-east-1-363160369090/bert_pytorch/model/bert.tar.gz'

In [48]:
deploy_start = time.time()

## endpoint name must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*
endpoint_name = 'bert-kdd' 
        
        
torch_model = PyTorchModel(model_data=f's3://{bucket}/model.tar.gz',
                           role=role,
                           entry_point='transform_script.py',
                           framework_version='1.4.0',
                           predictor_cls=JSONPredictor)

bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', 
                              initial_instance_count=1, 
                              endpoint_name=endpoint_name)

print("Time for deploying '{}' : {}".format(endpoint_name, time.time() - deploy_start))

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-----------------!Time for deploying 'bert-kdd' : 696.6776480674744
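
The endpoint-name constraint mentioned in the cell above can be checked locally before calling `deploy`, which saves a round trip to the service. This is a minimal sketch using the pattern quoted in that comment; `is_valid_endpoint_name` is a hypothetical helper, and any additional service-side limits (such as a maximum length) are not checked here.

```python
import re

# Pattern quoted in the deployment cell; fullmatch() anchors it at both ends.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def is_valid_endpoint_name(name):
    """Return True if the whole name matches the endpoint-name pattern."""
    return ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["bert-kdd", "bert_kdd", "-bert", "bert-"]:
    print(candidate, is_valid_endpoint_name(candidate))
```

Names may contain letters, digits, and hyphens, but must start and end with a letter or digit, so underscores and leading or trailing hyphens are rejected.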


After the deployment, you should be able to see your endpoint [here](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints).

Now that our endpoint has been deployed, let's send it some questions and see how it responds! For the NLP QA setting, we need to pass the model both a "question" and a "context" (from which to answer the question), packed in a JSON object as below:

```json
{
  "context": "Rachel is a girl who really loves her tiger, Kimi. Kimi is a large tiger with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.",

  "question": "who loves Kimi?"
}
```

In [50]:
# Adjacent string literals are concatenated, so no stray indentation ends up in the text.
context = ("Rachel is a girl who really loves her tiger, Kimi. Kimi is a large tiger with a very furry belly. "
           "He gets very excited by the prospect of eating chicken covered in gravy.")
question = "who loves Kimi?"
pass_in_data = {"context": context, "question": question}

In [51]:
%%time

response = bert_end.predict(pass_in_data, initial_args={'ContentType':'application/json'}) 
response

CPU times: user 14.5 ms, sys: 56 µs, total: 14.6 ms
Wall time: 6.15 s


'Rachel'
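
For intuition about how an extractive QA head arrives at an answer like `'Rachel'`, the sketch below picks the best answer span from per-token start and end scores. It is a simplified illustration, not the code in `transform_script.py`; the logits are made up, and the defaults for `n_best_size` and `max_answer_length` are taken from the configuration printed in the training log.

```python
def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Return (start, end, score) of the highest-scoring valid token span."""
    def top_indexes(logits):
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:n_best_size]

    best = None
    for start in top_indexes(start_logits):
        for end in top_indexes(end_logits):
            # Skip spans that end before they start or are too long.
            if end < start or end - start + 1 > max_answer_length:
                continue
            score = start_logits[start] + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Made-up tokens and logits for illustration only.
tokens = ["rachel", "is", "a", "girl", "who", "really", "loves", "her", "tiger"]
start_logits = [5.0, 0.1, 0.0, 1.2, 0.0, 0.0, 0.3, 0.0, 0.9]
end_logits = [4.5, 0.2, 0.0, 1.0, 0.0, 0.0, 0.1, 0.0, 1.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer, score)
```

Scoring a span as the sum of its start and end logits, restricted to valid and reasonably short spans, is the usual decoding rule for SQuAD-style models.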

In [64]:
# !rm bert_base.pt
# !rm s3_bucket.txt
bert_end.delete_endpoint()