# Finetuning PyTorch BERT

Pre-trained language representations have been shown to improve many downstream NLP tasks such as question answering and natural language inference. Devlin et al. proposed [BERT](https://arxiv.org/abs/1810.04805) (Bidirectional Encoder Representations from Transformers), which pre-trains deep bidirectional representations that can then be fine-tuned on a wide range of tasks with minimal task-specific parameters, and obtained state-of-the-art results.

In this tutorial, we will focus on adapting the BERT model for the question answering task on the SQuAD dataset. Specifically, we will:

- learn how to pre-process the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) to leverage the learnt representation in BERT;
- adapt the BERT model to the question answering task;
- load a pretrained BERT model and finetune it;
- run inference on the SQuAD test dataset.

[**NGC**](https://www.nvidia.com/en-us/gpu-cloud/) is the hub for GPU-optimized software for deep learning and high-performance computing (HPC) that takes care of all the plumbing, so that researchers can focus on building solutions, gathering insights, and delivering business value. We will finetune the pretrained BERT model provided by NGC.

Now let's get started by importing the packages for this tutorial.

In [1]:
import collections, datetime, json, math, os, tarfile, time
from io import StringIO

import boto3
import numpy as np
import torch
import sagemaker
from sagemaker.pytorch import estimator, PyTorch, PyTorchModel, PyTorchPredictor
from sagemaker.utils import name_from_base

from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from helper_funcs import *
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() # can replace with your own S3 bucket

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')
with open('hyperparameters.json', 'r') as f:
    params = json.load(f)
params['save_to_s3'] = bucket
with open('hyperparameters.json', 'w') as f:
    json.dump(params, f)

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


Building and training the BERT model from scratch is expensive, so most of the time we only need to finetune a pretrained model. To reuse the pretrained parameters from NGC, we first download the `bert_base` checkpoint. The download may take a minute or two.

In [2]:
%%time
!wget -q https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

CPU times: user 1.12 s, sys: 176 ms, total: 1.3 s
Wall time: 1min 36s
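If `wget` is not available in your environment, the download can also be done from Python. The sketch below is our own helper (the name `download_if_missing` is hypothetical and not part of the tutorial's `helper_funcs`). It uses only the standard library, skips the download when a non-empty checkpoint already exists, and writes to a temporary `.part` file first so an interrupted transfer never leaves a truncated `bert_base.pt` behind.

```python
import os
import urllib.request

# URL of the NGC checkpoint used in the `wget` cell above.
BERT_BASE_URL = (
    "https://api.ngc.nvidia.com/v2/models/nvidia/"
    "bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt"
)


def download_if_missing(url: str, dest: str) -> str:
    """Download `url` to `dest` unless a non-empty file is already there.

    Hypothetical helper: downloads into `dest + ".part"` and renames it only
    after the transfer completes.
    """
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return dest  # already downloaded; skip the slow transfer
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)
    os.replace(tmp, dest)  # atomic rename on the same filesystem
    return dest
```

In the notebook, `download_if_missing(BERT_BASE_URL, "bert_base.pt")` would take the place of the `wget` cell, and rerunning it is cheap because an existing checkpoint is reused.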


## I. Create training docker container

### Step 1. Build the docker container

To build an end-to-end training environment, we first build a Docker image. The build installs packages, sets up `PATH`, and copies the scripts (such as `train` and `serve`) and the BERT base model into the image. The exact steps are written in the `Dockerfile` in the same directory as this notebook.

Now we create a custom Docker image based on the NGC PyTorch container (`nvcr.io/nvidia/pytorch:19.12-py3`).

In [3]:
build_docker_start = time.time()

In [4]:
%%sh

# The name of our algorithm
algorithm_name=bert-train-kdd

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

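# Log Docker in to ECR (`aws ecr get-login` only exists in AWS CLI v1;
# `get-login-password` works with both CLI v1 and v2)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"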

# Build the Docker image locally with the short image name, then tag it with
# the full ECR name (the image is pushed to ECR in Step 3).
docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

Login Succeeded
Sending build context to Docker daemon  1.941GB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
 ---> be021446e08c
Step 3/15 : RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract nginx wget
 ---> Using cache
 ---> 22b1d5b1aec7
Step 4/15 : ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 ---> Using cache
 ---> 24fdfd86f5f2
Step 5/15 : WORKDIR /workspace
 ---> Using cache
 ---> 7507b34aa969
Step 6/15 : RUN git clone https://github.com/attardi/wikiextractor.git
 ---> Using cache
 ---> 40cf9d0d4d6e
Step 7/15 : RUN git clone https://github.com/soskek/bookcorpus.git
 ---> Using cache
 ---> 74dea5f19474
Step 8/15 : WORKDIR /workspace/bert
 ---> Using cache
 ---> e2923624a6a9
Step 9/15 : RUN pip install --upgrade --no-cache-dir pip  && pip install --no-cache-dir  gevent flask pathlib gunicorn tqdm boto3 requests six ipdb h5py html2text nltk progressbar onnxruntime git+https://github.com/NVIDIA/dllogger
 ---> 




In [5]:
print("Time for building docker : {} s".format(time.time() - build_docker_start))

Time for building docker : 73.68929195404053 s
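The repository check and image naming in the shell cell above can also be expressed in Python with boto3, which this notebook already imports. A minimal sketch: the helper names `ensure_repository` and `ecr_image_uri` are our own, and `ecr_client` is assumed to be a boto3 ECR client such as `boto3.client("ecr", region_name="us-east-1")`. Catching the modeled `RepositoryNotFoundException` is more precise than the shell cell's `$? -ne 0` test, which treats any failure (for example, missing credentials) as "repository does not exist".

```python
def ecr_image_uri(account: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build the full ECR image name used by `docker tag` and `docker push`."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def ensure_repository(ecr_client, repo: str) -> bool:
    """Create the ECR repository `repo` if it does not exist yet.

    `ecr_client` is assumed to be a boto3 ECR client. Returns True if a new
    repository was created and False if it already existed. Any other error
    (permissions, credentials) propagates instead of being swallowed.
    """
    try:
        ecr_client.describe_repositories(repositoryNames=[repo])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repo)
        return True
```

For example, `ensure_repository(boto3.client("ecr"), "bert-train-kdd")` followed by `ecr_image_uri(account, region, "bert-train-kdd")` reproduces the `fullname` computed in the shell cell.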


### Step 2. Edit ECR permission

Once the image is built, we can push it to Amazon Elastic Container Registry (ECR). For security reasons, however, you first need to edit the repository's permissions in ECR. To do that, create your own **unique** JSON policy similar to the one below:

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "All-Allow",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::XXXXXXXXXXXXXXXXXXXXXXXXXX"
        ]
      },
      "Action": "*"
    }
  ]
}
```

Then open [**this link**](https://console.aws.amazon.com/ecr/repositories/bert-train-kdd/permissions?region=us-east-1), click `Edit Policy JSON`, paste the JSON above, and click `Save`.



(**NOTE: You only need to replace** the `arn:aws:iam` value in the sample JSON with your own, which you can look up as shown below:)

In [9]:
role = sagemaker.get_execution_role()
role

'arn:aws:iam::383827541835:role/service-role/AmazonSageMaker-ExecutionRole-20200409T103675'
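Instead of pasting the policy into the console, you could attach it programmatically. A hedged sketch, assuming a boto3 ECR client and that the principal you want to allow is the execution role printed above; `build_ecr_policy` and `apply_ecr_policy` are hypothetical helper names, and the policy body simply mirrors the sample JSON from this step.

```python
import json


def build_ecr_policy(principal_arn: str) -> dict:
    """Return the sample repository policy from Step 2 with your own principal ARN."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "All-Allow",
                "Effect": "Allow",
                "Principal": {"AWS": [principal_arn]},
                "Action": "*",
            }
        ],
    }


def apply_ecr_policy(ecr_client, repo: str, principal_arn: str) -> None:
    """Attach the policy to `repo`; the API counterpart of `Edit Policy JSON` + `Save`."""
    ecr_client.set_repository_policy(
        repositoryName=repo,
        policyText=json.dumps(build_ecr_policy(principal_arn)),
    )
```

With the notebook's variables, the call would be `apply_ecr_policy(boto3.client("ecr"), "bert-train-kdd", role)`.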

### Step 3. Push the docker container

With the ECR permissions in place, let's push the built Docker image to ECR.

In [10]:
push_docker_start = time.time()

In [11]:
%%sh

algorithm_name=bert-train-kdd
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
region=${region:-us-east-1}  # same default as the build cell
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

docker push ${fullname}

The push refers to repository [383827541835.dkr.ecr.us-east-1.amazonaws.com/bert-train-kdd]
c2b8caee5adc: Preparing
abd8764d9cee: Preparing
28a690722ab3: Preparing
7e55eb3ddb14: Preparing
3a667acb45c0: Preparing
19a858f1c4dd: Preparing
3fac541413a1: Preparing
d61b297ff92b: Preparing
9b5db43f614f: Preparing
3e96f70c3868: Preparing
fddcdacefca4: Preparing
6c9a5f2bcdc9: Preparing
97f77c2bf551: Preparing
0ad644802067: Preparing
0988452d60ad: Preparing
22b2247d543c: Preparing
3fac541413a1: Waiting
5130ccfce7b2: Preparing
d61b297ff92b: Waiting
005c189102b1: Preparing
19a858f1c4dd: Waiting
9b5db43f614f: Waiting
b2541313126e: Preparing
3e96f70c3868: Waiting
7a631d1de8a8: Preparing
563ea1e7989f: Preparing
fddcdacefca4: Waiting
05b737b70379: Preparing
461c94146b25: Preparing
6c9a5f2bcdc9: Waiting
89f14d452cdc: Preparing
221c639fb572: Preparing
97f77c2bf551: Waiting
094a55ed8561: Preparing
0ad644802067: Waiting
c727ce4f07f0: Preparing
0988452d60ad: Waiting
5f4f32dbd55d: Preparing
577dd6013185: Pr

In [12]:
print("Time for pushing docker : {} s".format(time.time() - push_docker_start))

Time for pushing docker : 181.9604253768921 s


## II. Instantiate the model

It's time to instantiate our BERT model. We first specify the training hyperparameters below. Note that `save_to_s3` is where the finetuned model will be stored, i.e., `s3://{bucket}/model.tar.gz`.

In [13]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                   'output_dir': '/opt/ml/model',
                   'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                   'num_gpus':4, 
                   'num_train_epochs': 1, 
                   'train_batch_size':16, 
                   'max_seq_length':512, 
                   'doc_stride':128, 
                   'seed':1,
                   'learning_rate':3e-5,
                   'save_to_s3':bucket}
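Two of these hyperparameters control how SQuAD contexts become model inputs. A context longer than `max_seq_length` (minus the question and three special tokens) is split into overlapping windows whose start positions advance by `doc_stride` tokens, so one question can yield several training features; this is why the training log later reports 87,748 `training_features` for 87,599 `training_samples`. The sketch below is a simplified version of the windowing used in the reference BERT SQuAD preprocessing. We assume the scripts in this container follow the same scheme, and `doc_windows` is a hypothetical helper name.

```python
from typing import List, Tuple


def doc_windows(num_doc_tokens: int, num_query_tokens: int,
                max_seq_length: int = 512,
                doc_stride: int = 128) -> List[Tuple[int, int]]:
    """Split a tokenized context into overlapping (start, length) windows.

    Each window must fit into `max_seq_length` together with the question
    tokens and three special tokens: [CLS] question [SEP] context [SEP].
    """
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("question too long for max_seq_length")
    windows = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        windows.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window already reaches the end of the context
        start += min(length, doc_stride)  # next window starts doc_stride tokens later
    return windows


print(doc_windows(1000, 10))
# [(0, 499), (128, 499), (256, 499), (384, 499), (512, 488)]
```

Every token of the 1,000-token context is covered, and consecutive windows overlap by 371 tokens, so an answer cut off at one window's edge usually appears in full in a neighbouring window.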


Once the hyperparameters are set, we instantiate a SageMaker `PyTorch` estimator to run our training job. Here are some other parameters we need to specify:

- The instance type (`train_instance_type`) is the AWS [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/). It contains four NVIDIA *V100* (Volta) GPUs, making it well suited to this heavy-duty deep learning training job.

- We specify the Docker image we just pushed to ECR with `image_name`. Our Docker container has two commands, `train` and `serve`. When we launch a training job, behind the scenes SageMaker runs our Docker container and tells it to run the `train` command.

- We specify the entry-point script with `entry_point`. This file tells the container which operations to perform when it starts up.

- We set `framework_version` to `'1.4.0'`, i.e., AWS Deep Learning Containers for PyTorch 1.4.0 are used wherever no custom image is given.

In [14]:
account=!aws sts get-caller-identity --query Account --output text
region=!aws configure get region
algoname = 'bert-train-kdd'

# instantiate model
torch_model = PyTorch(role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.8xlarge',
                      entry_point='transform_script.py',
                      image_name="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname),
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters,
#                       model_dir=f's3://{bucket}/model.tar.gz'
                     )

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
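When SageMaker runs the `train` command, it writes the estimator's hyperparameters to `/opt/ml/input/config/hyperparameters.json` inside the container, with every value stored as a string. The SDK v1 framework estimators also JSON-encode each value, which is why the training log below shows entries such as `'bert_model': '"bert-base-uncased"'` and `'num_gpus': '4'`. A minimal sketch of how a training script could decode them; `load_hyperparameters` is our own hypothetical helper, not necessarily what `transform_script.py` does.

```python
import json
from typing import Any, Dict

# Standard location of the hyperparameters file inside a SageMaker training container.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_hyperparameters(path: str = HYPERPARAMETERS_PATH) -> Dict[str, Any]:
    """Read SageMaker hyperparameters and decode their JSON-encoded values.

    '"bert-base-uncased"' becomes "bert-base-uncased", '16' becomes 16, and
    '3e-05' becomes 3e-05. Values that are not valid JSON are kept as-is.
    """
    with open(path) as f:
        raw = json.load(f)
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, json.JSONDecodeError):
            decoded[key] = value  # e.g. a plain, unquoted bucket name
    return decoded
```

Decoding in one place keeps the rest of the script working with real `int`/`float` values instead of re-parsing strings at each use.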


## III. Fine-tune the model


If you use an [`ml.p3.8xlarge` instance](https://aws.amazon.com/sagemaker/pricing/instance-types/) with 4 GPUs and a batch size of 16, one epoch of this finetuning job took about 24 minutes end to end in our run (1467 s, see the timing below). The training loop itself accounts for only about 4 minutes (`e2e_train_time` ≈ 240 s in the log); the rest is spent launching the instance, downloading the image, and preprocessing the data. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with an `ml.p3.16xlarge` or `ml.p3dn.24xlarge` instance.
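Several numbers in the training log below follow directly from these settings. A short worked check, assuming one process per GPU with a per-process batch of 16 (as the log's four `device: cuda:N n_gpu: 1` lines and `train_batch_size : 16` suggest) and an even, `DistributedSampler`-style split of the features across processes:

```python
import math

# Values from the hyperparameters and the training log.
num_gpus = 4                 # one training process per GPU
train_batch_size = 16        # per process
training_samples = 87599     # SQuAD v1.1 training questions
training_features = 87748    # windows after doc_stride splitting

global_batch = train_batch_size * num_gpus

# Each process iterates over its share of the features in batches of 16.
features_per_process = math.ceil(training_features / num_gpus)
iterations_per_epoch = math.ceil(features_per_process / train_batch_size)

# The scheduled `steps` in the log match the sample count instead.
scheduled_steps = int(training_samples / global_batch)

print(global_batch, iterations_per_epoch, scheduled_steps)  # 64 1372 1368
```

The progress bar's 1372 iterations count features, while the 1368 scheduled `steps` appear to be derived from samples, so with a linear decay the learning rate presumably reaches zero a few iterations before the epoch ends.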

Let's start the training!

In [15]:
training_start = time.time()
torch_model.fit()

2020-08-23 20:41:16 Starting - Starting the training job...
2020-08-23 20:41:18 Starting - Launching requested ML instances......
2020-08-23 20:42:30 Starting - Preparing the instances for training......
2020-08-23 20:43:38 Downloading - Downloading input data
2020-08-23 20:43:38 Training - Downloading the training image.....................
== PyTorch ==

NVIDIA Release 19.12 (build 9142930)
PyTorch Version 1.4.0a0+a5b4d78

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin,

device: cuda:2 n_gpu: 1, distributed training: True, 16-bits training: True
device: cuda:3 n_gpu: 1, distributed training: True, 16-bits training: True
device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: True
device: cuda:0 n_gpu: 1, distributed training: True, 16-bits training: True
DLL 2020-08-23 20:47:19.117129 - PARAMETER Config : ["Namespace(bert_model='bert-base-uncased', config_file='/workspace/bert/bert_config.json', do_eval=False, do_lower_case=True, do_predict=False, do_train=True, doc_stride=128, eval_script='evaluate.py', fp16=True, gradient_accumulation_steps=1, init_checkpoint='/workspace/bert/bert_base.pt', json_summary='dllogger.json', learning_rate=3e-05, local_rank=0, log_freq=50, loss_scale=0, max_answer_length=30, max_query_length=64, max_seq_length=512, max_steps=-1.0, n_best_size=20, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=1.0, output_dir='/opt/ml/model', predict_batch_size=8, predi

DLL 2020-08-23 20:53:37.641220 - PARAMETER Cached_train features_file : /workspace/bert/data/squad/v1.1/train-v1.1.json_bert-base-uncased_512_128_64
DLL 2020-08-23 20:53:47.434952 - PARAMETER train_start : True
DLL 2020-08-23 20:53:47.435093 - PARAMETER training_samples : 87599
DLL 2020-08-23 20:53:47.435141 - PARAMETER training_features : 87748
DLL 2020-08-23 20:53:47.435178 - PARAMETER train_batch_size : 16
DLL 2020-08-23 20:53:47.435213 - PARAMETER steps : 1368.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
DLL 2020-08-23 20:53:51.069260 - Training Epoch: 0 Training Iteration: 1  step_loss : 6.265625  learning_rat

DLL 2020-08-23 20:54:43.362326 - Training Epoch: 0 Training Iteration: 301  step_loss : 2.06640625  learning_rate : 2.59990253411306e-05
DLL 2020-08-23 20:54:52.050100 - Training Epoch: 0 Training Iteration: 351  step_loss : 1.203125  learning_rate : 2.478070175438597e-05
DLL 2020-08-23 20:55:00.742000 - Training Epoch: 0 Training Iteration: 401  step_loss : 1.94921875  learning_rate : 2.3562378167641325e-05
DLL 2020-08-23 20:55:09.443747 - Training Epoch: 0 Training Iteration: 451  step_loss : 0.939453125  learning_rate : 2.2344054580896682e-05

DLL 2020-08-23 20:55:26.843576 - Training Epoch: 0 Training Iteration: 551  step_loss : 1.244140625  learning_rate : 1.9907407407407406e-05
DLL 2020-08-23 20:55:35.542806 - Training Epoch: 0 Training Iteration: 601  step_loss : 1.1591796875  learning_rate : 1.868908382066277e-05
DLL 2020-08-23 20:55:44.244250 - Training Epoch: 0 Training Iteration: 651  step_loss : 1.5859375  learning_rate : 1.747076023391813e-05
DLL 2020-08-23 20:55:52.968846 - Training Epoch: 0 Training Iteration: 701  step_loss : 1.10546875  learning_rate : 1.6252436647173486e-05

DLL 2020-08-23 20:56:01.688924 - Training Epoch: 0 Training Iteration: 751  step_loss : 0.8291015625  learning_rate : 1.5034113060428853e-05
DLL 2020-08-23 20:56:10.410643 - Training Epoch: 0 Training Iteration: 801  step_loss : 1.28125  learning_rate : 1.3815789473684211e-05
DLL 2020-08-23 20:56:19.130821 - Training Epoch: 0 Training Iteration: 851  step_loss : 1.0673828125  learning_rate : 1.259746588693957e-05
DLL 2020-08-23 20:56:27.852123 - Training Epoch: 0 Training Iteration: 901  step_loss : 0.845703125  learning_rate : 1.1379142300194933e-05

DLL 2020-08-23 20:56:45.295516 - Training Epoch: 0 Training Iteration: 1001  step_loss : 1.4111328125  learning_rate : 8.942495126705652e-06
DLL 2020-08-23 20:56:54.015710 - Training Epoch: 0 Training Iteration: 1051  step_loss : 0.8193359375  learning_rate : 7.724171539961016e-06
DLL 2020-08-23 20:57:02.741450 - Training Epoch: 0 Training Iteration: 1101  step_loss : 1.345703125  learning_rate : 6.505847953216375e-06
DLL 2020-08-23 20:57:11.467380 - Training Epoch: 0 Training Iteration: 1151  step_loss : 1.14453125  learning_rate : 5.287524366471733e-06

DLL 2020-08-23 20:57:28.925476 - Training Epoch: 0 Training Iteration: 1251  step_loss : 1.166015625  learning_rate : 2.8508771929824557e-06
DLL 2020-08-23 20:57:37.655220 - Training Epoch: 0 Training Iteration: 1301  step_loss : 0.8642578125  learning_rate : 1.632553606237815e-06


2020-08-23 21:00:55 Uploading - Uploading generated training model
DLL 2020-08-23 21:00:51.943415 -  e2e_train_time : 240.11868405342102  training_sequences_per_second : 365.4359524162561  final_loss : 0.125
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
start squad_download.sh downloading ... ...
{'sagemaker_program': '"transform_script.py"', 'seed': '1', 'num_gpus': '4', 'bert_model': '"bert-base-uncased"', 'sagemaker_region': '"us-east-1"', 'vocab_file': '"/workspace/bert/data/bert_vocab.txt"', 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-383827541835/bert-train-kdd-2020-08-23-20-41-15-988/source/sourcedir.tar.gz"', 'train_batch_size': '16', 'num_train_epochs': '1', 'sagemaker_c

In [16]:
training_end = time.time()
print("Time for training : {} s".format(training_end - training_start))

Time for training : 1467.5661361217499 s
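The learning rates in the training log decay linearly. A worked check, assuming a linear warmup followed by linear decay to zero (a common BERT finetuning schedule) over the 1368 scheduled steps, with a warmup proportion of 0.1; that proportion is our inference from the logged values, since the truncated config line does not show it. `warmup_linear_lr` is a hypothetical helper name.

```python
import math


def warmup_linear_lr(step: int, total_steps: int, base_lr: float = 3e-5,
                     warmup: float = 0.1) -> float:
    """Linear warmup over the first `warmup` fraction of steps, then linear decay to 0."""
    x = step / total_steps
    if x < warmup:
        return base_lr * x / warmup
    # Equals base_lr * (1 - x) / (1 - warmup), clamped at 0 past total_steps.
    return base_lr * max((x - 1.0) / (warmup - 1.0), 0.0)


# Learning rates copied from the training log (iteration -> logged value).
logged_lrs = {
    301: 2.59990253411306e-05,
    701: 1.6252436647173486e-05,
    1301: 1.632553606237815e-06,
}
for iteration, logged in logged_lrs.items():
    predicted = warmup_linear_lr(iteration, total_steps=1368)
    print(iteration, predicted, math.isclose(predicted, logged, rel_tol=1e-9))

# The reported throughput is consistent with training features / e2e_train_time.
print(87748 / 240.11868405342102)  # about 365.44 sequences per second
```

Both checks only use numbers printed in the log above, so they are a quick way to confirm that a rerun used the same schedule and data.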


## IV. Deploy our trained model

After finetuning the BERT base model, we are ready to deploy the trained model to a SageMaker endpoint and test it with some question-answering data.

Let's first deploy the model to an `ml.g4dn.4xlarge` inference instance. Because deployment launches a new instance and loads the model onto it, this takes about 10-12 minutes.

In [17]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(JSONPredictor, self).__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)


In [18]:
deploy_start = time.time()

## endpoint name must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*
endpoint_name = 'bert-kdd' 
        
        
torch_model = PyTorchModel(model_data=f's3://{bucket}/model.tar.gz',
                           role=role,
                           entry_point='transform_script.py',
                           framework_version='1.4.0',
                           predictor_cls=JSONPredictor)

bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', 
                              initial_instance_count=1, 
                              endpoint_name=endpoint_name)

print("Time for deploying '{}' : {}".format(endpoint_name, time.time() - deploy_start))

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


---------------!Time for deploying 'bert-kdd' : 635.0710117816925
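The naming rule quoted in the deploy cell can be checked locally before calling `deploy`. A small sketch: `is_valid_endpoint_name` is a hypothetical helper, the pattern is the one from the notebook's comment, and the 63-character limit is taken from the SageMaker `CreateEndpoint` API reference rather than from this notebook.

```python
import re

# Pattern from the comment in the deploy cell; fullmatch() anchors both ends.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
# Maximum endpoint name length per the SageMaker CreateEndpoint API reference.
MAX_ENDPOINT_NAME_LENGTH = 63


def is_valid_endpoint_name(name: str) -> bool:
    """Return True if `name` is an acceptable SageMaker endpoint name."""
    return (len(name) <= MAX_ENDPOINT_NAME_LENGTH
            and ENDPOINT_NAME_PATTERN.fullmatch(name) is not None)
```

`'bert-kdd'` passes, while names with underscores or a leading or trailing hyphen (for example `'bert_kdd'` or `'bert-'`) are rejected.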


After the deployment, you should be able to see your endpoint [here](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints).

Now that our endpoint is deployed, let's send it a question and see how it responds. For extractive question answering, we pack both the "question" and the "context" (the passage that contains the answer) into a JSON payload like the one below:

```json
{
  "context": "Rachel is a girl who really loves her tiger, Kimi. Kimi is a large tiger with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.",

  "question": "who loves Kimi?"
}
```
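Behind the endpoint, a BERT question-answering head scores every context token as a possible answer start and as a possible answer end, and the answer is the span with the best combined score. A simplified sketch of that selection step, assuming `start_logits`/`end_logits` lists for the context tokens and the `max_answer_length=30` visible in the training config log; the real post-processing also keeps an n-best list (`n_best_size=20`) and maps the lowercased tokens back to the original text, which is why the endpoint returns `'Rachel'`. The logits in the example are made up for illustration, and `best_span` is a hypothetical helper name.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_length: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start_logits[s] + end_logits[e].

    Only spans with start <= end and at most `max_answer_length` tokens qualify.
    """
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_length, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical logits for the first context tokens of the example above.
tokens = ["rachel", "is", "a", "girl", "who", "really", "loves", "her", "tiger"]
start_logits = [5.0, 0.1, -1.0, 0.3, 0.0, -2.0, 0.5, 0.2, 1.0]
end_logits = [4.0, 0.2, -1.0, 1.5, 0.0, -2.0, 0.4, 0.1, 2.0]

s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # rachel
```

Requiring `start <= end` matters: without it, a high end score early in the passage could pair with a high start score later on and produce an invalid span.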

In [19]:
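# Implicit string concatenation avoids the stray indentation a backslash continuation would embed
context = ("Rachel is a girl who really loves her tiger, Kimi. Kimi is a large tiger with a very furry belly. "
           "He gets very excited by the prospect of eating chicken covered in gravy.")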
question = "who loves Kimi?"
pass_in_data = {"context": context, "question": question}

In [20]:
%%time

response = bert_end.predict(pass_in_data, initial_args={'ContentType':'application/json'}) 
response

CPU times: user 13.1 ms, sys: 8 µs, total: 13.1 ms
Wall time: 5.78 s


'Rachel'
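The `JSONPredictor` above relies on the SageMaker Python SDK. A client that only has boto3 (for example, a small web service or an AWS Lambda function) could send the same payload through the SageMaker Runtime `InvokeEndpoint` API. A minimal sketch, assuming `runtime_client = boto3.client("sagemaker-runtime")` and that the endpoint returns a JSON-encoded answer, as the `json_deserializer` used above implies; `ask_bert` is a hypothetical helper name.

```python
import json


def ask_bert(runtime_client, endpoint_name: str, context: str, question: str):
    """Send one question/context pair to the endpoint and decode the JSON reply.

    `runtime_client` is assumed to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"context": context, "question": question}),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```

With the notebook's variables, `ask_bert(boto3.client("sagemaker-runtime"), endpoint_name, context, question)` sends the same request as `bert_end.predict(pass_in_data, ...)`.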

In [21]:
# !rm bert_base.pt
# !rm s3_bucket.txt
bert_end.delete_endpoint()