# Hands-on: Deploying Question Answering with BERT

Pre-trained language representations have been shown to improve many downstream NLP tasks, such as question answering and natural language inference. Devlin et al. proposed BERT [1] (Bidirectional Encoder Representations from Transformers), which pre-trains deep bidirectional representations that can then be fine-tuned on a wide range of tasks with minimal task-specific parameters, obtaining state-of-the-art results.

Using a BERT model that has already been trained for question answering (see the companion notebook "training.ipynb" for details), we'll load the trained model and run inference on examples from the SQuAD dataset.

### A quick overview: an example from the SQuAD dataset looks like this:

    (2, 
    '56be4db0acb8001400a502ee', 
    'Where did Super Bowl 50 take place?', 

    'Super Bowl 50 was an American football game to determine the champion of the National 
    Football League (NFL) for the 2015 season. The American Football Conference (AFC) 
    champion Denver Broncos defeated the National Football Conference (NFC) champion 
    Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played 
    on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, 
    California. As this was the 50th Super Bowl, the league emphasized the "golden 
    anniversary" with various gold-themed initiatives, as well as temporarily suspending 
    the tradition of naming each Super Bowl game with Roman numerals (under which the 
    game would have been known as "Super Bowl L"), so that the logo could prominently 
    feature the Arabic numerals 50.', 

    ['Santa Clara, California', "Levi's Stadium", "Levi's Stadium 
    in the San Francisco Bay Area at Santa Clara, California."], 

    [403, 355, 355])
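Reading the tuple above as (record index, question ID, question, context, reference answers, answer start offsets), which is our interpretation of the fields rather than something stated in this notebook, the last list holds character offsets into the context. The sketch below labels the fields with a `NamedTuple` (the field names are our own) and checks that each answer text really starts at its offset. It uses only the first three sentences of the context, which is enough to cover all three answers.

```python
from typing import List, NamedTuple


class SquadRecord(NamedTuple):
    # Our own labels for the six tuple positions shown above.
    index: int
    question_id: str
    question: str
    context: str
    answers: List[str]
    answer_starts: List[int]


# First three sentences of the context; enough to cover all three answers.
context = (
    "Super Bowl 50 was an American football game to determine the champion of the "
    "National Football League (NFL) for the 2015 season. The American Football "
    "Conference (AFC) champion Denver Broncos defeated the National Football "
    "Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super "
    "Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the "
    "San Francisco Bay Area at Santa Clara, California."
)

record = SquadRecord(
    2,
    "56be4db0acb8001400a502ee",
    "Where did Super Bowl 50 take place?",
    context,
    ["Santa Clara, California", "Levi's Stadium",
     "Levi's Stadium in the San Francisco Bay Area at Santa Clara, California."],
    [403, 355, 355],
)


def answers_match_offsets(rec: SquadRecord) -> bool:
    """True if every answer text appears in the context at its start offset."""
    return all(
        rec.context[start:start + len(text)] == text
        for text, start in zip(rec.answers, rec.answer_starts)
    )


print(answers_match_offsets(record))  # True
```

Because the offsets count characters (the en dash in "24–10" counts as one), slicing `context[403:403 + len(answer)]` recovers "Santa Clara, California" exactly.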

## Deploy on SageMaker in 4 steps

1. Preparing the environment 
2. Grabbing the model, parameters, code, etc.
3. Building a Docker container with dependencies installed
4. Launching a serving endpoint with the SageMaker SDK

### Step 1: Preparing the environment

In this step we create a couple of variables and call a [helper script](files/environment.config) to prepare our environment.

The helper script has a guard condition, so it only runs once.


In [None]:
import os
pwd = os.getcwd()
model_path = "{}/model.tar.gz".format(pwd)

# install some requirements specific to this notebook
!bash environment.config
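The contents of `environment.config` aren't reproduced here, so we can't say exactly how its guard works. One common pattern is a marker file: skip the setup if the marker exists, otherwise run the setup and then create the marker. Here is a minimal Python sketch of that pattern; the `run_once` helper and the `.env_ready` marker name are hypothetical.

```python
import os
import tempfile


def run_once(marker_path, setup):
    """Call setup() only if marker_path is absent; return True if setup ran."""
    if os.path.exists(marker_path):
        # Guard condition: setup already completed on an earlier run.
        return False
    setup()
    # Create the marker only after setup succeeds, so a failed run is retried.
    with open(marker_path, "w") as f:
        f.write("done\n")
    return True


# Usage: the second call is skipped because the marker file now exists.
installs = []
marker = os.path.join(tempfile.mkdtemp(), ".env_ready")  # hypothetical marker name
print(run_once(marker, lambda: installs.append("install-deps")))  # True
print(run_once(marker, lambda: installs.append("install-deps")))  # False
print(installs)  # ['install-deps']
```

Writing the marker last means an interrupted setup leaves no marker behind, so the next run tries again instead of silently skipping.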

### Step 2: Grabbing the model, parameters, code, etc.

In this step we download the pre-trained BERT model from S3 and save it to a local file called `model.tar.gz`. The archive contains the model parameters, the vocabulary file, and all the inference files (`code/serve.py`, `bert/data/qa.py`, `bert_qa_evaluate.py`).

(Note that `serve.py` is the `entry_point` SageMaker uses for inference, and it needs to be under the `code/` directory.)


In [None]:
print("downloading model to : " + model_path)
!aws s3 cp s3://matrow-public-data/bert/model.tar.gz {model_path}

OK, let's look at what's in the archive.

In [None]:
!ls -lh {model_path}
!if [ -d ./tmp ]; then rm -rf ./tmp; fi
!mkdir -p ./tmp
print("\nextracting {}\n".format(model_path))
!tar -xzf {model_path} --directory ./tmp/
!tree ./tmp
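The same layout check can be done from Python with the standard `tarfile` module, which is handy if `tree` isn't installed or you want the notebook to fail early when the entry point is missing. This is a minimal sketch; `check_model_archive` is our own helper name, and it assumes the `code/serve.py` layout described above.

```python
import tarfile


def check_model_archive(path, entry_point="code/serve.py"):
    """List the archive's members and raise if the entry point is missing."""
    with tarfile.open(path, "r:gz") as tar:
        # Strip a leading "./" so "./code/serve.py" and "code/serve.py" compare equal.
        names = [m.name[2:] if m.name.startswith("./") else m.name
                 for m in tar.getmembers()]
    if entry_point not in names:
        raise FileNotFoundError(f"{entry_point} not found in {path}")
    return names
```

In this notebook you would call `check_model_archive(model_path)` after the download cell. The `./` normalization matters because archives created with `tar -czf model.tar.gz .` store member names with that prefix.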

#### Supporting files

[serve.py](./tmp/code/serve.py) contains two essential functions:

1. `model_fn` to load the model parameters
2. `transform_fn` to run model inference on a given input

Let's take a quick look at the file.

In [None]:
%load ./code/serve.py
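For comparison with the file loaded above, here is a minimal skeleton of the two functions. It assumes the SageMaker MXNet serving convention in which `transform_fn(model, request_body, content_type, accept_type)` returns a `(response_body, content_type)` pair. The placeholder model stands in for BERT and does no real question answering, and the request format (a JSON list of `[question, context]` pairs) is also an assumption.

```python
import json


def model_fn(model_dir):
    """Load the model from model_dir.

    Hypothetical stand-in: the real serve.py loads BERT parameters and the
    vocabulary from model_dir; this placeholder ignores the directory.
    """
    def placeholder_model(question, context):
        # Not real QA: just return the first sentence of the context.
        return context.split(". ")[0]
    return placeholder_model


def transform_fn(model, request_body, content_type, accept_type):
    """Deserialize the request, run inference, and serialize the response."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    # Assumed request format: a JSON list of [question, context] pairs.
    examples = json.loads(request_body)
    # Key each answer by the example's position in the request.
    answers = {str(i): model(q, c) for i, (q, c) in enumerate(examples)}
    return json.dumps(answers), accept_type
```

Keeping loading in `model_fn` and per-request work in `transform_fn` means the expensive parameter load happens once per container, not once per request.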

### Step 3: Building a Docker container with everything installed

Let's prepare a Docker container with all the dependencies required for model inference.

Here we build a Docker container based on the SageMaker MXNet inference container.

You can find the list of all available inference containers at https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

Let's start by having a look at the Dockerfile.

In [None]:
!cat Dockerfile

Next we kick off the build process, which has two steps.
First we authenticate to the ECR registry that hosts our base image, then we run the `docker build` command.

We assign the image tag to a variable so we can reuse it later.

In [None]:
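# Image tag assigned to a variable so it can be reused when creating the model
myTag = "my-docker:inference"

# Authenticate to the ECR registry (account 763104351884) that hosts the base image
!$(aws ecr get-login --no-include-email --region ${REGION} --registry-ids 763104351884)
# Doubled braces survive .format() as a literal ${REGION}, which the shell expands
buildCmd = "docker build --no-cache --build-arg REGION=${{REGION}} -t {} . -f ./Dockerfile".format(myTag)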
!{buildCmd}
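The doubled braces in `${{REGION}}` are needed because `str.format` treats single braces as placeholders. If you'd rather avoid that escaping, the same build command can be expressed as an argument list and run with `subprocess.run`. This is a sketch under assumptions: `docker_build_cmd` and `build_image` are hypothetical helpers, and `REGION` is read from the environment with an example fallback of `us-east-1`.

```python
import os
import shlex
import subprocess


def docker_build_cmd(tag, region, dockerfile="./Dockerfile", build_context="."):
    """Argument list equivalent to the docker build command in the cell above."""
    return [
        "docker", "build", "--no-cache",
        "--build-arg", f"REGION={region}",
        "-t", tag,
        build_context,
        "-f", dockerfile,
    ]


def build_image(tag, region):
    """Run the build; needs Docker and the ECR login performed above."""
    subprocess.run(docker_build_cmd(tag, region), check=True)


# "us-east-1" is only an example fallback if REGION is not set.
region = os.environ.get("REGION", "us-east-1")
print(shlex.join(docker_build_cmd("my-docker:inference", region)))
```

Passing a list to `subprocess.run` skips the shell entirely, so there is no brace or dollar-sign escaping to get wrong; `shlex.join` is used only to print a copy-pasteable version.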

Here we use local mode for demonstration purposes.

To deploy on actual instances, you need to log in to Amazon Elastic Container Registry (ECR) and push the image there, so that the fully managed prediction endpoint can pull it when it is created.

It would look like this:

```
docker build -t $YOUR_ECR_DOCKER_TAG . -f Dockerfile
$(aws ecr get-login --no-include-email --region $YOUR_REGION)
docker push $YOUR_ECR_DOCKER_TAG
```
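For the push to work, the image tag has to name an ECR repository in the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below builds that URI and the matching `docker tag` / `docker push` commands for the local image. The account ID and repository name are placeholders, the helper names are our own, and the repository is assumed to exist already.

```python
import shlex


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the ECR image URI for a repository in a given account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def push_commands(local_tag, remote_uri):
    """Shell commands that re-tag a local image and push it to ECR."""
    return [
        shlex.join(["docker", "tag", local_tag, remote_uri]),
        shlex.join(["docker", "push", remote_uri]),
    ]


# Placeholder account ID and repository name, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1", "bert-qa-inference", "inference")
for line in push_commands("my-docker:inference", uri):
    print(line)
```

Re-tagging an existing local image with `docker tag` avoids rebuilding it just to give it the ECR name.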

### Step 4: Launching a serving endpoint with the SageMaker SDK

In this step, we create an `MXNetModel`, which can be deployed later, by specifying the Docker image and the entry point for the inference code.

As mentioned previously, to deploy a non-local endpoint for predictions, we would need to push our image to a container registry such as ECR.

In [None]:
import sagemaker
from sagemaker.mxnet.model import MXNetModel

full_path = "file://{}".format(model_path)
sagemaker_model = MXNetModel(model_data=full_path,
                             image=myTag, # local docker image
                             role=sagemaker.get_execution_role(), 
                             py_version='py3',            # python version
                             entry_point='serve.py',
                             source_dir='.', 
                             framework_version='1.2')

We use local mode to test our deployment code; inference then runs on the current instance.

When you are ready to deploy the model on a remote instance, change the `instance_type` argument to a value such as `ml.c4.xlarge`.

For this workshop, stay with local mode unless you have modified the notebook and already pushed your Docker image to ECR.

In [None]:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='local')
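Switching between local and remote deployment changes three settings together: where the model archive lives, which image is used, and the instance type. A minimal sketch of that decision follows, assuming a hosted endpoint needs the archive in S3 and the image in ECR. The `deployment_settings` helper, the S3 URI, and the ECR image are hypothetical, and the `image` key matches the `image=` argument used in the `MXNetModel` cell above.

```python
import os
from pathlib import Path


def deployment_settings(local, model_path, s3_model_uri=None, ecr_image=None,
                        local_image="my-docker:inference",
                        remote_instance_type="ml.c4.xlarge"):
    """Pick model_data, image, and instance_type for local or remote deployment."""
    if local:
        # Local mode reads the archive from disk through a file:// URI.
        return {
            "model_data": Path(os.path.abspath(model_path)).as_uri(),
            "image": local_image,
            "instance_type": "local",
        }
    if not (s3_model_uri and ecr_image):
        raise ValueError("remote deployment needs s3_model_uri and ecr_image")
    # A hosted endpoint pulls the archive from S3 and the image from ECR.
    return {
        "model_data": s3_model_uri,
        "image": ecr_image,
        "instance_type": remote_instance_type,
    }
```

With `s = deployment_settings(True, model_path)`, you would pass `s["model_data"]` and `s["image"]` to `MXNetModel` and `s["instance_type"]` to `deploy()`. Grouping the three values makes it harder to pair a local image with a remote instance type by accident.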

Now let's send an inference request.

Here we simply take two examples from the SQuAD dataset and pass them to our predictor by calling `predictor.predict`.

In [None]:
## test: two questions about the same SQuAD context paragraph
squad_context = ('Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.')

my_test_example_0 = ('Which NFL team represented the AFC at Super Bowl 50?', squad_context)
my_test_example_1 = ('Where did Super Bowl 50 take place?', squad_context)

my_test_examples = (my_test_example_0, my_test_example_1)

output = predictor.predict(my_test_examples)  
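Before `my_test_examples` goes over the wire, `predictor.predict` has to serialize it. Assuming the predictor uses a JSON serializer (our understanding of the SageMaker SDK's MXNet predictor default, not something this notebook shows), Python tuples become JSON arrays, so the inference code receives a list of `[question, context]` lists. A quick sketch of that round trip with a shortened stand-in context:

```python
import json

# Shortened stand-in context; the notebook sends the full SQuAD paragraph.
context = "The Denver Broncos defeated the Carolina Panthers at Levi's Stadium."
examples = (
    ("Which NFL team represented the AFC at Super Bowl 50?", context),
    ("Where did Super Bowl 50 take place?", context),
)

# Serialize as a JSON request body; tuples are encoded as JSON arrays.
payload = json.dumps(examples)

# On the server side, json.loads gives back lists, not tuples.
decoded = json.loads(payload)
print(type(decoded[0]).__name__, len(decoded))  # list 2
```

Server-side code should therefore unpack each item as a two-element sequence rather than test for a tuple.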

In [None]:
print("\nPrediction output: \n\n")

for k in output.keys():
    print('{}\n\n'.format(output[k]))

### Clean Up

Delete the endpoint when you are done. This is optional for local testing, but important if you deployed to a remote endpoint, since a hosted endpoint keeps incurring charges until it is deleted.

In [None]:
predictor.delete_endpoint()