# TAPAS deployment via SageMaker-Neuron

## Overview

This notebook creates an instance of ```TAPAS_Deployer``` and runs the steps needed to build, deploy, and test a mini variant of TAPAS for tabular question answering. For details, refer to the source files in ```./source``` and ```./entrypoint```, which were refactored for readability.

## How to use this notebook
- Create an AWS account.
- Create an IAM role with the following access permissions: ```AmazonSageMakerFullAccess, EC2InstanceProfileForImageBuilderECRContainerBuilds, AWSAppRunnerServicePolicyForECRAccess```
- Start a new notebook instance in SageMaker using the role created above.
- Clone this repository and run this notebook.

## Some notes for Scrub
- ```Deployer``` is a generic class template from which many models can be built and deployed directly.
- ```TAPAS_Deployer``` inherits from ```Deployer```, and deployers for other models can be created the same way with minimal effort.
- To avoid timeouts and random kernel restarts, the code being run is kept separate from the notebook that runs it.
- Everything in ```./source``` can be easily imported as an API.
- Some sample pytest integration tests are included in ```./tests```.
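
The template-and-subclass relationship described in these notes can be pictured with a minimal sketch. The class names ```SketchDeployer``` and ```SketchTapasDeployer```, the ```describe``` method, and the placeholder model name are all hypothetical and chosen for illustration; the real classes live in ```./source``` and may be structured differently.

```python
from abc import ABC, abstractmethod


class SketchDeployer(ABC):
    """Hypothetical stand-in for a generic deployer template."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model = None
        self.tokeniser = None

    @abstractmethod
    def get_model_and_tokeniser(self) -> None:
        """Model-specific step: each subclass loads its own model and tokeniser."""

    def describe(self) -> str:
        # Shared behaviour lives in the base class and is inherited unchanged.
        state = "loaded" if self.model is not None else "not loaded"
        return f"{self.model_name}: {state}"


class SketchTapasDeployer(SketchDeployer):
    """Hypothetical subclass: only the model-specific step is overridden."""

    def __init__(self):
        super().__init__(model_name="tapas-mini")  # placeholder name

    def get_model_and_tokeniser(self) -> None:
        # Placeholders only; a real deployer would fetch these from the Hub.
        self.model = object()
        self.tokeniser = object()


deployer = SketchTapasDeployer()
print(deployer.describe())
deployer.get_model_and_tokeniser()
print(deployer.describe())
```

Under this layout, supporting another model would only require a new subclass that overrides the model-specific method.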

### Install local dependencies

In [1]:
!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch torch-scatter --extra-index-url=https://pip.repos.neuron.amazonaws.com
!pip install --upgrade --no-cache-dir 'transformers==4.6.0'

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com, https://pip.repos.neuron.amazonaws.com
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com


### Prepare deployer

In [2]:
from source.tapas import TAPAS_Deployer
tapas_deployer = TAPAS_Deployer()

### Retrieve the model from the Hugging Face Hub and prepare its tokeniser

In [3]:
tapas_deployer.get_model_and_tokeniser()

### Trace the model to be deployed into a Neuron instance

In [None]:
tapas_deployer.trace_model()

### Upload the traced model into S3

In [4]:
tapas_deployer.upload_model_to_s3()

neuron_compiled_model.pt
Uploaded model to S3: s3://sagemaker-eu-north-1-058095970122/inf1_compiled_model/model/model.tar.gz
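
The output above suggests that the traced model file is packed into a ```model.tar.gz``` archive before upload. A minimal sketch of that packaging step, using only the standard library, might look as follows. The helper name ```package_model``` and the placeholder file contents are assumptions for illustration; the actual packaging and upload logic is in ```./source``` and may differ, and the S3 upload itself is left out here.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def package_model(model_path: Path, archive_path: Path) -> List[str]:
    """Pack one model file into a gzip-compressed tar archive.

    Returns the member names found in the finished archive.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname places the file at the archive root, without parent folders.
        tar.add(model_path, arcname=model_path.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder bytes stand in for the real traced model.
    model_file = Path(tmp) / "neuron_compiled_model.pt"
    model_file.write_bytes(b"placeholder for the traced model")
    members = package_model(model_file, Path(tmp) / "model.tar.gz")
    print(members)
```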


### Build the Docker image that will serve as the hosting environment of the deployed model
To see all the instructions used to build the image, check the Dockerfile at ```./Dockerfile```.

In [5]:
tapas_deployer.build_ecr_image()

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded
Sending build context to Docker daemon  327.5MB
Step 1/4 : FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04
 ---> 2cfdb213ad90
Step 2/4 : RUN pip install "pandas==1.3.5"
 ---> Using cache
 ---> 97db5d042d63
Step 3/4 : RUN pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch torch-scatter --extra-index-url=https://pip.repos.neuron.amazonaws.com
 ---> Running in 3b8cba23f5b0
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting torch-neuron
  Downloading https://pip.repos.neuron.amazonaws.com/torch-neuron/torch_neuron-1.13.1.2.7.1.0-py3-none-linux_x86_64.whl (38.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.5/38.5 MB 58.5 MB/s eta 0:00:00
Collecting neuron-cc[tensorflow]
  Downloading https://pip.repos.neuron.amazonaws.com/neuron-cc/neuron_cc-1.15.0.0%2Beec0c3604-cp37-cp37m-linux_x86_64.whl

The command '/bin/sh -c pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch torch-scatter --extra-index-url=https://pip.repos.neuron.amazonaws.com' returned a non-zero code: 1
Error response from daemon: No such image: inference-to-deploy:latest


Login Succeeded
The push refers to repository [058095970122.dkr.ecr.eu-north-1.amazonaws.com/inference-to-deploy]


https://docs.docker.com/engine/reference/commandline/login/#credentials-store

An image does not exist locally with the tag: 058095970122.dkr.ecr.eu-north-1.amazonaws.com/inference-to-deploy


### Deploy the built environment using the entrypoint ```./code/inference.py``` to define how the image starts and how it reacts to queries

In [None]:
tapas_deployer.deploy_ecr_image()

058095970122.dkr.ecr.eu-north-1.amazonaws.com/inference-to-deploy:latest
s3://sagemaker-eu-north-1-058095970122/inf1_compiled_model/model/model.tar.gz
----------------------------------

### Test endpoint

In [None]:
print(tapas_deployer.test_endpoint())

### Delete the endpoint after testing it

In [None]:
tapas_deployer.terminate()

### Final notes for Scrub
- This deployer successfully builds and deploys CPU and Neuron instances.
- If run on an ```inf1``` instance, the deployer will test entrypoints locally to make sure CPU and Neuron inference work as expected in the deployed endpoints.
- The Neuron deployer works as expected when testing with classic BERT models.
- For TAPAS mini specifically, the tracing step always emits the following warning:
```
WARNING:Neuron:torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!
WARNING:Neuron:Please review the torch.neuron.analyze_model output and if you believe you are seeing a failure
WARNING:Neuron:Lodge an issue on https://github.com/aws/aws-neuron-sdk/issues if you believe the model is not compiling as expected
```
- With this warning, traced TAPAS models crash intermittently with "Unknown Reasons" when used for inference.
- Using the API included here, other BERT models work well during Neuron deployment and inference.
- The Neuron service always tries to run predictions through the Neuron model first and falls back to the CPU if the Neuron model fails.
- Resolving the nondeterministic behaviour of TAPAS Neuron tracing would likely take a fair amount of time, so I am including a typical Neuron deployment build here with CPU fallback.
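
The Neuron-first, CPU-fallback behaviour described in these notes can be sketched roughly as below. The function name ```predict_with_fallback``` and the two stand-in models are hypothetical and used only for illustration; the actual entrypoint may implement the fallback differently.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("fallback_sketch")


def predict_with_fallback(
    primary: Callable[[Any], Any],
    fallback: Callable[[Any], Any],
    inputs: Any,
) -> Any:
    """Try the primary (Neuron) predictor first; use the fallback (CPU) on error."""
    try:
        return primary(inputs)
    except Exception as err:
        logger.warning("Primary model failed (%s); falling back to CPU", err)
        return fallback(inputs)


def flaky_neuron_model(inputs):
    # Stand-in that simulates a failing traced model.
    raise RuntimeError("simulated Neuron runtime failure")


def cpu_model(inputs):
    # Stand-in for the CPU model; it simply echoes the question in upper case.
    return {"answer": inputs["question"].upper()}


result = predict_with_fallback(
    flaky_neuron_model, cpu_model, {"question": "how many rows?"}
)
print(result)
```

This pattern only catches Python exceptions. A crash that kills the worker process would need to be handled at the process or container level.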

Thanks for the clear test and please let me know if you have any questions.