# TAPAS deployment via SageMaker-Neuron

## Overview

This notebook creates an instance of ```TAPAS_Deployer``` and runs the steps necessary to build, deploy, and test a mini variant of TAPAS for tabular question answering. For details, refer to the source files in ```./source``` and ```./entrypoint```, which were refactored to be easy to read.

## How to use this notebook..
- Create an AWS account.
- Create an IAM role with the following access permissions: ```AmazonSageMakerFullAccess, EC2InstanceProfileForImageBuilderECRContainerBuilds, AWSAppRunnerServicePolicyForECRAccess```
- Start a new notebook instance in SageMaker using the role created above.
- Clone this repository and run this notebook.

## Some notes for Scrub..
- ```Deployer``` is a generic class template from which many models can be built and deployed directly.
- ```TAPAS_Deployer``` inherits from ```Deployer```, and a deployer for any other model can be created the same way with minimal effort.
- To avoid timeouts and random kernel restarts, the running code is kept separate from the notebook that runs it.
- Everything in ```./source``` can be easily imported as an API.
- Some sample pytest integration tests are included in ```./tests```.
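
The template idea in the notes above can be sketched as follows. This is a minimal illustration, not the code in ```./source```: the method names are taken from the cells in this notebook, while the abstract/concrete split, the ```steps_run``` attribute, and ```ToyDeployer``` are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Deployer(ABC):
    """Hypothetical base class: shared steps live here, model details in subclasses."""

    def __init__(self):
        # Record of completed steps (an assumption of this sketch, handy in tests).
        self.steps_run = []

    @abstractmethod
    def get_model_and_tokeniser(self):
        """Load the model and its tokeniser (model-specific)."""

    @abstractmethod
    def trace_model(self):
        """Compile the model for the target hardware (model-specific)."""

    def upload_model_to_s3(self):
        # Shared logic would go here; this sketch only records the call.
        self.steps_run.append("upload_model_to_s3")


class ToyDeployer(Deployer):
    """Stand-in for a concrete deployer such as TAPAS_Deployer."""

    def get_model_and_tokeniser(self):
        self.steps_run.append("get_model_and_tokeniser")

    def trace_model(self):
        self.steps_run.append("trace_model")


deployer = ToyDeployer()
deployer.get_model_and_tokeniser()
deployer.trace_model()
deployer.upload_model_to_s3()
print(deployer.steps_run)
```

With this layout, supporting a new model only requires overriding the model-specific methods; the base class cannot be instantiated on its own.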

### Install local dependencies

In [None]:
!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch torch-scatter --extra-index-url=https://pip.repos.neuron.amazonaws.com
!pip install --upgrade --no-cache-dir 'transformers==4.6.0'

### Prepare deployer

In [None]:
from source.tapas import TAPAS_Deployer
tapas_deployer = TAPAS_Deployer()

### Retrieve the model from the Hugging Face Hub and prepare its tokeniser

In [None]:
tapas_deployer.get_model_and_tokeniser()

### Trace the model to be deployed into a Neuron instance

In [None]:
tapas_deployer.trace_model()

### Upload the traced model into S3

In [None]:
tapas_deployer.upload_model_to_s3()
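
SageMaker reads model artifacts from S3 as a gzip-compressed tar archive, conventionally named ```model.tar.gz```. The sketch below shows only the packaging half of that step with the standard library. The file name ```traced_model.pt``` and the helper ```package_model``` are hypothetical, and whether ```upload_model_to_s3``` packages the model this way is an assumption.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_path: Path, archive_path: Path) -> Path:
    """Pack a single model file into a gzip-compressed tar archive."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps the archive flat, so the file unpacks at the top level.
        tar.add(str(model_path), arcname=model_path.name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the traced model; the file name is hypothetical.
    model_file = Path(tmp) / "traced_model.pt"
    model_file.write_bytes(b"placeholder weights")

    archive = package_model(model_file, Path(tmp) / "model.tar.gz")

    # Read the archive back to confirm its layout.
    with tarfile.open(str(archive), "r:gz") as tar:
        archived_names = tar.getnames()

print(archived_names)  # → ['traced_model.pt']
```

The resulting archive could then be sent to a bucket, for example with boto3's ```upload_file```; the bucket and key are left out here because they depend on the account.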

### Build the Docker image that will serve as the hosting environment of the deployed model
To see all the instructions used to build the image, check the Dockerfile at ```./Dockerfile```.

In [None]:
tapas_deployer.build_ecr_image()

### Deploy the built environment using the entrypoint ```./entrypoint/inference.py``` to define how the image starts and how it reacts to queries

In [None]:
tapas_deployer.deploy_ecr_image()
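
As a rough picture of what an inference entrypoint does, the sketch below follows the handler convention used by the SageMaker PyTorch inference toolkit (```model_fn```, ```input_fn```, ```predict_fn```, ```output_fn```). Whether ```./entrypoint/inference.py``` follows this convention is an assumption, and both the JSON payload layout and the stub model are hypothetical.

```python
import json


def model_fn(model_dir):
    """Load the model once at start-up. A stub stands in for the traced model."""
    def stub_model(table, queries):
        # Placeholder logic: answer every query with the first cell of the first column.
        first_column = next(iter(table.values()))
        return [first_column[0] for _ in queries]
    return stub_model


def input_fn(request_body, request_content_type):
    """Decode the request into (table, queries)."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    return payload["table"], payload["queries"]


def predict_fn(input_data, model):
    """Run the model on the decoded input."""
    table, queries = input_data
    return model(table, queries)


def output_fn(prediction, accept):
    """Encode the prediction as a JSON response body."""
    return json.dumps({"answers": prediction})


# Toy request: a two-column table and a single question.
request = json.dumps({
    "table": {"Fruit": ["apple", "pear"], "Count": ["3", "5"]},
    "queries": ["Which fruit is listed first?"],
})
model = model_fn("/opt/ml/model")
response = output_fn(predict_fn(input_fn(request, "application/json"), model), "application/json")
print(response)  # → {"answers": ["apple"]}
```

Splitting the entrypoint this way keeps model loading, which happens once, apart from the per-request decode, predict, and encode steps.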

### Test endpoint

In [None]:
print(tapas_deployer.test_endpoint())

### Delete the endpoint after testing it

In [None]:
tapas_deployer.terminate()

### Notes
- This deployer successfully builds and deploys CPU and Neuron instances.
- If run on an ```inf1``` instance, the deployer will test entrypoints locally to make sure CPU and Neuron inference work as expected in the deployed endpoints.
- The Neuron deployer works as expected when testing with classic BERT models.
- The warning above means that traced TAPAS models randomly crash for "Unknown Reasons" when used for inference.
- Using the API included here, other BERT models work well during Neuron deployment and inference.
- The Neuron service always tries to run predictions through the Neuron model first and falls back to the CPU model if the Neuron model fails.
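
The Neuron-first, CPU-fallback behaviour described in the last note can be sketched as below. This is a minimal illustration under stated assumptions: ```predict_with_fallback``` and the two toy models are hypothetical names, and the actual service may detect failures differently than by catching exceptions.

```python
import logging

logger = logging.getLogger("neuron_fallback")


def predict_with_fallback(inputs, neuron_predict, cpu_predict):
    """Try the Neuron model first; on any exception, rerun on the CPU model."""
    try:
        return neuron_predict(inputs), "neuron"
    except Exception as error:
        logger.warning("Neuron inference failed (%s); falling back to CPU.", error)
        return cpu_predict(inputs), "cpu"


def flaky_neuron_model(inputs):
    # Simulates a traced model that crashes at inference time.
    raise RuntimeError("simulated Neuron crash")


def cpu_model(inputs):
    # Toy computation standing in for real CPU inference.
    return [x * 2 for x in inputs]


result, backend = predict_with_fallback([1, 2, 3], flaky_neuron_model, cpu_model)
print(result, backend)  # → [2, 4, 6] cpu
```

Returning the backend name alongside the result makes it easy to log or monitor how often the fallback path is taken.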