# TAPAS deployment via SageMaker-Neuron

## Overview

This notebook creates an instance of ```TAPAS_Deployer``` and calls all the necessary actions to build, deploy, and test a mini variant of TAPAS for tabular question answering. For details, please refer to the source files in ```./source``` and ```./entrypoint```, which were refactored for readability.

## How to use this notebook
- Create an AWS account.
- Create an IAM role with the following access permissions: ```AmazonSageMakerFullAccess, EC2InstanceProfileForImageBuilderECRContainerBuilds, AWSAppRunnerServicePolicyForECRAccess```
- Start a new notebook instance in SageMaker using the role created above.
- Clone this repository and run this notebook.

## Some notes for Scrub
- ```Deployer``` is a generic class template from which many models can be built and deployed directly.
- ```TAPAS_Deployer``` inherits from ```Deployer```; a deployer for any other model can be created the same way with minimal effort.
- To avoid timeouts and random kernel restarts, the code being run is kept separate from the notebook that runs it.
- Everything in ```./source``` can be easily imported as an API.
- Some integration pytest samples are included in ```./tests```
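
To illustrate the template idea in the notes above, here is a minimal sketch of how a generic base class and a model-specific subclass might divide the work. The class bodies, the `ToyDeployer` name, and the `steps_run` attribute are hypothetical stand-ins written for this illustration; the actual classes in ```./source``` may be organised differently.

```python
from abc import ABC, abstractmethod


class Deployer(ABC):
    """Hypothetical stand-in for the generic template; not the class in ./source."""

    def __init__(self, use_neuron: bool = True):
        self.use_neuron = use_neuron
        self.steps_run = []  # hypothetical: records which steps were called

    @abstractmethod
    def get_model_and_tokeniser(self):
        """Model-specific step that each subclass must provide."""

    def trace_model(self):
        # Generic step shared by all subclasses (stubbed: only records the call).
        target = "neuron" if self.use_neuron else "cpu"
        self.steps_run.append(f"trace:{target}")


class ToyDeployer(Deployer):
    """Hypothetical subclass showing how little a new model needs to add."""

    def get_model_and_tokeniser(self):
        self.steps_run.append("load:toy-model")


deployer = ToyDeployer(use_neuron=False)
deployer.get_model_and_tokeniser()
deployer.trace_model()
print(deployer.steps_run)
```

Under this assumed layout, only the model-specific loading step has to be written for each new model, while the shared steps live in the base class.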

### Install local dependencies

In [None]:
!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch torch-scatter --extra-index-url=https://pip.repos.neuron.amazonaws.com
!pip install --upgrade --no-cache-dir 'transformers==4.6.0'

### Prepare deployer

In [1]:
from source.tapas import TAPAS_Deployer
tapas_deployer = TAPAS_Deployer(use_neuron=True)

### Retrieve the model and its tokeniser from the Hugging Face Hub

In [2]:
tapas_deployer.get_model_and_tokeniser()

### Trace the model to be deployed into a Neuron instance

In [None]:
tapas_deployer.trace_model()

### Upload the traced model into S3

In [None]:
tapas_deployer.upload_model_to_s3()
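
SageMaker reads model artifacts from a gzip-compressed tar archive, which matches the `model.tar.gz` key shown in the deployment output of this notebook. The sketch below shows only the packaging half of this step, using the standard library. The `package_model` helper and the `traced_model.pt` file name are assumptions made for illustration and are not taken from ```./source```.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_file: Path, archive_path: Path) -> Path:
    """Bundle a single model file into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store the file at the archive root, without its local directory path.
        archive.add(model_file, arcname=model_file.name)
    return archive_path


# Demo with a placeholder file standing in for a traced model.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "traced_model.pt"  # hypothetical file name
    model_file.write_bytes(b"placeholder bytes, not a real model")
    archive_path = package_model(model_file, Path(tmp) / "model.tar.gz")
    with tarfile.open(archive_path, "r:gz") as archive:
        print(archive.getnames())
```

The resulting archive could then be uploaded with, for example, `upload_file` on a boto3 S3 client; whether `upload_model_to_s3()` does exactly this is not confirmed here.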

### Build the Docker image that will serve as the hosting environment of the deployed model
To see all the instructions used to build the image, check the Dockerfile at ```./Dockerfile```.

In [None]:
tapas_deployer.build_ecr_image()

### Deploy the built environment using the entrypoint ```./code/inference.py``` to define how the image starts and how it reacts to queries

In [3]:
tapas_deployer.deploy_ecr_image()

058095970122.dkr.ecr.eu-north-1.amazonaws.com/inference-to-deploy:latest
s3://sagemaker-eu-north-1-058095970122/inf1_compiled_model/model/model.tar.gz
-----------------!
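
For readers unfamiliar with how an entrypoint "reacts to queries", the sketch below follows the handler-function convention of the SageMaker PyTorch inference toolkit (`model_fn`, `input_fn`, `predict_fn`, `output_fn`). It is an assumption that ```./code/inference.py``` follows this convention; the stubbed model and the `query` payload key are hypothetical and stand in for the real traced model and request format.

```python
import json


def model_fn(model_dir):
    """Called once at container start; a real handler would load the model from model_dir."""
    # Stub: a callable that echoes the query instead of running TAPAS.
    return lambda payload: {"answer": f"echo: {payload['query']}"}


def input_fn(request_body, content_type="application/json"):
    """Deserialise the incoming request body."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)


def predict_fn(data, model):
    """Run the loaded model on the deserialised input."""
    return model(data)


def output_fn(prediction, accept="application/json"):
    """Serialise the prediction for the HTTP response."""
    return json.dumps(prediction)


# Local round trip through the four handlers, with no endpoint involved.
model = model_fn("/opt/ml/model")
response = output_fn(predict_fn(input_fn('{"query": "How many rows?"}'), model))
print(response)
```

Calling the handlers directly like this is one way to exercise an entrypoint locally before deploying it.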

### Test endpoint

In [4]:
print(tapas_deployer.test_endpoint())

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged\nto your system logs. See the Neuron Runtime\u0027s troubleshooting guide for help on this\ntopic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/"
}
". See https://eu-north-1.console.aws.amazon.com/cloudwatch/home?region=eu-north-1#logEventViewer:group=/aws/sagemaker/Endpoints/inference-to-deploy-ml-inf1-2023-06-09-10-24-15-746 in account 058095970122 for more information.

### Delete the endpoint after testing it

In [None]:
tapas_deployer.terminate()

### Final notes for Scrub
- This deployer successfully builds and deploys CPU and Neuron instances.
- If run on an ```inf1``` instance, the deployer will test entrypoints locally to make sure CPU and Neuron inference work as expected in the deployed endpoints.
- The Neuron deployer works as expected when testing with classic BERT models.
- For TAPAS mini specifically, the tracing step always emits the following warning:
```
WARNING:Neuron:torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!
WARNING:Neuron:Please review the torch.neuron.analyze_model output and if you believe you are seeing a failure
WARNING:Neuron:Lodge an issue on https://github.com/aws/aws-neuron-sdk/issues if you believe the model is not compiling as expected
```
- In practice, the warning above means that traced TAPAS models crash with "Unknown Reasons" when used for inference.
- Using the API included here, other BERT models work well during Neuron deployment and inference.
- The Neuron service always tries to run predictions through the Neuron model first, and falls back to the CPU model if the Neuron model fails.
- Resolving the randomness of TAPAS Neuron tracing looks like it would take a fair bit of time, so I am including a typical Neuron deployment build here with CPU fallback.
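
The Neuron-first behaviour with CPU fallback described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in ```./source```: `predict_with_fallback` and its two callables are hypothetical names, and a real service might catch a narrower set of exceptions.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger("neuron_fallback")


def predict_with_fallback(
    neuron_predict: Callable[[Any], Any],
    cpu_predict: Callable[[Any], Any],
    inputs: Any,
) -> Tuple[Any, str]:
    """Try the Neuron model first; on any error, rerun the input on the CPU model."""
    try:
        return neuron_predict(inputs), "neuron"
    except Exception as exc:  # broad on purpose: any Neuron failure triggers the fallback
        logger.warning("Neuron inference failed (%s); falling back to CPU", exc)
        return cpu_predict(inputs), "cpu"


def failing_neuron_model(inputs):
    # Stand-in for a traced model that crashes at inference time.
    raise RuntimeError("simulated Neuron runtime failure")


def cpu_model(inputs):
    # Stand-in for the CPU model.
    return inputs.upper()


result, backend = predict_with_fallback(failing_neuron_model, cpu_model, "total revenue")
print(result, backend)
```

Returning the backend name alongside the prediction makes it easy to tell from logs or tests which path actually served a request.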

Thanks for the clear test and please let me know if you have any questions.