# Hugging Face SageMaker SDK - Deploy 🤗 Transformers for inference


1. [Introduction](#Introduction)  
2. [Deploy a trained Hugging Face Transformer model on to SageMaker for inference](#Deploy-a-trained-Hugging-Face-Transformer-model-on-to-SageMaker-for-inference)  
    a. [Deploy the model directly after training](#Deploy-the-model-directly-after-training)  
    b. [Deploy the model using `model_data`](#Deploy-the-model-using-model_data)  
3. [Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference](#Deploy-one-of-the-10-000+-Hugging-Face-Transformers-to-Amazon-SageMaker-for-Inference)   


Welcome to this getting started guide. We will use the new Hugging Face Inference DLCs and the Amazon SageMaker Python SDK to deploy two transformer models for inference.
In the first example, we deploy a trained Hugging Face Transformer model to SageMaker for inference.
In the second example we directly deploy one of the 10 000+ Hugging Face Transformers from the [Hub](https://huggingface.co/models) to Amazon SageMaker for Inference.

You can find the documentation for the inference solution [here](add link).

## API - [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit)


Using the `transformers` `pipelines`, we designed an API that makes it easy to benefit from all `pipelines` features. The API is modeled on the [🤗  Accelerated Inference API](https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html): your inputs must be defined under the `inputs` key, and any additional supported `pipelines` parameters go under the `parameters` key. Below are example request bodies.

**text-classification request body**
```python
{
	"inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}
```
**question-answering request body**
```python
{
	"inputs": {
		"question": "What is used for inference?",
		"context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
	}
}
```
**zero-shot classification request body**
```python
{
	"inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
	"parameters": {
		"candidate_labels": [
			"refund",
			"legal",
			"faq"
		]
	}
}
```
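
The request bodies above all share one shape, so they can be assembled programmatically. The following is a minimal sketch; `build_request` is a hypothetical helper name used for illustration and is not part of the SageMaker SDK or the inference toolkit.

```python
import json


def build_request(inputs, parameters=None):
    """Build a request body with the required `inputs` key.

    `parameters` is only added when given, mirroring the examples above.
    """
    body = {"inputs": inputs}
    if parameters:
        body["parameters"] = parameters
    return body


# zero-shot classification body, same content as the example above
zero_shot_body = build_request(
    "Hi, I recently bought a device from your company but it is not working "
    "as advertised and I would like to get reimbursed!",
    parameters={"candidate_labels": ["refund", "legal", "faq"]},
)

# serialize to JSON, e.g. when sending the body outside of the Python SDK
payload = json.dumps(zero_shot_body)
```

Tasks that need no extra options, such as text-classification, can simply call `build_request("some text")`.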

## Deep Learning Container

TODO: Update to correct containers

PyTorch:  
CPU - `801740330924.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch:1.7-cpu`  
GPU - `801740330924.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch:1.7-gpu`

TensorFlow:  
CPU - `801740330924.dkr.ecr.us-west-2.amazonaws.com/huggingface-tensorflow:2.4-cpu`  
GPU - `801740330924.dkr.ecr.us-west-2.amazonaws.com/huggingface-tensorflow:2.4-gpu`


In [None]:
#pytorch
pytorch_cpu_image_uri="558105141721.dkr.ecr.us-east-1.amazonaws.com/huggingface-inference-pytorch:0.0.1-cpu"
pytorch_gpu_image_uri="558105141721.dkr.ecr.us-east-1.amazonaws.com/huggingface-inference-pytorch:0.0.1-gpu"

#tensorflow
tensorflow_cpu_image_uri=""
tensorflow_gpu_image_uri=""
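
Since the CPU and GPU images are separate, the image has to match the instance type used for deployment. The sketch below shows one way this choice could be made; `select_image_uri` and `GPU_PREFIXES` are hypothetical names, and the prefix rule is an assumption about instance naming rather than something the SDK provides.

```python
# Assumption: GPU instance families start with "ml.p" or "ml.g";
# adjust the prefixes for the instance types you actually use.
GPU_PREFIXES = ("ml.p", "ml.g")


def select_image_uri(instance_type, cpu_image_uri, gpu_image_uri):
    """Return the GPU image for GPU instance types, otherwise the CPU image."""
    if instance_type.startswith(GPU_PREFIXES):
        return gpu_image_uri
    return cpu_image_uri


# placeholder URIs for illustration only
image_uri = select_image_uri("ml.m5.xlarge", "cpu-image-uri", "gpu-image-uri")
```

In this notebook the call could take `pytorch_cpu_image_uri` and `pytorch_gpu_image_uri` as the two image arguments.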

## Models table

| model                                                 | task                     | framework  | link                                                                          |
|-------------------------------------------------------|--------------------------|------------|-------------------------------------------------------------------------------|
| distilbert-base-uncased-finetuned-sst-2-english       | text-classification      | PyTorch    | https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english        |
| distilbert-base-uncased-finetuned-sst-2-english       | text-classification      | TensorFlow | https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english        |
| joeddav/xlm-roberta-large-xnli                        | zero-shot-classification | PyTorch    | https://huggingface.co/joeddav/xlm-roberta-large-xnli                         |
| bert-base-uncased                                     | feature-extraction       | PyTorch    | https://huggingface.co/bert-base-uncased                                      |
| bert-base-uncased                                     | feature-extraction       | TensorFlow | https://huggingface.co/bert-base-uncased                                      |
| dbmdz/bert-large-cased-finetuned-conll03-english      | ner                      | PyTorch    | https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english       |
| dbmdz/bert-large-cased-finetuned-conll03-english      | ner                      | TensorFlow | https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english       |
| distilbert-base-uncased-distilled-squad               | question-answering       | PyTorch    | https://huggingface.co/distilbert-base-uncased-distilled-squad                |
| distilbert-base-uncased-distilled-squad               | question-answering       | TensorFlow | https://huggingface.co/distilbert-base-uncased-distilled-squad                |
| albert-base-v2                                        | fill-mask                | PyTorch    | https://huggingface.co/albert-base-v2                                         |
| albert-base-v2                                        | fill-mask                | TensorFlow | https://huggingface.co/albert-base-v2                                         |
| sshleifer/distilbart-cnn-12-6                         | summarization            | PyTorch    | https://huggingface.co/sshleifer/distilbart-cnn-12-6                          |
| google/pegasus-xsum                                   | summarization            | TensorFlow | https://huggingface.co/google/pegasus-xsum                                    |
| Helsinki-NLP/opus-mt-en-de                            | translation_xx_to_yy     | PyTorch    | https://huggingface.co/Helsinki-NLP/opus-mt-en-de                             |
| Helsinki-NLP/opus-mt-en-de                            | translation_xx_to_yy     | TensorFlow | https://huggingface.co/Helsinki-NLP/opus-mt-en-de                             |
| t5-base                                               | text2text-generation     | PyTorch    | https://huggingface.co/t5-base                                                |
| t5-base                                               | text2text-generation     | TensorFlow | https://huggingface.co/t5-base                                                |
| gpt2                                                  | text-generation          | PyTorch    | https://huggingface.co/gpt2                                                   |
| gpt2                                                  | text-generation          | TensorFlow | https://huggingface.co/gpt2                                                   |
| google/tapas-base-finetuned-wtq                       | table-question-answering | PyTorch    | https://huggingface.co/google/tapas-base-finetuned-wtq                        |
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering       | PyTorch    | https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad  |
| nlptown/bert-base-multilingual-uncased-sentiment      | text-classification      | PyTorch    | https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment       |
| ProsusAI/finbert                                      | text-classification      | PyTorch    | https://huggingface.co/ProsusAI/finbert                                       |
| mrm8488/bert-mini-finetuned-age_news-classification   | text-classification      | PyTorch    | https://huggingface.co/mrm8488/bert-mini-finetuned-age_news-classification    |
| typeform/distilbert-base-uncased-mnli                 | zero-shot-classification | PyTorch    | https://huggingface.co/typeform/distilbert-base-uncased-mnli                  |

## Setting up environment

In [None]:
# change to correct version when merged
#!pip install "sagemaker>=2.46.0"
!pip install git+https://github.com/icywang86rui/sagemaker-python-sdk.git@hf-inference --upgrade

In [4]:
from sagemaker.huggingface import HuggingFaceModel

In [5]:
import sagemaker

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

# # ⚠️ local config ⚠️ (uncomment if you run this notebook outside of SageMaker)
# import boto3

# iam_client = boto3.client('iam')
# # role = iam_client.get_role(RoleName='role-name-of-your-iam-role-with-right-permissions')['Role']['Arn']
# role = iam_client.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']
# sess = sagemaker.Session()    

# ⚠️ SageMaker config ⚠️ (comment out if you run this notebook outside of SageMaker)

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::558105141721:role/sagemaker_execution_role
sagemaker bucket: sagemaker-us-east-1-558105141721
sagemaker session region: us-east-1


## Deploy a trained Hugging Face Transformer model on to SageMaker for inference

There are two ways to deploy your SageMaker-trained Hugging Face model. You can deploy it right after training finishes, or you can deploy it later using `model_data` pointing to your saved model on S3.

### Deploy the model directly after training
If you deploy your model directly after training, make sure that your training script saves all required files, including the tokenizer and the model.
```python
from sagemaker.huggingface import HuggingFace

############ pseudo code start ############

# create HuggingFace estimator for running training
huggingface_estimator = HuggingFace(....)

# starting the train job with our uploaded datasets as input
huggingface_estimator.fit(...)

############ pseudo code end ############

# deploy model to SageMaker Inference
predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# example request, you always need to define "inputs"
data = {
   "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)

```

### Deploy the model using `model_data`

In [6]:
from sagemaker.huggingface.model import HuggingFaceModel


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://sagemaker-us-east-1-558105141721/sagemaker-huggingface-serving/models/model.tar.gz",  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   image_uri=pytorch_cpu_image_uri, # Beta DLC image uri
   #transformers_version="4.6", # transformers version used
   #pytorch_version="1.7", # pytorch version used
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)


# example request, you always need to define "inputs"
data = {
   "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)

-------------!

[{'label': 'POSITIVE', 'score': 0.5001324415206909}]
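
While the endpoint is running, it can also be called without the `predictor` object, for example from another application. The sketch below assumes the endpoint accepts and returns `application/json`, and that `runtime_client` behaves like `boto3.client("sagemaker-runtime")`; `invoke` is a hypothetical helper name.

```python
import json


def invoke(runtime_client, endpoint_name, data):
    """Send `data` as JSON to an endpoint and decode the JSON response.

    `runtime_client` is expected to expose `invoke_endpoint` like
    boto3.client("sagemaker-runtime") does.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(data),
    )
    # the response body is a stream of bytes containing JSON
    return json.loads(response["Body"].read())
```

With boto3 installed, this could be used as `invoke(boto3.client("sagemaker-runtime"), predictor.endpoint_name, data)`.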

In [7]:
# delete endpoint
predictor.delete_endpoint()

## Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference

_This is an experimental feature, where the model is loaded after the endpoint is created. This could lead to errors, e.g. for models larger than 10 GB._

To deploy a model directly from the Hub to SageMaker, we need to define two environment variables when creating the `HuggingFaceModel`:

- `HF_MODEL_ID`: defines the model id, which will be automatically loaded from [huggingface.co/models](http://huggingface.co/models) when creating your SageMaker Endpoint. The 🤗 Hub provides 10 000+ models, all available through this environment variable.
- `HF_TASK`: defines the task for the used 🤗 Transformers pipeline. A full list of tasks can be found [here](https://huggingface.co/transformers/main_classes/pipelines.html).

In [7]:
from sagemaker.huggingface.model import HuggingFaceModel

# Hub Model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', # model_id from hf.co/models
  'HF_TASK':'question-answering' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,
   role=role, # iam role with permissions to create an Endpoint
   image_uri=pytorch_cpu_image_uri, # Beta DLC image uri
   #transformers_version="4.6", # transformers version used
   #pytorch_version="1.7", # pytorch version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

# example request, you always need to define "inputs"
data = {
   "inputs": {
      "question": "What is used for inference?",
      "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
   }
}


# request
predictor.predict(data)

-------------!

{'score': 0.9987210035324097, 'start': 68, 'end': 77, 'answer': 'sagemaker'}
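
In the output above, `start` and `end` appear to be character offsets into `context`, with `end` exclusive. The short sketch below checks that reading against the response shown, using the same context string as the request.

```python
context = (
    "My Name is Philipp and I live in Nuremberg. "
    "This model is used with sagemaker for inference."
)
# response copied from the output above
result = {'score': 0.9987210035324097, 'start': 68, 'end': 77, 'answer': 'sagemaker'}

# slice the context with the returned offsets
span = context[result["start"]:result["end"]]
print(span)  # sagemaker
```

This is handy when you want to highlight the answer inside the original text rather than only display the `answer` string.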

In [8]:

# delete endpoint
predictor.delete_endpoint()