# Hugging Face SageMaker SDK - Deploy 🤗 Transformers for inference


1. [Introduction](#Introduction)  
2. [Deploy a trained Hugging Face Transformer model on to SageMaker for inference](#Deploy-a-trained-Hugging-Face-Transformer-model-on-to-SageMaker-for-inference)  
    a. [Deploy the model directly after training](#Deploy-the-model-directly-after-training)  
    b. [Deploy the model using `model_data`](#Deploy-the-model-using-model_data)  
3. [Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference](#Deploy-one-of-the-10-000+-Hugging-Face-Transformers-to-Amazon-SageMaker-for-Inference)   


Welcome to this getting started guide. We will use the new Hugging Face Inference DLCs and the Amazon SageMaker Python SDK to deploy two transformer models for inference.
In the first example, we deploy a trained Hugging Face Transformer model to SageMaker for inference.
In the second example, we deploy one of the 10 000+ Hugging Face Transformers directly from the [Hub](https://huggingface.co/models) to Amazon SageMaker for inference.

## Setting up environment

In [None]:
# change to correct version when merged
#!pip install "sagemaker>=2.46.0"
!pip install git+https://github.com/icywang86rui/sagemaker-python-sdk.git@hf-inference --upgrade

In [4]:
from sagemaker.huggingface import HuggingFaceModel

## Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference

_This is an experimental feature: the model is loaded only after the endpoint has been created. This can lead to errors, e.g. for models larger than 10 GB._

To deploy a model directly from the Hub to SageMaker, we need to define two environment variables when creating the `HuggingFaceModel`:

- `HF_MODEL_ID`: defines the model id, which will be automatically loaded from [huggingface.co/models](https://huggingface.co/models) when creating a SageMaker Endpoint. The 🤗 Hub provides 10 000+ models, all available through this environment variable.
- `HF_TASK`: defines the task for the 🤗 Transformers pipeline that is used. A full list of tasks can be found [here](https://huggingface.co/transformers/main_classes/pipelines.html).
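As a minimal sketch of the configuration described above, the two variables can be collected in a plain dictionary of strings. The helper name `make_hub_env` is hypothetical and not part of the SageMaker SDK; it only illustrates that both keys are required and that the values are passed as strings.

```python
def make_hub_env(model_id: str, task: str) -> dict:
    """Build the env dict for a Hub deployment (hypothetical helper, not an SDK function)."""
    if not model_id or not task:
        raise ValueError("both model_id and task must be non-empty strings")
    # Environment variables are strings, so coerce the values defensively.
    return {"HF_MODEL_ID": str(model_id), "HF_TASK": str(task)}


# Same values as used in this guide
hub = make_hub_env("distilbert-base-uncased-distilled-squad", "question-answering")
print(hub)
```

The resulting dictionary is what gets passed as `env` when creating the `HuggingFaceModel`.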

In [None]:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role of the current notebook / SageMaker session
role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', # model_id from hf.co/models
  'HF_TASK':'question-answering' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,
   role=role, # iam role with permissions to create an Endpoint
   image_uri="", # Beta DLC image uri
   #transformers_version="4.6", # transformers version used
   #pytorch_version="1.7", # pytorch version used
)

In [None]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

In [None]:
# example request, you always need to define "inputs"
data = {
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}


# request
predictor.predict(data)
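The request format above can be sketched with two small helpers. Both `build_qa_request` and `extract_answer` are hypothetical names introduced here for illustration. The response shape is an assumption: it presumes the endpoint returns the same dictionary as the 🤗 Transformers question-answering pipeline (with an `answer` key), and the stand-in response below is made up, not real endpoint output.

```python
import json


def build_qa_request(question: str, context: str) -> dict:
    """Wrap a question and its context under the required "inputs" key."""
    return {"inputs": {"question": question, "context": context}}


def extract_answer(response: dict) -> str:
    """Read the answer text from a response.

    Assumption: the response is a dict containing an "answer" key,
    as returned by the question-answering pipeline.
    """
    return response["answer"]


data = build_qa_request(
    "What is used for inference?",
    "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference.",
)
print(json.dumps(data, indent=2))

# Stand-in response for illustration only (not produced by a real endpoint)
example_response = {"score": 0.5, "start": 68, "end": 77, "answer": "sagemaker"}
print(extract_answer(example_response))
```

With a live endpoint, `data` would be passed to `predictor.predict(data)` and the returned value to `extract_answer`.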

In [8]:
# delete endpoint
predictor.delete_endpoint()