# Hugging Face SageMaker SDK - Deploy 🤗 Transformers for inference


1. [Introduction](#Introduction)  
2. [Deploy a trained Hugging Face Transformer model on to SageMaker for inference](#Deploy-a-trained-Hugging-Face-Transformer-model-on-to-SageMaker-for-inference)  
    a. [Deploy the model directly after training](#Deploy-the-model-directly-after-training)  
    b. [Deploy the model using `model_data`](#Deploy-the-model-using-model_data)  
3. [Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference](#Deploy-one-of-the-10-000+-Hugging-Face-Transformers-to-Amazon-SageMaker-for-Inference)   


Welcome to this getting started guide. We will use the new Hugging Face Inference DLCs (Deep Learning Containers) and the Amazon SageMaker Python SDK to deploy two transformer models for inference.
In the first example, we deploy a trained Hugging Face Transformer model to SageMaker for inference.
In the second example, we directly deploy one of the 10 000+ Hugging Face Transformers from the [Hub](https://huggingface.co/models) to Amazon SageMaker for inference.

You can find the documentation for the inference solution [here](add link).

In [1]:
# prepare: image URI of the beta inference DLC and the SDK branch that supports it
pytorch_cpu_image_uri="801740330924.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04"
!pip install git+https://github.com/icywang86rui/sagemaker-python-sdk.git@hf-inference --upgrade

In [None]:
!pip install sagemaker --upgrade

## Deploy a Hugging Face Transformer model from S3 to SageMaker for inference


In [4]:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role of the notebook instance; it is used to create the model and endpoint
role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://sagemaker-us-east-1-558105141721/sagemaker-huggingface-serving/models/model.tar.gz",  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   image_uri=pytorch_cpu_image_uri, # Beta DLC image uri
   #transformers_version="4.6", # transformers version used
   #pytorch_version="1.7", # pytorch version used
)

In [None]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

In [None]:

# example request, you always need to define "inputs"
data = {
   "inputs": "The new Hugging Face SageMaker DLC makes it super easy to deploy models in production. I love it!"
}

# request
predictor.predict(data)
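
The value returned by `predictor.predict` depends on the task the model serves. As a minimal sketch, assuming the endpoint runs a text-classification model and returns a list of dictionaries with `label` and `score` keys, the best prediction could be picked as below. The helper `top_label` and the sample values are hypothetical and are not real endpoint output.

```python
def top_label(response):
    """Return (label, score) for the highest-scoring entry in a response.

    Assumes `response` is a list of dicts with "label" and "score" keys,
    which is the shape expected here for a text-classification model.
    """
    best = max(response, key=lambda item: item["score"])
    return best["label"], best["score"]


# Made-up placeholder values that only illustrate the assumed shape
example_response = [
    {"label": "NEGATIVE", "score": 0.02},
    {"label": "POSITIVE", "score": 0.98},
]

label, score = top_label(example_response)
print(f"{label} ({score:.2f})")
```

With a live endpoint, the same helper could be applied as `top_label(predictor.predict(data))`, provided the response really has this shape.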

In [7]:
# delete endpoint
predictor.delete_endpoint()
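
The second example from the introduction, deploying a model straight from the Hub, is not shown in the cells of this notebook. The following is a sketch of how it might look, assuming the inference container reads the model ID and task from the `HF_MODEL_ID` and `HF_TASK` environment variables, and assuming the installed SDK can resolve `transformers_version="4.6"`, `pytorch_version="1.7"` and `py_version="py36"` to an inference image. If it cannot, `image_uri` could be passed instead, as in the S3 example. The helper names `hub_model_env` and `deploy_from_hub` are hypothetical.

```python
def hub_model_env(model_id, task):
    """Build the container environment that selects a Hub model and task."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(role, model_id, task, instance_type="ml.m5.xlarge"):
    """Create a HuggingFaceModel without model_data and deploy it.

    Calling this needs AWS credentials and creates a billable endpoint.
    """
    # Imported lazily so hub_model_env can be used without the SDK installed
    from sagemaker.huggingface import HuggingFaceModel

    huggingface_model = HuggingFaceModel(
        env=hub_model_env(model_id, task),  # no model_data: weights come from the Hub
        role=role,                          # IAM role with permissions to create an Endpoint
        transformers_version="4.6",         # assumed to match the DLC used above
        pytorch_version="1.7",
        py_version="py36",
    )
    return huggingface_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
    )


# Example (not executed here):
# predictor = deploy_from_hub(
#     role,
#     "distilbert-base-uncased-finetuned-sst-2-english",
#     "text-classification",
# )
# predictor.predict({"inputs": "I love it!"})
# predictor.delete_endpoint()
```

Keeping the deployment inside a function means nothing is created on AWS until it is called explicitly.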

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
  
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

model.save_pretrained('./tmp_model')
tokenizer.save_pretrained('./tmp_model')

('./tmp_model/tokenizer_config.json',
 './tmp_model/special_tokens_map.json',
 './tmp_model/vocab.txt',
 './tmp_model/added_tokens.json',
 './tmp_model/tokenizer.json')

In [2]:
!python archive.py

In [3]:
!rm -rf tmp_model
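
The `archive.py` script called above is not included in this notebook. As a rough sketch of what such a step could do, assuming it packs the contents of `./tmp_model` into a `model.tar.gz` with the model files at the root of the archive, the standard library `tarfile` module is enough. The function name `archive_model` and the output path are assumptions, not the contents of the original script.

```python
import tarfile
from pathlib import Path


def archive_model(model_dir, archive_path="model.tar.gz"):
    """Pack every entry in model_dir into a gzip-compressed tarball.

    Entries are stored under their bare names, so config and weight files
    end up at the root of the archive rather than inside a subfolder.
    """
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            tar.add(item, arcname=item.name)
    return archive_path


# Only runs when the directory saved by save_pretrained exists
if Path("./tmp_model").is_dir():
    print(archive_model("./tmp_model"))
```

This would have to run before `tmp_model` is removed. The resulting archive then needs to be uploaded to S3, for example with `sagemaker.Session().upload_data`, so that `model_data` can point at it.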