# Deploying the model to an endpoint
In this notebook we deploy the `whisper-large-v2` model to a SageMaker endpoint. We write our own inference code because we use Whisper's Python API rather than the Hugging Face API: the Whisper API handles transcriptions longer than 30 seconds out of the box. The API is documented here: https://github.com/openai/whisper#python-usage
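
To make the 30-second point concrete, here is a deliberately simplified sketch of how a long clip can be covered by 30-second windows. It assumes fixed, non-overlapping windows; Whisper's own `transcribe` moves its window based on the timestamps it predicts, so this is an illustration of the idea and not the library's algorithm. The function name `fixed_windows` is hypothetical.

```python
WINDOW_SECONDS = 30  # length of one model input window


def fixed_windows(duration_seconds, window_seconds=WINDOW_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole clip."""
    duration_seconds = float(duration_seconds)
    windows = []
    start = 0.0
    while start < duration_seconds:
        # the last window is shorter when the clip is not a multiple of 30 s
        end = min(start + window_seconds, duration_seconds)
        windows.append((start, end))
        start = end
    return windows


# a 75-second clip needs three windows
print(fixed_windows(75))
```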

In [1]:
!mkdir model
!mkdir model/code

We write our custom inference code here:

In [2]:
%%writefile model/code/inference.py
import whisper


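def model_fn(model_dir):
    # model_dir is unused: whisper.load_model fetches the "large-v2" weights itself
    model = whisper.load_model("large-v2")
    return model


def predict_fn(audio_bytes, model):
    # the request payload is a dict with the raw audio bytes under 'inputs'
    audio_file = "tmp.mp3"

    # write the bytes to disk so that model.transcribe can be given a file path
    with open(audio_file, "wb") as binary_file:
        binary_file.write(audio_bytes['inputs'])

    result = model.transcribe(audio_file)

    # return the detected language and the recognized text
    return {"detected_language": result["language"], "transcription": result["text"]}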

Writing model/code/inference.py
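
Before packaging, the handler can be exercised locally without loading Whisper. The sketch below is a variation on `predict_fn`, not the notebook's code: it writes to a unique temporary file instead of the fixed `tmp.mp3` and deletes it afterwards. `StubModel` is a hypothetical stand-in that only mimics the `transcribe` call and the two result keys the handler reads.

```python
import os
import tempfile


def predict_fn(audio_bytes, model):
    # write the payload to a unique temporary file (variation on "tmp.mp3")
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        tmp.write(audio_bytes["inputs"])
        audio_file = tmp.name
    try:
        result = model.transcribe(audio_file)
    finally:
        # remove the file whether or not transcription succeeded
        os.remove(audio_file)
    return {"detected_language": result["language"], "transcription": result["text"]}


class StubModel:
    """Hypothetical stand-in for the Whisper model; returns a canned result."""

    def transcribe(self, path):
        # read the file back to confirm the handler wrote the payload to disk
        with open(path, "rb") as f:
            size = len(f.read())
        return {"language": "en", "text": f" received {size} bytes"}


response = predict_fn({"inputs": b"\x00" * 16}, StubModel())
print(response)
```

A unique file name may matter if the endpoint ever serves requests concurrently, since two requests would otherwise write to the same path.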


In `requirements.txt` we list the libraries needed to run the inference code:

In [3]:
%%writefile model/code/requirements.txt
git+https://github.com/openai/whisper.git

Writing model/code/requirements.txt


## Uploading the model to S3

In [4]:
%cd model

/home/ec2-user/SageMaker/whisper-demo-sagemaker/model


In [5]:
!tar zcvf model.tar.gz *

code/
code/requirements.txt
code/inference.py
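
The listing above shows `code/` sitting at the root of the archive. A minimal sketch of building the same layout from Python with the standard `tarfile` module follows; it works on placeholder files in a temporary directory, and `build_archive` is a hypothetical helper name.

```python
import os
import tarfile
import tempfile


def build_archive(model_root, archive_path):
    """Pack everything under model_root so that code/ sits at the archive root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(model_root)):
            # arcname drops the model_root prefix from the stored path
            tar.add(os.path.join(model_root, entry), arcname=entry)
    # reopen the archive and report what was stored
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as workdir:
    model_root = os.path.join(workdir, "model")
    os.makedirs(os.path.join(model_root, "code"))
    for name in ("inference.py", "requirements.txt"):
        with open(os.path.join(model_root, "code", name), "w") as f:
            f.write("# placeholder\n")
    # the archive is written outside model_root so it does not include itself
    names = build_archive(model_root, os.path.join(workdir, "model.tar.gz"))

print(sorted(names))
```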


In [2]:
import sagemaker

sess = sagemaker.Session()
sagemaker_session_bucket = sess.default_bucket()
role = sagemaker.get_execution_role()

In [3]:
role

'arn:aws:iam::905847418383:role/service-role/AmazonSageMaker-ExecutionRole-20210804T091905'

In [7]:
s3_location = f"s3://{sagemaker_session_bucket}/whisper/model/model.tar.gz"

In [8]:
!aws s3 cp model.tar.gz $s3_location

upload: ./model.tar.gz to s3://sagemaker-us-east-1-905847418383/whisper/model/model.tar.gz
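
The same upload can be done from Python instead of the AWS CLI. The sketch below is one possible approach, assuming `boto3` is installed and credentials are configured; `split_s3_uri` and `upload_archive` are hypothetical helper names, and the bucket in the example call is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_archive(local_path, uri):
    """Upload a local file to the given S3 URI (not called in this sketch)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").upload_file(local_path, bucket, key)


# placeholder bucket name, for illustration only
bucket, key = split_s3_uri("s3://example-bucket/whisper/model/model.tar.gz")
print(bucket, key)
```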


## Deploying the model to an endpoint

In [9]:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.utils import name_from_base
from sagemaker.serializers import DataSerializer

huggingface_model = HuggingFaceModel(
   model_data=s3_location,
   role=role,
   transformers_version="4.17",
   pytorch_version="1.10",
   py_version='py38',
)

In [10]:
endpoint_name = name_from_base("whisper-large-v2")
audio_serializer = DataSerializer(content_type='audio/x-audio')

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
)

---------!

In [12]:
%cd ..

/home/ec2-user/SageMaker/whisper-demo-sagemaker


In [14]:
audio_path = "mpthreetest.mp3"

res = predictor.predict(data=audio_path)
print(res)

{'detected_language': 'en', 'transcription': ' Okay, we are trying this for a second time to test the ability to upload an MP3 file. Hopefully this will work.'}
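
Clients that do not have the `predictor` object can call the endpoint through the SageMaker runtime API. The following is a sketch under assumptions: that `boto3` is available, that the endpoint returns a JSON body with the two keys shown above, and that stripping the leading space from the transcription is wanted. Both function names are hypothetical.

```python
import json


def parse_transcription(body_bytes):
    """Decode the JSON response and strip surrounding whitespace from the text."""
    payload = json.loads(body_bytes)
    return payload["detected_language"], payload["transcription"].strip()


def transcribe_with_boto3(endpoint_name, audio_path):
    """Send an audio file to the endpoint (not called in this sketch)."""
    import boto3  # imported lazily so the parser works without boto3

    with open(audio_path, "rb") as f:
        audio = f.read()
    response = boto3.client("sagemaker-runtime").invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="audio/x-audio",  # same content type as the serializer above
        Body=audio,
    )
    return parse_transcription(response["Body"].read())


# parse a canned response shaped like the output printed above
sample = json.dumps(
    {"detected_language": "en", "transcription": " Hopefully this will work."}
).encode()
print(parse_transcription(sample))
```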
