# Deploying the Endpoint
In this notebook we deploy an endpoint that serves the whisper-large-v2 model. We write our own inference code because we use Whisper's Python API rather than the Hugging Face API: the Whisper API can transcribe audio longer than 30 seconds out of the box. The API is documented here: https://github.com/openai/whisper#python-usage

In [None]:
!mkdir -p model/code

We write our custom inference code here.

In [70]:
%%writefile model/code/inference.py
import whisper
import boto3
from urllib.parse import urlparse


def model_fn(model_dir):
    # model_dir is not used: whisper.load_model downloads the "large-v2"
    # weights itself if they are not already cached.
    model = whisper.load_model("large-v2")
    return model


def transcribe_from_s3(model, s3_file, language=None):
    # Split an s3://bucket/key URI into its bucket and key.
    s3 = boto3.client('s3')
    o = urlparse(s3_file, allow_fragments=False)
    bucket = o.netloc
    key = o.path.lstrip('/')

    # Download the audio to local disk, then transcribe it.
    s3.download_file(bucket, key, 'tmp.wav')
    result = model.transcribe('tmp.wav', language=language)

    return result["language"], result["text"]


def predict_fn(data, model):
    # "s3_file" is required; "language" is optional.
    s3_file = data.pop("s3_file")
    language = data.pop("language", None)

    detected_language, transcription = transcribe_from_s3(model, s3_file, language)

    return {"detected_language": detected_language, "transcription": transcription}

Overwriting model/code/inference.py
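
The S3 handling in `transcribe_from_s3` relies on `urlparse` to split the URI into a bucket and a key. The sketch below isolates that step so it can be tried without AWS access. The helper name `split_s3_uri`, the scheme check, and the example URI are our own assumptions and are not part of the inference script.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    # allow_fragments=False keeps a '#' inside the object key instead of
    # treating everything after it as a URL fragment.
    parsed = urlparse(s3_uri, allow_fragments=False)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/key URI, got {s3_uri!r}")
    # The path starts with '/', which is not part of the S3 key.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, used only for illustration.
bucket, key = split_s3_uri("s3://my-bucket/whisper/data/test-000.wav")
print(bucket, key)
```

The scheme check is an addition for the sketch: the inference script itself passes whatever it receives straight to `download_file`.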


In `requirements.txt` we list the libraries needed to run the inference code.

In [71]:
%%writefile model/code/requirements.txt
transformers==4.25.1
git+https://github.com/openai/whisper.git
boto3

Overwriting model/code/requirements.txt


## Uploading the model to S3

In [72]:
%cd model

/home/ec2-user/SageMaker/transcription-testing/model


In [73]:
!rm -f model.tar.gz

In [74]:
!tar zcvf model.tar.gz *

code/
code/.ipynb_checkpoints/
code/.ipynb_checkpoints/inference-checkpoint.py
code/requirements.txt
code/inference.py
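
The listing above shows that the `tar` command also packs the `.ipynb_checkpoints` folder that Jupyter creates. If we want to leave it out, one option is to build the archive with Python's `tarfile` module and a filter. This is a minimal sketch, not the method used in this notebook: the function name `build_model_archive` is our own, and the demo runs on a throwaway directory with placeholder files rather than on the real `model` folder.

```python
import os
import tarfile
import tempfile


def build_model_archive(source_dir, archive_path):
    """Pack source_dir into a gzip tarball, skipping notebook checkpoints."""

    def skip_checkpoints(tarinfo):
        # Returning None from a tarfile filter drops the member
        # (and, for a directory, everything below it).
        if ".ipynb_checkpoints" in tarinfo.name.split("/"):
            return None
        return tarinfo

    archive_abs = os.path.abspath(archive_path)
    entries = sorted(os.listdir(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in entries:
            full_path = os.path.join(source_dir, entry)
            if os.path.abspath(full_path) == archive_abs:
                continue  # never pack an old copy of the archive itself
            # arcname keeps paths relative, so code/ sits at the archive root.
            tar.add(full_path, arcname=entry, filter=skip_checkpoints)

    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a temporary directory that mimics the layout of model/.
with tempfile.TemporaryDirectory() as workdir:
    code_dir = os.path.join(workdir, "model", "code")
    os.makedirs(os.path.join(code_dir, ".ipynb_checkpoints"))
    for rel in ("inference.py", "requirements.txt",
                ".ipynb_checkpoints/inference-checkpoint.py"):
        with open(os.path.join(code_dir, rel), "w") as fh:
            fh.write("# placeholder\n")
    names = build_model_archive(os.path.join(workdir, "model"),
                                os.path.join(workdir, "model.tar.gz"))

print(names)
```

In the notebook this would be called as `build_model_archive(".", "model.tar.gz")` from inside the `model` directory, in place of the `rm` and `tar` cells.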


In [75]:
import sagemaker
import boto3
sess = sagemaker.Session()

sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::905847418383:role/service-role/AmazonSageMaker-ExecutionRole-20210804T091905
sagemaker bucket: sagemaker-us-east-1-905847418383
sagemaker session region: us-east-1


In [76]:
s3_location = f"s3://{sagemaker_session_bucket}/whisper/model/model.tar.gz"

In [77]:
!aws s3 cp model.tar.gz $s3_location

upload: ./model.tar.gz to s3://sagemaker-us-east-1-905847418383/whisper/model/model.tar.gz


## Deploying the model to an endpoint

In [None]:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.utils import name_from_base

endpoint_name = name_from_base("whisper-large-custom")

# create the Hugging Face model object
huggingface_model = HuggingFaceModel(
    model_data=s3_location,       # S3 path to the model.tar.gz with our inference code
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.17",  # transformers version of the container
    pytorch_version="1.10",       # PyTorch version of the container
    py_version='py38',            # Python version of the container
)

# deploy the model to an endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
)

In [None]:
data = {
    "s3_file": "s3://sagemaker-us-east-1-905847418383/whisper/data/test/he/test-he-000.wav",
    "language": "he"
}

res = predictor.predict(data=data)
print(res)
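
The request body mirrors what `predict_fn` reads: `s3_file` is required and `language` is optional. A small helper can make that contract explicit before calling the endpoint. This is only a sketch: `build_payload`, its `s3://` check, and the example URI are our own assumptions, and the note about language detection reflects our understanding of how Whisper's `transcribe` behaves when `language` is `None`.

```python
import json


def build_payload(s3_file, language=None):
    """Build the request body that predict_fn in inference.py expects."""
    if not s3_file.startswith("s3://"):
        raise ValueError(f"s3_file must be an s3:// URI, got {s3_file!r}")
    payload = {"s3_file": s3_file}
    if language is not None:
        # Leaving the key out makes predict_fn pass language=None,
        # in which case Whisper is expected to detect the language itself.
        payload["language"] = language
    return payload


# Hypothetical URI, used only for illustration.
example = build_payload("s3://my-bucket/audio/sample.wav", language="he")
print(json.dumps(example))
```

The result can be passed directly as `predictor.predict(data=example)`.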