# SageMaker JumpStart - deploy automatic speech recognition model

This notebook demonstrates how to use the SageMaker Python SDK to deploy a JumpStart automatic speech recognition (ASR) model, Whisper, to a real-time endpoint and to an asynchronous inference endpoint.

In [34]:
import json

import boto3
import sagemaker
from sagemaker.utils import name_from_base

sess = sagemaker.Session()
bucket = sess.default_bucket()  # Default SageMaker S3 bucket for this account and Region
prefix = "whisper-sm-js"
role = sagemaker.get_execution_role()
session = boto3.session.Session()
region = session.region_name

# Low-level boto3 clients: `sm` manages endpoints and `sm_runtime` invokes
# both the real-time and the asynchronous endpoints
sm = boto3.client("sagemaker")
sm_runtime = boto3.client("sagemaker-runtime")

# S3 client for uploading inputs and copying files
s3 = boto3.client("s3")

In [35]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.jumpstart.utils import get_jumpstart_content_bucket

Select your desired model ID. You can search for available models in the [Built-in Algorithms with pre-trained Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html).

In [36]:
model_id = "huggingface-asr-whisper-small"
# Alternative: "huggingface-asr-whisper-large-v3"

## Deploy model

Using the model ID, define your model as a JumpStart model. You can deploy the model on other instance types by passing `instance_type` to `JumpStartModel`. See [Deploy publicly available foundation models with the JumpStartModel class](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use-python-sdk.html#jumpstart-foundation-models-use-python-sdk-model-class) for more configuration options.


In [37]:
model = JumpStartModel(model_id=model_id)

Using model 'huggingface-asr-whisper-small' with wildcard version identifier '*'. You can pin to version '3.0.0' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


In [38]:
container = model.image_uri

In [39]:
code_artifacts = model.model_data["S3DataSource"]["S3Uri"]
code_artifacts

's3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/'

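Before deploying, copy the JumpStart model artifacts, including the inference code, from the JumpStart S3 bucket to a local `code` directory, then upload them uncompressed to your own S3 bucket. Deploying the model later in this notebook might take a few minutes.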

In [40]:
!mkdir -p code

!aws s3 sync {code_artifacts} code

download: s3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/added_tokens.json to code/added_tokens.json
download: s3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/__model_info__.json to code/__model_info__.json
download: s3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/code/__init__.py to code/code/__init__.py
download: s3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/code/__script_info__.json to code/code/__script_info__.json
download: s3://jumpstart-cache-prod-us-west-2/huggingface-asr/huggingface-asr-whisper-small/artifacts/inference-prepack/v2.0.0/code/constants/__pycache__/constants.cpython-310.pyc to code/code/constants/__pycache__/constants.cpython-310.pyc
download: s3://jumpstart-cache-prod-us-west-2/huggingface

In [41]:
uncompressed_path = f"s3://{bucket}/{prefix}/uncompressed/model/"
!aws s3 sync code {uncompressed_path}

upload: code/__model_info__.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/__model_info__.json
upload: code/config.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/config.json
upload: code/preprocessor_config.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/preprocessor_config.json
upload: code/added_tokens.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/added_tokens.json
upload: code/generation_config.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/generation_config.json
upload: code/special_tokens_map.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/special_tokens_map.json
upload: code/tokenizer_config.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/tokenizer_config.json
upload: code/vocab.json to s3://sagemaker-us-west-2-376678947624/whisper-sm-js/uncompressed/model/vocab
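The two `aws s3 sync` calls above round-trip the artifacts through the notebook's local disk, which is convenient if you want to inspect or edit the inference code. If you do not need a local copy, a server-side copy with boto3 is an alternative. The sketch below is hypothetical (`copy_prefix` is not a SageMaker or boto3 function) and assumes your execution role can read the JumpStart bucket and write to your own. `copy_object` handles objects up to 5 GB, so larger weight files would need boto3's managed `s3_client.copy` instead.

```python
def copy_prefix(s3_client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Server-side copy of every object under src_prefix to dst_prefix.

    Hypothetical helper; returns the list of destination keys written.
    """
    copied = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            src_key = obj["Key"]
            # Keep each key's path relative to the source prefix
            dst_key = dst_prefix + src_key[len(src_prefix):]
            s3_client.copy_object(
                Bucket=dst_bucket,
                Key=dst_key,
                CopySource={"Bucket": src_bucket, "Key": src_key},
            )
            copied.append(dst_key)
    return copied
```

To use it here, split `code_artifacts` into its bucket name and key prefix, and pass `bucket` and `f"{prefix}/uncompressed/model/"` as the destination.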

In [42]:
# Point SageMaker at the uncompressed artifacts under the S3 prefix
model_data = {
    "S3DataSource": {
        "S3Uri": uncompressed_path,
        "S3DataType": "S3Prefix",
        "CompressionType": "None",
    }
}
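The dict above tells SageMaker to download every object under `uncompressed_path` as-is instead of extracting a `model.tar.gz`. As we understand the SageMaker documentation on uncompressed model artifacts, the `S3Uri` of an `S3Prefix` source should end with a slash. The hypothetical helper below (not part of the SageMaker SDK) builds the same dict and normalizes that slash.

```python
def uncompressed_model_data(s3_prefix: str) -> dict:
    """Build a ModelDataSource dict for uncompressed artifacts under an S3 prefix.

    Hypothetical helper; mirrors the model_data dict used in this notebook.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_prefix!r}")
    if not s3_prefix.endswith("/"):
        # Assumption: an S3Prefix source with no compression should be a "folder" URI
        s3_prefix += "/"
    return {
        "S3DataSource": {
            "S3Uri": s3_prefix,
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    }
```

With this helper, the cell above could be written as `model_data = uncompressed_model_data(uncompressed_path)`.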

In [43]:
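from sagemaker.model import Model

model_name = name_from_base(f"{prefix}-model")

# A generic Model that reuses the JumpStart container image together with
# the artifacts copied to your own bucket
model = Model(
    image_uri=container,
    model_data=model_data,
    role=role,
    # Environment variables passed to the inference container
    env={
        "ENDPOINT_SERVER_TIMEOUT": "3600",
        "MODEL_CACHE_ROOT": "/opt/ml/model",
        "SAGEMAKER_ENV": "1",
        "SAGEMAKER_PROGRAM": "inference.py",  # entry-point script
    },
    name=model_name,
)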

In [44]:
endpoint_name = name_from_base(f"{prefix}-endpoint")
# deploy model to SageMaker Inference
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=endpoint_name,
)

------------!

## Invoke endpoint

The following cell defines a helper function that downloads a sample audio file from the JumpStart content bucket and returns its raw bytes. You will send these bytes to the endpoint for transcription.


In [45]:
def download_and_load_audio_file(file_name):
    s3_bucket = get_jumpstart_content_bucket()
    s3_prefix = "training-datasets/asr_notebook_data"
    s3_client = boto3.client("s3")
    s3_client.download_file(s3_bucket, f"{s3_prefix}/{file_name}", file_name)
    with open(file_name, "rb") as file:
        input_audio = file.read()
    return input_audio

audio_file = "sample1.wav"
input_audio = download_and_load_audio_file(audio_file)

Now send the audio bytes returned by `download_and_load_audio_file` to the endpoint with `invoke_endpoint`. The WAV file must be sampled at 16 kHz, as required by the automatic speech recognition models, so resample it first if necessary. The audio clip must also be shorter than 30 seconds. If you receive a client error (413), check the size of the payload sent to the endpoint: payloads for real-time `InvokeEndpoint` requests are limited to 6 MB.
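These requirements can be checked locally before calling the endpoint. The sketch below uses only Python's standard `wave` module; `check_wav_payload` is a hypothetical helper, and the 6 MB limit is modeled as `6 * 1024 * 1024` bytes, which is our assumption about how the quota is counted. `wave` reads only PCM WAV files, so other encodings raise `wave.Error`.

```python
import io
import wave


def check_wav_payload(data: bytes, expected_rate: int = 16_000,
                      max_seconds: float = 30.0,
                      max_bytes: int = 6 * 1024 * 1024) -> list[str]:
    """Return a list of problems found in a WAV payload (empty if none).

    Hypothetical helper; thresholds follow the requirements stated above.
    """
    problems = []
    if len(data) > max_bytes:
        problems.append(f"payload is {len(data)} bytes, above {max_bytes}")
    # Read the header to get the sample rate and frame count
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if duration > max_seconds:
        problems.append(f"clip is {duration:.1f} s, longer than {max_seconds:.0f} s")
    return problems
```

For example, `check_wav_payload(input_audio)` should return an empty list before you invoke the endpoint.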

In [46]:
response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="audio/wav",
    Body=input_audio,
)
# The response body is JSON whose "text" field holds a list of transcriptions
output = json.loads(response["Body"].read().decode("utf-8"))["text"][0]
output

" We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won't get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody."

Whisper models can also translate speech in other languages, such as French, into English text; this notebook covers transcription only.

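## Remove endpoint

Delete the real-time endpoint when you no longer need it to stop incurring charges. The model itself is kept, because the asynchronous endpoint deployed later in this notebook reuses it.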

In [47]:
sm.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': 'f8a736e8-d608-4ec4-a816-25d0c22c402f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f8a736e8-d608-4ec4-a816-25d0c22c402f',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 13 Jun 2024 22:03:32 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}

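## Asynchronous Inference

Asynchronous inference queues incoming requests, reads each input from Amazon S3, and writes each result to an S3 output location instead of returning it in the response. The following cells upload the sample file to S3, make ten copies of it, deploy the same model to an asynchronous endpoint, and submit one request per file.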

In [48]:
# Upload the sample audio file to S3
s3_key = f"{prefix}/input/{audio_file}"
s3.upload_file(audio_file, bucket, s3_key)

In [49]:
num_duplicates = 10

# Create duplicate copies of the sample file (sample1_1.wav, sample1_2.wav, ...)
base, ext = s3_key.rsplit(".", 1)
for i in range(1, num_duplicates + 1):
    duplicate_key = f"{base}_{i}.{ext}"
    copy_source = {"Bucket": bucket, "Key": s3_key}
    s3.copy_object(Bucket=bucket, Key=duplicate_key, CopySource=copy_source)
    print(f"Created duplicate copy s3://{bucket}/{duplicate_key}")

Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_1.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_2.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_3.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_4.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_5.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_6.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_7.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_8.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_9.wav
Created duplicate copy s3://sagemaker-us-west-2-376678947624/whisper-sm-js/input/sample1_10.wav


In [50]:
from sagemaker.async_inference import AsyncInferenceConfig

# Results are written to this S3 prefix; each instance processes up to 4 requests at a time
async_config = AsyncInferenceConfig(
    output_path=f"s3://{bucket}/{prefix}/output",
    max_concurrent_invocations_per_instance=4,
    # Optional: publish success/error notifications to Amazon SNS topics
    # notification_config={
    #     "SuccessTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    #     "ErrorTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    # },
)

# Deploy the model for async inference
endpoint_name = name_from_base(f"{prefix}-async-endpoint")
async_predictor = model.deploy(
    async_inference_config=async_config,
    initial_instance_count=1, # number of instances
    instance_type='ml.g5.2xlarge', # instance type
    endpoint_name=endpoint_name
)

Using already existing model: whisper-sm-js-model-2024-06-13-21-50-59-718


------------!
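`AsyncInferenceConfig` is a SageMaker Python SDK wrapper. If you create endpoint configurations with boto3 instead, the same settings go into the `AsyncInferenceConfig` parameter of `create_endpoint_config`. The helper below is a hypothetical sketch that builds that dict; the key names follow the SageMaker `CreateEndpointConfig` API, while the function name and defaults are our own.

```python
def build_async_inference_config(output_path, max_concurrent_invocations=4,
                                 success_topic=None, error_topic=None):
    """Build the AsyncInferenceConfig dict for boto3's create_endpoint_config.

    Hypothetical helper; SNS topics are optional, as in the cell above.
    """
    output_config = {"S3OutputPath": output_path}
    notification = {}
    if success_topic:
        notification["SuccessTopic"] = success_topic
    if error_topic:
        notification["ErrorTopic"] = error_topic
    if notification:
        output_config["NotificationConfig"] = notification
    return {
        "OutputConfig": output_config,
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent_invocations},
    }
```

You would pass the result as `AsyncInferenceConfig=...` to `sm.create_endpoint_config`, alongside the usual `EndpointConfigName` and `ProductionVariants`.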

In [52]:
listing = s3.list_objects_v2(Bucket=bucket, Prefix=f"{prefix}/input/")

for obj in listing.get("Contents", []):
    input_path = f"s3://{bucket}/{obj['Key']}"

    # Queue one asynchronous request per input file; the call returns immediately
    async_response = sm_runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_path,
        ContentType="audio/wav",
        InvocationTimeoutSeconds=3600,  # 1 hour, the maximum allowed
    )

    # S3 URI where the result will appear once processing finishes
    print(async_response["OutputLocation"])

s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/44e5dc82-64b7-4a2b-bfdb-de390ebfea28.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/a2e53501-1517-45ae-ae23-04686c1da4bc.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/748af65c-bb2a-4fe9-bc0b-1252484041aa.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/53864457-57a3-4ee5-bb2b-46d6b9f61a04.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/e7fd84dc-f43f-4e05-9859-b5ca18545fe3.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/4e547c7a-d491-4121-ad0e-3eb1c1ead9e5.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/18bbcf7a-c9d3-4c52-9cc7-6069442d8266.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/47d76571-4469-4aa1-832c-04256e6ded02.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/e8028488-616b-4a96-b6d7-37d99292fde4.out
s3://sagemaker-us-west-2-376678947624/whisper-sm-js/output/a6be3c2a-0729-4f73-b91a-839c3cd4b9dc.out
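Each printed URI is where a transcription will appear once the endpoint has processed the request; until then, reading it fails with a missing-key error. A minimal polling sketch, assuming the output file contains the same JSON as the real-time response (a `text` list) and that `s3` is the boto3 S3 client created earlier. `parse_s3_uri` and `wait_for_async_output` are hypothetical helpers. This sketch does not check for failed requests, so a failure surfaces as a `TimeoutError`; for production workloads, the SNS notifications in `AsyncInferenceConfig` avoid polling altogether.

```python
import json
import time
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def wait_for_async_output(s3_client, output_location, timeout=600, poll_interval=5):
    """Poll S3 until the async result exists, then return its decoded JSON."""
    bucket, key = parse_s3_uri(output_location)
    deadline = time.monotonic() + timeout
    while True:
        try:
            obj = s3_client.get_object(Bucket=bucket, Key=key)
        except Exception as err:
            # botocore's ClientError carries the S3 error code in err.response
            response = getattr(err, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            if code not in ("NoSuchKey", "404"):
                raise  # a real error, not "result not written yet"
        else:
            return json.loads(obj["Body"].read().decode("utf-8"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no output at {output_location} after {timeout} s")
        time.sleep(poll_interval)
```

For example, `wait_for_async_output(s3, output_location)["text"][0]` returns the transcription for one request.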


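## Remove endpoint

Delete the asynchronous endpoint once the queued requests have finished; deleting it earlier may leave those requests unprocessed.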

In [33]:
sm.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '4b5713bb-f65f-4291-b77b-45908aa537c8',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4b5713bb-f65f-4291-b77b-45908aa537c8',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Thu, 13 Jun 2024 21:49:42 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}
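`delete_endpoint` removes only the endpoint; the endpoint configurations and the SageMaker model created by `model.deploy` remain in your account. The sketch below also removes those, using the same boto3 `sm` client; the helper name `cleanup_endpoint` is hypothetical. Call it instead of `sm.delete_endpoint`, because it must describe the endpoint before deleting it. If you still hold the SDK predictor objects, `predictor.delete_endpoint()` (which also deletes the endpoint config by default) and `predictor.delete_model()` do the same.

```python
def cleanup_endpoint(sm_client, endpoint_name, model_name=None):
    """Delete an endpoint, its endpoint config, and optionally the model.

    Hypothetical helper; returns the name of the deleted endpoint config.
    """
    # Look up the config name while the endpoint still exists
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    if model_name is not None:
        # Only delete the model when no other endpoint still uses it
        sm_client.delete_model(ModelName=model_name)
    return config_name
```

At the end of this notebook, `cleanup_endpoint(sm, endpoint_name, model_name=model_name)` would remove the asynchronous endpoint, its configuration, and the shared model.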