# Using SageMaker JumpStart to Host the Whisper Model for Automatic Speech Recognition Tasks


❗This notebook works well with the `PyTorch 2.0.0 Python 3.10 CPU Optimized` kernel on a SageMaker Studio `ml.t3.medium` instance.

## Setup

Before running the notebook, complete the following setup steps.

In [None]:
%%sh
pip install -Uq pip
# Quote the specifier so the shell does not treat '>' as an output redirect
pip install -Uq "sagemaker>=2.221.1"
pip install -Uq datasets==2.16.1
pip install -Uq soundfile==0.12.1
pip install -Uq librosa==0.10.2.post1

In [None]:
!pip freeze | grep -E "datasets|librosa|sagemaker|soundfile|torch"

datasets==2.16.1
librosa==0.10.2.post1
sagemaker==2.221.1
sagemaker-experiments==0.1.43
sagemaker-pytorch-training==2.8.0
sagemaker-training==4.5.0
smdebug @ file:///tmp/sagemaker-debugger
soundfile==0.12.1
torch==2.0.0
torchaudio==2.0.1
torchdata @ file:///opt/conda/conda-bld/torchdata_1679615656247/work
torchtext==0.15.1
torchvision==0.15.1
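Because the setup cell only sets a minimum version for `sagemaker`, it can help to check that requirement programmatically instead of reading it off `pip freeze`. Below is a minimal sketch using the standard-library `importlib.metadata`. `numeric_prefix`, `meets_minimum`, and `check_package` are hypothetical helpers. The comparison deliberately looks only at the leading numeric components, so `0.10.2.post1` compares as `0.10.2`. For full PEP 440 semantics, `packaging.version.Version` is the better tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def numeric_prefix(ver: str) -> Tuple[int, ...]:
    """Turn '2.221.1' into (2, 221, 1) and '0.10.2.post1' into (0, 10, 2)."""
    parts = []
    for piece in ver.split("."):
        # Keep only the leading digits of each dot-separated piece
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric piece such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum (numeric parts only)."""
    return numeric_prefix(installed) >= numeric_prefix(minimum)


def check_package(name: str, minimum: str) -> bool:
    """Look up an installed distribution and compare it with a minimum version."""
    try:
        return meets_minimum(version(name), minimum)
    except PackageNotFoundError:
        return False
```

With the versions listed above, `check_package("sagemaker", "2.221.1")` should return `True`.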


## Download a test data sample from a Hugging Face dataset

In [None]:
import soundfile as sf
from datasets import load_dataset

dataset = load_dataset('MLCommons/peoples_speech', split='train', streaming=True, trust_remote_code=True)
sample = next(iter(dataset))

In [None]:
audio_data = sample['audio']['array']
audio_path = 'sample_audio.wav'
sf.write(audio_path, audio_data, sample['audio']['sampling_rate'])

print(f"Audio sample saved to '{audio_path}'")
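To sanity-check the saved file before uploading it, here is a small sketch that reads the WAV header with the standard-library `wave` module and reports the sample rate, channel count, and duration. It assumes `soundfile` wrote a PCM WAV, which is its default subtype for `.wav` files (16-bit PCM). The `wave` module cannot read floating-point WAV files. `describe_wav` is a hypothetical helper.

```python
import wave
from typing import Dict


def describe_wav(path: str) -> Dict[str, float]:
    """Return the sample rate, channel count, and duration (seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sampling_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,  # total frames divided by frames per second
        }
```

In the notebook, call it as `describe_wav(audio_path)`.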

In [None]:
import boto3
import sagemaker

aws_region = boto3.Session().region_name
sess = sagemaker.session.Session()
role = sagemaker.get_execution_role()

bucket = sess.default_bucket()
prefix = 'openai-whisper'

## Upload the test data into S3

In [None]:
input_path = f"s3://{bucket}/{prefix}/input/{audio_path}"
!aws s3 cp {audio_path} {input_path}

## Async Inference

In [None]:
import boto3
from typing import Dict


def get_cfn_outputs(stackname: str, region_name: str) -> Dict[str, str]:
    """Return the outputs of a CloudFormation stack as {OutputKey: OutputValue}."""
    cfn = boto3.client('cloudformation', region_name=region_name)
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs
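`describe_stacks` returns each stack's outputs as a list of `{'OutputKey': ..., 'OutputValue': ...}` entries, which `get_cfn_outputs` flattens into a dict. The sketch below performs the same transformation without the boto3 call, so it can be checked without AWS credentials. It also raises a clearer error when an expected key such as `EndpointName` is missing. `outputs_to_dict` and `require_output` are hypothetical helpers.

```python
from typing import Any, Dict


def outputs_to_dict(describe_stacks_response: Dict[str, Any]) -> Dict[str, str]:
    """Flatten the first stack's Outputs list into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    # "Outputs" may be absent if the stack defines none, so default to an empty list
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def require_output(outputs: Dict[str, str], key: str) -> str:
    """Return outputs[key], or raise a KeyError that lists the keys that do exist."""
    if key not in outputs:
        raise KeyError(f"Stack output '{key}' not found; available: {sorted(outputs)}")
    return outputs[key]
```

For example, `require_output(cfn_stack_outputs, 'EndpointName')` fails with the list of available output keys instead of a bare `KeyError` if the stack was deployed with different output names.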

In [None]:
CFN_STACK_NAME = "ASRAsyncEndpointStack"
cfn_stack_outputs = get_cfn_outputs(CFN_STACK_NAME, aws_region)

endpoint_name = cfn_stack_outputs['EndpointName']

In [None]:
from sagemaker import Predictor
from sagemaker.predictor_async import AsyncPredictor
from sagemaker.serializers import DataSerializer
from sagemaker.deserializers import JSONDeserializer


audio_serializer = DataSerializer(content_type="audio/x-audio")
deserializer = JSONDeserializer()

predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
    deserializer=deserializer
)

async_predictor = AsyncPredictor(predictor=predictor)

In [None]:
%%time
from sagemaker.async_inference.waiter_config import WaiterConfig

try:
    response = async_predictor.predict_async(
        input_path=input_path
    )

    response.get_result(waiter_config=WaiterConfig(max_attempts=2, delay=15))
except Exception as ex:
    print(ex)

Model returned error: {'code': 400, 'type': 'InternalServerException', 'message': '{"error": "unsupported content type audio/x-wav"}'} 
CPU times: user 152 ms, sys: 32.3 ms, total: 184 ms
Wall time: 16.1 s
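Judging by the error message, the first request did not carry the serializer's `audio/x-audio` type. Instead, the endpoint saw `audio/x-wav`, which is what Python's `mimetypes` module (used by the AWS CLI when uploading) typically guesses for a `.wav` file. This is an inference from the output, not something the notebook states. The next cell avoids the problem by passing `initial_args={'ContentType': 'audio/wav'}` explicitly.

Below is a minimal sketch of a wrapper that always sets that header unless the caller overrides it. `build_initial_args` and `submit_audio` are hypothetical helpers. `async_predictor` is assumed to be any object with the SageMaker SDK's `predict_async(initial_args=..., input_path=...)` signature.

```python
from typing import Any, Dict, Optional

DEFAULT_CONTENT_TYPE = "audio/wav"  # the value that the endpoint accepted in this notebook


def build_initial_args(initial_args: Optional[Dict[str, str]] = None,
                       content_type: str = DEFAULT_CONTENT_TYPE) -> Dict[str, str]:
    """Copy initial_args (without mutating it) and add ContentType if it is not already set."""
    args = dict(initial_args or {})
    args.setdefault("ContentType", content_type)
    return args


def submit_audio(async_predictor: Any, input_path: str,
                 initial_args: Optional[Dict[str, str]] = None) -> Any:
    """Submit an S3 audio object for async inference with an explicit ContentType."""
    return async_predictor.predict_async(
        initial_args=build_initial_args(initial_args),
        input_path=input_path,
    )
```

Usage: `response = submit_audio(async_predictor, input_path)`, followed by `response.get_result(...)` as in the next cell.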


In [None]:
%%time
from sagemaker.async_inference.waiter_config import WaiterConfig


initial_args = {'ContentType': 'audio/wav'}
response = async_predictor.predict_async(
    initial_args=initial_args,
    input_path=input_path
)

response.get_result(waiter_config=WaiterConfig(max_attempts=2, delay=15))

CPU times: user 114 ms, sys: 2.23 ms, total: 117 ms
Wall time: 16.1 s


{'text': [" I wanted to share a few things, but I'm going to not share as much as I wanted to share because we are starting late. I'd like to get this thing going so we can all get home at a decent hour. This election is very important to us."]}
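In the output above, the endpoint returns a JSON object whose `text` field is a list of strings with a leading space. The sketch below turns that into a single clean transcript. `to_transcript` is a hypothetical helper, and it assumes the response shape shown above, while also tolerating a plain string.

```python
from typing import Any, Dict


def to_transcript(result: Dict[str, Any]) -> str:
    """Join the 'text' entries of a Whisper response into one stripped string."""
    text = result.get("text", "")
    if isinstance(text, str):
        return text.strip()
    # Strip each piece (the output above starts with a space) and drop empty pieces
    return " ".join(piece.strip() for piece in text if piece.strip())
```

Usage: store the result first, `result = response.get_result(waiter_config=WaiterConfig(max_attempts=2, delay=15))`, then call `to_transcript(result)`.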

## Optional: Test the autoscaling configuration for async inference

In [None]:
CFN_STACK_NAME = "ASRAsyncEndpointAutoScalingStack"
cfn_stack_outputs = get_cfn_outputs(CFN_STACK_NAME, aws_region)

resource_id = cfn_stack_outputs['AppScalingResourceId']
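Application Auto Scaling identifies a SageMaker endpoint variant with a resource ID of the form `endpoint/<endpoint-name>/variant/<variant-name>`, as in the `describe_scalable_targets` output further down. The sketch below splits that ID into its parts, for example to confirm that it points at the endpoint you expect. `parse_variant_resource_id` is a hypothetical helper.

```python
from typing import Tuple


def parse_variant_resource_id(resource_id: str) -> Tuple[str, str]:
    """Split 'endpoint/<name>/variant/<variant>' into (name, variant)."""
    parts = resource_id.split("/")
    if len(parts) != 4 or parts[0] != "endpoint" or parts[2] != "variant":
        raise ValueError(f"Unexpected SageMaker variant resource ID: {resource_id!r}")
    return parts[1], parts[3]
```

Usage: `autoscaled_endpoint, variant = parse_variant_resource_id(resource_id)`. You can then compare `autoscaled_endpoint` with `endpoint_name` from the first stack, which presumably refers to the same endpoint.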

In [None]:
%%time

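# Submit 1000 asynchronous invocations. The growing backlog should trigger a
# scale-out up to the scalable target's MaxCapacity, and the endpoint should
# scale back in to MinCapacity (0) after the queue drains.

initial_args = {'ContentType': 'audio/wav'}

for _ in range(1000):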
    response = async_predictor.predict_async(
        initial_args=initial_args,
        input_path=input_path
    )

print("\nAsync invocations for Hugging Face model serving with autoscaling\n")
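Async endpoints are commonly scaled with a target-tracking policy on the `ApproximateBacklogSizePerInstance` CloudWatch metric. The actual policy for this endpoint lives in the `ASRAsyncEndpointAutoScalingStack` stack and is not shown here, so the sketch below is only a simplified model. With a hypothetical target of N queued requests per instance, the desired capacity is roughly `ceil(backlog / N)`, clamped to the scalable target's `MinCapacity` and `MaxCapacity`. Real target tracking also applies cooldowns and metric smoothing, which this sketch ignores. Scaling out from zero instances typically needs a separate policy as well (for example, one on `HasBacklogWithoutCapacity`). `estimate_desired_instances` is a hypothetical helper.

```python
import math


def estimate_desired_instances(backlog: int, target_per_instance: float,
                               min_capacity: int, max_capacity: int) -> int:
    """Simplified target-tracking estimate: ceil(backlog / target), clamped to [min, max]."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    desired = math.ceil(backlog / target_per_instance) if backlog > 0 else 0
    return max(min_capacity, min(max_capacity, desired))
```

For example, assume 1000 queued requests, a hypothetical target of 5 per instance, and the bounds reported below (`MinCapacity` 0, `MaxCapacity` 3). In that case the estimate is capped at 3 instances. Once the backlog reaches 0, the estimate drops back to 0.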

In [None]:
import boto3

appscaling_client = boto3.client("application-autoscaling", region_name=aws_region)
response = appscaling_client.describe_scalable_targets(
    ServiceNamespace='sagemaker',
    ResourceIds=[resource_id]
)

response

{'ScalableTargets': [{'ServiceNamespace': 'sagemaker',
   'ResourceId': 'endpoint/jumpstart-huggingface-asr-whisper-medium-5164372/variant/AllTraffic',
   'ScalableDimension': 'sagemaker:variant:DesiredInstanceCount',
   'MinCapacity': 0,
   'MaxCapacity': 3,
   'RoleARN': 'arn:aws:iam::1234567890:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint',
   'CreationTime': datetime.datetime(2024, 6, 3, 14, 34, 45, 46000, tzinfo=tzlocal()),
   'SuspendedState': {'DynamicScalingInSuspended': False,
    'DynamicScalingOutSuspended': False,
    'ScheduledScalingSuspended': False},
   'ScalableTargetARN': 'arn:aws:application-autoscaling:us-east-1:1234567890:scalable-target/056mc6fbbea0e60a440d9525c703daffb538'}],
 'ResponseMetadata': {'RequestId': '96436107-8c4a-4aa4-85c0-d1ddd85e3e6f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '96436107-8c4a-4aa4-85c0-d1ddd85e3e6f',
   'content-type': 'application
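The raw `describe_scalable_targets` response above is verbose. The sketch below extracts the capacity bounds and a single "is anything suspended" flag for each target. The field names match the response shown above, and `summarize_scalable_targets` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_scalable_targets(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce a describe_scalable_targets response to resource ID, bounds, and suspension state."""
    summary = []
    for target in response.get("ScalableTargets", []):
        suspended = target.get("SuspendedState", {})
        summary.append({
            "resource_id": target["ResourceId"],
            "min": target["MinCapacity"],
            "max": target["MaxCapacity"],
            # True if any of the dynamic or scheduled scaling activities is suspended
            "any_suspended": any(suspended.values()),
        })
    return summary
```

Usage: `summarize_scalable_targets(response)`. For the target shown above, the summary should report `min` 0, `max` 3, and `any_suspended` False.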

## References

- [(AWS Blog) Host the Whisper Model on Amazon SageMaker: exploring inference options (2024-01-16)](https://aws.amazon.com/blogs/machine-learning/host-the-whisper-model-on-amazon-sagemaker-exploring-inference-options/)
- [(Example Jupyter Notebooks) Using Huggingface DLC to Host the Whisper Model for Automatic Speech Recognition Tasks](https://github.com/aws-samples/amazon-sagemaker-host-and-inference-whisper-model/blob/main/huggingface/huggingface.ipynb)