# Deploy pyannoteAI Diarization Model Package from AWS Marketplace 

The Speaker Diarization API enables accurate segmentation of audio recordings by detecting and labeling individual speakers across time. Designed for seamless integration into transcription pipelines, media workflows, and audio analytics systems, it supports a wide range of formats including WAV, MP3, FLAC, and OGG. The service is language-agnostic and works across diverse audio sources: calls, meetings, interviews, podcasts, and more. With built-in support for mono and stereo channels, varying sample rates, and flexible input options, it can be deployed in batch or near-real-time use cases. Key features include automatic speaker count estimation, precise time-stamped speaker labeling, and detection of overlapping speech. Outputs are returned in structured JSON for easy integration with transcription engines, search indexes, or business intelligence tools. Whether you are enriching speech-to-text transcripts, analyzing call center performance, or processing long-form media, this API improves clarity, organization, and data usability.

This sample notebook shows you how to deploy the **pyannoteAI Diarization Model** using Amazon SageMaker.

> **Note**: This reference notebook cannot run unless you make the suggested changes in the notebook.

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that one of the following is true:
    1. Your IAM role has these three permissions, and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    2. Your AWS account already has a **pyannoteAI Diarization Model** subscription. If so, skip the step [Subscribe to the model package](#1.-Subscribe-to-the-model-package).

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
    1. [Create an endpoint](#A.-Create-an-endpoint)
    2. [Perform real-time inference](#B.-Perform-real-time-inference)
    3. [Delete endpoint and model](#C.-Delete-endpoint-and-model)
3. [Troubleshooting](#3.-Troubleshooting)
4. [Questions](#4.-Questions)
    
We recommend using an **ml.g4dn.xlarge** instance for real-time inference.

## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page **pyannoteAI Diarization Model**.
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click on the **Continue to configuration** button and choose a **region**, you will see a **Product ARN** displayed. This is the model package ARN that you need to specify while creating a deployable model using `boto3`. Copy the ARN corresponding to your region and specify it in the following cell.

In [None]:
model_package_arn = "arn:aws:sagemaker:{zone}:{account_id}:model-package/xxx"
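
Before deploying, it can help to confirm that the ARN you pasted is well formed and belongs to the region you intend to use. The following is a minimal sketch, not part of the original notebook: the helper `parse_model_package_arn` and the example ARN are hypothetical, and the check assumes the usual `arn:<partition>:sagemaker:<region>:<account>:model-package/<name>` layout.

```python
from typing import NamedTuple


class ModelPackageArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    package_name: str


def parse_model_package_arn(arn: str) -> ModelPackageArn:
    """Split a SageMaker model package ARN into its parts (hypothetical helper)."""
    # An unedited template still contains "{...}" placeholders.
    if "{" in arn or "}" in arn:
        raise ValueError("ARN still contains an unfilled placeholder: " + arn)
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError("Not a SageMaker ARN: " + arn)
    resource_type, _, package_name = parts[5].partition("/")
    if resource_type != "model-package" or not package_name:
        raise ValueError("Not a model package ARN: " + arn)
    return ModelPackageArn(parts[1], parts[3], parts[4], package_name)


# Made-up ARN for illustration only; use the Product ARN from your listing.
example = parse_model_package_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
)
print(example.region)
```

In the notebook you could compare the parsed `region` with the region of your SageMaker session and stop early if they differ.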

In [None]:
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
import boto3

In [None]:
role = get_execution_role()

runtime = boto3.client("runtime.sagemaker")

sagemaker_session = sage.Session()

## 2. Create an endpoint and perform real-time inference

See [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) if you want to understand how real-time inference with Amazon SageMaker works.

> **Note**: Real-time inference here is request/response only; the endpoint does not support streaming.

### A. Create an endpoint

In [None]:
model_name = "pyannoteai-diarization" # Write the endpoint name

In [None]:
# Specify instance type
real_time_inference_instance_type = "ml.g4dn.xlarge"

> **Note**: We recommend using an ml.g4dn.xlarge instance for real-time inference.

In [None]:
# Create a deployable model from the model package.
model = ModelPackage(role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

# Wait until it prints "!" after "----------"

Once the endpoint has been created, you can perform real-time inference.

If you get an error here, please see the [Troubleshooting](#3.-Troubleshooting) section.

**WARNING!** 

**Remember to** [**delete your endpoint and resources**](#C.-Delete-endpoint-and-model) whenever you finish working with real-time inference to stop incurring charges!

For more information, please visit this [page](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-delete-resources.html).

### B. Perform real-time inference

In [None]:
import json
import base64

def invoke_endpoint(body):
    results = runtime.invoke_endpoint(
        EndpointName=model_name,
        Body=json.dumps(body), 
        ContentType="application/json",
    )

    res = results['Body'].read().decode('utf-8')

    return json.loads(res)

### B.1 Run diarization

In [None]:
# load audio files into memory
with open("../example_files/marklex1min.wav", "rb") as f:
    marklex_audio_wav = f.read()

diarization_result = invoke_endpoint({
    "audio": base64.b64encode(marklex_audio_wav).decode('utf-8'),
    "num_speakers": 2, # Optional, specify number of speakers if known beforehand
})

print(diarization_result["diarization"])
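
The request body used above can be built by a small helper so that the optional `num_speakers` field is only sent when it is known. This is a sketch under assumptions: the `audio` and `num_speakers` field names come from the cell above, while the function name `build_diarization_request` and the input validation are illustrative additions rather than documented behaviour of the endpoint.

```python
import base64
from typing import Optional


def build_diarization_request(audio_bytes: bytes, num_speakers: Optional[int] = None) -> dict:
    """Build the JSON-serializable body for the endpoint (hypothetical helper)."""
    if not audio_bytes:
        raise ValueError("audio_bytes is empty")
    # The audio travels as base64 text inside the JSON body.
    body = {"audio": base64.b64encode(audio_bytes).decode("utf-8")}
    if num_speakers is not None:
        # Assumption: a speaker count below 1 is never meaningful.
        if num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")
        body["num_speakers"] = num_speakers
    return body


# Tiny stand-in bytes; in the notebook this would be the content of an audio file.
print(build_diarization_request(b"abc", num_speakers=2))
```

The returned dictionary can be passed directly to `invoke_endpoint`, which serializes it with `json.dumps`.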

In [None]:
with open("../example_files/marklex1min.mp3", "rb") as f:
    marklex_audio_mp3 = f.read()
    
diarization_result = invoke_endpoint({
    "audio": base64.b64encode(marklex_audio_mp3).decode('utf-8'),
    # automatically detect number of speakers
})

print(diarization_result["diarization"])
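
Once you have a result, a common next step is to summarize how long each speaker talks. The sketch below assumes that `diarization_result["diarization"]` is a list of segments, each with `speaker`, `start`, and `end` keys and times in seconds. This notebook does not document the schema, so inspect the printed output first and adjust the key names if they differ. The function name and the sample segments are made up for illustration.

```python
from collections import defaultdict


def speaking_time_per_speaker(segments):
    """Sum segment durations per speaker label (assumed segment schema)."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


# Made-up segments in the assumed shape, not real model output.
example_segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
    {"speaker": "SPEAKER_01", "start": 2.5, "end": 4.0},
    {"speaker": "SPEAKER_00", "start": 4.0, "end": 5.0},
]

for speaker, seconds in speaking_time_per_speaker(example_segments).items():
    print(speaker, seconds)
```

Because overlapping speech can be detected, the per-speaker totals may add up to more than the duration of the recording.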

### C. Delete endpoint and model

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can terminate the endpoint to avoid being charged.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
model.delete_model()


## 3. Troubleshooting

### Cannot create already existing endpoint configuration

This error occurs when the deployment is interrupted and then rerun. To restart the deployment, first delete the previously created endpoint configuration. You can find the commands in the [Delete endpoint and model](#C.-Delete-endpoint-and-model) cell.

Please wait for the deployment to complete. This process may take several minutes.
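
After an interrupted deployment, the endpoint may not exist while its configuration does, so the first delete call can fail and stop the cell before the configuration is removed. One possible workaround, sketched here with a hypothetical helper `cleanup_endpoint`, is to attempt both deletions and collect any errors instead of stopping at the first one. It only relies on the two session methods already used in the delete cell; catching every `Exception` is a simplification for the sketch.

```python
def cleanup_endpoint(session, endpoint_name):
    """Try to delete the endpoint and its configuration; return any errors.

    `session` is expected to behave like `model.sagemaker_session`, that is,
    to offer `delete_endpoint(name)` and `delete_endpoint_config(name)`.
    """
    errors = {}
    steps = (
        ("endpoint", session.delete_endpoint),
        ("endpoint_config", session.delete_endpoint_config),
    )
    for label, delete in steps:
        try:
            delete(endpoint_name)
        except Exception as exc:  # broad on purpose: keep going to the next step
            errors[label] = exc
    return errors
```

In the notebook this could be called as `cleanup_endpoint(model.sagemaker_session, model_name)`; an empty dictionary means both resources were deleted, and any remaining entries show which step failed and why.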

### ResourceLimitExceeded

If you receive an error due to the lack of a quota for your instance type, you can increase it by sending a request:
1. Open the **Amazon SageMaker** [**Service Quotas**](https://console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas) page.
2. Check that you are in the correct AWS region where you want to increase the quota.
3. Filter **Service quotas** by "ml.g4dn.xlarge for endpoint usage" for real-time inference.
4. Select the quota and click the **Request increase at account-level** button.
5. Enter the total amount you want the quota to be and click the **Request** button.
6. Wait until AWS Support increases your quotas for this instance type.

> **Note**: To speed up the processing of your request, please indicate in your correspondence with AWS Support that this type of instance is required for this product.

For more information about requesting a quota increase, visit this [page](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html).

## 4. Questions

If you have any questions about our product, feel free to email us at support@pyannote.ai.