# **pyannoteAI** STT Orchestration

> Enhance your own transcription with the most accurate speaker diarization

In [None]:
# visualize demo file
import demo
demo.STTOrchestration(demo.AUDIO, demo.GOLD_DIARIZATION)

# press SPACE to play/pause

## Setting up `pyannoteAI` Python SDK

* Create an account on [dashboard.pyannote.ai](https://dashboard.pyannote.ai)
* Create a pyannoteAI API key (stored in the `PYANNOTEAI_API_KEY` environment variable below)

In [None]:
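import os

# read the API key from the environment (raises KeyError if the variable is not set)
PYANNOTEAI_API_KEY = os.environ["PYANNOTEAI_API_KEY"]

from pyannoteai.sdk import Client

# client used for every call in the rest of this notebook
client = Client(PYANNOTEAI_API_KEY)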

## Getting an audio URL

In this demo, the audio file is available locally and has to be uploaded to `pyannoteAI` cloud servers to get an `audio_url` back.  
However, when using our API in production, we recommend that you use your own set of [signed URLs](https://docs.pyannote.ai/tutorials/use-s3-private-files).

In [None]:
audio_url = client.upload(demo.AUDIO)

## Submitting a diarization job

In [None]:
diarization_job = client.diarize(audio_url)
print(diarization_job)

## Retrieving the output of the diarization job

In this demo, the `client` is polling `pyannoteAI` cloud servers periodically until the job has completed.  
However, when using our API in production, we recommend that you set up your own [webhook URL](https://docs.pyannote.ai/webhooks/receiving-webhooks) to get the output as soon as it is available.
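
For intuition, the polling described above can be pictured as the loop below. This is a minimal sketch, not the SDK's actual implementation: `poll_until_done`, `fetch_job`, and the status names in `TERMINAL_STATUSES` are hypothetical and only illustrate the idea of checking a job repeatedly until it reaches a final state.

```python
import time
from typing import Callable

# Assumed names for final job states; the real API may use different values.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Call `fetch_job` repeatedly until it reports a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        time.sleep(interval)


# Usage with a fake job that completes on the third check (made-up data)
_responses = iter(
    [
        {"status": "running"},
        {"status": "running"},
        {"status": "succeeded", "output": {}},
    ]
)
result = poll_until_done(lambda: next(_responses), interval=0.01)
print(result["status"])
```

A webhook removes the need for this loop altogether, since the server notifies you once instead of being asked repeatedly.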

In [None]:
diarization = client.retrieve(diarization_job)
diarization['output'].keys()

## Visualizing diarization output

In [None]:
# printing diarization
for turn in diarization['output']['diarization']:
    print(f"{turn['speaker']} [{turn['start']:6.3f}s - {turn['end']:6.3f}s]")

In [None]:
demo.STTOrchestration(demo.AUDIO, diarization['output']['diarization'])
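
Since each turn carries a `speaker`, a `start`, and an `end`, simple statistics follow directly from the diarization output. The sketch below sums speaking time per speaker. The `speaking_time` helper and the `sample_turns` data are made up for illustration; in this notebook you would pass `diarization['output']['diarization']` instead.

```python
from collections import defaultdict


def speaking_time(turns):
    """Return total speaking time in seconds for each speaker."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)


# Made-up turns using the same keys as the diarization output
sample_turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
    {"speaker": "SPEAKER_01", "start": 2.5, "end": 4.0},
    {"speaker": "SPEAKER_00", "start": 4.0, "end": 5.0},
]

for speaker, seconds in sorted(speaking_time(sample_turns).items()):
    print(f"{speaker}: {seconds:.1f}s")
```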

## Submitting an STT orchestration job

To benefit from STT orchestration, simply add `transcription=True` to the previous call.  
It will automatically orchestrate `pyannoteAI` diarization with `parakeet` STT.  
Support for additional STTs will be released progressively in 2026.

In [None]:
orchestration_job = client.diarize(audio_url, transcription=True)
print(orchestration_job)

## Retrieving the output of the STT orchestration job

In [None]:
orchestration = client.retrieve(orchestration_job)

In [None]:
widget = demo.STTOrchestration(audio=demo.AUDIO, diarization=orchestration['output']['diarization'])
widget

In [None]:
orchestration['output'].keys()

Two new keys have been added to the job output:
* `wordLevelTranscription` provides timestamps for each word;
* `turnLevelTranscription` is aligned with the speaker turns returned by our diarization.
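
To make the relationship between the two keys concrete, the sketch below groups consecutive words from the same speaker into turns. It is only an illustration and not necessarily how the service builds `turnLevelTranscription`: the `words_to_turns` helper and the `sample_words` data are hypothetical, and the sketch assumes each word entry has the `speaker`, `start`, `end`, and `text` keys used later in this notebook.

```python
from itertools import groupby


def words_to_turns(words):
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append(
            {
                "speaker": speaker,
                "start": group[0]["start"],  # start of the first word
                "end": group[-1]["end"],  # end of the last word
                "text": " ".join(w["text"].strip() for w in group),
            }
        )
    return turns


# Made-up word-level entries for illustration
sample_words = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 0.4, "text": "Hello"},
    {"speaker": "SPEAKER_00", "start": 0.5, "end": 0.9, "text": "there"},
    {"speaker": "SPEAKER_01", "start": 1.2, "end": 1.5, "text": "Hi"},
]

for turn in words_to_turns(sample_words):
    print(f"{turn['speaker']} [{turn['start']:6.3f}s - {turn['end']:6.3f}s] {turn['text']}")
```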

## Visualizing STT orchestration output

In [None]:
# printing turn-level transcription
for turn in orchestration['output']['turnLevelTranscription']:
    print(f"{turn['speaker']} [{turn['start']:6.3f}s - {turn['end']:6.3f}s] {turn['text']}")

In [None]:
# printing the first 10 words of the word-level transcription
for word in orchestration['output']['wordLevelTranscription'][:10]:
    print(f"{word['speaker']} [{word['start']:5.3f}s - {word['end']:5.3f}s] {word['text']}")

In [None]:
widget

In [None]:
widget.transcript = orchestration['output']['wordLevelTranscription']