# Async audio transcription with the Azure Speech service and Python

In this notebook, you'll learn how to read multiple audio files from an Azure Blob Container, then use the Azure Speech service to transcribe the files. 

> If you're looking to do single-shot speech to text without writing a ton of code, I recommend using the [Speech CLI](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/spx-basics?tabs=windowsinstall).

## Before you get started

You'll need:

* An [Azure subscription](https://azure.microsoft.com/en-us/free/cognitive-services/)
* An [Azure Blob storage container](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal#create-a-container).
  * **Note:** For this tutorial, you can use my public sample container (its URL is in the code below).
  * **Note:** You'll need a second container if you want to write the transcripts to storage. Or you can just write them locally. Up to you.
* An [Azure Speech service resource](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview#create-the-azure-resource) in the S0 pricing tier. This tutorial **won't** work with a *Free (F0)* key.
* Install the `requests` and `xmltodict` packages in your environment. We strongly recommend that you run all of these notebooks in a virtual environment (virtualenv, venv, pyenv, pipenv, etc.). Run this command from your terminal/command line: `pip install requests xmltodict`.

## Import modules

The first thing we need to do is import a few modules. Here's what they are and what you'll use them for:
1. `requests` - This module is used to make HTTP requests. Since Azure Blob storage and batch transcription are REST services, we'll be making a series of POST, GET, and DELETE requests to send, retrieve, and clean up data in Azure.
2. `xmltodict` - This module quickly converts XML into a dictionary. From here we're going to turn the dictionary into JSON, which is a bit easier to use. 
   * **Note**: This is a personal preference, you can use ElementTree if XML is your jam.
3. `json` - This module is used to encode and decode JSON. We'll use this module a fair amount in this guide.
4. `time` - When we poll for results, we need to use the `time` module to add a delay to our code.

In [None]:
import requests
import xmltodict
import json
import time

## Get audio file URLs from Azure Blob storage

This tutorial presumes that you have audio stored in an Azure Blob Container. If you have your own audio, you can replace this URL with your own. Otherwise, you can use this one: the container is publicly accessible and contains open source files created from Project Gutenberg and the Azure Text to Speech service.

> **Important**: This container is public. It is set up this way for the tutorial. We recommend using the appropriate security measures required for your specific use case.

In [None]:
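# List the blobs (audio files) in the public sample container.
# restype=container&comp=list is the Azure Blob storage List Blobs operation.
response = requests.get('https://speechsamples21.blob.core.windows.net/audio-files-test-21?restype=container&comp=list')
# Stop here if the listing failed (for example, a mistyped container URL)
response.raise_for_status()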

## Convert XML to JSON and iterate over the response

Here we're converting the XML returned by the Azure Blob storage REST API into a dictionary. Then we encode that dictionary as JSON (so it's easy to read) and decode it back into a plain dictionary, which we'll use to collect the audio URLs for the Speech service.

> About halfway through this code block you'll get a printout of the JSONified response from Azure Blob storage. I'll also say that if you're comfortable with (or prefer) working with XML, it won't offend me. Feel free to parse and iterate through the Azure Blob storage response directly ☺. 

In [None]:
# Turn the XML into a dict so we can convert it to JSON
# This is a personal preference. If you prefer working with 
# XML you can use ElementTree.

parsed_xml = xmltodict.parse(response.content)
json_data = json.dumps(parsed_xml, indent=2)
print(json_data)
clean_json = json.loads(json_data)
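If you'd rather skip the JSON round trip and work with the XML directly (as the ElementTree note above suggests), here's a minimal sketch using Python's built-in `xml.etree.ElementTree`. `blob_urls_from_listing` is a hypothetical helper, not part of any Azure SDK. It assumes the List Blobs response is nested as `EnumerationResults > Blobs > Blob`, and because some List Blobs API versions return only a `<Name>` (no `<Url>`) for each blob, it falls back to building the URL from the container URL and the URL-encoded blob name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def blob_urls_from_listing(xml_bytes, container_url):
    """Return one URL per <Blob> element in an Azure List Blobs response.

    Hypothetical helper. Uses <Url> when the response includes it, and
    otherwise builds the URL from container_url and the blob's <Name>.
    """
    root = ET.fromstring(xml_bytes)
    urls = []
    # iter() walks the whole tree, so it finds every Blob under Blobs
    for blob in root.iter('Blob'):
        url = blob.findtext('Url')
        if url is None:
            # URL-encode the name so spaces and similar characters are safe
            name = blob.findtext('Name')
            url = container_url.rstrip('/') + '/' + quote(name)
        urls.append(url)
    return urls
```

For the sample container, that would be `blob_urls_from_listing(response.content, 'https://speechsamples21.blob.core.windows.net/audio-files-test-21')`.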

Before we can call the Speech service, we need a URL for each audio file that we're transcribing. Here, we're going to loop through the JSONified response, pluck out the URLs, and add them to a list as we go. 

In a bit, we'll pass this list to the Speech service in our transcription request.

In [None]:
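# A list of audio URLs we'll send to the Speech service
audio_urls = []

# The List Blobs response is nested as EnumerationResults > Blobs > Blob.
# xmltodict returns None for an empty <Blobs /> and a single dict (not a
# list) when there's only one blob, so normalize to a list first.
blobs = (clean_json['EnumerationResults']['Blobs'] or {}).get('Blob', [])
if isinstance(blobs, dict):
    blobs = [blobs]

for audio_file in blobs:
    audio_urls.append(audio_file['Url'])
    print(f"Audio added to list: {audio_file['Url']}")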


## Create a transcription job

Here we're sending a batch of audio files to the Azure Speech service to be transcribed. While our request is synchronous, the service will process each audio file asynchronously. 

Keep in mind that the service sends a response almost immediately, but it may take up to a few minutes to transcribe your audio files, depending on how many files you've sent and their size. The response contains a URL (possibly more than one) that we can use to fetch the status of our transcription request.

Let's take a look at the request, review the response, and in the next section we'll discuss retrieving the job status and your transcriptions.

* `region` - The region of your Speech resource. For example: "westus"
* `key` - The key for your Speech resource. 
* `displayName` - Give your transcription job a unique name. This will help you identify it if you run this more than once.
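The three values above, plus the list of audio URLs, are everything the create request needs. Here's a minimal sketch that builds the URL, headers, and body for the v3.0 `transcriptions` endpoint used in this tutorial, and stamps the display name with the current UTC time so repeated runs are easy to tell apart. Both helpers are hypothetical, and the timestamp format is just one choice.

```python
from datetime import datetime, timezone


def unique_display_name(prefix):
    """Append a UTC timestamp so each run gets a distinct displayName."""
    return f"{prefix}-{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}"


def build_create_transcription_request(region, key, content_urls, display_name, locale='en-US'):
    """Return (url, headers, body) for a v3.0 Create Transcription POST."""
    if not content_urls:
        raise ValueError('content_urls is empty, so there is nothing to transcribe.')
    url = f'https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions'
    headers = {
        'Ocp-Apim-Subscription-Key': key,
        'Content-Type': 'application/json',
    }
    body = {
        'contentUrls': list(content_urls),
        'locale': locale,
        'displayName': display_name,
    }
    return url, headers, body
```

With these, the POST becomes `url, headers, body = build_create_transcription_request(region, key, audio_urls, unique_display_name('gutenberg'))` followed by `requests.post(url, headers=headers, json=body)`. Raising on an empty list catches a failed blob listing before it turns into a confusing error from the Speech service.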



In [None]:
# Key and region for Speech resource
# You'll use these for all requests in this tutorial
region = 'PASTE_YOUR_SPEECH_RESOURCE_REGION'
key = 'PASTE_YOUR_SPEECH_KEY'

In [None]:
# Base URL
speech_base_url = f'https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/'

# Operation
operation = 'transcriptions'

# Build the request
request_headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
}

request_body = {
    "contentUrls": audio_urls,
    "locale": "en-US",
    "displayName": "GIVE_YOUR_JOB_A_NAME"
}

request = requests.post(speech_base_url + operation, headers=request_headers, json=request_body)
response = request.json()

print(json.dumps(response, sort_keys=True, indent=4, ensure_ascii=False, separators=(',', ': ')))
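The cell above prints whatever comes back, even an error. Here's a minimal sketch of checking the create response before moving on, assuming (as the v3.0 reference documents) that a successful create returns `201 Created` and that the body's `self` field is the URL of this one job. `job_url_from_create_response` is a hypothetical helper; pass it `request.status_code` and `request.json()` from the cell above.

```python
def job_url_from_create_response(status_code, body):
    """Return the job's own URL from a Create Transcription response.

    Hypothetical helper: raises if the service didn't report 201 Created,
    so a failed request isn't mistaken for a running job.
    """
    if status_code != 201:
        # Include the body, since it usually describes the error
        raise RuntimeError(f'Create transcription failed ({status_code}): {body}')
    return body['self']
```

Keeping the `self` URL lets you check on exactly this job later, rather than relying on its position in the list of all jobs.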

## Get transcription status

In the next two sections we're going to show you how to get the status of your transcription job and how to retrieve the transcriptions for each audio file. 

This operation gets the status of every transcription job you've run that hasn't been deleted, both active and completed. Call it to check whether your transcription files are ready to be retrieved.

In [None]:
# Poll the list of transcription jobs until the first job in the list has
# finished. A transcription job must be created before running this cell.

# Base URL
speech_base_url = f'https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/'

# Operation
operation = 'transcriptions'

request_headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
}

# Give up after 120 checks (about 10 minutes) instead of looping forever
max_attempts = 120

for attempt in range(max_attempts):
    request = requests.get(speech_base_url + operation, headers=request_headers)
    if request.status_code != 200:
        print(f'Status code: {request.status_code}')
        break
    response = request.json()
    # A job is done when its status is 'Succeeded' or 'Failed';
    # 'NotStarted' and 'Running' mean the service is still working.
    if response['values'] and response['values'][0]['status'] in ('Succeeded', 'Failed'):
        print(json.dumps(response, sort_keys=True, indent=4, ensure_ascii=False, separators=(',', ': ')))
        break
    print('Waiting for service. Trying again in 5 seconds.')
    time.sleep(5)
else:
    print('Gave up waiting for the transcription job to finish.')
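Checking the first job in the list works for this tutorial, but the list also includes older jobs you haven't deleted. Here's a minimal sketch of polling one specific job instead, assuming you have its `self` URL from the create response. It stops when `status` is `Succeeded` or `Failed` and raises `TimeoutError` after a fixed number of checks. `wait_for_transcription` is hypothetical; `fetch` is any function that GETs a URL and returns the decoded JSON, which keeps the polling logic independent of `requests`.

```python
import time

# Job states after which polling can stop
FINAL_STATES = ('Succeeded', 'Failed')


def wait_for_transcription(job_url, fetch, interval=5, max_attempts=120, sleep=time.sleep):
    """Poll one transcription job until it reaches a final state.

    fetch(url) must return the job's JSON as a dict.
    """
    for _ in range(max_attempts):
        job = fetch(job_url)
        status = job.get('status')
        if status in FINAL_STATES:
            return job
        print(f'Status is {status}. Checking again in {interval} seconds.')
        sleep(interval)
    raise TimeoutError(f'{job_url} did not finish after {max_attempts} checks.')
```

For example: `job = wait_for_transcription(job_url, lambda url: requests.get(url, headers=request_headers).json())`. Returning on `Failed` as well as `Succeeded` matters: a failed job never changes state again, so waiting on it would only burn through the attempts.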

### Get transcriptions of your audio files

To get your transcriptions, you'll need the `['links']['files']` URL from either of these requests (which you've already made):

* Create transcription (POST)
* Get transcriptions (GET)

In this example, we'll pull the URL from the **Get transcriptions** request.


In [None]:
transcription_url = response['values'][0]['links']['files']
print(transcription_url)

In [None]:
request_headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
}

request = requests.get(transcription_url, headers=request_headers)
response = request.json()
if request.status_code == 200:
    # Keep checking while the file list is still empty
    while not response['values']:
        print('Waiting for service. Trying again in 5 seconds.')
        time.sleep(5)
        request = requests.get(transcription_url, headers=request_headers)
        response = request.json()
    print(json.dumps(response, sort_keys=True, indent=4, ensure_ascii=False, separators=(',', ': ')))
else:
    print(f'Status code: {request.status_code}')
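The file list mixes one transcript per audio file with a `TranscriptionReport`. Here's a minimal sketch that keeps only the transcripts and maps each file's `name` to its `contentUrl`. It assumes each entry has `kind`, `name`, and `links.contentUrl`, so compare it against the response printed above. `transcript_urls` is a hypothetical helper.

```python
def transcript_urls(files_response):
    """Map file name -> contentUrl for every 'Transcription' file.

    Hypothetical helper; skips the 'TranscriptionReport' entry.
    """
    return {
        item['name']: item['links']['contentUrl']
        for item in files_response.get('values', [])
        if item.get('kind') == 'Transcription'
    }
```

For example, `for name, url in transcript_urls(response).items(): print(name, url)` lists every transcript link from the cell above.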

## View a raw transcript

In the previous section, the script printed a list of audio files that were transcribed with links to their transcripts. Here we're going to grab one of the transcripts from the response and print the output.

> If you'd like, you can replace `get_a_transcript` with a `contentUrl` from the previous section. 

In [None]:
# Get a transcript URL.
# The file list also contains a 'TranscriptionReport', so pick the
# first file whose kind is 'Transcription'.
get_a_transcript = next(
    item['links']['contentUrl']
    for item in response['values']
    if item['kind'] == 'Transcription'
)

# Get the transcript
requests.get(get_a_transcript).json()
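The raw transcript is verbose. If you only need readable text, here's a minimal sketch that joins the `display` field of each entry in `combinedRecognizedPhrases`, assuming that's where the v3.0 results hold the full punctuated transcript for each audio channel (check the field names against the JSON printed above). `transcript_text` is a hypothetical helper.

```python
def transcript_text(transcript):
    """Join the display text of every channel in a v3.0 transcript."""
    phrases = transcript.get('combinedRecognizedPhrases', [])
    # Sort by channel so multi-channel audio comes out in a stable order
    phrases = sorted(phrases, key=lambda p: p.get('channel', 0))
    return '\n'.join(p['display'] for p in phrases)
```

For example: `print(transcript_text(requests.get(get_a_transcript).json()))`.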

## Delete a job

Here we're going to make a GET request to retrieve all active and completed jobs. Then we're going to delete the first job in that list. 

> If you prefer, you can manually replace `id` with a job ID of your choice.

In [None]:
# Base URL
speech_base_url = f'https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/'

# Operation
operation = 'transcriptions'

request_headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
}

request = requests.get(speech_base_url + operation, headers=request_headers)
response = request.json()

# Get the first job ID
id = response['values'][0]['self'].replace(speech_base_url + operation, '')

request = requests.delete(speech_base_url + operation + id, headers=request_headers)
print(f'Status code: {request.status_code}')
if request.status_code == 204:
    print(f'Job {id} successfully deleted.')
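Stripping `speech_base_url + operation` out of the `self` URL works only while the two strings match exactly. Here's a minimal alternative sketch that takes the last path segment of the `self` URL as the job ID, assuming the ID is the final segment, as it is in the URLs this tutorial builds. `job_id_from_self` is a hypothetical helper.

```python
from urllib.parse import urlparse


def job_id_from_self(self_url):
    """Return the last path segment of a job's 'self' URL (the job ID)."""
    path = urlparse(self_url).path
    return path.rstrip('/').rsplit('/', 1)[-1]
```

Since the delete URL built above (`speech_base_url + operation + id`) is the same as the job's `self` URL, you can also pass `self` straight to `requests.delete`.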

## Delete all jobs (optional)

> **Important**: This deletes every transcription job returned by the list request, not only the jobs you created in this tutorial.

If you sent multiple jobs to the Speech service for transcription and you need to clear everything, you can use the following snippet to build a list of job IDs and delete them all at once.
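The v3.0 list operations are paged, so a single GET may not return every job. Here's a minimal sketch that follows the `@nextLink` field until a page doesn't include one, assuming the paged response shape of `values` plus an optional `@nextLink`. `iter_all` is a hypothetical helper, and `fetch` is any function that GETs a URL and returns the decoded JSON.

```python
def iter_all(first_url, fetch):
    """Yield every item from a paged v3.0 list response.

    Follows '@nextLink' until a page doesn't include one.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get('values', [])
        url = page.get('@nextLink')
```

For example: `job_urls = [job['self'] for job in iter_all(speech_base_url + operation, lambda url: requests.get(url, headers=request_headers).json())]`. Collect the full list before deleting anything, so deletions don't shift the pages you're still reading.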

In [None]:
job_id_list = []

# Base URL
speech_base_url = f'https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0/'

# Operation
operation = 'transcriptions'

request_headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
}

request = requests.get(speech_base_url + operation, headers=request_headers)
response = request.json()

for i in response['values']:
    job_id_list.append(i['self'].replace(speech_base_url + operation + '/', ''))


for job in job_id_list:

    job_id = f'/{job}'

    request = requests.delete(speech_base_url + operation + job_id, headers=request_headers)
    print(f'Status code: {request.status_code}')
    if request.status_code == 204:
        # Report the job deleted in this iteration
        print(f'Job {job} successfully deleted.')
    

## Reference

The Speech service v3 REST APIs allow you to do more than transcribe audio. They also allow you to create and manage custom speech models. To learn more, see the [Speech v3 REST API specification](https://westus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/GetTranscriptions).

## Sample code

Sample code for batch transcription is available in many programming languages on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch).

* [Node.js](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/js/node)
* [Python](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/python)
* [C#](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/csharp)
* [Batch ingestion client](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/batch-ingestion-client)

## Other tools

* [Speech CLI](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/spx-basics?tabs=windowsinstall) - Convert speech to text, text to speech, or run translation tasks from the command line.