# Azure Cognitive Services Speech API

## Batch Transcription

This notebook uses the Speech REST API to batch-transcribe audio files stored in an Azure Blob Storage container.

## Prerequisites

- Data must be uploaded into an Azure Blob Storage Container
- Audio files must be in WAV or MP3 format (with PCM codec) or OGG (Opus codec), with 16-bit samples and a sample rate of either 8 or 16 kHz, in either mono or stereo.
- A [Cognitive Services Speech Subscription](https://ms.portal.azure.com/#blade/Microsoft_Azure_Marketplace/GalleryFeaturedMenuItemBlade/selectedMenuItemId/home/searchQuery/speech/resetMenuId/) is required, on the **standard** tier

Further details are available from the documentation here - https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#prerequisites

Swagger documentation is available for API testing and reference here (see the "Custom Speech Transcriptions" section) - https://uksouth.cris.ai/docs/v2.0/swagger
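
The audio requirements listed above can be checked locally before uploading. The following is a minimal sketch for WAV files only, using Python's standard `wave` module; the `check_wav` helper and its constants are hypothetical names introduced here for illustration and are not part of the Speech service.

```python
import wave

ALLOWED_SAMPLE_RATES = {8000, 16000}  # 8 or 16 kHz, per the prerequisites
ALLOWED_CHANNELS = {1, 2}             # mono or stereo
REQUIRED_SAMPLE_WIDTH = 2             # bytes per sample, i.e. 16-bit


def check_wav(path):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    # wave.open only reads PCM-encoded WAV files and raises wave.Error otherwise
    with wave.open(path, "rb") as wav:
        if wav.getframerate() not in ALLOWED_SAMPLE_RATES:
            problems.append(f"sample rate {wav.getframerate()} Hz is not 8 or 16 kHz")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"sample width {8 * wav.getsampwidth()}-bit is not 16-bit")
        if wav.getnchannels() not in ALLOWED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels is not mono or stereo")
    return problems
```

Calling `check_wav("recording.wav")` on a conforming file should return an empty list. MP3 and OGG files would need a different library and are not covered by this sketch.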

## Process

In [None]:
# Import modules

import requests
import json
import pandas as pd

# Set this to the region in which you've provisioned your Speech service.
region = "uksouth"

request_url = "https://" + region + ".cris.ai/api/speechtotext/v2.0/transcriptions"

# Replace <Subscription Key> with your valid subscription key.
subscription_key = "<Subscription Key>"

# Replace <blob URL> with the link to your Azure Blob Storage container
blob_url = "<blob URL>"

### Access Token

An access token is needed to interact with the Speech API.

In [None]:
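def get_token(subscription_key):
    # Exchange the subscription key for a short-lived access token
    fetch_token_url = 'https://' + region + '.api.cognitive.microsoft.com/sts/v1.0/issueToken?scope=speechservicesmanagement'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    # Fail early if the key or region is wrong
    response.raise_for_status()
    # Return the token in the form expected by the Authorization header
    return 'Bearer ' + response.text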

### Job Parameters
Give your transcription job a name and description, and provide details of any custom acoustic or language models you wish to use.

In [None]:
parameters = {
    "recordingsUrl": blob_url,
    "models": [],
    "locale": "en-GB",
    "name": "Sample Transcription",
    "description": "Batch Audio Transcription Test",
    "properties": {
        "ProfanityFilterMode": "Masked",
        "PunctuationMode": "DictatedAndAutomatic",
        "AddWordLevelTimestamps": "True"
    }
}

### Create a Transcription Job

In [None]:
req = requests.post(
    request_url,
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Authorization': get_token(subscription_key)
    },
    data = json.dumps(parameters)
)

if req.status_code == 202:
    print("Job successfully submitted")
else:
    # status_code is an int, so convert it before concatenating
    print("Job Failed - " + str(req.status_code) + ". Refer to Swagger for error code documentation")

### Get List of Transcription Jobs and Job Status

Once submitted, you will need to check the progress of the transcription.

In [None]:
req = requests.get(request_url, headers = {'Ocp-Apim-Subscription-Key': subscription_key, 'Authorization': get_token(subscription_key)})

response = req.json()

if len(response) == 0:
    print("No jobs submitted")
else:
    # Print a short summary of every job returned by the service
    for job in response:
        r = "ID: " + json.dumps(job["id"]) \
        + "\nStatus: " + json.dumps(job["status"]) \
        + "\nURI: " + json.dumps(job["recordingsUrl"]) \
        + "\nStatus Message: " + json.dumps(job["statusMessage"]) \
        + "\nTranscription URL: " + json.dumps(job["resultsUrls"]) + '\n\n'
        print(r)
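
Rather than re-running the cell above by hand, the status check can be wrapped in a polling loop. This is a minimal sketch, not the service's own client: `wait_for_job` and its parameters are hypothetical names, and the terminal status strings `"Succeeded"` and `"Failed"` are an assumption about the values the API reports in the `status` field.

```python
import time


def wait_for_job(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any zero-argument callable that returns the job's
    current status string.
    """
    terminal = {"Succeeded", "Failed"}  # assumed terminal status values
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        # Not finished yet, so wait before asking again
        sleep(poll_seconds)
    raise TimeoutError("Job did not finish within the polling limit")
```

In this notebook, `fetch_status` could be a small function that issues the same `requests.get` call for a single job and returns the `"status"` field of its JSON response. Passing the callable in keeps the loop independent of the HTTP details.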

## Transcription Data

This cell downloads the JSON results for a completed transcription and loads each channel (two if the recording is stereo) into a pandas dataframe with one row per word, showing the word, its offset from the start of the audio file, its duration, and the speaker.

In [None]:
# Replace the ID below with the ID of the relevant job (keep the leading "/")
t_id = "/3f42c932-9b3a-4371-bedb-610327dfa7a3"

req = requests.get(request_url + t_id, headers = {'Ocp-Apim-Subscription-Key': subscription_key, 'Authorization': get_token(subscription_key)})

# Use the response for this specific job rather than the job list from the previous cell
results_urls = req.json()["resultsUrls"]


def channel_to_df(results_url, speaker):
    # Download one channel's transcription and return one row per word
    output = requests.get(results_url).json()
    segments = output['AudioFileResults'][0]['SegmentResults']
    
    # Take the words from the top-ranked (NBest[0]) result of every segment
    frames = [pd.json_normalize(s['NBest'][0]['Words']) for s in segments]
    
    cols = ['Word','Offset','Duration']
    df = pd.concat(frames, ignore_index=True)[cols]
    df['Speaker'] = speaker
    return df


# Channel 1 audio transcription
df = channel_to_df(results_urls["channel_0"], 1)

# Channel 2 audio transcription (stereo recordings only)
if len(results_urls) > 1:
    df1 = channel_to_df(results_urls["channel_1"], 2)
    df = pd.concat([df, df1], ignore_index=True)


In [None]:
df['EndTime'] = df['Offset'] + df['Duration']

cols = ['Word','Offset','Duration','EndTime', 'Speaker']
df = df[cols]


# Order words by start time and renumber the rows to match
df = df.sort_values(by=['Offset']).reset_index(drop=True)

print(df)
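
The word-level dataframe can be collapsed into speaker turns for easier reading. The sketch below works on made-up sample rows with the same columns as the dataframe above; `to_turns` is a hypothetical helper, and the conversion to seconds assumes that `Offset` is expressed in 100-nanosecond ticks, which should be checked against the actual results.

```python
import pandas as pd

TICKS_PER_SECOND = 10_000_000  # assumption: offsets are in 100-nanosecond ticks

# Made-up sample rows in the same shape as the dataframe built above
sample = pd.DataFrame({
    "Word": ["hello", "hi", "there", "thanks"],
    "Offset": [5_000_000, 12_000_000, 15_000_000, 30_000_000],
    "Duration": [4_000_000, 2_000_000, 3_000_000, 5_000_000],
    "Speaker": [1, 2, 2, 1],
})


def to_turns(words):
    """Collapse a word-level dataframe into one row per speaker turn."""
    words = words.sort_values("Offset").reset_index(drop=True)
    # A new turn starts whenever the speaker differs from the previous word
    turn_id = (words["Speaker"] != words["Speaker"].shift()).cumsum().rename("Turn")
    turns = words.groupby(turn_id).agg(
        Speaker=("Speaker", "first"),
        StartSeconds=("Offset", lambda s: s.min() / TICKS_PER_SECOND),
        Text=("Word", lambda s: " ".join(s)),
    )
    return turns.reset_index(drop=True)


turns = to_turns(sample)
print(turns)
```

Passing the notebook's own `df` to `to_turns` in place of `sample` would produce one row per uninterrupted run of words from the same speaker.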