# Step 2.4: Getting Transcriptions from Automatic Speech Recognition (ASR) Services

This code takes the parsed audio files produced in Step 2.1, runs them through five ASR services, and writes the returned transcriptions to a pandas dataframe that is exported to a CSV. Setting up the pipeline for each service is a detailed process, so read and follow the instructions in each section carefully.

The five ASR services are:

1. [Amazon Transcribe](https://aws.amazon.com/transcribe/)
2. [DeepSpeech](https://deepspeech.readthedocs.io/en/r0.9/)
3. [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text)
4. [IBM Watson Speech-to-Text](https://www.ibm.com/cloud/watson-speech-to-text)
5. [Microsoft Azure Cognitive Services Speech-to-Text](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/)

## Required Packages

The following packages are necessary to run this code:
os, time, urllib, json, wave, [pandas](https://pypi.org/project/pandas/), [numpy](https://pypi.org/project/numpy/), [ibm_watson](https://pypi.org/project/ibm-watson/), [ibm_cloud_sdk_core](https://pypi.org/project/ibm-cloud-sdk-core/), [google-cloud-speech](https://pypi.org/project/google-cloud-speech/), [azure-cognitiveservices-speech](https://pypi.org/project/azure-cognitiveservices-speech/), [boto3](https://pypi.org/project/boto3/), [deepspeech](https://pypi.org/project/deepspeech/) 



## Initial Set-Up for All Services

In [None]:
#import the necessary packages
import pandas as pd
import numpy as np
import os

In [None]:
# Designate the file paths where the audio is stored. AUDIO SHOULD BE SINGLE CHANNEL (MONO)
aint_audio_file_path = "path"

be_audio_file_path = "path"

done_audio_file_path = "path"

In [None]:
#filepath for the csv produced in Step 2.3
aint_file_path = "path"

be_file_path = "path"

done_file_path = "path"

#reads in the gold standard dataframe    
aint_gs_df = pd.read_csv(aint_file_path)

be_gs_df = pd.read_csv(be_file_path)

done_gs_df = pd.read_csv(done_file_path)
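
Before spending any API calls, it can help to confirm that every row of a gold standard CSV has a matching audio file. The sketch below is a minimal check under some assumptions: `missing_audio_files` is a hypothetical helper name, the CSV has the `File`, `Line`, and `FeatureCountPerLine` columns used by the loops later in this notebook, and the audio files follow the `16khz_..._Line..._FeatCount....wav` naming pattern used there.

```python
import csv
from pathlib import Path


def missing_audio_files(csv_path, audio_dir, prefix="16khz_"):
    """List the expected audio file names that are not present in audio_dir."""
    audio_dir = Path(audio_dir)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # same naming pattern as the transcription loops in this notebook
            name = (f"{prefix}{row['File']}_Line{row['Line']}"
                    f"_FeatCount{row['FeatureCountPerLine']}.wav")
            if not (audio_dir / name).is_file():
                missing.append(name)
    return missing
```

Calling `missing_audio_files(aint_file_path, aint_audio_file_path)` should return an empty list when everything is in place.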

# 1. Amazon Transcribe

These instructions assume you have set up an Amazon Transcribe account. Use this [link](https://aws.amazon.com/getting-started/hands-on/create-audio-transcript-transcribe/) for help with start-up.

## References:

This code is adapted directly from the work of two developers:

1. [Viet Hoang Tran Duong](https://github.com/viethoangtranduong) (code [here](https://github.com/viethoangtranduong/AWS-Transcribe-Tutorial/blob/master/AWS_Transcribe_Tutorial.ipynb))
2. [Rekhu Gopal](https://github.com/RekhuGopal) (code [here](https://github.com/RekhuGopal/PythonHacks/blob/main/AWSBoto3Hacks/AWSboto3SpeechToText-AWSTranscribe.py))

In [None]:
# Import the required packages
import time
import boto3
import urllib.request  # the submodule must be imported explicitly for urllib.request.urlopen
import json

## 1.0 Uploading Data to the Amazon Web Services Console

Before running transcription services, your data must be uploaded to an Amazon S3 bucket. To do so, either (1) follow these steps to use the Amazon S3 console through a browser:

1. Go to https://s3.console.aws.amazon.com/s3.
    - Click 'Create Bucket'.
    - Name the bucket.
    - Set the region (I left it on the default).
    - Make sure 'Block all public access' is checked.
    - Choose whether you would like bucket versioning disabled or enabled.
    - Add tags if you like.
    - Choose whether to enable or disable encryption.
    - Click 'Create Bucket'.
2. Find your newly created bucket in the list and click the bucket name.
3. Upload audio files here.

Or (2) run the following code to upload directly from your local machine. If you use this method, you'll have to run it for each folder of data to upload:

In [None]:
s3 = boto3.client('s3', 
                  aws_access_key_id = "key_id",
                  aws_secret_access_key = "access_key",
                          
                  #the region of the data center
                  region_name = "region")

In [None]:
# specify the file path of the files you want to upload
filepath = "path"

# creates a list of files to be uploaded
amazon_files = os.listdir(filepath)

## if you would like a progress flag, uncomment this line along with the last two lines in the for loop
#iteration_number = 1

# loops through the files to be uploaded and uploads them
for amazon_file in amazon_files:
    
    # the arguments here are:
    #  (1) the file path of the file to be uploaded,
    #  (2) the Amazon S3 cloud storage bucket name,
    #  (3) the file name of the file to be uploaded (which is called the Key in the boto3 code)
    
    s3.upload_file(os.path.join(filepath, amazon_file), 'coraal-aint-variations-2021-amazon', amazon_file)
    
#     print(f"{iteration_number}/{len(amazon_files)} completed. {amazon_file}")
          
#     iteration_number+=1

One more thing to consider: Amazon Transcribe job names must be unique, so if a transcription job already exists with the same name as a file you want to process, the code will throw an error. Because the job name is derived from the file name, it may be a good idea to copy the audio snippet files, append *_aint*, *_be*, or *_done* to the end of each name, and upload the copies. That avoids collisions when the same snippet appears under more than one feature.
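
The copy-and-rename step can be scripted. This is a minimal sketch rather than part of the original pipeline: `copy_with_suffix` is a hypothetical helper name, and it assumes the snippets are `.wav` files sitting directly in one folder.

```python
import shutil
from pathlib import Path


def copy_with_suffix(source_dir, target_dir, suffix):
    """Copy every .wav file in source_dir to target_dir, adding suffix before the extension."""
    source_dir = Path(source_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for wav_path in sorted(source_dir.glob("*.wav")):
        # e.g. "clip_Line3.wav" with suffix "_aint" becomes "clip_Line3_aint.wav"
        new_path = target_dir / f"{wav_path.stem}{suffix}{wav_path.suffix}"
        shutil.copy2(wav_path, new_path)
        copied.append(new_path.name)
    return copied
```

For example, `copy_with_suffix(aint_audio_file_path, "aint_renamed", "_aint")` would leave the originals untouched and write the renamed copies to a new folder (the folder name here is only a placeholder).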

## 1.1 Set up the Connection to Amazon Transcribe

To set up the initial connection, you will need two keys: (1) Access Key ID, and (2) Secret Access Key. To do so, follow these steps:

1. Go to https://console.aws.amazon.com/ and log into your account.
2. In the top right corner, click on your Amazon profile name.
3. Click 'My Security Credentials'.
4. Click 'Access keys'.
5. If you already have access keys, take them from here (make sure they are active).
    - If not, click 'Create New Access Key'. You can then download a CSV with the keys or copy them directly from the console.

In [None]:
#sets up the connection to the Amazon Transcribe (Amazon Web Services)
transcribe = boto3.client('transcribe',
                          aws_access_key_id = "key_id",
                          aws_secret_access_key = "access_key",
                          
                          #the region of the data center
                          region_name = "region")
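
If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below assumes the variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` are set in your shell; `aws_client_kwargs` is a hypothetical helper name, and the fallback region is only a placeholder. As far as I know, boto3 also picks these variables up on its own when no keys are passed to `boto3.client`.

```python
import os


def aws_client_kwargs(env=os.environ):
    """Build the keyword arguments for boto3.client from environment variables."""
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # placeholder fallback region (assumption): change it to your data center
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-2"),
    }


# Usage (requires boto3 and the variables above to be set):
# transcribe = boto3.client("transcribe", **aws_client_kwargs())
```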

## 1.2 Defining the Job Checker Function

Amazon Transcribe works on jobs, and every job must have a unique name. If you try to create a job with the same name as an existing one, the request fails. This function checks whether a job with the given name already exists and, if so, deletes it so that the name can be reused.

This function takes the following arguments:
1. Job name

In [None]:
def check_job_name(job_name):
    
    """
    When Amazon Transcribe transcribes audio, it creates a job, and each job must have
    a unique name. This function checks whether the job name has already been used and,
    if so, deletes the existing job so that the name can be reused.
    """
    
    #lists the jobs currently in the console
    # you can also delete jobs manually by going to the transcription jobs
    #  in the console: https://us-east-2.console.aws.amazon.com/transcribe/
    existing_jobs = transcribe.list_transcription_jobs()
    
    #loops through the existing job names
    for job in existing_jobs['TranscriptionJobSummaries']:
        
        #if the name of the job name given already exists, this will delete the job
        #  and create a new one with that name
        if job_name == job['TranscriptionJobName']:
            
            #delete the job
            transcribe.delete_transcription_job(TranscriptionJobName=job_name)
        
        #if a job with that name doesn't exist, continues past 
        else:
            
            continue
    
    #returns the job name
    # this will be the one provided to the function whether the job already
    #  existed in the Amazon Transcribe console or not
    return job_name
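
One caveat, as far as I understand the boto3 documentation: `list_transcription_jobs` returns its results one page at a time (only a handful of jobs per call unless `MaxResults` is raised) and supplies a `NextToken` when more pages remain. With many jobs in the console, a single call may therefore miss the job you are looking for. A minimal sketch of a paginated lookup follows; `job_exists` is a hypothetical helper name, and `client` is assumed to be a boto3 Transcribe client.

```python
def job_exists(client, job_name):
    """Return True if a transcription job named exactly job_name exists."""
    # JobNameContains narrows the listing to names that contain job_name
    kwargs = {"JobNameContains": job_name, "MaxResults": 100}

    while True:
        page = client.list_transcription_jobs(**kwargs)

        for job in page.get("TranscriptionJobSummaries", []):
            if job["TranscriptionJobName"] == job_name:
                return True

        # NextToken is only present when more pages remain
        next_token = page.get("NextToken")
        if not next_token:
            return False
        kwargs["NextToken"] = next_token
```

A check such as `if job_exists(transcribe, job_name): transcribe.delete_transcription_job(TranscriptionJobName=job_name)` would then cover every page of results.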

## 1.3 Defining the Transcriber Function

This code will define the function that runs the transcriber.

This function takes the following arguments:
1. The path to the S3 Bucket you created in 1.0 above. To find this path:
    - Go to https://s3.console.aws.amazon.com.
    - Click Buckets.
    - Find the bucket where your data is (or create a bucket and store the data).
    - Copy the s3 path (you'll probably have to go to an individual file, copy the S3 path, paste it, and chop off the audio file name).
2. The audio file name (taken from the dataframe)

In [None]:
#this creates the function that will run the Amazon Transcription service
# it requests a US English transcription with the service's default settings
def amazon_transcribe(s3_bucket_path, audio_file_name):
    
    """
    takes the variables:
    (1) s3_bucket_path which is the path to the audio files stored
        in Amazon's s3 cloud storage. Go to https://s3.console.aws.amazon.com >>
        Buckets >> Find the bucket where your data is (or create a bucket and store the data) >>
        Copy the s3 path (you'll probably have to go to an individual file and copy the s3
        path that way, paste it here and chop off the audio file name)
    (2) audio_file_name which is the audio filename
    """
    
    
    #combines the s3 bucket path with the audio file name to create the job URI
    job_uri = s3_bucket_path + audio_file_name
    
    #creates a variable with the audio filename to use as the job name
    job_name = (audio_file_name.split('.')[0]).replace(" ", "")
  
    # check if name is taken or not
    job_name = check_job_name(job_name)
  
    # file format of the audio file (Should be .wav for best results)
    file_format = audio_file_name.split('.')[1]

    #transcribe the audio
    transcribe.start_transcription_job(
      TranscriptionJobName=job_name,
      Media={'MediaFileUri': job_uri},
      MediaFormat = file_format,
      LanguageCode='en-US')

    
    #parses the results from the speech to text
    while True:
        result = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        
        #checks the status of the job
        if result['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            
            #if the status is either COMPLETED or FAILED, breaks the loop
            break
        
        # waits 15 seconds before looping again (if the loop hasn't been broken)
        time.sleep(15)
    
    
    #if the result is completed rather than failed, accesses the information
    if result['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
        
        #the transcription is actually stored online, so to get it directly in the code
        #  it is necessary to use urllib to access the json file which stores it
        response = urllib.request.urlopen(result['TranscriptionJob']['Transcript']['TranscriptFileUri'])
        
        #reads and stores the complete json file data
        data = json.loads(response.read())
        
        #gets only the transcript text
        text = data['results']['transcripts'][0]['transcript']
    
    
    #if the result is failed rather than completed, returns an empty string
    elif result['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
        
        #reads and stores the complete json file data
        #only use this if you want to return the data anyways
        #data = json.loads(response.read())
        
        #creates an empty string
        text = ""
        
        #returns an empty string
        return text
    
        #if you want to return the data, uncomment the data statement above and use this return statement
        #return data, text
    
    #returns the transcription text
    return text

    #if you want both the whole json and the transcription text, use this return statement
    # you'll just need to adjust the code below to return the data variable to get the json
    # the json file can be written out to a json file or the text of it can be appended to the dataframe
#     return data, text
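
The `while True` loop in the function above keeps polling until the job reports `COMPLETED` or `FAILED`, so a job that never reaches either state would block the notebook indefinitely. The sketch below shows one way to cap the wait. `wait_for_status` is a hypothetical helper name, and the 30 minute budget is an arbitrary assumption rather than a service limit.

```python
import time


def wait_for_status(get_status, done_states=("COMPLETED", "FAILED"),
                    poll_seconds=15, max_wait_seconds=1800, sleep=time.sleep):
    """Call get_status() until it returns one of done_states or the time budget runs out."""
    waited = 0
    while True:
        status = get_status()
        if status in done_states:
            return status
        if waited >= max_wait_seconds:
            raise TimeoutError(f"job still {status!r} after {waited} seconds")
        # wait before asking again, and keep track of the time spent
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the transcribe client from 1.1:
# status = wait_for_status(
#     lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)
#     ["TranscriptionJob"]["TranscriptionJobStatus"])
```

Passing the status lookup in as a function keeps the helper independent of boto3, which also makes it easy to try out without an AWS account.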

## 1.4 Executing the Code

In [None]:
# the path to the Amazon s3 cloud storage which is the path to the audio files stored
#     in Amazon's s3 cloud storage. Go to https://s3.console.aws.amazon.com >>
#     Buckets >> Find the bucket where your data is (or create a bucket and store the data) >>
#     Copy the s3 path (you'll probably have to go to an individual file and copy the s3
#     path that way, paste it here and chop off the audio file name

aint_s3_bucket_path = "path"

be_s3_bucket_path = "path"

done_s3_bucket_path = "path"

### Feature: Ain't

Before you run the Amazon code for *ain't* variations, you'll want to reorder your dataframe to put duplicate rows on the bottom. If the code runs into a duplicate File and Line, it won't delete the previous job already processed and will throw an error and stop the code. The following cell will do that for you before you run the transcriber.

In [None]:
# creates a dataframe of only duplicated lines
duplicates_df = aint_gs_df[aint_gs_df.duplicated(['File', 'Line'])]

# creates a dataframe with no duplicated lines
no_duplicates_df = aint_gs_df[~aint_gs_df.duplicated(['File', 'Line'])]

# concatenates the two dataframes with the duplicates last
aint_gs_df = pd.concat([no_duplicates_df, duplicates_df])
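
An alternative to reordering, which I have not tested against the service, is to give each repeat of a File and Line its own job name. The sketch below only generates the names; `unique_job_names` is a hypothetical helper, and using it would mean changing `amazon_transcribe` to accept a job name instead of deriving one from the file name.

```python
from collections import Counter


def unique_job_names(base_names):
    """Return job names where repeats of a base name get a numeric suffix."""
    seen = Counter()
    names = []
    for base in base_names:
        seen[base] += 1
        # the first occurrence keeps its name; later ones become base-2, base-3, ...
        names.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return names


print(unique_job_names(["clipA", "clipB", "clipA"]))  # ['clipA', 'clipB', 'clipA-2']
```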

In [None]:
#creates a column for the amazon transcripts
aint_gs_df['amazon_transcription'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in aint_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    ## if you uploaded different file names, use this:
    # audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}_aint.wav"

    #run the code and get transcription
    transcription = amazon_transcribe(aint_s3_bucket_path, audio_filename)
    
    #writes the transcription to the dataframe
    aint_gs_df.loc[file_row.Index, "amazon_transcription"] = transcription
    
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(aint_gs_df)} completed.")
        
    #iteration_number += 1

In [None]:
# sorts the data frame back to the correct order for the rest of the transcribers
aint_gs_df = aint_gs_df.sort_values(by=['File', 'Line'])

### Feature: Be

In [None]:
#creates a column for the amazon transcripts
be_gs_df['amazon_transcription'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in be_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    ## if you uploaded different file names, use this:
    # audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}_be.wav"
    
    
    #run the code and get transcription
    transcription = amazon_transcribe(be_s3_bucket_path, audio_filename)
    
    #writes the transcription to the dataframe
    be_gs_df.loc[file_row.Index, "amazon_transcription"] = transcription
    
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(be_gs_df)} completed.")
        
    #iteration_number += 1

### Feature: Done

In [None]:
#creates a column for the amazon transcripts
done_gs_df['amazon_transcription'] = np.nan

# progress counter for the message printed at the end of each loop
iteration_number = 1

#cycles through the dataframe rows
for file_row in done_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    ## if you uploaded different file names, use this:
    # audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}_done.wav"
    
    
    #run the code and get transcription
    transcription = amazon_transcribe(done_s3_bucket_path, audio_filename)
    
    #writes the transcription to the dataframe
    done_gs_df.loc[file_row.Index, "amazon_transcription"] = transcription
    
    #prints a message that shows the progress
    print(f"{iteration_number} / {len(done_gs_df)} completed.")
        
    iteration_number += 1

# 2. DeepSpeech

## References:

This code is adapted directly from the code presented in the following video:

1. https://www.youtube.com/watch?v=iWha--55Lz0

Helpful links:

- https://scgupta.medium.com/how-to-build-python-transcriber-using-mozilla-deepspeech-5485b8d234cf

In [None]:
# Import the required packages
from deepspeech import Model
import wave

## 2.1 Initial Setup

In [None]:
#sets the model path
model_file_path = "deepspeech-0.9.3-models.pbmm"

#sets the scorer path
lm_file_path = "deepspeech-0.9.3-models.scorer"

#creates an instance of the model
model = Model(model_file_path)

#enables the scorer
model.enableExternalScorer(lm_file_path)

## 2.2 Defining the Wave File Reader Function

This code will define a function to read the wave file. This function is defined here in order to be used within the next function.

**NOTE**: Files for DeepSpeech must have a sampling rate of 16 kHz. All files for CORAAL are 44.1 kHz and need to be resampled to 16 kHz. Please refer to Step 2.1 for instructions on how to do that.

This function takes the following arguments:
1. The filename of the wave file

In [None]:
def read_wave_file(filename):
    
    #open the file
    with wave.open(filename, 'rb') as w:
        
        #get frame rate
        rate = w.getframerate()
        
        #get number of frames
        frames = w.getnframes()
        
        #get buffer
        buffer = w.readframes(frames)
        
        #return buffer and rate
        return buffer, rate
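
Since DeepSpeech needs 16 kHz audio and the files should be single channel, it may be worth checking the WAV header before transcribing. This is a minimal sketch using only the standard `wave` module; `check_wave_format` is a hypothetical helper name, and the 2 byte sample width is an assumption based on the `np.int16` conversion used in the transcriber function.

```python
import wave


def check_wave_format(filename, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(filename, "rb") as w:
        if w.getframerate() != expected_rate:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected {expected_rate}")
        if w.getnchannels() != expected_channels:
            problems.append(f"{w.getnchannels()} channels, expected {expected_channels}")
        # 2 bytes per sample corresponds to the 16-bit audio that np.int16 assumes
        if w.getsampwidth() != 2:
            problems.append(f"sample width is {w.getsampwidth()} bytes, expected 2")
    return problems
```

An empty list means the file matches the expected format; anything else names what to fix back in Step 2.1.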

## 2.3 Defining the Transcriber Function

This code will define a function that runs the transcriber.

This function will return two variables:
1. The transcript from the speech to text
2. The confidence value
    - NOTE: The confidence value here is not measured the way the other services measure it. The other services report a probability between 0 and 1, while DeepSpeech's confidence value can fall far outside that range. It is meant for comparing alternative transcripts within DeepSpeech itself and may have no comparable meaning outside of DeepSpeech. I record it for information's sake, but be cautious in using it to measure anything. Here's the official explanation: https://github.com/mozilla/DeepSpeech/issues/2053

This function takes the following arguments:
1. The audio filepath defined above
2. The filename of the wave file

In [None]:
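def deepspeech_transcribe(audio_filepath, audio_filename):
    
    #reads the raw audio bytes (the sampling rate is returned too but is not used here)
    buffer, rate = read_wave_file(f"{audio_filepath}{audio_filename}")
    
    #converts the bytes to the 16-bit integer array that DeepSpeech expects
    data16 = np.frombuffer(buffer, dtype=np.int16)
    
    #returns the transcript and the confidence value of the top candidate transcript
    # note that this runs the model twice, once for each call
    return model.stt(data16), model.sttWithMetadata(data16).transcripts[0].confidence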

## 2.4 Executing the Code

### Feature: Ain't

In [None]:
#creates a column for the transcripts
aint_gs_df['deepspeech_transcription'] = np.nan

#creates a column for the confidence level
aint_gs_df['deepspeech_ConfidenceLevel'] = np.nan

#enable this if you'd like to print a message that will show the progress
# iteration_number = 1

#cycles through the dataframe rows
for file_row in aint_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the speech to text and returns the transcription and the confidence value
    transcription, confidence_value = deepspeech_transcribe(aint_audio_file_path, filename)

    #writes the text to the dataframe
    aint_gs_df.loc[file_row.Index, "deepspeech_transcription"] = transcription

    #writes the confidence level to the dataframe
    aint_gs_df.loc[file_row.Index, "deepspeech_ConfidenceLevel"] = confidence_value
    
#     #enable this if you'd like to print a message that will show the progress
#     print(f"{iteration_number} / {len(aint_gs_df)} completed.")
        
#     iteration_number += 1

### Feature: Be

In [None]:
#creates a column for the transcripts
be_gs_df['deepspeech_transcription'] = np.nan

#creates a column for the confidence level
be_gs_df['deepspeech_ConfidenceLevel'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in be_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the speech to text and returns the transcription and the confidence value
    transcription, confidence_value = deepspeech_transcribe(be_audio_file_path, filename)

    #writes the text to the dataframe
    be_gs_df.loc[file_row.Index, "deepspeech_transcription"] = transcription

    #writes the confidence level to the dataframe
    be_gs_df.loc[file_row.Index, "deepspeech_ConfidenceLevel"] = confidence_value
    
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(be_gs_df)} completed.")
        
    #iteration_number += 1

### Feature: Done

In [None]:
#creates a column for the transcripts
done_gs_df['deepspeech_transcription'] = np.nan

#creates a column for the confidence level
done_gs_df['deepspeech_ConfidenceLevel'] = np.nan

# # enable this if you'd like to print a message that will show the progress
# iteration_number = 1

#cycles through the dataframe rows
for file_row in done_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the speech to text and returns the transcription and the confidence value
    transcription, confidence_value = deepspeech_transcribe(done_audio_file_path, filename)

    #writes the text to the dataframe
    done_gs_df.loc[file_row.Index, "deepspeech_transcription"] = transcription

    #writes the confidence level to the dataframe
    done_gs_df.loc[file_row.Index, "deepspeech_ConfidenceLevel"] = confidence_value
    
#     #enable this if you'd like to print a message that will show the progress
#     print(f"{iteration_number} / {len(done_gs_df)} completed.")
        
#     iteration_number += 1

# 3. Google Cloud Speech-to-Text

This code assumes you have set up a Google Cloud account. Follow the steps listed [here](https://cloud.google.com/speech-to-text/docs/before-you-begin) for start-up help.

**Note:** If you run into a permissions error, it probably says this: "PermissionDenied: 403 _____@_______.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object." Copy the email address in this error message, and then follow the directions here: https://cloud.google.com/storage/docs/access-control/using-iam-permissions. 

## 3.0 Uploading Data to Google Cloud Storage

Before running transcription services, your data must be uploaded to a Google Cloud Storage bucket. To do so, follow these steps:

1. Go to https://console.cloud.google.com/storage/browser.
    - Click 'Create Bucket'.
    - Name the bucket.
    - Set the region (I selected single region and left it on the default).
    - Choose a default storage class for your data.
    - Ensure "Enforce public access prevention on this bucket" is checked.
    - Choose the encryption you want (I left it on default).
    - Click 'Create'.
2. Upload your audio files to the bucket, which should open once you click 'Create'.

## 3.1 Initial Set-up

To connect to the Google Cloud Speech-to-Text service, you must have a credential file. The credential is a JSON key file that you download to your computer; you then insert the file path to that JSON file in the code below. To do this, follow the steps here: https://cloud.google.com/speech-to-text/docs/before-you-begin (especially [this section](https://cloud.google.com/speech-to-text/docs/before-you-begin#creating_a_json_key_for_your_service_account)).
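
The Google client libraries look for the key file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. A small sketch of setting it once, with an early check that the file exists, could look like this; `set_google_credentials` is a hypothetical helper name and not part of the Google library.

```python
import os


def set_google_credentials(credential_path):
    """Point the Google client libraries at a service account key file."""
    # fail early with a clear message instead of a later authentication error
    if not os.path.isfile(credential_path):
        raise FileNotFoundError(f"no credential file at {credential_path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credential_path
    return credential_path
```

Setting the variable once near the top of the notebook avoids resetting it on every transcription call.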

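## 3.2 Defining the Transcriber Function

This code will define a function that runs the transcriber. It returns the full response from the service; the code in 3.3 extracts the transcription and confidence level from that response and writes them to the dataframe.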

This function takes two arguments:
1. The file path to the audio file in Google Cloud storage. This is the directory of your audio files' bucket folder.
2. The audio filename (which will be taken from the dataframe).

**IMPORTANT NOTE:** You must change the credential_path variable in the function code here in order for the code to work.

In [None]:
def transcribe_gcs(gcs_uri_path, gcs_uri_filename):
    
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    
    import os
    
    from google.cloud import speech
    
    #this is taken from the Google Cloud Project with the correct API
    #it will have to change depending on the user
    credential_path = "path"
    
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
    
    client = speech.SpeechClient()

    #I ran into a lot of issues here. Running help(speech.RecognitionConfig) helps
    audio = speech.RecognitionAudio(uri=gcs_uri_path+gcs_uri_filename)
    
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )

    operation = client.long_running_recognize(config=config, audio=audio)
    
#     enable this if you want a flag to print
#     print("Waiting for operation to complete...")
    
    response = operation.result(timeout=90)
    
    return response
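
Because the function builds the URI with plain string concatenation (`gcs_uri_path+gcs_uri_filename`), the path has to end in a slash or the file will not be found. A tiny helper can make that robust. This is only a sketch: `join_uri` is a hypothetical name, and the bucket in the example is made up.

```python
def join_uri(base_path, filename):
    """Join a storage path and a file name with exactly one slash between them."""
    return f"{base_path.rstrip('/')}/{filename.lstrip('/')}"


print(join_uri("gs://my-bucket/aint", "clip.wav"))   # gs://my-bucket/aint/clip.wav
print(join_uri("gs://my-bucket/aint/", "clip.wav"))  # gs://my-bucket/aint/clip.wav
```

The same helper would work for the S3 path that is concatenated in `amazon_transcribe`.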

## 3.3 Executing the Code

### Feature: Ain't

In [None]:
#this is the path in the google cloud storage
#google cloud storage MUST be used for files longer than a minute
#only files under a minute can be processed locally
# store files in Google Cloud Platform storage here: https://console.cloud.google.com/storage/browser
aint_gcs_uri_path = "path"


#creates a column for the google transcripts
aint_gs_df['google_transcription'] = np.nan

#creates a column for the google confidence level
aint_gs_df['google_ConfidenceLevel'] = np.nan

## enable this if you'd like to print a message that will show the progress
# iteration_number = 1

#cycles through the dataframe rows
for file_row in aint_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the transcription
    file_response = transcribe_gcs(aint_gcs_uri_path, filename)
    
    #if the service isn't able to produce a transcript, writes an empty string to the dataframe
    if len(file_response.results) == 0:
        
        aint_gs_df.loc[file_row.Index, "google_transcription"] = ""

        #writes a zero level confidence to the dataframe
        aint_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = 0

    else:

        #if the result only has one transcript, writes the transcript to the dataframe
        if len(file_response.results) == 1:

            # this is coded this way because of the structure of the response
            #   file_response.results is a list of the results. [0] gets the first
            #   result, which contains a list of alternatives. since there is only one
            #   here, the [0] is used to get the first alternative, whose transcript
            #   attribute holds the ASR transcribed speech
            text = file_response.results[0].alternatives[0].transcript

            #writes the text to the dataframe
            aint_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[0].alternatives[0].confidence

            #writes the confidence level to the dataframe
            aint_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level

        else:

            #if there are multiple results, this code will find the result whose top
            #  alternative has the highest confidence level and take that as the transcript

            #creates an empty list to append tuples to
            tuples_list = []

            #cycles through the list of results in the response
            for result in file_response.results:

                # for each result, creates a tuple of the confidence level and index of each result in the list
                confidence_index = (result.alternatives[0].confidence, file_response.results.index(result))

                # appends the tuple to the list
                tuples_list.append(confidence_index)

            #performs the same function as the text= variable in the previous step, except here,
            #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
            #  takes the maximum confidence level in the tuples_list by taking the max
            #  number in all of the first items in each tuple and then takes the index
            #  from the max tuple by accessing its second item, and uses that index
            #  as the index for the larger results list
            text = file_response.results[max(tuples_list)[1]].alternatives[0].transcript

            #writes the text to the dataframe
            aint_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[max(tuples_list)[1]].alternatives[0].confidence

            #writes the confidence level to the dataframe
            aint_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level
            
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(aint_gs_df)} completed.")
        
    #iteration_number += 1
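
The selection logic in the loop above can be written more compactly with `max` and a key function. The sketch below uses stand-in objects rather than real API responses, and `best_result`, `joined_transcript`, and `fake_result` are hypothetical names. It also shows a second option: as far as I understand the API reference, each entry in `results` covers a consecutive portion of the audio, so keeping only the highest-confidence entry may drop part of a longer clip, and joining the entries is one way to keep everything. Unlike the tuple approach, `max` with a key returns the earliest result when confidences tie.

```python
from types import SimpleNamespace


def best_result(results):
    """Return (transcript, confidence) of the result whose top alternative scores highest."""
    if not results:
        return "", 0
    best = max(results, key=lambda result: result.alternatives[0].confidence)
    return best.alternatives[0].transcript, best.alternatives[0].confidence


def joined_transcript(results):
    """Concatenate the top alternative of every result, in order."""
    return " ".join(result.alternatives[0].transcript.strip() for result in results)


# stand-in objects that mimic the attributes used above (not real API responses)
def fake_result(transcript, confidence):
    return SimpleNamespace(
        alternatives=[SimpleNamespace(transcript=transcript, confidence=confidence)])


results = [fake_result("he ain't", 0.71), fake_result("going home", 0.93)]
print(best_result(results))        # ('going home', 0.93)
print(joined_transcript(results))  # he ain't going home
```

Which of the two fits depends on whether your snippets are short enough to come back as a single result.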

### Feature: Be

In [None]:
#this is the path in the google cloud storage
#google cloud storage MUST be used for files longer than a minute
#only files under a minute can be processed locally
# store files in Google Cloud Platform storage here: https://console.cloud.google.com/storage/browser
be_gcs_uri_path = "path"


#creates a column for the google transcripts
be_gs_df['google_transcription'] = np.nan

#creates a column for the google confidence level
be_gs_df['google_ConfidenceLevel'] = np.nan


## enable this if you'd like to print a message that will show the progress
#iteration_number = 1


#cycles through the dataframe rows
for file_row in be_gs_df.itertuples():
    
    #creates a variable with the filename
    # here i've adjusted for my 16khz files. If you don't need this, use the commented out line
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the transcription
    file_response = transcribe_gcs(be_gcs_uri_path, filename)
    
    #if the service isn't able to produce a transcript, writes an empty string to the dataframe
    if len(file_response.results) == 0:
        
        be_gs_df.loc[file_row.Index, "google_transcription"] = ""

        #writes a zero level confidence to the dataframe
        be_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = 0

    else:

        #if the result only has one transcript, writes the transcript to the dataframe
        if len(file_response.results) == 1:

            # this is coded this way because of the structure of the response
            #   file_response.results is a list of results. [0] gets the first result,
            #   which contains a list of alternatives. since there is only one here, the [0]
            #   is used to get the first alternative, whose transcript attribute holds
            #   the ASR transcribed speech
            text = file_response.results[0].alternatives[0].transcript

            #writes the text to the dataframe
            be_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[0].alternatives[0].confidence

            #writes the confidence level to the dataframe
            be_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level

        else:

            #if there are multiple results, this code will find the result whose top
            #  alternative has the highest confidence level and take that as the transcript

            #creates an empty list to append tuples to
            tuples_list = []

            #cycles through the list of results in the file_response variable
            for result in file_response.results:

                # for each result, creates a tuple of the confidence level and the index of that result in the list
                confidence_index = (result.alternatives[0].confidence, file_response.results.index(result))

                # appends the tuple to the list
                tuples_list.append(confidence_index)

            #performs the same function as the text= variable in the previous step, except here,
            #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
            #  takes the maximum confidence level in the tuples_list by taking the max
            #  number in all of the first items in each tuple and then takes the index
            #  from the max tuple by accessing its second item, and uses that index
            #  as the index for the larger results list
            text = file_response.results[max(tuples_list)[1]].alternatives[0].transcript

            #writes the text to the dataframe
            be_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[max(tuples_list)[1]].alternatives[0].confidence

            #writes the confidence level to the dataframe
            be_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level
            
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(be_gs_df)} completed.")
        
    #iteration_number += 1

### Feature: Done

In [None]:
#this is the path in the google cloud storage
#google cloud storage MUST be used for files longer than a minute
#only files under a minute can be processed locally
# store files in Google Cloud Platform storage here: https://console.cloud.google.com/storage/browser
done_gcs_uri_path = "path"


#creates a column for the google transcripts
done_gs_df['google_transcription'] = np.nan

#creates a column for the google confidence level
done_gs_df['google_ConfidenceLevel'] = np.nan


# # enable this if you'd like to print a message that will show the progress
# iteration_number = 1


#cycles through the dataframe rows
for file_row in done_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #performs the transcription
    file_response = transcribe_gcs(done_gcs_uri_path, filename)
    
    #if the service isn't able to produce a transcript, writes an empty string to the dataframe
    if len(file_response.results) == 0:
        
        done_gs_df.loc[file_row.Index, "google_transcription"] = ""

        #writes a zero level confidence to the dataframe
        done_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = 0

    else:

        #if the result only has one transcript, writes the transcript to the dataframe
        if len(file_response.results) == 1:

            # this is coded this way because of the structure of the response
            #   file_response.results is a list of results. [0] gets the first result,
            #   which contains a list of alternatives. since there is only one here, the [0]
            #   is used to get the first alternative, whose transcript attribute holds
            #   the ASR transcribed speech
            text = file_response.results[0].alternatives[0].transcript

            #writes the text to the dataframe
            done_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[0].alternatives[0].confidence

            #writes the confidence level to the dataframe
            done_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level

        else:

            #if there are multiple results, this code will find the result whose top
            #  alternative has the highest confidence level and take that as the transcript

            #creates an empty list to append tuples to
            tuples_list = []

            #cycles through the list of results in the file_response variable
            for result in file_response.results:

                # for each result, creates a tuple of the confidence level and the index of that result in the list
                confidence_index = (result.alternatives[0].confidence, file_response.results.index(result))

                # appends the tuple to the list
                tuples_list.append(confidence_index)

            #performs the same function as the text= variable in the previous step, except here,
            #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
            #  takes the maximum confidence level in the tuples_list by taking the max
            #  number in all of the first items in each tuple and then takes the index
            #  from the max tuple by accessing its second item, and uses that index
            #  as the index for the larger results list
            text = file_response.results[max(tuples_list)[1]].alternatives[0].transcript

            #writes the text to the dataframe
            done_gs_df.loc[file_row.Index, "google_transcription"] = text

            #creates a variable for the confidence level
            # this could be written into the next line, but coding it this way
            #  makes the code more readable and understandable for me
            confidence_level = file_response.results[max(tuples_list)[1]].alternatives[0].confidence

            #writes the confidence level to the dataframe
            done_gs_df.loc[file_row.Index, "google_ConfidenceLevel"] = confidence_level
            
#     #enable this if you'd like to print a message that will show the progress
#     print(f"{iteration_number} / {len(done_gs_df)} completed.")
        
#     iteration_number += 1

# 4. IBM Watson Speech-to-Text

This code assumes you have set up an IBM Watson account. For help getting started, see https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-gettingStarted.

## References

The following code is adapted directly from [Nicholas Renotte](https://github.com/nicknochnack) (see code [here](https://github.com/nicknochnack/WatsonSTT/blob/master/Watson%20Speech%20to%20Text.ipynb) and video walkthrough [here](https://www.youtube.com/watch?v=A9_0OgW1LZU)).

## 4.1 Initial Set-up

To connect to the IBM Watson Speech-to-Text service, you need an API key and a URL, both of which come from your IBM Cloud account. To find them:
1. Go to the IBM Cloud console: https://cloud.ibm.com/.
2. Click the hamburger icon (three lines) at the top left, then click 'Resource list'.
3. Click 'Services and Software'.
4. Click 'Speech to Text-bn' (your instance name may differ).
5. Click 'Manage' (you may be taken there automatically).
6. You should see 'Credentials'.
7. Copy the API Key and URL and insert them in the code below.
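
As an alternative to pasting the credentials into the notebook, they could be read from environment variables so they are not saved with the code. The following is a minimal sketch only: the function name `load_watson_credentials` and the variable names `WATSON_STT_APIKEY` and `WATSON_STT_URL` are assumptions made for this example and are not defined by IBM or by this notebook.

```python
import os


def load_watson_credentials(env=os.environ):
    """Return (apikey, url) read from environment variables.

    WATSON_STT_APIKEY and WATSON_STT_URL are hypothetical names chosen
    for this sketch; set them yourself before starting the notebook.
    """
    apikey = env.get("WATSON_STT_APIKEY")
    url = env.get("WATSON_STT_URL")

    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("WATSON_STT_APIKEY", apikey), ("WATSON_STT_URL", url))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")

    return apikey, url
```

If both variables are set, `apikey, url = load_watson_credentials()` could stand in for the hard-coded assignments in the cell that sets the API key and URL.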

In [None]:
# Import the required packages
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource 
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

In [None]:
# Set the API Key and URL

#this comes from the IBM account. Go to IBM Cloud console >> Resource list >> Services >> 
#   Speech to Text-bn >> Manage >> Credentials >> Copy the API key and insert it here
apikey = "api-key"

#this comes from the IBM account. Go to IBM Cloud console >> Resource list >> Services >> 
#   Speech to Text-bn >> Manage >> Credentials >> Copy the URL and insert it here
url = "url"

In [None]:
# Setup Service
authenticator = IAMAuthenticator(apikey)

stt = SpeechToTextV1(authenticator=authenticator)

stt.set_service_url(url)

## 4.2 Execute the Code
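
The three cells in this section repeat the same selection logic: no results gives an empty transcript with confidence 0, and otherwise the result whose first alternative has the highest confidence is kept. A minimal sketch of that logic as a single helper is shown here, assuming the response is a dict shaped like the `res` variable used in the cells. The name `pick_best_result` and the sample data are made up for illustration and are not part of `ibm_watson`.

```python
def pick_best_result(res):
    """Return (transcript, confidence) for the highest-confidence result.

    `res` is assumed to be a dict in which
    res['results'][i]['alternatives'][0] holds 'transcript' and 'confidence'.
    Returns ("", 0) when there are no results.
    """
    results = res.get("results", [])
    if not results:
        return "", 0

    # Compare only the first alternative of each result, by confidence.
    best = max(results, key=lambda r: r["alternatives"][0]["confidence"])
    top = best["alternatives"][0]
    return top["transcript"], top["confidence"]


# Made-up response for illustration only.
example = {
    "results": [
        {"alternatives": [{"transcript": "first segment", "confidence": 0.62}]},
        {"alternatives": [{"transcript": "second segment", "confidence": 0.91}]},
    ]
}
text, confidence = pick_best_result(example)
```

One difference from the cells: when two results tie on confidence, `max` with a `key` keeps the earlier result, whereas `max(tuples_list)` keeps the later one because the index is the second element of each tuple.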

### Feature: Ain't

In [None]:
#creates a column for the ibm transcripts
aint_gs_df['IBMWatson_transcription'] = np.nan

#creates a column for the ibm confidence level
aint_gs_df['IBMWatson_ConfidenceLevel'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in aint_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #opens the audio file
    with open(f'{aint_audio_file_path}{filename}', 'rb') as f:
        
        #runs the speech recognition and returns a json file with results
        #  the results can include either no transcript, one transcript, or alternative transcripts
        #  it will also include a confidence level from 0 to 1 
        #  this code uses IBM's Narrowband model, which seemed to be the most accurate for the data here
        #  other models (at present) include the Broadband model, Multimedia model, and Telephony model
        #   see here: https://cloud.ibm.com/apidocs/speech-to-text?code=python#listmodels
        res = stt.recognize(audio=f, content_type='audio/wav', model='en-US_NarrowbandModel', continuous=True).get_result()
        
        #if the service isn't able to produce a transcript, writes an empty string to the dataframe
        if len(res['results']) == 0:
            
            aint_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = ""
            
            #writes a zero level confidence to the dataframe
            aint_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = 0

        else:
            
            #if the result only has one transcript, writes the transcript to the dataframe
            if len(res['results']) == 1:
                
                # this is coded this way because of the structure of the json
                #   res is the json object, which is essentially a python dictionary in this context
                #   res['results'] opens the value paired with the key 'results' which is
                #   a list of the results. [0] gets the first item which is another dictionary
                #   which contains a list of alternatives. since there is only one here, the [0]
                #   is used to get the first entry which is a dictionary with a 'transcript' key
                #   and the ASR transcribed speech as the value
                text = res['results'][0]['alternatives'][0]['transcript']
                
                #writes the text to the dataframe
                aint_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][0]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                aint_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level

            else:
                
                #if there are multiple results, this code will find the result whose top
                #  alternative has the highest confidence level and take that as the transcript
                
                #creates an empty list to append tuples to
                tuples_list = []

                #cycles through the list of results in the res variable
                for result in res['results']:
                    
                    # for each result, creates a tuple of the confidence level and the index of that result in the list
                    confidence_index = (result['alternatives'][0]['confidence'], res['results'].index(result))
                    
                    # appends the tuple to the list
                    tuples_list.append(confidence_index)
                
                #performs the same function as the text= variable in the previous step, except here,
                #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
                #  takes the maximum confidence level in the tuples_list by taking the max
                #  number in all of the first items in each tuple and then takes the index
                #  from the max tuple by accessing its second item, and uses that index
                #  as the index for the larger results list
                text = res['results'][max(tuples_list)[1]]['alternatives'][0]['transcript']

                #writes the text to the dataframe
                aint_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][max(tuples_list)[1]]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                aint_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level
                
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(aint_gs_df)} completed.")
        
    #iteration_number += 1

### Feature: Be

In [None]:
#creates a column for the ibm transcripts
be_gs_df['IBMWatson_transcription'] = np.nan

#creates a column for the ibm confidence level
be_gs_df['IBMWatson_ConfidenceLevel'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in be_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #opens the audio file
    with open(f'{be_audio_file_path}{filename}', 'rb') as f:
        
        #runs the speech recognition and returns a json file with results
        #  the results can include either no transcript, one transcript, or alternative transcripts
        #  it will also include a confidence level from 0 to 1 
        #  this code uses IBM's Narrowband model, which seemed to be the most accurate for the data here
        #  other models (at present) include the Broadband model, Multimedia model, and Telephony model
        #   see here: https://cloud.ibm.com/apidocs/speech-to-text?code=python#listmodels
        res = stt.recognize(audio=f, content_type='audio/wav', model='en-US_NarrowbandModel', continuous=True).get_result()
        
        #if the service isn't able to produce a transcript, writes an empty string to the dataframe
        if len(res['results']) == 0:
            
            be_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = ""
            
            #writes a zero level confidence to the dataframe
            be_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = 0

        else:
            
            #if the result only has one transcript, writes the transcript to the dataframe
            if len(res['results']) == 1:
                
                # this is coded this way because of the structure of the json
                #   res is the json object, which is essentially a python dictionary in this context
                #   res['results'] opens the value paired with the key 'results' which is
                #   a list of the results. [0] gets the first item which is another dictionary
                #   which contains a list of alternatives. since there is only one here, the [0]
                #   is used to get the first entry which is a dictionary with a 'transcript' key
                #   and the ASR transcribed speech as the value
                text = res['results'][0]['alternatives'][0]['transcript']
                
                #writes the text to the dataframe
                be_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][0]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                be_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level

            else:
                
                #if there are multiple results, this code will find the result whose top
                #  alternative has the highest confidence level and take that as the transcript
                
                #creates an empty list to append tuples to
                tuples_list = []

                #cycles through the list of results in the res variable
                for result in res['results']:
                    
                    # for each result, creates a tuple of the confidence level and the index of that result in the list
                    confidence_index = (result['alternatives'][0]['confidence'], res['results'].index(result))
                    
                    # appends the tuple to the list
                    tuples_list.append(confidence_index)
                
                #performs the same function as the text= variable in the previous step, except here,
                #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
                #  takes the maximum confidence level in the tuples_list by taking the max
                #  number in all of the first items in each tuple and then takes the index
                #  from the max tuple by accessing its second item, and uses that index
                #  as the index for the larger results list
                text = res['results'][max(tuples_list)[1]]['alternatives'][0]['transcript']

                #writes the text to the dataframe
                be_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][max(tuples_list)[1]]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                be_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level
                
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(be_gs_df)} completed.")
        
    #iteration_number += 1

### Feature: Done

In [None]:
#creates a column for the ibm transcripts
done_gs_df['IBMWatson_transcription'] = np.nan

#creates a column for the ibm confidence level
done_gs_df['IBMWatson_ConfidenceLevel'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in done_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #opens the audio file
    with open(f'{done_audio_file_path}{filename}', 'rb') as f:
        
        #runs the speech recognition and returns a json file with results
        #  the results can include either no transcript, one transcript, or alternative transcripts
        #  it will also include a confidence level from 0 to 1 
        #  this code uses IBM's Narrowband model, which seemed to be the most accurate for the data here
        #  other models (at present) include the Broadband model, Multimedia model, and Telephony model
        #   see here: https://cloud.ibm.com/apidocs/speech-to-text?code=python#listmodels
        res = stt.recognize(audio=f, content_type='audio/wav', model='en-US_NarrowbandModel', continuous=True).get_result()
        
        #if the service isn't able to produce a transcript, writes an empty string to the dataframe
        if len(res['results']) == 0:
            
            done_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = ""
            
            #writes a zero level confidence to the dataframe
            done_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = 0

        else:
            
            #if the result only has one transcript, writes the transcript to the dataframe
            if len(res['results']) == 1:
                
                # this is coded this way because of the structure of the json
                #   res is the json object, which is essentially a python dictionary in this context
                #   res['results'] opens the value paired with the key 'results' which is
                #   a list of the results. [0] gets the first item which is another dictionary
                #   which contains a list of alternatives. since there is only one here, the [0]
                #   is used to get the first entry which is a dictionary with a 'transcript' key
                #   and the ASR transcribed speech as the value
                text = res['results'][0]['alternatives'][0]['transcript']
                
                #writes the text to the dataframe
                done_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][0]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                done_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level

            else:
                
                #if there are multiple results, this code will find the result whose top
                #  alternative has the highest confidence level and take that as the transcript
                
                #creates an empty list to append tuples to
                tuples_list = []

                #cycles through the list of results in the res variable
                for result in res['results']:
                    
                    # for each result, creates a tuple of the confidence level and the index of that result in the list
                    confidence_index = (result['alternatives'][0]['confidence'], res['results'].index(result))
                    
                    # appends the tuple to the list
                    tuples_list.append(confidence_index)
                
                #performs the same function as the text= variable in the previous step, except here,
                #  the first [0] is replaced with [max(tuples_list)[1]]. What this does is
                #  takes the maximum confidence level in the tuples_list by taking the max
                #  number in all of the first items in each tuple and then takes the index
                #  from the max tuple by accessing its second item, and uses that index
                #  as the index for the larger results list
                text = res['results'][max(tuples_list)[1]]['alternatives'][0]['transcript']

                #writes the text to the dataframe
                done_gs_df.loc[file_row.Index, "IBMWatson_transcription"] = text
                
                #creates a variable for the confidence level
                # this could be written into the next line, but coding it this way
                #  makes the code more readable and understandable for me
                confidence_level = res['results'][max(tuples_list)[1]]['alternatives'][0]['confidence']
                
                #writes the confidence level to the dataframe
                done_gs_df.loc[file_row.Index, "IBMWatson_ConfidenceLevel"] = confidence_level
                
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(done_gs_df)} completed.")
        
    #iteration_number += 1

# 5. Microsoft Azure Cognitive Services Speech-to-Text

This code assumes you have set up a Microsoft Azure account. For help with start-up, see this link: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python

## References

The code for this is taken directly from Microsoft (see code [here](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text)).

## 5.1 Initial Set-up

To connect to the Microsoft Azure Cognitive Services Speech-to-Text service, you must have a speech key and speech location. To get these, follow the steps listed here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview#find-keys-and-locationregion.

In [None]:
# Import the required packages
import azure.cognitiveservices.speech as speechsdk

In [None]:
# Set up the speech key and speech location

#find these here: https://portal.azure.com/#home >> Click All Resources
#  Click the resource you want to use (this should have been created by you) >>
#  On the left, click Keys and Endpoint >> Copy either key and the Location/Region
speech_key = "speech_key"

speech_location = "location"

## 5.2 Defining the Transcriber Function

This code defines a function that runs the transcriber and returns the transcription. The cells in 5.3 then write that transcription to the dataframe.

This function takes three arguments:
1. The speech key
2. The speech location
3. The audio filename (taken from the dataframe)

In [None]:
def from_file(speech_key, speech_location, audio_filename):
    
    """
    Performs speech-to-text on an audio file and returns the recognized text.
    """
    
    #creates a speech configuration
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_location)
    
    #gets the audio input
    audio_input = speechsdk.AudioConfig(filename=audio_filename)
    
    #creates a speech recognizer
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
    
    #runs the speech recognizer
    result = speech_recognizer.recognize_once_async().get()
    
    #returns the result text
    return result.text

## 5.3 Executing the Code
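
The three cells in this section follow the same steps: build the filename from the row, join it to the audio folder, call the transcriber, and store the text under the row's index. A minimal sketch of those steps as reusable helpers is shown here. The names `build_filename` and `transcribe_rows` are hypothetical, and a stub stands in for the Azure call so the example runs without credentials.

```python
from collections import namedtuple
import os


def build_filename(row, prefix="16khz_"):
    """Build the audio filename used throughout this notebook for one row."""
    return f"{prefix}{row.File}_Line{row.Line}_FeatCount{row.FeatureCountPerLine}.wav"


def transcribe_rows(rows, audio_dir, transcribe):
    """Map each row's Index to the text returned by transcribe(path)."""
    transcripts = {}
    for row in rows:
        audio_path = os.path.join(audio_dir, build_filename(row))
        transcripts[row.Index] = transcribe(audio_path)
    return transcripts


# Made-up rows and a stub transcriber, for illustration only.
Row = namedtuple("Row", ["Index", "File", "Line", "FeatureCountPerLine"])
demo_rows = [Row(0, "ABC", 12, 1), Row(1, "ABC", 15, 2)]
demo = transcribe_rows(demo_rows, "audio", lambda path: os.path.basename(path))
```

With the notebook's own objects, the call would presumably look like `transcribe_rows(be_gs_df.itertuples(), be_audio_file_path, lambda p: from_file(speech_key, speech_location, p))`. Unlike the cells, this sketch joins paths with `os.path.join`, so the audio folder does not need a trailing separator.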

### Feature: Ain't

In [None]:
#creates a column for the transcript
aint_gs_df['microsoft_transcription'] = np.nan

# # enable this if you'd like to print a message that will show the progress
# iteration_number = 1

#cycles through the dataframe rows
for file_row in aint_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #gets the full audio filepath
    audio_path = aint_audio_file_path + audio_filename
    
    #run the code and get transcription
    transcription = from_file(speech_key, speech_location, audio_path)

    #writes the transcription to the dataframe
    aint_gs_df.loc[file_row.Index, "microsoft_transcription"] = transcription
    
#     #enable this if you'd like to print a message that will show the progress
#     print(f"{iteration_number} / {len(aint_gs_df)} completed.")
        
#     iteration_number += 1

### Feature: Be

In [None]:
#creates a column for the transcript
be_gs_df['microsoft_transcription'] = np.nan

# enable this if you'd like to print a message that will show the progress
iteration_number = 1

#cycles through the dataframe rows
for file_row in be_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #gets the full audio filepath
    audio_path = be_audio_file_path + audio_filename
    
    #run the code and get transcription
    transcription = from_file(speech_key, speech_location, audio_path)

    #writes the transcription to the dataframe
    be_gs_df.loc[file_row.Index, "microsoft_transcription"] = transcription
    
    #enable this if you'd like to print a message that will show the progress
    print(f"{iteration_number} / {len(be_gs_df)} completed.")
        
    iteration_number += 1

### Feature: Done

In [None]:
#creates a column for the transcript
done_gs_df['microsoft_transcription'] = np.nan

## enable this if you'd like to print a message that will show the progress
#iteration_number = 1

#cycles through the dataframe rows
for file_row in done_gs_df.itertuples():
    
    #creates a variable with the filename
    # the "16khz_" prefix matches my 16 kHz files; drop it if your filenames don't have it
    audio_filename = f"16khz_{file_row.File}_Line{file_row.Line}_FeatCount{file_row.FeatureCountPerLine}.wav"
    
    #gets the full audio filepath
    audio_path = done_audio_file_path + audio_filename
    
    #run the code and get transcription
    transcription = from_file(speech_key, speech_location, audio_path)

    #writes the transcription to the dataframe
    done_gs_df.loc[file_row.Index, "microsoft_transcription"] = transcription
    
    ##enable this if you'd like to print a message that will show the progress
    #print(f"{iteration_number} / {len(done_gs_df)} completed.")
        
    #iteration_number += 1

## Sorting the Dataframes by File and Line

This will sort the dataframes first by filename and then by line number. Doing this at each step keeps the row order consistent across steps.
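
As a small illustration of what a two-key sort does, here is a sketch on made-up rows (not data from this project). It assumes `Line` is stored as a number; if it were read in as text, "10" would sort before "2".

```python
import pandas as pd

# Made-up rows for illustration only.
demo_df = pd.DataFrame({
    "File": ["b", "a", "a"],
    "Line": [3, 10, 2],
})

# Sort by File first, then by Line within each File.
sorted_df = demo_df.sort_values(by=["File", "Line"])

# sort_values keeps the original index labels; reset_index renumbers them.
renumbered_df = sorted_df.reset_index(drop=True)
```

Renumbering is optional here, since the CSVs are written with `index=False`.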

### Feature: Ain't

In [None]:
aint_gs_df = aint_gs_df.sort_values(by=['File', 'Line'])

### Feature: Be

In [None]:
be_gs_df = be_gs_df.sort_values(by=['File', 'Line'])

### Feature: Done

In [None]:
done_gs_df = done_gs_df.sort_values(by=['File', 'Line'])

## Exporting Dataframes to CSV Files

This will export the dataframes to CSV files.
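
The export cells build each output path by placing the filename directly after `csv_output_path`, so that variable needs to end with a path separator. A minimal sketch of a more forgiving alternative, using `os.path.join`, is shown here. The helper name `csv_path` is hypothetical.

```python
import os


def csv_path(output_dir, filename):
    """Join a directory and a filename with the correct separator."""
    return os.path.join(output_dir, filename)


# Works whether or not the directory ends with a separator.
with_separator = csv_path("output" + os.sep, "be_ASRtranscripts.csv")
without_separator = csv_path("output", "be_ASRtranscripts.csv")
```

Both calls give the same path, which could then be passed to `to_csv` in place of the f-string.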

In [None]:
# Designate the output path where the CSVs will be stored
csv_output_path = "path"

### Feature: Ain't

In [None]:
aint_gs_df.to_csv(f"{csv_output_path}aint_variations_ASRtranscripts.csv", index=False)

### Feature: Be

In [None]:
be_gs_df.to_csv(f"{csv_output_path}be_ASRtranscripts.csv", index=False)

### Feature: Done

In [None]:
done_gs_df.to_csv(f"{csv_output_path}done_ASRtranscripts.csv", index=False)