# Step 2.8: Getting Error Counts

This step counts the errors in the ASR output that occur before and after each occurrence of the feature in question.

## Required Packages

The following packages are necessary to run this code: re, os, [pandas](https://pypi.org/project/pandas/), [numpy](https://pypi.org/project/numpy/), [wagnerfischerpp](https://gist.github.com/kylebgorman/8034009), [interruptingcow](https://pypi.org/project/interruptingcow/)

## Initial Setup

In [None]:
# Import required packages
import pandas as pd
import numpy as np
import os

In [None]:
#file paths for the csv files produced in Step 2.7
aint_file_path = "path"

be_file_path = "path"

done_file_path = "path"

#reads in the gold standard dataframe    
aint_gs_df = pd.read_csv(aint_file_path)

be_gs_df = pd.read_csv(be_file_path)

done_gs_df = pd.read_csv(done_file_path)

## Defining the Aligning Function

This is a helper function, defined here and used by the functions below.

This function takes the following arguments:
1. The original utterance content
2. The ASR output content
3. The list of edits produced by the wagnerfischerpp function below
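
As a rough illustration of what the aligning step produces, the sketch below re-implements the same idea compactly and runs it on a made-up utterance. The name `align_sketch` and the edit codes `"M"` and `"S"` (match and substitution) are assumptions for this example only; the logic only distinguishes `"D"`, `"I"`, and everything else.

```python
def align_sketch(original_tokens, asr_tokens, edits):
    """Pair each original token with its ASR token, using '-' for gaps."""
    original = list(original_tokens)
    asr = list(asr_tokens)
    pairs = []
    for edit in edits:
        if edit == "D":      # token present in the original only
            pairs.append((original.pop(0), "-"))
        elif edit == "I":    # token present in the ASR output only
            pairs.append(("-", asr.pop(0)))
        else:                # match or substitution consumes one token from each
            pairs.append((original.pop(0), asr.pop(0)))
    # Leftover tokens mean the edit list did not fit the two sequences.
    if original or asr:
        raise ValueError("edit list does not cover both sequences")
    return pairs


# Hypothetical utterance and ASR output, with a hand-written edit list.
example = align_sketch(
    ["he", "ain't", "going", "home"],
    ["he", "and", "going"],
    ["M", "S", "M", "D"],
)
print(example)
```

Each tuple holds the original token first and the ASR token second, so any pair whose two items differ counts as one error in the later steps.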

## References

This code was provided by [Kevin Tang](https://github.com/tang-kevin) through personal correspondence. See Kevin's website [here](http://www.kevintang.org/) for more work and research.

In [None]:
def align_WF(s1, s2, edits):
    
    s1 = list(s1) # in case the s1 and s2 are strings
    
    s2 = list(s2)
    
    alignment_temp = []
    
    for match_type in edits:
        
    #match_type = edit
        if match_type == 'D':
            
            oseg = '-'
            
            iseg = s1[0]
            
            s1.pop(0)
            
        elif match_type == 'I':
            
            iseg = '-'    
            
            oseg = s2[0]
            
            s2.pop(0)
            
        else: # match or sub
            
            iseg = s1[0]
            
            oseg = s2[0]
            
            s1.pop(0)
            
            s2.pop(0)
            
        alignment_temp.append((iseg,oseg))

    if (len(s1) + len(s2)) == 0:
        
        return alignment_temp
    
    else:
        
        return 'BUG'

## Defining the Word Alignment Getting Function

The function defined here is a modified version of the function used in the previous step.

This function requires importing the wagnerfischerpp Python script. To do so, follow these steps:
1. Go to https://gist.github.com/kylebgorman/8034009.
2. Download the wagnerfischerpp.py script.
3. Move the script into the working directory used for this code.
4. As a test, run *from wagnerfischerpp import \** to make sure it works.
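
One possible way to check that the script is in place before running the rest of the notebook is sketched below. The helper name `module_available` is hypothetical; it only uses the standard library's `importlib.util.find_spec` and does not import the script itself.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


if module_available("wagnerfischerpp"):
    print("wagnerfischerpp.py was found.")
else:
    print("wagnerfischerpp.py was not found; check that it is in the working directory.")
```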

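This function takes two arguments:
1. The original utterance content as a string
2. The ASR output content as a string

It also reads the global variable *feature*, which must be defined before the function is called (it is set at the top of each execution cell).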
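
Before aligning, the function rejoins contractions that the cleaning step split apart (for example `ai n't` back to `ain't`) and then splits each string into words. A minimal sketch of that step, using a lookup table rather than a chain of comparisons; the names `SPLIT_FORMS` and `rejoin_and_split` are hypothetical and not part of the original code.

```python
# Feature -> (split form produced by cleaning, rejoined form).
SPLIT_FORMS = {
    "ain't": ("ai n't", "ain't"),
    "isn't": ("is n't", "isn't"),
    "aren't": ("are n't", "aren't"),
    "I'm not": ("i 'm not", "i'm not"),
    "didn't": ("did n't", "didn't"),
    "haven't": ("have n't", "haven't"),
    "hasn't": ("has n't", "hasn't"),
}


def rejoin_and_split(text, feature):
    """Rejoin the split contraction for `feature`, then tokenize on whitespace."""
    if feature in SPLIT_FORMS:
        split_form, joined_form = SPLIT_FORMS[feature]
        text = text.replace(split_form, joined_form)
    return text.split()


tokens = rejoin_and_split("he ai n't going home", "ain't")
print(tokens)
```

Only the contraction that matches the current feature is rejoined, which mirrors the behavior of the comparison chain in the function: other split contractions in the same line stay as separate tokens.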

## References

The *getWordAlignments* function is directly adapted from the work of [Kyle Gorman](https://gist.github.com/kylebgorman) (see code [here](https://gist.github.com/kylebgorman/8034009)).

In [None]:
def getWordAlignments(original_utterance, ASR_output):
    
    """
    Gets alignments between intended and ASR inference.
    """
    
    from wagnerfischerpp import WagnerFischer
    import numpy as np
    
    
    
    #returns an empty string if either input is not a string
    if type(original_utterance) != str or type(ASR_output) != str:
        
        #this was originally return np.nan to be more precise
        #  however, since the resulting list from this function
        #  has to be converted to a string in order to be added
        #  to the pandas dataframe, i use this because if 
        #  str(np.nan) is run, it returns the string "nan"
        #  rather than the empty cell I need
        return ""
    
    
    else:
        
        #the cleaning process separates ain't into "ai" and "n't"
        #  this will make it back into one word for this process
        #  without changing it in the actual csv
        if feature == "ain't":
        
            original_utterance = original_utterance.replace("ai n't", "ain't")

            ASR_output = ASR_output.replace("ai n't", "ain't")
            
        elif feature == "isn't":
        
            original_utterance = original_utterance.replace("is n't", "isn't")

            ASR_output = ASR_output.replace("is n't", "isn't")
        
        elif feature == "aren't":
        
            original_utterance = original_utterance.replace("are n't", "aren't")

            ASR_output = ASR_output.replace("are n't", "aren't")
            
        elif feature == "I'm not":
        
            original_utterance = original_utterance.replace("i 'm not", "i'm not")

            ASR_output = ASR_output.replace("i 'm not", "i'm not")
            
        elif feature == "didn't":
        
            original_utterance = original_utterance.replace("did n't", "didn't")

            ASR_output = ASR_output.replace("did n't", "didn't")
            
        elif feature == "haven't":
        
            original_utterance = original_utterance.replace("have n't", "haven't")

            ASR_output = ASR_output.replace("have n't", "haven't")
            
        elif feature == "hasn't":
        
            original_utterance = original_utterance.replace("has n't", "hasn't")

            ASR_output = ASR_output.replace("has n't", "hasn't")
            
        
        #splits the strings into a list
        original_list = original_utterance.split()
        
        ASR_list = ASR_output.split()
        
        
        #performs the alignments
        aligns = [list(WagnerFischer(original_list, ASR_list).alignments())[0]]
        
        alignment_all = [align_WF(original_list, ASR_list, iii) for iii in aligns]
        
    return alignment_all[0]

## Defining the Pre-Feature Error Count Getting Function

This function will get alignments between the original utterance and ASR output. It will then calculate how many errors there are before the feature provided. 

This function takes the following arguments:
1. The original utterance content as a string
2. The ASR output content as a string
3. The feature
4. The iteration number (taken from the dataframe)
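
The counting logic can be sketched on a hand-written alignment, without running the aligner. This is a simplified illustration for single-word features only; the name `count_errors_before` and the example alignment are assumptions, not part of the original code.

```python
def count_errors_before(alignment, feature, iteration_number=1):
    """Count mismatched pairs that precede the chosen occurrence of `feature`."""
    # Positions where the original-side token equals the feature.
    positions = [i for i, (orig, _) in enumerate(alignment) if orig == feature]
    if not positions:
        return float("nan")  # feature absent from the original side
    boundary = positions[iteration_number - 1]
    # Slicing up to the boundary excludes the feature itself.
    return sum(1 for orig, asr in alignment[:boundary] if orig != asr)


# Hypothetical alignment: (original token, ASR token), '-' marks a gap.
alignment = [
    ("well", "will"), ("he", "he"), ("ain't", "and"),
    ("going", "going"), ("home", "-"),
]
print(count_errors_before(alignment, "ain't"))
```

Here only the pair `("well", "will")` precedes the feature and differs, so the count is 1; the misrecognized feature itself is not included.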

In [None]:
def getPreFeatureErr(original_utterance, ASR_output, feature, iteration_number):
    
    """
    Gets alignments between the original utterance and the ASR output,
    then counts how many errors occur before the feature provided.
    Depends on wagnerfischerpp: keep the wagnerfischerpp.py file in the
    same directory and run from wagnerfischerpp import * before using this.
    Also depends on align_WF() from Kevin Tang, defined above.
    original_utterance and ASR_output should be strings;
    they are split into word lists inside this function.
    The iteration number is for lines that have more than one
    instance of the feature; it selects which instance to use.
    """
    
    from wagnerfischerpp import WagnerFischer
    import numpy as np

    
    
    #returns NaNs if the content is not a string
    if type(original_utterance) != str or type(ASR_output) != str:
        
        return np.nan
    
    
    else:
        
        #the cleaning process separates ain't into "ai" and "n't"
        #  this will make it back into one word for this process
        #  without changing it in the actual csv
        if feature == "ain't":

            original_utterance = original_utterance.replace("ai n't", "ain't")

            ASR_output = ASR_output.replace("ai n't", "ain't")
            
        elif feature == "isn't":
        
            original_utterance = original_utterance.replace("is n't", "isn't")

            ASR_output = ASR_output.replace("is n't", "isn't")
        
        elif feature == "aren't":
        
            original_utterance = original_utterance.replace("are n't", "aren't")

            ASR_output = ASR_output.replace("are n't", "aren't")
            
        elif feature == "I'm not":
        
            original_utterance = original_utterance.replace("i 'm not", "i'm not")

            ASR_output = ASR_output.replace("i 'm not", "i'm not")
            
        elif feature == "didn't":
        
            original_utterance = original_utterance.replace("did n't", "didn't")

            ASR_output = ASR_output.replace("did n't", "didn't")
            
        elif feature == "haven't":
        
            original_utterance = original_utterance.replace("have n't", "haven't")

            ASR_output = ASR_output.replace("have n't", "haven't")
            
        elif feature == "hasn't":
        
            original_utterance = original_utterance.replace("has n't", "hasn't")

            ASR_output = ASR_output.replace("has n't", "hasn't")
            
        
        #splits the strings into a list
        original_list = original_utterance.split()
        
        ASR_list = ASR_output.split()
        
        
        #performs the alignments
        aligns = [list(WagnerFischer(original_list, ASR_list).alignments())[0]]
        
        alignment_all = [align_WF(original_list, ASR_list, iii) for iii in aligns]
          
        
        #creates empty list to append the indexes of all the instances of features occurring in the line
        feature_indexes = []  
        
        
        # special case for I'm not since it's two words
        if feature == "I'm not" or feature == "i'm not":
            
            #creates a list of the two words
            feature = ["i'm", "not"]
            
            # the following will loop through pairs of words made into two set lists
            # to see if the two set list matches the feature. the try/except structure
            # is here so that when the loop hits the last word in the alignment list
            # it won't throw an out of index range error
            
            try:
                
                #alignment_all is a list within a list, so the [0] gets the list within the list
                for x in range(len(alignment_all[0])):
                    
                    # the structure here is the alignment list[the first alignment][the tuple at index at the number of the loop][the first item in the tuple]
                    if [alignment_all[0][x][0], alignment_all[0][x+1][0]] == feature:
                                                
                        feature_indexes.append(x) 

            except IndexError:
                                
                pass

        
        else:
        
            #cycles through the tuples, which are composed of the intended (original)
            #  utterance token paired with its aligned token in the perceived (ASR)
            #  text. if the first item of the tuple matches the feature, the index
            #  of the feature is appended to the feature_indexes list
            for x in range(len(alignment_all[0])):

                if alignment_all[0][x][0] == feature:

                    feature_indexes.append(x) 

                
                
        #if there is only one index in the list, performs the process for the
        #  content before the one instance of the feature
        if len(feature_indexes) == 1:


            #creates an empty list. this block will count how many errors there are
            #  by cycling through the tuples and comparing the content in the tuples.
            #  if the first and second items are the same, that means that the ASR
            #  got it right. if they are different, then that means it got it wrong.
            #  if it matches, a 0 will be appended to the list. if it doesn't match,
            #  a 1 will be appended to the list. in the end, this list will be summed
            #  and that sum number is the number of errors the ASR made before the feature
            pre_feature_errors = []



            #cycles through the tuples which occur before the feature index
            for y in alignment_all[0][:feature_indexes[0]]:

                #checks if the first item in the tuple matches the second or not
                if y[0] == y[1]:

                    pre_feature_errors.append(0)

                else:

                    pre_feature_errors.append(1)


            #gets the sum of the feature error list
            num_pre_feature_errors = sum(pre_feature_errors)


        # if there are no instances, this will return a nan
        elif len(feature_indexes) == 0:

            num_pre_feature_errors = np.nan



        #if there are more than one instance in the line, this will
        # take the iteration number and use that to get the correct index
        #  of the feature in the line. it does this by collecting
        #  the indexes of all the features in the alignment list,
        #  storing them in a list, then, uses the iteration number to determine
        #  which of the indexes is the correct to use for that particular instance
        else:

            pre_feature_errors = []


            #cycles through the tuples which occur before the feature index
            #  takes the iteration number and subtracts one. so if the iteration number
            #  is 1, meaning the first iteration or occurrence of the feature in the line
            #  and subtracts 1 making 0, then that will be used to get the first
            #  index in the list. that index will be used to determine the boundary of where
            #  the content before the feature ends


            for y in alignment_all[0][:feature_indexes[iteration_number-1]]:

                if y[0] == y[1]:

                    pre_feature_errors.append(0)

                else:

                    pre_feature_errors.append(1)

            #gets the sum of the feature error list
            num_pre_feature_errors = sum(pre_feature_errors)
            
            
        return num_pre_feature_errors

## Defining the Post-Feature Error Count Getting Function

This function will get alignments between the original utterance and ASR output. It will then calculate how many errors there are after the feature provided. 

This function takes the following arguments:
1. The original utterance content as a string
2. The ASR output content as a string
3. The feature
4. The iteration number (taken from the dataframe)
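
The sketch below illustrates how the iteration number selects which occurrence of the feature marks the boundary when a line contains the feature more than once. It is a simplified illustration for single-word features; the name `count_errors_after` and the example alignment are hypothetical.

```python
def count_errors_after(alignment, feature, iteration_number=1):
    """Count mismatched pairs that follow the chosen occurrence of `feature`."""
    positions = [i for i, (orig, _) in enumerate(alignment) if orig == feature]
    if not positions:
        return float("nan")  # feature absent from the original side
    boundary = positions[iteration_number - 1]
    # boundary + 1 skips the feature itself, so only later pairs are counted.
    return sum(1 for orig, asr in alignment[boundary + 1:] if orig != asr)


# Hypothetical alignment with two occurrences of the feature.
alignment = [
    ("he", "he"), ("ain't", "and"), ("going", "go"),
    ("and", "and"), ("she", "-"), ("ain't", "ain't"), ("either", "neither"),
]
first = count_errors_after(alignment, "ain't", iteration_number=1)
second = count_errors_after(alignment, "ain't", iteration_number=2)
print(first, second)
```

With an iteration number of 1, everything after the first `ain't` is counted (three mismatches); with 2, only the content after the second occurrence is counted (one mismatch).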

In [None]:
def getPostFeatureErr(original_utterance, ASR_output, feature, iteration_number):
    
    """
    Gets alignments between the original utterance and the ASR output,
    then counts how many errors occur after the feature provided.
    Depends on wagnerfischerpp: keep the wagnerfischerpp.py file in the
    same directory and run from wagnerfischerpp import * before using this.
    Also depends on align_WF() from Kevin Tang, defined above.
    original_utterance and ASR_output should be strings;
    they are split into word lists inside this function.
    The iteration number is for lines that have more than one
    instance of the feature; it selects which instance to use.
    """
    
    from wagnerfischerpp import WagnerFischer
    import numpy as np

        
    #returns NaNs if the content is not a string
    if type(original_utterance) != str or type(ASR_output) != str:

        return np.nan

    
    else:
        
        #the cleaning process separates ain't into "ai" and "n't"
        #  this will make it back into one word for this process
        #  without changing it in the actual csv
        if feature == "ain't":

            original_utterance = original_utterance.replace("ai n't", "ain't")

            ASR_output = ASR_output.replace("ai n't", "ain't")
        
        elif feature == "isn't":
        
            original_utterance = original_utterance.replace("is n't", "isn't")

            ASR_output = ASR_output.replace("is n't", "isn't")
        
        elif feature == "aren't":
        
            original_utterance = original_utterance.replace("are n't", "aren't")

            ASR_output = ASR_output.replace("are n't", "aren't")
            
        elif feature == "I'm not":
        
            original_utterance = original_utterance.replace("i 'm not", "i'm not")

            ASR_output = ASR_output.replace("i 'm not", "i'm not")
            
        elif feature == "didn't":
        
            original_utterance = original_utterance.replace("did n't", "didn't")

            ASR_output = ASR_output.replace("did n't", "didn't")
            
        elif feature == "haven't":
        
            original_utterance = original_utterance.replace("have n't", "haven't")

            ASR_output = ASR_output.replace("have n't", "haven't")
            
        elif feature == "hasn't":
        
            original_utterance = original_utterance.replace("has n't", "hasn't")

            ASR_output = ASR_output.replace("has n't", "hasn't")
            
        
        #splits the strings into a list
        original_list = original_utterance.split()

        ASR_list = ASR_output.split()
        
        
        #performs the alignments
        aligns = [list(WagnerFischer(original_list, ASR_list).alignments())[0]]
        #original --> aligns = list(WagnerFischer(intended_list, perceived_list).alignments(bfirst = True))
        
        alignment_all = [align_WF(original_list, ASR_list, iii) for iii in aligns]

        #creates empty list to append the indexes of all the instances of features occurring in the line
        feature_indexes = []

        
         # special case for I'm not since it's two words
        if feature == "I'm not" or feature == "i'm not":
            
            #creates a list of the two words
            feature = ["i'm", "not"]
            
            # the following will loop through pairs of words made into two set lists
            # to see if the two set list matches the feature. the try/except structure
            # is here so that when the loop hits the last word in the alignment list
            # it won't throw an out of index range error
            
            try:
                
                #alignment_all is a list within a list, so the [0] gets the list within the list
                for x in range(len(alignment_all[0])):
                    
                    # the structure here is the alignment list[the first alignment][the tuple at index at the number of the loop][the first item in the tuple]
                    if [alignment_all[0][x][0], alignment_all[0][x+1][0]] == feature:
                                                
                        feature_indexes.append(x) 

            except IndexError:
                                
                pass

        
        else:
        
            #cycles through the tuples, which are composed of the intended (original)
            #  utterance token paired with its aligned token in the perceived (ASR)
            #  text. if the first item of the tuple matches the feature, the index
            #  of the feature is appended to the feature_indexes list
            for x in range(len(alignment_all[0])):

                if alignment_all[0][x][0] == feature:

                    feature_indexes.append(x) 
        
                
        #if there is only one index in the list, performs the process for the
        #  content after the one instance of the feature
        if len(feature_indexes) == 1:
            
            #creates an empty list. this block will count how many errors there are
            #  by cycling through the tuples and comparing the content in the tuples.
            #  if the first and second items are the same, that means that the ASR
            #  got it right. if they are different, then that means it got it wrong.
            #  if it matches, a 0 will be appended to the list. if it doesn't match,
            #  a 1 will be appended to the list. in the end, this list will be summed
            #  and that sum number is the number of errors the ASR made after the feature
            post_feature_errors = []
            

            #cycles through the tuples which occur after the feature index
            for y in alignment_all[0][feature_indexes[0]+1:]:
                                
                #checks if the first item in the tuple matches the second or not
                if y[0] == y[1]:

                    post_feature_errors.append(0)

                else:

                    post_feature_errors.append(1)

                    
            #gets the sum of the feature error list
            num_post_feature_errors = sum(post_feature_errors)
           
        
        # if there are no instances of the feature, this will return a nan
        elif len(feature_indexes) == 0:

            num_post_feature_errors = np.nan
        
        
        #if there are more than one instance in the line, this will
        # take the iteration number and use that to get the correct index
        #  of the feature in the line. it does this by collecting
        #  the indexes of all the features in the alignment list,
        #  storing them in a list, then, uses the iteration number to determine
        #  which of the indexes is the correct to use for that particular instance
        else:

            post_feature_errors = []
            
            
            #cycles through the tuples which occur after the feature index.
            #  the iteration number minus one gives the position in feature_indexes:
            #  an iteration number of 1 (the first occurrence of the feature in
            #  the line) becomes 0, which selects the first index in the list.
            #  that index marks where the content after the feature begins. one is
            #  added because a python slice [n:] includes item n, so the slice
            #  has to start at index+1 to exclude the feature itself
            for y in alignment_all[0][feature_indexes[iteration_number-1]+1:]:

                if y[0] == y[1]:

                    post_feature_errors.append(0)

                else:

                    post_feature_errors.append(1)
            
            
            #gets the sum of the feature error list
            num_post_feature_errors = sum(post_feature_errors)
            
        
        return num_post_feature_errors

## Executing the Code

In [None]:
# Import the required packages
from interruptingcow import timeout 

In [None]:
# the transcription columns after which the new result columns will be inserted
column_names = ["amazon_transcription_cleaned", 
                "deepspeech_transcription_cleaned", "google_transcription_cleaned", 
                "IBMWatson_transcription_cleaned", "microsoft_transcription_cleaned"]

### Feature: Ain't

Before running the code for the *ain't* variations, the variations will be split into separate dataframes to be processed. These will be concatenated again in the end.

In [None]:
# .copy() makes each subset an independent dataframe, so the column inserts
#   and .loc assignments below do not trigger SettingWithCopyWarning
aint_df = aint_gs_df[aint_gs_df["AintVariation"]=="ain't"].copy()
isnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="isn't"].copy()
arent_df = aint_gs_df[aint_gs_df["AintVariation"]=="aren't"].copy()
imnot_df = aint_gs_df[aint_gs_df["AintVariation"]=="I'm not"].copy()
didnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="didn't"].copy()
havent_df = aint_gs_df[aint_gs_df["AintVariation"]=="haven't"].copy()
hasnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="hasn't"].copy()

In [None]:
# Define the feature
feature = "ain't"

# Appends new columns
for column_name in column_names:
    
    col_index = aint_df.columns.get_loc(column_name)
    
    aint_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    aint_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    aint_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# counter used in the progress messages printed below
progress_number = 1

# cycles through all rows and executes all functions
for file_row in aint_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                aint_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                aint_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                aint_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                aint_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                aint_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                aint_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                aint_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                aint_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(aint_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(aint_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "isn't"

# Appends new columns
for column_name in column_names:
    
    col_index = isnt_df.columns.get_loc(column_name)
    
    isnt_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    isnt_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    isnt_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# counter used in the progress messages printed below
progress_number = 1

# cycles through all rows and executes all functions
for file_row in isnt_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                isnt_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                isnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                isnt_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                isnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                isnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                isnt_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                isnt_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                isnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(isnt_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(isnt_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "aren't"

# Appends new columns
for column_name in column_names:
    
    col_index = arent_df.columns.get_loc(column_name)
    
    arent_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    arent_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    arent_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in arent_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                arent_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                arent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                arent_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                arent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                arent_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                arent_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                arent_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                arent_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(arent_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(arent_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "I'm not"

# Appends new columns
for column_name in column_names:
    
    col_index = imnot_df.columns.get_loc(column_name)
    
    imnot_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    imnot_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    imnot_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in imnot_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                imnot_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                imnot_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                imnot_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                imnot_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                imnot_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                imnot_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                imnot_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                imnot_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(imnot_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(imnot_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "didn't"

# Appends new columns
for column_name in column_names:
    
    col_index = didnt_df.columns.get_loc(column_name)
    
    didnt_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    didnt_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    didnt_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in didnt_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                didnt_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                didnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                didnt_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                didnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                didnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                didnt_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                didnt_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                didnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(didnt_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(didnt_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "haven't"

# Appends new columns
for column_name in column_names:
    
    col_index = havent_df.columns.get_loc(column_name)
    
    havent_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    havent_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    havent_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in havent_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                havent_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                havent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                havent_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                havent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                havent_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                havent_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                havent_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                havent_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(havent_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(havent_df)} completed.")
        
    progress_number += 1

In [None]:
# Define the feature
feature = "hasn't"

# Appends new columns
for column_name in column_names:
    
    col_index = hasnt_df.columns.get_loc(column_name)
    
    hasnt_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    hasnt_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    hasnt_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in hasnt_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                hasnt_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                hasnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                hasnt_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                hasnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                hasnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                hasnt_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                hasnt_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                hasnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(hasnt_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(hasnt_df)} completed.")
        
    progress_number += 1

In [None]:
aint_gs_df = pd.concat([aint_df, isnt_df, arent_df, imnot_df, didnt_df, havent_df, hasnt_df])
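
The cell above stacks the seven per-variant dataframes back into a single dataframe. A minimal sketch of that behaviour on toy data follows; `aint_toy` and `isnt_toy` are hypothetical stand-ins, not the real dataframes. `pd.concat` keeps each frame's original index labels, so if the variant dataframes were not sliced from one dataframe with a unique index, passing `ignore_index=True` is one way to renumber the rows.

```python
import pandas as pd

# Toy stand-ins for two of the per-variant dataframes (hypothetical data)
aint_toy = pd.DataFrame({"File": ["a.txt"], "Line": [3], "Feature": ["ain't"]}, index=[0])
isnt_toy = pd.DataFrame({"File": ["a.txt"], "Line": [1], "Feature": ["isn't"]}, index=[1])

# pd.concat stacks the rows in list order and keeps the original index labels
combined = pd.concat([aint_toy, isnt_toy])

# Every input row should appear exactly once in the combined dataframe
print(len(combined), combined.index.is_unique)
```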

### Feature: Be

In [None]:
# Define the feature
feature = "be"

# Appends new columns
for column_name in column_names:
    
    col_index = be_gs_df.columns.get_loc(column_name)
    
    be_gs_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    be_gs_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    be_gs_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in be_gs_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                be_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                be_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                be_gs_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                be_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                be_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                be_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                be_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                be_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"be_{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(be_gs_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"be_{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(be_gs_df)} completed.")
        
    progress_number += 1

### Feature: Done

In [None]:
# Define the feature
feature = "done"

# Appends new columns
for column_name in column_names:
    
    col_index = done_gs_df.columns.get_loc(column_name)
    
    done_gs_df.insert(col_index+3, f"{column_name}_Alignments", np.nan)
        
    done_gs_df.insert(col_index+4, f"{column_name}_preFeature_errorCount", np.nan)
        
    done_gs_df.insert(col_index+5, f"{column_name}_postFeature_errorCount", np.nan)

# enable this if you'd like a progress check printed
progress_number = 1

# cycles through all rows and executes all functions
for file_row in done_gs_df.itertuples():
    
    try:

        #sets the timeout timer to 10 seconds
        with timeout(10, exception = RuntimeError):    

                done_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.amazon_transcription_cleaned))
                
                done_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned))
                
                done_gs_df.loc[file_row.Index, "google_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.google_transcription_cleaned))
                
                done_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned))
                
                done_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_Alignments"] = str(getWordAlignments(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned))

            
                done_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "google_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_preFeature_errorCount"] = getPreFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)


                done_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.amazon_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "google_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.google_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned, feature, file_row.IterationNumber)

                done_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_postFeature_errorCount"] = getPostFeatureErr(file_row.Content_cleaned, file_row.microsoft_transcription_cleaned, feature, file_row.IterationNumber)

                # Enable this if you'd like to print for progress
                print(f"done_{file_row.Index}_{file_row.File}_{file_row.Line}    {progress_number}/{len(done_gs_df)} completed.")


    except RuntimeError: 

        # enable this to print a flag
        print(f"done_{file_row.Index}_{file_row.File}_{file_row.Line} timed out and was skipped.  {progress_number}/{len(done_gs_df)} completed.")
        
    progress_number += 1
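
The per-feature cells above repeat the same fifteen assignments for each dataframe. As a hedged sketch only, the pattern could be folded into one helper that loops over the ASR column names. Everything named `toy_*` and `fill_error_columns` is hypothetical: the toy functions compare words position by position and are not the Wagner-Fischer alignment or the real `getPreFeatureErr`/`getPostFeatureErr`. The sketch also appends the new columns at the end rather than inserting them beside each ASR column, and it leaves out the timeout wrapper.

```python
import numpy as np
import pandas as pd

def toy_alignments(reference, hypothesis):
    # Position-by-position word pairs; NOT the Wagner-Fischer alignment
    return list(zip(reference.split(), hypothesis.split()))

def toy_errors(reference, hypothesis, feature, side):
    # Count mismatched words before ("pre") or after ("post") the feature word
    ref, hyp = reference.split(), hypothesis.split()
    cut = ref.index(feature)
    pairs = list(zip(ref, hyp))
    span = pairs[:cut] if side == "pre" else pairs[cut + 1:]
    return sum(r != h for r, h in span)

def fill_error_columns(df, column_names, feature):
    # Create the three result columns for every ASR column
    for name in column_names:
        # object dtype so alignment strings can be stored in the column
        df[f"{name}_Alignments"] = pd.Series(index=df.index, dtype=object)
        df[f"{name}_preFeature_errorCount"] = np.nan
        df[f"{name}_postFeature_errorCount"] = np.nan
    # Fill the result columns row by row, one ASR column at a time
    for row in df.itertuples():
        for name in column_names:
            hyp = getattr(row, name)
            ref = row.Content_cleaned
            df.loc[row.Index, f"{name}_Alignments"] = str(toy_alignments(ref, hyp))
            df.loc[row.Index, f"{name}_preFeature_errorCount"] = toy_errors(ref, hyp, feature, "pre")
            df.loc[row.Index, f"{name}_postFeature_errorCount"] = toy_errors(ref, hyp, feature, "post")
    return df

# Hypothetical one-row dataframe with two ASR columns
toy_df = pd.DataFrame({
    "Content_cleaned": ["he be working late"],
    "amazon_transcription_cleaned": ["he is working late"],
    "google_transcription_cleaned": ["she be working lake"],
})
toy_columns = ["amazon_transcription_cleaned", "google_transcription_cleaned"]
result = fill_error_columns(toy_df, toy_columns, "be")
print(result.filter(like="errorCount"))
```

With a helper of this shape, each feature cell would reduce to a single call, which also removes the risk of copy-paste slips in the progress messages.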

## Sorting the Dataframes by File and Line

This sorts each dataframe first by filename and then by line number. Applying the same sort at every step keeps the row order consistent across the output of each step.
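
A minimal sketch of the two-key sort on toy data (the dataframe below is hypothetical). It assumes `Line` holds numbers; if the column were read in as strings, "10" would sort before "9".

```python
import pandas as pd

# Hypothetical rows that are out of order by both file and line
toy_df = pd.DataFrame({
    "File": ["b.txt", "a.txt", "a.txt"],
    "Line": [2, 10, 9],
})

# Sort by File first, then by Line within each file
sorted_df = toy_df.sort_values(by=["File", "Line"])
print(sorted_df)
```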

### Feature: Ain't

In [None]:
aint_gs_df = aint_gs_df.sort_values(by=['File', 'Line'])

### Feature: Be

In [None]:
be_gs_df = be_gs_df.sort_values(by=['File', 'Line'])

### Feature: Done

In [None]:
done_gs_df = done_gs_df.sort_values(by=['File', 'Line'])

## Exporting Dataframes to CSV Files

This exports each dataframe to its own CSV file in the output directory.

In [None]:
# Designate the output directory where the CSVs will be stored.
# The filenames below are appended directly, so the path must end with a path separator.
csv_output_path = "path"

### Feature: Ain't

In [None]:
aint_gs_df.to_csv(f"{csv_output_path}aint_variations_errorCounts.csv", index=False)

### Feature: Be

In [None]:
be_gs_df.to_csv(f"{csv_output_path}be_errorCounts.csv", index=False)

### Feature: Done

In [None]:
done_gs_df.to_csv(f"{csv_output_path}done_errorCounts.csv", index=False)
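
The export cells above build each filename by appending it directly to `csv_output_path`. As an alternative sketch, `os.path.join` builds the same path whether or not the directory string ends with a separator. The temporary directory and the `toy_errorCounts.csv` filename are assumptions made only for this illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical dataframe standing in for one of the error-count dataframes
toy_df = pd.DataFrame({"File": ["a.txt"], "Line": [1]})

with tempfile.TemporaryDirectory() as out_dir:
    # os.path.join inserts the separator between directory and filename
    out_path = os.path.join(out_dir, "toy_errorCounts.csv")
    toy_df.to_csv(out_path, index=False)

    # Read the file back to confirm the export round-trips
    round_trip = pd.read_csv(out_path)

print(round_trip)
```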