# Step 2.11: Checking Adjacent Tokens

This code checks the tokens immediately to the left and right of the target word in the original utterance and in the ASR output to see whether they match. It also checks whether just the one word to the left matches in each. For *be*, it additionally checks several more specific cases.

## Required Packages

The following packages are necessary to run this code: os, [pandas](https://pypi.org/project/pandas/), [numpy](https://pypi.org/project/numpy/)

## Initial Setup

In [None]:
# Import required packages
import pandas as pd
import numpy as np
import os

In [None]:
#filepath for the csv produced in Step 2.10
aint_file_path = "path"

be_file_path = "path"

done_file_path = "path"

#reads in the gold standard dataframe    
aint_gs_df = pd.read_csv(aint_file_path)

be_gs_df = pd.read_csv(be_file_path)

done_gs_df = pd.read_csv(done_file_path)

## Defining the Checking Adjacent Tokens Function

This function takes the following arguments:
1. The feature
2. The number of instances of the word (taken from the dataframe)
3. The number of instances of the AAL feature (taken from the dataframe)
4. The iteration number (taken from the dataframe)
5. The cleaned original utterance content as a string
6. The cleaned ASR output as a string

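For features other than *be*, the function returns 1 when the adjacent tokens match, 0 when they do not, and NaN when the instance count is not 1. For *be* inputs, it returns a different number for each case. These numbers are meant to be informative and will be converted for data use in the next step. For now, here's a key: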

- 13 (multiple instances): The first two words to the left of the original and the first word to the left of the ASR output are in the modals list (composed of single and multi-word compound modals).
- 12 (multiple instances): The first word to the left of the original and the first two words to the left of the ASR output are in the modals list (composed of single and multi-word compound modals).
- 11 (multiple instances): The first two words to the left of the original and ASR output are in the modals list (composed of single and multi-word compound modals).
- 10 (multiple instances): The first word to the left of the original and ASR output are both modals.
- 9 (multiple instances): The first word to the left of the original and ASR output match.
- 8 (multiple instances): The first word to the left and right of *be* in both the original and ASR output match.
- 7 (one instance): The first two words to the left of the original and the first word to the left of the ASR output are in the modals list (composed of single and multi-word compound modals).
- 6 (one instance): The first word to the left of the original and the first two words to the left of the ASR output are in the modals list (composed of single and multi-word compound modals).
- 5 (one instance): The first two words to the left of the original and ASR output are in the modals list (composed of single and multi-word compound modals).
- 4 (one instance): The first word to the left of the original and ASR output are both modals.
- 3 (one instance): The first word to the left of the original and ASR output match.
- 2 (one instance): The first word to the left and right of *be* in both the original and ASR output match.
- 1 (one instance): The first word in the ASR output is *be* and the first word in the original is also *be*.
- -1 (one instance): The first word in the ASR output is *be* but the first word in the original is not *be*.
- -2 (one instance): There is only one instance of the feature in the original utterance, but the ASR output is preceded by a modal.
- -3 (one instance): The else condition for one instance of the word.
- -4 (multiple instances): The number of word instances does not match between the original and ASR output.
- -5 (multiple instances): The else condition for multiple instances of the word.
- -6: The general else condition.
- -7: The except condition, returned when the code raises an error while processing the ASR output.
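The token windows that these cases compare can be sketched as follows. This is a minimal illustration, not the function itself: `token_windows` is a hypothetical helper, the example sentences are invented, and only the single-word and compound modal lists are reproduced.

```python
def token_windows(tokens, index):
    """Return the context windows around tokens[index] (hypothetical helper)."""
    return {
        "L2_L1": tokens[index - 2:index],      # the two tokens to the left
        "L1_R1": tokens[index - 1:index + 2],  # left neighbour, target word, right neighbour
        "L1": tokens[index - 1],               # the single token to the left
    }

modals = ["can", "could", "will", "would", "'ll", "may",
          "might", "must", "shall", "should", "'d"]
multi_modals = [["going", "to"], ["got", "to"], ["has", "to"],
                ["have", "to"], ["supposed", "to"], ["used", "to"],
                ["ought", "to"]]

# Example A: both utterances have a single-word modal before "be".
original = "he might be working late".split()
asr = "he may be working late".split()
orig_win = token_windows(original, original.index("be"))
asr_win = token_windows(asr, asr.index("be"))

print(orig_win["L1_R1"] == asr_win["L1_R1"])                 # windows differ
print(orig_win["L1"] == asr_win["L1"])                       # left words differ
print(orig_win["L1"] in modals and asr_win["L1"] in modals)  # both are modals (case 4)

# Example B: a compound modal in the original, a single-word modal in the ASR output.
original_b = "she is going to be late".split()
asr_b = "she will be late".split()
orig_b = token_windows(original_b, original_b.index("be"))
asr_b_win = token_windows(asr_b, asr_b.index("be"))

# A two-token window is a list, so it can be looked up directly in the list of lists.
print(orig_b["L2_L1"] in multi_modals and asr_b_win["L1"] in modals)  # case 7
```

When the target word is the first token, `index - 1` evaluates to `-1`, which Python treats as the last token of the list, so the left-hand windows are not meaningful in that situation.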

In [None]:
def checkAdjacentTokens(feature, instances_count, feature_count, iteration_number, cleaned_input, cleaned_output):
    
    """
    Check adjacent tokens to the left and right of the instance of the word
    in the original utterance and the ASR output and see if they match.
    """
    
    cleaned_input = cleaned_input.replace("gon na", "going to").replace("got ta", "got to").replace("wan na", "want to").replace("i 'm a be", "i 'm going to be")
    
    
    if feature == "ain't":
        
        cleaned_input = cleaned_input.replace("ai n't", "ain't")
            
    elif feature == "isn't":

        cleaned_input = cleaned_input.replace("is n't", "isn't")

    elif feature == "aren't":

        cleaned_input = cleaned_input.replace("are n't", "aren't")

    elif feature == "I'm not":
        
        feature = "i'mnot"

        cleaned_input = cleaned_input.replace("i 'm not", "i'mnot")

    elif feature == "didn't":

        cleaned_input = cleaned_input.replace("did n't", "didn't")

    elif feature == "haven't":

        cleaned_input = cleaned_input.replace("have n't", "haven't")

    elif feature == "hasn't":

        cleaned_input = cleaned_input.replace("has n't", "hasn't")
    
    
    
    split_input = cleaned_input.split()
    
    input_index = split_input.index(feature)
    
    L2_L1_input = split_input[input_index-2:input_index]
    
    L1_R1_input = split_input[input_index-1:input_index+2]
    
    L1_input = split_input[input_index-1]
    
    
    try:
        
        cleaned_output = cleaned_output.replace("gon na", "going to").replace("got ta", "got to").replace("wan na", "want to").replace("i 'm a be", "i 'm going to be")
        
        
        if feature == "ain't":
            
            cleaned_output = cleaned_output.replace("ai n't", "ain't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
            
            
        elif feature == "isn't":
            
            cleaned_output = cleaned_output.replace("is n't", "isn't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
        
        elif feature == "aren't":
            
            cleaned_output = cleaned_output.replace("are n't", "aren't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
            
            
        # feature was normalized to "i'mnot" above, so match on the normalized form
        elif feature == "i'mnot":
            
            cleaned_output = cleaned_output.replace("i 'm not", "i'mnot")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
            
            
        elif feature == "didn't":
            
            cleaned_output = cleaned_output.replace("did n't", "didn't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
            
        elif feature == "haven't":
            
            cleaned_output = cleaned_output.replace("have n't", "haven't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
            
        elif feature == "hasn't":
            
            cleaned_output = cleaned_output.replace("has n't", "hasn't")
            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                elif L1_input == L1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan    
            
            
            
        elif feature == "done":
                        
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)

                L1_R1_output = cleaned_output.split()[output_index-1:output_index+2]

                L1_output = cleaned_output.split()[output_index-1]


                if L1_R1_input == L1_R1_output:

                    return 1

                else:

                    return 0

            else:

                return np.nan
        
            
        elif feature == "be":
            
            #this doesn't include modals which are compounds that contain "to"
            #  such as "going to" and "used to" because if the code checks for
            #  L1s being the same and if both are "to" then it will deem it correct
            modals = ["can", "could", "will", "would", "'ll", "may",
                      "might", "must", "shall", "should", "'d"]
            
            #compound modals
            multi_modals = [["going", "to"], ["got", "to"], ["has", "to"],
                            ["have", "to"], ["supposed", "to"], ["used", "to"],
                            ["ought", "to"]]
            
            #negated modals
            neg_modals = [["ca", "n't"], ["could", "n't"], ["wo", "n't"],
                          ["would", "n't"], ["must", "n't"], ["should", "n't"]]
            
            #combined list
            mega_modals = modals + multi_modals + neg_modals

            
            if instances_count == 1:

                split_output = cleaned_output.split()

                output_index = split_output.index(feature)
                
                L2_L1_output = split_output[output_index-2:output_index]

                L1_R1_output = split_output[output_index-1:output_index+2]

                L1_output = split_output[output_index-1]



                if split_output[0] == "be" and split_input[0] == "be":

                    return 1

                #the returned -1 here will be converted to a 0 in the automated correctness step
                elif split_output[0] == "be" and split_input[0] != "be":

                    return -1

                elif L1_R1_input == L1_R1_output:

                    return 2

                elif L1_input == L1_output:

                    return 3

                elif L1_input in mega_modals and L1_output in mega_modals:

                    return 4
                
                elif L2_L1_input in mega_modals and L2_L1_output in mega_modals:

                    return 5
                
                elif L1_input in mega_modals and L2_L1_output in mega_modals:

                    return 6
                
                elif L2_L1_input in mega_modals and L1_output in mega_modals:

                    return 7

                elif feature_count >= 1 and L1_output in modals:

                    return -2

                else:

                    return -3
                
                
            
            elif instances_count > 1:
                
                
                split_output = cleaned_output.split()
                
                
                input_indexes = [] 
                
                output_indexes = []
                
                
                for input_index in range(len(split_input)):
                    
                    if split_input[input_index] == "be":
                        
                        input_indexes.append(input_index)
                        
                for output_index in range(len(split_output)):
                    
                    if split_output[output_index] == "be":
                        
                        output_indexes.append(output_index)
                        
                
                if len(input_indexes) != len(output_indexes):
                    
                    return -4
                
                else:
                    
                    input_iteration_index = input_indexes[iteration_number-1]
                    
                    output_iteration_index = output_indexes[iteration_number-1]
                    
                   
                    L2_L1_input = split_input[input_iteration_index-2:input_iteration_index]

                    L1_R1_input = split_input[input_iteration_index-1:input_iteration_index+2]

                    L1_input = split_input[input_iteration_index-1]
                    
                    
                    L2_L1_output = split_output[output_iteration_index-2:output_iteration_index]
                    
                    L1_R1_output = split_output[output_iteration_index-1:output_iteration_index+2]

                    L1_output = split_output[output_iteration_index-1]
                    
                    
                    if L1_R1_input == L1_R1_output:

                        return 8

                    elif L1_input == L1_output:

                        return 9

                    elif L1_input in mega_modals and L1_output in mega_modals:

                        return 10

                    elif L2_L1_input in mega_modals and L2_L1_output in mega_modals:

                        return 11

                    elif L1_input in mega_modals and L2_L1_output in mega_modals:

                        return 12

                    elif L2_L1_input in mega_modals and L1_output in mega_modals:

                        return 13
                    
                    else:
                        
                        return -5
                
            
            else:

                return -6
        
    except Exception:
        
        return -7

## Executing the Code

In [None]:
# The transcription columns next to which the new adjacent-token columns will be inserted
column_names = ["amazon_transcription_cleaned", 
                "deepspeech_transcription_cleaned", "google_transcription_cleaned", 
                "IBMWatson_transcription_cleaned", "microsoft_transcription_cleaned"]

### Feature: Ain't

Before running the code for the *ain't* variations, the variations will be split into separate dataframes to be processed. These will be concatenated again in the end.
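The split, process, and recombine pattern can be sketched on a toy dataframe. This is an illustration under assumptions: `toy_df`, its rows, and the `processed_as` column are invented, and the per-variation processing is reduced to a placeholder assignment.

```python
import pandas as pd

# Toy stand-in for the gold standard dataframe; only two columns are included.
toy_df = pd.DataFrame({
    "AintVariation": ["ain't", "isn't", "ain't", "didn't"],
    "Content_cleaned": ["he ain't here", "she isn't home",
                        "they ain't done", "we didn't go"],
})

parts = []
for variation, part in toy_df.groupby("AintVariation", sort=False):
    # Work on an explicit copy so that adding a column does not act on a slice of toy_df.
    part = part.copy()
    # Placeholder for the per-variation processing: record which variation was handled.
    part["processed_as"] = variation
    parts.append(part)

# Concatenate the pieces back into a single dataframe.
recombined = pd.concat(parts)
print(recombined.sort_index())
```

Concatenation stacks the pieces in the order they were processed, so the rows come back grouped by variation rather than in their original order. Restoring a consistent order requires an explicit sort.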

In [None]:
# .copy() makes each subset an independent dataframe, so the later insert() and
# .loc assignments do not operate on a slice of aint_gs_df
aint_df = aint_gs_df[aint_gs_df["AintVariation"]=="ain't"].copy()
isnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="isn't"].copy()
arent_df = aint_gs_df[aint_gs_df["AintVariation"]=="aren't"].copy()
imnot_df = aint_gs_df[aint_gs_df["AintVariation"]=="I'm not"].copy()
didnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="didn't"].copy()
havent_df = aint_gs_df[aint_gs_df["AintVariation"]=="haven't"].copy()
hasnt_df = aint_gs_df[aint_gs_df["AintVariation"]=="hasn't"].copy()

In [None]:
# Defines the feature
feature = "ain't"

# Appends new columns
for column_name in column_names:
    
    col_index = aint_df.columns.get_loc(column_name)
    
    aint_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in aint_df.itertuples():
    
    aint_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    aint_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    aint_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    aint_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    aint_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "isn't"

# Appends new columns
for column_name in column_names:
    
    col_index = isnt_df.columns.get_loc(column_name)
    
    isnt_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in isnt_df.itertuples():
    
    isnt_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    isnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    isnt_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    isnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    isnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "aren't"

# Appends new columns
for column_name in column_names:
    
    col_index = arent_df.columns.get_loc(column_name)
    
    arent_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in arent_df.itertuples():
    
    arent_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    arent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    arent_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    arent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    arent_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "I'm not"

# Appends new columns
for column_name in column_names:
    
    col_index = imnot_df.columns.get_loc(column_name)
    
    imnot_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in imnot_df.itertuples():
    
    imnot_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    imnot_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    imnot_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    imnot_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    imnot_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "didn't"

# Appends new columns
for column_name in column_names:
    
    col_index = didnt_df.columns.get_loc(column_name)
    
    didnt_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in didnt_df.itertuples():
    
    didnt_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    didnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    didnt_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    didnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    didnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "haven't"

# Appends new columns
for column_name in column_names:
    
    col_index = havent_df.columns.get_loc(column_name)
    
    havent_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in havent_df.itertuples():
    
    havent_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    havent_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    havent_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    havent_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    havent_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
# Defines the feature
feature = "hasn't"

# Appends new columns
for column_name in column_names:
    
    col_index = hasnt_df.columns.get_loc(column_name)
    
    hasnt_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in hasnt_df.itertuples():
    
    hasnt_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    hasnt_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    hasnt_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    hasnt_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    hasnt_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

In [None]:
aint_gs_df = pd.concat([aint_df, isnt_df, arent_df, imnot_df, didnt_df, havent_df, hasnt_df])

### Feature: Be

In [None]:
# Defines the feature
feature = "be"

# Appends new columns
for column_name in column_names:
    
    col_index = be_gs_df.columns.get_loc(column_name)
    
    be_gs_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in be_gs_df.itertuples():
    
    be_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    be_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    be_gs_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    be_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    be_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

### Feature: Done

In [None]:
# Defines the feature
feature = "done"

# Appends new columns
for column_name in column_names:
    
    col_index = done_gs_df.columns.get_loc(column_name)
    
    done_gs_df.insert(col_index+2, f"{column_name}_adjacentTokens", np.nan)
            

# Loops through the rows and executes the function
for file_row in done_gs_df.itertuples():
    
    done_gs_df.loc[file_row.Index, "amazon_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.amazon_transcription_cleaned)
    
    done_gs_df.loc[file_row.Index, "deepspeech_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.deepspeech_transcription_cleaned)
    
    done_gs_df.loc[file_row.Index, "google_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.google_transcription_cleaned)
    
    done_gs_df.loc[file_row.Index, "IBMWatson_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.IBMWatson_transcription_cleaned)
    
    done_gs_df.loc[file_row.Index, "microsoft_transcription_cleaned_adjacentTokens"] = checkAdjacentTokens(feature, file_row.InstancesCountPerLine, file_row.FeatureCountPerLine, file_row.IterationNumber, file_row.Content_cleaned, file_row.microsoft_transcription_cleaned)

## Sorting the Dataframes by File and Line

This will sort the dataframes first by filename and then by line number. Sorting at every step keeps the row order consistent from one step's output to the next.
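A two-key sort of this kind behaves as sketched below. The toy data is invented; only the `File` and `Line` column names come from the notebook.

```python
import pandas as pd

# Toy rows in scrambled order.
toy_df = pd.DataFrame({
    "File": ["b.txt", "a.txt", "b.txt", "a.txt"],
    "Line": [2, 10, 1, 3],
})

# Sort by File first; rows with the same File are then ordered by Line.
sorted_df = toy_df.sort_values(by=["File", "Line"])
print(sorted_df.to_string(index=False))
```

The `Line` values here are integers, so 3 sorts before 10. If the column were stored as strings, the comparison would be lexicographic and "10" would sort before "3".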

### Feature: Ain't

In [None]:
aint_gs_df = aint_gs_df.sort_values(by=['File', 'Line'])

### Feature: Be

In [None]:
be_gs_df = be_gs_df.sort_values(by=['File', 'Line'])

### Feature: Done

In [None]:
done_gs_df = done_gs_df.sort_values(by=['File', 'Line'])

## Exporting Dataframes to CSV Files

This will export the dataframes to CSV files.
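One detail worth checking before exporting is how the output directory and the file name are combined. The sketch below uses a hypothetical directory name, `output_dir`, to compare plain string concatenation with `os.path.join`.

```python
import os

csv_output_path = "output_dir"  # hypothetical directory name, without a trailing separator
file_name = "be_checkAdjacentTokens.csv"

# Plain concatenation glues the two strings together with no separator.
concatenated = f"{csv_output_path}{file_name}"

# os.path.join inserts the platform's path separator where needed.
joined = os.path.join(csv_output_path, file_name)

print(concatenated)
print(joined)
```

With plain concatenation, the output path needs to end in a path separator for the file to land inside the intended directory.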

In [None]:
# Designate the output path where the CSVs will be stored
csv_output_path = "path"

### Feature: Ain't

In [None]:
aint_gs_df.to_csv(os.path.join(csv_output_path, "aint_variations_checkAdjacentTokens.csv"), index=False)

### Feature: Be

In [None]:
be_gs_df.to_csv(os.path.join(csv_output_path, "be_checkAdjacentTokens.csv"), index=False)

### Feature: Done

In [None]:
done_gs_df.to_csv(os.path.join(csv_output_path, "done_checkAdjacentTokens.csv"), index=False)