# Step 2.12.5: Getting Habituality and Completive Columns

For the *be* and *done* CSVs produced in Step 2.12, this code creates one new column in each CSV. The column holds a binary value indicating whether the instance of the word in that row encodes the AAL morphosyntactic feature in question: 0 means the feature is not present; 1 means the feature is present. If the cell is blank (NaN) after this code is run, the row needs manual annotation in Step 2.13.

If the *FeatureCountPerLine* column is 0, the new column is populated with a 0. If *InstancesCountPerLine* and *FeatureCountPerLine* match (and are at least 1), the cell is populated with a 1. If the two differ and *FeatureCountPerLine* is not 0, the cell is left blank for manual annotation. A difference means the word occurs more than once in the utterance but only a subset of the occurrences contain the feature, so the row must be checked manually.
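The three-way rule can be illustrated on a few made-up rows. The sketch below is a vectorized equivalent of the row-by-row loop used later in this notebook, not the notebook's own code; `toy_df`, `label_feature`, and the `Label` column are hypothetical names introduced only for this illustration. As in the notebook's loop, matching nonzero counts are written as 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the two count column names come from the CSVs.
toy_df = pd.DataFrame({
    "InstancesCountPerLine": [1, 2, 2, 3],
    "FeatureCountPerLine": [0, 2, 1, 3],
})

def label_feature(df):
    """Return 0, 1, or NaN per row according to the rule described above."""
    # No occurrence in the line carries the feature.
    no_feature = df["FeatureCountPerLine"] == 0
    # Every occurrence in the line carries the feature.
    all_feature = (
        (df["InstancesCountPerLine"] == df["FeatureCountPerLine"])
        & (df["FeatureCountPerLine"] >= 1)
    )
    # Anything else (a partial match) stays NaN for manual annotation.
    labels = np.select([no_feature, all_feature], [0.0, 1.0], default=np.nan)
    return pd.Series(labels, index=df.index)

toy_df["Label"] = label_feature(toy_df)
print(toy_df)
```

In this toy data only the third row (two occurrences, one with the feature) is left blank.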

Because *ain't* has no comparable distinction, this code reads in and writes out the *ain't* CSV from the previous step without changing it. This extra step is present simply to ensure consistency; the CSV could, of course, be copied manually from the previous step's folder into this step's folder.

## Required Packages

The following packages are necessary to run this code: os, [numpy](https://pypi.org/project/numpy/), [pandas](https://pypi.org/project/pandas/)

## Initial Setup

In [None]:
# Import required packages
import pandas as pd
import numpy as np
import os

In [None]:
# File paths for the CSVs produced in Step 2.12
aint_file_path = "path"

be_file_path = "path"

done_file_path = "path"

# Reads in the gold standard dataframes
aint_gs_df = pd.read_csv(aint_file_path)

be_gs_df = pd.read_csv(be_file_path)

done_gs_df = pd.read_csv(done_file_path)

## Executing the Code

In [None]:
# Names of the existing columns after which the new column will be inserted
column_names = ["FeatureCountPerLine"]

### Feature: Be

In [None]:
# Inserts an empty (NaN) Habituality column immediately after each column in column_names
for column_name in column_names:
    
    col_index = be_gs_df.columns.get_loc(column_name)
    
    be_gs_df.insert(col_index+1, "Habituality", np.nan)
            
# Loops through rows: 0 if no occurrence has the feature, 1 if all do, otherwise left as NaN
for file_row in be_gs_df.itertuples():
    
    if file_row.FeatureCountPerLine == 0:
        
        be_gs_df.loc[file_row.Index, "Habituality"] = 0
    
    elif file_row.InstancesCountPerLine == file_row.FeatureCountPerLine and file_row.FeatureCountPerLine >=1:
        
        be_gs_df.loc[file_row.Index, "Habituality"] = 1
        
    else:
        
        continue

### Feature: Done

In [None]:
# Inserts an empty (NaN) Completive column immediately after each column in column_names
for column_name in column_names:
    
    col_index = done_gs_df.columns.get_loc(column_name)
    
    done_gs_df.insert(col_index+1, "Completive", np.nan)
            
# Loops through rows: 0 if no occurrence has the feature, 1 if all do, otherwise left as NaN
for file_row in done_gs_df.itertuples():
    
    if file_row.FeatureCountPerLine == 0:
        
        done_gs_df.loc[file_row.Index, "Completive"] = 0
    
    elif file_row.InstancesCountPerLine == file_row.FeatureCountPerLine and file_row.FeatureCountPerLine >=1:
        
        done_gs_df.loc[file_row.Index, "Completive"] = 1
        
    else:
        
        continue
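Before moving on, it can be useful to see how many rows were left blank for manual annotation in Step 2.13. The following is an optional check, not part of the original pipeline; `example_df` is a hypothetical stand-in for `be_gs_df` or `done_gs_df` after the cells above have run, and its values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for be_gs_df after the labelling loop has run.
example_df = pd.DataFrame({
    "File": ["a.txt", "a.txt", "b.txt"],
    "Line": [3, 7, 1],
    "Habituality": [0.0, np.nan, 1.0],
})

# Rows whose new column is still NaN are the ones awaiting manual annotation.
needs_annotation = example_df[example_df["Habituality"].isna()]
print(f"{len(needs_annotation)} row(s) left for manual annotation")
print(needs_annotation[["File", "Line"]])
```

For the *done* dataframe, the same check would use the `Completive` column instead.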

## Sorting the Dataframes by File and Line

This will sort the dataframes first by filename and then by line number. Sorting at every step keeps the row order consistent across the output files.
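As a small illustration of the two-key sort, the sketch below uses made-up rows; `toy_df` and `sorted_df` are hypothetical names. It assumes the `Line` column holds numbers. If the values were read in as strings, "10" would sort before "2".

```python
import pandas as pd

# Hypothetical rows in an arbitrary order.
toy_df = pd.DataFrame({
    "File": ["b.txt", "a.txt", "a.txt"],
    "Line": [2, 10, 2],
})

# Sort by filename first, then by line number within each file.
sorted_df = toy_df.sort_values(by=["File", "Line"]).reset_index(drop=True)
print(sorted_df)
```

The `reset_index(drop=True)` call is only there to renumber the toy rows for display. The notebook's cells keep the original index, which does not affect the exported CSVs because they are written with `index=False`.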

### Feature: Ain't

In [None]:
aint_gs_df = aint_gs_df.sort_values(by=['File', 'Line'])

### Feature: Be

In [None]:
be_gs_df = be_gs_df.sort_values(by=['File', 'Line'])

### Feature: Done

In [None]:
done_gs_df = done_gs_df.sort_values(by=['File', 'Line'])

## Exporting Dataframes to CSV Files

This will export the dataframes to CSV files.

In [None]:
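# Designate the output folder where the CSVs will be stored.
# The path must end with a path separator (e.g. "/") because the filename is appended directly to it.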
csv_output_path = "path"

### Feature: Ain't

In [None]:
# Although this code does not change the ain't CSV, it is exported here for consistency
aint_gs_df.to_csv(f"{csv_output_path}aint_featureBinaryCheck.csv", index=False)

### Feature: Be

In [None]:
be_gs_df.to_csv(f"{csv_output_path}be_featureBinaryCheck.csv", index=False)

### Feature: Done

In [None]:
done_gs_df.to_csv(f"{csv_output_path}done_featureBinaryCheck.csv", index=False)