<h1>Tracking Language Patterns in Covenants Across Time</h1>
<p>Marguertie Mills</p>
<p>This program is designed to analyze racial covenant data gathered by the Mapping Prejudice project and identify patterns in racial covenant language from across time. Please see documentation (*executive summary) for further background and details on this project.</p>

_____________________

<h2>Package/Data Imports</h2>

In [231]:
import pandas

# get data set CSV and push into dataframe
csv = "text_search_master_abstracts.csv"
df = pandas.read_csv(csv)

___________________

<h2>Functions</h2>
<h3>1. Separartion by Decade Function</h3>

In [232]:
# split entries into seperate lists based according to what decade they belong. 
# because we are working with participatory data (see documentation for further explanation)
# the data has imperfections, in order to avoid errors for null values, we skip any non-ineger
# values. If the year value matches the year range condional, it is appended to a new list.

def year_filter(origin_df, start_year, end_year, decade_list):
    for index_num, i in origin_df.iterrows():
        try: 
            year_int = int(i['Year'])
        except:
            pass
        if year_int > start_year and year_int < end_year:
            decade_list.append(i)
        else:
            pass

<h3>2. Seperation Based on Covenant Language Function</h3>

In [233]:
# Once data is parced by decade, this function sorts entires within the decade lists based on 
# the appearence of "Inclusionary" or "Exclusionary" language (see documentation for explanation
# of these categories). If the racial restriction value(string) for each entry in the decade seperated
# lists contains the term "white" or "caucasian" (i.e. laguage that which calls out included groups)
# the item is added to a decade seperated "inclusion" list. If it does not, it is added to an
# "exclusion" list

def lang_filter(origin_df, decade_inclusion, decade_exclusion):
    for i in origin_df:
        if type(i[2]) == str:
            if "white" in i[2]:
                decade_inclusion.append(i)
            elif "Caucasian" in i[2]:
                decade_inclusion.append(i)
            elif "White" in i[2]:
                decade_inclusion.append(i)
            elif "caucasian" in i[2]:
                decade_inclusion.append(i)
            else:
                decade_exclusion.append(i)
        else:
            pass

<h3>3. Print/Organize Results Function</h3>

In [236]:
# the results are printed and and percentages of each type of covenanat are calculated to 
# demonstrate how these different types of racial covenants are deployed over time

def results_print(decade_range, decade_df, inclusion_df, exclusion_df):
    print(decade_range, '\n', 
      'total: ', len(decade_df), '\n', 
      'inclusionary covenants total: ', 
      len(inclusion_df),
      '  (', (len(inclusion_df)/len(decade_df)*100), 'percent)', '\n',
      'exclusionary covenants total: ', 
      len(exclusion_df), 
      '  (', (len(exclusion_df)/len(decade_df)*100), 'percent)', '\n')

_____________________

<h2>Initial Data Cleaning/Pre-processing</h2>
<h3>1. Eliminate first entry</h3>

In [238]:
# The first entry in the database is used for column type identification when imported into ArcMap
# and must be removed to avoid errors when processing it with python.

df2 = df.drop(df.index[0])

<h3>2. Slice year only out of the date data</h3>
<p>The original data included a full date (dd/mm/yyyy). In order to allow the split by decade function to run properly, date data for each entry must be sliced- leaving only the last four digits from the year value entries and place them in a new column for grouping.</p>

In [239]:
# The original data included a full date (dd/mm/yyyy). In order to allow the split by decade function
# to run properly, date data for each entry must be sliced- leaving only the last four digits from 
# the year value entries and place them in a new column for grouping.

df2['Year'] = df.Date_Ex.str[-4:]

__________________

<h2>Processing</h2>
<h3>1. Split data based on decade</h3>

In [241]:
# for each decade in the data set (1910s, 1920s, 1930s, 1940s) create a list and use it as a perameter in the 
# decade filtration function. Run the decacade filtration function and fill each lists will that the dadta entries
# from that date range.

df_1910s = []
year_filter(df2, 1909, 1920, df_1910s)

df_1920s = []
year_filter(df2, 1919, 1930, df_1920s)

df_1930s = []
year_filter(df2, 1929, 1940, df_1930s)

df_1940s = []
year_filter(df2, 1939, 1950, df_1940s)

df_1950s = []
year_filter(df2, 1949, 1960, df_1950s)

<h3>2. Split decade seperated data based on inclusion/exclusion language</h3>

In [242]:
# for each decade create two lists: inclusion and exclusion. Run the language filtration function 
# using the appropriate decade/language type list variables as peramiters and retun lists containing 
# the data that fit the decade and language criteria.

df_1910s_inclusion = []
df_1910s_exclusion = []
lang_filter(df_1910s, df_1910s_inclusion, df_1910s_exclusion)
    
df_1920s_inclusion = []
df_1920s_exclusion = []
lang_filter(df_1920s, df_1920s_inclusion, df_1920s_exclusion)
    
df_1930s_inclusion = []
df_1930s_exclusion = []
lang_filter(df_1930s, df_1930s_inclusion, df_1930s_exclusion)

df_1940s_inclusion = []
df_1940s_exclusion = []
lang_filter(df_1940s, df_1940s_inclusion, df_1940s_exclusion)

df_1950s_inclusion = []
df_1950s_exclusion = []
lang_filter(df_1950s, df_1950s_inclusion, df_1950s_exclusion)

______________________

<h2>Results</h2>

In [229]:
# for each decade range print the results using the function which calculate the percentage of each 
# covenant "type" for the corresponding decade.

results_print('1910-1919', df_1910s, df_1910s_inclusion, df_1910s_exclusion)
results_print('1920-1929', df_1920s, df_1920s_inclusion, df_1920s_exclusion)
results_print('1930-1939', df_1930s, df_1930s_inclusion, df_1930s_exclusion)
results_print('1940-1949', df_1940s, df_1940s_inclusion, df_1940s_exclusion)
results_print('1950-1959', df_1950s, df_1950s_inclusion, df_1950s_exclusion)

1910-1919 
 total:  549 
 inclusionary covenants total:  97   ( 17.66848816029144 percent) 
 exclusionary covenants total:  451   ( 82.14936247723132 percent) 

1920-1929 
 total:  5824 
 inclusionary covenants total:  4268   ( 73.28296703296702 percent) 
 exclusionary covenants total:  1511   ( 25.944368131868135 percent) 

1930-1939 
 total:  3490 
 inclusionary covenants total:  2895   ( 82.9512893982808 percent) 
 exclusionary covenants total:  592   ( 16.96275071633238 percent) 

1940-1949 
 total:  4570 
 inclusionary covenants total:  4199   ( 91.88183807439825 percent) 
 exclusionary covenants total:  364   ( 7.964989059080962 percent) 

1950-1959 
 total:  50 
 inclusionary covenants total:  38   ( 76.0 percent) 
 exclusionary covenants total:  11   ( 22.0 percent) 

