# Gentrification in New Orleans - Leah Kuperman and Leo Simanonok

https://leosimanonok.github.io/DataScienceFinalTutorial/

## Milestone 1
    
&nbsp;&nbsp;&nbsp;&nbsp;Over the past decade, data science has emerged as a leading field of study and industry. Data analysis has become critical for tackling societal problems and 
tracking trends to arrive at informed, logical solutions. For our final project, we have decided to study New Orleans income and property data 
to identify the prevalence of gentrification in different neighborhoods. As Tulane students, it is easy to become trapped in the Uptown bubble and forget about the 
impact we have on the rest of the city’s people, economy, and culture. It is no secret that New Orleans as a whole is a victim of gentrification. Part of what makes 
the city so unique is its rich history and resilience, which none of the 13,000 people who moved here from out of state between 2012 and 2016 witnessed firsthand. 
With that in mind, our goal for this project is to find meaningful insights in our datasets that we can offer to our partners, hopefully inspiring positive 
changes in New Orleans.
<br>
&nbsp;&nbsp;&nbsp;&nbsp;Initially, one of the datasets we hoped to get our hands on was property value data. Looking at housing prices would allow us to take a holistic look at gentrification
and possibly track the movement of affluent people into previously poorer neighborhoods. Unfortunately, property value data is notoriously hard to collect in New 
Orleans, so we decided to focus instead on poverty rates and income data. To this end, we found census data for the years 2010-2018, from the Census Bureau's Small Area 
Income and Poverty Estimates (SAIPE) program, which estimates the median household income and poverty rate of every county in the USA.
Another dataset we are looking at describes the racial profile of different neighborhoods in and around New Orleans. We hope to use this data in conjunction with 
the census data to show how gentrification changes the makeup of neighborhoods. Ideally, we would like to show where displaced 
residents moved after their neighborhoods were gentrified.
<br>
&nbsp;&nbsp;&nbsp;&nbsp;The dataset we loaded into our notebook for this milestone contains the median household income and poverty rate for each parish in Louisiana (parishes are 
Louisiana's equivalent of counties), with one file per year. Loading this dataset came with a host of struggles. The biggest was figuring out which character(s) to use as a 
delimiter: some columns were separated by a single space, some by tabs, and some by longer runs of whitespace. In addition, one field in particular, the county name, 
contains whitespace in the middle of the value (e.g., Orleans Parish). We settled on r"\s+" as our delimiter, a regular expression that matches any run of one or more 
whitespace characters. Because that pattern also splits multi-word names into separate fields, we had to remove the spaces from the county names by hand (e.g., Orleans Parish became OrleansParish).
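Hand-editing every county name works, but it has to be redone for each year's file. A minimal sketch of an alternative, assuming every record follows the 26-field layout used in the loading code (23 numeric fields, then the county name, the two-letter state abbreviation, and a single-token tag): split on r"\s+" as before, then rejoin whatever tokens fall between the numeric fields and the last two fields into one name. The function `split_record`, the constants, and the sample line are hypothetical illustrations rather than part of the original notebook, and the numbers in the sample are placeholders, not real SAIPE values.

```python
import re

# Hypothetical layout constants, matching the 26 column names used in this notebook:
# 23 numeric fields, then the county name, the state abbreviation, and the tag.
N_LEADING_FIELDS = 23
N_TRAILING_FIELDS = 2  # assumes the state abbreviation and the tag are single tokens


def split_record(line: str) -> list[str]:
    """Split one whitespace-delimited record, keeping a multi-word county name whole."""
    tokens = re.split(r"\s+", line.strip())
    leading = tokens[:N_LEADING_FIELDS]
    # Everything between the numeric fields and the last two fields is the name
    name = " ".join(tokens[N_LEADING_FIELDS:-N_TRAILING_FIELDS])
    trailing = tokens[-N_TRAILING_FIELDS:]
    return leading + [name] + trailing


# Made-up record: 23 placeholder numbers (not real SAIPE values), mixed spaces and a tab
sample = " ".join(str(n) for n in range(1, 24)) + "   Orleans Parish\tLA  TAG"
fields = split_record(sample)
print(len(fields))  # 26
print(fields[23])   # Orleans Parish
```

Each field comes back as a string, so the numeric ones would still need `pd.to_numeric` before analysis, and a record missing one of its numeric values would shift every later field. If the files turn out to be fixed-width, `pd.read_fwf` with explicit column positions could sidestep the splitting problem altogether.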
<br>
&nbsp;&nbsp;&nbsp;&nbsp;In terms of our meeting plan, we have set weekly meetings on Mondays at 1pm to review our progress and plan for upcoming due dates and milestones. When 
our availability and wellbeing allow, we will meet in person in the library to avoid technological complications; when we need to meet virtually, we will keep the same meeting time on Zoom. All code for the final project will live in the private GitHub repo attached to our final project page, and we have already confirmed that both of us can connect to, pull from, and push to it from our local machines.

## References/Links
https://www.census.gov/programs-surveys/saipe/data/api.html<br>
https://www.census.gov/data/datasets/time-series/demo/saipe/model-tables.html<br>
https://www.datacenterresearch.org/reports_analysis/placing-prosperity/<br>
https://richcampanella.com/wp-content/uploads/2020/02/article_Campanella_300-Years-of-Human-Geography-in-New-Orleans.pdf<br>
https://www.nytimes.com/2019/08/27/opinion/new-orleans.html

```python
import requests
import pandas as pd
```

```python
# Field layout of each est{YY}-la.txt file, in order (26 whitespace-separated fields)
SAIPE_COLUMNS = [
    "FIPS State Code", "FIPS county code",
    "Estimate of people of all ages in poverty",
    "90% confidence interval lower bound of estimate of people of all ages in poverty",
    "90% confidence interval upper bound of estimate of people of all ages in poverty",
    "Estimated percent of people of all ages in poverty",
    "90% confidence interval lower bound of estimate of percent of people of all ages in poverty",
    "90% confidence interval upper bound of estimate of percent of people of all ages in poverty",
    "Estimate of people age 0-17 in poverty",
    "90% confidence interval lower bound of estimate of people age 0-17 in poverty",
    "90% confidence interval upper bound of estimate of people age 0-17 in poverty",
    "Estimated percent of people age 0-17 in poverty",
    "90% confidence interval lower bound of estimate of percent of people age 0-17 in poverty",
    "90% confidence interval upper bound of estimate of percent of people age 0-17 in poverty",
    "Estimate of related children age 5-17 in families in poverty",
    "90% confidence interval lower bound of estimate of related children age 5-17 in families in poverty",
    "90% confidence interval upper bound of estimate of related children age 5-17 in families in poverty",
    "Estimated percent of related children age 5-17 in families in poverty",
    "90% confidence interval lower bound of estimate of percent of related children age 5-17 in families in poverty",
    "90% confidence interval upper bound of estimate of percent of related children age 5-17 in families in poverty",
    "Estimate of median household income",
    "90% confidence interval lower bound of estimate of median household income",
    "90% confidence interval upper bound of estimate of median household income",
    "State or county name", "Two-letter Postal State abbreviation", "Tag",
]

# Columns kept for the income and poverty analysis
KEEP_COLUMNS = [
    "FIPS county code", "State or county name",
    "Estimate of people of all ages in poverty",
    "Estimated percent of people of all ages in poverty",
    "Estimate of median household income",
    "90% confidence interval lower bound of estimate of median household income",
    "90% confidence interval upper bound of estimate of median household income",
    "Year",
]

# Read one file per year (est10-la.txt through est18-la.txt)
yearly_frames = []
for yy in range(10, 19):
    year_df = pd.read_csv(f"./est{yy}-la.txt",
                          sep=r"\s+",       # any run of whitespace separates fields
                          header=None,
                          skiprows=1,       # skip the first line (loaded data starts at county code 1)
                          index_col=False,  # don't use the first column as the row index
                          names=SAIPE_COLUMNS)
    year_df["Year"] = 2000 + yy             # integer year for every file, so the column has one dtype
    yearly_frames.append(year_df[KEEP_COLUMNS])

# Combine all years into a single dataset
MedianIncome_df = pd.concat(yearly_frames, ignore_index=True)
```
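The year column should carry one integer dtype across all files. A small, self-contained sketch of what happens otherwise: if one frame stores the year as an int and another as a string (for example one built with '20' + str(i)), `pd.concat` produces an object column in which the string "2011" never equals the integer 2011, so filtering by year silently drops rows. The two one-row frames below are made-up illustrations, not rows from the SAIPE files.

```python
import pandas as pd

# Two made-up one-row frames: one stores the year as an int, the other as a string
frame_2010 = pd.DataFrame({"Parish": ["AcadiaParish"], "Year": [2010]})
frame_2011 = pd.DataFrame({"Parish": ["AcadiaParish"], "Year": ["2011"]})

mixed = pd.concat([frame_2010, frame_2011], ignore_index=True)
print(mixed["Year"].dtype)            # object: ints and strings now share one column
print((mixed["Year"] == 2011).sum())  # 0: the string "2011" never equals the int 2011

# Converting the column to a single numeric dtype makes year filters behave as expected
mixed["Year"] = pd.to_numeric(mixed["Year"])
print((mixed["Year"] == 2011).sum())  # 1
```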



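```python
# Give the kept columns shorter, readable names. Renaming by name (rather than
# assigning a positional list to .columns) keeps every label on the right data.
MedianIncome_df = MedianIncome_df.rename(columns={
    "FIPS county code": "Parish Code",
    "State or county name": "Parish",
    "Estimate of people of all ages in poverty": "Num in Poverty",
    "Estimated percent of people of all ages in poverty": "Pct in Poverty",
    "Estimate of median household income": "Median Household Income",
    "90% confidence interval lower bound of estimate of median household income": "Med. Income Lower Bound",
    "90% confidence interval upper bound of estimate of median household income": "Med. Income Upper Bound",
})
MedianIncome_df
```

| | Parish Code | Parish | Num in Poverty | Pct in Poverty | Median Household Income | Med. Income Lower Bound | Med. Income Upper Bound | Year |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | AcadiaParish | 12760 | 21.0 | 36814 | 33728 | 39900 | 2010 |
| 1 | 3 | AllenParish | 4374 | 20.4 | 35711 | 33319 | 38103 | 2010 |
| 2 | 5 | AscensionParish | 13622 | 12.8 | 62069 | 57154 | 66984 | 2010 |
| 3 | 7 | AssumptionParish | 3858 | 16.7 | 43503 | 39810 | 47196 | 2010 |
| 4 | 9 | AvoyellesParish | 8350 | 21.6 | 31523 | 29209 | 33837 | 2010 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 571 | 119 | WebsterParish | 9764 | 26.0 | 35070 | 31143 | 38997 | 2018 |
| 572 | 121 | WestBatonRougeParish | 3299 | 12.8 | 58205 | 53003 | 63407 | 2018 |
| 573 | 123 | WestCarrollParish | 2455 | 23.3 | 39332 | 35467 | 43197 | 2018 |
| 574 | 125 | WestFelicianaParish | 2470 | 24.4 | 60296 | 53961 | 66631 | 2018 |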
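A cheap way to catch mislabeled columns after renaming is to check the values against what the labels promise: a median income estimate should fall inside its own 90% interval, and a percentage should lie between 0 and 100. The helper `label_problems` is a hypothetical sketch, not part of the original notebook. The sample values are the Acadia Parish 2010 row from the output above, shown once under the intended labels and once with the labels shifted the way a misordered positional `.columns` assignment would shift them.

```python
import pandas as pd


def label_problems(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose values contradict what the column labels promise."""
    bad_bounds = ~((df["Med. Income Lower Bound"] <= df["Median Household Income"])
                   & (df["Median Household Income"] <= df["Med. Income Upper Bound"]))
    bad_pct = ~df["Pct in Poverty"].between(0, 100)
    return df[bad_bounds | bad_pct]


# Acadia Parish, 2010, with each value under its intended label
correct = pd.DataFrame({
    "Num in Poverty": [12760], "Pct in Poverty": [21.0],
    "Median Household Income": [36814],
    "Med. Income Lower Bound": [33728], "Med. Income Upper Bound": [39900],
})

# The same values with the labels shifted, as a misordered positional rename would do
shifted = pd.DataFrame({
    "Median Household Income": [12760], "Num in Poverty": [21.0],
    "Pct in Poverty": [36814],
    "Med. Income Upper Bound": [33728], "Med. Income Lower Bound": [39900],
})

print(len(label_problems(correct)))  # 0
print(len(label_problems(shifted)))  # 1
```

A check like this could run right after the rename cell, e.g. `label_problems(MedianIncome_df)`, and should return an empty frame when every label sits on the right data.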
