## Week 5 Assignment - Midterm
### Race as Vulnerability: The Relationship between Policing and Gentrification
#### John Parks, Joshua Claxton and Miles Cressy

The aim for this midterm is to demonstrate our progress in assessing the relationship between policing and gentrification.

Import various packages for data manipulation/cleaning and visualization.

In [None]:
# standard data manipulation
import pandas as pd

# geospatial data manipulation/visualization
import geopandas as gpd
import plotly.express as px

# advanced graph visualization
import seaborn as sns
import matplotlib.pyplot as plt

# To ignore warnings
import warnings
warnings.filterwarnings('ignore')

### Assessing Gentrification

Before we delve into the policing data, we'll attempt to define gentrification by layering several housing and demographic metrics for the greater Los Angeles area and tracking how they shift over time. We will primarily build this foundation through three lenses: rent burden and shifts in rent prices, changes in the demographics of renter-occupied households, and the concentration of building permits. The range of candidate metrics is broad, and as our team approaches the final project, our selection of variables will shift and adapt as we develop a deeper understanding of gentrification.
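One way to "layer" metrics, sketched here purely as an assumption, is to standardize each indicator as a z-score and average the z-scores into a single index per tract. The column names (`rent_change_pct`, `rent_burden`, `permits_per_1k_units`), the toy values, the equal weights, and the direction of each metric are all hypothetical, not choices the project has made.

```python
import pandas as pd

# Hypothetical tract-level indicators; these column names and values are illustrative only.
toy = pd.DataFrame({
    'fips': ['06037000100', '06037000200', '06037000300'],
    'rent_change_pct': [5.0, 20.0, 35.0],
    'rent_burden': [60.0, 50.0, 40.0],
    'permits_per_1k_units': [1.0, 4.0, 7.0],
})

metrics = ['rent_change_pct', 'rent_burden', 'permits_per_1k_units']

# Standardize each metric (z-score, sample std) so differently scaled indicators can be combined.
z = (toy[metrics] - toy[metrics].mean()) / toy[metrics].std()

# Equal weights and "higher = more gentrification" signs are assumptions, not project decisions.
toy['composite_index'] = z.mean(axis=1)
print(toy[['fips', 'composite_index']])
```

In practice, any metric where a high value should lower the index would be multiplied by -1 before averaging.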

#### Demographic Shifts in Renter-occupied Households

In [None]:
from IPython.display import display  # a cell only echoes its last expression
display(df_race_2021)
display(df_race_2017)

#### Rent Burden and Rent Price Shifts

Rent burden is defined broadly as the percentage of renter-occupied households that pay 30% or more of household income toward rent. This indicator helps show which households are able to absorb the pressures of gentrification and which are most susceptible to them. Minor cleaning was done before importing this data.
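As a minimal sketch of this definition, assuming tract-level counts of all renter households and of renter households paying 30% or more of income on rent (the column names `renter_households` and `renters_30pct_plus` are hypothetical), rent burden is the second count divided by the first, expressed as a percentage:

```python
import pandas as pd

# Hypothetical tract-level counts; column names are illustrative, not from the source CSV.
toy_tracts = pd.DataFrame({
    'fips': ['06037101110', '06037101122', '06037101210'],
    'renter_households': [400, 250, 0],
    'renters_30pct_plus': [180, 50, 0],  # renter households paying >= 30% of income on rent
})

# Tracts with no renter households get NaN instead of a misleading 0.
denom = toy_tracts['renter_households'].where(toy_tracts['renter_households'] > 0)
toy_tracts['rent_burden'] = toy_tracts['renters_30pct_plus'] * 100 / denom

print(toy_tracts[['fips', 'rent_burden']])
```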

In [None]:
# import rent burden csv file
df_rent_burden = pd.read_csv('rent_burden_race_2021.csv', dtype = {'Geo_FIPS':str})
# fips column does not have a leading zero, let's add that here
df_rent_burden['Geo_FIPS'] = df_rent_burden['Geo_FIPS'].str.zfill(11)
# inspect rent burden data frame
df_rent_burden.head()
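For reference, a census tract FIPS code (GEOID) has 11 digits: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. California's state code is 06, so storing the code as a number drops the leading zero. The sketch below, using illustrative codes only, shows why `str.zfill(11)` is a safe fix: it pads short codes and leaves complete ones unchanged.

```python
import pandas as pd

# Illustrative codes: one missing its leading zero, one already complete.
raw = pd.Series(['6037101110', '06037101122'])
fips = raw.str.zfill(11)  # left-pad with zeros to 11 characters

# Split the padded code into state (2), county (3), and tract (6) components.
parts = pd.DataFrame({
    'state': fips.str[:2],
    'county': fips.str[2:5],
    'tract': fips.str[5:],
})
print(parts)
```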

In [None]:
df_rent_burden.info()

In [None]:
df_rent_burden.describe()

Where rent burden is 0, the census tract most likely has a total population of 0. Let's investigate the remaining 20 observations, where rent burden is 0 but the total population is non-zero.

In [None]:
# isolate observations where rent burden is 0 but total_pop has a non-zero value.
df_rent_burden[(df_rent_burden['rent_burden'] == 0) & (df_rent_burden['total_pop'] != 0)]
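A possible next step, sketched here on made-up rows, is to mark rent burden as missing rather than 0 for tracts with no population, so those tracts do not pull down summary statistics or the boxplot, and then isolate the zeros that remain in populated tracts. The column `rent_burden_clean` is a hypothetical name.

```python
import pandas as pd

# Made-up rows; 'rent_burden' and 'total_pop' mirror the notebook's column names.
toy = pd.DataFrame({
    'Geo_FIPS': ['06037000100', '06037000200', '06037000300'],
    'total_pop': [0, 3500, 4200],
    'rent_burden': [0.0, 0.0, 52.3],
})

# A tract with no population has no meaningful rent burden: replace it with NaN.
toy['rent_burden_clean'] = toy['rent_burden'].mask(toy['total_pop'] == 0)

# Zeros left in populated tracts are the cases worth checking by hand.
suspicious = toy[(toy['rent_burden_clean'] == 0) & (toy['total_pop'] > 0)]
print(suspicious)
```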

In [None]:
sns.boxplot(data=df_rent_burden, x='rent_burden')

#### Building Permit

In [None]:
df_2021 = pd.read_csv('rent_burden_ct_1.csv')

In [None]:
# df_2021 =   (building permit processing not yet written; commented out so the cell runs)

In [None]:
df_gentrification_2021 = pd.read_csv('2021_gentrification.csv', dtype = {'fips':str})
df_gentrification_2017 = pd.read_csv('2017_gentrification.csv', dtype = {'fips':str})

In [None]:
df_gentrification=df_gentrification_2021.merge(df_gentrification_2017,on="fips")
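The default inner merge silently drops any tract that appears in only one year, which can happen if the two files use different tract vintages. A sketch of how to check this, using toy frames in place of the real CSVs: an outer merge with `indicator=True` labels each row `both`, `left_only`, or `right_only`, and `validate='one_to_one'` raises an error if a `fips` code is duplicated in either frame.

```python
import pandas as pd

# Toy stand-ins for the 2021 and 2017 CSVs.
toy_2021 = pd.DataFrame({'fips': ['6037101110', '6037101122', '6037101210'],
                         'income_rent_ratio_2021': [0.31, 0.28, 0.40]})
toy_2017 = pd.DataFrame({'fips': ['6037101110', '6037101122', '6037101300'],
                         'income_rent_ratio_2017': [0.29, 0.30, 0.35]})

# Outer merge keeps unmatched tracts and records where each row came from in '_merge'.
check = toy_2021.merge(toy_2017, on='fips', how='outer',
                       indicator=True, validate='one_to_one')
print(check['_merge'].value_counts())

# Keep only tracts present in both years (equivalent to the default inner merge).
both = check[check['_merge'] == 'both'].drop(columns='_merge')
```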

In [None]:
# drop tracts with zero income or zero rent in either year
ratio_cols = ['income_2021', 'income_2017', 'rent_2021', 'rent_2017']
df_gentrification = df_gentrification[(df_gentrification[ratio_cols] != 0).all(axis=1)]
df_gentrification

In [None]:
df_gentrification['gentrification'] = df_gentrification.income_rent_ratio_2021-df_gentrification.income_rent_ratio_2017
df_gentrification

In [None]:
# restore the leading zero missing from the CSV (California FIPS codes start with 06)
df_gentrification['fips'] = df_gentrification['fips'].str.zfill(11)
df_gentrification

In [None]:
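tracts = gpd.read_file('Census_Tracts_2020.geojson')
tracts['fips'] = '06' + '037' + tracts['CT20']  # 11-digit FIPS: state 06 + county 037 + tract
tracts = tracts[['geometry', 'fips']]
tracts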

In [None]:
gdf_gentrification =df_gentrification.merge(tracts,on="fips")
gdf_gentrification

In [None]:
gdf_gentrification.gentrification.describe()

In [None]:
gdf_gentrification = gpd.GeoDataFrame(gdf_gentrification, 
                       geometry='geometry')

In [None]:
fig = px.choropleth_mapbox(gdf_gentrification, geojson=gdf_gentrification.geometry, locations=gdf_gentrification.index, color='gentrification',
                           color_continuous_scale="viridis",
                           range_color=(-.6, .7),
                           mapbox_style="carto-positron",
                           zoom=8, center = {"lat": 34.14218, "lon": -118.28411},
                           opacity=0.5)
fig.show()

In [None]:
df_arrest = pd.read_csv('Arrest_Data_from_2020_to_Present.csv')
df_arrest.head()

In [None]:
df_arrest.info()

In [None]:
df_arrest = df_arrest[df_arrest['Charge Description'].notna()]

In [None]:
df_arrest['Charge Group Description'].value_counts()

In [None]:
pd.set_option('display.max_columns', 500)
df_arrest[df_arrest['Charge Group Description'] == 'Miscellaneous Other Violations'].value_counts('Charge Description')                          

In [None]:
property_charges = ['REFUSE TO LEAVE PROPERTY UPON REQUEST P.O.',
                    'REFUSE TO LEAVE PROPERTY UPON REQST OWNER',
                    'TRESPASSING LANDS UNDER CULTIVATION']
df_arrest_property = df_arrest[df_arrest['Charge Description'].isin(property_charges)]

In [None]:
df_arrest_property.shape
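Exact string matching only captures the three charge descriptions listed above. If other trespass or refusal-to-leave variants exist in the data, a keyword match would catch them. The sketch below assumes the keywords `REFUSE TO LEAVE` and `TRESPASS` are appropriate, which should be checked against the full `value_counts()` output before use; the charge strings are illustrative.

```python
import pandas as pd

# Illustrative charge descriptions; real values come from 'Charge Description'.
charges = pd.Series([
    'REFUSE TO LEAVE PROPERTY UPON REQUEST P.O.',
    'REFUSE TO LEAVE PROPERTY UPON REQST OWNER',
    'TRESPASSING LANDS UNDER CULTIVATION',
    'PETTY THEFT',
    None,
])

# Keyword (regex) match; na=False treats missing descriptions as non-matches.
pattern = r'REFUSE TO LEAVE|TRESPASS'
is_property = charges.str.contains(pattern, case=False, regex=True, na=False)
print(charges[is_property].tolist())
```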

In [None]:
# let's visualize the data frame spatially by converting it to a GeoDataFrame
# (assumes LAT/LON are WGS84 degrees, the same CRS as the tract GeoJSON)
gdf_arrest_property = gpd.GeoDataFrame(df_arrest_property,
                       geometry=gpd.points_from_xy(df_arrest_property['LON'], df_arrest_property['LAT']),
                       crs='EPSG:4326')

In [None]:
gdf_arrest_property = gdf_arrest_property[gdf_arrest_property['LON'] != 0]
gdf_arrest_property.plot(figsize = (20,20), markersize = 1)

In [None]:
# re-import rent burden csv file but this time with Geo_FIPS as a string
df = pd.read_csv('rent_burden_ct_1.csv', dtype = {'Geo_FIPS':str})
# fips column does not have a leading zero, let's add that here
df['Geo_FIPS'] = df['Geo_FIPS'].str.zfill(11)
# import census tract spatial data
tracts=gpd.read_file('Census_Tracts_2020.geojson')
# create FIPS column
tracts['fips'] ='06' + '037' + tracts['CT20']
tracts = tracts[['geometry','fips']]
# rename column
df = df.rename({'Geo_FIPS': 'fips'}, axis=1)
# create a new dataframe based on the join
df_tracts=tracts.merge(df,on="fips")

In [None]:
df_tracts

In [None]:
df_neighborhoods

In [None]:
# spatial join: attach each arrest point to the census tract (and its rent burden data) it falls within
df = gpd.sjoin(gdf_arrest_property, df_tracts)
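`gpd.sjoin` performs a point-in-polygon join: each arrest point picks up the attributes of the tract polygon it falls in. The toy sketch below (hypothetical square tracts and points) shows the join with `predicate='within'` and then counts arrests per tract, a figure that could later be compared with tract-level gentrification measures. Both layers need the same CRS for the join to be meaningful.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square "tracts" and three hypothetical arrest points, in lon/lat.
toy_tracts = gpd.GeoDataFrame(
    {'fips': ['06037000100', '06037000200']},
    geometry=[box(-118.30, 34.00, -118.20, 34.10), box(-118.20, 34.00, -118.10, 34.10)],
    crs='EPSG:4326',
)
toy_arrests = gpd.GeoDataFrame(
    {'arrest_id': [1, 2, 3]},
    geometry=[Point(-118.25, 34.05), Point(-118.15, 34.05), Point(-118.22, 34.02)],
    crs='EPSG:4326',
)

# Attach the containing tract's attributes to each point.
joined = gpd.sjoin(toy_arrests, toy_tracts, how='left', predicate='within')

# Count arrests per tract, filling 0 for tracts with no arrests.
counts = (joined.groupby('fips').size()
          .reindex(toy_tracts['fips'], fill_value=0)
          .rename('arrest_count'))
print(counts)
```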

In [None]:
df.plot(markersize=2)

In [None]:
# plot property-related arrests, colored by the rent burden of the census tract each arrest falls in
fig = px.scatter_mapbox(df,
                        lat='LAT',
                        lon='LON',
                        mapbox_style="carto-positron",
                        color = 'rent_burden')

# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Property-Related Arrests by Tract Rent Burden",
    title_x=0.5, # aligns title to center
)

fig.show()


In [None]:
# assess the renter population by race and assign each tract the category with the largest count
# (a plurality, not necessarily a majority); ties fall through to 'Other'
def categorise_race_tracts(row):
    if row['black_alone'] > row['white_alone'] and row['black_alone'] > row['hispanic'] and row['black_alone'] > row['other'] and row['black_alone'] > row['asian_alone']:
        return 'Black'
    elif row['hispanic'] > row['white_alone'] and row['hispanic'] > row['black_alone'] and row['hispanic'] > row['other'] and row['hispanic'] > row['asian_alone']:
        return 'Hispanic'
    elif row['white_alone'] > row['hispanic'] and row['white_alone'] > row['black_alone'] and row['white_alone'] > row['other'] and row['white_alone'] > row['asian_alone']:
        return 'White'
    elif row['asian_alone'] > row['hispanic'] and row['asian_alone'] > row['black_alone'] and row['asian_alone'] > row['other'] and row['asian_alone'] > row['white_alone']:
        return 'Asian'
    return 'Other'

In [None]:
# iterate on dataframe and create new column by applying created race function
df_tracts['majority_race'] = df_tracts.apply(categorise_race_tracts, axis=1)
# assess results
df_tracts.value_counts('majority_race')
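The row-wise function can also be expressed with vectorized pandas operations, sketched below on made-up counts. `idxmax(axis=1)` returns the column with the largest count in each row; because the function above sends ties to `'Other'`, the sketch also flags rows where more than one group shares the maximum. The label is a plurality, so it does not imply that the group exceeds 50% of renters.

```python
import pandas as pd

# Made-up renter counts by race for three tracts (column names match the notebook).
toy = pd.DataFrame({
    'black_alone': [500, 100, 200],
    'hispanic': [300, 900, 200],
    'white_alone': [100, 200, 150],
    'asian_alone': [50, 100, 80],
    'other': [10, 20, 30],
})

labels = {'black_alone': 'Black', 'hispanic': 'Hispanic', 'white_alone': 'White',
          'asian_alone': 'Asian', 'other': 'Other'}
counts = toy[list(labels)]

# Column with the largest count per row, mapped to a readable label.
top = counts.idxmax(axis=1).map(labels)

# Rows where two or more groups tie for the maximum are labeled 'Other'.
is_tie = counts.eq(counts.max(axis=1), axis=0).sum(axis=1) > 1
toy['majority_race'] = top.where(~is_tie, 'Other')
print(toy['majority_race'].tolist())
```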

In [None]:
gdf = gpd.read_file('Census_Tracts_2020.geojson')

In [None]:
gdf

In [None]:
gdf.plot()