## Week 5 Assignment - Midterm
### Race as Vulnerability: The Relationship between Policing and Gentrification
#### John Parks, Joshua Claxton and Miles Cressy

The aim of this midterm is to demonstrate our progress in assessing the relationship between policing and gentrification. We will do so by first exploring various metrics that contribute to or inform the process of gentrification. Although there are numerous definitions of gentrification, we have built a rough framework that defines gentrification as the displacement of minority communities as a result of development forces. To demonstrate this working definition, we will delve into demographic shifts of renter-occupied households, development patterns through the filing of building permits, and rent price changes and rent burden percentages by race. These metrics may change as we progress toward our final project, but we hope that evaluating them gets us closer to laying the foundation of our understanding of gentrification.

Following the groundwork we lay for evaluating gentrification, we will dive into various forms of policing data and evaluate the relationship between policing and communities that are susceptible to gentrifying forces.  

__________________________________________________________________________________________________________________________________________________________________________________________________

Import various packages for data manipulation/cleaning and visualization.

In [None]:
# standard data manipulation
import pandas as pd
import numpy as np
# geospatial data manipulation/visualization
import geopandas as gpd
import plotly as pl
import plotly.express as px
import plotly.graph_objects as go
import libpysal as lp
import contextily as ctx
import json
import plotly.offline as pyo

# advanced graph visualization
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import LinearLocator

# To ignore warnings
import warnings
warnings.filterwarnings('ignore')

### Part A: Assessing Gentrification

Before we begin to overlay policing and gentrification data, we need to develop an understanding and a way to evaluate gentrification. This is a difficult concept to precisely measure but we will attempt to do so by building off established definitions of gentrification and evaluating metrics that correlate with these definitions. For this midterm, we assess demographic shifts, building permit issuance counts and rent price shifts.

#### A1: Demographic Shifts in Renter-occupied Households

Our first step is to bring in race data from 2017 and 2021, conducting some pre-cleaning as we bring them into the notebook. These values are defined by census tract for the entirety of LA County, but we want to analyze this data within the context of neighborhoods within the boundary of LA city. To do this, we will bring in an LA neighborhood geojson as well.

In [None]:
# import demographic csv file data from 2017 Census, rename FIPS column for merge and add a leading zero.
tracts_race_17 = pd.read_csv('data/race_2017.csv', dtype = {'fips':str})
tracts_race_17.rename(columns={'fips': 'FIPS'}, inplace=True)
tracts_race_17['FIPS'] = tracts_race_17['FIPS'].str.zfill(11)

# import demographic csv file data from 2021 Census, rename some columns for merge and add a leading zero.
tracts_race_21 = pd.read_csv('data/rent_burden_ct_1.csv',dtype = {'Geo_FIPS':str})
tracts_race_21.rename(columns={'total_housing_units': 'total','white_alone':'white','black_alone':'black','asian_alone':'asian' }, inplace=True)
tracts_race_21['FIPS'] = tracts_race_21['Geo_FIPS'].str.zfill(11)

# import LA neighborhoods from ArcGIS
nbh = gpd.read_file("https://services5.arcgis.com/7nsPwEMP38bSkCjy/arcgis/rest/services/LA_Times_Neighborhoods/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
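
To illustrate why the FIPS column is read as a string and padded, here is a minimal sketch using a made-up two-row CSV (the `raw` string and its values are hypothetical, not taken from our data files). It shows how a numeric read drops the leading zero of a California tract code and how `zfill(11)` restores it.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; California tract FIPS codes start with "06".
raw = "fips,total\n06037101110,120\n06037101122,85\n"

# Default parsing treats the column as an integer, so the leading zero is lost.
as_int = pd.read_csv(io.StringIO(raw))

# Reading the column as a string keeps the 11-character code intact.
as_str = pd.read_csv(io.StringIO(raw), dtype={"fips": str})

# zfill(11) repairs codes that already lost their leading zero.
repaired = as_int["fips"].astype(str).str.zfill(11)

print(as_int["fips"].tolist())
print(repaired.tolist())
```

Padding after a string read is harmless when the zero is already present, which is why the cell above applies both steps.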

In [None]:
#check 2017 dataframe
tracts_race_17.head()

In [None]:
#check 2021 dataframe
tracts_race_21.head()

Our two dataframes contain census tract identifiers, along with values for each category of race for renter-occupied households. We will subset our race dataframes to only contain the major racial groups as this will simplify our analysis but keep our results and conclusions nearly the same.

In [None]:
#subset dataframes and conduct additional review
tracts_race_17 = tracts_race_17[['FIPS','total','white','black','asian','hispanic']]
tracts_race_21 = tracts_race_21[['FIPS','total','white','black','asian','hispanic']]

In [None]:
#review subsetted dataframe
tracts_race_17.info()

In [None]:
#review subsetted dataframe
tracts_race_21.info()

It is important to note that the Census was conducted in 2020 and tract boundaries were redrawn afterward, so some tracts were modified and others changed completely. We wanted enough historical coverage to view demographic shifts, but this comes at the cost of losing observations for census tracts that do not match across the two years.

In [None]:
#merge census tract dataframes from both years on FIPS, then review data
race_merge = tracts_race_17.merge(tracts_race_21, on='FIPS',
                                  suffixes=('_2017', '_2021'),
                                 indicator=True)
race_merge.describe()
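
Because `merge` defaults to an inner join, tracts present in only one year are dropped without notice. The sketch below uses two small hypothetical frames (the FIPS codes and totals are invented) and an outer merge with `indicator=True` to count matched and unmatched tracts before keeping only the matches.

```python
import pandas as pd

# Hypothetical tract tables for two years; only two codes overlap.
tracts_a = pd.DataFrame({"FIPS": ["06037000001", "06037000002", "06037000003"],
                         "total": [10, 20, 30]})
tracts_b = pd.DataFrame({"FIPS": ["06037000002", "06037000003", "06037000004"],
                         "total": [11, 21, 41]})

# An outer merge keeps every tract; the _merge column records where each row came from.
audit = tracts_a.merge(tracts_b, on="FIPS", how="outer",
                       suffixes=("_2017", "_2021"), indicator=True)
counts = audit["_merge"].value_counts()
print(counts)

# Keep only tracts found in both years, mirroring the inner merge.
matched = audit[audit["_merge"] == "both"].drop(columns="_merge")
```

The `left_only` and `right_only` counts give a direct measure of how many observations are lost to mismatched tract identifiers.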

After merging to ensure that all similar census tracts were accounted for, we will separate the dataframe by year, add a year column and then append them together in order to create a longer (rather than a wider) dataframe.

In [None]:
# now disaggregate to make the dataset long instead of wide to prepare for appending
ct_21 = race_merge[['FIPS','total_2021','white_2021','black_2021', 'asian_2021', 'hispanic_2021']]
# rename returns a new dataframe, which avoids modifying a slice of race_merge in place
ct_21 = ct_21.rename(columns={'total_2021': 'total','white_2021':'white','black_2021':
                               'black','asian_2021':'asian','hispanic_2021':'hispanic' })
ct_17 = race_merge.drop(columns=['_merge','total_2021','white_2021','black_2021', 'asian_2021', 'hispanic_2021'])
ct_17 = ct_17.rename(columns={'total_2017': 'total','white_2017':'white','black_2017':
                               'black','asian_2017':'asian','hispanic_2017':'hispanic' })

In [None]:
# add column for year for each demographic dataset
ct_17['year']=2017
ct_21['year']=2021
# demographic information is ready to be appended (DataFrame.append was removed in pandas 2.0, so we use pd.concat)
ct_race = pd.concat([ct_17, ct_21])
# 2017 and 2021 census tract data for understanding demographic shifts, with each shared census tract having two observations (one for 2017 and one for 2021)
# sort_values returns a new dataframe, so assign the result
ct_race = ct_race.sort_values(by=['year','FIPS'])
ct_race
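
The wide-to-long reshape above can also be done in one call. The following is a minimal sketch with `pd.wide_to_long` on a hypothetical two-tract frame (the column values are invented); it is an alternative under the assumption that columns follow the `name_year` pattern, not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical wide table: one row per tract, one column per variable and year.
wide = pd.DataFrame({
    "FIPS": ["06037000001", "06037000002"],
    "total_2017": [10, 20], "black_2017": [4, 5],
    "total_2021": [12, 18], "black_2021": [3, 6],
})

# Split "total_2017" into the stub "total" and the suffix 2017, stacking years as rows.
long_df = (
    pd.wide_to_long(wide, stubnames=["total", "black"], i="FIPS", j="year", sep="_")
    .reset_index()
    .sort_values(["year", "FIPS"])
    .reset_index(drop=True)
)
print(long_df)
```

Each tract now has one row per year, which is the shape the later `groupby(['year','name'])` step expects.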

Having organized our data appropriately, we will now add in geospatial information so that we can later spatially join the census tracts into the LA neighborhood geodataframe. This will entail bringing in census tract data, adjusting the columns to match our working dataframe and then a merge to join the data together.

In [None]:
# merge in census spatial information
tracts=gpd.read_file('data/Census_Tracts_2020.geojson')
# subset for only relevant information
tracts = tracts[['CT20','geometry']]
# create new column using CT20 column that allows us to merge on FIPS
tracts['FIPS'] ='06' + '037' + tracts['CT20']
# merge now that we have a shared column
spatial_race=tracts.merge(ct_race, on='FIPS')
# double check this worked correctly
spatial_race.info()

In [None]:
# explore neighborhoods
nbh.head()

In [None]:
# check coordinate system to ensure they match on join
nbh.crs

In [None]:
# prepare demographic data for spatial join with neighborhoods
# first convert census tract polygons to point data (centroids computed in an equal-area projection)
spatial_race = spatial_race.to_crs(4326)
spatial_race['Centroid']=spatial_race.to_crs('+proj=cea').centroid.to_crs(spatial_race.crs)
# make the centroid the active geometry, then drop the tract polygons
spatial_race_1 = spatial_race.set_geometry('Centroid').drop(columns=['CT20','geometry'])
spatial_race_1 = spatial_race_1.rename_geometry('geometry')
spatial_race_1.head(20)

Now that the race data is cleaned and prepped, we will run a spatial join on the race data (now with census tract centroids) and the neighborhood polygon data.

In [None]:
# spatial join
nbh_race = nbh.sjoin(spatial_race_1, how="left")
# sort_values returns a new dataframe, so assign the result
nbh_race = nbh_race.sort_values(by=['FIPS','year'])
nbh_race.head()

Now that we've successfully joined the dataframes, let's organize our new geodataframe, create new columns based on the metric we're interested in analyzing and then sort by those new columns.

In [None]:
# collapse by year and neighborhood
nbhr = nbh_race.groupby(['year','name'],
                                           as_index=False).agg(Geometry = ('geometry','first'),
                                                               Total = ('total','sum'),
                                                               White = ('white','sum'),
                                                               Black = ('black','sum'),
                                                               Asian = ('asian','sum'),
                                                               Hispanic = ('hispanic','sum')
                                                              )
nbhr = nbhr.sort_values(by=['name','year'])
nbhr.head(20)

Now that our geodataframe is organized appropriately, we will generate new columns based on white and black demographic shifts. We express each group's 2021 renter population as a percent of its 2017 value, then generate a new column that evaluates the differential between the white and black renter population shifts as a way to measure the neighborhoods experiencing the greatest change. Once the shifts are computed we no longer have use for the 2017 rows, so we will drop them along with any 2021 rows that have missing data. We will then sort our dataframe by the differential and produce a bar chart that visualizes this.

In [None]:
# row calculations: 2021 renter-occupied households as a percent of the 2017 value for each neighborhood (100 = no change)
nbhr['delta_b'] = nbhr['Black'].div(nbhr.groupby('name')['Black'].shift())
nbhr['delta_b_perc'] = nbhr['delta_b']*100
nbhr['delta_w'] = nbhr['White'].div(nbhr.groupby('name')['White'].shift())
nbhr['delta_w_perc'] = nbhr['delta_w']*100
# add column which accounts for the % change in demographic shift between white and black renters 
nbhr['Percent Differential'] = nbhr.delta_w_perc - nbhr.delta_b_perc
#check column
nbhr
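
To make the arithmetic concrete, here is a small worked sketch of the same `groupby(...).shift()` pattern on two hypothetical neighborhoods (the names and counts are invented). A value of 100 means no change, below 100 means the group shrank, and the differential is the white value minus the black value.

```python
import pandas as pd

# Hypothetical renter household counts for two neighborhoods in 2017 and 2021.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "year": [2017, 2021, 2017, 2021],
    "Black": [1000, 800, 400, 500],
    "White": [500, 650, 300, 300],
}).sort_values(["name", "year"])

# shift() within each neighborhood lines the 2017 value up with the 2021 row.
prev = df.groupby("name")[["Black", "White"]].shift()
df["delta_b_perc"] = df["Black"] / prev["Black"] * 100
df["delta_w_perc"] = df["White"] / prev["White"] * 100
df["Percent Differential"] = df["delta_w_perc"] - df["delta_b_perc"]

# Only the 2021 rows carry a value; 2017 rows have no earlier year to compare against.
result = df[df["year"] == 2021].set_index("name")
print(result[["delta_b_perc", "delta_w_perc", "Percent Differential"]])
```

Neighborhood A loses black renters (80% of 2017) while gaining white renters (130%), giving a differential of 50; neighborhood B moves the other way and gets a negative differential.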

In [None]:
# drop 2017 values as we've already captured the difference between the two years via the 'Percent Differential' column.
# this step is separate as we need to utilize this dataframe to subset our permit dataframe in the next exercise
nbhr_2021 = nbhr[nbhr.year == 2021].reset_index(drop = True)
nbhr_2021

Before visualizing our data, we want to isolate neighborhoods that have a relatively high population of black renters and demonstrated a net emigration of black renters. These neighborhoods will be the focus of our analysis going forward as we attempt to lay the groundwork for gentrification.

In [None]:
# drop null 2021 values
nbhr_trim = nbhr_2021[nbhr_2021['delta_b'].notna()]

# select for the black communities that are populous and where black renters have moved or been moved out between 2017 and 2021
nbhr_trim_black = nbhr_trim[(nbhr_trim['Black']>500) & (nbhr_trim['delta_b_perc']<100)]
nbhr_trim_black = nbhr_trim_black.sort_values('Percent Differential', ascending = False)
nbhr_trim_black.head(20)

In [None]:
# let's plot our findings from above via a bar graph, with the left bar indicating black renter population shift and the right bar indicating white renter population shift
fig = go.Figure()

# set visual style
sns.set(style="darkgrid")

fig.add_trace(go.Bar(x=nbhr_trim_black['name'],
                y=nbhr_trim_black['delta_b_perc'],
                name='Black emigration',
                marker_color='rgb(158,73,99)',
                     opacity = .8
                ))
fig.add_trace(go.Bar(x=nbhr_trim_black['name'],
                     y=nbhr_trim_black['delta_w_perc'],
                     name='White migration',
                marker_color='rgb(216,139,118)',
                      opacity = .8
                ))
fig.add_hline(y=100, line_dash="dot")

fig.add_annotation(x=26, y=102.5,
            text="in-out migration threshold",
                  showarrow = False)

fig.update_layout(
    title='Demographic shifts in LA neighborhoods, sorted by white immigration to black emigration differentials in descending order.',
    title_y = .9,
    xaxis_tickfont_size=12,
    yaxis=dict(
        title=dict(text='2021 Renter-occupied population as a percent of 2017 value', font=dict(size=14)),
        tickfont_size=12,
    ),
    legend=dict(
        x=1,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 ,# gap between bars of the same location coordinate.
    
    width=1500,
    height=750
)
fig.show()

fig.write_image("data/Demo.png")

In [None]:
# create choropleth map that evaluates the differential seen between the two bars above (expanded for whole dataset rather than just black neighborhoods)
fig = px.choropleth_mapbox(nbhr_2021, geojson=nbhr_2021.Geometry, locations=nbhr_2021.index, color='Percent Differential',
                           mapbox_style="carto-positron",
                           range_color=(-100, 100),
                           zoom=9.15, 
                           center = {"lat": 34.02, "lon": -118.38411},
                           opacity=0.5,
                           hover_name = 'name')

# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Demographic Shifts, from 2017 to 2021 in L.A. (White Immigration vs Black Emigration)",
    title_y=.96,
    title_x=0.455,
) 

fig.show()

Remarks:
- We've created and sorted our dataset to reflect demographic shifts, particularly in white and black renter-occupied households. We defined some parameters to slowly whittle down and isolate black neighborhoods that are demonstrating the ramifications of gentrification.
- We will need to explore and iterate on our approach to see if there are better methods for defining meaningful population shifts (e.g. historical data, renter population by proportions rather than absolute values).
- We are missing some key neighborhoods as a result of unaligned census tract data; we will want to revisit and remedy this going forward.
________________________________________________________________________________________________________________________________________________________________________________________________________________

#### A2: Development - Building Permit Issuance

Now that we've assessed demographic shifts, let's begin layering/combining other metrics that help to inform a definition of gentrification. Let's take a look at building permits from LA City Data in order to assess the degree to which a neighborhood is undergoing development.

In [None]:
# load in csv
df_permit = pd.read_csv('data/Building_Permits_clean')
# we don't need this column right now - we'll drop for the time being
df_permit = df_permit.drop(columns = '# of Residential Dwelling Units')
df_permit.info()

In [None]:
# check size of dataframe
df_permit.shape

This is a very large file, so we will need to subset it in order to manage its size. Let's start by removing null values so we can trim it down.

In [None]:
# view # of missing values
print(df_permit['Latitude/Longitude'].isna().sum())
print(df_permit['Census Tract'].isna().sum())

In [None]:
# drop missing values and re-check to see if we successfully dropped
df_permit = df_permit.dropna(subset=['Latitude/Longitude', 'Census Tract'])
print(df_permit['Latitude/Longitude'].isna().sum())
print(df_permit['Census Tract'].isna().sum())

In [None]:
# we need to parse out the lat/long values into a format that geopandas can join on; we'll split the column 'Latitude/Longitude' into 'lat' and 'long' and convert to numeric
# split column, then add new columns with split results
df_permit[['lat', 'long']] = df_permit['Latitude/Longitude'].str.split(', ', expand=True)
# remove unnecessary characters (regex=False treats the parentheses as literal text)
df_permit["lat"]=df_permit["lat"].str.replace('(','', regex=False)
df_permit["long"]=df_permit["long"].str.replace(')','', regex=False)
# convert string to numeric
df_permit['lat'] = pd.to_numeric(df_permit['lat'])
df_permit['long'] = pd.to_numeric(df_permit['long'])
# drop unused column
df_permit = df_permit.drop(columns = ['Latitude/Longitude'])
df_permit.head()
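
As a compact illustration of the parsing step, the sketch below works on two hypothetical coordinate strings in the same `(lat, long)` layout (the values are invented). It strips the parentheses first and then splits, which is one of several equivalent ways to reach numeric columns.

```python
import pandas as pd

# Hypothetical values in the same "(lat, long)" text layout as the permit file.
s = pd.Series(["(34.0522, -118.2437)", "(33.9416, -118.4085)"],
              name="Latitude/Longitude")

# Strip the surrounding parentheses, then split on the comma into two columns.
coords = s.str.strip("()").str.split(",", expand=True)
coords.columns = ["lat", "long"]

# Remove stray whitespace and convert each column from text to float.
coords = coords.apply(lambda col: pd.to_numeric(col.str.strip()))
print(coords.dtypes)
```

Converting with `pd.to_numeric` raises an error on malformed strings by default, which surfaces bad rows early instead of carrying them into the spatial join.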

We're almost there but now we need to convert the date column to datetime so we can easily filter that dataframe by year. We will then convert these new dataframes to geodataframes so that we can join this data with the LA neighborhood boundaries.

In [None]:
# convert column to datetime, then subset dataframe into two separate dataframes based on the year the permit was issued (2021 vs 2017)
df_permit['Issue Date'] = pd.to_datetime(df_permit['Issue Date'])
# use >= on the lower bound so that permits issued on January 1 are included
df_permit_2021 = df_permit[(df_permit['Issue Date'] >= '2021-01-01') & (df_permit['Issue Date'] < '2022-01-01')]
df_permit_2017 = df_permit[(df_permit['Issue Date'] >= '2017-01-01') & (df_permit['Issue Date'] < '2018-01-01')]
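
When filtering a calendar year, the choice of boundary operator matters for dates that fall exactly on January 1. The sketch below uses four hypothetical issue dates to compare a strict lower bound with a `.dt.year` filter.

```python
import pandas as pd

# Hypothetical issue dates, including both year boundaries.
dates = pd.to_datetime(pd.Series(["2021-01-01", "2021-06-15",
                                  "2021-12-31", "2022-01-01"]))

# A strict ">" lower bound leaves out permits stamped exactly 2021-01-01.
strict = dates[(dates > "2021-01-01") & (dates < "2022-01-01")]

# Filtering on the year component keeps every date in 2021.
by_year = dates[dates.dt.year == 2021]

print(len(strict), len(by_year))
```

The `.dt.year` form is equivalent to the half-open interval `>= '2021-01-01'` and `< '2022-01-01'`.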

In [None]:
# convert dataframes into geodataframes
gdf_permit_2021 = gpd.GeoDataFrame(df_permit_2021,
                         crs='epsg:4326',
                       geometry=gpd.points_from_xy(df_permit_2021['long'], df_permit_2021['lat']))

gdf_permit_2017 = gpd.GeoDataFrame(df_permit_2017,
                         crs='epsg:4326',
                       geometry=gpd.points_from_xy(df_permit_2017['long'], df_permit_2017['lat']))

Considering that one site can have a multitude of permits issued, we will drop duplicates based on lat/long so that we are not counting the same development more than once. Once we've done so, we can then spatially join with the neighborhood boundaries.

In [None]:
# drop duplicates from both dataframes and evaluate how many rows were dropped as a result
print('# of 2021 sites before dropping duplicates:', gdf_permit_2021.shape[0])
gdf_permit_2021 = gdf_permit_2021.drop_duplicates(subset='geometry')
print('# of 2021 sites after dropping duplicates:', gdf_permit_2021.shape[0])

print('# of 2017 sites before dropping duplicates:', gdf_permit_2017.shape[0])
gdf_permit_2017 = gdf_permit_2017.drop_duplicates(subset='geometry')
print('# of 2017 sites after dropping duplicates:', gdf_permit_2017.shape[0])

Quick pause before we spatially join with neighborhood data - let's see what this looks like as raw point data.

In [None]:
# plot using plotly.express with color indicating permit type
fig = px.scatter_mapbox(gdf_permit_2021,
                        lat=gdf_permit_2021.geometry.y,
                        lon=gdf_permit_2021.geometry.x,
                        mapbox_style="carto-positron",
                        color = 'Permit Sub-Type',
                        opacity = .6,
                        zoom=9.15, 
                        center = {"lat": 34.02, "lon": -118.38411},)

# options on the layout
fig.update_layout(
    width = 1000,
    height = 800,
    title = "Interactive Map of Permit Issuances for 2021 by Permit Type",
    title_x=0.5 # aligns title to center
) 

fig.show()

In [None]:
# spatial join on dataframes created above and neighborhood geodataframe
neighborhoods_permits_2021 = gpd.sjoin(nbh, gdf_permit_2021)
neighborhoods_permits_2017 = gpd.sjoin(nbh, gdf_permit_2017)

In [None]:
# drop unused columns
neighborhoods_permits_2021 = neighborhoods_permits_2021.drop(columns = ['index_right', 'Issue Date','lat','long','Census Tract'])
neighborhoods_permits_2017 = neighborhoods_permits_2017.drop(columns = ['index_right', 'Issue Date','lat','long','Census Tract'])

In [None]:
# inspect dataframe
neighborhoods_permits_2021

Let's take a quick look at the values for each neighborhood by permit type as this represents how we want to structure and organize our dataframes.

In [None]:
# view breakdown by name, by permit type and their counts
neighborhoods_permits_2021.groupby(['name','Permit Sub-Type']).size()

In [None]:
# view breakdown by name, by permit type and their counts
neighborhoods_permits_2017.groupby(['name','Permit Sub-Type']).size()

We now have two dataframes with each row indicating a development, by neighborhood, by type. We now want to reduce the number of rows to the number of neighborhoods in LA City, with a count representing the total number of permits for each category represented by a column. We will do this by getting dummies for each permit type, summing their values for each neighborhood and then dropping duplicates.

In [None]:
# make dataframes wider by getting dummies for permit type
neighborhoods_permits_2021 = pd.get_dummies(neighborhoods_permits_2021, columns=['Permit Sub-Type'])
neighborhoods_permits_2017 = pd.get_dummies(neighborhoods_permits_2017, columns=['Permit Sub-Type'])
# the last four permit types are a small fraction of the total permit counts, to simplify our analysis, we will drop these columns.
neighborhoods_permits_2021 = neighborhoods_permits_2021.drop(columns = ['Permit Sub-Type_Offsite', 'Permit Sub-Type_Onsite', 'Permit Sub-Type_Public Safety Only', 'Permit Sub-Type_Special Equipment'])
neighborhoods_permits_2017 = neighborhoods_permits_2017.drop(columns = ['Permit Sub-Type_Offsite', 'Permit Sub-Type_Onsite', 'Permit Sub-Type_Public Safety Only', 'Permit Sub-Type_Special Equipment'])
# rename columns
neighborhoods_permits_2021 = neighborhoods_permits_2021.rename(columns={"Permit Sub-Type_1 or 2 Family Dwelling": "sfh","Permit Sub-Type_Apartment":"apt","Permit Sub-Type_Commercial":"com"})
neighborhoods_permits_2017 = neighborhoods_permits_2017.rename(columns={"Permit Sub-Type_1 or 2 Family Dwelling": "sfh","Permit Sub-Type_Apartment":"apt","Permit Sub-Type_Commercial":"com"})

In [None]:
# check dataframe to confirm
neighborhoods_permits_2021

In [None]:
# sum total count of permits by neighborhood and then drop duplicates
# 2021 dataframe
neighborhoods_permits_2021['sfh'] = neighborhoods_permits_2021.groupby("name", sort=False)["sfh"].transform('sum')
neighborhoods_permits_2021['apt'] = neighborhoods_permits_2021.groupby("name", sort=False)["apt"].transform('sum')
neighborhoods_permits_2021['com'] = neighborhoods_permits_2021.groupby("name", sort=False)["com"].transform('sum')
neighborhoods_permits_2021 = neighborhoods_permits_2021.drop_duplicates()

# 2017 dataframe
neighborhoods_permits_2017['sfh'] = neighborhoods_permits_2017.groupby("name", sort=False)["sfh"].transform('sum')
neighborhoods_permits_2017['apt'] = neighborhoods_permits_2017.groupby("name", sort=False)["apt"].transform('sum')
neighborhoods_permits_2017['com'] = neighborhoods_permits_2017.groupby("name", sort=False)["com"].transform('sum')
neighborhoods_permits_2017 = neighborhoods_permits_2017.drop_duplicates()
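
The dummies, sum and drop-duplicates sequence produces one row of counts per neighborhood. A minimal sketch of the same idea using `pd.crosstab` on a hypothetical five-permit table (the neighborhood names are invented) is shown below; it is offered as a cross-check on the counts, not as the method used above.

```python
import pandas as pd

# Hypothetical permits, one row per site, already tagged with a neighborhood.
permits = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "Permit Sub-Type": ["Apartment", "Commercial", "Apartment",
                        "1 or 2 Family Dwelling", "Apartment"],
})

# crosstab counts rows for every (neighborhood, permit type) pair in one step.
counts = pd.crosstab(permits["name"], permits["Permit Sub-Type"]).rename(
    columns={"1 or 2 Family Dwelling": "sfh", "Apartment": "apt", "Commercial": "com"}
)
counts["total"] = counts[["sfh", "apt", "com"]].sum(axis=1)
print(counts)
```

Because the result is indexed by neighborhood name, it can be compared against the `sfh`, `apt` and `com` columns built above.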

In [None]:
neighborhoods_permits_2021

In [None]:
# create total permit column for each year
# 2021 dataframe
neighborhoods_permits_2021['total'] = neighborhoods_permits_2021['sfh'] + neighborhoods_permits_2021['apt'] + neighborhoods_permits_2021['com']
# 2017 dataframe
neighborhoods_permits_2017['total'] = neighborhoods_permits_2017['sfh'] + neighborhoods_permits_2017['apt'] + neighborhoods_permits_2017['com']

We now have two dataframes with permit counts for two separate years that coincide with the LA neighborhoods that we want to analyze. With this completed, we will merge the dataframes and generate a new column with the % change in permit count to derive a proxy value that roughly indicates the growth in development across LA neighborhoods.

In [None]:
# merge dataframes so that we have one dataframe containing one row for each neighborhood and permit issuance counts for each year and permit type
neighborhoods_permits_combined = neighborhoods_permits_2021.merge(neighborhoods_permits_2017,suffixes=('_2021', '_2017'),on="name")
# subset dataframe to only contain useful columns
neighborhoods_permits_combined = neighborhoods_permits_combined[['name','geometry_2021','sfh_2021','apt_2021','com_2021','total_2021','sfh_2017','apt_2017','com_2017','total_2017']]
# rename geometry column
neighborhoods_permits_combined = neighborhoods_permits_combined.rename(columns={'geometry_2021':'geometry'})
# generate new column to account for permit change over time: the 2021 count as a percent of the 2017 count (100 = no change)
neighborhoods_permits_combined['Permit Number Percent Change'] = (neighborhoods_permits_combined.total_2021/neighborhoods_permits_combined.total_2017)*100
# sort by name
neighborhoods_permits_combined = neighborhoods_permits_combined.sort_values('name')
# view dataframe
neighborhoods_permits_combined
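
The column above stores the 2021 count as a percent of the 2017 count, so 100 means no change. The sketch below, on three hypothetical neighborhoods, shows how that value relates to a conventional percent change and one way to guard against a 2017 count of zero (an assumption on our part; we have not checked whether such rows exist in the permit data).

```python
import numpy as np
import pandas as pd

# Hypothetical permit totals; neighborhood C had no permits in 2017.
df = pd.DataFrame({"name": ["A", "B", "C"],
                   "total_2017": [200, 50, 0],
                   "total_2021": [150, 75, 10]})

# Replace zero baselines with NaN so the ratio is undefined rather than infinite.
base = df["total_2017"].replace(0, np.nan)

# 2021 as a percent of 2017 (100 = no change), as in the notebook column.
df["pct_of_2017"] = df["total_2021"] / base * 100

# Conventional percent change (0 = no change).
df["pct_change"] = df["pct_of_2017"] - 100
print(df)
```

Either scale works for the choropleth as long as the color range is centered on the matching no-change value (100 or 0).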

To build off our previous exercise of evaluating demographic shifts, we will bring in the value that we created which assessed the % difference in white immigration vs black emigration. We will use this value to sort our analysis as we've already begun to define neighborhoods that are likely to experience the ill effects of gentrification. We will do this by first adding this column onto our permit dataframe, followed by subsetting the dataframe with the neighborhoods that we had isolated previously.

In [None]:
# add demographic shift values in order to sort our visualization; merging on neighborhood name keeps rows aligned regardless of row order or count
neighborhoods_permits_combined = neighborhoods_permits_combined.merge(nbhr_2021[['name', 'Percent Differential']], on='name', how='left')
# reset index so we can subset our permit dataframe on the final demographic shift dataframe created in the previous exercise
neighborhoods_permits_combined = neighborhoods_permits_combined.reset_index(drop=True)
neighborhoods_permits_combined

In [None]:
# subset our permit dataframe to the neighborhoods isolated in the previous exercise (matched by name)
neighborhoods_permits_combined_black = neighborhoods_permits_combined[neighborhoods_permits_combined['name'].isin(nbhr_trim_black['name'])]
# sort by demographic shift value
neighborhoods_permits_combined_black = neighborhoods_permits_combined_black.sort_values('Percent Differential', ascending = False)
neighborhoods_permits_combined_black.head(5)
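
Carrying a value from one neighborhood table to another is safest when rows are matched on the neighborhood name rather than on row position. The sketch below uses two hypothetical frames whose rows differ in order and count to show what a name-based left merge returns.

```python
import pandas as pd

# Hypothetical permit table and demographic-shift table with mismatched rows.
permits = pd.DataFrame({"name": ["A", "B", "C"], "total_2021": [5, 7, 9]})
shifts = pd.DataFrame({"name": ["C", "A"], "Percent Differential": [12.5, -3.0]})

# A left merge on name keeps every permit row and fills gaps with NaN.
joined = permits.merge(shifts, on="name", how="left")
print(joined)

# Subsetting by name is likewise independent of how either frame is indexed.
subset = joined[joined["name"].isin(["A", "C"])]
```

A positional assignment would fail here because the two frames have different lengths, and would silently mislabel rows if the lengths matched but the order did not.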

In [None]:
# let's plot our findings from above via a bar graph with neighborhoods on the x-axis, with the left bar indicating 2017 building permit numbers and the right bar indicating 2021 building permit numbers
fig = go.Figure()

fig.add_trace(go.Bar(x=neighborhoods_permits_combined_black['name'],
                y=neighborhoods_permits_combined_black['total_2017'],
                name='2017 Permit Count',
                marker_color='rgb(158,73,99)',
                     opacity = .8
                ))
fig.add_trace(go.Bar(x=neighborhoods_permits_combined_black['name'],
                y=neighborhoods_permits_combined_black['total_2021'],
                name='2021 Permit Count',
                marker_color='rgb(216,139,118)',
                     opacity = .8
                ))

fig.update_layout(
    title='Building Permits Issued: 2017 vs 2021, neighborhoods ordered in descending order by black vs white shift differential percentages.',
    title_y = .9,
    xaxis_tickfont_size=12,
    yaxis=dict(
        title=dict(text='Number of Building Permits Issued', font=dict(size=14)),
        tickfont_size=12,
    ),
    legend=dict(
        x=1,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 ,# gap between bars of the same location coordinate.
    
    width=1500,
    height=750
)
fig.show()

fig.write_image("data/Permit.png")

Now that we have visualized the absolute values of permit issuances for the two years, let's take a look at how these values have changed over time via a choropleth map with the color indicating the percent change between the two years.

In [None]:
# geodataframe got converted back to normal dataframe during merging process, re-doing this step to create map plot
neighborhoods_permits_combined = gpd.GeoDataFrame(neighborhoods_permits_combined,
                         crs='epsg:4326',
                       geometry=neighborhoods_permits_combined.geometry)
neighborhoods_permits_combined.info()

In [None]:
# create choropleth map that evaluates the differential seen between the two bars above (expanded for whole dataset rather than just black neighborhoods)
fig = px.choropleth_mapbox(neighborhoods_permits_combined, geojson=neighborhoods_permits_combined.geometry, locations=neighborhoods_permits_combined.index, color='Permit Number Percent Change',
                           mapbox_style="carto-positron",
                           range_color=(50, 150),
                           zoom=9.15, 
                           center = {"lat": 34.02, "lon": -118.38411},
                           opacity=0.5,
                           hover_name = 'name')

# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Permit Count Percent Changes, from 2017 to 2021 in Los Angeles",
    title_y=.96,
    title_x=0.43,
) 

fig.show()

Remarks:
- We've successfully developed a better understanding of development patterns by neighborhood from 2017 to 2021, with a particular focus on the black neighborhoods we isolated in the previous step.
- One interesting finding is that among the neighborhoods with the largest demographic shift, the overall building permit issuance counts are relatively low (e.g. Adams-Normandie, Jefferson Park, Exposition Park). This may be because these neighborhoods are relatively small, so any percent change in demographics is weighted heavily. This is not necessarily incorrect or a negative outcome but something to keep in mind going forward. We will want to make sure the manner in which we classify neighborhoods is resilient to being disproportionately weighted.
- It would be interesting to visualize the differences in permit types and see if anything can be gleaned from the type of development that is occurring in these areas. This difference would be nuanced but could be important in our analysis going forward.
________________________________________________________________________________________________________________________________________________________________________________________________________________

#### A3: Rent Price Shifts

The final variable we will assess is the change in rent price values from 2017 to 2021 (5 year averages). We will essentially mirror the process conducted in the previous exercise by merging census tract level data with our neighborhood boundaries. This will help us develop a better understanding of if and to what degree minority neighborhoods are in the process of or at risk of being priced out of their community. 

In [None]:
# bring in 2017 rent data
df_rent_2017 = pd.read_csv('data/rent_2017.csv', dtype = {'FIPS':str})
# bring in 2021 rent data
df_rent_2021 = pd.read_csv('data/rent_2021.csv', dtype = {'FIPS':str})

# add leading zeroes (each dataframe is padded from its own FIPS column)
df_rent_2017['FIPS'] = df_rent_2017['FIPS'].str.zfill(11)
df_rent_2021['FIPS'] = df_rent_2021['FIPS'].str.zfill(11)

Merging, checking for null values, dropping if need be, and taking a glimpse at the data.

In [None]:
# merge data 
df_rent=df_rent_2021.merge(df_rent_2017,on="FIPS")
# check for na values
df_rent.isna().sum()

In [None]:
# drop na values and re-check
df_rent = df_rent.dropna()
df_rent.isna().sum()

In [None]:
# assess rent statistics
df_rent.describe()

We have our rent dataframe correctly organized by census tract but we will want to merge it with our dataframe that contains spatial information so that we can ultimately locate the census tracts within our neighborhoods.

In [None]:
# merge dataframes together
spatial_rent=tracts.merge(df_rent, on='FIPS')
# check if successful
spatial_rent

Prepare our new dataframe which contains geospatial census tract information by creating a centroid for the census boundaries. Then run a spatial join and check if successful.

In [None]:
# prepare rent data for spatial join with neighborhoods
# first convert census tract polygons to point data (centroids computed in an equal-area projection)
spatial_rent = spatial_rent.to_crs(4326)
spatial_rent['Centroid']=spatial_rent.to_crs('+proj=cea').centroid.to_crs(spatial_rent.crs)
# make the centroid the active geometry, then drop the tract polygons
spatial_rent_1 = spatial_rent.set_geometry('Centroid').drop(columns=['CT20','geometry'])
spatial_rent_1 = spatial_rent_1.rename_geometry('geometry')
spatial_rent_1

In [None]:
# spatial join
nbh_rent = nbh.sjoin(spatial_rent_1, how="left")
nbh_rent = nbh_rent.drop(columns = ['OBJECTID','index_right','FIPS'])
nbh_rent

We have now located our census tract level data within our neighborhood boundaries, but we have lost a considerable amount of information along the way. To reduce our geodataframe to the neighborhood level, we will average rent values for both years by neighborhood name. We will then trim down our dataframe to capture just neighborhood-level data. Finally, we will generate a new column that expresses the change between the two averages as a percentage.

In [None]:
# average rent values for each year by neighborhood and generate columns with new values
nbh_rent['rent_21_mean'] = nbh_rent.groupby("name", sort=False)["rent_2021"].transform('mean')
nbh_rent['rent_17_mean'] = nbh_rent.groupby("name", sort=False)["rent_2017"].transform('mean')
# drop unused columns
nbh_rent = nbh_rent.drop(columns = ['rent_2021','rent_2017'])
# drop duplicate rows
nbh_rent = nbh_rent.drop_duplicates()
# add column which calculates the % change in mean rent between the two timeframes for each neighborhood
nbh_rent['Rent Price Percent Change'] = ((nbh_rent.rent_21_mean - nbh_rent.rent_17_mean)/nbh_rent.rent_17_mean)*100
nbh_rent = nbh_rent.sort_values('name').reset_index()
# check if successful
nbh_rent
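
To make the aggregation step concrete, here is a minimal sketch on made-up data (the neighborhood names "A" and "B" and all rent values are hypothetical). It collapses tracts to neighborhoods with a single `groupby(...).agg(...)` and shows the difference between expressing 2021 rent as a percentage of 2017 rent and expressing it as a percent change.

```python
import pandas as pd

# Hypothetical tract-level rents already tagged with a neighborhood name
tract_rent = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B"],
    "rent_2017": [1000.0, 1200.0, 800.0, 900.0, 1000.0],
    "rent_2021": [1300.0, 1500.0, 900.0, 900.0, 1200.0],
})

# One row per neighborhood with the mean rent for each year
nbh_mean = tract_rent.groupby("name", as_index=False).agg(
    rent_17_mean=("rent_2017", "mean"),
    rent_21_mean=("rent_2021", "mean"),
)

# 2021 rent as a percentage of 2017 rent (100 means no change)
nbh_mean["ratio_pct"] = nbh_mean["rent_21_mean"] / nbh_mean["rent_17_mean"] * 100
# Percent change relative to 2017 (0 means no change)
nbh_mean["change_pct"] = (
    (nbh_mean["rent_21_mean"] - nbh_mean["rent_17_mean"]) / nbh_mean["rent_17_mean"] * 100
)
print(nbh_mean)
```

The two measures always differ by exactly 100, so either works for ranking neighborhoods as long as the column label and map title say which one is shown.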

Lastly we will add the Percent Differential used in the first exercise to help sort our visualizations (as we did previously) and then we will subset our data based on the neighborhoods we originally isolated.

In [None]:
# add Percent Differential from first exercise 
nbh_rent['Percent Differential'] = nbhr_2021['Percent Differential'].values
# subset our rent dataframe to the neighborhoods isolated in the first exercise
nbh_rent_black = nbh_rent[nbh_rent.index.isin(nbhr_trim_black.index)]
# sort by demographic shift value
nbh_rent_black = nbh_rent_black.sort_values('Percent Differential', ascending = False)
nbh_rent_black
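
Assigning a column with `.values` copies numbers by row position, so it is only correct when both dataframes list neighborhoods in exactly the same order. As a hedged alternative, the sketch below (toy names and values, not our data) attaches the differential by neighborhood name with a validated merge and contrasts it with the positional approach.

```python
import pandas as pd

# Hypothetical neighborhood tables that are NOT in the same row order
rent = pd.DataFrame({"name": ["Adams", "Baldwin", "Crenshaw"],
                     "rent_21_mean": [1500.0, 1700.0, 1600.0]})
demo = pd.DataFrame({"name": ["Crenshaw", "Adams", "Baldwin"],
                     "Percent Differential": [4.2, -1.0, 2.5]})

# Positional assignment: silently pairs "Adams" with Crenshaw's value
by_position = rent.assign(**{"Percent Differential": demo["Percent Differential"].values})

# Key-based merge: pairs rows by name and raises if names are duplicated
by_name = rent.merge(demo, on="name", how="left", validate="one_to_one")
print(by_name)
```

If the two tables are known to share an identical sort order, both approaches give the same result; the merge simply makes that assumption explicit.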

We are now ready to plot the dataframe/geodataframe we created above.

In [None]:
# let's plot our findings from above via a bar graph with neighborhoods on the x-axis,
# with the left bar indicating 2017 average rent prices and the right bar indicating 2021 average rent prices
fig = go.Figure()

fig.add_trace(go.Bar(x=nbh_rent_black['name'],
                y=nbh_rent_black['rent_17_mean'],
                name='2017 Average Rent Price',
                marker_color='rgb(158,73,99)',
                     opacity = .8
                ))
fig.add_trace(go.Bar(x=nbh_rent_black['name'],
                y=nbh_rent_black['rent_21_mean'],
                name='2021 Average Rent Price',
                marker_color='rgb(216,139,118)',
                     opacity = .8
                ))

fig.update_layout(
    title='Average Rent Prices: 2017 vs 2021, neighborhoods ordered in descending order by black vs white shift differential percentages.',
    title_y = .9,
    xaxis_tickfont_size=12,
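    yaxis=dict(
        # title text and font set via a nested dict (the older titlefont shortcut is deprecated)
        title=dict(text='Average Rent Prices, in dollars', font=dict(size=14)),
        tickfont_size=12,
    ),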
    legend=dict(
        x=1,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 ,# gap between bars of the same location coordinate.
    
    width=1500,
    height=750
)
fig.show()

fig.write_image("data/Rent.png")

In [None]:
# create choropleth map of the differential seen between the two bars above (expanded to the whole dataset rather than just black neighborhoods)
fig = px.choropleth_mapbox(nbh_rent, geojson=nbh_rent.geometry, locations=nbh_rent.index, color='Rent Price Percent Change',
                           mapbox_style="carto-positron",
                           #range_color=(50, 150),
                           zoom=9.15, 
                           center = {"lat": 34.02, "lon": -118.38411},
                           opacity=0.5,
                           hover_name = 'name')

# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Rent Price Percent Changes, from 2017 to 2021 in Los Angeles",
    title_y=.96,
    title_x=0.43,
) 

fig.show()

Remarks:
- We've developed a better understanding of rent price shifts by neighborhood from 2017 to 2021, with a particular focus on the black neighborhoods we isolated in the previous step.
- One interesting finding is that several of the top neighborhoods (as previously sorted) show a large percentage change in rent values, further indicating that these areas are undergoing or susceptible to gentrification.
- Average rent prices from census-level data are not a perfect measure of how a renter might be priced out of their community. Taking into account how new developments are pricing their units (or a similar metric) would provide a better understanding of gentrifying forces. For the time being, average rent price suffices.
________________________________________________________________________________________________________________________________________________________________________________________________________________


#### A3.I: Rent Price Shifts | Why does this matter?

To build on previous assignments and to establish why rent price matters, particularly given its disproportionate effect on minority communities, we analyze rent burden across racial and ethnic groups below. We will pull census data that captures the percentage of renter households in a census tract that pay more than 30% of their income toward rent (30% is the HUD-defined threshold). We will categorize this by race and demonstrate the effects on non-white communities.

Pull in census data and add leading zeroes.

In [None]:
# import rent burden csv file
df_rent_burden = pd.read_csv('data/rent_burden_ct_1.csv', dtype = {'Geo_FIPS':str})
# fips column does not have a leading zero, let's add that here
df_rent_burden['Geo_FIPS'] = df_rent_burden['Geo_FIPS'].str.zfill(11)
# inspect rent burden data frame
df_rent_burden.head()

In order to determine the majority race of the population, we will create a function that evaluates the renter population by race against one another and then returns the race that is predominant. We will add a column which captures this return with each census tract categorized by race.

In [None]:
def categorise_race_tracts(row):  
    if row['black_alone'] > row['white_alone'] and row['black_alone'] > row['hispanic'] and row['black_alone'] > row['other'] and row['black_alone'] > row['asian_alone']:
        return 'Black'
    elif row['hispanic'] > row['white_alone'] and row['hispanic'] > row['black_alone'] and row['hispanic'] > row['other'] and row['hispanic'] > row['asian_alone']:
        return 'Hispanic'
    elif row['white_alone'] > row['hispanic'] and row['white_alone'] > row['black_alone'] and row['white_alone'] > row['other'] and row['white_alone'] > row['asian_alone']:
        return 'White'
    elif row['asian_alone'] > row['hispanic'] and row['asian_alone'] > row['black_alone'] and row['asian_alone'] > row['other'] and row['asian_alone'] > row['white_alone']:
        return 'Asian'
    return 'Other'

# iterate on dataframe and create new column by applying created race function
df_rent_burden['majority_race'] = df_rent_burden.apply(lambda row: categorise_race_tracts(row), axis=1)
# assess results
df_rent_burden.value_counts('majority_race')
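
The same classification can be written without chained comparisons. The following is a minimal sketch, not a replacement for the function above: it assumes the same five count columns, picks the column with the largest value per row, and falls back to 'Other' on ties, which mirrors the strict `>` comparisons. The helper name `plurality_race` and the toy rows are hypothetical.

```python
import pandas as pd

# Map count columns to the labels used in this notebook
RACE_LABELS = {
    "black_alone": "Black",
    "hispanic": "Hispanic",
    "white_alone": "White",
    "asian_alone": "Asian",
    "other": "Other",
}

def plurality_race(df):
    """Return the label of the strictly largest group per row, 'Other' on ties."""
    counts = df[list(RACE_LABELS)]
    top = counts.max(axis=1)
    # a row has a unique winner when exactly one column equals the row maximum
    is_unique = counts.eq(top, axis=0).sum(axis=1) == 1
    label = counts.idxmax(axis=1).map(RACE_LABELS)
    return label.where(is_unique, "Other")

# Made-up tracts: clear Black plurality, clear Asian plurality, a tie, and 'other' largest
toy = pd.DataFrame({
    "black_alone": [10, 1, 2, 1],
    "white_alone": [5, 5, 5, 2],
    "hispanic": [3, 3, 5, 3],
    "asian_alone": [2, 9, 1, 1],
    "other": [1, 0, 0, 9],
})
toy["majority_race"] = plurality_race(toy)
print(toy["majority_race"].tolist())
```

Strictly speaking, both versions identify the largest single group (a plurality), which may be less than half of a tract's renter households.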

In [None]:
#check dataframe
df_rent_burden

We now have a dataframe that roughly captures the predominant race of renter households in each census tract. Let's separate the tracts by the major race categories in Los Angeles County and evaluate the degree to which each suffers from rent burden.

In [None]:
# separate dataframes by race
df_rent_burden_white = df_rent_burden[df_rent_burden.majority_race == 'White']
df_rent_burden_black = df_rent_burden[df_rent_burden.majority_race == 'Black']
df_rent_burden_hispanic = df_rent_burden[df_rent_burden.majority_race == 'Hispanic']

In [None]:
# let's take a look at the correlation between the race of renter-occupied households and the percentages of rent burden for the census tract

# set visual style (before creating the axes so that it applies to them)
sns.set(style="darkgrid")

# set subplots
fig, axes = plt.subplots(2, 3, figsize = (20, 12))

# create histogram plots
histplot_white = sns.histplot(ax = axes[0,0], data=df_rent_burden_white, x='rent_burden', alpha = .5, bins =30, color = [158/255,73/255,99/255], kde = True)
histplot_black = sns.histplot(ax = axes[0,1], data=df_rent_burden_black, x='rent_burden', alpha = .5, bins =30, color = [216/255,139/255,118/255],kde = True)
histplot_hispanic = sns.histplot(ax = axes[0,2], data=df_rent_burden_hispanic, x='rent_burden', alpha = .5, bins =30, color = [80/255,45/255,93/255], kde = True)

# create regression plots
scatter_white = sns.regplot(ax = axes[1,0], data=df_rent_burden, x= 'rent_burden', y= 'white_alone', scatter_kws={"color": [158/255,73/255,99/255], "s": 4, "alpha" : .3 }, line_kws={'linewidth':1,'alpha':.6, "color": [158/255,73/255,99/255]})
scatter_black = sns.regplot(ax = axes[1,1], data=df_rent_burden, x= 'rent_burden', y= 'black_alone', scatter_kws={"color": [216/255,139/255,118/255], "s": 4, "alpha" : .3}, line_kws={'linewidth':1,'alpha':1, "color": [222/255,170/255,137/255]})
scatter_hispanic = sns.regplot(ax = axes[1,2], data=df_rent_burden, x= 'rent_burden', y= 'hispanic', scatter_kws={"color": [80/255,45/255,93/255], "s": 4, "alpha" : .3}, line_kws={'linewidth':1,'alpha':1, "color": [80/255,45/255,93/255]})

# change x-axis and y-axis labels
axes[1,0].set_xlabel('Rent Burden by White-Renter Households', fontsize = 14)
axes[1,1].set_xlabel('Rent Burden by Black-Renter Households', fontsize = 14)
axes[1,2].set_xlabel('Rent Burden by Hispanic-Renter Households', fontsize = 14)

axes[1,0].set_ylabel('Count')
axes[1,1].set_ylabel(' ')
axes[1,2].set_ylabel(' ')

axes[0,0].set(xlabel=' ', ylabel = 'Number of Census Tracts')
axes[0,1].set(xlabel=' ', ylabel = ' ')
axes[0,2].set(xlabel=' ', ylabel = ' ')

#align x-axis and y-axis grid lines for histogram plots
histplot_white.xaxis.set_major_locator(LinearLocator(6))
histplot_black.xaxis.set_major_locator(LinearLocator(6)) 
histplot_hispanic.xaxis.set_major_locator(LinearLocator(6)) 
histplot_white.yaxis.set_major_locator(LinearLocator(6))
histplot_black.yaxis.set_major_locator(LinearLocator(6)) 
histplot_hispanic.yaxis.set_major_locator(LinearLocator(6))

#align x-axis and y-axis grid lines for scatter plots
scatter_white.xaxis.set_major_locator(LinearLocator(6))
scatter_black.xaxis.set_major_locator(LinearLocator(6)) 
scatter_hispanic.xaxis.set_major_locator(LinearLocator(6)) 
scatter_white.yaxis.set_major_locator(LinearLocator(6))
scatter_black.yaxis.set_major_locator(LinearLocator(6)) 
scatter_hispanic.yaxis.set_major_locator(LinearLocator(6))

# change x-axis and y-axis tick labels for histogram and regression plots
histplot_white.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)
histplot_black.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)
histplot_hispanic.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)
scatter_white.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)
scatter_black.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)
scatter_hispanic.set_xticklabels(labels = [' ', '20%', '40% ', '60% ', '80%', '100%'], fontsize = 10, y=.02)

histplot_white.set_yticklabels(labels = [' ', '20', '40', '60', '80', '100'], fontsize = 10, y=.02)
histplot_black.set_yticklabels(labels = [' ', '3', '6', '9', '12', '15'], fontsize = 10, y=.02)
histplot_hispanic.set_yticklabels(labels = [' ', '20', '40', '60', '80', '100'], fontsize = 10, y=.02)
scatter_white.set_yticklabels(labels = [' ', '700', '1400', '2100', '2800', '3500'], fontsize = 10, y=.02)
scatter_black.set_yticklabels(labels = [' ', '250', '500', '750', '1000', '1250'], fontsize = 10, y=.02)
scatter_hispanic.set_yticklabels(labels = [' ', '275', '550', '825', '1100', '1375'], fontsize = 10, y=.02)

# add mean vertical line and text for histogram plots
histplot_white.axvline(x = df_rent_burden_white.rent_burden.mean(), alpha = .8, ls = '--', lw = 1.5, color = [158/255,73/255,99/255])
histplot_white.text(x = df_rent_burden_white.rent_burden.mean() - .35,
        y = 97, 
        s = 'Mean: 50% ---------- ', 
        color = [158/255,73/255,99/255],
        weight = 'normal', 
        fontsize = 11)


histplot_black.axvline(x = df_rent_burden_black.rent_burden.mean(), alpha = .8, ls = '--', lw = 1.5, color = [216/255,139/255,118/255])
histplot_black.text(x = df_rent_burden_black.rent_burden.mean() - .31,
        y = 14.05, 
        s = 'Mean: 56% ---------- ', 
        color = [216/255,139/255,118/255],
        weight = 'normal', 
        fontsize = 11)

histplot_hispanic.axvline(x = df_rent_burden_hispanic.rent_burden.mean(), alpha = .8, ls = '--', lw = 1.5, color = [80/255,45/255,93/255])
histplot_hispanic.text(x = df_rent_burden_hispanic.rent_burden.mean() - .28,
        y = 107, 
        s = 'Mean: 56% ---------- ', 
        color = [80/255,45/255,93/255],
        weight = 'normal', 
        fontsize = 11)

# add correlation coefficient values
scatter_white.text(x = .27,
        y = 3700, 
        s = 'Correlation Coefficient: .03', 
        color = [158/255,73/255,99/255],
        weight = 'normal', 
        fontsize = 11)

scatter_black.text(x = .27,
        y = 1300, 
        s = 'Correlation Coefficient: .15', 
        color = [216/255,139/255,118/255],
        weight = 'normal', 
        fontsize = 11)

scatter_hispanic.text(x = .27,
        y = 1410, 
        s = 'Correlation Coefficient: .26', 
        color = [80/255,45/255,93/255],
        weight = 'normal', 
        fontsize = 11)

plt.savefig('data/rb.png')

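We have now assessed rent burden by race of renter households. The number of white renter households in a tract has essentially no correlation with rent burden (coefficient of .03), while the numbers of black and hispanic renter households show weak positive correlations (.15 and .26). Tracts where black or hispanic renters predominate also have a higher mean rent burden (56%) than tracts where white renters predominate (50%). When rent prices shift, minority communities are likely more susceptible to displacement and will be affected to a greater degree than white communities.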
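
The mean and correlation labels in the figure are typed in by hand, so they can drift out of date if the input data changes. A minimal sketch of computing them instead is shown below; the column names match our dataframe, but the four rows are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up tracts: share of rent-burdened renter households and a household count
toy = pd.DataFrame({
    "rent_burden": [0.30, 0.45, 0.55, 0.70],
    "hispanic": [120, 90, 300, 260],
})

# Mean rent burden, formatted as a whole-number percentage
mean_burden = toy["rent_burden"].mean()
mean_label = f"Mean: {mean_burden:.0%}"

# Pearson correlation between rent burden and the household count
r = toy["rent_burden"].corr(toy["hispanic"])
corr_label = f"Correlation Coefficient: {r:.2f}"

print(mean_label)
print(corr_label)
```

The resulting strings could be passed as the `s` argument of the `.text(...)` calls so that the annotations always reflect the plotted data.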

### Part B: Assessing Policing

The second component of our project is the correlation and interaction of gentrification and policing - a relationship that has not been visualized well. Before diving into the overlaying of gentrification and policing data, we will evaluate various metrics of policing data independently. This will help us develop an understanding of policing metrics and the communities that are forced to deal with its ramifications.

#### B1: LA Call Data - Tenant-Landlord Disputes

Let's first take a look at call data on tenant-landlord disputes to better understand which areas of LA have higher frequencies of tension between renters and their landlords. The logic here is that if communities are undergoing gentrifying forces, there may be more cases of tension between these parties.

In [None]:
# load in call data file for landlord-tenant disputes
calls = pd.read_csv('data/LAPD_Calls_for_Service_2021.csv')

In [None]:
# view information about the dataframe
calls.info()

We only need the area in which the call occurred and the total count of calls for each area. We will create a new dataframe and visualize.

In [None]:
# create new dataframe from total value counts for each area
calls_graph = calls['Area_Occ'].value_counts()
# reset index
calls_graph = calls_graph.reset_index()
# retain only useful columns
calls_graph.columns = ['District','Calls 2021']
# check dataframe
calls_graph
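
The column names that `value_counts().reset_index()` produces differ between pandas versions, which is why the cell above overwrites them by position. As a sketch under that assumption, naming the index and the count column explicitly gives the same two columns regardless of version. The area names below are toy values, not rows from the LAPD file.

```python
import pandas as pd

# Toy call records; 'Area_Occ' mirrors the column used above
calls_toy = pd.DataFrame({
    "Area_Occ": ["Central", "Hollywood", "Central", "Pacific", "Central", "Hollywood"]
})

# Count calls per area and name both output columns explicitly
calls_graph_toy = (
    calls_toy["Area_Occ"]
    .value_counts()                      # sorted from most to fewest calls
    .rename_axis("District")             # name the index before resetting it
    .reset_index(name="Calls 2021")      # name the count column
)
print(calls_graph_toy)
```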

We are ready to plot and visualize the cleaned and organized data.

In [None]:
# let's plot our findings from above via a bar graph with LAPD districts on the x-axis
# and the number of tenant-landlord dispute calls in 2021 on the y-axis
fig = go.Figure()

fig.add_trace(go.Bar(x=calls_graph['District'],
                y=calls_graph['Calls 2021'],
                name='District',
                marker_color='rgb(190,100,50)',
                     opacity = .8
                    ))

fig.update_layout(
    title='Calls of Tenant-Landlord Disputes by District',
    title_y = .8,
    xaxis_tickfont_size=12,
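    yaxis=dict(
        # title text and font set via a nested dict (the older titlefont shortcut is deprecated)
        title=dict(text='# of calls', font=dict(size=14)),
        tickfont_size=12,
    ))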
fig.show()

We have charted the LAPD areas with the highest occurrence of calls for service regarding tenant-landlord disputes. Individual LAPD districts are listed on the x-axis and the number of calls on the y-axis. From a policy perspective, this is extremely important: police departments are not supposed to respond to tenant-landlord complaints, and police are oftentimes used as a tool for displacing existing residents.

#### B2.I: Police Score

The main goal of this data is to capture the differences within the police score card data across various jurisdictions. This data has summarized use of force complaints, court rulings, shootings, and funding as a means of assigning scores to each city department.  However, the actual Police Scorecard Project does not present data in comparison between departments.  

In [None]:
# bring in police score data
score = pd.read_csv('data/Score_Cities.csv')
# sort score by values
score = score.sort_values('Score')

In [None]:
# understand data stats
score.describe().T

Let's plot the scores of the various jurisdictions, with the score on the y-axis. The jurisdictions are sorted in ascending order of score, where a lower score is a more "negative" rating and a higher score is a more "positive" rating.

In [None]:
# plot a scatter graph
fig = px.scatter(score, x="Jurisdiction", y="Score", color='Score')
# adjust graph
fig.update_layout(
    title='Police Score-card Ratings by LA neighborhood/city',
    title_y = .93,
    title_x = .08,
    xaxis_tickfont_size=12,
    width=1000,
    height=450)

fig.show()

Taking a glance at the scores for each neighborhood/city in LA, there does not seem to be a significant correlation between communities of color and the rating of the police department. This would be interesting to explore going further to better understand what determines the outcome of these scores and the variables we have been assessing thus far.

#### B2.II: Police Score

We will take a look at another set of policing scores, this time through a spatial lens. This data set is slightly more nuanced than the score chart: the information is organized to show a score for each jurisdictional area on a map. For example, all areas that are "Unincorporated LA County" are patrolled by the L.A. County Sheriff's Department. Similarly, some cities that are distinct entities (Lakewood, West Hollywood, Santa Clarita, and others) are also patrolled by the L.A. County Sheriff's Department. This data ensures that the score of the department tasked with patrolling each city matches the corresponding area on the choropleth map. Due to the sheer size of the geographic and police data from both the Police Scorecard project and the County Assessor, we had to create a separate csv file that lists just the names of the individual cities along with the score.

In [None]:
# read in the csv of city names and their police scores
score_boundary = pd.read_csv('data/City Boundaries with Score - 4 - Sheet1.csv')

In [None]:
# import boundary file to conduct spatial visualization
boundaries = gpd.read_file('data/City_Boundaries.geojson')

In [None]:
# merge dataframes into one
score_merge = boundaries.merge(score_boundary, left_on="CITY_LABEL",right_on="city_name", how="outer") 

In [None]:
# check new dataframe
score_merge.info()

In [None]:
# only retain useful columns for our visualization
score_merge=score_merge[['CITY_NAME','police_score','geometry']]
# drop any missing values
score_merge=score_merge.dropna()
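
Merging on city names is sensitive to capitalization and stray whitespace, and `dropna()` will quietly remove any city that failed to match. The sketch below is one hedged way to surface those cases first: it normalizes both key columns and lists unmatched rows. The column names follow the merge above, while the helper `normalize`, the `key` column, and the scores are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the boundary and score tables (scores are invented)
boundaries_toy = pd.DataFrame({"CITY_LABEL": ["Lakewood", "West Hollywood ", "Santa Clarita"]})
scores_toy = pd.DataFrame({"city_name": ["lakewood", "West Hollywood", "Burbank"],
                           "police_score": [40, 45, 50]})

def normalize(names):
    """Strip surrounding whitespace and upper-case a Series of city names."""
    return names.str.strip().str.upper()

boundaries_toy["key"] = normalize(boundaries_toy["CITY_LABEL"])
scores_toy["key"] = normalize(scores_toy["city_name"])

# Outer merge with an indicator to see which names exist on only one side
merged_toy = boundaries_toy.merge(scores_toy, on="key", how="outer", indicator=True)
unmatched = merged_toy.loc[merged_toy["_merge"] != "both",
                           ["CITY_LABEL", "city_name", "_merge"]]
print(unmatched)
```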

In [None]:
## NOTE! This kept crashing our notebook, so careful when running!
# create choropleth map based on police scores across LA cities
fig = px.choropleth_mapbox(score_merge, geojson=score_merge.geometry, locations=score_merge.index, color='police_score',
                           mapbox_style="carto-positron",
                           zoom=8, center = {"lat": 34.14218, "lon": -118.28411},
                           opacity=0.5)
# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Police Scores across LA cities",
    title_y=.96,
    title_x=0.43,
) 
fig.show()

### Part C: Assessing Policing and its Correlation with Gentrification 

Based on our initial attempt at a gentrification index, we sought out existing projects and research. We came across the Urban Displacement Project (UDP), a UCLA and UC Berkeley joint effort to typologize and define gentrification, cataloging its past and current occurrences as well as predicting its future sites. In this section of our exploration and analysis, we will use this research to inform a cursory attempt at integrating policing into the mapping of displacement. UDP used changes in income, rent, and college-educated demographics to assess gentrification.

#### CI: Read in Files

Here we will pull in data from an existing gentrification index created by the UDP team, beginning in 2016. This will allow us to test analyses and operations between displacement and policing data. We are aiming to create a similar typology through our own index, whose inputs we have started to explore in the previous section.

We will also pull in arrests data from Neighborhood Data for Social Change at USC, census tract polygons, and LA city neighborhood polygons.

In [None]:
# read in gentrification typology from Urban Displacement Project
g_type = pd.read_csv('data/typology_LA.csv',
                     sep =";",
                     dtype = {'FIPS':str})
## arrests from Neighborhood Data for Social Change
arrests = pd.read_csv('data/All Years LASD LAPD Arrest Calculations.csv',
                      dtype = {'geoid20':str}
                     ) 

## census tracts from 2020
tracts = gpd.read_file('data/Census_Tracts_2020.geojson')

# import LA neighborhoods from ArcGIS
nbh = gpd.read_file("https://services5.arcgis.com/7nsPwEMP38bSkCjy/arcgis/rest/services/LA_Times_Neighborhoods/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")

We are aligning FIPS syntax across datasets for simple merging later. These are mostly correcting for leading zeros and data type issues.

In [None]:
# Gentrification typology
g_type['FIPS'] = g_type['FIPS'].str.zfill(11)

# Arrests
arrests.rename(columns={'geoid20': 'FIPS'}, inplace=True)
arrests['FIPS'] = arrests['FIPS'].str.zfill(11)

# merge in census spatial information
tracts=gpd.read_file('data/Census_Tracts_2020.geojson')
# subset for only relevant information
tracts = tracts[['CT20','geometry']]
# create new column using CT20 column that allows us to merge on FIPS
tracts['FIPS'] ='06' + '037' + tracts['CT20']
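
An 11-character tract FIPS code is the 2-digit state code (06 for California), the 3-digit county code (037 for Los Angeles County), and a 6-digit tract code. The sketch below builds it on toy data and flags any code with the wrong length. It assumes `CT20` is stored as a string; the `zfill(6)` guard is an added assumption for tract codes that lost leading zeros.

```python
import pandas as pd

STATE_FIPS = "06"    # California
COUNTY_FIPS = "037"  # Los Angeles County

# Toy tract codes; the last one is deliberately missing its leading zeros
tracts_toy = pd.DataFrame({"CT20": ["101110", "980031", "1234"]})

# Concatenate state + county + zero-padded tract code
tracts_toy["FIPS"] = STATE_FIPS + COUNTY_FIPS + tracts_toy["CT20"].str.zfill(6)

# Any row listed here would fail to merge with the other FIPS-keyed tables
bad_length = tracts_toy[tracts_toy["FIPS"].str.len() != 11]
print(tracts_toy)
```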

#### CII: Make typology map, and then merge

We wanted to recreate the map that UDP produced for LA. This sets the stage for how we take this information at the census level and later draw conclusions about neighborhoods at risk of gentrifying. 

First we merge the UDP typology by FIPS code with the census tracts. Then we select for the tracts that UDP deemed at risk of or experiencing gentrification.

In [None]:
## add typology to census tracts that were at risk 
ct_typo_g = tracts.merge(g_type, on='FIPS')
ct_typo_g = ct_typo_g[(ct_typo_g['G_Typology']=='Advanced Gentrification') | 
                      (ct_typo_g['G_Typology']=='At Risk of Gentrification') | 
                      (ct_typo_g['G_Typology']=='Early/Ongoing Gentrification')]
ct_typo_g = ct_typo_g[['FIPS','geometry']]

Plotly's `choropleth_mapbox`, the Python function we use to make the map, requires the polygons to be in GeoJSON format, so we make that conversion and then map the typology.

In [None]:
## create geojson file of typology tracts and rewrite it as dictionary to use in plotly mapbox function
ct_typo_g.to_file('data/Gent_tracts.geojson', driver='GeoJSON')
with open('data/Gent_tracts.geojson', 'r') as fp:
    jdata = json.load(fp)
## recreate unique identifier for the plotly mapbox function
for k in range(len(jdata['features'])):
    jdata['features'][k]['id'] = k
    
# Approximate gentrification typology map
# take an explicit copy so that adding a column below does not modify a view of g_type
g_type_temp = g_type[g_type['G_Typology'].isin(['Advanced Gentrification',
                                                'At Risk of Gentrification',
                                                'Early/Ongoing Gentrification'])].copy()

g_type_temp['Typology of Gentrification'] = g_type_temp['G_Typology'] 
fig = px.choropleth_mapbox(g_type_temp, geojson=jdata, color="Typology of Gentrification",
                           locations=g_type_temp['FIPS'], featureidkey="properties.FIPS",
                           center={"lat": 34.0, "lon": -118.2},
                           mapbox_style="carto-positron", zoom=9,
                          opacity=.5)
fig.update_layout(
    width = 800,
    height = 600,
    title = "Gentrification Typology by Census Tract, extracted from Urban Displacement Project",
    title_y=.96,
    title_x=0.455,
) 

fig.show()
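
Plotly matches each value in `locations` to the GeoJSON feature whose `featureidkey` property has the same value, and rows without a matching feature are not drawn. The sketch below is a plain-Python check of that matching on a toy GeoJSON; the FIPS codes, coordinates, and the helper `lookup` are all made up for illustration.

```python
# Toy GeoJSON with two features (codes and coordinates are invented)
geojson_toy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"FIPS": "06037000001"},
         "geometry": {"type": "Point", "coordinates": [-118.25, 34.05]}},
        {"type": "Feature", "properties": {"FIPS": "06037000002"},
         "geometry": {"type": "Point", "coordinates": [-118.30, 34.00]}},
    ],
}

# Values that would be passed as `locations`
locations_toy = ["06037000001", "06037999999"]

def lookup(feature, key):
    """Follow a dotted path such as 'properties.FIPS' inside one feature."""
    value = feature
    for part in key.split("."):
        value = value[part]
    return value

# Collect the ids the map can draw, then list locations with no matching feature
feature_ids = {lookup(f, "properties.FIPS") for f in geojson_toy["features"]}
unmatched_locations = [loc for loc in locations_toy if loc not in feature_ids]
print(unmatched_locations)
```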

At a quick glance, we notice the spatial concentration of gentrification in the center of the city, away from the coast and not especially inland.

----------------
We redo the merge between census tracts and gentrification typology. Then merge that new geodataframe with the arrests. Finally, we calculate the centroid of census tracts to prepare for a spatial join with the neighborhoods. This will set us up to make a map of gentrifying neighborhoods based on the concentration of gentrifying census tracts it contains.

In [None]:
## add typology to census tracts that were at risk for merge with neighborhoods 
ct_typo = tracts.merge(g_type, on='FIPS')
## merge arrests and census tracts
ct_all = ct_typo.merge(arrests, on='FIPS')

## get centroid of arrests in census tracts to add to neighborhoods
ct_all = ct_all.to_crs(4326)
ct_all['Centroid']=ct_all.to_crs('+proj=cea').centroid.to_crs(ct_all.crs)
ct_all_1 = ct_all.to_crs(4326)
ct_all_1 = ct_all_1.drop(columns=['geometry','geoid'])
ct_all_1.rename(columns={'Centroid': 'geometry'}, inplace=True)
ct_all_1.head(10)

#### CIII: Spatial Join between the census tracts (containing gentrification typology and arrests) with the neighborhoods

We conduct a spatial join between the centroids of the census tracts and the neighborhoods. This applies the gentrification typology and arrests at the neighborhood level. We will summarize that information through row calculations. We will also create a variable called "Displacement" that counts the census tracts at risk of or experiencing gentrification within a neighborhood. This will make graphing and mapping easier.

In [None]:
## spatially join census tract centroids to neighborhoods
nbh_race = nbh.sjoin(ct_all_1, how="left")
nbh_race = nbh_race.sort_values(by=['FIPS','year'])
## create binary variable for Displacement based on typology
## (1 if the tract is at risk of or experiencing gentrification, otherwise 0)
gentrifying = ['Advanced Gentrification',
               'At Risk of Gentrification',
               'Early/Ongoing Gentrification']
nbh_race['Displacement'] = nbh_race['G_Typology'].isin(gentrifying).astype(int)

## collapse data by neighborhood
nbha = nbh_race.groupby(['year','name'],
                        as_index=False).agg(Geometry = ('geometry','first'),
                                            Displacement = ('Displacement','sum'),
                                            Total = ('all_arrest_count','sum'),
                                            Black = ('black_arrest_count','sum'),
                                            Latino = ('latino_arrest_count','sum'),
                                           )
## produce a map of gentrifying neighborhoods (just for 2019)                                                              
nbha_temp_all = nbha[nbha['year']==2019]

fig = px.choropleth_mapbox(nbha_temp_all, geojson=nbha_temp_all.Geometry, locations=nbha_temp_all.index, color='Displacement',
                           mapbox_style="carto-positron",
                           range_color=(0, 15),
                           zoom=9.15, 
                           center = {"lat": 34.02, "lon": -118.38411},
                           opacity=0.5,
                           hover_name = 'name')

# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive UDP Gentrifying Neighborhoods",
    title_y=.96,
    title_x=0.455,
) 

fig.show()
pyo.plot(fig,filename='data/nbh_g.html')
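
The collapse from tract rows to neighborhood rows can be sketched on a small made-up table. The column names follow the cell above; the neighborhood names, counts, and the non-gentrifying typology label are placeholders.

```python
import pandas as pd

# Toy result of the spatial join: one row per tract per year
joined_toy = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "year": [2019] * 5,
    "G_Typology": ["Advanced Gentrification", "Placeholder non-gentrifying label",
                   "At Risk of Gentrification", "Early/Ongoing Gentrification", None],
    "all_arrest_count": [10, 5, 20, 8, 2],
    "black_arrest_count": [4, 1, 5, 2, 1],
})

gentrifying = ["Advanced Gentrification",
               "At Risk of Gentrification",
               "Early/Ongoing Gentrification"]

# Flag gentrifying tracts, then sum flags and arrests per neighborhood and year
collapsed_toy = (
    joined_toy
    .assign(Displacement=joined_toy["G_Typology"].isin(gentrifying).astype(int))
    .groupby(["year", "name"], as_index=False)
    .agg(Displacement=("Displacement", "sum"),
         Total=("all_arrest_count", "sum"),
         Black=("black_arrest_count", "sum"))
)
# Black arrests as a share of all arrests in the neighborhood
collapsed_toy["Black_perc"] = collapsed_toy["Black"] / collapsed_toy["Total"] * 100
print(collapsed_toy)
```

Here "Displacement" is a count of flagged tracts, so larger neighborhoods with more tracts can score higher simply because of their size.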

Similar to the previous map we see displacement is mostly happening in centrally located neighborhoods.

------------------
We sort the data and calculate black and latino arrests as a percentage of total arrests in each neighborhood.

In [None]:
nbha = nbha.sort_values(by=['name','year'])
nbha['year'] = nbha['year'].astype('int')
nbha['Black_perc'] = (nbha['Black'] / nbha['Total'])*100
nbha['Latino_perc'] = (nbha['Latino'] / nbha['Total'])*100
nbha = nbha[['year','name','Geometry','Displacement','Black_perc','Latino_perc']]
nbha.head()

#### CIV: Create graphs and maps of arrests in gentrifying communities

We begin to explore how policing and displacement overlap. First we segment the data to only reflect that of 2014 & 2019--this allows us to assess five year change. We also want to focus on neighborhoods with particularly high concentration of gentrifying census tracts.

In [None]:
nbha_temp= nbha[((nbha['year']==2019) | (nbha['year']==2014)) & (nbha['Displacement']>2)]
nbha_temp = nbha_temp.sort_values(by=['name','year'])
# Calculate the 2019 share of arrests by race as a percentage of the 2014 share
nbha_temp['delta_b'] = nbha_temp['Black_perc'].div(nbha_temp.groupby('name')['Black_perc'].shift())
nbha_temp['delta_b_perc'] = nbha_temp['delta_b']*100
nbha_temp['delta_l'] = nbha_temp['Latino_perc'].div(nbha_temp.groupby('name')['Latino_perc'].shift())
nbha_temp['delta_l_perc'] = nbha_temp['delta_l']*100
nbha_temp = nbha_temp[nbha_temp.year == 2019].reset_index(drop = True)
nbha_temp.info()
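
The `groupby(...).shift()` pattern above divides each neighborhood's 2019 value by its own 2014 value, which only works when the rows are sorted by name and year and both years are present. A minimal sketch on made-up shares:

```python
import pandas as pd

# Hypothetical arrest shares (percent of all arrests) for two neighborhoods
shares_toy = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "year": [2014, 2019, 2014, 2019],
    "Black_perc": [20.0, 30.0, 40.0, 30.0],
})

# Sort so that, within each neighborhood, 2014 comes directly before 2019
shares_toy = shares_toy.sort_values(["name", "year"])

# shift() moves the 2014 value onto the 2019 row of the same neighborhood
previous = shares_toy.groupby("name")["Black_perc"].shift()
shares_toy["delta_b_perc"] = shares_toy["Black_perc"].div(previous) * 100

# Keep only the 2019 rows, which now carry the 2019-vs-2014 comparison
result_toy = shares_toy[shares_toy["year"] == 2019].set_index("name")["delta_b_perc"]
print(result_toy)
```

A value above 100 means the group made up a larger share of arrests in 2019 than in 2014; it does not by itself say whether the number of arrests rose or fell.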

In [None]:
nbha_temp.head()

In [None]:
# let's plot our findings from above
fig = go.Figure()

# set visual style
sns.set(style="darkgrid")

fig.add_trace(go.Bar(x=nbha_temp['name'],
                y=nbha_temp['delta_b_perc'],
                name='Black arrests',
                marker_color='rgb(35,144,163)',
                     opacity = .8
                ))

fig.add_trace(go.Bar(x=nbha_temp['name'],
                y=nbha_temp['delta_l_perc'],
                name='Latino arrests',
                marker_color='rgb(255,148,59)',
                     opacity = .8
                ))

fig.add_hline(y=100, line_dash="dot")

fig.add_annotation(x=21, y=103,
            text="arrest change threshold",
                  showarrow = True)

fig.update_layout(
    title='Arrests by race over time in gentrifying LA neighborhoods',
    title_y = .9,
    xaxis_tickfont_size=12,
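    yaxis=dict(
        # title text and font set via a nested dict (the older titlefont shortcut is deprecated)
        title=dict(text='Arrests in 2019 as a percentage of 2014 arrests', font=dict(size=14)),
        tickfont_size=12,
    ),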
    legend=dict(
        x=1,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,      # gap between bars of adjacent location coordinates
    bargroupgap=0.1,  # gap between bars of the same location coordinate
    
    width=1500,
    height=750
)
fig.show()

Bars above the threshold mark neighborhoods where arrests of Black or Latino folks increased over the five-year period. Neighborhoods like Adams-Normandie, Chinatown, Harbor Gateway, Highland Park, Hollywood, Jefferson Park, and Palms stand out. A next step would be to conduct a ground-truthing exercise, through a literature review and perhaps qualitative interviews, to confirm that a "gentrifying" process is under way in each neighborhood. We will also want to examine how the magnitude of the increase in arrests correlates with each neighborhood's stage of gentrification.
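
One possible way to approach that last step is sketched below. It is only an illustration under stated assumptions: the values are hypothetical, and it assumes the `Displacement` column can be read as an ordinal stage score, which we have not yet verified. It flags neighborhoods above the 100% threshold and computes a Spearman rank correlation as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical values for illustration only, not results from our data
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "Displacement": [3, 4, 5, 3, 5],  # assumed ordinal stage score
    "delta_b_perc": [90.0, 110.0, 180.0, 105.0, 150.0],
})

# Neighborhoods where 2019 arrests exceed 2014 arrests
increased = toy.loc[toy["delta_b_perc"] > 100, "name"].tolist()

# Spearman rank correlation: Pearson correlation of the ranked columns
rho = toy["Displacement"].rank().corr(toy["delta_b_perc"].rank())

print(increased)
print(round(rho, 3))
```

A rank-based measure is used here because the stage score is ordinal rather than continuous. With only a few dozen neighborhoods, any such coefficient should be read as descriptive rather than conclusive.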

------------------------
Create interactive and static maps of our findings.

In [None]:
# create interactive map
nbha_temp['Black Arrests in 2019 as percentage of 2014']=nbha_temp['delta_b_perc']
fig = px.choropleth_mapbox(nbha_temp, geojson=nbha_temp.Geometry, locations=nbha_temp.index, color='Black Arrests in 2019 as percentage of 2014',
                           mapbox_style="carto-positron",
                           range_color=(50, 200),
                           zoom=9.15, 
                           center = {"lat": 34.02, "lon": -118.38411},
                           opacity=0.75,
                           hover_name = 'name')
# options on the layout
fig.update_layout(
    width = 1200,
    height = 800,
    title = "Interactive Map of Shifts in Black Arrests in UDP Gentrifying Neighborhoods",
    title_y=.96,
    title_x=0.455,
) 
fig.show()
pyo.plot(fig)
fig.write_html("data/arrest_gentrified.html")

# create static map
nbha_temp_STATIC = nbha_temp.set_geometry(nbha_temp['Geometry'])
nbha_temp_STATIC.plot(figsize=(20,10),
                      column='Black Arrests in 2019 as percentage of 2014',
                      legend=True,
                      scheme='NaturalBreaks',
                      legend_kwds={'title': "Black Arrests in 2019 as percentage of 2014",
                                   }
                     )
                     

We see that particular gentrifying neighborhoods, mostly on the east side of LA (Chinatown and Highland Park), saw arrests of Black folks double between 2014 and 2019. More centrally located neighborhoods roughly maintained their arrest rates of Black folks.

##### Concluding remarks
We believe we are on our way to creating an updated version of UDP's project from several years ago. It is unlikely that we will achieve a comprehensive index of spatially correlated gentrification and low-level policing in LA in the remaining five weeks of the course. However, we will at least try to home in on specific neighborhoods and specific inputs that underscore the relationship between state violence and displacement.

**Task Distribution:**
- Joshua focused on the research questions and framework for the correlation between gentrification and policing. He mostly coded and visualized policing data as it intersects with Urban Displacement Project data, but also did some preliminary coding of the demographic shift data to inform our gentrification model. (Mostly Part C with some help in Part A)
- Miles focused on building a framework for defining gentrification and on the collection, analysis, and visualization of the data used to build that framework. He also worked to combine the work done across members via this notebook. (Mostly Part A and notebook organization)
- John worked on LAPD Police Call data and Police Scorecard data: obtaining, cleaning, and visualizing the data related to policing. (Mostly Part B)