## Introduction

In this notebook, we use the borough dataset and re-aggregate the average eviction count by ZIP code, borough, and neighborhood (NTA). We also clean the dataset so that it better suits a "join and relate" with the SVI dataset in ArcGIS.



BBL data explanations:
https://data.cityofnewyork.us/City-Government/Primary-Land-Use-Tax-Lot-Output-PLUTO-/64uk-42ks/about_data

Very detailed NYC building information: https://s-media.nyc.gov/agencies/dcp/assets/files/pdf/data-tools/bytes/padgui.pdf

Some other info: https://www.nyc.gov/assets/finance/jump/hlpbldgcode.html


In [2]:
# !pip install geopandas folium matplotlib seaborn scipy
# !pip install esda
# !pip install splot
# # For Google Colab, some packages had to be reinstalled.

In [None]:
# !pip install geopandas folium matplotlib seaborn scipy esda splot

In [3]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt
import scipy

from sklearn.cluster import DBSCAN
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# visualization
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import seaborn as sns
import folium
from folium.plugins import HeatMap
from folium import Marker
from folium.plugins import MarkerCluster
import plotly.express as px
import plotly.io as pio

# spatial statistics
from esda.moran import Moran
from esda import Moran_Local
from esda.getisord import G_Local
from shapely.geometry import Point
from libpysal.weights import Queen, Rook

# system and utility
import warnings
import os
import io
from IPython.display import IFrame
from google.colab import files

from splot.esda import moran_scatterplot

# suppress warnings
warnings.filterwarnings('ignore')

# inline
%matplotlib inline

# Part 1: Get the Evictions data

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# data source: evictions GeoDataFrame, already cleaned, with LISA cluster info
file_path = '/content/drive/My Drive/X999/evictions_cleaned_lisa.csv'

In [None]:
# evictions_cleaned_raw.to_csv(file_path, index=False)

In [6]:
evictions_cleaned_raw = pd.read_csv(file_path)

In [30]:
evictions_cleaned = evictions_cleaned_raw.copy()

In [31]:
evictions_cleaned.columns

Index(['court_index_number', 'docket_number', 'eviction_address',
       'eviction_apartment_number', 'executed_date', 'borough',
       'eviction_postcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'bbl', 'nta', 'geometry', 'eviction_count',
       'year', 'average_year_eviction_count', 'cluster', 'cluster_k',
       'same_cluster', 'lisa_cluster_rook', 'lisa_pvalue_rook',
       'lisa_cluster_queen', 'lisa_pvalue_queen'],
      dtype='object')

In [32]:
relevant_columns = [
    'borough', 'eviction_postcode','bin', 'bbl', 'eviction_count', 'year', 'nta',
    'average_year_eviction_count', 'geometry'
]

# .copy() so the in-place renames below act on an independent frame, not a slice
evictions_cleaned_filtered = evictions_cleaned[relevant_columns].copy()
evictions_cleaned_filtered.columns

Index(['borough', 'eviction_postcode', 'bin', 'bbl', 'eviction_count', 'year',
       'nta', 'average_year_eviction_count', 'geometry'],
      dtype='object')

In [33]:
# to match the svi data set's column name
evictions_cleaned_filtered.rename(columns={"eviction_postcode": "FIPS"}, inplace=True)
evictions_cleaned_filtered.head()

Unnamed: 0,borough,FIPS,bin,bbl,eviction_count,year,nta,average_year_eviction_count,geometry
0,BROOKLYN,11220,3143881.0,3057940000.0,3,2024,Sunset Park East,3.0,POINT (-74.011883 40.635941)
1,BROOKLYN,11220,3143435.0,3057820000.0,3,2024,Sunset Park West,3.0,POINT (-74.017068 40.640008)
2,BRONX,10468,2015444.0,2032510000.0,4,2018,Van Cortlandt Village,4.0,POINT (-73.889569 40.87719)
3,BRONX,10455,2003900.0,2025770000.0,9,2019,Mott Haven-Port Morris,2.25,POINT (-73.90881 40.811197)
4,BRONX,10468,2013945.0,2031770000.0,8,2017,Bedford Park-Fordham North,2.666667,POINT (-73.896515 40.866075)


## Step 2: Aggregate over FIPS (ZIP codes)

FIPS codes are normally not ZIP codes, but in the particular SVI dataset I am using in ArcGIS the `FIPS` column happens to hold ZIP codes.
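
Because a join key has to match in both value and type, it can help to normalize the ZIP codes and check the key overlap before moving to ArcGIS. The following is a minimal sketch on toy data, not the notebook's actual workflow: the `svi` table and its `svi_score` column are hypothetical stand-ins, and the assumption that the SVI side stores `FIPS` as 5-character strings is mine.

```python
import pandas as pd

# Toy stand-ins; the real tables come from the evictions CSV and the SVI export.
evictions = pd.DataFrame({"FIPS": [11220, 10468, 10455], "eviction_count": [3, 4, 9]})
svi = pd.DataFrame({"FIPS": ["11220", "10468", "10001"], "svi_score": [0.7, 0.9, 0.2]})  # hypothetical

# Cast the integer ZIP codes to zero-padded 5-character strings so both keys share a type.
evictions["FIPS"] = evictions["FIPS"].astype(int).astype(str).str.zfill(5)

# An outer merge with indicator=True shows which keys exist on only one side.
check = evictions.merge(svi, on="FIPS", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["FIPS", "_merge"]]
print(unmatched)
```

Any key listed in `unmatched` would end up without a partner in the join, so it is worth inspecting before exporting.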

In [34]:
# across all years for each ZIP code
average_evictions_all_years_z = evictions_cleaned_filtered.groupby('FIPS')['eviction_count'].mean().reset_index()
average_evictions_all_years_z.rename(columns={'eviction_count': 'average_eviction_count_all_years_zip'}, inplace=True)
average_evictions_all_years_z

Unnamed: 0,FIPS,average_eviction_count_all_years_zip
0,10000,12.000000
1,10001,4.606635
2,10002,6.373457
3,10003,1.625806
4,10004,3.111111
...,...,...
196,11691,18.455603
197,11692,63.342640
198,11693,4.842105
199,11694,6.375796
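
A mean on its own hides how many rows support it, which matters when a ZIP code has only a handful of records. As a sketch on toy data (the `n_rows` column name is a hypothetical choice), pandas named aggregation can return the average and the row count side by side:

```python
import pandas as pd

# Toy data with the same two columns used in the aggregation above.
df = pd.DataFrame({
    "FIPS": [10001, 10001, 10002, 10002, 10002],
    "eviction_count": [2, 6, 1, 3, 5],
})

# Named aggregation: one output column per keyword argument.
summary = (
    df.groupby("FIPS")["eviction_count"]
      .agg(average_eviction_count_all_years_zip="mean", n_rows="size")
      .reset_index()
)
print(summary)
```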


In [35]:
average_evictions_all_years_z.to_csv('zipcode_average_over_years.csv', index=False, encoding='utf-8')
# average_evictions

In [36]:
files.download('zipcode_average_over_years.csv')


## Step 3: Aggregate over boroughs (County name)

In [37]:
# rename to match svi dataset's column name
evictions_cleaned_filtered.rename(columns={'borough': 'County name'}, inplace=True)
evictions_cleaned_filtered.columns

Index(['County name', 'FIPS', 'bin', 'bbl', 'eviction_count', 'year', 'nta',
       'average_year_eviction_count', 'geometry'],
      dtype='object')

In [38]:
# across all years for each borough
average_evictions_all_years_b = evictions_cleaned_filtered.groupby('County name')['eviction_count'].mean().reset_index()
average_evictions_all_years_b.rename(columns={'eviction_count': 'average_eviction_count_all_years'}, inplace=True)
average_evictions_all_years_b


Unnamed: 0,County name,average_eviction_count_all_years
0,BRONX,13.813084
1,BROOKLYN,6.54954
2,MANHATTAN,6.651513
3,QUEENS,9.617091
4,STATEN ISLAND,8.326641


In [39]:
average_evictions_all_years_b.to_csv('borough_average_over_years.csv', index=False, encoding='utf-8')
# average_evictions

In [40]:
files.download('borough_average_over_years.csv')

## Step 4: Aggregate over neighborhood tabulation areas (NTA)

(The borough information is kept alongside each NTA.)

In [41]:
len(evictions_cleaned.nta.unique())

190

In [43]:
# check whether any NTAs are associated with more than one borough
nta_borough_check = evictions_cleaned_filtered.groupby('nta')['County name'].nunique().reset_index()
nta_borough_check = nta_borough_check[nta_borough_check['County name'] > 1]
nta_borough_check
# these NTAs are why the number of (nta, borough) groups does not match the number of unique NTAs

Unnamed: 0,nta,County name
28,Central Harlem North-Polo Grounds,2
41,Cypress Hills-City Line,2
80,Highbridge,2
101,Marble Hill-Inwood,2
118,North Riverdale-Fieldston-Riverdale,2
171,Washington Heights North,2


In [44]:
# aggregate eviction counts by NTA and borough
average_evictions_nta = evictions_cleaned_filtered.groupby(['nta', 'County name'])['eviction_count'].mean().reset_index()
average_evictions_nta

Unnamed: 0,nta,County name,eviction_count
0,Allerton-Pelham Gardens,BRONX,2.604651
1,Annadale-Huguenot-Prince's Bay-Eltingville,STATEN ISLAND,1.326531
2,Arden Heights,STATEN ISLAND,1.291667
3,Astoria,QUEENS,2.019499
4,Auburndale,QUEENS,1.553191
...,...,...,...
191,Woodlawn-Wakefield,BRONX,6.371775
192,Woodside,QUEENS,2.250000
193,Yorkville,MANHATTAN,3.393651
194,park-cemetery-etc-Bronx,BRONX,14.739130


In [47]:
# manually reassign the borough based on where the majority of the NTA lies
def handle_ambiguous_ntas(df, nta_col='nta', borough_col='County name'):
    nta_borough_mapping = {
        "Central Harlem North-Polo Grounds": "MANHATTAN",
        "Cypress Hills-City Line": "BROOKLYN",
        "Highbridge": "BRONX",
        "Marble Hill-Inwood": "MANHATTAN",
        "North Riverdale-Fieldston-Riverdale": "BRONX",
        "Washington Heights North": "MANHATTAN"
    }

    # NTAs in the mapping get the assigned borough; all others keep their original value
    df[borough_col] = df[nta_col].map(nta_borough_mapping).fillna(df[borough_col])
    return df

evictions_cleaned_filtered = handle_ambiguous_ntas(evictions_cleaned_filtered)
# note: average_evictions_nta was computed before this reassignment,
# so the groupby has to be rerun for the change to show up in it
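
The hand-written mapping could also be derived from the data. This is a sketch under one assumption: "majority" is taken to mean the borough with the most rows for that NTA, which may differ from the geographic reasoning behind the manual mapping. It also recomputes the NTA averages after the reassignment so that each NTA appears exactly once.

```python
import pandas as pd

toy = pd.DataFrame({
    "nta": ["Highbridge"] * 3 + ["Astoria"] * 2,
    "County name": ["BRONX", "BRONX", "MANHATTAN", "QUEENS", "QUEENS"],
    "eviction_count": [2, 4, 6, 1, 3],
})

# Most frequent borough per NTA (mode() is sorted, so ties take the first value).
majority = toy.groupby("nta")["County name"].agg(lambda s: s.mode().iloc[0])

# Overwrite each row's borough with its NTA's majority borough.
toy["County name"] = toy["nta"].map(majority)

# Recompute the averages so each NTA maps to a single borough.
nta_avg = toy.groupby(["nta", "County name"])["eviction_count"].mean().reset_index()
print(nta_avg)
```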

In [48]:
# to match the SVI dataset's column name
# (no-op here: the column was already renamed to 'County name' in Step 3)
average_evictions_nta.rename(columns={"borough": "County name"}, inplace=True)
average_evictions_nta

Unnamed: 0,nta,County name,eviction_count
0,Allerton-Pelham Gardens,BRONX,2.604651
1,Annadale-Huguenot-Prince's Bay-Eltingville,STATEN ISLAND,1.326531
2,Arden Heights,STATEN ISLAND,1.291667
3,Astoria,QUEENS,2.019499
4,Auburndale,QUEENS,1.553191
...,...,...,...
191,Woodlawn-Wakefield,BRONX,6.371775
192,Woodside,QUEENS,2.250000
193,Yorkville,MANHATTAN,3.393651
194,park-cemetery-etc-Bronx,BRONX,14.739130


The table has more rows than there are unique NTAs, most likely because some NTAs are associated with more than one borough. This happens when the same neighborhood spans more than one borough. We can probably ignore this.

In [49]:
average_evictions_nta.to_csv('nta_average_over_years.csv', index=False, encoding='utf-8')
# average_evictions

In [50]:
files.download('nta_average_over_years.csv')


In [51]:
file_path2 = '/content/drive/My Drive/X999/average_eviction_per_nta.csv'
file_path3 = '/content/drive/My Drive/X999/average_eviction_per_borough.csv'
file_path4 = '/content/drive/My Drive/X999/average_eviction_per_zipcode.csv'

In [52]:
average_evictions_nta.to_csv(file_path2, index=False)
average_evictions_all_years_b.to_csv(file_path3, index=False)
average_evictions_all_years_z.to_csv(file_path4, index=False)