# **Introduction**

## **Chi-test (boroughs + svi)**
## **Bar-chart with svi as regression/scatterplot (boroughs first)**

source: https://www.atsdr.cdc.gov/place-health/media/pdfs/2024/10/SVI2022Documentation.pdf

source: https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html

In [2]:
# !pip install geopandas folium matplotlib seaborn scipy
# !pip install esda
# !pip install splot
# !pip install geopandas contextily
# for Google Colab, some packages had to be reinstalled.

In [None]:
# !pip install geopandas folium matplotlib seaborn scipy esda splot

In [3]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt
import scipy

from sklearn.cluster import DBSCAN
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from shapely.geometry import Point
from sklearn.neighbors import NearestNeighbors

# visualization
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import seaborn as sns
import folium
from folium.plugins import HeatMap
from folium import Marker
from folium.plugins import MarkerCluster
import plotly.express as px
import plotly.io as pio
import contextily as ctx
from scipy.stats import f_oneway
from sklearn.decomposition import PCA
from scipy.stats import chi2_contingency
import statsmodels.api as sm

# spatial statistics
from esda.moran import Moran
from esda.getisord import G_Local
from libpysal.weights import Queen, Rook
from scipy.stats import fisher_exact
from scipy.stats import spearmanr
from scipy.stats import mannwhitneyu


# system and utility
import warnings
import os
import io
from IPython.display import IFrame
from google.colab import files

from splot.esda import moran_scatterplot

# suppress warnings
warnings.filterwarnings('ignore')

# inline
%matplotlib inline

In [4]:
pd.set_option('display.float_format', lambda x: '%.4f' % x)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# **Step 1 Get the Eviction data**

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [108]:
# data source:
file_path1 = '/content/drive/My Drive/X999/bbl_evictions_311_svi_normal_times.csv'
file_path2 = '/content/drive/My Drive/X999/bbl_evictions_311_svi_covid.csv'

In [109]:
evictions_pre_post_raw = pd.read_csv(file_path1)
evictions_covid_raw = pd.read_csv(file_path2)
evictions_covid_raw.shape, evictions_pre_post_raw.shape
# 91 columns for the covid file and 92 for the normal-times file, which has one extra analysis column (svi_group)

((5386, 91), (66397, 92))
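A quick way to confirm which column accounts for the 91 vs. 92 difference is to compare the two column indexes. This is a minimal sketch on toy frames; the column subsets below are made up for illustration and stand in for `evictions_pre_post_raw` and `evictions_covid_raw`.

```python
import pandas as pd

# Toy stand-ins for the two eviction tables (illustrative column subsets only).
normal_times = pd.DataFrame(columns=["bbl", "svi_quartile", "svi_group", "total_complaints"])
covid = pd.DataFrame(columns=["bbl", "svi_quartile", "total_complaints"])

# Index.difference returns the labels present in one index but not the other.
only_in_normal = normal_times.columns.difference(covid.columns).tolist()
only_in_covid = covid.columns.difference(normal_times.columns).tolist()

print("only in normal-times file:", only_in_normal)
print("only in covid file:", only_in_covid)
```

On the real frames the same two calls should list any column that exists in only one file.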

In [110]:
evictions_pre_post = evictions_pre_post_raw.copy()
evictions_covid = evictions_covid_raw.copy()

In [111]:
evictions_pre_post.head(2)

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.6721,-73.8911,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.6359,-74.0119,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0


In [112]:
evictions_covid.head(2)

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.8654,-73.9013,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.7242,-73.8556,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0


In [113]:
evictions_pre_post.columns, \
evictions_covid.columns, \
evictions_pre_post.shape, \
evictions_covid.shape

(Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

In [114]:
link = '/content/drive/My Drive/X999/svi_cleaned.csv'

In [115]:
svi_df = pd.read_csv(link)
svi_df.head(2)

Unnamed: 0,fips,location,area_sqmi,e_totpop,m_totpop,e_hu,m_hu,e_hh,m_hh,e_pov150,m_pov150,e_unemp,m_unemp,e_hburd,m_hburd,e_nohsdp,m_nohsdp,e_uninsur,m_uninsur,e_age65,m_age65,e_age17,m_age17,e_disabl,m_disabl,e_sngpnt,m_sngpnt,e_limeng,m_limeng,e_minrty,m_minrty,e_munit,m_munit,e_mobile,m_mobile,e_crowd,m_crowd,e_noveh,m_noveh,e_groupq,m_groupq,ep_pov150,mp_pov150,ep_unemp,mp_unemp,ep_hburd,mp_hburd,ep_nohsdp,mp_nohsdp,ep_uninsur,mp_uninsur,ep_age65,mp_age65,ep_age17,mp_age17,ep_disabl,mp_disabl,ep_sngpnt,mp_sngpnt,ep_limeng,mp_limeng,ep_minrty,mp_minrty,ep_munit,mp_munit,ep_mobile,mp_mobile,ep_crowd,mp_crowd,ep_noveh,mp_noveh,ep_groupq,mp_groupq,epl_pov150,epl_unemp,epl_hburd,epl_nohsdp,epl_uninsur,spl_theme1,rpl_theme1,epl_age65,epl_age17,epl_disabl,epl_sngpnt,epl_limeng,spl_theme2,rpl_theme2,epl_minrty,spl_theme3,rpl_theme3,epl_munit,epl_mobile,epl_crowd,epl_noveh,epl_groupq,spl_theme4,rpl_theme4,spl_themes,rpl_themes,f_pov150,f_unemp,f_hburd,f_nohsdp,f_uninsur,f_theme1,f_age65,f_age17,f_disabl,f_sngpnt,f_limeng,f_theme2,f_minrty,f_theme3,f_munit,f_mobile,f_crowd,f_noveh,f_groupq,f_theme4,f_total,e_daypop,e_noint,m_noint,e_afam,m_afam,e_hisp,m_hisp,e_asian,m_asian,e_aian,m_aian,e_nhpi,m_nhpi,e_twomore,m_twomore,e_otherrace,m_otherrace,ep_noint,mp_noint,ep_afam,mp_afam,ep_hisp,mp_hisp,ep_asian,mp_asian,ep_aian,mp_aian,ep_nhpi,mp_nhpi,ep_twomore,mp_twomore,ep_otherrace,mp_otherrace
0,10001,ZCTA5 10001,0.6238,27004,1827,16975,831,14375,782,5248,797,761,266,3314,531,1930,534,831,289,3428,432,2694,643,2310,499,501,215,1381,405,13460,2305,15840,898,15,23,389,135,12285,840,2213,218,20.3,2.7,4.3,1.5,23.1,3.5,9.1,2.4,3.1,1.0,12.7,1.6,10.0,2.1,8.6,1.9,3.5,1.5,5.3,1.5,49.8,7.8,93.3,2.7,0.1,0.1,2.7,0.9,85.5,2.8,8.2,0.6,0.6108,0.4574,0.5573,0.5902,0.4436,2.6593,0.5688,0.142,0.1161,0.1891,0.4707,0.8777,1.7956,0.1692,0.867,0.867,0.867,0.9853,0.271,0.7402,0.9949,0.9104,3.9018,0.9806,9.2237,0.7414,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,239407,1047,389,2220,576,5206,943,5031,774,0,25,0,25,780,326,223,169,7.3,2.6,8.2,2.2,19.3,3.0,18.6,2.9,0.0,0.1,0.0,0.1,2.9,1.2,0.8,0.6
1,10002,ZCTA5 10002,0.8223,76518,2894,39094,1241,36028,1326,27908,2853,2833,574,14688,1367,18301,1376,4074,766,17681,1287,10028,1549,9896,1062,2211,499,18393,1640,56964,3226,35725,1677,16,28,2461,449,29828,1403,2090,39,36.8,3.5,7.6,1.4,40.8,3.5,30.0,2.0,5.4,1.0,23.1,1.7,13.1,1.8,13.0,1.4,6.1,1.4,24.7,2.0,74.4,3.1,91.4,3.2,0.0,0.1,6.8,1.2,82.8,1.8,2.7,0.1,0.9148,0.7946,0.9219,0.9741,0.7207,4.3261,0.9639,0.7296,0.1831,0.5186,0.739,0.9944,3.1647,0.8781,0.9369,0.9369,0.9369,0.979,0.0,0.9105,0.9915,0.773,3.654,0.9254,12.0817,0.9656,1,0,1,1,0,3,0,0,0,0,1,1,1,1,1,0,1,1,0,3,8,64307,8590,1110,6141,1194,19864,2190,28477,1989,74,83,24,45,1810,486,574,394,23.8,2.9,8.0,1.5,26.0,2.5,37.2,2.2,0.1,0.1,0.0,0.1,2.4,0.6,0.8,0.5


In [116]:
svi_df.shape

(204, 153)

In [117]:
# list(svi_df.columns)

In [118]:
svi_df.ep_nhpi.unique()

array([ 0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,  8.00e-01,
        1.20e+00,  5.00e-01,  4.00e-01])

# **Step 2: SVI items**

A quick double check

In [119]:
link = "/content/drive/My Drive/X999/NewYork_ZCTA.csv"

In [120]:
svi_raw = pd.read_csv(link)
svi_raw.head(2)

Unnamed: 0,ST,STATE,ST_ABBR,FIPS,LOCATION,AREA_SQMI,E_TOTPOP,M_TOTPOP,E_HU,M_HU,E_HH,M_HH,E_POV150,M_POV150,E_UNEMP,M_UNEMP,E_HBURD,M_HBURD,E_NOHSDP,M_NOHSDP,E_UNINSUR,M_UNINSUR,E_AGE65,M_AGE65,E_AGE17,M_AGE17,E_DISABL,M_DISABL,E_SNGPNT,M_SNGPNT,E_LIMENG,M_LIMENG,E_MINRTY,M_MINRTY,E_MUNIT,M_MUNIT,E_MOBILE,M_MOBILE,E_CROWD,M_CROWD,E_NOVEH,M_NOVEH,E_GROUPQ,M_GROUPQ,EP_POV150,MP_POV150,EP_UNEMP,MP_UNEMP,EP_HBURD,MP_HBURD,EP_NOHSDP,MP_NOHSDP,EP_UNINSUR,MP_UNINSUR,EP_AGE65,MP_AGE65,EP_AGE17,MP_AGE17,EP_DISABL,MP_DISABL,EP_SNGPNT,MP_SNGPNT,EP_LIMENG,MP_LIMENG,EP_MINRTY,MP_MINRTY,EP_MUNIT,MP_MUNIT,EP_MOBILE,MP_MOBILE,EP_CROWD,MP_CROWD,EP_NOVEH,MP_NOVEH,EP_GROUPQ,MP_GROUPQ,EPL_POV150,EPL_UNEMP,EPL_HBURD,EPL_NOHSDP,EPL_UNINSUR,SPL_THEME1,RPL_THEME1,EPL_AGE65,EPL_AGE17,EPL_DISABL,EPL_SNGPNT,EPL_LIMENG,SPL_THEME2,RPL_THEME2,EPL_MINRTY,SPL_THEME3,RPL_THEME3,EPL_MUNIT,EPL_MOBILE,EPL_CROWD,EPL_NOVEH,EPL_GROUPQ,SPL_THEME4,RPL_THEME4,SPL_THEMES,RPL_THEMES,F_POV150,F_UNEMP,F_HBURD,F_NOHSDP,F_UNINSUR,F_THEME1,F_AGE65,F_AGE17,F_DISABL,F_SNGPNT,F_LIMENG,F_THEME2,F_MINRTY,F_THEME3,F_MUNIT,F_MOBILE,F_CROWD,F_NOVEH,F_GROUPQ,F_THEME4,F_TOTAL,E_DAYPOP,E_NOINT,M_NOINT,E_AFAM,M_AFAM,E_HISP,M_HISP,E_ASIAN,M_ASIAN,E_AIAN,M_AIAN,E_NHPI,M_NHPI,E_TWOMORE,M_TWOMORE,E_OTHERRACE,M_OTHERRACE,EP_NOINT,MP_NOINT,EP_AFAM,MP_AFAM,EP_HISP,MP_HISP,EP_ASIAN,MP_ASIAN,EP_AIAN,MP_AIAN,EP_NHPI,MP_NHPI,EP_TWOMORE,MP_TWOMORE,EP_OTHERRACE,MP_OTHERRACE
0,36,New York,NY,6390,ZCTA5 06390,4.0467,53,39,253,49,19,19,17,16,0,13,9,26,0,13,27,34,0,13,6,11,31,33,0,18,9,53,20,51,0,18,4,5,0,18,0,13,17,16,32.1,18.8,0.0,52.7,47.4,100.0,0.0,51.4,50.9,45.7,0.0,45.2,11.3,19.0,58.5,41.4,0.0,94.7,17.0,99.2,37.7,92.1,0.0,7.1,1.6,2.0,0.0,94.7,0.0,75.5,32.1,18.8,0.879,0.0,0.9635,0.0,0.996,2.8385,0.6342,0.0,0.1408,0.9944,0.0,0.9775,2.1127,0.3009,0.8062,0.8062,0.8062,0.0,0.4654,0.0,0.0,0.9735,1.4389,0.2205,7.1963,0.4192,0,0,1,0,1,2,0,0,1,0,1,2,0,0,0,0,0,0,1,1,5,601,9,14,0,13,9,19,0,13,0,13,8,16,3,7,0,13,47.4,51.8,0.0,45.2,17.0,35.0,0.0,45.2,0.0,45.2,15.1,32.1,5.7,12.1,0.0,45.2
1,36,New York,NY,10001,ZCTA5 10001,0.6238,27004,1827,16975,831,14375,782,5248,797,761,266,3314,531,1930,534,831,289,3428,432,2694,643,2310,499,501,215,1381,405,13460,2305,15840,898,15,23,389,135,12285,840,2213,218,20.3,2.7,4.3,1.5,23.1,3.5,9.1,2.4,3.1,1.0,12.7,1.6,10.0,2.1,8.6,1.9,3.5,1.5,5.3,1.5,49.8,7.8,93.3,2.7,0.1,0.1,2.7,0.9,85.5,2.8,8.2,0.6,0.6108,0.4574,0.5573,0.5902,0.4436,2.6593,0.5688,0.142,0.1161,0.1891,0.4707,0.8777,1.7956,0.1692,0.867,0.867,0.867,0.9853,0.271,0.7402,0.9949,0.9104,3.9018,0.9806,9.2237,0.7414,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,239407,1047,389,2220,576,5206,943,5031,774,0,25,0,25,780,326,223,169,7.3,2.6,8.2,2.2,19.3,3.0,18.6,2.9,0.0,0.1,0.0,0.1,2.9,1.2,0.8,0.6


In [121]:
def is_nyc_zipcode(zipcode):
    zip_int = int(zipcode) if isinstance(zipcode, str) else zipcode

    # Manhattan: 10001-10282
    if 10001 <= zip_int <= 10282:
        return True
    # Staten Island (103xx) and Bronx (104xx): 10300-10499, which already covers the narrower borough checks below
    if 10300 <= zip_int <= 10499:
        return True
    # Bronx: 10451-10475
    if 10451 <= zip_int <= 10475:
        return True
    # Brooklyn: 11201-11256
    if 11201 <= zip_int <= 11256:
        return True
    # Queens: 11351-11436, 11101-11109
    if (11351 <= zip_int <= 11436) or (11101 <= zip_int <= 11109):
        return True
    # Staten Island: 10301-10314
    if 10301 <= zip_int <= 10314:
        return True
    # additional Queens ZIPs
    if zip_int in [11004, 11005, 11411, 11412, 11413, 11418, 11419, 11420, 11421, 11422, 11423, 11426, 11427, 11428, 11429]:
        return True
    return False
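Because several of the checks above overlap, the same filter can be written as a short table of inclusive ranges. The sketch below is a hypothetical variant (`NYC_ZIP_RANGES` and `is_nyc_zipcode_ranges` are names made up here); the ranges are copied from the function above and are not an authoritative list of NYC ZIP codes.

```python
# Inclusive ZIP ranges taken from the checks in is_nyc_zipcode above.
# Overlapping checks (10451-10475, 10301-10314, 11411-11429) are already
# covered by the wider ranges, so they are not repeated here.
NYC_ZIP_RANGES = [
    (10001, 10282),
    (10300, 10499),
    (11004, 11005),
    (11101, 11109),
    (11201, 11256),
    (11351, 11436),
]


def is_nyc_zipcode_ranges(zipcode):
    """Hypothetical range-table variant of is_nyc_zipcode."""
    zip_int = int(zipcode)  # accepts ints and numeric strings such as "11207"
    return any(low <= zip_int <= high for low, high in NYC_ZIP_RANGES)


print(is_nyc_zipcode_ranges(10001), is_nyc_zipcode_ranges(6390))
```

Keeping the ranges in one list makes it easier to spot overlaps and to add or remove a range later.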

In [122]:
nyc_df = svi_raw[svi_raw['FIPS'].apply(is_nyc_zipcode)]

In [123]:
nyc_df.shape

(204, 156)

In [124]:
nyc_df.EP_NHPI.unique()
# interesting: -999 is already in the raw file, so it comes from the source data; it is the SVI placeholder for missing values, not a real percentage

array([ 0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,  8.00e-01,
        1.20e+00,  5.00e-01,  4.00e-01])

In [125]:
svi_raw.EP_NHPI.unique()

array([ 1.51e+01,  0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,
        8.00e-01,  1.20e+00,  5.00e-01,  4.00e-01,  1.10e+00,  7.00e-01,
        1.50e+00,  1.80e+00,  9.00e-01,  6.00e-01,  1.40e+00,  2.10e+00,
        2.30e+00,  1.00e+00])

In [126]:
-9.99e+02, 0.00e+00, 9.00e-01, 2.00e-01

(-999.0, 0.0, 0.9, 0.2)
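Since -999 marks missing data rather than a real percentage, averaging it directly drags the mean down. A minimal sketch of one way to handle it, assuming the placeholder should simply be treated as missing; the rows below are made up, and only the -999 value mirrors the output above.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the -999 sentinel mirrors the values shown above.
svi_sample = pd.DataFrame({
    "fips": [10001, 10002, 11101],
    "ep_nhpi": [0.0, 0.1, -999.0],
})

# Treat the sentinel as missing so it is skipped by mean().
cleaned = svi_sample.replace(-999, np.nan)

raw_mean = svi_sample["ep_nhpi"].mean()   # distorted by the sentinel
clean_mean = cleaned["ep_nhpi"].mean()    # computed over the valid rows only

print(raw_mean, clean_mean)
```

Whether to drop, mask, or impute these rows is a judgment call that depends on how many ZCTAs are affected.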

# **Step 3:  Boroughs and their eviction rates**

In [127]:
evictions_pre_post_mean = evictions_pre_post[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
evictions_pre_post_mean
# may need to merge ep_twomore and ep_otherrace together

Unnamed: 0,0
ep_afam,29.2346
ep_asian,8.904
ep_hisp,38.7307
ep_nhpi,0.0115
ep_white,19.0236
ep_twomore,2.8025
ep_otherrace,1.0239


In [128]:
evictions_pre_post_mean = evictions_pre_post_mean.reset_index()

In [129]:
evictions_pre_post_mean.rename(columns = {'index':'race_svi', 0: "racial percentage"}, inplace=True)

In [130]:
evictions_pre_post_mean

Unnamed: 0,race_svi,racial percentage
0,ep_afam,29.2346
1,ep_asian,8.904
2,ep_hisp,38.7307
3,ep_nhpi,0.0115
4,ep_white,19.0236
5,ep_twomore,2.8025
6,ep_otherrace,1.0239


In [131]:
type(evictions_pre_post_mean)
# so this is correct

## **Step 3.1 Separate Boroughs**

In [132]:
Manhattan = evictions_pre_post[evictions_pre_post['borough'] == 'MANHATTAN']
Brooklyn = evictions_pre_post[evictions_pre_post['borough'] == 'BROOKLYN']
Queens = evictions_pre_post[evictions_pre_post['borough'] == 'QUEENS']
Staten_Island = evictions_pre_post[evictions_pre_post['borough'] == 'STATEN ISLAND']
Bronx = evictions_pre_post[evictions_pre_post['borough'] == 'BRONX']
# note: a misspelled borough label would not raise an error; the filter would just return an empty frame, so check the shapes below

In [133]:
Manhattan.shape,  Brooklyn.shape, Queens.shape, Staten_Island.shape, Bronx.shape

((10898, 92), (19090, 92), (9831, 92), (1940, 92), (24638, 92))

In [134]:
evictions_pre_post.borough.unique()

array(['BROOKLYN', 'BRONX', 'STATEN ISLAND', 'MANHATTAN', 'QUEENS'],
      dtype=object)

## **Step 3.2 We also need a dataframe that has boroughs as columns and average_year_eviction_count as rows and contents**

In [135]:
average_evictions_man = Manhattan[['average_year_eviction_count']].mean()

In [136]:
type(average_evictions_man)
# so this is still a series

## **Step 3.3 We need a dataframe that has boroughs as columns and races as rows and racial composition (percentage) as cells**

### **Step 3.3.1 Manhattan**

In [137]:
manhattan_racial = Manhattan[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()

In [138]:
manhattan_racial = manhattan_racial.to_frame()

In [139]:
manhattan_racial.rename(columns = {0:"racial_percentage"}, inplace = True)

In [140]:
manhattan_racial

Unnamed: 0,racial_percentage
ep_afam,19.7275
ep_asian,8.9104
ep_hisp,35.2374
ep_nhpi,0.0183
ep_white,32.2336
ep_twomore,3.0573
ep_otherrace,0.7306


In [141]:
man_total = manhattan_racial.racial_percentage.sum()
man_total, 100 - man_total
# ignore

(np.float64(99.91522297669299), np.float64(0.0847770233070122))

In [142]:
manhattan_racial.reset_index(inplace=True)
manhattan_racial

Unnamed: 0,index,racial_percentage
0,ep_afam,19.7275
1,ep_asian,8.9104
2,ep_hisp,35.2374
3,ep_nhpi,0.0183
4,ep_white,32.2336
5,ep_twomore,3.0573
6,ep_otherrace,0.7306


In [143]:
manhattan_racial.rename(columns={'index':'race_svi'}, inplace=True)

In [144]:
manhattan_racial
# so this is for one column/bar in the chart

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,19.7275
1,ep_asian,8.9104
2,ep_hisp,35.2374
3,ep_nhpi,0.0183
4,ep_white,32.2336
5,ep_twomore,3.0573
6,ep_otherrace,0.7306
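The `to_frame` / `rename` / `reset_index` sequence above can also be written as one chain. A minimal sketch on made-up numbers, assuming the goal is the same two-column layout (`race_svi`, `racial_percentage`):

```python
import pandas as pd

# Made-up percentages for two rows of one borough.
toy = pd.DataFrame({
    "ep_afam": [20.0, 40.0],
    "ep_asian": [10.0, 6.0],
    "ep_hisp": [30.0, 20.0],
})
race_cols = ["ep_afam", "ep_asian", "ep_hisp"]

tidy = (
    toy[race_cols]
    .mean()                                   # Series indexed by column name
    .rename_axis("race_svi")                  # name the index before resetting it
    .reset_index(name="racial_percentage")    # Series -> two-column DataFrame
)
print(tidy)
```

This avoids the intermediate `inplace=True` steps and yields a DataFrame directly.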


In [145]:
type(manhattan_racial)
# correct

### **Step 3.3.2 Brooklyn**

In [146]:
Brooklyn.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.6721,-73.8911,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.6359,-74.0119,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.64,-74.0171,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0
8,000344/24_64776,3040560040,000344/24,64776,322 MILFORD STREET,2F,2024-12-05,BROOKLYN,11208,Not an Ejectment,Possession,40.6714,-73.8763,5.0,42.0,1194.0,3089943,East New York,2024,2024-12,POINT (-73.876309 40.671365),0.4,1925.0,C2,2.0,5.0,"WANG, LI",3800.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1920-1929,11208,108180.0,0.9472,0.8285,0.9904,0.9425,0.9581,32.0,6.4,17.7,5.6,11.7,26.5,8.1,7.8,56.4,19.5,42.4,45.8,39.0,8.1,0.0,0.2,2.8,1.6,97.5,2.5,False,Q2,medium-low,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,7.0
9,00047/24_448,3087110015,00047/24,448,2912 BRIGHTON 12TH STREET APT 1G,1G,2024-12-06,BROOKLYN,11235,Not an Ejectment,Possession,40.5815,-73.9555,13.0,48.0,61004.0,3245453,Brighton Beach,2024,2024-12,POINT (-73.95546 40.581515),1.2,1951.0,D4,6.0,156.0,BAYSHORE GARDENS OWNERS CORP.,112389.0,post-war,condo-co-op,True,mid-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1950-1959,11235,83069.0,0.9094,0.9179,0.7623,0.9732,0.9524,26.8,5.7,10.4,8.1,25.3,17.1,15.0,25.6,47.6,15.4,41.4,2.4,9.1,14.8,0.0,0.0,5.2,0.5,32.1,67.9,False,Q2,medium-low,0.0,2.0,3.0,0.0,8.0,8.0,14.0,1.0,4.0,0.0,56.0,0.0,65.0,0.0,2.0,28.0,0.0,0.0,7.0,17.0,7.0,222.0


In [147]:
# brooklyn_racial = Brooklyn[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
# brooklyn_racial

In [148]:
brooklyn_racial = Brooklyn[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
brooklyn_racial = brooklyn_racial.to_frame()
brooklyn_racial.rename(columns = {0:"racial_percentage"}, inplace = True)
brooklyn_racial.reset_index(inplace=True)
brooklyn_racial.rename(columns={'index':'race_svi'}, inplace=True)
brooklyn_racial

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,42.8839
1,ep_asian,7.5561
2,ep_hisp,19.2618
3,ep_nhpi,0.0282
4,ep_white,25.3865
5,ep_twomore,3.9287
6,ep_otherrace,0.8418


### **Step 3.3.3 Queens**

In [149]:
queens_racial = Queens[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
queens_racial = queens_racial.to_frame()
queens_racial.rename(columns = {0:"racial_percentage"}, inplace = True)
queens_racial.reset_index(inplace=True)
queens_racial.rename(columns={'index':'race_svi'}, inplace=True)
queens_racial

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,18.8393
1,ep_asian,25.5302
2,ep_hisp,29.7163
3,ep_nhpi,-0.053
4,ep_white,19.6677
5,ep_twomore,3.1967
6,ep_otherrace,2.2679


In [150]:
queens_racial.loc[queens_racial.race_svi == 'ep_nhpi']

Unnamed: 0,race_svi,racial_percentage
3,ep_nhpi,-0.053


In [151]:
total = queens_racial.racial_percentage.sum()
total
# the negative ep_nhpi mean comes from the -999 missing-value placeholder in the source data. Will ignore for now. Will merge nhpi with twomore and otherrace

np.float64(99.16506967755062)
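The plan to merge `ep_nhpi` with `ep_twomore` and `ep_otherrace` could look like the sketch below. It assumes -999 should be treated as missing before summing; the column name `ep_other_combined` and the rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; the -999 entry stands in for the placeholder seen in ep_nhpi.
toy = pd.DataFrame({
    "ep_nhpi": [0.0, -999.0, 0.2],
    "ep_twomore": [2.9, 1.2, 5.2],
    "ep_otherrace": [1.6, 0.2, 0.5],
})
small_groups = ["ep_nhpi", "ep_twomore", "ep_otherrace"]

# Mask the placeholder, then add the three small categories row by row.
# min_count=1 keeps a row as NaN only if all three values are missing.
masked = toy[small_groups].replace(-999, np.nan)
toy["ep_other_combined"] = masked.sum(axis=1, min_count=1)

print(toy)
```

The combined column can then replace the three small categories in the per-borough means.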

### **Step 3.3.4 Bronx**

In [152]:
bronx_racial = Bronx[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
bronx_racial = bronx_racial.to_frame()
bronx_racial.rename(columns = {0:"racial_percentage"}, inplace = True)
bronx_racial.reset_index(inplace=True)
bronx_racial.rename(columns={'index':'race_svi'}, inplace=True)
bronx_racial

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,27.9152
1,ep_asian,3.1764
2,ep_hisp,60.0719
3,ep_nhpi,0.0221
4,ep_white,6.0374
5,ep_twomore,1.6778
6,ep_otherrace,0.8484


In [153]:
total = bronx_racial.racial_percentage.sum()
total
# not surprising that the total is just under 100: ep_aian is not in the column list, and some values may be unreported or lost along the way

np.float64(99.74933436155533)

### **Step 3.3.5 Staten Island**

In [154]:
si_racial = Staten_Island[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
si_racial = si_racial.to_frame()
si_racial.rename(columns = {0:"racial_percentage"}, inplace = True)
si_racial.reset_index(inplace=True)
si_racial.rename(columns={'index':'race_svi'}, inplace=True)
si_racial

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,17.7638
1,ep_asian,10.6179
2,ep_hisp,24.5774
3,ep_nhpi,0.001
4,ep_white,43.8637
5,ep_twomore,2.5744
6,ep_otherrace,0.3879


In [155]:
total = si_racial.racial_percentage.sum()
total
# not surprising that the total is just under 100: ep_aian is not in the column list, and some values may be unreported or lost along the way

np.float64(99.78613402061855)

In [156]:
brooklyn_racial

Unnamed: 0,race_svi,racial_percentage
0,ep_afam,42.8839
1,ep_asian,7.5561
2,ep_hisp,19.2618
3,ep_nhpi,0.0282
4,ep_white,25.3865
5,ep_twomore,3.9287
6,ep_otherrace,0.8418


In [157]:
man_bk_racial = manhattan_racial.merge(
    brooklyn_racial,
    on='race_svi',
    suffixes=('_man', '_bk')
)
# correct
man_bk_racial

Unnamed: 0,race_svi,racial_percentage_man,racial_percentage_bk
0,ep_afam,19.7275,42.8839
1,ep_asian,8.9104,7.5561
2,ep_hisp,35.2374,19.2618
3,ep_nhpi,0.0183,0.0282
4,ep_white,32.2336,25.3865
5,ep_twomore,3.0573,3.9287
6,ep_otherrace,0.7306,0.8418


In [158]:
man_bk_q_racial = man_bk_racial.merge(
    queens_racial,
    on='race_svi',
    # unnecessary
    how = 'outer',
    # only applies the suffixes when there were conflicts
    suffixes=('_man_bk', '_queens')
)
man_bk_q_racial
# the numbers are correct

Unnamed: 0,race_svi,racial_percentage_man,racial_percentage_bk,racial_percentage
0,ep_afam,19.7275,42.8839,18.8393
1,ep_asian,8.9104,7.5561,25.5302
2,ep_hisp,35.2374,19.2618,29.7163
3,ep_nhpi,0.0183,0.0282,-0.053
4,ep_otherrace,0.7306,0.8418,2.2679
5,ep_twomore,3.0573,3.9287,3.1967
6,ep_white,32.2336,25.3865,19.6677
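The Queens column keeps the bare name `racial_percentage` because `merge` only applies suffixes to column names that clash. One way to sidestep this is to rename each value column before merging. A minimal sketch with rounded, illustrative numbers; the short tags `man`, `bk`, and `q` are assumptions chosen to match the column names used in this notebook.

```python
from functools import reduce

import pandas as pd

# Rounded, illustrative values for two race rows per borough.
man = pd.DataFrame({"race_svi": ["ep_afam", "ep_white"], "racial_percentage": [19.7, 32.2]})
bk = pd.DataFrame({"race_svi": ["ep_afam", "ep_white"], "racial_percentage": [42.9, 25.4]})
q = pd.DataFrame({"race_svi": ["ep_afam", "ep_white"], "racial_percentage": [18.8, 19.7]})

frames = {"man": man, "bk": bk, "q": q}

# Give every value column its borough tag up front, so no suffixes are needed.
renamed = [
    df.rename(columns={"racial_percentage": f"racial_percentage_{tag}"})
    for tag, df in frames.items()
]

# Merge the renamed frames one after another on the shared key.
combined = reduce(lambda left, right: left.merge(right, on="race_svi"), renamed)
print(combined)
```

With explicit names, the result does not depend on the order of the merges or on which columns happen to conflict.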


In [159]:
man_bk_q_br_racial = man_bk_q_racial.merge(
    bronx_racial,
    on='race_svi',
    # unnecessary
    # how = 'outer',
    # only applies the suffixes when there were conflicts
    # add the suffixes sequentially, it works
    suffixes=('_q', '_br')
)
man_bk_q_br_racial
# the numbers are correct now

Unnamed: 0,race_svi,racial_percentage_man,racial_percentage_bk,racial_percentage_q,racial_percentage_br
0,ep_afam,19.7275,42.8839,18.8393,27.9152
1,ep_asian,8.9104,7.5561,25.5302,3.1764
2,ep_hisp,35.2374,19.2618,29.7163,60.0719
3,ep_nhpi,0.0183,0.0282,-0.053,0.0221
4,ep_otherrace,0.7306,0.8418,2.2679,0.8484
5,ep_twomore,3.0573,3.9287,3.1967,1.6778
6,ep_white,32.2336,25.3865,19.6677,6.0374


In [160]:
all_racial = man_bk_q_br_racial.merge(
    si_racial,
    on='race_svi',
    # unnecessary
    # how = 'outer',
    # only applies the suffixes when there were conflicts
    # add the suffixes sequentially, it works
    # suffixes=('_q', '_br')
)
all_racial

Unnamed: 0,race_svi,racial_percentage_man,racial_percentage_bk,racial_percentage_q,racial_percentage_br,racial_percentage
0,ep_afam,19.7275,42.8839,18.8393,27.9152,17.7638
1,ep_asian,8.9104,7.5561,25.5302,3.1764,10.6179
2,ep_hisp,35.2374,19.2618,29.7163,60.0719,24.5774
3,ep_nhpi,0.0183,0.0282,-0.053,0.0221,0.001
4,ep_otherrace,0.7306,0.8418,2.2679,0.8484,0.3879
5,ep_twomore,3.0573,3.9287,3.1967,1.6778,2.5744
6,ep_white,32.2336,25.3865,19.6677,6.0374,43.8637


## **Item 1 for Excel use**

In [161]:
all_racial.rename(columns= {'racial_percentage': "racial_percentage_si"}, inplace=True)
all_racial
# this is what we need for the bar chart.

Unnamed: 0,race_svi,racial_percentage_man,racial_percentage_bk,racial_percentage_q,racial_percentage_br,racial_percentage_si
0,ep_afam,19.7275,42.8839,18.8393,27.9152,17.7638
1,ep_asian,8.9104,7.5561,25.5302,3.1764,10.6179
2,ep_hisp,35.2374,19.2618,29.7163,60.0719,24.5774
3,ep_nhpi,0.0183,0.0282,-0.053,0.0221,0.001
4,ep_otherrace,0.7306,0.8418,2.2679,0.8484,0.3879
5,ep_twomore,3.0573,3.9287,3.1967,1.6778,2.5744
6,ep_white,32.2336,25.3865,19.6677,6.0374,43.8637
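The same races-by-boroughs table can likely be built in one step with `groupby`, instead of five filtered frames and a chain of merges. This is a sketch on made-up rows, not the notebook's actual numbers; note that the generated column names are full borough names rather than the short tags used above.

```python
import pandas as pd

# Made-up rows: two evictions per borough with two race columns.
toy = pd.DataFrame({
    "borough": ["MANHATTAN", "MANHATTAN", "BRONX", "BRONX"],
    "ep_afam": [10.0, 30.0, 25.0, 35.0],
    "ep_hisp": [40.0, 20.0, 60.0, 70.0],
})
race_cols = ["ep_afam", "ep_hisp"]

# Mean of each race column per borough, then transpose so races become rows.
by_borough = toy.groupby("borough")[race_cols].mean().T.rename_axis("race_svi")

# Build column names such as racial_percentage_bronx from the borough labels.
by_borough.columns = [
    f"racial_percentage_{name.lower().replace(' ', '_')}" for name in by_borough.columns
]
by_borough = by_borough.reset_index()
print(by_borough)
```

`groupby` sorts the borough labels alphabetically, so the column order differs from the merge order used above.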


# **Step 4 We also need a dataframe that has boroughs as columns and average_year_eviction_count as rows and contents**

## **Step 4.1 First, we need a baseline: the average eviction rate across all boroughs. This will help with the chi-square test**

In [210]:
average_evictions = evictions_pre_post[['average_year_eviction_count']].mean()
average_evictions # per building per year

Unnamed: 0,0
average_year_eviction_count,1.1676


## **Step 4.2 get the boroughs' series and make a dataframe**

In [163]:
average_evictions_man = Manhattan[['average_year_eviction_count']].mean()
average_evictions_bk = Brooklyn[['average_year_eviction_count']].mean()
average_evictions_br = Bronx[['average_year_eviction_count']].mean()
average_evictions_si = Staten_Island[['average_year_eviction_count']].mean()
average_evictions_q = Queens[['average_year_eviction_count']].mean()

In [164]:
# type(average_evictions_man)
# so this is still a series

In [165]:
boro_evictions_df = pd.DataFrame({
    # 'borough': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx'],
    'manhattan': average_evictions_man,
    'brooklyn': average_evictions_bk,
    'queens': average_evictions_q,
    'staten island': average_evictions_si,
    'bronx': average_evictions_br
})
boro_evictions_df
# good

Unnamed: 0,manhattan,brooklyn,queens,staten island,bronx
average_year_eviction_count,0.87,0.9894,0.9441,0.9223,1.5458
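
The five per-borough means can also be computed in a single pass. The following is a minimal sketch, assuming the combined table carries a `borough` column like the one listed in `Manhattan.columns`. The toy rows are invented for illustration and are not the notebook's data.

```python
import pandas as pd

# Hypothetical toy rows standing in for evictions_pre_post (values are invented).
toy_evictions = pd.DataFrame({
    'borough': ['BRONX', 'BRONX', 'QUEENS', 'QUEENS'],
    'average_year_eviction_count': [2.0, 1.0, 0.5, 1.5],
    'rpl_themes': [0.90, 1.00, 0.70, 0.60],
})

# One groupby replaces the five separate per-borough .mean() calls.
boro_means = toy_evictions.groupby('borough')[
    ['average_year_eviction_count', 'rpl_themes']
].mean()

# Transpose so that boroughs become columns, matching the layout used above.
boro_means_wide = boro_means.T
print(boro_means_wide)
```

The same grouped result would serve both the eviction table in Step 4 and the SVI table in Step 5, provided the borough labels are consistent.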


## **Step 4.3 An item for Excel use: chi-square test**

In [166]:
average_evictions = evictions_pre_post[['average_year_eviction_count']].mean()
average_evictions

Unnamed: 0,0
average_year_eviction_count,1.1676


In [167]:
average_evictions_man, average_evictions_bk, average_evictions_br, average_evictions_si, average_evictions_q, \
average_evictions_man > average_evictions, average_evictions_bk > average_evictions, average_evictions_br > average_evictions, \
average_evictions_si > average_evictions, average_evictions_q > average_evictions

(average_year_eviction_count   0.8700
 dtype: float64,
 average_year_eviction_count   0.9894
 dtype: float64,
 average_year_eviction_count   1.5458
 dtype: float64,
 average_year_eviction_count   0.9223
 dtype: float64,
 average_year_eviction_count   0.9441
 dtype: float64,
 average_year_eviction_count    False
 dtype: bool,
 average_year_eviction_count    False
 dtype: bool,
 average_year_eviction_count    True
 dtype: bool,
 average_year_eviction_count    False
 dtype: bool,
 average_year_eviction_count    False
 dtype: bool)

### **In short, the Bronx was the only borough whose average eviction rate was higher than the average across all five boroughs**

In [168]:
# make a df to keep this neat
# note: each mean below is still a one-element Series, so the cells show the Series repr;
# the next cell re-enters the data as plain values so that the table looks better
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    # per building per year data
    'eviction_average': [average_evictions_man, average_evictions_bk, average_evictions_br, average_evictions_q, average_evictions_si],
    'Above_average': [average_evictions_man > average_evictions,
                     average_evictions_bk > average_evictions,
                     average_evictions_br > average_evictions,
                     average_evictions_q > average_evictions,
                     average_evictions_si > average_evictions]
}
boro_evictions_compare_df = pd.DataFrame(data)
boro_evictions_compare_df

Unnamed: 0,Borough,eviction_average,Above_average
0,Manhattan,average_year_eviction_count 0.8700 dtype: fl...,average_year_eviction_count False dtype: bool
1,Brooklyn,average_year_eviction_count 0.9894 dtype: fl...,average_year_eviction_count False dtype: bool
2,Bronx,average_year_eviction_count 1.5458 dtype: fl...,average_year_eviction_count True dtype: bool
3,Queens,average_year_eviction_count 0.9441 dtype: fl...,average_year_eviction_count False dtype: bool
4,Staten Island,average_year_eviction_count 0.9223 dtype: fl...,average_year_eviction_count False dtype: bool
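
The cells above show `dtype: float64` text because each borough mean is a one-element Series rather than a number. A minimal sketch of one way to avoid that, using `Series.item()` to pull out the scalar before building the table. The Series here are rebuilt by hand from the outputs shown earlier, and the name `compare_sketch` is hypothetical.

```python
import pandas as pd

col = 'average_year_eviction_count'

# One-element Series, mimicking what `Manhattan[[col]].mean()` etc. return.
# The numbers are copied from the outputs printed earlier in this notebook.
boro_series = {
    'Manhattan': pd.Series({col: 0.8700}),
    'Brooklyn': pd.Series({col: 0.9894}),
    'Bronx': pd.Series({col: 1.5458}),
    'Queens': pd.Series({col: 0.9441}),
    'Staten Island': pd.Series({col: 0.9223}),
}
citywide = pd.Series({col: 1.1676})

# .item() converts a one-element Series into a plain Python float.
baseline = citywide.item()
rows = [
    {
        'Borough': name,
        'eviction_average': s.item(),
        'Above_average': s.item() > baseline,
    }
    for name, s in boro_series.items()
]
compare_sketch = pd.DataFrame(rows)
print(compare_sketch)
```

With scalars in place, the columns get ordinary `float64` and `bool` dtypes, so no manual re-entry of the numbers is needed.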


In [169]:
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'eviction_average': [0.6, 0.4, 1.0, 0.4, 0.4],
    'Above_average': [False, False, True, False, False]
}
boro_evictions_compare_df = pd.DataFrame(data)
boro_evictions_compare_df

Unnamed: 0,Borough,eviction_average,Above_average
0,Manhattan,0.6,False
1,Brooklyn,0.4,False
2,Bronx,1.0,True
3,Queens,0.4,False
4,Staten Island,0.4,False


# **Step 5: We also need a dataframe with boroughs as columns and the overall SVI (rpl_themes, the most important SVI measure) as its single row**

## **Step 5.1 A baseline (derived, but using the official data from the CDC website)**

In [170]:
average_svi_eviction = evictions_pre_post[['rpl_themes']].mean()
average_svi_eviction
# very high vulnerability; it looks a bit too high, so it needs a double check
# explanation: this mean is taken over eviction records, so it is weighted by evictions.
# Because the Bronx has overwhelmingly high eviction rates, its weight is higher.

Unnamed: 0,0
rpl_themes,0.9044
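
To illustrate why a mean over eviction records can sit above a per-borough average, here is a small sketch with invented numbers. Group `A` stands in for a borough with many eviction records and high SVI, group `B` for one with few records and lower SVI; none of these values come from the notebook's data.

```python
import pandas as pd

# Hypothetical records: four from a high-SVI group, one from a low-SVI group.
toy = pd.DataFrame({
    'borough': ['A', 'A', 'A', 'A', 'B'],
    'rpl_themes': [0.9, 0.9, 0.9, 0.9, 0.4],
})

# Mean over records: every eviction row counts once, so 'A' dominates.
record_weighted_mean = toy['rpl_themes'].mean()

# Mean of group means: each borough counts once, regardless of row count.
mean_of_group_means = toy.groupby('borough')['rpl_themes'].mean().mean()

print(record_weighted_mean, mean_of_group_means)
```

In this toy case the record-level mean is 0.8 while the mean of the two group means is 0.65, which is the same direction of gap seen between the record-level figure above and a per-borough baseline.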


In [171]:
svi_df.fips.nunique()

204

In [172]:
# svi_df.rpl_themes.unique() includes values such as 3.997e-01 and -9.990e+02
# -999 is not a valid percentile rank, so those rows are treated as bad
bad_row = svi_df.loc[svi_df.rpl_themes == -9.990e+02]
bad_row.shape

(28, 153)
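
Rows carrying the `-999` placeholder will drag any mean far below zero. A minimal sketch of masking them before averaging, on an invented four-row table (the `fips` values and scores are placeholders, not real SVI rows):

```python
import numpy as np
import pandas as pd

# Hypothetical mini SVI table with one -999 placeholder row.
toy_svi = pd.DataFrame({
    'fips': [1, 2, 3, 4],
    'rpl_themes': [0.40, 0.80, -999.0, 0.60],
})

# Averaging without cleaning lets the placeholder dominate the result.
naive_mean = toy_svi['rpl_themes'].mean()

# Replace the placeholder with NaN; pandas skips NaN in .mean() by default.
clean_scores = toy_svi['rpl_themes'].replace(-999, np.nan)
clean_mean = clean_scores.mean()

print(naive_mean, clean_mean)
```

Applying the same `replace` to `svi_df` before the per-borough means might remove the need for some of the hard-coded values below, though the results should still be checked against the CDC map.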

In [173]:
manhattan_svi_df = svi_df[svi_df.fips.isin(range(10001, 10283))]
brooklyn_svi_df = svi_df[svi_df.fips.isin(range(11201, 11257))]
queens_svi_df = svi_df[svi_df.fips.isin(range(11351, 11437)) | svi_df.fips.isin(range(11101, 11110)) | svi_df.fips.isin([11004, 11005, 11411, 11412, 11413, 11418, 11419, 11420, 11421, 11422, 11423, 11426, 11427, 11428, 11429])]
staten_island_svi_df = svi_df[svi_df.fips.isin(range(10301, 10315))]
bronx_svi_df = svi_df[svi_df.fips.isin(range(10451, 10476))]

In [174]:
manhattan_svi_average = manhattan_svi_df[['rpl_themes']].mean()
manhattan_svi_average
# some bad (-999) rows are in Manhattan, so the computed mean is not usable
manhattan_svi_average = 0.7283
# hard-coded from https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html

In [175]:
brooklyn_svi_average = brooklyn_svi_df[['rpl_themes']].mean()
brooklyn_svi_average
# this is roughly correct

Unnamed: 0,0
rpl_themes,0.8941


In [176]:
bronx_svi_average = bronx_svi_df[['rpl_themes']].mean()
bronx_svi_average

Unnamed: 0,0
rpl_themes,0.9676


In [177]:
queens_svi_average = queens_svi_df[['rpl_themes']].mean()
queens_svi_average
queens_svi_average = 0.8024
# Queens also has some bad (-999) rows, hence the override above

In [178]:
staten_island_svi_average = staten_island_svi_df[['rpl_themes']].mean()
staten_island_svi_average
# bad (-999) rows in there too, hence the override below
staten_island_svi_average = 0.5956

In [179]:
(0.9962 + 0.8024 + 0.7283 + 0.8874 + 0.5956)/5
# take this average for now
# interesting
# this is the baseline
# https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html

0.80198

In [180]:
average_svi_all = 0.80198
# official data

## **Step 5.2 get the series and make a dataframe**

In [181]:
Manhattan.columns

Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
       'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
       'unitsres', 'ownername', 'bldgarea', 'building_type',
       'building_category', 'is_condo', 'floor_category', 'rent_era',
       'architectural_style', 'economic_period', 'residential_units_category',
       'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
       'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
       'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
       'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
       'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp', 'ep_asian',

In [182]:
average_svi_man = Manhattan[['rpl_themes']].mean()
average_svi_bk = Brooklyn[['rpl_themes']].mean()
average_svi_br = Bronx[['rpl_themes']].mean()
average_svi_si = Staten_Island[['rpl_themes']].mean()
average_svi_q = Queens[['rpl_themes']].mean()

In [183]:
# type(average_evictions_man)
# so this is still a series

In [184]:
boro_svi_df = pd.DataFrame({
    # 'borough': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx'],
    'manhattan': average_svi_man,
    'brooklyn': average_svi_bk,
    'queens': average_svi_q,
    'staten island': average_svi_si,
    'bronx': average_svi_br
})
boro_svi_df
# good

Unnamed: 0,manhattan,brooklyn,queens,staten island,bronx
rpl_themes,0.8611,0.9302,0.6942,0.8871,0.9887


## **Step 5.3 An item for Excel use: chi-square test**

In [185]:
manhattan_svi_average, brooklyn_svi_average, bronx_svi_average, staten_island_svi_average, queens_svi_average, \
manhattan_svi_average > average_svi_all, brooklyn_svi_average > average_svi_all, bronx_svi_average > average_svi_all, \
queens_svi_average > average_svi_all, staten_island_svi_average > average_svi_all

(0.7283,
 rpl_themes   0.8941
 dtype: float64,
 rpl_themes   0.9676
 dtype: float64,
 0.5956,
 0.8024,
 False,
 rpl_themes    True
 dtype: bool,
 rpl_themes    True
 dtype: bool,
 True,
 False)

In [186]:
# make a dataframe to make this neat:
# since there are hard-coded data, it would be better to just change the data directly
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'SVI_average': [manhattan_svi_average, brooklyn_svi_average,
                   bronx_svi_average, queens_svi_average,
                   staten_island_svi_average],
    'Above_average': [manhattan_svi_average > average_svi_all,
                     brooklyn_svi_average > average_svi_all,
                     bronx_svi_average > average_svi_all,
                     queens_svi_average > average_svi_all,
                     staten_island_svi_average > average_svi_all]
}
boro_svi_compare_df = pd.DataFrame(data)
boro_svi_compare_df

Unnamed: 0,Borough,SVI_average,Above_average
0,Manhattan,0.7283,False
1,Brooklyn,rpl_themes 0.8941 dtype: float64,rpl_themes True dtype: bool
2,Bronx,rpl_themes 0.9676 dtype: float64,rpl_themes True dtype: bool
3,Queens,0.8024,True
4,Staten Island,0.5956,False


In [187]:
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'SVI_average': [0.7283, 0.8941, 0.9676, 0.8024, 0.5956],
    'Above_average': [False, True, True, True, False]
}

boro_svi_compare_df = pd.DataFrame(data)
boro_svi_compare_df

Unnamed: 0,Borough,SVI_average,Above_average
0,Manhattan,0.7283,False
1,Brooklyn,0.8941,True
2,Bronx,0.9676,True
3,Queens,0.8024,True
4,Staten Island,0.5956,False


# **Finally, the chi-square test data:**

## Do the SVI scores have anything to do with above-average eviction rates?

In [188]:
boro_svi_compare_df.merge(
    boro_evictions_compare_df,
    on='Borough',
    # how = 'outer',
    suffixes=('_svi', '_evi')
)

Unnamed: 0,Borough,SVI_average,Above_average_svi,eviction_average,Above_average_evi
0,Manhattan,0.7283,False,0.6,False
1,Brooklyn,0.8941,True,0.4,False
2,Bronx,0.9676,True,1.0,True
3,Queens,0.8024,True,0.4,False
4,Staten Island,0.5956,False,0.4,False


In [189]:
boro_svi_compare_df.to_csv('/content/drive/My Drive/X999/boro_si_compare_df.csv', index= False)

## **The null hypothesis: there is no association between above-average SVI and above-average evictions**

In [190]:
from scipy.stats import chi2_contingency
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'SVI_average': [0.7283, 0.8941, 0.9676, 0.8024, 0.5956],
    'Above_average_svi': [False, True, True, True, False],
    'eviction_average': [0.6, 0.4, 1.0, 0.4, 0.4],
    'Above_average_evi': [False, False, True, False, False]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Borough,SVI_average,Above_average_svi,eviction_average,Above_average_evi
0,Manhattan,0.7283,False,0.6,False
1,Brooklyn,0.8941,True,0.4,False
2,Bronx,0.9676,True,1.0,True
3,Queens,0.8024,True,0.4,False
4,Staten Island,0.5956,False,0.4,False


In [191]:
contingency_table = pd.crosstab(df['Above_average_svi'], df['Above_average_evi'])
contingency_table

Above_average_evi,False,True
Above_average_svi,Unnamed: 1_level_1,Unnamed: 2_level_1
False,2,0
True,2,1


In [192]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(0.0),
 np.float64(1.0),
 1,
 array([[1.6, 0.4],
        [2.4, 0.6]]))
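
The statistic of exactly 0 comes from two things: the expected counts, and the Yates continuity correction that `chi2_contingency` applies by default to 2x2 tables. A short sketch that rebuilds the expected counts by hand (row total times column total, divided by the grand total) and reruns the test with `correction=False` for comparison. The array below is typed in from the contingency table printed above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Above_average_svi False/True; columns: Above_average_evi False/True.
observed = np.array([[2, 0],
                     [2, 1]])

# Expected count per cell = row total * column total / grand total.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_by_hand = np.outer(row_totals, col_totals) / observed.sum()

# Default call: Yates correction is applied because the table has 1 dof.
chi2_yates, p_yates, _, expected_scipy = chi2_contingency(observed)

# Same table without the continuity correction.
chi2_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print(expected_by_hand)
print(chi2_yates, chi2_plain)
```

Every observed count is within 0.5 of its expected count, so the correction shrinks each difference to zero. Without the correction the statistic is about 0.83, still far from significant. All expected counts are below 5, which is the usual sign that the chi-square approximation is unreliable for this table.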

In [193]:
# correlation test
corr, p_value = spearmanr(
    df['Above_average_svi'],
    df['Above_average_evi']
)
corr, p_value
# moderate correlation (about 0.41), but the p-value (about 0.50) is far too high for it to be meaningful

(np.float64(0.408248290463863), np.float64(0.495025346059711))
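
Because both inputs are True/False flags, the Spearman coefficient here reduces to the phi coefficient of the 2x2 table. A small sketch checking that by hand. The cell counts `a`, `b`, `c`, `d` are read off the contingency table above, and the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

# The two flag columns, in borough order:
# Manhattan, Brooklyn, Bronx, Queens, Staten Island.
above_svi = np.array([0, 1, 1, 1, 0])
above_evi = np.array([0, 0, 1, 0, 0])

# Cell counts of the 2x2 table: a=(F,F), b=(F,T), c=(T,F), d=(T,T).
a, b, c, d = 2, 0, 2, 1

# Phi coefficient for a 2x2 table.
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

rho, p_value = spearmanr(above_svi, above_evi)
print(phi, rho)
```

Both values come out at roughly 0.408, so the rank correlation adds no information beyond what the contingency table already contains.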

## **Results: the null hypothesis was not rejected. The sample is very small (five boroughs), and only the Bronx has both high SVI and high evictions, so there is only one such case.**

In [194]:
# even with Fisher's exact test (suited to small samples), the result is not significant.
odds_ratio, p_value_fisher = fisher_exact(contingency_table)
p_value_fisher

1.0
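
A p-value of exactly 1.0 can be reproduced by hand. With the row and column totals fixed, the single above-average-eviction borough lands either among the 3 high-SVI boroughs or among the 2 low-SVI ones, so only two tables are possible. The sketch below computes both probabilities with `math.comb` and compares them with `fisher_exact`; the variable names are illustrative.

```python
from math import comb
from scipy.stats import fisher_exact

# Observed table: rows = SVI flag False/True, columns = eviction flag False/True.
table = [[2, 0],
         [2, 1]]

# Probability that the one above-average-eviction borough is a high-SVI one
# (the observed table): choose 1 of 3 high-SVI and 0 of 2 low-SVI, out of 5.
p_observed = comb(3, 1) * comb(2, 0) / comb(5, 1)

# Probability of the only other possible table (it falls in a low-SVI borough).
p_other = comb(3, 0) * comb(2, 1) / comb(5, 1)

# Two-sided p-value: sum over tables no more likely than the observed one.
p_by_hand = sum(p for p in (p_observed, p_other) if p <= p_observed)

_, p_scipy = fisher_exact(table)
print(p_observed, p_other, p_by_hand, p_scipy)
```

The observed table is the more likely of the two (0.6 versus 0.4), so every possible table counts toward the two-sided p-value and it sums to 1.0. With five boroughs and a single above-average case, no arrangement could reach significance.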

## **Does the percentage of Black and Hispanic residents have anything to do with above-average eviction rates?**

In [195]:
all_racial.rename(columns= {'racial_percentage_man': "Manhattan",
                            'racial_percentage_bk': 'Brooklyn',
                            'racial_percentage_q': 'Queens',
                            'racial_percentage_br': 'Bronx',
                            'racial_percentage_si': 'Staten Island'}, inplace=True)

In [196]:
all_racial

Unnamed: 0,race_svi,Manhattan,Brooklyn,Queens,Bronx,Staten Island
0,ep_afam,19.7275,42.8839,18.8393,27.9152,17.7638
1,ep_asian,8.9104,7.5561,25.5302,3.1764,10.6179
2,ep_hisp,35.2374,19.2618,29.7163,60.0719,24.5774
3,ep_nhpi,0.0183,0.0282,-0.053,0.0221,0.001
4,ep_otherrace,0.7306,0.8418,2.2679,0.8484,0.3879
5,ep_twomore,3.0573,3.9287,3.1967,1.6778,2.5744
6,ep_white,32.2336,25.3865,19.6677,6.0374,43.8637


In [197]:
all_racial= all_racial.set_index('race_svi')

In [198]:
black_hispanic_svi = all_racial.loc[['ep_afam', 'ep_hisp']].sum().to_frame(name='Black_Hispanic')

In [199]:
black_hispanic_svi

Unnamed: 0,Black_Hispanic
Manhattan,54.9649
Brooklyn,62.1457
Queens,48.5556
Bronx,87.9872
Staten Island,42.3412


In [200]:
black_hispanic_svi = black_hispanic_svi.reset_index().rename(columns={'index': 'Borough'})
black_hispanic_svi

Unnamed: 0,Borough,Black_Hispanic
0,Manhattan,54.9649
1,Brooklyn,62.1457
2,Queens,48.5556
3,Bronx,87.9872
4,Staten Island,42.3412


In [201]:
boro_svi_compare_df

Unnamed: 0,Borough,SVI_average,Above_average
0,Manhattan,0.7283,False
1,Brooklyn,0.8941,True
2,Bronx,0.9676,True
3,Queens,0.8024,True
4,Staten Island,0.5956,False


In [202]:
black_hispanic_svi.merge(
    boro_evictions_compare_df,
    on='Borough',
    # how = 'outer',
    suffixes=('_racial', '_svi')
)

Unnamed: 0,Borough,Black_Hispanic,eviction_average,Above_average
0,Manhattan,54.9649,0.6,False
1,Brooklyn,62.1457,0.4,False
2,Queens,48.5556,0.4,False
3,Bronx,87.9872,1.0,True
4,Staten Island,42.3412,0.4,False


In [203]:
black_hispanic_svi.to_csv('/content/drive/My Drive/X999/black_hispanic_evictions.csv', index = False)

In [204]:
# had to rearrange the table for easier comparison
data = {
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'Black_hispanic': [54.9649, 62.1457, 87.9872, 48.5556, 42.3412],
    'eviction_average': [0.6, 0.4, 1.0, 0.4, 0.4],
    'Above_average': [False, False, True, False, False]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Borough,Black_hispanic,eviction_average,Above_average
0,Manhattan,54.9649,0.6,False
1,Brooklyn,62.1457,0.4,False
2,Bronx,87.9872,1.0,True
3,Queens,48.5556,0.4,False
4,Staten Island,42.3412,0.4,False


### **The null hypothesis: boroughs with a higher Black/Hispanic population are not more likely to experience above-average eviction rates.**


In [205]:
df = pd.DataFrame(data)
average_bh = df['Black_hispanic'].mean()
average_bh
# mean of the Black + Hispanic percentages across the five boroughs

np.float64(59.19892)

In [206]:
df['high_black_hispanic'] = df['Black_hispanic'] >= average_bh
contingency_table = pd.crosstab(df['high_black_hispanic'], df['Above_average'])
contingency_table
# rows: whether the borough's Black + Hispanic percentage is at or above the five-borough mean
# columns: whether the borough's eviction rate is above the average

Above_average,False,True
high_black_hispanic,Unnamed: 1_level_1,Unnamed: 2_level_1
False,3,0
True,1,1


In [207]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(0.052083333333333336),
 np.float64(0.8194769767775212),
 1,
 array([[2.4, 0.6],
        [1.6, 0.4]]))

In [208]:
# correlation test
corr, p_value = spearmanr(
    df['Black_hispanic'],
    df['eviction_average']
)

In [209]:
corr, p_value
# some correlation (about 0.67), but the p-value (about 0.22) is too high, so the null hypothesis cannot be rejected.

(np.float64(0.6708203932499368), np.float64(0.21516994256955005))

## **Result: the null hypothesis is not rejected. There is no statistically significant association between a high Black + Hispanic population and above-average evictions** (based on the chi-square test and the Spearman correlation above)

## **For the chi-square test: only Manhattan and Staten Island have SVI scores below the average across all five boroughs**

# **In summary, repeat step 3 and run the chi-square tests for the selected 11 or fewer neighborhoods. The chi-square test would make more sense with more sample data at the neighborhood level.**
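
For the neighborhood-level rerun, the threshold, cross-tabulate, and test steps could be wrapped in one helper. This is only a sketch: `threshold_association` is a hypothetical name, and the demo rows reuse the borough figures printed earlier in this notebook (the Black + Hispanic percentages, the computed per-borough eviction averages, and the 1.1676 baseline) rather than any neighborhood data.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact

def threshold_association(df, x_col, y_col, x_cut, y_cut):
    """Cross-tabulate (x >= x_cut) against (y > y_cut) and test the 2x2 table."""
    table = pd.crosstab(df[x_col] >= x_cut, df[y_col] > y_cut)
    # Keep a full 2x2 layout even if one category never occurs.
    table = table.reindex(index=[False, True], columns=[False, True], fill_value=0)
    chi2, p_chi2, _, _ = chi2_contingency(table.to_numpy())
    _, p_fisher = fisher_exact(table.to_numpy())
    return table, chi2, p_chi2, p_fisher

# Demo on the five boroughs, using values shown earlier in the notebook.
demo = pd.DataFrame({
    'Borough': ['Manhattan', 'Brooklyn', 'Bronx', 'Queens', 'Staten Island'],
    'Black_hispanic': [54.9649, 62.1457, 87.9872, 48.5556, 42.3412],
    'eviction_average': [0.8700, 0.9894, 1.5458, 0.9441, 0.9223],
})
table, chi2, p_chi2, p_fisher = threshold_association(
    demo, 'Black_hispanic', 'eviction_average',
    x_cut=demo['Black_hispanic'].mean(), y_cut=1.1676,
)
print(table)
print(chi2, p_chi2, p_fisher)
```

On the borough data this reproduces the contingency table and chi-square statistic from the cells above. Note that `chi2_contingency` raises an error when a whole row or column of the table is zero, so a neighborhood set where no area exceeds a threshold would need a different cut or Fisher's test alone.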