# **Introduction**

## **Chi-test (boroughs + svi)**
## **Bar-chart with svi as regression/scatterplot (boroughs first)**

source: https://www.atsdr.cdc.gov/place-health/media/pdfs/2024/10/SVI2022Documentation.pdf

source: https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html

Chi-square test: Using average_year_eviction_nta_count as the eviction measure for each NTA, the chi-square tests give the following results:

1) SVI: The p-value is 0.0095 and the chi-square statistic is 6.73, so we can reject the null hypothesis of independence. The statistic indicates a moderate-to-strong deviation from the counts we would expect if the two categories were independent. The odds ratio is 3.3345: the odds of eviction are about 3.33 times higher for someone living in an NTA with above-average SVI than for someone living in an NTA with below-average SVI.

2) Black + Hispanic population percentage: The p-value is 0.00032 and the chi-square statistic is 53.9875, so we can reject the null hypothesis. We can be confident that neighborhoods with a high Black + Hispanic percentage are more likely to have high evictions (and vice versa); that is, there is a strong association between a high Black + Hispanic percentage and high eviction rates in the full dataset. The odds ratio is 12.5125: the odds of eviction are about 12.5 times higher for someone living in a neighborhood whose Black + Hispanic share is above the NYC average than for someone living in a neighborhood where it is below average.
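
The following is a minimal sketch of how a 2x2 chi-square test and odds ratio of this kind can be computed with `scipy.stats.chi2_contingency`. Splitting each column at its mean mirrors the "above/below average" wording above, but the helper name `chi_square_and_odds_ratio`, the toy values, and the exact construction of the contingency table are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_and_odds_ratio(df, predictor, outcome):
    """Split both columns at their means, then test the two flags for independence."""
    high_pred = df[predictor] > df[predictor].mean()
    high_out = df[outcome] > df[outcome].mean()
    # 2x2 table: rows = low/high predictor, columns = low/high outcome
    table = pd.crosstab(high_pred, high_out).reindex(
        index=[False, True], columns=[False, True], fill_value=0
    )
    chi2, p_value, dof, expected = chi2_contingency(table)
    # Unpack cell counts: d = low/low, c = low/high, b = high/low, a = high/high
    (d, c), (b, a) = table.to_numpy()
    odds_ratio = (a * d) / (b * c) if b * c else np.nan
    return table, chi2, p_value, odds_ratio


# Hypothetical NTA-level values, for illustration only
toy = pd.DataFrame({
    "rpl_themes": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "average_year_eviction_nta_count": [1, 1, 1, 5, 1, 5, 5, 5],
})
table, chi2, p_value, odds_ratio = chi_square_and_odds_ratio(
    toy, "rpl_themes", "average_year_eviction_nta_count"
)
print(table)
print(f"chi2={chi2:.4f}, p={p_value:.4f}, odds ratio={odds_ratio:.4f}")
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; pass `correction=False` to turn it off.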

In [None]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt
import scipy

# visualization
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import seaborn as sns

from scipy.stats import chi2_contingency
import statsmodels.api as sm

# system and utility
import warnings
import os
import io
from IPython.display import IFrame
from google.colab import files

# suppress warnings
warnings.filterwarnings('ignore')

# inline
%matplotlib inline

In [None]:
# !pip install geopandas folium matplotlib seaborn scipy esda splot

In [4]:
pd.set_option('display.float_format', lambda x: '%.4f' % x)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# **Step 1 Get the Eviction data**

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
# data source:
file_path1 = '/content/drive/My Drive/X999/bbl_evictions_311_svi_normal_times.csv'
file_path2 = '/content/drive/My Drive/X999/bbl_evictions_311_svi_covid.csv'

In [7]:
evictions_pre_post_raw = pd.read_csv(file_path1)
evictions_covid_raw = pd.read_csv(file_path2)
evictions_covid_raw.shape, evictions_pre_post_raw.shape

((5386, 93), (66397, 94))

In [8]:
evictions_pre_post = evictions_pre_post_raw.copy()
evictions_covid = evictions_covid_raw.copy()

In [9]:
evictions_pre_post.head(2)

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_nta_count
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.6721,-73.8911,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0,0.2667,0.0027
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.6359,-74.0119,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0,0.3,0.0004


In [10]:
evictions_covid.head(2)

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_nta_count
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.8654,-73.9013,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0,0.0667,0.0001
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.7242,-73.8556,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0,0.0011,0.0


In [11]:
evictions_pre_post.columns, \
evictions_covid.columns, \
evictions_pre_post.shape, \
evictions_covid.shape

(Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

In [12]:
link = '/content/drive/My Drive/X999/svi_cleaned.csv'

In [13]:
svi_df = pd.read_csv(link)
svi_df.head(2)

Unnamed: 0,fips,location,area_sqmi,e_totpop,m_totpop,e_hu,m_hu,e_hh,m_hh,e_pov150,m_pov150,e_unemp,m_unemp,e_hburd,m_hburd,e_nohsdp,m_nohsdp,e_uninsur,m_uninsur,e_age65,m_age65,e_age17,m_age17,e_disabl,m_disabl,e_sngpnt,m_sngpnt,e_limeng,m_limeng,e_minrty,m_minrty,e_munit,m_munit,e_mobile,m_mobile,e_crowd,m_crowd,e_noveh,m_noveh,e_groupq,m_groupq,ep_pov150,mp_pov150,ep_unemp,mp_unemp,ep_hburd,mp_hburd,ep_nohsdp,mp_nohsdp,ep_uninsur,mp_uninsur,ep_age65,mp_age65,ep_age17,mp_age17,ep_disabl,mp_disabl,ep_sngpnt,mp_sngpnt,ep_limeng,mp_limeng,ep_minrty,mp_minrty,ep_munit,mp_munit,ep_mobile,mp_mobile,ep_crowd,mp_crowd,ep_noveh,mp_noveh,ep_groupq,mp_groupq,epl_pov150,epl_unemp,epl_hburd,epl_nohsdp,epl_uninsur,spl_theme1,rpl_theme1,epl_age65,epl_age17,epl_disabl,epl_sngpnt,epl_limeng,spl_theme2,rpl_theme2,epl_minrty,spl_theme3,rpl_theme3,epl_munit,epl_mobile,epl_crowd,epl_noveh,epl_groupq,spl_theme4,rpl_theme4,spl_themes,rpl_themes,f_pov150,f_unemp,f_hburd,f_nohsdp,f_uninsur,f_theme1,f_age65,f_age17,f_disabl,f_sngpnt,f_limeng,f_theme2,f_minrty,f_theme3,f_munit,f_mobile,f_crowd,f_noveh,f_groupq,f_theme4,f_total,e_daypop,e_noint,m_noint,e_afam,m_afam,e_hisp,m_hisp,e_asian,m_asian,e_aian,m_aian,e_nhpi,m_nhpi,e_twomore,m_twomore,e_otherrace,m_otherrace,ep_noint,mp_noint,ep_afam,mp_afam,ep_hisp,mp_hisp,ep_asian,mp_asian,ep_aian,mp_aian,ep_nhpi,mp_nhpi,ep_twomore,mp_twomore,ep_otherrace,mp_otherrace
0,10001,ZCTA5 10001,0.6238,27004,1827,16975,831,14375,782,5248,797,761,266,3314,531,1930,534,831,289,3428,432,2694,643,2310,499,501,215,1381,405,13460,2305,15840,898,15,23,389,135,12285,840,2213,218,20.3,2.7,4.3,1.5,23.1,3.5,9.1,2.4,3.1,1.0,12.7,1.6,10.0,2.1,8.6,1.9,3.5,1.5,5.3,1.5,49.8,7.8,93.3,2.7,0.1,0.1,2.7,0.9,85.5,2.8,8.2,0.6,0.6108,0.4574,0.5573,0.5902,0.4436,2.6593,0.5688,0.142,0.1161,0.1891,0.4707,0.8777,1.7956,0.1692,0.867,0.867,0.867,0.9853,0.271,0.7402,0.9949,0.9104,3.9018,0.9806,9.2237,0.7414,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,239407,1047,389,2220,576,5206,943,5031,774,0,25,0,25,780,326,223,169,7.3,2.6,8.2,2.2,19.3,3.0,18.6,2.9,0.0,0.1,0.0,0.1,2.9,1.2,0.8,0.6
1,10002,ZCTA5 10002,0.8223,76518,2894,39094,1241,36028,1326,27908,2853,2833,574,14688,1367,18301,1376,4074,766,17681,1287,10028,1549,9896,1062,2211,499,18393,1640,56964,3226,35725,1677,16,28,2461,449,29828,1403,2090,39,36.8,3.5,7.6,1.4,40.8,3.5,30.0,2.0,5.4,1.0,23.1,1.7,13.1,1.8,13.0,1.4,6.1,1.4,24.7,2.0,74.4,3.1,91.4,3.2,0.0,0.1,6.8,1.2,82.8,1.8,2.7,0.1,0.9148,0.7946,0.9219,0.9741,0.7207,4.3261,0.9639,0.7296,0.1831,0.5186,0.739,0.9944,3.1647,0.8781,0.9369,0.9369,0.9369,0.979,0.0,0.9105,0.9915,0.773,3.654,0.9254,12.0817,0.9656,1,0,1,1,0,3,0,0,0,0,1,1,1,1,1,0,1,1,0,3,8,64307,8590,1110,6141,1194,19864,2190,28477,1989,74,83,24,45,1810,486,574,394,23.8,2.9,8.0,1.5,26.0,2.5,37.2,2.2,0.1,0.1,0.0,0.1,2.4,0.6,0.8,0.5


In [14]:
svi_df.shape

(204, 153)

In [15]:
# list(svi_df.columns)

In [16]:
svi_df.ep_nhpi.unique()

array([ 0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,  8.00e-01,
        1.20e+00,  5.00e-01,  4.00e-01])

# **Step 2: SVI items**

A quick double check

In [17]:
link = "/content/drive/My Drive/X999/NewYork_ZCTA.csv"

In [18]:
svi_raw = pd.read_csv(link)
svi_raw.head(2)

Unnamed: 0,ST,STATE,ST_ABBR,FIPS,LOCATION,AREA_SQMI,E_TOTPOP,M_TOTPOP,E_HU,M_HU,E_HH,M_HH,E_POV150,M_POV150,E_UNEMP,M_UNEMP,E_HBURD,M_HBURD,E_NOHSDP,M_NOHSDP,E_UNINSUR,M_UNINSUR,E_AGE65,M_AGE65,E_AGE17,M_AGE17,E_DISABL,M_DISABL,E_SNGPNT,M_SNGPNT,E_LIMENG,M_LIMENG,E_MINRTY,M_MINRTY,E_MUNIT,M_MUNIT,E_MOBILE,M_MOBILE,E_CROWD,M_CROWD,E_NOVEH,M_NOVEH,E_GROUPQ,M_GROUPQ,EP_POV150,MP_POV150,EP_UNEMP,MP_UNEMP,EP_HBURD,MP_HBURD,EP_NOHSDP,MP_NOHSDP,EP_UNINSUR,MP_UNINSUR,EP_AGE65,MP_AGE65,EP_AGE17,MP_AGE17,EP_DISABL,MP_DISABL,EP_SNGPNT,MP_SNGPNT,EP_LIMENG,MP_LIMENG,EP_MINRTY,MP_MINRTY,EP_MUNIT,MP_MUNIT,EP_MOBILE,MP_MOBILE,EP_CROWD,MP_CROWD,EP_NOVEH,MP_NOVEH,EP_GROUPQ,MP_GROUPQ,EPL_POV150,EPL_UNEMP,EPL_HBURD,EPL_NOHSDP,EPL_UNINSUR,SPL_THEME1,RPL_THEME1,EPL_AGE65,EPL_AGE17,EPL_DISABL,EPL_SNGPNT,EPL_LIMENG,SPL_THEME2,RPL_THEME2,EPL_MINRTY,SPL_THEME3,RPL_THEME3,EPL_MUNIT,EPL_MOBILE,EPL_CROWD,EPL_NOVEH,EPL_GROUPQ,SPL_THEME4,RPL_THEME4,SPL_THEMES,RPL_THEMES,F_POV150,F_UNEMP,F_HBURD,F_NOHSDP,F_UNINSUR,F_THEME1,F_AGE65,F_AGE17,F_DISABL,F_SNGPNT,F_LIMENG,F_THEME2,F_MINRTY,F_THEME3,F_MUNIT,F_MOBILE,F_CROWD,F_NOVEH,F_GROUPQ,F_THEME4,F_TOTAL,E_DAYPOP,E_NOINT,M_NOINT,E_AFAM,M_AFAM,E_HISP,M_HISP,E_ASIAN,M_ASIAN,E_AIAN,M_AIAN,E_NHPI,M_NHPI,E_TWOMORE,M_TWOMORE,E_OTHERRACE,M_OTHERRACE,EP_NOINT,MP_NOINT,EP_AFAM,MP_AFAM,EP_HISP,MP_HISP,EP_ASIAN,MP_ASIAN,EP_AIAN,MP_AIAN,EP_NHPI,MP_NHPI,EP_TWOMORE,MP_TWOMORE,EP_OTHERRACE,MP_OTHERRACE
0,36,New York,NY,6390,ZCTA5 06390,4.0467,53,39,253,49,19,19,17,16,0,13,9,26,0,13,27,34,0,13,6,11,31,33,0,18,9,53,20,51,0,18,4,5,0,18,0,13,17,16,32.1,18.8,0.0,52.7,47.4,100.0,0.0,51.4,50.9,45.7,0.0,45.2,11.3,19.0,58.5,41.4,0.0,94.7,17.0,99.2,37.7,92.1,0.0,7.1,1.6,2.0,0.0,94.7,0.0,75.5,32.1,18.8,0.879,0.0,0.9635,0.0,0.996,2.8385,0.6342,0.0,0.1408,0.9944,0.0,0.9775,2.1127,0.3009,0.8062,0.8062,0.8062,0.0,0.4654,0.0,0.0,0.9735,1.4389,0.2205,7.1963,0.4192,0,0,1,0,1,2,0,0,1,0,1,2,0,0,0,0,0,0,1,1,5,601,9,14,0,13,9,19,0,13,0,13,8,16,3,7,0,13,47.4,51.8,0.0,45.2,17.0,35.0,0.0,45.2,0.0,45.2,15.1,32.1,5.7,12.1,0.0,45.2
1,36,New York,NY,10001,ZCTA5 10001,0.6238,27004,1827,16975,831,14375,782,5248,797,761,266,3314,531,1930,534,831,289,3428,432,2694,643,2310,499,501,215,1381,405,13460,2305,15840,898,15,23,389,135,12285,840,2213,218,20.3,2.7,4.3,1.5,23.1,3.5,9.1,2.4,3.1,1.0,12.7,1.6,10.0,2.1,8.6,1.9,3.5,1.5,5.3,1.5,49.8,7.8,93.3,2.7,0.1,0.1,2.7,0.9,85.5,2.8,8.2,0.6,0.6108,0.4574,0.5573,0.5902,0.4436,2.6593,0.5688,0.142,0.1161,0.1891,0.4707,0.8777,1.7956,0.1692,0.867,0.867,0.867,0.9853,0.271,0.7402,0.9949,0.9104,3.9018,0.9806,9.2237,0.7414,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,239407,1047,389,2220,576,5206,943,5031,774,0,25,0,25,780,326,223,169,7.3,2.6,8.2,2.2,19.3,3.0,18.6,2.9,0.0,0.1,0.0,0.1,2.9,1.2,0.8,0.6


In [19]:
def is_nyc_zipcode(zipcode):
    zip_int = int(zipcode) if isinstance(zipcode, str) else zipcode

    # Manhattan: 10001-10282
    if 10001 <= zip_int <= 10282:
        return True
    # additional range 10300-10499 (a superset of the Bronx and Staten Island ranges checked below)
    if 10300 <= zip_int <= 10499:
        return True
    # Bronx: 10451-10475
    if 10451 <= zip_int <= 10475:
        return True
    # Brooklyn: 11201-11256
    if 11201 <= zip_int <= 11256:
        return True
    # Queens: 11351-11436, 11101-11109
    if (11351 <= zip_int <= 11436) or (11101 <= zip_int <= 11109):
        return True
    # Staten Island: 10301-10314
    if 10301 <= zip_int <= 10314:
        return True
    # additional Queens ZIPs
    if zip_int in [11004, 11005, 11411, 11412, 11413, 11418, 11419, 11420, 11421, 11422, 11423, 11426, 11427, 11428, 11429]:
        return True
    return False

In [20]:
nyc_df = svi_raw[svi_raw['FIPS'].apply(is_nyc_zipcode)]

In [21]:
nyc_df.shape

(204, 156)
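
The row count matches the cleaned file (204 rows in both), but equal counts do not by themselves show that the two frames contain the same ZCTAs. A minimal sketch of a stricter check follows, assuming both frames carry the ZCTA code in `FIPS` / `fips`; the helper name `compare_zcta_sets` and the toy frames are hypothetical.

```python
import pandas as pd


def compare_zcta_sets(raw_nyc, cleaned, raw_key="FIPS", cleaned_key="fips"):
    """Return (identical?, codes only in raw, codes only in cleaned)."""
    raw_ids = set(raw_nyc[raw_key].astype(int))
    cleaned_ids = set(cleaned[cleaned_key].astype(int))
    return raw_ids == cleaned_ids, raw_ids - cleaned_ids, cleaned_ids - raw_ids


# Toy frames standing in for nyc_df and svi_df (values are illustrative only)
toy_raw = pd.DataFrame({"FIPS": [10001, 10002, 11207]})
toy_cleaned = pd.DataFrame({"fips": [10001, 10002, 11220]})

identical, only_raw, only_cleaned = compare_zcta_sets(toy_raw, toy_cleaned)
print(identical, only_raw, only_cleaned)
```

On the real data the call would be `compare_zcta_sets(nyc_df, svi_df)`; two empty difference sets would confirm that the filter reproduces the cleaned file's ZCTAs.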

In [22]:
nyc_df.EP_NHPI.unique()
# interesting: the -999 is already present in the source data (the SVI uses -999 to mark missing values), so it was not introduced by our cleaning

array([ 0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,  8.00e-01,
        1.20e+00,  5.00e-01,  4.00e-01])

In [23]:
svi_raw.EP_NHPI.unique()

array([ 1.51e+01,  0.00e+00,  1.00e-01,  3.00e-01,  2.00e-01, -9.99e+02,
        8.00e-01,  1.20e+00,  5.00e-01,  4.00e-01,  1.10e+00,  7.00e-01,
        1.50e+00,  1.80e+00,  9.00e-01,  6.00e-01,  1.40e+00,  2.10e+00,
        2.30e+00,  1.00e+00])

In [24]:
-9.99e+02, 0.00e+00, 9.00e-01, 2.00e-01

(-999.0, 0.0, 0.9, 0.2)
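
Because -999 is a placeholder rather than a real percentage, leaving it in place drags any mean toward large negative values. A minimal sketch of masking it before averaging follows; the helper name `mask_svi_missing` and the toy values are hypothetical, and this step is not part of the notebook as written.

```python
import numpy as np
import pandas as pd

SVI_MISSING = -999  # placeholder value seen in the ep_nhpi column above


def mask_svi_missing(df, columns):
    """Return a copy of df with the -999 placeholder replaced by NaN in `columns`."""
    out = df.copy()
    out[columns] = out[columns].replace(SVI_MISSING, np.nan)
    return out


# Toy column for illustration only
toy = pd.DataFrame({"ep_nhpi": [0.0, 0.1, -999.0, 0.3]})
cleaned = mask_svi_missing(toy, ["ep_nhpi"])

raw_mean = toy["ep_nhpi"].mean()        # distorted by the placeholder
clean_mean = cleaned["ep_nhpi"].mean()  # NaN is skipped by default
print(raw_mean, clean_mean)
```

pandas skips NaN in `mean()` by default, so the masked rows simply drop out of the average instead of distorting it.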

# **Step 3: All boroughs and their eviction rates**

In [25]:
evictions_pre_post_mean = evictions_pre_post[['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']].mean()
evictions_pre_post_mean
# may need to merge ep_twomore and ep_otherrace together

Unnamed: 0,0
ep_afam,29.2346
ep_asian,8.904
ep_hisp,38.7307
ep_nhpi,0.0115
ep_white,19.0236
ep_twomore,2.8025
ep_otherrace,1.0239
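
If the two small categories are merged as the comment above suggests, one option is to sum them row by row into a single column. This is only a sketch: the helper name `combine_other_race`, the new column name `ep_other_combined`, and the toy values are hypothetical, and any -999 placeholder values would need to be handled before summing.

```python
import pandas as pd


def combine_other_race(df, parts=("ep_twomore", "ep_otherrace"),
                       new_col="ep_other_combined"):
    """Replace the columns in `parts` with one column holding their row-wise sum."""
    out = df.drop(columns=list(parts))
    out[new_col] = df[list(parts)].sum(axis=1)
    return out


# Toy rows for illustration only
toy = pd.DataFrame({
    "ep_afam": [29.2, 50.0],
    "ep_twomore": [2.8, 4.1],
    "ep_otherrace": [1.0, 0.9],
})
combined = combine_other_race(toy)
print(combined)
```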


In [26]:
evictions_pre_post_mean = evictions_pre_post_mean.reset_index()

In [27]:
evictions_pre_post_mean.rename(columns = {'index':'race_svi', 0: "racial percentage"}, inplace=True)

In [28]:
evictions_pre_post_mean

Unnamed: 0,race_svi,racial percentage
0,ep_afam,29.2346
1,ep_asian,8.904
2,ep_hisp,38.7307
3,ep_nhpi,0.0115
4,ep_white,19.0236
5,ep_twomore,2.8025
6,ep_otherrace,1.0239


In [29]:
# type(evictions_pre_post_mean)
# so this is correct

In [30]:
# evictions_pre_post.columns

In [31]:
neighbor_evictions = evictions_pre_post.groupby('nta').agg({'average_year_eviction_nta_count': 'mean', 'borough': 'first'}).reset_index()
neighbor_evictions.sort_values('average_year_eviction_nta_count', ascending=False, inplace=True)
neighbor_evictions

Unnamed: 0,nta,average_year_eviction_nta_count,borough
27,Central Harlem North-Polo Grounds,0.0109,MANHATTAN
182,Woodlawn-Wakefield,0.0074,BRONX
59,Flatbush,0.0039,BROOKLYN
38,Crown Heights North,0.0038,BROOKLYN
11,Bedford Park-Fordham North,0.0038,BRONX
178,Williamsbridge-Olinville,0.0037,BRONX
52,East Tremont,0.0037,BRONX
107,Mott Haven-Port Morris,0.0036,BRONX
169,Washington Heights South,0.0035,MANHATTAN
98,Marble Hill-Inwood,0.0034,MANHATTAN


In [32]:
man_nta = {
    'nta': neighbor_evictions['nta'].unique(),
    'eviction_rates': neighbor_evictions['average_year_eviction_nta_count']
}

man_nta_df = pd.DataFrame(man_nta)
man_nta_df

Unnamed: 0,nta,eviction_rates
27,Central Harlem North-Polo Grounds,0.0109
182,Woodlawn-Wakefield,0.0074
59,Flatbush,0.0039
38,Crown Heights North,0.0038
11,Bedford Park-Fordham North,0.0038
178,Williamsbridge-Olinville,0.0037
52,East Tremont,0.0037
107,Mott Haven-Port Morris,0.0036
169,Washington Heights South,0.0035
98,Marble Hill-Inwood,0.0034


In [33]:
avg_per_nta = neighbor_evictions.average_year_eviction_nta_count.mean()
avg_per_nta
# per building, per year, that's why it is similar to the borough one too

np.float64(0.0012530904589788123)

### **Step 3.2. All neighborhoods racial composition**

In [34]:
race_columns = ['ep_afam', 'ep_asian', 'ep_hisp', 'ep_nhpi', 'ep_white', 'ep_twomore', 'ep_otherrace']
# reuse race_columns instead of repeating the list; rpl_themes is the overall SVI percentile ranking
racial_avg_all = evictions_pre_post.groupby('nta')[['average_year_eviction_nta_count', *race_columns, 'rpl_themes']].mean()
racial_avg_all.sort_values('average_year_eviction_nta_count', ascending=False, inplace=True)
racial_avg_all.reset_index(inplace=True)
racial_avg_all

Unnamed: 0,nta,average_year_eviction_nta_count,ep_afam,ep_asian,ep_hisp,ep_nhpi,ep_white,ep_twomore,ep_otherrace,rpl_themes
0,Central Harlem North-Polo Grounds,0.0109,53.1706,3.806,28.1355,0.0001,10.7591,3.0765,0.9779,0.9694
1,Woodlawn-Wakefield,0.0074,46.3838,2.3225,26.4032,0.0,19.5334,2.7199,2.3837,0.964
2,Flatbush,0.0039,50.3284,6.2411,12.8934,0.0136,25.4287,4.1145,0.9041,0.927
3,Crown Heights North,0.0038,54.5405,2.4402,14.5104,0.0004,23.1759,4.2121,0.9645,0.9298
4,Bedford Park-Fordham North,0.0038,16.0227,3.1098,72.5584,0.0,6.2559,1.4718,0.4192,0.9888
5,Williamsbridge-Olinville,0.0037,46.3329,4.3523,38.9846,0.0,5.714,2.2981,1.855,0.9845
6,East Tremont,0.0037,28.0218,1.3305,64.7416,0.0,3.1983,1.6115,0.7708,0.9962
7,Mott Haven-Port Morris,0.0036,25.111,0.5096,69.6073,0.5501,2.6292,0.8451,0.5355,0.9984
8,Washington Heights South,0.0035,9.4778,4.0466,66.1568,0.0,17.8351,1.9685,0.4406,0.9725
9,Marble Hill-Inwood,0.0034,6.8807,3.3681,65.9,0.0,20.764,2.3974,0.4966,0.9602


In [35]:
racial_avg_all.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186
nta,Central Harlem North-Polo Grounds,Woodlawn-Wakefield,Flatbush,Crown Heights North,Bedford Park-Fordham North,Williamsbridge-Olinville,East Tremont,Mott Haven-Port Morris,Washington Heights South,Marble Hill-Inwood,University Heights-Morris Heights,Kew Gardens Hills,Mount Hope,East Concourse-Concourse Village,Hunters Point-Sunnyside-West Maspeth,Battery Park City-Lower Manhattan,East New York,Central Harlem South,Hudson Yards-Chelsea-Flatiron-Union Square,Belmont,Washington Heights North,Queens Village,Starrett City,East Flatbush-Farragut,Baisley Park,Lenox Hill-Roosevelt Island,Brownsville,West Concourse,Morrisania-Melrose,West Farms-Bronx River,West New Brighton-New Brighton-St. George,Hamilton Heights,Clinton,South Jamaica,Van Cortlandt Village,East Harlem North,Hunts Point,Rugby-Remsen Village,Kew Gardens,Prospect Lefferts Gardens-Wingate,Fordham South,Highbridge,Longwood,Crotona Park East,Bedford,Stuyvesant Heights,Mariner's Harbor-Arlington-Port Ivory-Granitev...,Norwood,Soundview-Bruckner,Seagate-Coney Island,Astoria,Ocean Hill,Bronxdale,Co-op City,Claremont-Bathgate,Jackson Heights,Richmond Hill,Kingsbridge Heights,Bushwick South,Soundview-Castle Hill-Clason Point-Harding Park,Melrose South-Mott Haven North,Hollis,Jamaica,Crown Heights South,Grymes Hill-Clifton-Fox Hills,Yorkville,East Harlem South,North Riverdale-Fieldston-Riverdale,Woodhaven,Port Richmond,SoHo-TriBeCa-Civic Center-Little Italy,Westchester-Unionport,Turtle Bay-East Midtown,South Ozone Park,Springfield Gardens North,Corona,East New York (Pennsylvania Ave),Eastchester-Edenwald-Baychester,West Village,Erasmus,Pelham Parkway,Old Astoria,Stapleton-Rosebank,Canarsie,Flushing,Lower East Side,Upper East Side-Carnegie Hill,Midtown-Midtown South,Pelham Bay-Country Club-City Island,Bay Ridge,St. Albans,Steinway,Spuyten Duyvil-Kingsbridge,Forest Hills,Upper West Side,Van Nest-Morris Park-Westchester Square,Flatlands,Schuylerville-Throgs Neck-Edgewater Park,Briarwood-Jamaica Hills,DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill,Cypress Hills-City Line,Queensbridge-Ravenswood-Long Island City,Midwood,Rosedale,Murray Hill,Fort Greene,Clinton Hill,Bushwick North,North Side-South Side,College Point,Douglas Manor-Douglaston-Little Neck,Sheepshead Bay-Gerritsen Beach-Manhattan Beach,Bensonhurst West,Manhattanville,Murray Hill-Kips Bay,East Village,Elmhurst,Ft. Totten-Bay Terrace-Clearview,Pomonok-Flushing Heights-Hillcrest,Parkchester,Rego Park,Bensonhurst East,Ridgewood,Brighton Beach,Springfield Gardens South-Brookville,Fresh Meadows-Utopia,East Elmhurst,Greenpoint,Cambria Heights,Ozone Park,Lincoln Square,New Brighton-Silver Lake,Homecrest,Gramercy,Chinatown,Bellerose,Jamaica Estates-Holliswood,Sunset Park West,East Williamsburg,Charleston-Richmond Valley-Tottenville,Morningside Heights,Kensington-Ocean Parkway,Sunset Park East,Bayside-Bayside Hills,Laurelton,Lindenwood-Howard Beach,Middle Village,Madison,Bath Beach,Dyker Heights,Borough Park,Woodside,Carroll Gardens-Columbia Street-Red Hook,Allerton-Pelham Gardens,Oakwood-Oakwood Beach,West Brighton,Gravesend,Maspeth,Old Town-Dongan Hills-South Beach,Park Slope-Gowanus,Great Kills,North Corona,Georgetown-Marine Park-Bergen Beach-Mill Basin,Grasmere-Arrochar-Ft. Wadsworth,Williamsburg,Prospect Heights,Stuyvesant Town-Cooper Village,Glendale,Oakland Gardens,Glen Oaks-Floral Park-New Hyde Park,Whitestone,Elmhurst-Maspeth,Auburndale,New Springville-Bloomfield-Travis,New Dorp-Midland Beach,Queensboro Hill,Ocean Parkway South,East Flushing,Rossville-Woodrow,Todt Hill-Emerson Hill-Heartland Village-Light...,Windsor Terrace,park-cemetery-etc-Bronx,Westerleigh,Brooklyn Heights-Cobble Hill,Arden Heights,Annadale-Huguenot-Prince's Bay-Eltingville,park-cemetery-etc-Brooklyn
average_year_eviction_nta_count,0.0109,0.0074,0.0039,0.0038,0.0038,0.0037,0.0037,0.0036,0.0035,0.0034,0.0033,0.0033,0.0033,0.0032,0.0030,0.0028,0.0027,0.0027,0.0026,0.0026,0.0026,0.0025,0.0024,0.0024,0.0023,0.0023,0.0023,0.0023,0.0023,0.0022,0.0022,0.0022,0.0021,0.0021,0.0021,0.0021,0.0020,0.0020,0.0020,0.0020,0.0020,0.0020,0.0020,0.0019,0.0019,0.0019,0.0018,0.0017,0.0017,0.0017,0.0017,0.0016,0.0016,0.0016,0.0016,0.0015,0.0015,0.0015,0.0015,0.0015,0.0015,0.0015,0.0014,0.0014,0.0014,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013,0.0013,0.0012,0.0012,0.0012,0.0012,0.0012,0.0012,0.0011,0.0011,0.0011,0.0011,0.0010,0.0010,0.0010,0.0010,0.0010,0.0010,0.0009,0.0009,0.0009,0.0009,0.0008,0.0008,0.0008,0.0008,0.0008,0.0008,0.0008,0.0008,0.0008,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0007,0.0006,0.0006,0.0006,0.0006,0.0006,0.0006,0.0006,0.0006,0.0006,0.0006,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0005,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0000
ep_afam,53.1706,46.3838,50.3284,54.5405,16.0227,46.3329,28.0218,25.1110,9.4778,6.8807,24.4407,7.1250,28.4259,37.2462,3.5968,5.1135,49.8602,43.8578,6.2457,19.2911,3.2136,44.4504,61.7000,66.0359,68.0141,1.7300,70.0757,29.7957,35.4504,24.4517,19.9547,25.3079,5.3544,48.4326,14.4409,37.4339,27.4511,73.8256,6.7883,63.2727,19.7701,28.0424,26.5898,27.1736,33.9160,43.7517,27.7042,28.6000,23.1000,20.1385,4.7717,65.7064,29.4301,58.9600,30.5182,4.2780,9.2388,15.8222,37.1939,30.7604,29.0565,41.5005,21.6398,54.7185,25.1149,3.7964,24.4735,9.3980,3.9000,21.2966,2.9904,22.5381,2.5679,17.6146,78.4795,7.3968,55.8968,55.0526,2.8794,63.5239,19.8323,6.8923,21.0813,79.3561,2.5128,7.5460,2.7172,5.8074,7.8839,2.9430,73.3103,4.5262,11.9581,2.9038,6.7846,19.5497,48.9986,9.6984,21.3177,14.1253,47.2724,9.3452,14.3727,77.9907,2.3790,19.7271,24.8126,17.0372,5.9004,0.6500,1.0920,3.4201,1.4440,28.9019,5.2682,6.2083,1.5945,1.0050,10.2669,22.2000,3.3957,2.9276,2.3485,2.4114,80.6366,11.0357,2.8165,2.6000,85.0327,6.7684,4.5269,19.9309,5.0105,6.5630,7.7371,13.2767,21.3094,2.9753,14.5885,0.3323,19.8254,9.2556,3.2142,2.8550,85.2791,3.7148,1.0250,6.4066,1.5683,1.0864,2.9941,1.5051,11.1444,51.6827,3.0000,13.6025,5.5611,0.9450,6.9350,7.4853,1.2951,7.4986,40.5737,9.6898,7.2038,23.6750,7.2048,2.4202,1.6000,5.0824,0.3056,1.5192,4.7833,3.3000,3.4146,4.8677,11.1609,1.6265,0.5353,5.1357,7.7032,28.6000,11.5412,11.4000,0.8000,0.6765,21.1000
ep_asian,3.8060,2.3225,6.2411,2.4402,3.1098,4.3523,1.3305,0.5096,4.0466,3.3681,1.6079,27.9187,0.8842,0.8263,30.6782,18.7738,5.4992,6.8710,15.7050,2.7425,2.3211,19.5081,2.6000,3.7104,5.6036,10.9310,0.8424,1.0950,0.6731,5.7368,6.9297,4.0009,17.9763,16.9343,2.7809,5.7044,0.2221,1.2907,22.4743,2.9446,2.2371,1.0293,0.4757,1.0732,5.5412,4.2734,10.4827,6.2000,8.2000,9.8270,17.3413,1.6955,6.1401,2.7267,1.0987,22.9115,27.7371,2.3325,4.7231,2.1141,1.1642,25.7903,34.6499,2.7929,11.6023,8.7272,10.7473,3.5199,24.2869,6.5286,22.4956,10.2746,15.7632,32.9424,3.0306,11.3050,1.5462,4.3560,9.6448,3.2934,15.9261,15.8664,13.6288,2.5177,69.6381,25.3629,9.5957,18.7385,10.9935,16.4915,6.6202,10.1821,3.6905,28.9331,8.9182,11.5567,6.6704,6.5829,30.9335,11.6771,6.9815,25.8087,14.5658,3.0669,64.5205,8.1823,6.9238,5.8536,5.9106,34.4000,37.3800,17.4785,33.4381,6.5820,18.5861,14.9575,48.2681,29.7600,40.1715,16.6000,30.9804,30.5268,7.1182,14.8568,3.4473,42.9817,2.1835,4.9000,1.5981,28.7342,16.0862,6.8887,24.7288,16.4863,34.1605,40.5419,38.0210,33.9505,6.2317,3.2581,9.4149,17.7152,31.9137,42.8083,1.6604,6.3410,14.6125,23.6603,36.3623,32.5111,22.4337,36.5899,6.0226,5.4953,13.8000,11.4456,27.8376,15.3967,17.5550,9.5010,9.1390,11.4552,8.6921,18.1898,5.7577,6.9000,13.8190,7.3275,48.6000,47.4706,26.1889,39.5795,46.0556,18.0000,13.5927,56.5581,16.4906,60.5551,4.6412,16.7333,15.8355,6.2000,12.5000,14.2000,8.7000,6.8059,7.5000
ep_hisp,28.1355,26.4032,12.8934,14.5104,72.5584,38.9846,64.7416,69.6073,66.1568,65.9000,69.3353,18.5071,66.7510,56.9323,27.3174,9.5817,36.1831,21.6834,16.3429,69.5754,70.1921,19.1415,15.9000,7.4676,12.6763,7.4325,19.3509,63.4311,59.1572,62.4244,27.1777,50.0514,18.2173,17.6007,69.3459,40.0579,68.8494,14.6261,25.8432,11.5881,72.1502,65.4042,69.2973,65.5250,22.0170,27.1091,36.9380,52.9000,61.7000,19.4159,26.6804,15.7823,52.1306,30.4533,63.2627,50.9053,33.7512,76.5040,32.8167,62.0966,64.9251,13.5161,23.1637,11.8073,22.3858,11.3452,44.8647,34.0166,53.4142,40.2134,9.6741,55.1108,8.4167,22.3905,11.2336,75.7668,32.7744,29.6155,11.2600,14.3570,47.6325,25.7587,20.6370,9.2142,14.9697,25.2146,9.5336,17.3736,47.2900,21.9995,9.9275,24.3393,51.3399,17.8522,18.6000,52.2908,8.2359,47.5134,25.9996,12.6747,37.8611,22.3698,11.1658,9.1432,14.3215,13.9950,13.9850,50.2836,24.2743,39.7713,11.2480,9.0215,14.9840,37.3204,11.9763,17.8939,42.4846,11.2750,20.2715,47.2000,22.8543,14.2061,44.8673,9.1264,8.7786,21.0122,45.3759,15.6000,7.0904,39.0171,10.1450,27.1351,12.0759,10.2918,20.8311,19.9488,19.9659,42.1540,32.6691,10.4258,24.2418,16.3987,36.8353,15.4267,6.9242,25.1492,24.1847,8.7828,17.3407,15.2309,13.7491,39.3741,14.8081,29.8205,16.2000,15.6823,15.1705,36.7700,16.1150,16.3461,12.5707,75.2322,8.9026,16.5653,20.9923,14.0625,21.7810,44.7761,16.4000,12.9118,15.5472,40.5945,18.6333,15.2000,16.5122,21.1677,11.6844,16.3102,9.9765,16.1571,16.1065,52.9000,24.2529,11.0000,12.8000,11.4824,14.1000
ep_nhpi,0.0001,0.0000,0.0136,0.0004,0.0000,0.0000,0.0000,0.5501,0.0000,0.0000,0.0000,0.0004,0.0000,0.0000,0.0000,0.0413,0.1182,0.0000,0.0137,0.0000,0.0001,0.0114,0.0000,0.0000,0.0024,0.1690,0.0002,0.0000,0.0000,0.0551,0.0000,0.0000,0.1559,0.0017,0.0001,0.0000,0.0000,0.0001,0.0000,0.0001,0.0000,0.0000,0.0000,0.0000,0.0147,0.0555,0.0000,0.0000,0.1000,0.0000,0.0000,0.0004,0.0000,0.0000,0.0000,0.0799,0.0326,0.0000,0.0493,0.0000,0.0000,0.0687,0.0596,0.0000,0.0000,0.0128,0.0000,0.0000,0.0000,0.0000,0.0007,0.0257,0.0933,0.2104,0.0000,0.0000,0.0014,0.0000,0.0097,0.0007,0.0000,0.0000,0.0000,0.0000,0.2739,0.0003,0.0698,0.1061,0.0000,0.0000,0.2040,0.0000,0.0000,0.0255,0.0000,0.0000,0.0000,0.0000,0.0398,0.0000,0.1650,0.0000,0.0858,0.0034,0.3759,0.0000,0.0000,0.0130,0.0004,0.1181,0.0000,0.0297,0.0000,0.0000,0.0017,0.0459,0.0014,0.0433,0.0038,0.0000,0.0935,0.0000,0.0000,0.0000,0.0134,0.0000,-12.6443,0.0000,0.0000,0.0000,0.0169,0.0000,0.0377,0.0425,0.0437,0.0000,0.0529,0.0000,0.0000,0.0645,0.0000,0.0000,0.0000,0.0000,0.0044,0.0000,0.0125,0.0987,0.0000,0.0000,0.0000,0.0000,1.1323,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0026,0.0000,0.0000,0.0000,0.0000,0.0018,0.0000,0.0000,0.1972,0.0027,0.0000,0.0000,0.0000,0.0613,0.0844,0.0857,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
ep_white,10.7591,19.5334,25.4287,23.1759,6.2559,5.7140,3.1983,2.6292,17.8351,20.7640,2.0544,41.6759,1.8560,2.7574,33.7452,61.0143,3.9025,22.9082,56.9549,6.0168,21.3320,6.6801,15.0000,17.0875,2.2534,76.7173,3.6573,3.0530,2.5927,3.2033,41.5975,16.5456,53.3678,5.6046,11.4502,13.0285,2.1526,4.2998,38.6612,17.0358,3.9206,2.7253,1.7514,3.3835,32.8712,20.5893,21.1215,8.7000,2.1000,47.3927,46.1970,12.3684,8.6984,4.9000,3.0336,18.3224,13.6151,4.1492,21.2338,1.9302,2.7172,8.8687,12.0938,25.8661,38.7568,72.5574,16.3084,49.1656,12.6112,29.6210,60.9415,8.5335,68.9144,5.3608,1.9993,4.4275,5.2764,6.3871,72.6042,13.7612,13.3175,45.3839,42.5566,3.9173,10.4855,37.8000,74.7638,53.4791,31.9943,55.4398,2.6801,56.4071,29.1115,44.5541,61.7132,13.5962,30.9677,33.6246,13.0843,54.5024,3.3022,36.3524,54.8570,5.6144,15.8959,52.0000,47.7010,21.8724,60.0699,23.5691,46.3320,64.9294,46.0653,22.2668,58.8023,56.2602,5.2125,55.1600,26.1369,10.4000,36.1913,48.1211,43.1687,67.8136,2.1250,22.5052,21.9646,69.6000,3.2308,15.5671,64.5531,41.7093,54.1529,62.1342,32.9545,18.4000,12.4942,18.6702,43.2914,84.3645,41.8333,51.2371,25.5174,36.1283,1.4538,61.4590,56.4292,57.5265,40.7341,49.0025,56.6195,19.8259,60.3677,9.0134,63.2000,55.1152,47.3698,44.5350,56.5633,60.6373,74.8878,4.6860,37.0368,53.1000,61.4346,47.9375,52.2524,42.8972,30.2000,29.2588,55.1944,15.6699,27.4556,60.3000,62.6927,15.0258,55.7234,18.7673,82.6294,58.6976,54.9774,8.7000,48.4529,57.5000,76.1000,79.1471,52.1000
ep_twomore,3.0765,2.7199,4.1145,4.2121,1.4718,2.2981,1.6115,0.8451,1.9685,2.3974,1.8489,3.2125,1.4164,1.6357,3.9427,4.5833,2.8490,3.9573,4.0345,1.6611,1.8446,4.7516,3.3000,4.9143,5.3916,2.6845,5.3733,1.3816,1.5814,1.8585,3.4786,3.2988,4.4961,3.3761,1.1440,2.8921,0.7526,5.3272,3.7029,4.3500,1.3248,1.4030,1.3332,1.7266,4.7370,3.5450,3.3169,2.3000,1.8000,2.9144,4.1337,3.6325,2.2915,2.7333,1.4791,2.9211,6.2089,0.6403,3.1821,1.6034,1.3918,4.3318,4.2445,3.7998,1.5584,2.8557,2.0242,3.1205,3.1723,1.9193,3.4215,1.8548,3.5507,8.9785,3.1664,0.5039,2.9083,2.3050,3.2152,4.1827,2.1720,3.7133,1.5612,4.3909,1.2782,3.3274,2.7974,3.9919,1.0235,2.6201,3.7258,3.9488,2.4378,4.8548,3.4510,1.8914,4.1132,1.7952,3.7350,6.2782,2.8249,4.9897,3.7827,2.6576,1.6605,5.1072,5.7636,3.4703,3.1960,0.8606,2.9360,4.6068,2.7369,3.8175,5.0358,4.0735,1.7357,1.5867,2.3538,2.4000,5.4087,2.7689,2.0737,5.1882,2.6170,1.9443,-10.5127,6.5000,2.2192,4.9079,2.7188,3.4835,3.0665,3.8904,3.0994,4.5698,4.9210,1.7677,2.6288,1.3355,3.7995,4.5795,1.5411,1.8283,2.1484,2.9689,1.9958,3.2146,3.0084,1.3543,1.9107,1.7152,5.6032,2.2984,3.5000,3.7443,2.7423,1.3367,2.3733,5.4206,1.8756,0.5301,3.8211,1.9551,4.2154,6.7875,4.2048,2.1642,2.5000,2.3059,1.8694,1.7356,2.5111,2.4000,3.4878,1.4677,3.6281,1.9408,1.4647,2.5357,4.6452,2.3000,2.5235,5.3000,1.2000,1.3235,4.1000
ep_otherrace,0.9779,2.3837,0.9041,0.9645,0.4192,1.8550,0.7708,0.5355,0.4406,0.4966,0.7046,1.3674,0.5394,0.3825,0.5849,0.8952,1.5832,0.6441,0.6224,0.4844,0.9654,5.3386,1.2000,0.7734,5.5988,0.2588,0.6034,0.9442,0.3022,2.0909,0.2721,0.7076,0.3882,6.8437,0.6361,0.7767,0.1642,0.5474,2.1874,0.7072,0.5114,1.0964,0.3745,0.7472,0.8051,0.5082,0.4080,0.7000,3.0000,0.0159,0.7455,0.2397,0.7095,0.1333,0.4059,0.5392,8.6533,0.5249,0.7687,1.1973,0.6533,5.1272,3.4744,1.0145,0.4703,0.6875,1.5818,0.8099,2.6131,0.4059,0.3837,1.4325,0.6397,11.0244,1.8142,0.4997,1.5961,1.7445,0.3079,0.6934,0.7675,1.6811,0.4667,0.7030,0.5990,0.6992,0.4862,0.4764,0.7187,0.3840,3.1374,0.4179,0.8953,0.8019,0.4398,0.7971,1.0044,0.7561,4.3913,0.6394,1.5919,1.1341,1.2673,1.3458,0.5867,0.8243,0.6816,1.2368,0.5093,0.6096,0.3720,0.4147,1.0142,0.9071,0.2775,0.4602,0.4099,1.1267,0.6631,0.8000,1.0609,1.1346,0.4202,0.5036,2.1955,0.4261,-12.3253,0.6000,0.9173,4.8237,1.9519,0.2691,0.8293,0.4178,1.1293,3.1744,2.6065,0.2374,0.4604,0.2419,0.6995,0.9106,0.7389,0.7500,2.4275,0.2639,1.6542,0.2132,0.7946,0.7420,2.1000,0.5943,0.8202,0.9961,0.4000,0.1835,1.1987,0.8917,0.4150,0.5765,0.2366,0.5000,0.8711,0.4245,0.3577,0.5375,0.6000,0.4128,0.3000,2.8941,0.5028,0.5438,0.4611,0.5000,0.3951,0.6742,1.2484,0.5531,0.6647,0.4786,0.8097,0.7000,0.4235,0.5000,0.4000,0.5235,0.9000
rpl_themes,0.9694,0.9640,0.9270,0.9298,0.9888,0.9845,0.9962,0.9984,0.9725,0.9602,0.9979,0.9494,0.9985,0.9974,0.7801,0.3961,0.9682,0.9183,0.6906,0.9909,0.9781,0.8862,0.9845,0.9278,0.9105,0.5378,0.9933,0.9925,0.9965,0.9933,0.9212,0.9384,0.7875,0.9471,0.9870,0.9831,0.9951,0.9694,0.8872,0.8996,0.9921,0.9931,0.9952,0.9945,0.8881,0.9404,0.9247,0.9925,0.9937,0.9911,0.8473,0.9450,0.9909,0.9760,0.9971,0.9263,0.9267,0.9878,0.9531,0.9865,0.9943,0.9294,0.9627,0.8916,0.9444,0.5769,0.9703,0.8462,0.9060,0.9112,0.6857,0.9805,0.5234,0.8760,0.9319,0.9632,0.9837,0.9643,0.4954,0.9230,0.9706,0.8180,0.9344,0.9119,0.9448,0.9164,0.5451,0.7432,0.9442,0.8954,-2.4252,0.8261,0.9788,0.7642,0.7518,0.9763,0.9077,0.9515,0.9596,0.6721,0.9617,0.8480,0.9620,0.7996,0.9341,0.8567,0.7963,0.9446,0.8905,0.8925,0.7325,0.9387,0.9587,0.9314,0.4907,0.7145,0.9424,0.7743,0.9125,0.9708,0.8588,0.9599,0.8814,0.9524,0.8174,0.9052,-11.6977,0.6370,0.7059,0.8964,0.6556,0.9205,0.9386,0.5312,0.9213,0.8152,0.9564,0.9492,0.9371,0.4412,0.8909,0.9204,0.9576,0.8567,0.7489,0.8641,0.8658,0.9056,0.9652,0.9135,0.9433,0.9021,0.7827,0.9510,0.8739,0.9775,0.9599,0.8709,0.8938,0.6327,0.5291,0.9625,0.8969,0.9060,0.8649,0.7095,0.8059,0.8815,0.8240,0.7721,0.8570,0.9145,0.8954,0.7861,0.8744,0.9445,0.9654,0.9197,0.5606,0.8112,0.8414,0.9925,0.8489,0.6233,0.5333,0.5460,0.9427


## **This is for the racial composition and neighborhoods bar chart**

# **Step 4: We want a dataframe with one row per neighborhood (NTA) and average_year_eviction_nta_count and rpl_themes (SVI) as columns**

In [36]:
evi_svi_df = evictions_pre_post.groupby('nta')[['average_year_eviction_nta_count','rpl_themes']].mean()
evi_svi_df.sort_values('average_year_eviction_nta_count', ascending=False, inplace=True)
evi_svi_df

nta,average_year_eviction_nta_count,rpl_themes
Central Harlem North-Polo Grounds,0.0109,0.9694
Woodlawn-Wakefield,0.0074,0.964
Flatbush,0.0039,0.927
Crown Heights North,0.0038,0.9298
Bedford Park-Fordham North,0.0038,0.9888
Williamsbridge-Olinville,0.0037,0.9845
East Tremont,0.0037,0.9962
Mott Haven-Port Morris,0.0036,0.9984
Washington Heights South,0.0035,0.9725
Marble Hill-Inwood,0.0034,0.9602


In [37]:
evi_svi_df.reset_index(inplace=True)
evi_svi_df

Unnamed: 0,nta,average_year_eviction_nta_count,rpl_themes
0,Central Harlem North-Polo Grounds,0.0109,0.9694
1,Woodlawn-Wakefield,0.0074,0.964
2,Flatbush,0.0039,0.927
3,Crown Heights North,0.0038,0.9298
4,Bedford Park-Fordham North,0.0038,0.9888
5,Williamsbridge-Olinville,0.0037,0.9845
6,East Tremont,0.0037,0.9962
7,Mott Haven-Port Morris,0.0036,0.9984
8,Washington Heights South,0.0035,0.9725
9,Marble Hill-Inwood,0.0034,0.9602


In [38]:
avg_per_nta

np.float64(0.0012530904589788123)

In [39]:
average_svi = 0.80198
# https://www.atsdr.cdc.gov/place-health/php/svi/svi-interactive-map.html

In [40]:
evi_svi_df['above_eviction_average'] = evi_svi_df['average_year_eviction_nta_count'] > avg_per_nta
evi_svi_df

Unnamed: 0,nta,average_year_eviction_nta_count,rpl_themes,above_eviction_average
0,Central Harlem North-Polo Grounds,0.0109,0.9694,True
1,Woodlawn-Wakefield,0.0074,0.964,True
2,Flatbush,0.0039,0.927,True
3,Crown Heights North,0.0038,0.9298,True
4,Bedford Park-Fordham North,0.0038,0.9888,True
5,Williamsbridge-Olinville,0.0037,0.9845,True
6,East Tremont,0.0037,0.9962,True
7,Mott Haven-Port Morris,0.0036,0.9984,True
8,Washington Heights South,0.0035,0.9725,True
9,Marble Hill-Inwood,0.0034,0.9602,True


In [41]:
evi_svi_df['above_svi_average'] = evi_svi_df['rpl_themes'] > average_svi
evi_svi_df

Unnamed: 0,nta,average_year_eviction_nta_count,rpl_themes,above_eviction_average,above_svi_average
0,Central Harlem North-Polo Grounds,0.0109,0.9694,True,True
1,Woodlawn-Wakefield,0.0074,0.964,True,True
2,Flatbush,0.0039,0.927,True,True
3,Crown Heights North,0.0038,0.9298,True,True
4,Bedford Park-Fordham North,0.0038,0.9888,True,True
5,Williamsbridge-Olinville,0.0037,0.9845,True,True
6,East Tremont,0.0037,0.9962,True,True
7,Mott Haven-Port Morris,0.0036,0.9984,True,True
8,Washington Heights South,0.0035,0.9725,True,True
9,Marble Hill-Inwood,0.0034,0.9602,True,True


# **Step 4.3 Run Chi-Square test**
  **Null hypothesis: there is no association between a neighborhood having an above-average SVI score and having an above-average eviction rate.**

In [42]:
contingency_table = pd.crosstab(evi_svi_df.above_svi_average, evi_svi_df.above_eviction_average)
contingency_table

above_svi_average \ above_eviction_average,False,True
False,31,7
True,85,64


In [43]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(6.730235146153845),
 np.float64(0.009479210274470042),
 1,
 array([[23.57219251, 14.42780749],
        [92.42780749, 56.57219251]]))

### **The p-value is 0.0095 and the chi-square statistic is 6.73 (1 degree of freedom), so we can reject the null hypothesis at the 5% level. The observed counts deviate moderately from the counts expected if the two categories were independent.**

# **Conclusion: there is a statistically significant association between "above-average eviction" and "above-average SVI score" neighborhoods.**
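
As a cross-check, the statistic can be recomputed by hand. The sketch below assumes that `chi2_contingency` applied Yates' continuity correction, which is its default for a 2x2 table. It uses only the standard library, and the counts are copied from the crosstab above. If that assumption holds, it should reproduce the chi-square and p-value reported above.

```python
import math

# Observed counts copied from the contingency table above
# rows: above_svi_average (False, True); columns: above_eviction_average (False, True)
observed = [[31, 7], [85, 64]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Yates' continuity correction: shrink each |O - E| by 0.5 (never below 0)
chi2_yates = sum(
    max(abs(o - e) - 0.5, 0.0) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_yates / 2))

print(round(chi2_yates, 4), round(p_value, 4))
```

Passing `correction=False` to `chi2_contingency` would give the uncorrected Pearson statistic instead, which is somewhat larger for this table.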

In [44]:
a = evi_svi_df[(evi_svi_df['above_svi_average'] == True) & (evi_svi_df['above_eviction_average'] == True)].shape[0]
b = evi_svi_df[(evi_svi_df['above_svi_average'] == True) & (evi_svi_df['above_eviction_average'] == False)].shape[0]
c = evi_svi_df[(evi_svi_df['above_svi_average'] == False) & (evi_svi_df['above_eviction_average'] == True)].shape[0]
d = evi_svi_df[(evi_svi_df['above_svi_average'] == False) & (evi_svi_df['above_eviction_average'] == False)].shape[0]

In [45]:
observed = np.array([[a, b], [c, d]])
observed

array([[64, 85],
       [ 7, 31]])

In [46]:
# a += 0.5
# b += 0.5
# c += 0.5
# d += 0.5
# # avoid 0 divisions

In [47]:
odds_ratio = (a * d) / (b * c)
odds_ratio
# odds ratio > 1: high-SVI neighborhoods have higher odds of above-average eviction rates

3.334453781512605

**This means, technically, neighborhoods with above-average SVI had 3.3345 times the odds of having above-average eviction rates compared to low-SVI neighborhoods.** In the sample, 64 of the 149 high-SVI neighborhoods had above-average evictions, compared with 7 of the 38 low-SVI neighborhoods.
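
A point estimate alone does not show how precise the odds ratio is. The sketch below adds an approximate 95% confidence interval using the Woolf (log) method. This interval is not part of the original analysis; it is an illustrative addition that assumes the normal approximation on the log scale is reasonable for these counts.

```python
import math

# Cell counts from the 2x2 table above (same a, b, c, d convention as the notebook)
a, b, c, d = 64, 85, 7, 31

odds_ratio = (a * d) / (b * c)

# Woolf (log) method: standard error of ln(OR) is sqrt(1/a + 1/b + 1/c + 1/d)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # approximate 97.5th percentile of the standard normal
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.4f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that stays above 1 is consistent with the significant chi-square result; a wide interval signals that the small low-SVI group limits precision.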

In [48]:
chi2, p_value, dof, expected = chi2_contingency(observed)
chi2, p_value

(np.float64(6.730235146153846), np.float64(0.009479210274470046))

# **Step 5: We also need a dataframe with one row per neighborhood and the combined Black and Hispanic population percentage as a column**

In [49]:
racial_avg_all['black_hispanic_pct'] = racial_avg_all['ep_afam'] + racial_avg_all['ep_hisp']
racial_avg_all

Unnamed: 0,nta,average_year_eviction_nta_count,ep_afam,ep_asian,ep_hisp,ep_nhpi,ep_white,ep_twomore,ep_otherrace,rpl_themes,black_hispanic_pct
0,Central Harlem North-Polo Grounds,0.0109,53.1706,3.806,28.1355,0.0001,10.7591,3.0765,0.9779,0.9694,81.3062
1,Woodlawn-Wakefield,0.0074,46.3838,2.3225,26.4032,0.0,19.5334,2.7199,2.3837,0.964,72.787
2,Flatbush,0.0039,50.3284,6.2411,12.8934,0.0136,25.4287,4.1145,0.9041,0.927,63.2219
3,Crown Heights North,0.0038,54.5405,2.4402,14.5104,0.0004,23.1759,4.2121,0.9645,0.9298,69.0509
4,Bedford Park-Fordham North,0.0038,16.0227,3.1098,72.5584,0.0,6.2559,1.4718,0.4192,0.9888,88.5811
5,Williamsbridge-Olinville,0.0037,46.3329,4.3523,38.9846,0.0,5.714,2.2981,1.855,0.9845,85.3175
6,East Tremont,0.0037,28.0218,1.3305,64.7416,0.0,3.1983,1.6115,0.7708,0.9962,92.7634
7,Mott Haven-Port Morris,0.0036,25.111,0.5096,69.6073,0.5501,2.6292,0.8451,0.5355,0.9984,94.7183
8,Washington Heights South,0.0035,9.4778,4.0466,66.1568,0.0,17.8351,1.9685,0.4406,0.9725,75.6346
9,Marble Hill-Inwood,0.0034,6.8807,3.3681,65.9,0.0,20.764,2.3974,0.4966,0.9602,72.7807


In [50]:
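# take a copy so that adding flag columns later does not write into a slice of racial_avg_all
evi_bh_df = racial_avg_all[['nta', 'average_year_eviction_nta_count', 'black_hispanic_pct']].copy()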
evi_bh_df

Unnamed: 0,nta,average_year_eviction_nta_count,black_hispanic_pct
0,Central Harlem North-Polo Grounds,0.0109,81.3062
1,Woodlawn-Wakefield,0.0074,72.787
2,Flatbush,0.0039,63.2219
3,Crown Heights North,0.0038,69.0509
4,Bedford Park-Fordham North,0.0038,88.5811
5,Williamsbridge-Olinville,0.0037,85.3175
6,East Tremont,0.0037,92.7634
7,Mott Haven-Port Morris,0.0036,94.7183
8,Washington Heights South,0.0035,75.6346
9,Marble Hill-Inwood,0.0034,72.7807


In [51]:
average_bh_pct = evi_bh_df['black_hispanic_pct'].mean()
average_bh_pct

np.float64(49.541793265928135)

In [52]:
avg_per_nta, average_bh_pct
# evictions and black + hispanic pct

(np.float64(0.0012530904589788123), np.float64(49.541793265928135))

In [53]:
evi_bh_df['above_evi_average'] = evi_bh_df['average_year_eviction_nta_count'] > avg_per_nta
evi_bh_df

Unnamed: 0,nta,average_year_eviction_nta_count,black_hispanic_pct,above_evi_average
0,Central Harlem North-Polo Grounds,0.0109,81.3062,True
1,Woodlawn-Wakefield,0.0074,72.787,True
2,Flatbush,0.0039,63.2219,True
3,Crown Heights North,0.0038,69.0509,True
4,Bedford Park-Fordham North,0.0038,88.5811,True
5,Williamsbridge-Olinville,0.0037,85.3175,True
6,East Tremont,0.0037,92.7634,True
7,Mott Haven-Port Morris,0.0036,94.7183,True
8,Washington Heights South,0.0035,75.6346,True
9,Marble Hill-Inwood,0.0034,72.7807,True


In [54]:
evi_bh_df['above_bh_average'] = evi_bh_df['black_hispanic_pct'] > average_bh_pct
evi_bh_df

Unnamed: 0,nta,average_year_eviction_nta_count,black_hispanic_pct,above_evi_average,above_bh_average
0,Central Harlem North-Polo Grounds,0.0109,81.3062,True,True
1,Woodlawn-Wakefield,0.0074,72.787,True,True
2,Flatbush,0.0039,63.2219,True,True
3,Crown Heights North,0.0038,69.0509,True,True
4,Bedford Park-Fordham North,0.0038,88.5811,True,True
5,Williamsbridge-Olinville,0.0037,85.3175,True,True
6,East Tremont,0.0037,92.7634,True,True
7,Mott Haven-Port Morris,0.0036,94.7183,True,True
8,Washington Heights South,0.0035,75.6346,True,True
9,Marble Hill-Inwood,0.0034,72.7807,True,True


### **This is the neighborhood evictions + black/hispanic chi-test df:**

# **Run Chi-Square test**
  **Null hypothesis: there is no association between a neighborhood having an above-average Black + Hispanic percentage and having an above-average eviction rate.**

In [55]:
contingency_table = pd.crosstab(evi_bh_df.above_bh_average, evi_bh_df.above_evi_average)
contingency_table

above_bh_average \ above_evi_average,False,True
False,91,16
True,25,55


In [56]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(53.98752613823855),
 np.float64(2.0176646792192656e-13),
 1,
 array([[66.37433155, 40.62566845],
        [49.62566845, 30.37433155]]))

In [57]:
p_value

np.float64(2.0176646792192656e-13)

### **The p-value is 2.02e-13 (extremely small) and the chi-square statistic is 53.9875. We can reject the null hypothesis: neighborhoods with a high Black + Hispanic percentage are more likely to have high eviction rates (and vice versa), i.e. there is a strong association between high BH pct and high eviction rates in the full dataset.**

# **Conclusion: there is a statistically significant association between "above-average eviction" and "above-average Black + Hispanic pct" neighborhoods.**
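
Because the chi-square statistic grows with sample size, an effect-size measure helps compare the two full-dataset tests. The sketch below computes the phi coefficient for both 2x2 tables. Phi is not used elsewhere in this notebook, so it is an illustrative addition; the counts are copied from the two crosstabs above.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]]: (ad - bc) / sqrt(product of the margins)."""
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) / math.sqrt(margins)

# Counts from the two full-dataset crosstabs in this notebook,
# ordered as [[high & high, high & low], [low & high, low & low]]
phi_svi = phi_coefficient(64, 85, 7, 31)
phi_bh = phi_coefficient(55, 25, 16, 91)

print(f"phi (SVI vs evictions) = {phi_svi:.3f}")
print(f"phi (BH pct vs evictions) = {phi_bh:.3f}")
```

A larger phi for the Black + Hispanic table than for the SVI table would indicate that the former association is the stronger of the two, independent of the p-values.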

In [58]:
a = evi_bh_df[(evi_bh_df['above_bh_average'] == True) & (evi_bh_df['above_evi_average'] == True)].shape[0]
b = evi_bh_df[(evi_bh_df['above_bh_average'] == True) & (evi_bh_df['above_evi_average'] == False)].shape[0]
c = evi_bh_df[(evi_bh_df['above_bh_average'] == False) & (evi_bh_df['above_evi_average'] == True)].shape[0]
d = evi_bh_df[(evi_bh_df['above_bh_average'] == False) & (evi_bh_df['above_evi_average'] == False)].shape[0]

In [59]:
observed = np.array([[a, b], [c, d]])
observed

array([[55, 25],
       [16, 91]])

In [60]:
odds_ratio = (a * d) / (b * c)
odds_ratio

12.5125

**This means, technically, neighborhoods with above-average b+h pct had 12.5125 times the odds of having above-average eviction rates compared to low-bh-pct neighborhoods in nyc.** In the sample, 55 of the 80 high-bh-pct neighborhoods had above-average evictions, compared with 16 of the 107 low-bh-pct neighborhoods.

In [61]:
chi2, p_value, dof, expected = chi2_contingency(observed)
chi2, p_value
# 53.99 is a strong deviation from the counts expected under the null hypothesis

(np.float64(53.98752613823855), np.float64(2.0176646792192656e-13))

# **Quick extreme cases**

In [86]:
evi_svi_df_top_15 = evi_svi_df.sort_values('average_year_eviction_nta_count', ascending=False).head(15)
# 2 of the bottom 10 were in cemetery areas
evi_svi_df_bottom_15 = evi_svi_df.sort_values('average_year_eviction_nta_count', ascending=False).tail(15)
evi_svi_df_extremes = pd.concat([evi_svi_df_top_15, evi_svi_df_bottom_15])
evi_svi_df_extremes

Unnamed: 0,nta,average_year_eviction_nta_count,rpl_themes,above_eviction_average,above_svi_average
0,Central Harlem North-Polo Grounds,0.0109,0.9694,True,True
1,Woodlawn-Wakefield,0.0074,0.964,True,True
2,Flatbush,0.0039,0.927,True,True
3,Crown Heights North,0.0038,0.9298,True,True
4,Bedford Park-Fordham North,0.0038,0.9888,True,True
5,Williamsbridge-Olinville,0.0037,0.9845,True,True
6,East Tremont,0.0037,0.9962,True,True
7,Mott Haven-Port Morris,0.0036,0.9984,True,True
8,Washington Heights South,0.0035,0.9725,True,True
9,Marble Hill-Inwood,0.0034,0.9602,True,True


In [87]:
contingency_table = pd.crosstab(evi_svi_df_extremes.above_svi_average, evi_svi_df_extremes.above_eviction_average)
contingency_table

above_svi_average \ above_eviction_average,False,True
False,5,1
True,10,14


In [88]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(1.875),
 np.float64(0.17090352023079358),
 1,
 array([[ 3.,  3.],
        [12., 12.]]))

### **The p-value is 0.1709 and the chi-square statistic is 1.875. We cannot reject the null hypothesis: among the 30 extreme neighborhoods (top 15 and bottom 15 by eviction rate), the association between high SVI and high eviction rates is not statistically significant. The expected counts in the low-SVI row are only 3, so the chi-square approximation is also unreliable here.**

# **Conclusion: in the extremes sample there is no statistically significant association between "above-average eviction" and "above-average SVI" neighborhoods.**
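
The expected counts in the output above include cells of 3, below the common rule of thumb of 5 for the chi-square approximation. Fisher's exact test is a frequently used alternative for small 2x2 tables. The sketch below is a plain standard-library implementation of the two-sided test; the helper name `fisher_exact_two_sided` is hypothetical, and the test is an illustrative addition rather than part of the original analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Counts from the extremes crosstab above: rows = above_svi_average (True, False),
# columns = above_eviction_average (True, False)
p_fisher = fisher_exact_two_sided(14, 10, 1, 5)
print(round(p_fisher, 4))
```

`scipy.stats.fisher_exact` provides the same test and could be called on the `observed` array directly.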

In [94]:
a = evi_svi_df_extremes[(evi_svi_df_extremes['above_svi_average'] == True) & (evi_svi_df_extremes['above_eviction_average'] == True)].shape[0]
b = evi_svi_df_extremes[(evi_svi_df_extremes['above_svi_average'] == True) & (evi_svi_df_extremes['above_eviction_average'] == False)].shape[0]
c = evi_svi_df_extremes[(evi_svi_df_extremes['above_svi_average'] == False) & (evi_svi_df_extremes['above_eviction_average'] == True)].shape[0]
d = evi_svi_df_extremes[(evi_svi_df_extremes['above_svi_average'] == False) & (evi_svi_df_extremes['above_eviction_average'] == False)].shape[0]

In [96]:
observed = np.array([[a, b], [c, d]])
observed

array([[14, 10],
       [ 1,  5]])

In [95]:
# a += 0.5
# b += 0.5
# c += 0.5
# d += 0.5

In [97]:
odds_ratio = (a * d) / (b * c)
odds_ratio

7.0

**This means, technically, that in the extremes sample neighborhoods with above-average SVI had 7 times the odds of having above-average eviction rates compared to low-SVI neighborhoods.** The estimate rests on very small counts (only 1 of the 6 low-SVI neighborhoods had above-average evictions), so it is imprecise, and the chi-square output above (p = 0.1709) is not significant.

In [98]:
evi_bh_df_top_15 = evi_bh_df.sort_values('average_year_eviction_nta_count', ascending=False).head(15)
evi_bh_df_bottom_15 = evi_bh_df.sort_values('average_year_eviction_nta_count', ascending=False).tail(15)
evi_bh_df_extremes = pd.concat([evi_bh_df_top_15, evi_bh_df_bottom_15])
evi_bh_df_extremes

Unnamed: 0,nta,average_year_eviction_nta_count,black_hispanic_pct,above_evi_average,above_bh_average
0,Central Harlem North-Polo Grounds,0.0109,81.3062,True,True
1,Woodlawn-Wakefield,0.0074,72.787,True,True
2,Flatbush,0.0039,63.2219,True,True
3,Crown Heights North,0.0038,69.0509,True,True
4,Bedford Park-Fordham North,0.0038,88.5811,True,True
5,Williamsbridge-Olinville,0.0037,85.3175,True,True
6,East Tremont,0.0037,92.7634,True,True
7,Mott Haven-Port Morris,0.0036,94.7183,True,True
8,Washington Heights South,0.0035,75.6346,True,True
9,Marble Hill-Inwood,0.0034,72.7807,True,True


In [99]:
contingency_table = pd.crosstab(evi_bh_df_extremes.above_bh_average, evi_bh_df_extremes.above_evi_average)
contingency_table

above_bh_average \ above_evi_average,False,True
False,14,2
True,1,13


In [100]:
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
chi2, p_value, dof, expected

(np.float64(16.205357142857142),
 np.float64(5.683316915964556e-05),
 1,
 array([[8., 8.],
        [7., 7.]]))

### **The p-value is 5.68e-05 (extremely small) and the chi-square statistic is 16.2054. We can reject the null hypothesis: among the 30 extreme neighborhoods, those with a high Black + Hispanic percentage are more likely to have high eviction rates (and vice versa), i.e. there is a strong association between high Black + Hispanic pct and high eviction rates in the extremes sample.**

# **Conclusion: there is a statistically significant association between "above-average eviction" and "above-average Black + Hispanic pct" neighborhoods.**

In [104]:
a = evi_bh_df_extremes[(evi_bh_df_extremes['above_bh_average'] == True) & (evi_bh_df_extremes['above_evi_average'] == True)].shape[0]
b = evi_bh_df_extremes[(evi_bh_df_extremes['above_bh_average'] == True) & (evi_bh_df_extremes['above_evi_average'] == False)].shape[0]
c = evi_bh_df_extremes[(evi_bh_df_extremes['above_bh_average'] == False) & (evi_bh_df_extremes['above_evi_average'] == True)].shape[0]
d = evi_bh_df_extremes[(evi_bh_df_extremes['above_bh_average'] == False) & (evi_bh_df_extremes['above_evi_average'] == False)].shape[0]

In [105]:
observed = np.array([[a, b], [c, d]])
observed

array([[13,  1],
       [ 2, 14]])

In [103]:
# a += 0.5
# b += 0.5
# c += 0.5
# d += 0.5

In [106]:
odds_ratio = (a * d) / (b * c)
odds_ratio

91.0

**This means, technically, that in the extremes sample neighborhoods with above-average b+h pct had 91 times the odds of having above-average eviction rates compared to low-bh-pct neighborhoods in nyc.** In this sample, 13 of the 14 high-bh-pct neighborhoods had above-average evictions, compared with 2 of the 16 low-bh-pct neighborhoods. With cell counts as small as 1 and 2, the odds ratio is very sensitive to a single neighborhood changing category.
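
The commented-out `+= 0.5` cell above corresponds to the Haldane-Anscombe correction, which adds 0.5 to every cell to stabilise the odds ratio when counts are very small or zero. The sketch below shows the effect on this table, plus an approximate Woolf-style 95% interval on the corrected counts. Both are illustrative additions and assume the normal approximation on the log scale.

```python
import math

# Cell counts from the extremes table above (same a, b, c, d convention as the notebook)
a, b, c, d = 13, 1, 2, 14

raw_or = (a * d) / (b * c)

# Haldane-Anscombe correction: add 0.5 to every cell before forming the ratio
a2, b2, c2, d2 = (x + 0.5 for x in (a, b, c, d))
corrected_or = (a2 * d2) / (b2 * c2)

# Woolf-style 95% interval on the corrected counts (1.96 is the normal approximation)
se = math.sqrt(1 / a2 + 1 / b2 + 1 / c2 + 1 / d2)
ci = (math.exp(math.log(corrected_or) - 1.96 * se),
      math.exp(math.log(corrected_or) + 1.96 * se))

print(raw_or, corrected_or, ci)
```

The gap between the raw and corrected ratios, together with the width of the interval, gives a sense of how unstable an odds ratio from 30 neighborhoods is.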