# **Introduction**

This notebook collects all the cleaned final and semi-final datasets in one place for reference. Most of them were thoroughly cleaned and contain no NaNs. A few still contain NaNs because they are work-in-progress (WIP) datasets that will soon be merged with another dataset; keeping the rows that have NaNs preserves information that may be useful for further analysis. If necessary, dropping the NaNs from those WIP datasets would make them ready for analysis immediately.
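
The trade-off described above can be sketched on a toy table. The frame `wip` and its values are made up for illustration (only the column names echo the BBL data); the point is that dropping on a chosen subset of columns keeps more rows than dropping every row with any NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical WIP table: assume 'yearbuilt' is needed for analysis
# and 'zonedist2' is not.
wip = pd.DataFrame({
    "bbl": [1, 2, 3, 4],
    "yearbuilt": [1930.0, np.nan, 1967.0, 2004.0],
    "zonedist2": [np.nan, np.nan, "R7A", np.nan],
})

# Dropping every row that has any NaN keeps only fully populated rows.
strict = wip.dropna()

# Dropping only on the columns an analysis uses preserves more rows.
targeted = wip.dropna(subset=["yearbuilt"])

print(len(wip), len(strict), len(targeted))
```

Which columns belong in `subset` depends on the analysis at hand, so this is a pattern rather than a fixed recipe.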

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import datetime as dt
import matplotlib
import matplotlib.pyplot as plt
import os
import io
import geopandas as gpd
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


# suppress warnings
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

In [2]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **1. The fully merged Evictions, BBL, SVI, and 311 complaints DataFrames, with no NaNs**

In [7]:
normal_times = "/content/drive/My Drive/X999/bbl_evictions_311_svi_normal_times.csv"
covid = "/content/drive/My Drive/X999/bbl_evictions_311_svi_covid.csv"

In [11]:
df_1 = pd.read_csv(normal_times)
df_2 = pd.read_csv(covid)

In [12]:
df_1.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0
3,*53336/16_170279,2032510420,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0
4,*5990/17_2703,2025770038,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0


In [13]:
df_2.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternativ...","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0


In [15]:
df_1.shape, df_2.shape, df_1.isna().sum().sum(), df_2.isna().sum().sum(), df_1.duplicated().sum(), df_2.duplicated().sum()
# df_2 has one fewer column because 'svi_group' is present only in the normal-times data, where it was used for regression analysis.

((66397, 92), (5386, 91), np.int64(0), np.int64(0), np.int64(0), np.int64(0))
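
A quick way to confirm which columns account for a shape difference like the one above is to compare the column lists directly. This is a minimal sketch; `column_diff` is a hypothetical helper, and the two demo frames are empty stand-ins that reuse only a few of the real column names.

```python
import pandas as pd

def column_diff(left, right):
    """Return (columns only in left, columns only in right), in original order."""
    only_left = [c for c in left.columns if c not in right.columns]
    only_right = [c for c in right.columns if c not in left.columns]
    return only_left, only_right

# Empty stand-ins for the normal-times and COVID frames.
normal_cols_demo = pd.DataFrame(
    columns=["bbl", "svi_quartile", "svi_group", "total_complaints"]
)
covid_cols_demo = pd.DataFrame(columns=["bbl", "svi_quartile", "total_complaints"])

only_normal, only_covid = column_diff(normal_cols_demo, covid_cols_demo)
print(only_normal, only_covid)
```

Calling `column_diff(df_1, df_2)` on the real frames would be the analogous check.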

# **2. The cleaned evictions dataset**

In [21]:
normal_times = '/content/drive/My Drive/X999/evictions_pre_post_covid.csv'
covid = '/content/drive/My Drive/X999/evictions_covid.csv'
all_years = '/content/drive/My Drive/X999/evictions_df_cleaned.csv'

In [22]:
df_3 = pd.read_csv(normal_times)
df_4 = pd.read_csv(covid)
df_5 = pd.read_csv(all_years)

In [18]:
df_3.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,3037420029,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,3057940012,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,3057820030,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,2032510420,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,2025770038,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6


In [19]:
df_4.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count
0,004123/20_209969,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,2032140141,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2
1,0050153/20_106030,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,4031560133,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2
2,0052002/19_101926,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,3051370021,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6
3,0057757/18_100889,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,3011850034,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8
4,0058466/19_104327,0058466/19,104327,635 WEST 42ND STREET,UNIT 18B,2020-03-12,MANHATTAN,10036,Not an Ejectment,Possession,40.761463,-73.999816,4.0,3.0,129.0,1087539,1010907501,Clinton,2020,2020-03,POINT (-73.999816 40.761463),0.2


In [23]:
df_5.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989.0,3037420000.0,East New York,2024,2024-12
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881.0,3057940000.0,Sunset Park East,2024,2024-03
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435.0,3057820000.0,Sunset Park West,2024,2024-08
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444.0,2032510000.0,Van Cortlandt Village,2018,2018-10
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900.0,2025770000.0,Mott Haven-Port Morris,2019,2019-08


In [28]:
df_3.shape, df_4.shape, df_5.shape, df_3.isna().sum().sum(), df_4.isna().sum().sum(), df_5.isna().sum().sum(), \
df_3.duplicated().sum(), df_4.duplicated().sum(), df_5.duplicated().sum(), 76718 + 6564 # = 83282
# df_3 and df_4 have two extra columns, "geometry" and "average_year_eviction_count", because they come from
# GeoPandas data and carry the average_year_eviction_count for each building per year.


((76718, 22),
 (6564, 22),
 (83282, 20),
 np.int64(0),
 np.int64(0),
 np.int64(0),
 np.int64(0),
 np.int64(0),
 np.int64(0),
 83282)
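
The row-count sum above shows that the two subsets add up to the full dataset. A slightly stronger check, sketched below, also confirms that no key appears in both subsets and that the keys match the full table. It assumes `primary_key` uniquely identifies a row; `is_partition` and the toy frames are hypothetical.

```python
import pandas as pd

def is_partition(full, parts, key):
    """True if the parts are disjoint on `key` and together cover `full`."""
    combined = pd.concat(parts, ignore_index=True)
    same_size = len(combined) == len(full)
    no_overlap = not combined[key].duplicated().any()
    same_keys = set(combined[key]) == set(full[key])
    return same_size and no_overlap and same_keys

# Toy stand-in for the full evictions table.
full_demo = pd.DataFrame({"primary_key": ["a", "b", "c", "d"]})

# A clean split: first three rows and the last row.
ok = is_partition(full_demo, [full_demo.iloc[:3], full_demo.iloc[3:]], key="primary_key")

# An overlapping split: row "c" appears in both parts.
overlapping = is_partition(
    full_demo, [full_demo.head(3), full_demo.tail(2)], key="primary_key"
)
print(ok, overlapping)
```

On the real data the analogous call would be `is_partition(df_5, [df_3, df_4], key="primary_key")`.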

# **3. The cleaned BBL dataset**

In [29]:
bbl = '/content/drive/My Drive/X999/bbl_cleaned.csv'

In [30]:
df_6 = pd.read_csv(bbl)

In [31]:
df_6.head()

Unnamed: 0,borough,block,lot,community board,census tract 2010,cb2010,schooldist,council district,postcode,firecomp,policeprct,healtharea,sanitboro,sanitsub,address,zonedist1,zonedist2,zonedist3,overlay1,overlay2,spdist1,ltdheight,splitzone,bldgclass,landuse,easements,ownertype,ownername,lotarea,bldgarea,comarea,resarea,officearea,retailarea,garagearea,strgearea,factryarea,otherarea,areasource,numbldgs,numfloors,unitsres,unitstotal,lotfront,lotdepth,bldgfront,bldgdepth,ext,proxcode,irrlotcode,lottype,bsmtcode,assessland,assesstot,exempttot,yearbuilt,yearalter1,yearalter2,histdist,landmark,builtfar,residfar,commfar,facilfar,borocode,bbl,condono,tract2010,xcoord,ycoord,latitude,longitude,zonemap,zmcode,sanborn,taxmap,appbbl,appdate,plutomapid,version,sanitdistrict,healthcenterdistrict,firm07_flag,pfirm15_flag,dcpedited,building_category,building_type,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade
0,BK,8366,222,318.0,696.02,2002.0,22.0,46.0,11234.0,E323,63.0,8822.0,3.0,4E,6815 AVENUE N,R3-1,,,,,,,False,A5,1.0,0.0,,"EAST 69 AVENUE N DEVELOPMENT, LLC",2241.0,1288.0,0.0,1288.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,2,1,1.0,27.0,83.0,19.0,33.0,,2.0,False,0.0,2.0,5918.0,5918.0,0.0,2019.0,0.0,0.0,,,0.57,0.5,0.0,1.0,3,3083660222,,69602.0,1008419.0,165883.0,40.621954,-73.912938,23b,,315 067,3.0,3083660000.0,04/26/2019,1,20v5,18.0,35.0,,,,single-family,post-war,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","2009–present, post-financial crisis",single-unit,True,very small,Q1 (smallest 25%),2010-2020
1,BK,2571,28,301.0,561.0,1005.0,14.0,33.0,11222.0,L106,94.0,100.0,3.0,1A,87 CALYER STREET,M1-2/R6B,,,,,MX-8,,False,A5,1.0,0.0,,85 CALYER STREET LLC,1862.0,3478.0,0.0,3478.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,3,1,1.0,18.0,100.0,18.0,60.0,N,3.0,False,5.0,2.0,51000.0,66780.0,0.0,2018.0,0.0,2017.0,,,1.87,2.0,2.0,2.0,3,3025710028,,561.0,995995.0,204223.0,40.727214,-73.957625,12c,,304 033,30902.0,3025710000.0,05/09/2019,1,20v5,1.0,30.0,,,,single-family,post-war,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","2009–present, post-financial crisis",single-unit,True,medium-small,Q4 (largest 25%),2010-2020
2,BK,3197,8,304.0,429.0,1002.0,32.0,34.0,11237.0,E218,83.0,3200.0,3.0,1B,109 WILSON AVENUE,R6,,,,,,,False,S4,4.0,0.0,,SERLIN BUILDING LIMITED PARTNERSHIP,2500.0,4125.0,1375.0,2750.0,0.0,1375.0,0.0,0.0,0.0,0.0,2.0,1.0,3,4,5.0,25.0,100.0,25.0,55.0,N,0.0,False,3.0,5.0,158850.0,381150.0,45280.0,1931.0,2001.0,0.0,,,1.65,2.43,0.0,4.8,3,3031970008,,429.0,1004619.0,194842.0,40.70145,-73.926539,13b,,309 037,31102.0,,,1,20v5,4.0,34.0,,,,primarily_res_with_mixed_use,pre-war,False,low-rise,"Pre-1947, pre-rent-control","1931–1950, Manhattan Modern","1930-1945, great depression and WWII",3-5 units,False,medium-small,Q4 (largest 25%),1930-1939
3,QN,52,7,402.0,7.0,1000.0,30.0,26.0,11101.0,L115,108.0,720.0,4.0,2A,11-43 45 AVENUE,M1-4/R6A,,,,,LIC,,False,C1,2.0,0.0,,"TRIBECA TREASURES, LLC",2500.0,7416.0,380.0,7036.0,0.0,0.0,0.0,0.0,0.0,380.0,2.0,1.0,5,7,7.0,25.0,100.0,25.0,100.0,N,0.0,False,5.0,0.0,10350.0,977850.0,923360.0,1958.0,2007.0,2007.0,,,2.97,3.0,2.0,3.0,4,4000520007,,7.0,998601.0,211689.0,40.747702,-73.948207,9b,,401 019,40101.0,,,1,20v5,2.0,41.0,,,,walk-up,post-war,False,mid-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",6-20 units,True,medium,Q4 (largest 25%),1950-1959
4,BK,6714,55,314.0,534.0,3000.0,21.0,44.0,11230.0,L156,70.0,7310.0,3.0,4D,1081 EAST 12 STREET,R5,R7A,,,,,,True,C3,2.0,0.0,,"RAMBOD, SHAHROKH",3500.0,2112.0,0.0,2112.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,2,4,4.0,35.0,100.0,22.0,48.0,N,0.0,False,5.0,5.0,55350.0,255600.0,2390.0,1931.0,0.0,0.0,,,0.6,1.25,0.0,2.0,3,3067140055,,534.0,994376.0,166232.0,40.622939,-73.963523,22d,,313 040,32006.0,,,1,20v5,14.0,35.0,,,,walk-up,pre-war,False,low-rise,"Pre-1947, pre-rent-control","1931–1950, Manhattan Modern","1930-1945, great depression and WWII",3-5 units,False,small,Q3 (50-75%),1930-1939


In [32]:
df_6.shape, df_6.isna().sum().sum(), df_6.duplicated().sum()

((752619, 97), np.int64(11635570), np.int64(0))

In [35]:
df_6.isna().sum().head()
# This cleaned bbl_df still has some NaNs, for good reasons.
# They were kept because we would not run any analysis on the columns that contain them;
# dropping them now would lose many rows that hold important information in other columns.
# Whenever we merge bbl with another dataset, we examine the NaNs case by case and either
# fill them with sensible values or drop them.

Unnamed: 0,0
borough,0
block,0
lot,0
community board,38
census tract 2010,38


# **4. The cleaned SVI dataset**

In [42]:
svi = '/content/drive/My Drive/X999/svi_cleaned.csv'
results = '/content/drive/My Drive/X999/svi_selected_neighborhoods_results_df.csv'

In [43]:
df_7 = pd.read_csv(svi)
df_8 = pd.read_csv(results)

In [38]:
df_7.head()

Unnamed: 0,fips,location,area_sqmi,e_totpop,m_totpop,e_hu,m_hu,e_hh,m_hh,e_pov150,m_pov150,e_unemp,m_unemp,e_hburd,m_hburd,e_nohsdp,m_nohsdp,e_uninsur,m_uninsur,e_age65,m_age65,e_age17,m_age17,e_disabl,m_disabl,e_sngpnt,m_sngpnt,e_limeng,m_limeng,e_minrty,m_minrty,e_munit,m_munit,e_mobile,m_mobile,e_crowd,m_crowd,e_noveh,m_noveh,e_groupq,m_groupq,ep_pov150,mp_pov150,ep_unemp,mp_unemp,ep_hburd,mp_hburd,ep_nohsdp,mp_nohsdp,ep_uninsur,mp_uninsur,ep_age65,mp_age65,ep_age17,mp_age17,ep_disabl,mp_disabl,ep_sngpnt,mp_sngpnt,ep_limeng,mp_limeng,ep_minrty,mp_minrty,ep_munit,mp_munit,ep_mobile,mp_mobile,ep_crowd,mp_crowd,ep_noveh,mp_noveh,ep_groupq,mp_groupq,epl_pov150,epl_unemp,epl_hburd,epl_nohsdp,epl_uninsur,spl_theme1,rpl_theme1,epl_age65,epl_age17,epl_disabl,epl_sngpnt,epl_limeng,spl_theme2,rpl_theme2,epl_minrty,spl_theme3,rpl_theme3,epl_munit,epl_mobile,epl_crowd,epl_noveh,epl_groupq,spl_theme4,rpl_theme4,spl_themes,rpl_themes,f_pov150,f_unemp,f_hburd,f_nohsdp,f_uninsur,f_theme1,f_age65,f_age17,f_disabl,f_sngpnt,f_limeng,f_theme2,f_minrty,f_theme3,f_munit,f_mobile,f_crowd,f_noveh,f_groupq,f_theme4,f_total,e_daypop,e_noint,m_noint,e_afam,m_afam,e_hisp,m_hisp,e_asian,m_asian,e_aian,m_aian,e_nhpi,m_nhpi,e_twomore,m_twomore,e_otherrace,m_otherrace,ep_noint,mp_noint,ep_afam,mp_afam,ep_hisp,mp_hisp,ep_asian,mp_asian,ep_aian,mp_aian,ep_nhpi,mp_nhpi,ep_twomore,mp_twomore,ep_otherrace,mp_otherrace
0,10001,ZCTA5 10001,0.623822,27004,1827,16975,831,14375,782,5248,797,761,266,3314,531,1930,534,831,289,3428,432,2694,643,2310,499,501,215,1381,405,13460,2305,15840,898,15,23,389,135,12285,840,2213,218,20.3,2.7,4.3,1.5,23.1,3.5,9.1,2.4,3.1,1.0,12.7,1.6,10.0,2.1,8.6,1.9,3.5,1.5,5.3,1.5,49.8,7.8,93.3,2.7,0.1,0.1,2.7,0.9,85.5,2.8,8.2,0.6,0.6108,0.4574,0.5573,0.5902,0.4436,2.6593,0.5688,0.142,0.1161,0.1891,0.4707,0.8777,1.7956,0.1692,0.867,0.867,0.867,0.9853,0.271,0.7402,0.9949,0.9104,3.9018,0.9806,9.2237,0.7414,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,239407,1047,389,2220,576,5206,943,5031,774,0,25,0,25,780,326,223,169,7.3,2.6,8.2,2.2,19.3,3.0,18.6,2.9,0.0,0.1,0.0,0.1,2.9,1.2,0.8,0.6
1,10002,ZCTA5 10002,0.822292,76518,2894,39094,1241,36028,1326,27908,2853,2833,574,14688,1367,18301,1376,4074,766,17681,1287,10028,1549,9896,1062,2211,499,18393,1640,56964,3226,35725,1677,16,28,2461,449,29828,1403,2090,39,36.8,3.5,7.6,1.4,40.8,3.5,30.0,2.0,5.4,1.0,23.1,1.7,13.1,1.8,13.0,1.4,6.1,1.4,24.7,2.0,74.4,3.1,91.4,3.2,0.0,0.1,6.8,1.2,82.8,1.8,2.7,0.1,0.9148,0.7946,0.9219,0.9741,0.7207,4.3261,0.9639,0.7296,0.1831,0.5186,0.739,0.9944,3.1647,0.8781,0.9369,0.9369,0.9369,0.979,0.0,0.9105,0.9915,0.773,3.654,0.9254,12.0817,0.9656,1,0,1,1,0,3,0,0,0,0,1,1,1,1,1,0,1,1,0,3,8,64307,8590,1110,6141,1194,19864,2190,28477,1989,74,83,24,45,1810,486,574,394,23.8,2.9,8.0,1.5,26.0,2.5,37.2,2.2,0.1,0.1,0.0,0.1,2.4,0.6,0.8,0.5
2,10003,ZCTA5 10003,0.571603,53877,2579,30766,956,24987,936,6397,1171,1613,315,5445,853,1574,422,1282,404,8128,792,3866,718,3604,634,278,171,1217,828,19778,3548,27261,1299,0,31,798,282,20035,905,10199,194,14.3,2.5,4.7,0.9,21.8,3.3,4.2,1.2,2.4,0.7,15.1,1.4,7.2,1.2,6.7,1.2,1.1,0.7,2.3,1.6,36.7,6.3,88.6,3.2,0.0,0.1,3.2,1.1,80.2,2.8,18.9,1.0,0.4017,0.5142,0.4934,0.2486,0.3386,1.9965,0.3389,0.2513,0.0851,0.0965,0.2217,0.7589,1.4135,0.0969,0.7977,0.7977,0.7977,0.9745,0.0,0.7852,0.9869,0.9577,3.7043,0.9368,7.912,0.5373,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,3,3,138011,1458,478,2899,748,5541,943,9014,1065,51,52,27,32,2067,547,179,132,5.8,1.8,5.4,1.4,10.3,1.7,16.7,1.9,0.1,0.1,0.1,0.1,3.8,1.0,0.3,0.2
3,10004,ZCTA5 10004,0.455576,4579,926,2706,484,2123,394,169,101,11,16,130,84,32,28,13,19,190,131,840,341,104,82,8,16,39,73,2009,1102,2592,476,0,13,161,99,1548,318,35,13,3.7,2.1,0.4,0.5,6.1,3.8,0.9,0.8,0.3,0.4,4.1,2.9,18.3,5.1,2.3,1.9,0.4,0.7,0.9,1.8,43.9,22.4,95.8,4.0,0.0,1.5,7.6,4.4,72.9,9.4,0.8,0.2,0.0699,0.1413,0.057,0.0834,0.1213,0.4729,0.0252,0.04,0.3921,0.0412,0.1846,0.5977,1.2556,0.0769,0.8439,0.8439,0.8439,0.9898,0.0,0.9254,0.9749,0.551,3.4411,0.8558,6.0135,0.2299,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,3,3,77721,12,21,252,140,229,145,1370,703,0,13,0,13,158,113,0,13,0.6,1.0,5.5,3.0,5.0,3.0,29.9,12.2,0.0,0.9,0.0,0.9,3.5,2.4,0.0,0.9
4,10005,ZCTA5 10005,0.072868,8801,1132,6272,438,4881,550,647,363,257,158,532,266,168,135,106,96,158,129,924,395,99,80,140,147,22,78,2730,1506,6083,451,0,19,411,200,4503,549,50,30,7.4,4.0,3.4,2.0,10.9,5.3,2.5,2.0,1.2,1.0,1.8,1.5,10.5,3.8,1.1,0.9,2.9,3.0,0.3,0.9,31.0,16.6,97.0,2.4,0.0,0.6,8.4,4.0,92.3,3.6,0.6,0.3,0.158,0.3371,0.1031,0.1426,0.1766,0.9174,0.0631,0.0327,0.1245,0.0316,0.3966,0.4406,1.026,0.0427,0.7504,0.7504,0.7504,0.9921,0.0,0.9356,0.9972,0.4963,3.4212,0.8496,6.115,0.2437,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,3,3,56918,160,153,421,334,755,432,985,354,0,19,0,19,569,268,0,19,3.3,3.1,4.8,3.7,8.6,4.6,11.2,4.2,0.0,0.5,0.0,0.5,6.5,3.0,0.0,0.5


In [44]:
df_8.head()

Unnamed: 0,Neighborhood,Population,White (non-Hispanic),Black,Hispanic,Asian,Native American,Pacific Islander,Two or More,Other Race,Overall SVI,Socioeconomic,Household Composition,Minority Status,Housing & Transportation,Poverty Rate (%),Unemployment (%),No High School Diploma (%),Uninsured (%)
0,Mott Haven & Melrose,136764,2.949607,28.219414,65.764382,0.932263,0.220818,0.251528,1.122371,0.539616,0.995195,0.995195,0.989167,0.989132,0.949083,50.684675,11.704815,32.611518,8.622205
1,Morrisania,219703,2.61717,35.302203,59.284124,0.748283,0.20027,0.001365,1.462884,0.3837,0.996953,0.995146,0.985543,0.990868,0.977849,48.941707,12.940613,31.015226,8.008425
2,Brownsville,84006,3.473561,70.059281,19.480751,0.784468,0.115468,0.049996,5.430564,0.605909,0.9948,0.992,0.9487,0.9865,0.992,45.3,14.2,23.8,7.3
3,East New York,221789,4.664794,51.398401,34.548152,4.773005,0.021642,0.115876,2.909522,1.568608,0.971361,0.957172,0.87873,0.983861,0.961157,33.026303,8.580173,17.992746,5.532074
4,Jamaica,252625,7.459673,44.787729,18.683028,20.090252,0.63335,0.026126,4.039584,4.280257,0.949985,0.946251,0.769434,0.978836,0.922005,22.02278,10.251442,17.673503,8.100223


In [46]:
df_7.shape, df_8.shape, df_7.duplicated().sum(), df_8.duplicated().sum(), df_7.isna().sum().sum(), df_8.isna().sum().sum()

((204, 153), (16, 19), np.int64(0), np.int64(0), np.int64(0), np.int64(0))

# **5. The cleaned 311 data**

In [4]:
# the more complete versions:
saved_2017_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2017.csv"
saved_2018_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2018.csv"
saved_2019_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2019.csv"
saved_2020_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2020.csv"
saved_2021_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2021.csv"
saved_2022_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2022.csv"
saved_2023_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2023.csv"
saved_2024_c = "/content/drive/My Drive/X999/311_different_years/filtered_df_2024.csv"

In [5]:
df_2017_c = pd.read_csv(saved_2017_c)
df_2018_c = pd.read_csv(saved_2018_c)
df_2019_c = pd.read_csv(saved_2019_c)
df_2020_c = pd.read_csv(saved_2020_c)
df_2021_c = pd.read_csv(saved_2021_c)
df_2022_c = pd.read_csv(saved_2022_c)
df_2023_c = pd.read_csv(saved_2023_c)
df_2024_c = pd.read_csv(saved_2024_c)

In [6]:
df_9 = pd.concat([df_2017_c, df_2018_c, df_2019_c, df_2023_c, df_2024_c])
df_10 = pd.concat([df_2020_c, df_2021_c, df_2022_c])

In [13]:
# the slimmer versions (the "_reduced" files):
saved_2017 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2017_reduced.csv"
saved_2018 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2018_reduced.csv"
saved_2019 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2019_reduced.csv"
saved_2020 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2020_reduced.csv"
saved_2021 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2021_reduced.csv"
saved_2022 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2022_reduced.csv"
saved_2023 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2023_reduced.csv"
saved_2024 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2024_reduced.csv"

In [14]:
df_2017 = pd.read_csv(saved_2017)
df_2018 = pd.read_csv(saved_2018)
df_2019 = pd.read_csv(saved_2019)
df_2020 = pd.read_csv(saved_2020)
df_2021 = pd.read_csv(saved_2021)
df_2022 = pd.read_csv(saved_2022)
df_2023 = pd.read_csv(saved_2023)
df_2024 = pd.read_csv(saved_2024)

In [15]:
df_11 = pd.concat([df_2017, df_2018, df_2019, df_2023, df_2024])
df_12 = pd.concat([df_2020, df_2021, df_2022])
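
The per-year loading and the normal/COVID split in the cells above could also be written as one loop. The sketch below is an alternative formulation, not the code used to build `df_9` to `df_12`. The helper `load_by_period` is hypothetical, the demo writes tiny CSVs to a temporary folder instead of reading from Drive, and the 2020 to 2022 grouping simply mirrors the `pd.concat` calls above.

```python
import os
import tempfile
import pandas as pd

# Years grouped as the COVID period, mirroring the concat cells above.
COVID_YEARS = {2020, 2021, 2022}

def load_by_period(folder, years, suffix=""):
    """Read filtered_df_<year><suffix>.csv files and split them by period."""
    normal, covid = [], []
    for year in years:
        path = os.path.join(folder, f"filtered_df_{year}{suffix}.csv")
        frame = pd.read_csv(path)
        (covid if year in COVID_YEARS else normal).append(frame)
    # ignore_index=True gives each result a fresh 0..n-1 index.
    return (
        pd.concat(normal, ignore_index=True),
        pd.concat(covid, ignore_index=True),
    )

# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for year in (2019, 2020, 2021):
        demo = pd.DataFrame({"unique_key": [year * 10, year * 10 + 1]})
        demo.to_csv(os.path.join(tmp, f"filtered_df_{year}.csv"), index=False)
    normal_311_demo, covid_311_demo = load_by_period(tmp, range(2019, 2022))

print(len(normal_311_demo), len(covid_311_demo))
```

Unlike the plain `pd.concat([...])` calls above, `ignore_index=True` avoids repeated index labels across years; pass `suffix="_reduced"` to target the slimmer files.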

In [7]:
df_9.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,descriptor,location_type,incident_zip,incident_address,city,status,bbl,borough,latitude,longitude,duration_days,duration_category,created_year,closed_year,intersection_street_1,intersection_street_2
0,38070156,2017-12-31 23:59:35,2018-01-04 19:27:02,HEAT/HOT WATER,ENTIRE BUILDING,RESIDENTIAL BUILDING,10030.0,181 WEST 135 STREET,NEW YORK,Closed,1019200000.0,MANHATTAN,40.815127,-73.943252,3.810729,1-7 days,2017.0,2018.0,,
1,38067146,2017-12-31 23:59:34,2018-01-01 00:57:19,Noise - Residential,Loud Talking,Residential Building/House,10035.0,2048 MADISON AVENUE,NEW YORK,Closed,1017540000.0,MANHATTAN,40.808655,-73.938532,0.040104,Same day,2017.0,2018.0,,
2,38066214,2017-12-31 23:59:15,2018-01-01 02:48:23,Noise - Residential,Loud Music/Party,Residential Building/House,10466.0,1902 NEREID AVENUE,BRONX,Closed,2050540000.0,BRONX,40.8987,-73.848528,0.117454,Same day,2017.0,2018.0,,
3,38067041,2017-12-31 23:58:38,2018-01-01 02:53:28,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11230.0,1201 AVENUE H,BROOKLYN,Closed,3066870000.0,BROOKLYN,40.629675,-73.964939,0.121412,Same day,2017.0,2018.0,,
4,38068229,2017-12-31 23:58:33,2018-01-08 13:30:58,HEAT/HOT WATER,APARTMENT ONLY,RESIDENTIAL BUILDING,11226.0,70 LINDEN BOULEVARD,BROOKLYN,Closed,3050860000.0,BROOKLYN,40.652289,-73.956328,7.564178,8-30 days,2017.0,2018.0,,


In [16]:
df_10.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,descriptor,location_type,incident_zip,incident_address,intersection_street_1,intersection_street_2,city,status,bbl,borough,latitude,longitude,duration_days,duration_category,created_year,closed_year
0,48538697,2020-12-31 23:59:55,2021-01-01 01:07:04,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10460.0,1569 HOE AVENUE,EAST 172 STREET,EAST 173 STREET,BRONX,Closed,2029820000.0,BRONX,40.83582,-73.887516,0.046632,Same day,,
1,48536596,2020-12-31 23:59:28,2021-01-01 01:33:12,Noise - Residential,Loud Music/Party,Residential Building/House,10028.0,235 EAST 83 STREET,3 AVENUE,2 AVENUE,NEW YORK,Closed,1015290000.0,MANHATTAN,40.776503,-73.954525,0.065093,Same day,,
2,48536500,2020-12-31 23:58:55,2021-01-01 00:24:54,Noise - Residential,Loud Music/Party,Residential Building/House,10468.0,2380 GRAND AVENUE,WEST 184 STREET,WEST FORDHAM ROAD,BRONX,Closed,2031990000.0,BRONX,40.861553,-73.904168,0.018044,Same day,,
3,48542024,2020-12-31 23:58:45,2021-01-14 16:49:17,Noise - Helicopter,NYPD,Above Address,10003.0,195 1 AVENUE,EAST 11 STREET,EAST 12 STREET,NEW YORK,Closed,1004530000.0,MANHATTAN,40.729916,-73.983616,13.701759,8-30 days,,
4,48543542,2020-12-31 23:58:39,2021-01-01 00:13:47,Noise - Residential,Loud Music/Party,Residential Building/House,10034.0,571 ACADEMY STREET,POST AVENUE,SHERMAN AVENUE,NEW YORK,Closed,1022218000.0,MANHATTAN,40.863565,-73.923221,0.010509,Same day,,


In [60]:
df_11.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude
0,38070156,2017-12-31 23:59:35,2018-01-04 19:27:02,HEAT/HOT WATER,10030.0,181 WEST 135 STREET,1019200000.0,MANHATTAN,40.815127,-73.943252
1,38067146,2017-12-31 23:59:34,2018-01-01 00:57:19,Noise - Residential,10035.0,2048 MADISON AVENUE,1017540000.0,MANHATTAN,40.808655,-73.938532
2,38066214,2017-12-31 23:59:15,2018-01-01 02:48:23,Noise - Residential,10466.0,1902 NEREID AVENUE,2050540000.0,BRONX,40.8987,-73.848528
3,38067041,2017-12-31 23:58:38,2018-01-01 02:53:28,Noise - Street/Sidewalk,11230.0,1201 AVENUE H,3066870000.0,BROOKLYN,40.629675,-73.964939
4,38068229,2017-12-31 23:58:33,2018-01-08 13:30:58,HEAT/HOT WATER,11226.0,70 LINDEN BOULEVARD,3050860000.0,BROOKLYN,40.652289,-73.956328


In [61]:
df_12.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude
0,48538697,2020-12-31 23:59:55,2021-01-01 01:07:04,Noise - Vehicle,10460.0,1569 HOE AVENUE,2029820000.0,BRONX,40.83582,-73.887516
1,48536596,2020-12-31 23:59:28,2021-01-01 01:33:12,Noise - Residential,10028.0,235 EAST 83 STREET,1015290000.0,MANHATTAN,40.776503,-73.954525
2,48536500,2020-12-31 23:58:55,2021-01-01 00:24:54,Noise - Residential,10468.0,2380 GRAND AVENUE,2031990000.0,BRONX,40.861553,-73.904168
3,48542024,2020-12-31 23:58:45,2021-01-14 16:49:17,Noise - Helicopter,10003.0,195 1 AVENUE,1004530000.0,MANHATTAN,40.729916,-73.983616
4,48543542,2020-12-31 23:58:39,2021-01-01 00:13:47,Noise - Residential,10034.0,571 ACADEMY STREET,1022218000.0,MANHATTAN,40.863565,-73.923221


In [8]:
df_9.shape, df_10.shape, df_9.duplicated().sum(), df_10.duplicated().sum()

((6036232, 20), (4052446, 20), np.int64(0), np.int64(0))

In [11]:
df_9.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
descriptor,40662
location_type,541452
incident_zip,633
incident_address,223
city,71
status,0


In [12]:
df_10.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
descriptor,36390
location_type,322016
incident_zip,99
incident_address,0
intersection_street_1,2601520
intersection_street_2,2601446


In [17]:
(df_9.shape, df_10.shape, df_11.shape, df_12.shape,
 df_9.duplicated().sum(), df_10.duplicated().sum(),
 df_11.duplicated().sum(), df_12.duplicated().sum())

((6036232, 20),
 (4052446, 20),
 (6036232, 10),
 (4052446, 10),
 np.int64(0),
 np.int64(0),
 np.int64(0),
 np.int64(0))
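
The shape and duplicate checks above repeat the same calls for every frame. A minimal sketch of a helper that gathers them into one table follows; `summarize_frames` and the toy frames are hypothetical names introduced only for illustration and are not part of the original notebook.

```python
import pandas as pd


def summarize_frames(frames):
    """Return one row per DataFrame: shape, duplicate-row count, NaN-cell count."""
    rows = []
    for name, frame in frames.items():
        rows.append({
            "name": name,
            "n_rows": frame.shape[0],
            "n_cols": frame.shape[1],
            "n_duplicates": int(frame.duplicated().sum()),
            "n_nan_cells": int(frame.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("name")


# Toy stand-ins (assumption); in the notebook this could be
# {"df_9": df_9, "df_10": df_10, "df_11": df_11, "df_12": df_12}.
toy_a = pd.DataFrame({
    "unique_key": [1, 2, 2, 3],
    "borough": ["BRONX", "QUEENS", "QUEENS", None],
})
toy_b = pd.DataFrame({
    "unique_key": [4, 5],
    "borough": ["BROOKLYN", "MANHATTAN"],
})

summary = summarize_frames({"toy_a": toy_a, "toy_b": toy_b})
print(summary)
```

Returning a DataFrame rather than a tuple keeps each number next to the frame it describes, which is easier to read once more than two frames are involved.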

In [18]:
df_11.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
incident_zip,633
incident_address,223
bbl,0
borough,0
latitude,586
longitude,586


In [19]:
df_12.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
incident_zip,99
incident_address,0
bbl,0
borough,2
latitude,51
longitude,51
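
Since only a few columns here carry NaNs (coordinates, zip code, borough), one option is to drop rows only where the columns needed for a given analysis are missing. The sketch below uses a toy frame with made-up values; which columns to pass to `subset` is an assumption that depends on the analysis.

```python
import pandas as pd

# Toy frame mimicking the column layout above (values are invented).
toy = pd.DataFrame({
    "unique_key": [1, 2, 3, 4],
    "incident_zip": [10030.0, None, 10466.0, 11230.0],
    "latitude": [40.815, 40.808, None, 40.629],
    "longitude": [-73.943, -73.938, None, -73.964],
})

# Keep every row that has coordinates, even if the zip code is missing.
with_coords = toy.dropna(subset=["latitude", "longitude"])

# Strict alternative: drop a row if any column is missing.
complete = toy.dropna()

print(len(toy), len(with_coords), len(complete))
```

The subset version preserves more rows for spatial work, while the strict version is the safer default for models that cannot handle any missing value.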


# **6. BBL_evictions merged**

In [8]:
normal_times = '/content/drive/My Drive/X999/bbl_evictions_merged_normal_times.csv'
covid = '/content/drive/My Drive/X999/bbl_evictions_merged_covid.csv'

In [9]:
df_13 = pd.read_csv(normal_times)
df_14 = pd.read_csv(covid)

In [10]:
df_13.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,3037420029,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,3057940012,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,3057820030,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,2032510420,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,2025770038,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929


In [11]:
df_14.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade
0,004123/20_209969,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,2032140141,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009
1,0050153/20_106030,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,4031560133,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969
2,0052002/19_101926,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,3051370021,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929
3,0057757/18_100889,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,3011850034,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929
4,0058466/19_104327,0058466/19,104327,635 WEST 42ND STREET,UNIT 18B,2020-03-12,MANHATTAN,10036,Not an Ejectment,Possession,40.761463,-73.999816,4.0,3.0,129.0,1087539,1010907501,Clinton,2020,2020-03,POINT (-73.999816 40.761463),0.2,,,,,,,,,,,,,,,,,,


In [13]:
df_13.shape, df_14.shape, df_13.duplicated().sum(), df_14.duplicated().sum()

((76715, 40), (6564, 40), np.int64(0), np.int64(0))

In [14]:
df_13.isna().sum()  # NaNs were deliberately kept in this WIP dataset; call dropna() before analysis to remove them

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
executed_date,0
borough,0
eviction_postcode,0
ejectment,0
eviction/legal_possession,0
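
The output above shows only the first columns, all with zero NaNs, so it does not reveal where the remaining NaNs sit. A small sketch that lists only the columns with missing values is given below; `nan_columns` and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def nan_columns(frame):
    """Return NaN counts for columns that have at least one NaN, largest first."""
    counts = frame.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame (assumption): building attributes are missing for some evictions.
toy = pd.DataFrame({
    "primary_key": ["a", "b", "c"],
    "yearbuilt": [1930.0, None, None],
    "ownername": ["owner A", None, "owner C"],
})

report = nan_columns(toy)
print(report)

# Rows that a plain dropna() would remove before analysis.
rows_lost = len(toy) - len(toy.dropna())
print(rows_lost)
```

Checking `rows_lost` first makes the cost of a blanket `dropna()` visible before committing to it.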


In [15]:
df_14.isna().sum()

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
executed_date,0
borough,0
eviction_postcode,0
ejectment,0
eviction/legal_possession,0


# **7. BBL_evictions_SVI merged**

In [16]:
normal_times = '/content/drive/My Drive/X999/merged_df_clean_normal_times.csv'
covid = '/content/drive/My Drive/X999/merged_df_clean_covid.csv'

In [17]:
df_15 = pd.read_csv(normal_times)
df_16 = pd.read_csv(covid)

In [18]:
df_15.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,3037420029,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,3057940012,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,3057820030,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,2032510420,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,2025770038,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high


In [19]:
df_16.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile
0,004123/20_209969,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,2032140141,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3
1,0050153/20_106030,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,4031560133,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low)
2,0052002/19_101926,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,3051370021,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2
3,0057757/18_100889,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,3011850034,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low)
4,0058466/19_104327,0058466/19,104327,635 WEST 42ND STREET,UNIT 18B,2020-03-12,MANHATTAN,10036,Not an Ejectment,Possession,40.761463,-73.999816,4.0,3.0,129.0,1087539,1010907501,Clinton,2020,2020-03,POINT (-73.999816 40.761463),0.2,,,,,,,,,,,,,,,,,,,10036,30930.0,0.6491,0.3328,0.8659,0.959,0.801,19.8,5.8,7.5,2.8,15.3,6.8,14.9,5.2,88.7,2.5,28.6,6.6,20.6,16.0,0.1,0.1,5.7,0.5,49.6,50.4,False,Q1 (Low)


In [20]:
df_15.shape, df_16.shape, df_15.duplicated().sum(), df_16.duplicated().sum()

((75010, 70), (6450, 69), np.int64(0), np.int64(0))
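
The two frames differ by one column (70 vs. 69); judging from the `head()` outputs above, `svi_group` is absent from the COVID frame. A hedged sketch for checking such differences programmatically follows; `column_diff` is a hypothetical helper, and the toy frames carry only a few of the real column names.

```python
import pandas as pd


def column_diff(left, right):
    """Return (columns only in left, columns only in right), keeping original order."""
    left_only = [c for c in left.columns if c not in right.columns]
    right_only = [c for c in right.columns if c not in left.columns]
    return left_only, right_only


# Toy stand-ins (assumption); in the notebook this would be column_diff(df_15, df_16).
toy_normal = pd.DataFrame(columns=["primary_key", "svi_quartile", "svi_group"])
toy_covid = pd.DataFrame(columns=["primary_key", "svi_quartile"])

left_only, right_only = column_diff(toy_normal, toy_covid)
print(left_only, right_only)
```

Running this check before concatenating or comparing the two periods avoids silently introducing an all-NaN column.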

In [22]:
df_15.isna().sum()  # the NaNs were carried over from the previous df (BBL_evictions merged)
# They were not dropped here; call dropna() before analysis to remove them.

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
executed_date,0
borough,0
zipcode,0
ejectment,0
eviction/legal_possession,0


In [23]:
df_16.isna().sum()

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
executed_date,0
borough,0
zipcode,0
ejectment,0
eviction/legal_possession,0


# **8. BBL_evictions_simplified_SVI merged**

In [20]:
normal_times = '/content/drive/My Drive/X999/bbl_evictions_311_svi.csv'
covid = '/content/drive/My Drive/X999/bbl_evictions_311_svi_covid.csv'

In [21]:
df_17 = pd.read_csv(normal_times)
df_18 = pd.read_csv(covid)

In [22]:
df_17.head()

Unnamed: 0,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcodes,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,geometry,eviction_count,year,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,complaint_count,E_TOTPOP,RPL_THEME1,RPL_THEME2,RPL_THEME3,RPL_THEME4,RPL_THEMES,EP_POV150,EP_UNEMP,EP_NOHSDP,EP_UNINSUR,EP_AGE65,EP_AGE17,EP_DISABL,EP_LIMENG,EP_NOVEH,EP_CROWD,EP_HBURD,EP_AFAM,EP_HISP,EP_ASIAN,EP_AIAN,EP_NHPI,EP_TWOMORE,EP_OTHERRACE,EP_MINRTY,EP_WHITE
0,34859/16,53416,3476 SEYMOUR AVENUE,3-B,2017-01-03,BRONX,10469,Not an Ejectment,Possession,40.87762,-73.849806,12.0,12.0,386.0,2117041.0,2047200001,Eastchester-Edenwald-Baychester,POINT (-73.849806 40.87762),13,2017,2.6,1935.0,C1,4.0,158.0,EASTCHESTER HEIGHTS PROPERTY OWNER LLC,148800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1931–1950, Manhattan Modern","1930-1945, great depression and WWII",100+ units,True,mega,Q4 (largest 25%),1930-1939,191.0,71862.0,0.9255,0.9259,0.9746,0.8724,0.9507,22.6,8.4,17.5,4.8,17.0,22.2,12.0,6.2,34.7,6.9,36.5,52.2,29.5,5.5,0.6,0.0,2.3,1.0,91.2,8.8
1,B57808/16,74242,1426 BRYANT AVENUE,10 AKA 2ND FL UNIT,2017-01-03,BRONX,10459,Not an Ejectment,Possession,40.830691,-73.888555,3.0,17.0,123.0,2099901.0,2029990111,Crotona Park East,POINT (-73.888555 40.830691),1,2017,1.0,1995.0,B1,3.0,2.0,"BAYRON, AIDA L.",2520.0,post-war,two-family,False,low-rise,"1994–Present, vacancy decontrol","1981–2000, Post-Modernism","1991–2008, modern economic growth",2-unit,False,small,Q3 (50-75%),1990-1999,2.0,51964.0,0.9925,0.9846,0.9949,0.9333,0.9943,43.4,13.9,31.7,7.1,10.8,26.3,18.2,16.5,65.7,13.7,52.4,28.9,67.3,0.2,0.3,0.0,1.3,0.1,98.1,1.9
2,N069212/14,355977,1309 5TH AVENUE,24H,2017-01-03,MANHATTAN,10029,Not an Ejectment,Possession,40.797309,-73.948901,11.0,9.0,17402.0,1078884.0,1016160001,East Harlem South,POINT (-73.948901 40.797309),20,2017,4.0,1974.0,D7,34.0,600.0,HERITAGE HOLDINGS HOUSING DEVELOPMENT FU ND,680000.0,post-war,elevator,False,high-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1970-1979,1722.0,75614.0,0.9851,0.9094,0.9606,0.9476,0.9788,44.3,8.6,23.5,6.7,16.3,18.0,15.6,10.9,84.8,7.7,48.5,25.0,45.7,10.8,0.1,0.0,2.0,1.6,85.1,14.9
3,K065455/16,367441,458 EAST 51 STREET,6A,2017-01-03,BROOKLYN,11203,Not an Ejectment,Possession,40.650624,-73.929261,17.0,45.0,862.0,3102875.0,3046980037,Rugby-Remsen Village,POINT (-73.929261 40.650624),12,2017,2.666667,1940.0,D1,6.0,53.0,458 EAST 51ST PARTNERS LLC,43020.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1931–1950, Manhattan Modern","1930-1945, great depression and WWII",21-100 units,True,very large,Q4 (largest 25%),1940-1949,224.0,78506.0,0.8956,0.7886,0.9814,0.9595,0.9386,21.2,6.9,12.1,5.9,19.7,18.7,10.5,2.4,49.7,6.7,41.1,80.3,6.9,1.8,0.1,0.0,5.5,0.4,95.0,5.0
4,33992/16,458984,580 EAST 168TH STREE T,*,2017-01-03,BRONX,10456,Not an Ejectment,Possession,40.830494,-73.904108,3.0,16.0,185.0,2004234.0,2026110033,Morrisania-Melrose,POINT (-73.904108 40.830494),3,2017,3.0,,,,,,,,,,,,,,,,,,,1.0,88575.0,0.996,0.9903,0.991,0.9972,0.9994,49.1,14.7,33.4,7.3,11.3,27.1,19.3,14.7,76.4,11.1,54.9,38.2,56.3,0.7,0.2,0.0,1.8,0.4,97.6,2.4


In [23]:
df_18.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternativ...","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternativ...","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0


In [24]:
df_17.shape, df_18.shape, df_17.duplicated().sum(), df_18.duplicated().sum()

((74082, 66), (5386, 91), np.int64(0), np.int64(0))

In [25]:
df_17.isna().sum()  # NaNs remain because some zipcodes are missing from the SVI file or are no longer in use
# Call dropna() before analysis to drop those rows all at once.

Unnamed: 0,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
executed_date,0
borough,0
zipcodes,0
ejectment,0
eviction/legal_possession,0
latitude,0
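
To see which zip codes have no counterpart in the SVI file, a left merge with `indicator=True` can flag the unmatched rows. This is a sketch on toy data, not the merge actually used to build these files; the frame names, the `zipcode` key, and the values are assumptions.

```python
import pandas as pd

# Toy stand-ins (assumption): one eviction zip code has no SVI record.
evictions = pd.DataFrame({
    "zipcode": [10469, 10459, 99999],
    "docket_number": [1, 2, 3],
})
svi = pd.DataFrame({
    "zipcode": [10469, 10459],
    "rpl_themes": [0.95, 0.99],
})

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only".
merged = evictions.merge(svi, on="zipcode", how="left", indicator=True)

# Zip codes present in the eviction data but absent from the SVI data.
unmatched = merged.loc[merged["_merge"] == "left_only", "zipcode"].unique()
print(unmatched)
```

Listing the unmatched zip codes separately makes it possible to decide whether they are typos, retired codes, or genuine gaps before dropping them.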


In [27]:
df_18.isna().sum().sum()  # the COVID-period dataset has no NaNs

np.int64(0)