# **Introduction**

In this notebook, we perform the final merge of all relevant data for the COVID period (2020-2022): BBL, evictions, SVI scores, and 311 complaints. We first load the already merged bbl_evictions_svi dataset and drop its NaNs for analysis (the previous version kept the NaNs so those rows can still be retrieved later if necessary). Then we combine the 311 complaints for 2020, 2021, and 2022 and clean their NaNs. Next, we group the complaint data by bbl and complaint category and reshape it into a wide pivot table. Finally, we merge the pivot table with the bbl_evictions_svi df to arrive at the final mega-merged, cleaned df.

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
import datetime as dt
import matplotlib
import matplotlib.pyplot as plt
import os
import io
import geopandas as gpd
import seaborn as sns

# suppress warning
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)
# display all columns

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **Step 1: get the bbl evictions svi merged data**

In [None]:
file_path1 = '/content/drive/My Drive/X999/merged_df_clean_covid.csv'

In [None]:
bbl_evictions_svi = pd.read_csv(file_path1)

In [None]:
bbl_evictions_svi.shape

(6450, 69)

In [None]:
nan_counts = bbl_evictions_svi.isna().sum()
columns_with_nans = nan_counts[nan_counts > 0]
columns_with_nans

Unnamed: 0,0
yearbuilt,344
bldgclass,344
numfloors,344
unitsres,344
ownername,344
bldgarea,344
building_type,344
building_category,344
is_condo,344
floor_category,344


In [None]:
nan_percentage = (bbl_evictions_svi.isna().sum() / len(bbl_evictions_svi)) * 100
nan_percentage = nan_percentage[nan_percentage > 0]
nan_percentage = nan_percentage.sort_values(ascending=False)
nan_percentage

Unnamed: 0,0
yearbuilt,5.333333
bldgclass,5.333333
numfloors,5.333333
unitsres,5.333333
ownername,5.333333
bldgarea,5.333333
building_type,5.333333
building_category,5.333333
is_condo,5.333333
floor_category,5.333333


## **There is not much we can do about these NaNs, since they cannot be imputed with high confidence. They occur because these bbls in the eviction dataset have no match in the BBL dataset. For retrieval purposes alone we could keep the NaNs, but for any other analysis (our main concern here) we remove them.**
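If the unmatched rows are ever needed for retrieval, one option is to set them aside in a separate DataFrame before dropping them, instead of relying on an earlier file version. A minimal sketch on toy data; the names `retrieval_df` and `analysis_df` and all values are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy stand-in for bbl_evictions_svi (hypothetical values): the PLUTO-derived
# columns are NaN where an eviction bbl found no match in the BBL dataset.
toy = pd.DataFrame({
    "primary_key": ["a", "b", "c", "d"],
    "bbl": [1, 2, 3, 4],
    "yearbuilt": [1920.0, np.nan, 1960.0, np.nan],
    "bldgclass": ["C0", np.nan, "D3", np.nan],
})

# A row with any missing value belongs to an unmatched bbl.
unmatched_mask = toy.isna().any(axis=1)

# Set the unmatched rows aside for retrieval; analyse only the complete rows.
retrieval_df = toy[unmatched_mask]
analysis_df = toy[~unmatched_mask]  # equivalent to toy.dropna() here

pct_removed = unmatched_mask.mean() * 100
print(f"{unmatched_mask.sum()} rows set aside ({pct_removed:.2f} %)")
# prints: 2 rows set aside (50.00 %)
```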

In [None]:
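n_before = len(bbl_evictions_svi)
bbl_evictions_svi = bbl_evictions_svi.dropna()
n_removed = n_before - len(bbl_evictions_svi)  # 6450 - 6106 = 344 rows
bbl_evictions_svi.shape, n_removed, f'{n_removed / n_before * 100:.2f} % removed'

((6106, 69), 344, '5.33 % removed')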

In [None]:
bbl_evictions_svi.isna().sum().sum()

np.int64(0)

In [None]:
bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

# **Step 2: get the combined 311 complaints data**

In [None]:
# saved_2017 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2017_reduced.csv"
# saved_2018 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2018_reduced.csv"
# saved_2019 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2019_reduced.csv"
saved_2020 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2020_reduced.csv"
saved_2021 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2021_reduced.csv"
saved_2022 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2022_reduced.csv"
# saved_2023 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2023_reduced.csv"
# saved_2024 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2024_reduced.csv"

In [None]:
# df_2017 = pd.read_csv(saved_2017)
# df_2018 = pd.read_csv(saved_2018)
# df_2019 = pd.read_csv(saved_2019)
df_2020 = pd.read_csv(saved_2020)
df_2021 = pd.read_csv(saved_2021)
df_2022 = pd.read_csv(saved_2022)
# df_2023 = pd.read_csv(saved_2023)
# df_2024 = pd.read_csv(saved_2024)

In [None]:
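# ignore_index=True gives a fresh 0..n-1 index instead of repeating each year's index
covid_311_df = pd.concat([df_2020, df_2021, df_2022], ignore_index=True)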

In [None]:
covid_311_df.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude
0,48538697,2020-12-31 23:59:55,2021-01-01 01:07:04,Noise - Vehicle,10460.0,1569 HOE AVENUE,2029820000.0,BRONX,40.83582,-73.887516
1,48536596,2020-12-31 23:59:28,2021-01-01 01:33:12,Noise - Residential,10028.0,235 EAST 83 STREET,1015290000.0,MANHATTAN,40.776503,-73.954525
2,48536500,2020-12-31 23:58:55,2021-01-01 00:24:54,Noise - Residential,10468.0,2380 GRAND AVENUE,2031990000.0,BRONX,40.861553,-73.904168
3,48542024,2020-12-31 23:58:45,2021-01-14 16:49:17,Noise - Helicopter,10003.0,195 1 AVENUE,1004530000.0,MANHATTAN,40.729916,-73.983616
4,48543542,2020-12-31 23:58:39,2021-01-01 00:13:47,Noise - Residential,10034.0,571 ACADEMY STREET,1022218000.0,MANHATTAN,40.863565,-73.923221


In [None]:
covid_311_df.columns, covid_311_df.shape

(Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
        'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
        'longitude'],
       dtype='object'),
 (4052446, 10))

In [None]:
covid_311_df.bbl = covid_311_df.bbl.astype('int64')

In [None]:
covid_311_df.isna().sum().sum(), covid_311_df.duplicated().sum()

(np.int64(203), np.int64(0))

In [None]:
covid_311_df.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
incident_zip,99
incident_address,0
bbl,0
borough,2
latitude,51
longitude,51


## **Here it is reasonable to fill the NaNs with the string 'unknown' or the integer 0, depending on the column. None of these columns is essential: once the complaints are merged with the bbl_evictions_svi data, the location information comes from the main table instead. We can drop these columns afterwards if they cause problems.**
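As an alternative to one `fillna` call per column, pandas' `fillna` also accepts a dict that maps each column to its own fill value. A minimal sketch on a toy slice of the 311 data (values hypothetical), covering all four columns that showed NaNs above, including `borough`:

```python
import numpy as np
import pandas as pd

# Toy slice of covid_311_df with the same kinds of gaps seen above.
toy_311 = pd.DataFrame({
    "bbl": [2029820027, 1015290018, 0],
    "complaint_type": ["Noise - Vehicle", "Noise - Residential", "HEAT/HOT WATER"],
    "incident_zip": [10460.0, np.nan, 10468.0],
    "borough": ["BRONX", "MANHATTAN", np.nan],
    "latitude": [40.83582, np.nan, 40.861553],
    "longitude": [-73.887516, np.nan, -73.904168],
})

# One call, one fill value per column: a string for text, 0 for numeric columns.
fill_values = {
    "incident_zip": 0,
    "borough": "unknown",
    "latitude": 0,
    "longitude": 0,
}
toy_311 = toy_311.fillna(value=fill_values)

print(toy_311.isna().sum().sum())  # prints: 0
```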

In [None]:
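# incident_address has no NaNs (see above), so this is only a safeguard;
# the 2 NaNs in borough are left as-is and are irrelevant to the groupby below
covid_311_df['incident_address'] = covid_311_df['incident_address'].fillna('unknown')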

In [None]:
# other_columns = ['incident_zip', 'latitude', 'longitude']
covid_311_df['incident_zip'] = covid_311_df['incident_zip'].fillna(0)
covid_311_df['latitude'] = covid_311_df['latitude'].fillna(0)
covid_311_df['longitude'] = covid_311_df['longitude'].fillna(0)

In [None]:
covid_311_df.shape, covid_311_df.isna().sum().sum(), covid_311_df.duplicated().sum()

((4052446, 10), np.int64(2), np.int64(0))

# **Step 3: merge bbl_evictions_svi with 311 complaints data**

### It turns out we do need a **pivot table**, but we need to group by first to make the merge seamless. Grouping also sidesteps the NaN issues above, because the aggregation ignores the columns that had missing data.
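The point that grouping ignores the troubled columns can be checked on toy data: `groupby(...).size()` counts rows per (bbl, category) pair, and NaNs in columns that are not group keys have no effect on the counts. A minimal sketch; the bbls and values are hypothetical:

```python
import numpy as np
import pandas as pd

toy_311 = pd.DataFrame({
    "bbl": [101, 101, 101, 202, 202],
    "complaint_category": ["noise_complaints", "noise_complaints",
                           "heat_hot_water", "pest_issues", "noise_complaints"],
    # NaNs in columns we do not group on do not change the counts
    "borough": ["BRONX", np.nan, "BRONX", "QUEENS", np.nan],
    "latitude": [40.8, 40.8, np.nan, 40.7, 40.7],
})

# Long format: one row per (bbl, category) pair with its complaint count.
long_counts = (
    toy_311.groupby(["bbl", "complaint_category"])
    .size()
    .reset_index(name="count")
)
print(long_counts)
```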

In [None]:
bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

In [None]:
covid_311_df.columns, covid_311_df.shape, bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
        'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
        'longitude'],
       dtype='object'),
 (4052446, 10),
 Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'r

In [None]:
court_bbl_map = bbl_evictions_svi[['primary_key', 'bbl']].drop_duplicates()
court_bbl_map.shape
# no duplicates: 6106 rows, the same as bbl_evictions_svi, good

(6106, 2)

In [None]:
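def categorize_complaint(complaint_type):
    """Map a raw 311 complaint_type to one of 21 broad categories.

    Rules are checked top to bottom and the first match wins, so the order
    matters. Matching is by plain substring on the lower-cased text, so a
    keyword can also match inside a longer word (e.g. 'air' in 'repair').
    """
    complaint = complaint_type.lower().strip()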

    # building systems and utilities stuff
    if 'heat' in complaint or 'hot water' in complaint:
        return 'heat_hot_water'
    elif any(term in complaint for term in ['water leak', 'plumbing', 'sewage']):
        return 'plumbing_issues'
    elif 'electric' in complaint:
        return 'electrical_issues'
    elif 'elevator' in complaint:
        return 'elevator_issues'

    # building structure and maintenance
    elif 'door' in complaint or 'window' in complaint:
        return 'doors_windows'
    elif any(term in complaint for term in ['paint', 'plaster', 'mold']):
        return 'walls_ceilings'
    elif 'floor' in complaint or 'stair' in complaint:
        return 'floors_stairs'
    elif 'outside building' in complaint:
        return 'building_exterior'
    elif 'appliance' in complaint:
        return 'appliances'

    # health and environmental impact
    elif 'unsanitary' in complaint or 'condition' in complaint:
        return 'sanitation_issues'
    elif any(pest in complaint for pest in ['rodent', 'mosquito', 'bee', 'wasp', 'pigeon']):
        return 'pest_issues'
    elif 'air' in complaint or 'asbestos' in complaint or 'smoking' in complaint:
        return 'air_quality'

    # noise (all noise complaints together)
    elif 'noise' in complaint:
        return 'noise_complaints'

    # public space nuisances
    elif 'homeless' in complaint or 'encampment' in complaint:
        return 'homeless_issues'
    elif 'graffiti' in complaint or 'advertisement' in complaint:
        return 'graffiti_posting'
    elif any(nuisance in complaint for nuisance in ['disorderly', 'panhandling', 'drinking', 'urinating', 'fireworks']):
        return 'public_nuisance'

    # living safety and services
    elif 'safety' in complaint:
        return 'safety_concerns'
    elif 'animal' in complaint or 'abuse' in complaint:
        return 'animal_issues'
    elif 'police' in complaint:
        return 'police_matters'

    # miscellaneous
    elif 'general' in complaint:
        return 'general_complaints'
    else:
        return 'other_issues'

## **We replace the raw complaint types with categories to reduce the number of columns needed in the merged table. First, we regroup the complaint types and count the complaints in each category. Then we use a pivot table so that each category's name becomes a column holding its count. Finally, we merge the result with bbl_evictions_svi on bbl, so the count of each complaint category per bbl is preserved while the table stays smaller (than it would be without categorization) and easier to merge.**
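The if/elif chain in `categorize_complaint` can also be written as an ordered rule table, which keeps the same first-match-wins behaviour but makes the rules easier to read and extend. A minimal sketch, assuming only a subset of the original keywords; `RULES` and `categorize_with_rules` are hypothetical names, and the example complaint types are illustrative:

```python
import pandas as pd

# Hypothetical rule table: a subset of the keyword rules in categorize_complaint,
# kept in the same order because the first matching rule wins.
RULES = [
    ("heat_hot_water", ("heat", "hot water")),
    ("plumbing_issues", ("water leak", "plumbing", "sewage")),
    ("electrical_issues", ("electric",)),
    ("elevator_issues", ("elevator",)),
    ("noise_complaints", ("noise",)),
]


def categorize_with_rules(complaint_type, rules=RULES, default="other_issues"):
    """Return the category of the first rule with a keyword in complaint_type."""
    complaint = complaint_type.lower().strip()
    for category, keywords in rules:
        if any(keyword in complaint for keyword in keywords):
            return category
    return default


types = pd.Series(["Noise - Residential", "Noise - Vehicle",
                   "HEAT/HOT WATER", "PLUMBING", "Illegal Parking"])
categories = types.map(categorize_with_rules)
print(categories.tolist())
# fewer distinct values means fewer columns after the pivot
print(types.nunique(), "types ->", categories.nunique(), "categories")
```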

In [None]:
covid_311_df['complaint_category'] = covid_311_df['complaint_type'].apply(categorize_complaint)

In [None]:
covid_311_df.shape
# the new column holds the broad category of each complaint type; later we spread its
# values into separate columns to form a wide pivot table

(4052446, 11)

In [None]:
covid_311_df.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude,complaint_category
0,48538697,2020-12-31 23:59:55,2021-01-01 01:07:04,Noise - Vehicle,10460.0,1569 HOE AVENUE,2029820027,BRONX,40.83582,-73.887516,noise_complaints
1,48536596,2020-12-31 23:59:28,2021-01-01 01:33:12,Noise - Residential,10028.0,235 EAST 83 STREET,1015290018,MANHATTAN,40.776503,-73.954525,noise_complaints
2,48536500,2020-12-31 23:58:55,2021-01-01 00:24:54,Noise - Residential,10468.0,2380 GRAND AVENUE,2031990003,BRONX,40.861553,-73.904168,noise_complaints
3,48542024,2020-12-31 23:58:45,2021-01-14 16:49:17,Noise - Helicopter,10003.0,195 1 AVENUE,1004530034,MANHATTAN,40.729916,-73.983616,noise_complaints
4,48543542,2020-12-31 23:58:39,2021-01-01 00:13:47,Noise - Residential,10034.0,571 ACADEMY STREET,1022217501,MANHATTAN,40.863565,-73.923221,noise_complaints


In [None]:
covid_311_df.isna().sum().sum(), covid_311_df.duplicated().sum()

(np.int64(2), np.int64(0))

In [None]:
covid_311_df.shape
# no duplicates, 4052446

(4052446, 11)

In [None]:
covid_311_df.columns

Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
       'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
       'longitude', 'complaint_category'],
      dtype='object')

In [None]:
bbl_evictions_svi.columns

Index(['primary_key', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
       'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
       'unitsres', 'ownername', 'bldgarea', 'building_type',
       'building_category', 'is_condo', 'floor_category', 'rent_era',
       'architectural_style', 'economic_period', 'residential_units_category',
       'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
       'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
       'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
       'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
       'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp', 'ep_asian',

In [None]:
bbl_evictions_svi.bbl.dtype

dtype('int64')

In [None]:
# count each category for each bbl
# group the complaints by bbl and categories and then count them
bbl_category_counts = covid_311_df.groupby(['bbl', 'complaint_category']).size().reset_index(name='count')

In [None]:
bbl_category_counts.bbl = bbl_category_counts.bbl.astype('int64')

In [None]:
bbl_category_counts

Unnamed: 0,bbl,complaint_category,count
0,0,animal_issues,1
1,0,appliances,7
2,0,doors_windows,8
3,0,electrical_issues,2
4,0,elevator_issues,16
...,...,...,...
639730,5270000501,sanitation_issues,1
639731,5270000504,sanitation_issues,1
639732,5270000506,noise_complaints,1
639733,5270000508,noise_complaints,1


## **A pivot-table transformation is needed here, because we want the table in a "wide" format so that:**

- each row represents a single bbl
- each complaint category becomes its own column
- the values show the count for each category
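The groupby, pivot, and total steps can be sketched end to end on toy data (hypothetical bbls). `pd.crosstab` is shown only as a one-step alternative that produces the same count table from the raw rows; the notebook itself uses `groupby` plus `pivot`:

```python
import pandas as pd

# Toy raw complaints (hypothetical bbls): one row per 311 complaint.
raw = pd.DataFrame({
    "bbl": [101, 101, 101, 202, 202],
    "complaint_category": ["noise_complaints", "noise_complaints",
                           "heat_hot_water", "pest_issues", "noise_complaints"],
})

# Long format first: (bbl, category, count), as in bbl_category_counts.
long_counts = raw.groupby(["bbl", "complaint_category"]).size().reset_index(name="count")

# Wide format: one row per bbl, one column per category, absent pairs -> 0.
wide = (
    long_counts.pivot(index="bbl", columns="complaint_category", values="count")
    .fillna(0)
    .astype(int)  # fillna leaves floats; counts are whole numbers
)

# pd.crosstab builds the same count table in one step from the raw rows.
xtab = pd.crosstab(raw["bbl"], raw["complaint_category"])

# Row-wise total across all category columns.
wide["total_complaints"] = wide.sum(axis=1)
print(wide.reset_index())
```

Casting to `int` is optional; the notebook keeps the float counts that `fillna(0)` produces, which does not change any totals.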

In [None]:
# pivot to a wide format: one row per bbl, one column per complaint category,
# with 0 where a bbl has no complaints in that category
bbl_complaints_wide = bbl_category_counts.pivot(
    index='bbl',
    columns='complaint_category',
    values='count'
).fillna(0).reset_index()

In [None]:
bbl_complaints_wide.isna().sum().sum(), bbl_complaints_wide.duplicated().sum()

(np.int64(0), np.int64(0))

In [None]:
bbl_evictions_svi.bbl.nunique(), covid_311_df.bbl.nunique(), bbl_complaints_wide.bbl.nunique(), bbl_complaints_wide.shape

(4827, 282098, 282098, (282098, 22))

In [None]:
bbl_complaints_wide
# correct shape: (282098, 22), i.e. one row per bbl, the bbl column plus 21 category columns

complaint_category,bbl,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings
0,0,0.0,1.0,7.0,0.0,8.0,2.0,16.0,6.0,9.0,1.0,43.0,1.0,244.0,22.0,19.0,47.0,3.0,0.0,0.0,17.0,13.0
1,1000010010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0
2,1000010101,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1000010201,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1000020001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
282093,5270000501,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,4.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0
282094,5270000504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
282095,5270000506,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
282096,5270000508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
len(bbl_complaints_wide.columns) - 1

21

In [None]:
all_categories = [
    'heat_hot_water', 'plumbing_issues', 'electrical_issues', 'elevator_issues',
    'doors_windows', 'walls_ceilings', 'floors_stairs', 'building_exterior',
    'appliances', 'sanitation_issues', 'pest_issues', 'air_quality',
    'noise_complaints', 'homeless_issues', 'graffiti_posting', 'public_nuisance',
    'safety_concerns', 'animal_issues', 'police_matters', 'general_complaints',
    'other_issues'
]
# all 21 categories returned by categorize_complaint
len(all_categories)

21

In [None]:
# add a total column
bbl_complaints_wide['total_complaints'] = bbl_complaints_wide[all_categories].sum(axis=1)

In [None]:
bbl_complaints_wide
# the 311 complaint part is now complete

complaint_category,bbl,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,0,0.0,1.0,7.0,0.0,8.0,2.0,16.0,6.0,9.0,1.0,43.0,1.0,244.0,22.0,19.0,47.0,3.0,0.0,0.0,17.0,13.0,459.0
1,1000010010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,12.0
2,1000010101,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
3,1000010201,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
4,1000020001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
282093,5270000501,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,4.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,8.0
282094,5270000504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
282095,5270000506,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
282096,5270000508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [None]:
bbl_evictions_svi.bbl.dtype, bbl_complaints_wide.bbl.dtype

(dtype('int64'), dtype('int64'))

In [None]:
bbl_complaints_wide.shape

(282098, 23)

In [None]:
bbl_evictions_svi.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile
0,004123/20_209969,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,2032140141,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3
1,0050153/20_106030,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,4031560133,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low)
2,0052002/19_101926,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,3051370021,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2
3,0057757/18_100889,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,3011850034,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low)
5,0061902/19_117253,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,4033220043,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low)


In [None]:
bbl_evictions_svi.shape

(6106, 69)

In [None]:
bbl_evictions_svi_311 = bbl_evictions_svi.merge(
    bbl_complaints_wide,
    on='bbl',
    how='left'
)
# the final merge with bbl, evictions, svi with 311 complaints

In [None]:
bbl_evictions_svi_311.isna().sum()

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
...,...
public_nuisance,720
safety_concerns,720
sanitation_issues,720
walls_ceilings,720


In [None]:
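n_nan_rows = bbl_evictions_svi_311.isna().any(axis=1).sum()  # rows without a 311 match (720)
f"{n_nan_rows / bbl_evictions_svi_311.shape[0] * 100:.2f} % of the rows have nans"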

'11.79 % of the rows have nans'

In [None]:
nan_counts = bbl_evictions_svi_311.isna().sum()
columns_with_nans = nan_counts[nan_counts > 0]
columns_with_nans

Unnamed: 0,0
air_quality,720
animal_issues,720
appliances,720
building_exterior,720
doors_windows,720
electrical_issues,720
elevator_issues,720
floors_stairs,720
general_complaints,720
graffiti_posting,720


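## **These NaNs mark eviction rows whose bbl has no match in the 2020-2022 311 complaint data. Filling them (e.g. with 0) would assume those buildings had no complaints, which we cannot verify, so it would only add inaccuracies to the dataset. We therefore drop all rows that contain NaNs.**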
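The unmatched rows can also be identified explicitly at merge time rather than inferred from the NaN counts. A minimal sketch on toy data (values hypothetical) using pandas' `indicator` and `validate` options of `DataFrame.merge`; `validate="many_to_one"` additionally guards against duplicate bbls in the complaint table:

```python
import pandas as pd

evictions = pd.DataFrame({
    "primary_key": ["e1", "e2", "e3"],
    "bbl": [101, 101, 303],
})
complaints_wide = pd.DataFrame({
    "bbl": [101, 202],
    "noise_complaints": [2, 1],
    "total_complaints": [3, 1],
})

merged = evictions.merge(
    complaints_wide,
    on="bbl",
    how="left",
    validate="many_to_one",  # raises MergeError if a bbl repeats in complaints_wide
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only' here
)

unmatched = (merged["_merge"] == "left_only").sum()
print(f"{unmatched} of {len(merged)} eviction rows have no 311 match")

# Drop the unmatched rows (as the notebook does) and the helper column.
merged = merged[merged["_merge"] == "both"].drop(columns="_merge")
```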

In [None]:
bbl_evictions_svi_311 = bbl_evictions_svi_311.dropna()

In [None]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum(), bbl_evictions_svi_311.shape

(np.int64(0), np.int64(0), (5386, 91))

In [None]:
bbl_evictions_svi_311

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,004123/20_209969,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,2032140141,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.9870,0.9470,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0
1,0050153/20_106030,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,4031560133,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0
2,0052002/19_101926,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,3051370021,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.9300,0.4536,0.9639,0.9692,0.9220,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
3,0057757/18_100889,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,3011850034,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.9330,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0
4,0061902/19_117253,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,4033220043,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.8980,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6100,R52623/19_101227,R52623/19,101227,185 ST.MARKS PLACE,13A,2020-02-19,STATEN ISLAND,10301,Not an Ejectment,Possession,40.645358,-74.080729,1.0,49.0,7.0,5108502,5000130008,West New Brighton-New Brighton-St. George,2020,2020-02,POINT (-74.080729 40.645358),0.6,1976.0,D3,20.0,454.0,NYC HOUSING DEVELOPMENT CORP.,524513.0,post-war,elevator,False,high-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",100+ units,False,mega,Q4 (largest 25%),1970-1979,10301,40331.0,0.8784,0.7487,0.8992,0.9869,0.9329,20.4,7.1,13.7,5.2,15.6,20.7,11.6,6.5,25.5,8.1,32.2,19.9,26.3,7.6,0.7,0.0,3.7,0.3,58.6,41.4,False,Q2,0.0,4.0,3.0,0.0,19.0,3.0,2.0,22.0,4.0,0.0,24.0,0.0,253.0,1.0,0.0,27.0,6.0,9.0,9.0,20.0,16.0,422.0
6101,R52635/19_101039,R52635/19,101039,185 ST.MARKS PLACE,5E,2020-01-06,STATEN ISLAND,10301,Not an Ejectment,Possession,40.645358,-74.080729,1.0,49.0,7.0,5108502,5000130008,West New Brighton-New Brighton-St. George,2020,2020-01,POINT (-74.080729 40.645358),0.6,1976.0,D3,20.0,454.0,NYC HOUSING DEVELOPMENT CORP.,524513.0,post-war,elevator,False,high-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",100+ units,False,mega,Q4 (largest 25%),1970-1979,10301,40331.0,0.8784,0.7487,0.8992,0.9869,0.9329,20.4,7.1,13.7,5.2,15.6,20.7,11.6,6.5,25.5,8.1,32.2,19.9,26.3,7.6,0.7,0.0,3.7,0.3,58.6,41.4,False,Q2,0.0,4.0,3.0,0.0,19.0,3.0,2.0,22.0,4.0,0.0,24.0,0.0,253.0,1.0,0.0,27.0,6.0,9.0,9.0,20.0,16.0,422.0
6102,R52697/19_101672,R52697/19,101672,351 ADELAIDE AVENUE,18,2020-01-22,STATEN ISLAND,10306,Not an Ejectment,Possession,40.558150,-74.118731,3.0,50.0,12805.0,5063024,5046760073,Oakwood-Oakwood Beach,2020,2020-01,POINT (-74.118731 40.55815),0.2,1965.0,C9,2.0,16.0,351 ADELAIDE AVENUE HOLDINGS LLC,8460.0,post-war,walk-up,False,low-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",6-20 units,True,medium,Q4 (largest 25%),1960-1969,10306,56232.0,0.7769,0.8011,0.7994,0.8319,0.8739,16.6,6.7,11.5,4.3,18.8,21.3,11.1,8.5,14.7,4.3,28.0,3.0,16.2,13.8,0.0,0.0,3.5,0.4,36.8,63.2,False,Q1 (Low),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
6103,R52769/19_102174,R52769/19,102174,31 MARKHAM LANE,1A,2020-02-19,STATEN ISLAND,10310,Not an Ejectment,Possession,40.639034,-74.116274,1.0,49.0,97.0,5108650,5001690001,West New Brighton-New Brighton-St. George,2020,2020-02,POINT (-74.116274 40.639034),0.2,2007.0,C1,3.0,240.0,MARKHAM GARDENS L.P.,236523.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",100+ units,False,mega,Q4 (largest 25%),2000-2009,10310,26239.0,0.7569,0.7972,0.8975,0.8991,0.8928,21.0,4.2,12.6,4.5,12.6,25.7,8.7,6.0,21.8,4.4,28.0,20.0,29.0,5.3,0.1,0.0,3.0,0.2,57.6,42.4,False,Q1 (Low),0.0,0.0,0.0,0.0,3.0,1.0,0.0,2.0,3.0,0.0,11.0,0.0,25.0,0.0,4.0,9.0,0.0,4.0,1.0,8.0,8.0,79.0


In [None]:
zero_bbl_count = (bbl_evictions_svi_311['bbl'] == 0).sum()
zero_bbl_count
# no bbl == 0 rows

np.int64(0)

In [None]:
all_columns = list(bbl_evictions_svi_311.columns),
# len(all_columns)
# all_columns
type(all_columns), len(all_columns[0]) # the trailing comma above wraps the list in a one-element tuple, which is why remove() does not work on all_columns directly

(tuple, 91)
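The `(tuple, 91)` result comes from the trailing comma at the end of the `all_columns = ...` line: Python reads `x = expr,` as a one-element tuple. A minimal illustration with a toy column list:

```python
# A trailing comma turns the right-hand side into a one-element tuple.
cols = ["primary_key", "bbl", "borough"]

wrapped = list(cols),   # note the trailing comma: this is a tuple
plain = list(cols)      # no trailing comma: this is a list

print(type(wrapped), len(wrapped))
print(type(plain), len(plain))

# the tuple has no .remove(); its single element is the list itself
inner = wrapped[0]
inner.remove("bbl")
print(inner)
```

Dropping the comma gives a plain list that supports `remove()` directly.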

## **This covid df has one fewer column than the normal-time df: svi_group, which categorizes the SVI theme 1 score into low, medium, and high. It exists only in the normal-time df because the regression analysis, where this column was added, was run only on the normal-time SVI merged df (see the evidence at the very end).**

In [None]:
# bbl_evictions_svi_311

In [None]:
# goal: move "primary_key" and "bbl" to the front of the dataframe
# all_columns = merged_with_complaints.columns.tolist()
# print(all_columns)
# if 'court_index_number' in all_columns:
#     print("yes, court_index_number")
#     all_columns.remove('court_index_number')
# if 'bbl' in all_columns:
#     print("yes, bbl")
#     all_columns.remove('bbl')
# all_columns
remaining_columns = [col for col in bbl_evictions_svi_311.columns if col not in ['primary_key', 'bbl']]
print(len(remaining_columns))

89


In [None]:
len(remaining_columns)
# good

89

In [None]:
new_column_order = ['primary_key', 'bbl'] + remaining_columns

In [None]:
# new order in place
bbl_evictions_svi_311 = bbl_evictions_svi_311[new_column_order]
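The reordering above can be wrapped in a small reusable helper. `move_to_front` is a hypothetical name, not part of pandas; this sketch keeps the remaining columns in their original order and fails loudly if a requested column is missing:

```python
import pandas as pd

def move_to_front(df: pd.DataFrame, front: list) -> pd.DataFrame:
    """Return a new DataFrame with the `front` columns first, the rest in original order."""
    missing = [c for c in front if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    rest = [c for c in df.columns if c not in front]
    return df[front + rest]

# toy frame with the key columns out of place
df = pd.DataFrame({
    "court_index_number": ["004123/20"],
    "bbl": [2032140141],
    "primary_key": ["004123/20_209969"],
    "borough": ["BRONX"],
})
out = move_to_front(df, ["primary_key", "bbl"])
print(list(out.columns))
```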

In [None]:
display(bbl_evictions_svi_311.head())

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0


In [None]:
bbl_evictions_svi_311.shape

(5386, 91)

In [None]:
# remove rows with BBL = 0
bbl_evictions_svi_311 = bbl_evictions_svi_311[bbl_evictions_svi_311['bbl'] != 0] # good
len(bbl_evictions_svi_311)

5386

In [None]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum() # all clean

(np.int64(0), np.int64(0))

In [None]:
bbl_evictions_svi_311.shape
# final shape

(5386, 91)

In [None]:
bbl_evictions_svi_311.info(), \
bbl_evictions_svi_311.shape

<class 'pandas.core.frame.DataFrame'>
Index: 5386 entries, 0 to 6104
Data columns (total 91 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   primary_key                  5386 non-null   object 
 1   bbl                          5386 non-null   int64  
 2   court_index_number           5386 non-null   object 
 3   docket_number                5386 non-null   int64  
 4   eviction_address             5386 non-null   object 
 5   eviction_apartment_number    5386 non-null   object 
 6   executed_date                5386 non-null   object 
 7   borough                      5386 non-null   object 
 8   zipcode                      5386 non-null   int64  
 9   ejectment                    5386 non-null   object 
 10  eviction/legal_possession    5386 non-null   object 
 11  latitude                     5386 non-null   float64
 12  longitude                    5386 non-null   float64
 13  community_board        

(None, (5386, 91))

In [None]:
complaint_cols = ['bbl'] + all_categories + ['total_complaints']
existing_cols = [col for col in complaint_cols if col in bbl_evictions_svi_311.columns]
existing_cols

['bbl',
 'heat_hot_water',
 'plumbing_issues',
 'electrical_issues',
 'elevator_issues',
 'doors_windows',
 'walls_ceilings',
 'floors_stairs',
 'building_exterior',
 'appliances',
 'sanitation_issues',
 'pest_issues',
 'air_quality',
 'noise_complaints',
 'homeless_issues',
 'graffiti_posting',
 'public_nuisance',
 'safety_concerns',
 'animal_issues',
 'police_matters',
 'general_complaints',
 'other_issues',
 'total_complaints']

In [None]:
# just take a look at the ones related to the 311 complaint part
display(bbl_evictions_svi_311[['primary_key'] + existing_cols].head())

Unnamed: 0,primary_key,bbl,heat_hot_water,plumbing_issues,electrical_issues,elevator_issues,doors_windows,walls_ceilings,floors_stairs,building_exterior,appliances,sanitation_issues,pest_issues,air_quality,noise_complaints,homeless_issues,graffiti_posting,public_nuisance,safety_concerns,animal_issues,police_matters,general_complaints,other_issues,total_complaints
0,004123/20_209969,2032140141,1.0,2.0,0.0,0.0,3.0,1.0,2.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0
1,0050153/20_106030,4031560133,62.0,4.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,2.0,0.0,0.0,34.0,0.0,0.0,0.0,0.0,2.0,1.0,2.0,0.0,112.0
2,0052002/19_101926,3051370021,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
3,0057757/18_100889,3011850034,12.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,2.0,3.0,0.0,17.0,0.0,0.0,0.0,2.0,1.0,1.0,1.0,0.0,45.0
4,0061902/19_117253,4033220043,9.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,6.0,1.0,0.0,19.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,38.0


In [None]:
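# count how many eviction records (rows) have at least one complaint of each type;
# a building with several evictions is counted once per row, so building_count is really a row count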
buildings_with_complaints_clean = {col: (bbl_evictions_svi_311[col] > 0).sum() for col in existing_cols[1:]}
# sorted_counts = sorted(buildings_with_complaints.items(), key=lambda x: x[1], reverse=True)
# this is just a list
complaint_counts_df = pd.DataFrame(list(buildings_with_complaints_clean.items()),
                                  columns=['complaint_category', 'building_count'])

In [None]:
complaint_counts_df = complaint_counts_df.sort_values('building_count', ascending=False)
complaint_counts_df = complaint_counts_df.reset_index(drop=True)
complaint_counts_df

Unnamed: 0,complaint_category,building_count
0,total_complaints,5386
1,noise_complaints,4687
2,plumbing_issues,3986
3,heat_hot_water,3957
4,sanitation_issues,3773
5,doors_windows,3093
6,walls_ceilings,3089
7,electrical_issues,2608
8,general_complaints,2539
9,pest_issues,2430
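Because every eviction row carries its building's complaint counts, the counts above are per eviction record rather than per distinct building (note that `total_complaints` reaches 5386, the full row count). A vectorized sketch on toy data that computes both; it assumes, as in this dataset, that rows sharing a `bbl` share the same complaint counts:

```python
import pandas as pd

# toy data: building 111 appears in two eviction rows, building 222 in one
df = pd.DataFrame({
    "bbl": [111, 111, 222],
    "heat_hot_water": [3.0, 3.0, 0.0],
    "noise_complaints": [1.0, 1.0, 2.0],
})
complaint_cols = ["heat_hot_water", "noise_complaints"]

# eviction rows with at least one complaint of each type
rows_with = (df[complaint_cols] > 0).sum()

# distinct buildings: keep one row per bbl first, since counts repeat within a bbl
buildings_with = (df.drop_duplicates("bbl")[complaint_cols] > 0).sum()

print(rows_with.to_dict())
print(buildings_with.to_dict())
```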


In [None]:
evictions_per_building = bbl_evictions_svi_311.groupby('bin').size().reset_index(name='total_evictions')
evictions_per_building

Unnamed: 0,bin,total_evictions
0,1000000,6
1,1000793,2
2,1000810,1
3,1000826,1
4,1000828,1
...,...,...
4365,5158918,1
4366,5161399,1
4367,5164857,1
4368,5164902,1


In [None]:
# assumes a three-year covid window (the executed dates shown run from 2020 to 2022)
evictions_per_building['average_year_eviction_count'] = evictions_per_building['total_evictions'] / 3
evictions_per_building

Unnamed: 0,bin,total_evictions,average_year_eviction_count
0,1000000,6,2.000000
1,1000793,2,0.666667
2,1000810,1,0.333333
3,1000826,1,0.333333
4,1000828,1,0.333333
...,...,...,...
4365,5158918,1,0.333333
4366,5161399,1,0.333333
4367,5164857,1,0.333333
4368,5164902,1,0.333333
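The per-bin count can also be broadcast straight back onto every eviction row, without the `merge` that is commented out in the next cell. A sketch on toy data; `N_YEARS = 3` mirrors the `/ 3` above and is an assumption about the length of the covid window:

```python
import pandas as pd

N_YEARS = 3  # assumption: same three-year window as the "/ 3" above

# toy data: one row per eviction; several rows can share a bin
df = pd.DataFrame({"bin": [1000000, 1000000, 2095397, 3029673, 3029673, 3029673]})

# value_counts gives evictions per bin; map broadcasts that count back onto each row
df["total_evictions"] = df["bin"].map(df["bin"].value_counts())
df["average_year_eviction_count"] = df["total_evictions"] / N_YEARS
print(df)
```

This avoids a merge and keeps the original row index intact.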


In [None]:
# merging these per-bin averages back by bin is left commented out, so the sort below uses the average_year_eviction_count column already in bbl_evictions_svi_311
# bbl_evictions_svi_311 = bbl_evictions_svi_311.merge(evictions_per_building[['bin', 'average_year_eviction_count']], on='bin', how='left')
bbl_evictions_svi_311[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False)

Unnamed: 0,bin,average_year_eviction_count
4436,3000000,2.6
4437,3000000,2.6
2395,3000000,2.6
5551,3000000,2.6
5568,3000000,2.6
...,...,...
22,1055174,0.2
6070,5106092,0.2
6071,5006052,0.2
6073,5061241,0.2


In [None]:
bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
bbl_evictions_svi_311[['bin', 'average_year_eviction_count', 'unitsres','average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False).head()

Unnamed: 0,bin,average_year_eviction_count,unitsres,average_year_eviction_unit_count
6050,5112952,1.2,1.0,1.2
3286,3033141,1.2,1.0,1.2
3285,3033141,1.2,1.0,1.2
3287,3033141,1.2,1.0,1.2
6038,5112952,1.2,1.0,1.2


In [None]:
# Start here

In [None]:
# average evictions per residential unit per year for each building (same computation as above)
bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
bbl_evictions_svi_311['average_year_eviction_unit_count'].head()

Unnamed: 0,average_year_eviction_unit_count
0,0.066667
1,0.001105
2,0.3
3,0.016667
4,0.005063


In [None]:
bbl_evictions_svi_311[['primary_key', 'bin', 'nta', 'borough', 'eviction_address',
                       'average_year_eviction_count', 'unitsres',
                       'average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False).head()

Unnamed: 0,primary_key,bin,nta,borough,eviction_address,average_year_eviction_count,unitsres,average_year_eviction_unit_count
6050,R300753/21_104122,5112952,West New Brighton-New Brighton-St. George,STATEN ISLAND,97 TAYLOR STREET,1.2,1.0,1.2
3286,56539/19B_96673,3033141,Crown Heights South,BROOKLYN,1035 PRESIDENT ST,1.2,1.0,1.2
3285,56539/19A_96672,3033141,Crown Heights South,BROOKLYN,1035 PRESIDENT ST,1.2,1.0,1.2
3287,56540/19A_96596,3033141,Crown Heights South,BROOKLYN,1035 PRESIDENT ST,1.2,1.0,1.2
6038,R300572/21_104044,5112952,West New Brighton-New Brighton-St. George,STATEN ISLAND,97 TAYLOR STREET,1.2,1.0,1.2


In [None]:
bbl_evictions_svi_311.eviction_address.isna().sum()

np.int64(0)

In [None]:
max(bbl_evictions_svi_311.bin), min(bbl_evictions_svi_311.bin)

(5165243, 1000000)

In [None]:
bin_4000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000000]
bin_1000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 1000000]
bin_2000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 2000000]
bin_3000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 3000000]
bin_5000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 5000000]
bin_1000000.shape, bin_2000000.shape, bin_3000000.shape, bin_4000000.shape, bin_5000000.shape

((6, 92), (5, 92), (8, 92), (0, 92), (1, 92))
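The five equality filters above can be collapsed into one vectorized test. This sketch assumes, as the note below describes, that the suspicious values are the round BINs made of a borough digit followed by six zeros (1000000 through 5000000); the toy rows are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "bin": [1000000, 2113173, 3000000, 3000000, 5000000, 4074666],
    "eviction_address": ["ADDRESS A", "ADDRESS B", "ADDRESS C",
                         "ADDRESS D", "ADDRESS E", "ADDRESS F"],
})

# round BINs: a borough digit (1-5) followed by six zeros
is_placeholder = (df["bin"] % 1_000_000 == 0) & df["bin"].between(1_000_000, 5_000_000)
problematic = df[is_placeholder]
print(problematic["bin"].value_counts().sort_index().to_dict())
```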

In [None]:
# bin_1000000

In [None]:
problematic_ones = pd.concat([bin_1000000, bin_2000000, bin_3000000, bin_5000000])
problematic_ones.columns

Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
       'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
       'unitsres', 'ownername', 'bldgarea', 'building_type',
       'building_category', 'is_condo', 'floor_category', 'rent_era',
       'architectural_style', 'economic_period', 'residential_units_category',
       'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
       'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
       'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
       'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
       'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp', 'ep_asian',

In [None]:
unique_addresses = problematic_ones.eviction_address.unique()
unique_addresses, len(unique_addresses)

(array(['626 FIRST AVENUE', '411 WEST 35TH STREET', '222 EAST 44TH STREET',
        '1117 HOE AVENUE', '571 EAST 167TH ST', '1395 NELSON AVENUE',
        '1071 TINTON AVENUE', '486 EAST 165TH ST', '385 CLASSON AVENUE',
        '1157 MYRTLE AVENUE', '2178 BERGEN STREET',
        '2960 WEST 29TH STREE T', '743 LAFAYETTE AVE',
        '385-387 FRANKLIN AVE', '306 MACDOUGAL STREET',
        '460N BRIELLE AVENUE'], dtype=object),
 16)

# **Important: why do some addresses have such a high eviction count per year (and per unit)? Because they share the same placeholder BIN (a source error), and we compute the average evictions per year per building by grouping on BIN. As a result, each of these distinct addresses carries the combined evictions of every address filed under that shared BIN (here 1000000, 2000000, 3000000, and 5000000; 4000000 has no rows).**
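A toy illustration of the inflation described above: grouping by BIN alone pools distinct addresses that share a placeholder BIN, while grouping by BIN and address keeps them apart. The addresses are placeholders, not rows from this dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "bin": [3000000, 3000000, 3000000, 3029673],
    "eviction_address": ["ADDRESS A", "ADDRESS B", "ADDRESS C", "ADDRESS D"],
})

# grouping by bin alone pools three unrelated addresses under 3000000
by_bin = df.groupby("bin").size()

# grouping by (bin, address) separates them again
by_bin_address = df.groupby(["bin", "eviction_address"]).size()

print(by_bin.to_dict())
print(by_bin_address.max())
```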

## **So, are there unused BIN numbers we can assign to each distinct address that currently shares a duplicated BIN?**

In [None]:
# bins_to_occupy = list(range(4000001, 4000153))
# bins_to_occupy[-5:], type(bins_to_occupy)

In [None]:
unique_bins = bbl_evictions_svi_311.bin.unique()
len(unique_bins), min(unique_bins), max(unique_bins)

(4370, np.int64(1000000), np.int64(5165243))

In [None]:
bins_to_occupy = list(range(5171959, 5171959+152))
bins_to_occupy[-5:], type(bins_to_occupy)

([5172106, 5172107, 5172108, 5172109, 5172110], list)

In [None]:
# bins_to_occupy = [
#     [b] if np.isscalar(b) else b
#     for b in bins_to_occupy
# ]
# type(bins_to_occupy)

In [None]:
# any(bin in unique_bins for bins in bins_to_occupy for bin in bins)

In [None]:
# # is there an unused bin such as 4000001 that we can assign to an address that currently shares a duplicated bin?
# bin_4000001 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000001]
# bin_4000002 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000002]
# bin_4000003 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000003]
# bin_4000004 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000004]
# bin_4000005 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000005]
# bin_4000006 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000006]
# bin_4000007 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000007]
# bin_4000008 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000008]
# bin_4000009 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000009]
# bin_4000010 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000010]
# bin_4000011 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000011]
# bin_4000012 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000012]
# bin_4000013 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000013]
# bin_4000014 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000014]
# bin_4000015 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000015]
# bin_4000009.shape, bin_4000010.shape, bin_4000011.shape, bin_4000012.shape, bin_4000013.shape, bin_4000014.shape, bin_4000015.shape
# # so yes, there is, and therefore we can manually move those addresses to those empty slots

In [None]:
# bin_4000001.shape, bin_4000002.shape, bin_4000003.shape, bin_4000004.shape, bin_4000005.shape, bin_4000006.shape, bin_4000007.shape, bin_4000008.shape,

In [None]:
problematic_ones.shape

(20, 92)

In [None]:
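# map each unique address to a new bin so that distinct addresses no longer share one
# (20 values for 16 addresses; zip stops at the shorter one)
# caution: 5165243 is already the maximum existing bin (120 HARBOR LOOP), so the first new bin collides with it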
new_bins = range(5165243, 5165243 + 20)
# bin_mapping = (
#     problematic_ones.groupby('eviction_address')['bin']
#     .first()
#     .reset_index()
#     .assign(new_bin=new_bins)
#     .set_index('eviction_address')
#     ['new_bin']
#     .to_dict()
# )
unique_addresses = problematic_ones['eviction_address'].unique()
bin_mapping = {
    address: new_bin
    for address, new_bin in zip(unique_addresses, new_bins)
}

In [None]:
problematic_ones.shape

(20, 92)

In [None]:
bin_mapping
# mapped each address with a new unique bin

{'626 FIRST AVENUE': 5165243,
 '411 WEST 35TH STREET': 5165244,
 '222 EAST 44TH STREET': 5165245,
 '1117 HOE AVENUE': 5165246,
 '571 EAST 167TH ST': 5165247,
 '1395 NELSON AVENUE': 5165248,
 '1071 TINTON AVENUE': 5165249,
 '486 EAST 165TH ST': 5165250,
 '385 CLASSON AVENUE': 5165251,
 '1157 MYRTLE AVENUE': 5165252,
 '2178 BERGEN STREET': 5165253,
 '2960 WEST 29TH STREE T': 5165254,
 '743 LAFAYETTE AVE': 5165255,
 '385-387 FRANKLIN AVE': 5165256,
 '306 MACDOUGAL STREET': 5165257,
 '460N BRIELLE AVENUE': 5165258}
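As written, `new_bins` starts at 5165243, which is the current maximum BIN in the data (see the `max()` cell above) and is still in use; that is why 626 FIRST AVENUE ends up sharing its new BIN with 120 HARBOR LOOP in the example output further down. A sketch of a collision-safe allocation that starts one above the maximum and checks for overlap; `allocate_new_bins` is a hypothetical helper, and the toy BINs are taken from rows shown in this notebook:

```python
import pandas as pd

def allocate_new_bins(existing_bins, addresses):
    """Give each address a BIN strictly above every BIN already in use."""
    start = int(max(existing_bins)) + 1
    new_bins = range(start, start + len(addresses))
    mapping = dict(zip(addresses, new_bins))
    # guard: no new BIN may coincide with one that is already used
    overlap = set(mapping.values()) & set(existing_bins)
    if overlap:
        raise ValueError(f"new BINs collide with existing ones: {sorted(overlap)}")
    return mapping

existing = pd.Series([2113173, 3029673, 5165243])
addresses = ["626 FIRST AVENUE", "411 WEST 35TH STREET"]
mapping = allocate_new_bins(existing, addresses)
print(mapping)
```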

In [None]:
# stop

In [None]:
# bbl_evictions_svi_311['bin'] = bbl_evictions_svi_311['eviction_address'].map(bin_mapping)
# bbl_evictions_svi_311.shape, bbl_evictions_svi_311.bin.isna().sum().sum()
# 66397 - 66120
bbl_evictions_svi_311['bin'] = bbl_evictions_svi_311['bin'].where(
    ~bbl_evictions_svi_311['eviction_address'].isin(bin_mapping.keys()),
    bbl_evictions_svi_311['eviction_address'].map(bin_mapping)
)
# only rows whose address is in bin_mapping are changed; all other rows keep their original bin instead of becoming NaN

In [None]:
bbl_evictions_svi_311.shape, bbl_evictions_svi_311.bin.isna().sum().sum()
# good

((5386, 92), np.int64(0))
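The `.where` call above keeps the original BIN unless the address has a remapped one. A toy sketch contrasting it with a bare `.map`, which would turn every unmapped address into NaN, and showing an equivalent `map` + `fillna` form (the mapping values are illustrative):

```python
import pandas as pd

bins = pd.Series([1000000, 2113173, 3000000])
addresses = pd.Series(["626 FIRST AVENUE", "2541 A GRAND AVE", "385 CLASSON AVENUE"])
bin_mapping = {"626 FIRST AVENUE": 5165244, "385 CLASSON AVENUE": 5165245}

# bare map: addresses without an entry in bin_mapping become NaN
mapped_only = addresses.map(bin_mapping)

# keep the original bin wherever the address is not remapped
updated = bins.where(~addresses.isin(list(bin_mapping)), mapped_only)

# equivalent form: fill the unmapped NaNs with the original bins
updated_alt = mapped_only.fillna(bins).astype("int64")

print(mapped_only.tolist())
print(updated.tolist())
print(updated_alt.tolist())
```

The `fillna` form avoids the double negation and makes the fallback explicit.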

In [None]:
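bin_5165243 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 5165243]
bin_5165243
# an example: 120 HARBOR LOOP (Staten Island) also appears here because 5165243 was already its bin before the remap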

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
1041,302038/21_361289,1009670001,302038/21,361289,626 FIRST AVENUE,W.48B,2022-08-08,MANHATTAN,10016,Not an Ejectment,Possession,40.744898,-73.972688,6.0,4.0,8601.0,5165243,Turtle Bay-East Midtown,2022,2022-08,POINT (-73.972688 40.744898),1.4,2014.0,D8,47.0,761.0,616 FIRST AVENUE LLC,922828.0,post-war,elevator,False,high-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","2009–present, post-financial crisis",100+ units,True,mega,Q4 (largest 25%),2010-2020,10016,54369.0,0.3033,0.0718,0.8355,0.9214,0.4822,12.1,4.1,3.4,4.1,12.6,8.9,6.0,1.7,86.8,3.3,19.6,4.6,12.5,19.2,0.0,0.0,5.3,0.2,41.8,58.2,False,Q1 (Low),0.0,8.0,0.0,0.0,8.0,0.0,2.0,0.0,3.0,0.0,1.0,0.0,79.0,0.0,3.0,0.0,2.0,1.0,1.0,2.0,1.0,111.0,0.00184
4077,69710/19_358093,1009670001,69710/19,358093,626 FIRST AVENUE,E.18G,2020-01-06,MANHATTAN,10016,Not an Ejectment,Possession,40.744898,-73.972688,6.0,4.0,8601.0,5165243,Turtle Bay-East Midtown,2020,2020-01,POINT (-73.972688 40.744898),1.4,2014.0,D8,47.0,761.0,616 FIRST AVENUE LLC,922828.0,post-war,elevator,False,high-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","2009–present, post-financial crisis",100+ units,True,mega,Q4 (largest 25%),2010-2020,10016,54369.0,0.3033,0.0718,0.8355,0.9214,0.4822,12.1,4.1,3.4,4.1,12.6,8.9,6.0,1.7,86.8,3.3,19.6,4.6,12.5,19.2,0.0,0.0,5.3,0.2,41.8,58.2,False,Q1 (Low),0.0,8.0,0.0,0.0,8.0,0.0,2.0,0.0,3.0,0.0,1.0,0.0,79.0,0.0,3.0,0.0,2.0,1.0,1.0,2.0,1.0,111.0,0.00184
4333,72813/19_359649,1009670001,72813/19,359649,626 FIRST AVENUE,W25L,2020-03-13,MANHATTAN,10016,Not an Ejectment,Possession,40.744898,-73.972688,6.0,4.0,8601.0,5165243,Turtle Bay-East Midtown,2020,2020-03,POINT (-73.972688 40.744898),1.4,2014.0,D8,47.0,761.0,616 FIRST AVENUE LLC,922828.0,post-war,elevator,False,high-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","2009–present, post-financial crisis",100+ units,True,mega,Q4 (largest 25%),2010-2020,10016,54369.0,0.3033,0.0718,0.8355,0.9214,0.4822,12.1,4.1,3.4,4.1,12.6,8.9,6.0,1.7,86.8,3.3,19.6,4.6,12.5,19.2,0.0,0.0,5.3,0.2,41.8,58.2,False,Q1 (Low),0.0,8.0,0.0,0.0,8.0,0.0,2.0,0.0,3.0,0.0,1.0,0.0,79.0,0.0,3.0,0.0,2.0,1.0,1.0,2.0,1.0,111.0,0.00184
6095,R52294/19_101032,5012430020,R52294/19,101032,120 HARBOR LOOP,120A,2020-01-06,STATEN ISLAND,10303,Not an Ejectment,Possession,40.632576,-74.161431,1.0,49.0,31901.0,5165243,Mariner's Harbor-Arlington-Port Ivory-Graniteville,2020,2020-01,POINT (-74.161431 40.632576),0.2,1992.0,C9,2.0,40.0,BUSHWICK SHARP REALTY LLC,27063.0,post-war,walk-up,False,low-rise,"1970–1993, deregularization","1981–2000, Post-Modernism","1991–2008, modern economic growth",21-100 units,True,very large,Q4 (largest 25%),1990-1999,10303,27083.0,0.9209,0.8484,0.9555,0.8906,0.9375,25.3,7.9,19.3,5.3,11.8,23.9,10.5,7.9,26.5,7.9,28.5,30.0,39.0,9.8,0.0,0.0,3.4,0.4,82.6,17.4,False,Q2,0.0,0.0,3.0,0.0,3.0,6.0,0.0,2.0,4.0,0.0,5.0,0.0,11.0,0.0,0.0,6.0,0.0,0.0,2.0,6.0,10.0,58.0,0.005
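A quick sanity check that would have caught the collision in the output above: since a building lies in a single borough, any BIN that spans more than one borough points to a remapping error. A sketch using the four rows shown above (trimmed to the relevant columns) plus one unrelated row:

```python
import pandas as pd

df = pd.DataFrame({
    "bin": [5165243, 5165243, 5165243, 5165243, 2113173],
    "borough": ["MANHATTAN", "MANHATTAN", "MANHATTAN", "STATEN ISLAND", "BRONX"],
    "eviction_address": ["626 FIRST AVENUE"] * 3 + ["120 HARBOR LOOP", "2541 A GRAND AVE"],
})

# more than one borough per BIN signals that two buildings share the BIN
boroughs_per_bin = df.groupby("bin")["borough"].nunique()
suspect_bins = boroughs_per_bin[boroughs_per_bin > 1].index.tolist()
print(suspect_bins)
```

Running the same check on `bbl_evictions_svi_311` after the remap would list any BIN that still mixes buildings across boroughs.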


In [None]:
bbl_evictions_svi_311.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),0.2,2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0,0.066667
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),0.2,1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0,0.001105
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),0.6,1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.3
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),0.8,1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0,0.016667
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),0.4,1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0,0.005063


In [None]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum()
# nice

(np.int64(0), np.int64(0))

In [None]:
bbl_evictions_svi_311.sort_values('average_year_eviction_unit_count', ascending=True).head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
2513,35653/19_172385,2051410120,35653/19,172385,600 BAYCHESTER AVE,9B,2022-10-31,BRONX,10475,Not an Ejectment,Possession,40.871669,-73.833317,10.0,12.0,46201.0,2095397,Co-op City,2022,2022-10,POINT (-73.833317 40.871669),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,13.0,6.0,8.0,1.0,78.0,68.0,28.0,109.0,68.0,0.0,154.0,1.0,790.0,4.0,13.0,292.0,11.0,6.0,12.0,293.0,211.0,2166.0,1.8e-05
480,25776/19_172081,2051410120,25776/19,172081,920 BAYCHESTER AVE,3B,2020-01-15,BRONX,10475,Not an Ejectment,Possession,40.877178,-73.833625,10.0,12.0,46201.0,2128810,Co-op City,2020,2020-01,POINT (-73.833625 40.877178),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,13.0,6.0,8.0,1.0,78.0,68.0,28.0,109.0,68.0,0.0,154.0,1.0,790.0,4.0,13.0,292.0,11.0,6.0,12.0,293.0,211.0,2166.0,1.8e-05
371,20651/19_172369,2051410120,20651/19,172369,140 BENCHLEY PLACE,4K,2020-02-11,BRONX,10475,Not an Ejectment,Possession,40.874189,-73.825876,10.0,12.0,46201.0,2096755,Co-op City,2020,2020-02,POINT (-73.825876 40.874189),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,13.0,6.0,8.0,1.0,78.0,68.0,28.0,109.0,68.0,0.0,154.0,1.0,790.0,4.0,13.0,292.0,11.0,6.0,12.0,293.0,211.0,2166.0,1.8e-05
369,20642/19_172322,2051410120,20642/19,172322,120 ALDRICH STREET,10B,2020-02-11,BRONX,10475,Not an Ejectment,Possession,40.870146,-73.831665,10.0,12.0,46201.0,2128836,Co-op City,2020,2020-02,POINT (-73.831665 40.870146),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,13.0,6.0,8.0,1.0,78.0,68.0,28.0,109.0,68.0,0.0,154.0,1.0,790.0,4.0,13.0,292.0,11.0,6.0,12.0,293.0,211.0,2166.0,1.8e-05
2515,35672/19_172386,2051410120,35672/19,172386,2A ADLER PLACE,unknown,2020-02-25,BRONX,10475,Not an Ejectment,Possession,40.871475,-73.828183,10.0,12.0,46201.0,2124552,Co-op City,2020,2020-02,POINT (-73.828183 40.871475),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,13.0,6.0,8.0,1.0,78.0,68.0,28.0,109.0,68.0,0.0,154.0,1.0,790.0,4.0,13.0,292.0,11.0,6.0,12.0,293.0,211.0,2166.0,1.8e-05


In [None]:
bbl_evictions_svi_311.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,5386.0
mean,0.30609
std,0.205902
min,0.2
25%,0.2
50%,0.2
75%,0.4
max,2.6


In [None]:
# drop the existing column; it is recomputed per building (bin) below
bbl_evictions_svi_311.drop(columns=['average_year_eviction_count'], inplace=True)

In [None]:
# bbl_evictions_svi_311.drop(columns=['average_year_eviction_count_x'], inplace=True)
# bbl_evictions_svi_311.rename(columns={'average_year_eviction_count'})

In [None]:
# count evictions per building (BIN), then spot-check BIN 5165243
evictions_per_building = bbl_evictions_svi_311.groupby('bin').size().reset_index(name='total_evictions')
bin_5165243 = evictions_per_building[evictions_per_building['bin'] == 5165243]
bin_5165243

Unnamed: 0,bin,total_evictions
4365,5165243,4
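The rows displayed earlier show one BBL (2051410120, Co-op City) spread across several BINs (2095397, 2128810, 2096755, 2128836, 2124552), so counting evictions per `bbl` and per `bin` can give different totals. One possible reason the recomputed per-building average differs from the earlier column (max 2.6 in the `describe()` output above) is that the earlier column was built per BBL; that is an assumption, since its construction is not shown in this notebook. The toy frame below uses made-up `bbl`/`bin` values to illustrate the difference.

```python
import pandas as pd

# toy data (made-up ids): one tax lot, bbl 100, containing two buildings, bins 11 and 12
toy = pd.DataFrame({
    "bbl": [100, 100, 100, 200],
    "bin": [11, 11, 12, 21],
})

per_bbl = toy.groupby("bbl").size()  # evictions per tax lot
per_bin = toy.groupby("bin").size()  # evictions per building

print(per_bbl.to_dict())
print(per_bin.to_dict())
```

Lot 100 gets 3 evictions when grouped by `bbl`, but its two buildings get 2 and 1 when grouped by `bin`.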


In [None]:
# average evictions per year for each building, dividing by a 5-year window
# (note: the NTA-level average further down divides by num_years = 3)
num_years_bin = 5
evictions_per_building['average_year_eviction_count'] = evictions_per_building['total_evictions'] / num_years_bin
bbl_evictions_svi_311 = bbl_evictions_svi_311.merge(evictions_per_building[['bin', 'average_year_eviction_count']], on='bin', how='left')
bbl_evictions_svi_311.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_count
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0,0.066667,0.2
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0,0.001105,0.2
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.3,0.6
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0,0.016667,0.8
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0,0.005063,0.4
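The groupby, reset_index, and merge steps above can also be collapsed into one assignment that maps each row's `bin` to its building's eviction count. This avoids a second merge and the `_x`/`_y` column suffixes that the commented-out `average_year_eviction_count_x` line hints at. A minimal sketch on a toy frame; `NUM_YEARS` is a hypothetical constant mirroring the divisor of 5 used above.

```python
import pandas as pd

NUM_YEARS = 5  # hypothetical constant; same divisor as the cell above

# toy frame (made-up bins): one row per eviction
toy = pd.DataFrame({"bin": [11, 11, 12, 21, 11]})

# value_counts() gives evictions per bin; map() broadcasts that count back to
# every row of the same bin, so no separate merge is needed
toy["average_year_eviction_count"] = toy["bin"].map(toy["bin"].value_counts()) / NUM_YEARS
print(toy)
```

Because the result is written straight onto the original frame, re-running the cell overwrites the column instead of adding a suffixed duplicate.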


In [None]:
bbl_evictions_svi_311.sort_values('average_year_eviction_count', ascending=False)[['average_year_eviction_count', 'unitsres']].head()

Unnamed: 0,average_year_eviction_count,unitsres
3963,1.2,360.0
2936,1.2,1654.0
2937,1.2,1.0
1788,1.2,385.0
2893,1.2,68.0


In [None]:
bbl_evictions_svi_311.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,5386.0
mean,0.300854
std,0.180794
min,0.2
25%,0.2
50%,0.2
75%,0.4
max,1.2


# **average_year_eviction_unit_count**

In [None]:
# recompute evictions per residential unit per year from the per-building average
bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
bbl_evictions_svi_311[['bin', 'average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False)

Unnamed: 0,bin,average_year_eviction_unit_count
2939,3033141,1.200000
2942,3033141,1.200000
719,5112952,1.200000
675,5112952,1.200000
5338,5112952,1.200000
...,...,...
502,2095392,0.000018
499,2095387,0.000018
343,2096755,0.000018
2948,2096810,0.000018
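The top values (1.2) come from buildings with a single residential unit (see the `unitsres` of 1.0 in the sorted output above). The division also has no guard against `unitsres == 0`, which would yield `inf` instead of a missing value. Whether any such rows exist in this dataset is not shown here. A minimal sketch of a guarded division on a toy frame, assuming we prefer NaN (which `isna()` catches) over `inf`:

```python
import numpy as np
import pandas as pd

# toy frame (made-up values)
toy = pd.DataFrame({
    "average_year_eviction_count": [0.2, 0.4, 1.2],
    "unitsres": [3.0, 0.0, 1.0],
})

# treat zero unit counts as unknown so the ratio becomes NaN rather than inf
units = toy["unitsres"].replace(0, np.nan)
toy["average_year_eviction_unit_count"] = toy["average_year_eviction_count"] / units
print(toy)
```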


In [None]:
bbl_evictions_svi_311[['bin', 'average_year_eviction_unit_count']]

Unnamed: 0,bin,average_year_eviction_unit_count
0,2113173,0.066667
1,4074666,0.001105
2,3117969,0.300000
3,3029673,0.016667
4,4079390,0.005063
...,...,...
5381,5108502,0.001322
5382,5108502,0.001322
5383,5063024,0.012500
5384,5108650,0.000833


In [None]:
bbl_evictions_svi_311.columns

Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
       'yearbuilt', 'bldgclass', 'numfloors', 'unitsres', 'ownername',
       'bldgarea', 'building_type', 'building_category', 'is_condo',
       'floor_category', 'rent_era', 'architectural_style', 'economic_period',
       'residential_units_category', 'is_llc', 'building_size_category',
       'size_quartile', 'decade', 'fips', 'e_totpop', 'rpl_theme1',
       'rpl_theme2', 'rpl_theme3', 'rpl_theme4', 'rpl_themes', 'ep_pov150',
       'ep_unemp', 'ep_nohsdp', 'ep_uninsur', 'ep_age65', 'ep_age17',
       'ep_disabl', 'ep_limeng', 'ep_noveh', 'ep_crowd', 'ep_hburd', 'ep_afam',
       'ep_hisp', 'ep_asian', 'ep_aian', 'ep_nhpi', 'ep_twom

# **average_year_eviction_nta_count**

In [None]:
evictions_per_nta = bbl_evictions_svi_311['nta'].value_counts().reset_index()
evictions_per_nta.columns = ['nta', 'total_evictions']
evictions_per_nta

Unnamed: 0,nta,total_evictions
0,Crown Heights North,138
1,Flatbush,114
2,Prospect Lefferts Gardens-Wingate,114
3,Williamsbridge-Olinville,109
4,Bedford Park-Fordham North,104
...,...,...
180,Stuyvesant Town-Cooper Village,2
181,Lindenwood-Howard Beach,1
182,Glen Oaks-Floral Park-New Hyde Park,1
183,Auburndale,1


In [None]:
evictions_per_nta.sort_values('total_evictions', ascending=False).head()

Unnamed: 0,nta,total_evictions
0,Crown Heights North,138
1,Flatbush,114
2,Prospect Lefferts Gardens-Wingate,114
3,Williamsbridge-Olinville,109
4,Bedford Park-Fordham North,104


In [None]:
population_per_nta = bbl_evictions_svi.drop_duplicates('nta')[['nta', 'e_totpop']]
population_per_nta.shape

(186, 2)
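`drop_duplicates('nta')` keeps the `e_totpop` of whichever row comes first for each NTA. In the rows shown above, `fips` equals `zipcode`, which suggests the SVI population is ZIP-level rather than NTA-level. If so, `e_totpop` can differ between rows of the same NTA. The population column is dropped from `nta_rates` a few cells later, so this only matters if a per-capita rate is added. A quick check, sketched on a toy frame with made-up values:

```python
import pandas as pd

# toy frame (made-up populations)
toy = pd.DataFrame({
    "nta": ["Flatbush", "Flatbush", "Erasmus"],
    "e_totpop": [1000.0, 2000.0, 3000.0],
})

# distinct population values per NTA; more than 1 means drop_duplicates("nta")
# silently picks the first row's value
distinct_pop = toy.groupby("nta")["e_totpop"].nunique()
inconsistent = distinct_pop[distinct_pop > 1]
print(inconsistent)
```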

In [None]:
nta_rates = pd.merge(
    evictions_per_nta,
    population_per_nta,
    on='nta',
    how='left'
)
nta_rates

Unnamed: 0,nta,total_evictions,e_totpop
0,Crown Heights North,138,83125.0
1,Flatbush,114,90245.0
2,Prospect Lefferts Gardens-Wingate,114,58476.0
3,Williamsbridge-Olinville,109,98713.0
4,Bedford Park-Fordham North,104,82678.0
...,...,...,...
180,Stuyvesant Town-Cooper Village,2,32410.0
181,Lindenwood-Howard Beach,1,30374.0
182,Glen Oaks-Floral Park-New Hyde Park,1,19733.0
183,Auburndale,1,46424.0


In [None]:
num_years = 3
nta_rates['average_year_eviction_nta_count'] = (
    nta_rates['total_evictions'] / num_years
)
nta_rates = nta_rates[['nta', 'average_year_eviction_nta_count']]
nta_rates

Unnamed: 0,nta,average_year_eviction_nta_count
0,Crown Heights North,46.000000
1,Flatbush,38.000000
2,Prospect Lefferts Gardens-Wingate,38.000000
3,Williamsbridge-Olinville,36.333333
4,Bedford Park-Fordham North,34.666667
...,...,...
180,Stuyvesant Town-Cooper Village,0.666667
181,Lindenwood-Howard Beach,0.333333
182,Glen Oaks-Floral Park-New Hyde Park,0.333333
183,Auburndale,0.333333
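Here `num_years` is hard-coded to 3, while the per-building average above divides by 5. One way to keep the two consistent is to derive the window length from the data itself. A minimal sketch, assuming every calendar year between the first and last `year` in the data belongs to the study window:

```python
import pandas as pd

# toy frame (made-up rows); the real data has a 'year' column
toy = pd.DataFrame({"year": [2020, 2020, 2021, 2022, 2022]})

# inclusive count of calendar years spanned by the data
num_years = int(toy["year"].max() - toy["year"].min() + 1)
print(num_years)  # 3
```

On the real frame this would be `bbl_evictions_svi_311['year']`. Partial first or last years would still be counted as full years under this assumption.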


In [None]:
bbl_evictions_svi_311 = pd.merge(
    bbl_evictions_svi_311,
    nta_rates,
    on='nta',
    how='left'
)

In [None]:
bbl_evictions_svi_311[['nta', 'average_year_eviction_nta_count']].sort_values('average_year_eviction_nta_count', ascending=False)

Unnamed: 0,nta,average_year_eviction_nta_count
674,Crown Heights North,46.000000
595,Crown Heights North,46.000000
3196,Crown Heights North,46.000000
4216,Crown Heights North,46.000000
4219,Crown Heights North,46.000000
...,...,...
2515,park-cemetery-etc-Bronx,0.666667
2995,Glen Oaks-Floral Park-New Hyde Park,0.333333
3074,Lindenwood-Howard Beach,0.333333
3370,Auburndale,0.333333


In [None]:
bbl_evictions_svi_311.shape
# expected: 93 columns after adding the per-building and NTA-level averages

(5386, 93)

# **Step 4: Save the final bbl_evictions_svi_311 dataset to Google Drive for later use.**

### This is the fully cleaned, merged DataFrame: it has no NaNs or duplicate rows and is ready for analysis.

In [None]:
bbl_evictions_svi_311.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_count,average_year_eviction_nta_count
0,004123/20_209969,2032140141,004123/20,209969,2541 A GRAND AVE,ROOM 3B,2022-08-22,BRONX,10468,Not an Ejectment,Possession,40.865396,-73.901317,7.0,14.0,265.0,2113173,Kingsbridge Heights,2022,2022-08,POINT (-73.901317 40.865396),2004.0,C0,3.0,3.0,MONJU SARKER,3420.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",3-5 units,False,medium-small,Q4 (largest 25%),2000-2009,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,0.0,0.0,0.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,1.0,14.0,0.066667,0.2,18.666667
1,0050153/20_106030,4031560133,0050153/20,106030,98-05 67TH AVENUE,12F,2022-04-14,QUEENS,11375,Not an Ejectment,Possession,40.724241,-73.855552,6.0,29.0,71306.0,4074666,Forest Hills,2022,2022-04,POINT (-73.855552 40.724241),1960.0,D3,13.0,181.0,MARSEILLES LEASING LIMITED PARTNERSHIP,177710.0,post-war,elevator,False,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,11375,75212.0,0.4759,0.5698,0.8789,0.8057,0.7322,12.0,4.8,6.1,3.7,20.4,18.0,10.5,7.9,41.9,5.8,25.4,2.7,16.4,28.5,0.1,0.0,4.6,0.7,53.0,47.0,False,Q1 (Low),0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,62.0,0.0,34.0,0.0,0.0,4.0,1.0,0.0,0.0,2.0,5.0,112.0,0.001105,0.2,6.0
2,0052002/19_101926,3051370021,0052002/19,101926,199 VERONICA PLACE,1ST FLOOR,2020-03-02,BROOKLYN,11226,Not an Ejectment,Possession,40.645404,-73.952578,17.0,40.0,792.0,3117969,Erasmus,2020,2020-03,POINT (-73.952578 40.645404),1920.0,B3,2.0,2.0,"AANS, LLC.",1496.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q2 (25-50%),1920-1929,11226,101053.0,0.93,0.4536,0.9639,0.9692,0.922,23.7,5.9,13.9,9.1,13.1,18.7,6.7,5.6,66.1,10.0,39.2,63.2,14.9,3.2,0.3,0.0,4.1,0.7,86.3,13.7,False,Q2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.3,0.6,23.0
3,0057757/18_100889,3011850034,0057757/18,100889,302 EASTERN PARKWAY,4B,2020-02-03,BROOKLYN,11225,Not an Ejectment,Possession,40.670832,-73.958843,9.0,35.0,213.0,3029673,Crown Heights South,2020,2020-02,POINT (-73.958843 40.670832),1923.0,D1,6.0,48.0,302 EASTERN CORP,42984.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,False,very large,Q4 (largest 25%),1920-1929,11225,58476.0,0.8905,0.3157,0.933,0.8342,0.8538,23.1,6.6,11.5,5.9,15.3,16.7,9.6,2.2,66.2,6.9,37.3,53.7,10.8,3.3,0.0,0.0,3.9,0.9,72.6,27.4,False,Q1 (Low),0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,12.0,0.0,17.0,0.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,45.0,0.016667,0.8,23.0
4,0061902/19_117253,4033220043,0061902/19,117253,83-33 118TH STREET,5N,2020-02-14,QUEENS,11415,Not an Ejectment,Possession,40.706235,-73.834603,9.0,29.0,134.0,4079390,Kew Gardens,2020,2020-02,POINT (-73.834603 40.706235),1979.0,D1,6.0,79.0,CIAMPA METROPOLITAN CO,72147.0,post-war,elevator,False,mid-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",21-100 units,False,very large,Q4 (largest 25%),1970-1979,11415,20315.0,0.7661,0.5573,0.898,0.9396,0.8761,14.6,5.6,11.8,4.7,17.0,18.0,10.9,7.5,44.3,8.5,32.3,6.7,22.9,22.3,0.2,0.0,3.4,2.1,57.7,42.3,False,Q1 (Low),0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,9.0,0.0,19.0,0.0,1.0,0.0,0.0,0.0,0.0,6.0,1.0,38.0,0.005063,0.4,6.0


In [None]:
bbl_evictions_svi_311.to_csv('/content/drive/My Drive/X999/bbl_evictions_311_svi_covid.csv', index=False)
# the file is a manageable size and keeps all the columns needed for analysis
# rows with NaNs were dropped earlier; keeping them would only help if we later need to retrieve those records
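After saving, a quick round-trip read confirms the CSV comes back with the same shape and values. The sketch below writes a toy frame to a temporary directory instead of the Drive path, so it runs anywhere. On the real file, the equivalent check would compare `pd.read_csv(...)` against `bbl_evictions_svi_311`.

```python
import os
import tempfile

import pandas as pd

# toy frame using a few values from the rows shown above
toy = pd.DataFrame({
    "bbl": [2032140141, 4031560133],
    "nta": ["Kingsbridge Heights", "Forest Hills"],
    "average_year_eviction_count": [0.2, 0.2],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.csv")
    toy.to_csv(path, index=False)  # same call pattern as the save above
    reloaded = pd.read_csv(path)

# same shape and identical values after writing and reading back
print(reloaded.shape == toy.shape)
print(reloaded.equals(toy))
```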