# **Introduction**

In this notebook, we perform the final merge of all relevant data: BBL, evictions, SVI scores, and 311 complaints. We first load the already merged bbl_evictions_svi dataset and drop its NaNs for analysis (the previous version kept the NaNs for retrieval purposes, in case we find them necessary later). We then combine the 311 complaints from normal times (2017-2019 and 2023-2024) and clean their NaNs. Next, we group the complaint data by BBL and complaint category and reshape the counts into a wide pivot table. Finally, we merge the pivot table with the bbl_evictions_svi df to arrive at the final merged and cleaned df.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import datetime as dt
import matplotlib
import matplotlib.pyplot as plt
import os
import io
import geopandas as gpd
import seaborn as sns

# suppress warning
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)
# display all columns

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# **Step 1: get the bbl evictions svi merged data**

In [4]:
file_path1 = '/content/drive/My Drive/X999/merged_df_clean_normal_times.csv'

In [5]:
bbl_evictions_svi = pd.read_csv(file_path1)

In [6]:
bbl_evictions_svi.shape

(75010, 70)

In [12]:
bbl_evictions_svi[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False).head()

Unnamed: 0,bin,average_year_eviction_count
67828,3000000,35.6
11805,3000000,35.6
67810,3000000,35.6
69612,3000000,35.6
69611,3000000,35.6


In [7]:
nan_counts = bbl_evictions_svi.isna().sum()
columns_with_nans = nan_counts[nan_counts > 0]
columns_with_nans

Unnamed: 0,0
yearbuilt,4128
bldgclass,4128
numfloors,4128
unitsres,4128
ownername,4128
bldgarea,4128
building_type,4128
building_category,4128
is_condo,4128
floor_category,4128


In [8]:
nan_percentage = (bbl_evictions_svi.isna().sum() / len(bbl_evictions_svi)) * 100
nan_percentage = nan_percentage[nan_percentage > 0]
nan_percentage = nan_percentage.sort_values(ascending=False)
nan_percentage

Unnamed: 0,0
yearbuilt,5.503266
bldgclass,5.503266
numfloors,5.503266
unitsres,5.503266
ownername,5.503266
bldgarea,5.503266
building_type,5.503266
building_category,5.503266
is_condo,5.503266
floor_category,5.503266


## **There is not much we can do with these NaN values, as they cannot be imputed with high confidence. They occurred because these BBLs in the eviction dataset had no match in the BBL dataset. For retrieval purposes alone, we could keep the NaNs, but for any other analysis (what we mainly care about here), we remove them.**
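The identical NaN count in all ten columns suggests that the missing values sit in the same rows, which is what unmatched BBLs would produce. A minimal sketch of how that could be checked before dropping, on a hypothetical toy frame (`toy` and its values are made up; only the column names come from the notebook):

```python
import numpy as np
import pandas as pd

# Toy stand-in for bbl_evictions_svi; the values are hypothetical.
toy = pd.DataFrame({
    "bbl": [1, 2, 3, 4],
    "yearbuilt": [1930.0, np.nan, 1967.0, np.nan],
    "bldgclass": ["C0", np.nan, "D4", np.nan],
    "numfloors": [3.0, np.nan, 21.0, np.nan],
})

# Columns that contain at least one NaN.
nan_cols = toy.columns[toy.isna().any().to_numpy()]

# Rows with a NaN in any of those columns vs. rows with NaNs in all of them.
rows_any = toy[nan_cols].isna().any(axis=1)
rows_all = toy[nan_cols].isna().all(axis=1)

# True when every affected row is missing all of the columns at once,
# i.e. the NaNs come from whole unmatched records, not scattered gaps.
same_rows = bool((rows_any == rows_all).all())

# Drop only the rows affected by those columns.
toy_clean = toy.dropna(subset=list(nan_cols))
```

If `same_rows` came out `False` on the real data, the NaNs would be scattered rather than caused by unmatched BBLs alone, and a blanket `dropna()` would deserve a second look.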

In [9]:
bbl_evictions_svi = bbl_evictions_svi.dropna()
bbl_evictions_svi.shape, 75010 - 4128, f'{4128 / 75010*100:.2f} % removed'

((70882, 70), 70882, '5.50 % removed')

In [10]:
bbl_evictions_svi.isna().sum().sum()

np.int64(0)

In [11]:
bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

# **Step 2: get the combined 311 complaints data**

In [15]:
saved_2017 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2017_reduced.csv"
saved_2018 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2018_reduced.csv"
saved_2019 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2019_reduced.csv"
saved_2020 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2020_reduced.csv"
saved_2021 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2021_reduced.csv"
saved_2022 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2022_reduced.csv"
saved_2023 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2023_reduced.csv"
saved_2024 = "/content/drive/My Drive/X999/311_different_years/filtered_df_2024_reduced.csv"

In [16]:
df_2017 = pd.read_csv(saved_2017)
df_2018 = pd.read_csv(saved_2018)
df_2019 = pd.read_csv(saved_2019)
df_2020 = pd.read_csv(saved_2020)
df_2021 = pd.read_csv(saved_2021)
df_2022 = pd.read_csv(saved_2022)
df_2023 = pd.read_csv(saved_2023)
df_2024 = pd.read_csv(saved_2024)

In [17]:
normal_times_311_df = pd.concat([df_2017, df_2018, df_2019, df_2023, df_2024])

In [18]:
normal_times_311_df.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude
0,38070156,2017-12-31 23:59:35,2018-01-04 19:27:02,HEAT/HOT WATER,10030.0,181 WEST 135 STREET,1019200000.0,MANHATTAN,40.815127,-73.943252
1,38067146,2017-12-31 23:59:34,2018-01-01 00:57:19,Noise - Residential,10035.0,2048 MADISON AVENUE,1017540000.0,MANHATTAN,40.808655,-73.938532
2,38066214,2017-12-31 23:59:15,2018-01-01 02:48:23,Noise - Residential,10466.0,1902 NEREID AVENUE,2050540000.0,BRONX,40.8987,-73.848528
3,38067041,2017-12-31 23:58:38,2018-01-01 02:53:28,Noise - Street/Sidewalk,11230.0,1201 AVENUE H,3066870000.0,BROOKLYN,40.629675,-73.964939
4,38068229,2017-12-31 23:58:33,2018-01-08 13:30:58,HEAT/HOT WATER,11226.0,70 LINDEN BOULEVARD,3050860000.0,BROOKLYN,40.652289,-73.956328


In [19]:
normal_times_311_df.columns, normal_times_311_df.shape

(Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
        'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
        'longitude'],
       dtype='object'),
 (6036232, 10))

In [20]:
normal_times_311_df.bbl = normal_times_311_df.bbl.astype('int64')

In [21]:
normal_times_311_df.isna().sum().sum(), normal_times_311_df.duplicated().sum()

(np.int64(2028), np.int64(0))

In [22]:
normal_times_311_df.isna().sum()

Unnamed: 0,0
unique_key,0
created_date,0
closed_date,0
complaint_type,0
incident_zip,633
incident_address,223
bbl,0
borough,0
latitude,586
longitude,586


## **In this case, it makes sense to fill the NaNs with the string 'unknown' or the integer 0, depending on the column. These columns are not essential: once the 311 data is merged with the bbl_evictions_svi data, the corresponding columns from the main table are used instead. We can drop these columns afterwards if they cause problems.**
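As an illustration of the per-column fill described above, `DataFrame.fillna` also accepts a dictionary mapping column names to fill values, so all four columns can be handled in one call. This is a sketch on a hypothetical two-row frame (`toy_311` and `fill_values` are illustrative names), not the cells used in this notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for normal_times_311_df; the second row is fully missing.
toy_311 = pd.DataFrame({
    "incident_zip": [10030.0, np.nan],
    "incident_address": ["181 WEST 135 STREET", np.nan],
    "latitude": [40.815127, np.nan],
    "longitude": [-73.943252, np.nan],
})

# One placeholder per column: a string for text, 0 for numeric columns.
fill_values = {
    "incident_address": "unknown",
    "incident_zip": 0,
    "latitude": 0,
    "longitude": 0,
}

toy_filled = toy_311.fillna(value=fill_values)
remaining_nans = int(toy_filled.isna().sum().sum())
```

The 0 values are placeholders, not real zip codes or coordinates, so the filled columns should not be used for mapping or distance calculations.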

In [23]:
normal_times_311_df['incident_address'] = normal_times_311_df['incident_address'].fillna('unknown')

In [24]:
# other_columns = ['incident_zip', 'latitude', 'longitude']
normal_times_311_df['incident_zip'] = normal_times_311_df['incident_zip'].fillna(0)
normal_times_311_df['latitude'] = normal_times_311_df['latitude'].fillna(0)
normal_times_311_df['longitude'] = normal_times_311_df['longitude'].fillna(0)

In [25]:
normal_times_311_df.shape, normal_times_311_df.isna().sum().sum(), normal_times_311_df.duplicated().sum()

((6036232, 10), np.int64(0), np.int64(0))

# **Step 3: merge bbl_evictions_svi with 311 complaints data**

### It turns out we do need a **pivot table**, but we need to group by BBL first to make the merge more seamless. Doing so also lets us sidestep the NaN issues from the steps above, because the grouped table leaves out the columns that had the troubled data.

In [26]:
bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
        'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
        'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
        'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp

In [27]:
normal_times_311_df.columns, normal_times_311_df.shape, bbl_evictions_svi.columns, bbl_evictions_svi.shape

(Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
        'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
        'longitude'],
       dtype='object'),
 (6036232, 10),
 Index(['primary_key', 'court_index_number', 'docket_number',
        'eviction_address', 'eviction_apartment_number', 'executed_date',
        'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
        'latitude', 'longitude', 'community_board', 'council_district',
        'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
        'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
        'unitsres', 'ownername', 'bldgarea', 'building_type',
        'building_category', 'is_condo', 'floor_category', 'rent_era',
        'architectural_style', 'economic_period', 'residential_units_category',
        'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
        'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'r

In [28]:
court_bbl_map = bbl_evictions_svi[['primary_key', 'bbl']].drop_duplicates()
court_bbl_map.shape
# there are actually no duplicates, 70882, good

(70882, 2)

In [29]:
def categorize_complaint(complaint_type):
    complaint = complaint_type.lower().strip()

    # building systems and utilities stuff
    if 'heat' in complaint or 'hot water' in complaint:
        return 'heat_hot_water'
    elif any(term in complaint for term in ['water leak', 'plumbing', 'sewage']):
        return 'plumbing_issues'
    elif 'electric' in complaint:
        return 'electrical_issues'
    elif 'elevator' in complaint:
        return 'elevator_issues'

    # building structure and maintenance
    elif 'door' in complaint or 'window' in complaint:
        return 'doors_windows'
    elif any(term in complaint for term in ['paint', 'plaster', 'mold']):
        return 'walls_ceilings'
    elif 'floor' in complaint or 'stair' in complaint:
        return 'floors_stairs'
    elif 'outside building' in complaint:
        return 'building_exterior'
    elif 'appliance' in complaint:
        return 'appliances'

    # health and environmental impact
    elif 'unsanitary' in complaint or 'condition' in complaint:
        return 'sanitation_issues'
    elif any(pest in complaint for pest in ['rodent', 'mosquito', 'bee', 'wasp', 'pigeon']):
        return 'pest_issues'
    elif 'air' in complaint or 'asbestos' in complaint or 'smoking' in complaint:
        return 'air_quality'

    # noise (all noise complaints together)
    elif 'noise' in complaint:
        return 'noise_complaints'

    # public space influences and nuances
    elif 'homeless' in complaint or 'encampment' in complaint:
        return 'homeless_issues'
    elif 'graffiti' in complaint or 'advertisement' in complaint:
        return 'graffiti_posting'
    elif any(nuisance in complaint for nuisance in ['disorderly', 'panhandling', 'drinking', 'urinating', 'fireworks']):
        return 'public_nuisance'

    # living safety and services
    elif 'safety' in complaint:
        return 'safety_concerns'
    elif 'animal' in complaint or 'abuse' in complaint:
        return 'animal_issues'
    elif 'police' in complaint:
        return 'police_matters'

    # miscellaneous
    elif 'general' in complaint:
        return 'general_complaints'
    else:
        return 'other_issues'

## **We replace the raw complaint types with categories to reduce the number of columns needed in the merged table. First, we regroup the complaint types and count the complaints in each category for each BBL. Then we use a pivot table to turn the category names into columns holding those counts. Finally, we merge with bbl_evictions_svi using the categories as columns, so that the count of each type of complaint associated with each BBL is preserved, and the table is smaller (than if we had not categorized) and easier to merge.**
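The first two steps (categorize, then count per BBL and category) can be sketched on a few made-up rows. `toy_categorize` is a hypothetical, heavily reduced stand-in for `categorize_complaint` with only three outcomes, and the rows in `toy_311` are invented for illustration:

```python
import pandas as pd

def toy_categorize(complaint_type: str) -> str:
    """Reduced, hypothetical version of categorize_complaint."""
    complaint = complaint_type.lower().strip()
    if "heat" in complaint or "hot water" in complaint:
        return "heat_hot_water"
    if "noise" in complaint:
        return "noise_complaints"
    return "other_issues"

# Five made-up complaints spread over two BBLs.
toy_311 = pd.DataFrame({
    "bbl": [1019200007, 1019200007, 1017540155, 1017540155, 1017540155],
    "complaint_type": [
        "HEAT/HOT WATER",
        "Noise - Residential",
        "Noise - Residential",
        "Noise - Street/Sidewalk",
        "Illegal Parking",
    ],
})

# Step 1: map each raw complaint type to its category.
toy_311["complaint_category"] = toy_311["complaint_type"].apply(toy_categorize)

# Step 2: count complaints per (bbl, category) pair, in long format.
toy_counts = (
    toy_311.groupby(["bbl", "complaint_category"])
    .size()
    .reset_index(name="count")
)
```

Both noise subtypes for BBL 1017540155 fall into a single `noise_complaints` row, which is the column reduction the categories are meant to achieve.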

In [30]:
normal_times_311_df['complaint_category'] = normal_times_311_df['complaint_type'].apply(categorize_complaint)

In [31]:
normal_times_311_df.shape
# the new column labels each complaint with its category. Later we will pivot to the wide form, so that
# each value in this column becomes its own column in the pivot table

(6036232, 11)

In [32]:
normal_times_311_df.head()

Unnamed: 0,unique_key,created_date,closed_date,complaint_type,incident_zip,incident_address,bbl,borough,latitude,longitude,complaint_category
0,38070156,2017-12-31 23:59:35,2018-01-04 19:27:02,HEAT/HOT WATER,10030.0,181 WEST 135 STREET,1019200007,MANHATTAN,40.815127,-73.943252,heat_hot_water
1,38067146,2017-12-31 23:59:34,2018-01-01 00:57:19,Noise - Residential,10035.0,2048 MADISON AVENUE,1017540155,MANHATTAN,40.808655,-73.938532,noise_complaints
2,38066214,2017-12-31 23:59:15,2018-01-01 02:48:23,Noise - Residential,10466.0,1902 NEREID AVENUE,2050540041,BRONX,40.8987,-73.848528,noise_complaints
3,38067041,2017-12-31 23:58:38,2018-01-01 02:53:28,Noise - Street/Sidewalk,11230.0,1201 AVENUE H,3066870049,BROOKLYN,40.629675,-73.964939,noise_complaints
4,38068229,2017-12-31 23:58:33,2018-01-08 13:30:58,HEAT/HOT WATER,11226.0,70 LINDEN BOULEVARD,3050860041,BROOKLYN,40.652289,-73.956328,heat_hot_water


In [33]:
normal_times_311_df.isna().sum().sum(), normal_times_311_df.duplicated().sum()

(np.int64(0), np.int64(0))

In [34]:
# normal_times_311_df.drop_duplicates()

In [35]:
normal_times_311_df.shape
# no duplicates

(6036232, 11)

In [36]:
normal_times_311_df.columns

Index(['unique_key', 'created_date', 'closed_date', 'complaint_type',
       'incident_zip', 'incident_address', 'bbl', 'borough', 'latitude',
       'longitude', 'complaint_category'],
      dtype='object')

In [37]:
bbl_evictions_svi.columns

Index(['primary_key', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'bbl', 'nta', 'year', 'month_year', 'geometry',
       'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
       'unitsres', 'ownername', 'bldgarea', 'building_type',
       'building_category', 'is_condo', 'floor_category', 'rent_era',
       'architectural_style', 'economic_period', 'residential_units_category',
       'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
       'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
       'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
       'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
       'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp', 'ep_asian',

In [38]:
bbl_evictions_svi.bbl.dtype

dtype('int64')

In [39]:
# count each category for each bbl
# group the complaints by bbl and categories and then count them
bbl_category_counts = normal_times_311_df.groupby(['bbl', 'complaint_category']).size().reset_index(name='count')

In [40]:
# complaints_pivot = bbl_category_counts.unstack(fill_value=0)

In [41]:
bbl_category_counts.bbl = bbl_category_counts.bbl.astype('int64')

In [42]:
bbl_category_counts

Unnamed: 0,bbl,complaint_category,count
0,0,animal_issues,2
1,0,appliances,27
2,0,doors_windows,54
3,0,electrical_issues,22
4,0,elevator_issues,56
...,...,...,...
872716,5200429999,noise_complaints,5
872717,5270000501,plumbing_issues,2
872718,5270000508,plumbing_issues,2
872719,5270000511,noise_complaints,1


## **A pivot table transformation is necessary here, because we want this table in a "wide" format so that:**

- each row represents a single bbl
- each complaint category becomes its own column
- the values show the count for each category
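The three properties listed above can be seen on a tiny long-format table. This is a sketch with made-up counts (`toy_counts` and `toy_wide` are illustrative names); the cast back to integers is an optional addition that the notebook itself does not apply:

```python
import pandas as pd

# Long format: one row per (bbl, category) pair that actually occurred.
toy_counts = pd.DataFrame({
    "bbl": [1, 1, 2],
    "complaint_category": ["heat_hot_water", "noise_complaints", "noise_complaints"],
    "count": [3, 1, 5],
})

# Wide format: one row per bbl, one column per category.
toy_wide = (
    toy_counts.pivot(index="bbl", columns="complaint_category", values="count")
    .fillna(0)        # categories a bbl never had become 0 instead of NaN
    .astype("int64")  # optional: fillna leaves floats, counts read better as ints
    .reset_index()
)

# Remove the leftover "complaint_category" label on the column axis.
toy_wide.columns.name = None
```

BBL 2 never had a `heat_hot_water` complaint, so that cell is filled with 0 rather than left missing.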

In [43]:
# pivot to a wide format, with the complaint categories as columns
bbl_complaints_wide = bbl_category_counts.pivot(
    index='bbl',
    columns='complaint_category',
    values='count'
).fillna(0).reset_index()

In [44]:
bbl_complaints_wide.isna().sum().sum(), bbl_complaints_wide.duplicated().sum()

(np.int64(0), np.int64(0))

In [45]:
bbl_evictions_svi.bbl.nunique(), normal_times_311_df.bbl.nunique(), bbl_complaints_wide.bbl.nunique(), bbl_complaints_wide.shape

(31815, 342961, 342961, (342961, 22))

In [46]:
bbl_complaints_wide
# correct shape, (342961, 22)

complaint_category,bbl,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings
0,0,0.0,2.0,27.0,0.0,54.0,22.0,56.0,18.0,39.0,1.0,241.0,4.0,431.0,0.0,45.0,170.0,6.0,1.0,89.0,72.0,57.0
1,144969020,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,1000010010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,22.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
3,1000010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1000020001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
342956,5200429999,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
342957,5270000501,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0
342958,5270000508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0
342959,5270000511,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
len(bbl_complaints_wide.columns) - 1

21

In [48]:
all_categories = [
    'heat_hot_water', 'plumbing_issues', 'electrical_issues', 'elevator_issues',
    'doors_windows', 'walls_ceilings', 'floors_stairs', 'building_exterior',
    'appliances', 'sanitation_issues', 'pest_issues', 'air_quality',
    'noise_complaints', 'homeless_issues', 'graffiti_posting', 'public_nuisance',
    'safety_concerns', 'animal_issues', 'police_matters', 'general_complaints',
    'other_issues'
]
# complete
len(all_categories)

21

In [49]:
# add a total column
bbl_complaints_wide['total_complaints'] = bbl_complaints_wide[all_categories].sum(axis=1)

In [50]:
bbl_complaints_wide
# so far, we have the 311 complaint part figured out

complaint_category,bbl,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,0,0.0,2.0,27.0,0.0,54.0,22.0,56.0,18.0,39.0,1.0,241.0,4.0,431.0,0.0,45.0,170.0,6.0,1.0,89.0,72.0,57.0,1335.0
1,144969020,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0
2,1000010010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,22.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,25.0
3,1000010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
4,1000020001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
342956,5200429999,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
342957,5270000501,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,2.0
342958,5270000508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,2.0
342959,5270000511,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [51]:
bbl_evictions_svi.bbl.dtype, bbl_complaints_wide.bbl.dtype

(dtype('int64'), dtype('int64'))

In [52]:
bbl_complaints_wide.shape

(342961, 23)

In [53]:
bbl_evictions_svi.head()

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,3037420029,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,3057940012,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,3057820030,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,2032510420,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,2025770038,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high


In [54]:
bbl_evictions_svi.shape

(70882, 70)

In [55]:
bbl_evictions_svi_311 = bbl_evictions_svi.merge(
    bbl_complaints_wide,
    on='bbl',
    how='left'
)
# the final merge of bbl, evictions, and svi with the 311 complaints

In [56]:
bbl_evictions_svi_311[['bin', 'average_year_eviction_count', 'total_complaints']].sort_values('average_year_eviction_count', ascending=False).head()

Unnamed: 0,bin,average_year_eviction_count,total_complaints
64230,3000000,35.6,8.0
11296,3000000,35.6,7.0
64217,3000000,35.6,69.0
65885,3000000,35.6,2.0
65884,3000000,35.6,2.0


In [57]:
bbl_evictions_svi_311.isna().sum()

Unnamed: 0,0
primary_key,0
court_index_number,0
docket_number,0
eviction_address,0
eviction_apartment_number,0
...,...
public_nuisance,4485
safety_concerns,4485
sanitation_issues,4485
walls_ceilings,4485


In [58]:
f"{4485/bbl_evictions_svi_311.shape[0]*100:.2f} % of the rows have nans"

'6.33 % of the rows have nans'

In [59]:
nan_counts = bbl_evictions_svi_311.isna().sum()
columns_with_nans = nan_counts[nan_counts > 0]
columns_with_nans

Unnamed: 0,0
air_quality,4485
animal_issues,4485
appliances,4485
building_exterior,4485
doors_windows,4485
electrical_issues,4485
elevator_issues,4485
floors_stairs,4485
general_complaints,4485
graffiti_posting,4485


## **In this case, it makes no sense to fill these NaNs, as doing so would only add inaccuracies to the dataset. We drop all rows that contain NaNs.**
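These NaNs appear in eviction rows whose BBL has no row in the complaints table, since a left merge keeps every eviction record. A minimal sketch of how the unmatched rows could be counted explicitly, using the `indicator` and `validate` options of `DataFrame.merge` on hypothetical toy frames (all names and values here are made up):

```python
import pandas as pd

# Toy eviction records: two evictions at bbl 1, one at bbl 3.
toy_evictions = pd.DataFrame({
    "primary_key": ["a", "b", "c"],
    "bbl": [1, 1, 3],
})

# Toy wide complaints table: one row per bbl; bbl 3 is absent.
toy_wide = pd.DataFrame({
    "bbl": [1, 2],
    "noise_complaints": [4, 7],
    "total_complaints": [4, 7],
})

toy_merged = toy_evictions.merge(
    toy_wide,
    on="bbl",
    how="left",
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
    validate="many_to_one",  # raises if bbl is duplicated in toy_wide
)

# Evictions whose bbl has no complaint row at all.
unmatched = int((toy_merged["_merge"] == "left_only").sum())

# Keep only matched rows, equivalent to dropping the NaN rows here.
toy_final = toy_merged[toy_merged["_merge"] == "both"].drop(columns="_merge")
```

Filtering on `_merge` removes only the rows that failed to match, whereas a plain `dropna()` would also remove rows with NaNs in unrelated columns if any existed.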

In [60]:
bbl_evictions_svi_311 = bbl_evictions_svi_311.dropna()

In [61]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum(), bbl_evictions_svi_311.shape

(np.int64(0), np.int64(0), (66397, 92))

In [62]:
bbl_evictions_svi_311

Unnamed: 0,primary_key,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,*308072/22_5865,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,3037420029,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.9140,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0
1,*313639/23_5202,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,3057940012,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0
2,*324973/22_5308,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,3057820030,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0
3,*53336/16_170279,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.877190,-73.889569,7.0,11.0,409.0,2015444,2032510420,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.9870,0.9470,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0
4,*5990/17_2703,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.908810,1.0,8.0,35.0,2003900,2025770038,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70873,R53634/16_76078,R53634/16,76078,185 ST.MARKS PLACE,4F,2017-04-19,STATEN ISLAND,10301,Not an Ejectment,Possession,40.645358,-74.080729,1.0,49.0,7.0,5108502,5000130008,West New Brighton-New Brighton-St. George,2017,2017-04,POINT (-74.080729 40.645358),3.8,1976.0,D3,20.0,454.0,NYC HOUSING DEVELOPMENT CORP.,524513.0,post-war,elevator,False,high-rise,"1970–1993, deregularization","1951–1980, the International Style, Alternative Modernism","1976–1990, fiscal crisis and recovery",100+ units,False,mega,Q4 (largest 25%),1970-1979,10301,40331.0,0.8784,0.7487,0.8992,0.9869,0.9329,20.4,7.1,13.7,5.2,15.6,20.7,11.6,6.5,25.5,8.1,32.2,19.9,26.3,7.6,0.7,0.0,3.7,0.3,58.6,41.4,False,Q2,medium-low,1.0,2.0,6.0,0.0,26.0,5.0,5.0,26.0,5.0,0.0,88.0,0.0,308.0,1.0,8.0,63.0,1.0,3.0,8.0,38.0,35.0,629.0
70874,R53669/18_96694,R53669/18,96694,495 JEWETT AVENUE,REAR APT.,2019-06-12,STATEN ISLAND,10302,Not an Ejectment,Possession,40.624827,-74.131808,1.0,49.0,151.0,5101137,5003540004,Westerleigh,2019,2019-06,POINT (-74.131808 40.624827),0.4,1988.0,S9,2.0,2.0,VICO HOLDING CORP,2593.0,post-war,primarily_res_with_mixed_use,False,low-rise,"1970–1993, deregularization","1981–2000, Post-Modernism","1976–1990, fiscal crisis and recovery",2-unit,False,small,Q3 (50-75%),1980-1989,10302,18567.0,0.9427,0.7846,0.9301,0.7692,0.9163,23.9,7.8,13.5,8.7,10.2,23.7,11.6,3.6,22.7,6.7,35.3,21.7,41.9,5.6,0.0,0.0,1.8,0.4,71.4,28.6,False,Q1 (Low),low,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,4.0,0.0,5.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,14.0
70875,R53669/18B_96725,R53669/18B,96725,495 JEWETT AVENUE,REAR APT,2019-06-12,STATEN ISLAND,10302,Not an Ejectment,Possession,40.624827,-74.131808,1.0,49.0,151.0,5101137,5003540004,Westerleigh,2019,2019-06,POINT (-74.131808 40.624827),0.4,1988.0,S9,2.0,2.0,VICO HOLDING CORP,2593.0,post-war,primarily_res_with_mixed_use,False,low-rise,"1970–1993, deregularization","1981–2000, Post-Modernism","1976–1990, fiscal crisis and recovery",2-unit,False,small,Q3 (50-75%),1980-1989,10302,18567.0,0.9427,0.7846,0.9301,0.7692,0.9163,23.9,7.8,13.5,8.7,10.2,23.7,11.6,3.6,22.7,6.7,35.3,21.7,41.9,5.6,0.0,0.0,1.8,0.4,71.4,28.6,False,Q1 (Low),low,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,4.0,0.0,5.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,14.0
70879,R53784/18_95054,R53784/18,95054,32 SKINNER LANE,3A,2019-06-24,STATEN ISLAND,10310,Not an Ejectment,Possession,40.639662,-74.116434,1.0,49.0,97.0,5108656,5001690001,West New Brighton-New Brighton-St. George,2019,2019-06,POINT (-74.116434 40.639662),0.2,2007.0,C1,3.0,240.0,MARKHAM GARDENS L.P.,236523.0,post-war,walk-up,False,low-rise,"1994–Present, vacancy decontrol","2001-present, New Architecture","1991–2008, modern economic growth",100+ units,False,mega,Q4 (largest 25%),2000-2009,10310,26239.0,0.7569,0.7972,0.8975,0.8991,0.8928,21.0,4.2,12.6,4.5,12.6,25.7,8.7,6.0,21.8,4.4,28.0,20.0,29.0,5.3,0.1,0.0,3.0,0.2,57.6,42.4,False,Q1 (Low),low,0.0,2.0,19.0,1.0,10.0,12.0,0.0,16.0,8.0,0.0,28.0,0.0,61.0,0.0,33.0,22.0,0.0,0.0,0.0,68.0,30.0,310.0


In [63]:
zero_bbl_count = (bbl_evictions_svi_311['bbl'] == 0).sum()
zero_bbl_count
# no bbl == 0 rows

np.int64(0)

In [64]:
# # inspect these rows to see if there's a pattern
# zero_bbl_rows = bbl_evictions_svi_311[bbl_evictions_svi_311['bbl'] == 0]
# display(zero_bbl_rows.head())

In [65]:
all_columns = list(bbl_evictions_svi_311.columns),
# len(all_columns)
# all_columns
type(all_columns), len(all_columns[0]) # weird: the trailing comma in the assignment makes this a 1-tuple holding the list, so remove() does not work on it directly and a list comprehension is used below

(tuple, 92)

In [66]:
# bbl_evictions_svi_311

In [67]:
# the goal is to move "bbl" to the front of the dataframe
# all_columns = merged_with_complaints.columns.tolist()
# print(all_columns)
# if 'court_index_number' in all_columns:
#     print("yes, court_index_number")
#     all_columns.remove('court_index_number')
# if 'bbl' in all_columns:
#     print("yes, bbl")
#     all_columns.remove('bbl')
# all_columns
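# all_columns is a 1-tuple holding the column list, so this comprehension
# yields a one-element list whose only item is that same column list
remaining_columns = [col for col in all_columns if col not in ['primary_key', 'bbl']]
remaining_columns = remaining_columns[0]
print(len(remaining_columns))
# remove() mutates the list in place (all_columns[0] is the same object)
remaining_columns.remove('primary_key')
remaining_columns.remove('bbl')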

92


In [68]:
len(remaining_columns)
# good

90

In [69]:
new_column_order = ['primary_key', 'bbl'] + remaining_columns

In [70]:
# new order in place
bbl_evictions_svi_311 = bbl_evictions_svi_311[new_column_order]

In [71]:
display(bbl_evictions_svi_311.head())
# amazing

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0
3,*53336/16_170279,2032510420,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0
4,*5990/17_2703,2025770038,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0
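
The reordering above can also be written without the tuple workaround. This is a minimal sketch on a toy frame; `move_to_front` is a hypothetical helper, not something defined in this notebook, and the toy columns only stand in for `bbl_evictions_svi_311`.

```python
import pandas as pd

def move_to_front(df, front_cols):
    """Return df with front_cols first and every other column in its original order."""
    rest = [col for col in df.columns if col not in front_cols]
    return df[list(front_cols) + rest]

# toy frame standing in for bbl_evictions_svi_311 (hypothetical values)
toy = pd.DataFrame({
    "court_index_number": ["a"],
    "primary_key": ["a_1"],
    "bin": [1],
    "bbl": [10],
})
reordered = move_to_front(toy, ["primary_key", "bbl"])
print(list(reordered.columns))
```

Because the helper builds a new list instead of calling `remove()`, it leaves the original column list untouched.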


In [72]:
bbl_evictions_svi_311.shape

(66397, 92)

In [73]:
# remove rows with BBL = 0
bbl_evictions_svi_311 = bbl_evictions_svi_311[bbl_evictions_svi_311['bbl'] != 0] # good
len(bbl_evictions_svi_311)

66397

In [74]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum() # all clean

(np.int64(0), np.int64(0))

In [75]:
bbl_evictions_svi_311.shape
# final shape

(66397, 92)

In [76]:
bbl_evictions_svi_311.info(), \
bbl_evictions_svi_311.shape

<class 'pandas.core.frame.DataFrame'>
Index: 66397 entries, 0 to 70880
Data columns (total 92 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   primary_key                  66397 non-null  object 
 1   bbl                          66397 non-null  int64  
 2   court_index_number           66397 non-null  object 
 3   docket_number                66397 non-null  int64  
 4   eviction_address             66397 non-null  object 
 5   eviction_apartment_number    66397 non-null  object 
 6   executed_date                66397 non-null  object 
 7   borough                      66397 non-null  object 
 8   zipcode                      66397 non-null  int64  
 9   ejectment                    66397 non-null  object 
 10  eviction/legal_possession    66397 non-null  object 
 11  latitude                     66397 non-null  float64
 12  longitude                    66397 non-null  float64
 13  community_board      

(None, (66397, 92))

In [77]:
complaint_cols = ['bbl'] + all_categories + ['total_complaints']
existing_cols = [col for col in complaint_cols if col in bbl_evictions_svi_311.columns]
existing_cols

['bbl',
 'heat_hot_water',
 'plumbing_issues',
 'electrical_issues',
 'elevator_issues',
 'doors_windows',
 'walls_ceilings',
 'floors_stairs',
 'building_exterior',
 'appliances',
 'sanitation_issues',
 'pest_issues',
 'air_quality',
 'noise_complaints',
 'homeless_issues',
 'graffiti_posting',
 'public_nuisance',
 'safety_concerns',
 'animal_issues',
 'police_matters',
 'general_complaints',
 'other_issues',
 'total_complaints']

In [78]:
# just take a look at the ones related to the 311 complaint part
display(bbl_evictions_svi_311[['primary_key'] + existing_cols].head())

Unnamed: 0,primary_key,bbl,heat_hot_water,plumbing_issues,electrical_issues,elevator_issues,doors_windows,walls_ceilings,floors_stairs,building_exterior,appliances,sanitation_issues,pest_issues,air_quality,noise_complaints,homeless_issues,graffiti_posting,public_nuisance,safety_concerns,animal_issues,police_matters,general_complaints,other_issues,total_complaints
0,*308072/22_5865,3037420029,3.0,3.0,2.0,0.0,1.0,5.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,19.0
1,*313639/23_5202,3057940012,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
2,*324973/22_5308,3057820030,2.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0
3,*53336/16_170279,2032510420,23.0,41.0,0.0,2.0,9.0,4.0,5.0,0.0,0.0,5.0,2.0,6.0,145.0,0.0,0.0,0.0,1.0,0.0,0.0,5.0,0.0,248.0
4,*5990/17_2703,2025770038,89.0,41.0,8.0,34.0,21.0,13.0,10.0,0.0,8.0,31.0,5.0,0.0,78.0,0.0,0.0,0.0,3.0,5.0,1.0,9.0,0.0,356.0


In [79]:
# count how many rows (one per eviction record, so a building can appear more than once) have each type of complaint
buildings_with_complaints_clean = {col: (bbl_evictions_svi_311[col] > 0).sum() for col in existing_cols[1:]}
# sorted_counts = sorted(buildings_with_complaints.items(), key=lambda x: x[1], reverse=True)
# this is just a list
complaint_counts_df = pd.DataFrame(list(buildings_with_complaints_clean.items()),
                                  columns=['complaint_category', 'building_count'])

In [80]:
complaint_counts_df = complaint_counts_df.sort_values('building_count', ascending=False)
complaint_counts_df = complaint_counts_df.reset_index(drop=True)
complaint_counts_df

Unnamed: 0,complaint_category,building_count
0,total_complaints,66397
1,noise_complaints,58843
2,plumbing_issues,55621
3,heat_hot_water,55073
4,sanitation_issues,52460
5,doors_windows,46416
6,walls_ceilings,45833
7,general_complaints,40646
8,electrical_issues,39856
9,pest_issues,38441
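
The counts above are per row, and each row is one eviction record, so `total_complaints` equals the number of rows (66397). A minimal sketch of how row counts and distinct-building counts could be computed side by side is given below. It runs on toy data and assumes the complaint columns are constant within a `bbl` (they were merged in by `bbl`); the column subset and values are hypothetical.

```python
import pandas as pd

# toy stand-in: three eviction rows, two of which share one bbl
toy = pd.DataFrame({
    "bbl": [1, 1, 2],
    "heat_hot_water": [3.0, 3.0, 0.0],
    "noise_complaints": [0.0, 0.0, 5.0],
})
complaint_cols = ["heat_hot_water", "noise_complaints"]

# eviction rows with at least one complaint in each category
row_counts = (toy[complaint_cols] > 0).sum()

# distinct buildings (one row per bbl) with at least one complaint in each category
per_building = toy.drop_duplicates("bbl")
building_counts = (per_building[complaint_cols] > 0).sum()

summary = (
    pd.DataFrame({"row_count": row_counts, "building_count": building_counts})
    .rename_axis("complaint_category")
    .sort_values("row_count", ascending=False)
    .reset_index()
)
print(summary)
```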


In [81]:
# add average evictions per year per residential unit for each building
bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
bbl_evictions_svi_311['average_year_eviction_unit_count'].head()

Unnamed: 0,average_year_eviction_unit_count
0,0.266667
1,0.3
2,0.15
3,0.002273
4,0.024615


In [82]:
bbl_evictions_svi_311[['primary_key', 'bin', 'nta', 'borough', 'eviction_address',
                       'average_year_eviction_count', 'unitsres',
                       'average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False).head()

Unnamed: 0,primary_key,bin,nta,borough,eviction_address,average_year_eviction_count,unitsres,average_year_eviction_unit_count
51181,78989/17Q_75847,4000000,South Jamaica,QUEENS,110-20 169TH STREET,17.8,1.0,17.8
35313,56293/18_16922,4000000,Bayside-Bayside Hills,QUEENS,40-15 217TH STREET,17.8,1.0,17.8
21325,319313/22_112656,3000000,Crown Heights North,BROOKLYN,1520 PROSPECT PLACE,35.6,3.0,11.866667
66923,K93840/18B_94324,3000000,Dyker Heights,BROOKLYN,1261 70TH STREET,35.6,3.0,11.866667
66922,K93840/18_94323,3000000,Dyker Heights,BROOKLYN,1261 70TH STREET,35.6,3.0,11.866667


In [83]:
bbl_evictions_svi_311.eviction_address.isna().sum()

np.int64(0)

In [84]:
bin_4000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000000]
bin_1000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 1000000]
bin_2000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 2000000]
bin_3000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 3000000]
bin_5000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 5000000]
bin_6000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 6000000]
bin_7000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 7000000]
bin_8000000 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 8000000]
bin_4000000.eviction_address.unique(), bin_1000000.eviction_address.unique(), bin_2000000.eviction_address.unique(), bin_3000000.eviction_address.unique(), \
bin_5000000.eviction_address.unique(), bin_6000000.eviction_address.unique(), bin_7000000.eviction_address.unique(), bin_8000000.eviction_address.unique(),

(array(['136-43 37TH AVENUE', '55-16 VAN  DOREN STREET 3RD FLOOR',
        '153-11 90TH AVENUE', '103-16 CORONA AVENUE', '148-37 88TH AVENUE',
        '19-14 20TH AVENUE', '176 WOODWARD AVENUE APARTMENT #129',
        '1401 BROADWAY', '31-19 37TH STREET', '197-03 48TH AVENUE F1',
        '40-15 217TH STREET', '95-17 50TH AVENUE     2ND FLOOR',
        '110-20 169TH STREET', '84-02 143RD STREET', '84-02 143 STREET'],
       dtype=object),
 array(['447-448 CENTRAL PARK  WEST', '517 WEST 134TH   STR EET',
        '100 WEST 131ST STREET APARTMENT 3C',
        '7 DEY STREET A/K/A 185 BROADWAY', '222 EAST 44TH STREET',
        '626 FIRST AVENUE', '172 WEST 127TH    ST REET',
        '172 WEST 127TH   STR EET',
        '407 LENOX AVENUE A/K/A 100 WEST 131 STREET',
        '112-114 WEST 116TH ST APT. 7', '111 VARICK STREET',
        '180 BROOME STREET APARTMENT 2312', '540 WEST 53RD STREET',
        '251 WEST 117TH STREET APT. 2-F', '172 WEST 127TH STREE T',
        '161 EAST 28TH STREET APT 3

In [85]:
# bin_1000000

In [86]:
problematic_ones = pd.concat([bin_1000000, bin_2000000, bin_3000000, bin_4000000, bin_5000000])
problematic_ones.columns

Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
       'average_year_eviction_count', 'yearbuilt', 'bldgclass', 'numfloors',
       'unitsres', 'ownername', 'bldgarea', 'building_type',
       'building_category', 'is_condo', 'floor_category', 'rent_era',
       'architectural_style', 'economic_period', 'residential_units_category',
       'is_llc', 'building_size_category', 'size_quartile', 'decade', 'fips',
       'e_totpop', 'rpl_theme1', 'rpl_theme2', 'rpl_theme3', 'rpl_theme4',
       'rpl_themes', 'ep_pov150', 'ep_unemp', 'ep_nohsdp', 'ep_uninsur',
       'ep_age65', 'ep_age17', 'ep_disabl', 'ep_limeng', 'ep_noveh',
       'ep_crowd', 'ep_hburd', 'ep_afam', 'ep_hisp', 'ep_asian',

In [87]:
unique_addresses = problematic_ones.eviction_address.unique()
unique_addresses, len(unique_addresses)

(array(['447-448 CENTRAL PARK  WEST', '517 WEST 134TH   STR EET',
        '100 WEST 131ST STREET APARTMENT 3C',
        '7 DEY STREET A/K/A 185 BROADWAY', '222 EAST 44TH STREET',
        '626 FIRST AVENUE', '172 WEST 127TH    ST REET',
        '172 WEST 127TH   STR EET',
        '407 LENOX AVENUE A/K/A 100 WEST 131 STREET',
        '112-114 WEST 116TH ST APT. 7', '111 VARICK STREET',
        '180 BROOME STREET APARTMENT 2312', '540 WEST 53RD STREET',
        '251 WEST 117TH STREET APT. 2-F', '172 WEST 127TH STREE T',
        '161 EAST 28TH STREET APT 3', '350 WEST 45TH STREET',
        '140 ESSEX STREET', '180 BROOME STREET', '1553 GLEBE AVENUE',
        '2028 DAVIDSON AVENUE', '1071 TINTON AVE', '1071 TINTON AVENUE',
        '486 EAST 165TH ST', '2885 MARION AVENUE', '735 CAULDWELL AVENUE',
        '3230 RADCLIFF AVENUE', '540 EAST 142ND STREET',
        '2302 MORRIS AVENUE A /K/A 2300-2302 MORRIS AVENUE',
        '540 EAST 142ND ST', '246 ECHO PLACE',
        '1395 NELSON AVENUE APT.

# **Important: why do some addresses have such a high per-unit eviction count per year? Because they share the same bin (a source error), and the average evictions per year per building was computed by grouping on bin. Each of these distinct addresses is therefore assigned the total evictions of every address filed under the same bin (namely 4000000, 1000000, 2000000, 3000000, 5000000, 6000000, etc.)**
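
A toy illustration of the effect described above, under stated assumptions: the addresses, the bin values, and `n_years` are hypothetical, and the averaging here (row count divided by number of years) is only a stand-in for however `average_year_eviction_count` was built upstream.

```python
import pandas as pd

# toy data: two unrelated addresses share the placeholder bin 4000000
toy = pd.DataFrame({
    "eviction_address": ["A ST", "A ST", "B AVE", "C RD"],
    "bin": [4000000, 4000000, 4000000, 4012345],
})
n_years = 5  # hypothetical length of the study window

# grouping by bin pools every address filed under the placeholder
toy["avg_by_bin"] = toy.groupby("bin")["bin"].transform("size") / n_years

# grouping by address keeps the buildings separate
toy["avg_by_address"] = (
    toy.groupby("eviction_address")["bin"].transform("size") / n_years
)
print(toy)
```

In the toy frame, "B AVE" has a single eviction but inherits the pooled count of all three rows under bin 4000000, which is the inflation seen in the per-unit ranking.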

## **So are there any unused bin numbers we can assign to these rows, which share a bin but have distinct addresses?**

In [88]:
# bins_to_occupy = list(range(4000001, 4000153))
# bins_to_occupy[-5:], type(bins_to_occupy)

In [89]:
unique_bins = bbl_evictions_svi_311.bin.unique()
len(unique_bins), min(unique_bins), max(unique_bins)

(30230, np.int64(1000000), np.int64(5171959))

In [90]:
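# caution: 5171959 is the current max of unique_bins, so the first value of this range is already in use;
# starting at max + 1 would avoid reusing it
bins_to_occupy = list(range(5171959, 5171959+152))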
bins_to_occupy[-5:], type(bins_to_occupy)

([5172106, 5172107, 5172108, 5172109, 5172110], list)
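
Before assigning new bins, it may be worth confirming that none of the candidates is already taken. The sketch below does this with a set intersection on toy data; `existing_bins` is a small stand-in for `unique_bins`, and `n_needed` mirrors the 152 used above.

```python
import numpy as np

existing_bins = np.array([1000000, 3083989, 5171959])  # toy stand-in for unique_bins
n_needed = 152

max_bin = int(existing_bins.max())
start_at_max = range(max_bin, max_bin + n_needed)             # as in the cell above
start_after_max = range(max_bin + 1, max_bin + 1 + n_needed)  # shifted by one

existing = set(existing_bins.tolist())
collisions_at_max = sorted(existing.intersection(start_at_max))
collisions_after_max = sorted(existing.intersection(start_after_max))
print(collisions_at_max, collisions_after_max)
```

An empty intersection means every candidate bin is free; a non-empty one lists the bins that would be reused.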

In [91]:
# bins_to_occupy = [
#     [b] if np.isscalar(b) else b
#     for b in bins_to_occupy
# ]
# type(bins_to_occupy)

In [92]:
# any(bin in unique_bins for bins in bins_to_occupy for bin in bins)

In [93]:
# # so is there any bin that is 4000001 for us to take an empty place to insert this duplicated bin but unique address into it??
# bin_4000001 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000001]
# bin_4000002 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000002]
# bin_4000003 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000003]
# bin_4000004 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000004]
# bin_4000005 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000005]
# bin_4000006 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000006]
# bin_4000007 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000007]
# bin_4000008 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000008]
# bin_4000009 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000009]
# bin_4000010 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000010]
# bin_4000011 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000011]
# bin_4000012 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000012]
# bin_4000013 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000013]
# bin_4000014 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000014]
# bin_4000015 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 4000015]
# bin_4000009.shape, bin_4000010.shape, bin_4000011.shape, bin_4000012.shape, bin_4000013.shape, bin_4000014.shape, bin_4000015.shape
# # so, yes there is, and therefore, we can manually move those address to those empty slots

In [94]:
# bin_4000001.shape, bin_4000002.shape, bin_4000003.shape, bin_4000004.shape, bin_4000005.shape, bin_4000006.shape, bin_4000007.shape, bin_4000008.shape,

In [95]:
problematic_ones.shape

(277, 93)

In [96]:
# build an address -> new bin mapping so each unique address gets its own bin
new_bins = range(5171959, 5171959 + 152)
# bin_mapping = (
#     problematic_ones.groupby('eviction_address')['bin']
#     .first()
#     .reset_index()
#     .assign(new_bin=new_bins)
#     .set_index('eviction_address')
#     ['new_bin']
#     .to_dict()
# )
unique_addresses = problematic_ones['eviction_address'].unique()
bin_mapping = {
    address: new_bin
    for address, new_bin in zip(unique_addresses, new_bins)
}

In [97]:
problematic_ones.shape

(277, 93)

In [98]:
bin_mapping
# each address is now mapped to a new unique bin

{'447-448 CENTRAL PARK  WEST': 5171959,
 '517 WEST 134TH   STR EET': 5171960,
 '100 WEST 131ST STREET APARTMENT 3C': 5171961,
 '7 DEY STREET A/K/A 185 BROADWAY': 5171962,
 '222 EAST 44TH STREET': 5171963,
 '626 FIRST AVENUE': 5171964,
 '172 WEST 127TH    ST REET': 5171965,
 '172 WEST 127TH   STR EET': 5171966,
 '407 LENOX AVENUE A/K/A 100 WEST 131 STREET': 5171967,
 '112-114 WEST 116TH ST APT. 7': 5171968,
 '111 VARICK STREET': 5171969,
 '180 BROOME STREET APARTMENT 2312': 5171970,
 '540 WEST 53RD STREET': 5171971,
 '251 WEST 117TH STREET APT. 2-F': 5171972,
 '172 WEST 127TH STREE T': 5171973,
 '161 EAST 28TH STREET APT 3': 5171974,
 '350 WEST 45TH STREET': 5171975,
 '140 ESSEX STREET': 5171976,
 '180 BROOME STREET': 5171977,
 '1553 GLEBE AVENUE': 5171978,
 '2028 DAVIDSON AVENUE': 5171979,
 '1071 TINTON AVE': 5171980,
 '1071 TINTON AVENUE': 5171981,
 '486 EAST 165TH ST': 5171982,
 '2885 MARION AVENUE': 5171983,
 '735 CAULDWELL AVENUE': 5171984,
 '3230 RADCLIFF AVENUE': 5171985,
 '540 E

In [102]:
bbl_evictions_svi[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False).head()
# before mapping, the problematic ones. They didn't drastically change the trend or the big picture, but they artificially
# inflated the numbers, which became evident when averaging over units.

Unnamed: 0,bin,average_year_eviction_count
67828,3000000,35.6
11805,3000000,35.6
67810,3000000,35.6
69612,3000000,35.6
69611,3000000,35.6


In [100]:
# stop

In [103]:
# bbl_evictions_svi_311['bin'] = bbl_evictions_svi_311['eviction_address'].map(bin_mapping)
# bbl_evictions_svi_311.shape, bbl_evictions_svi_311.bin.isna().sum().sum()
# 66397 - 66120
bbl_evictions_svi_311['bin'] = bbl_evictions_svi_311['bin'].where(
    ~bbl_evictions_svi_311['eviction_address'].isin(bin_mapping.keys()),
    bbl_evictions_svi_311['eviction_address'].map(bin_mapping)
)
# only the rows with errors are changed, so the good ones do not turn into NaN

In [104]:
bbl_evictions_svi_311.shape, bbl_evictions_svi_311.bin.isna().sum().sum()
# good

((66397, 93), np.int64(0))
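
The `where` call above selects rows by address only, so a row with a valid bin whose address string happens to match a key in `bin_mapping` would be remapped too. Whether such rows exist in this data is not shown here. A minimal sketch of a stricter variant, which only touches rows still carrying a placeholder bin, is given below; the toy frame, `placeholder_bins`, and the mapping values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "eviction_address": ["A ST", "B AVE", "A ST"],
    "bin": [4000000, 4000000, 4012345],  # the last row already has a real bin
})
placeholder_bins = [1000000, 2000000, 3000000, 4000000, 5000000]
bin_mapping = {"A ST": 5171960, "B AVE": 5171961}  # hypothetical new bins

# only rows whose bin is a placeholder are candidates for remapping
needs_fix = toy["bin"].isin(placeholder_bins)
toy.loc[needs_fix, "bin"] = toy.loc[needs_fix, "eviction_address"].map(bin_mapping)
print(toy)
```

Note that `average_year_eviction_count` was computed before the remap, so it would still reflect the pooled counts unless it is recomputed afterwards.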

In [105]:
bin_5172110 = bbl_evictions_svi_311[bbl_evictions_svi_311['bin'] == 5172110]
bin_5172110

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
70726,R51363/17_81087,5000520133,R51363/17,81087,228 JERSEY STREET,A,2018-01-09,STATEN ISLAND,10301,Not an Ejectment,Possession,40.642178,-74.086624,1.0,49.0,81.0,5172110,West New Brighton-New Brighton-St. George,2018,2018-01,POINT (-74.086624 40.642178),2.6,1984.0,C9,1.0,20.0,"FAIRWAY RICHMOND HOUSING DEVEL FUND CO. , INC",9450.0,post-war,walk-up,False,low-rise,"1970–1993, deregularization","1981–2000, Post-Modernism","1976–1990, fiscal crisis and recovery",6-20 units,False,medium,Q4 (largest 25%),1980-1989,10301,40331.0,0.8784,0.7487,0.8992,0.9869,0.9329,20.4,7.1,13.7,5.2,15.6,20.7,11.6,6.5,25.5,8.1,32.2,19.9,26.3,7.6,0.7,0.0,3.7,0.3,58.6,41.4,False,Q2,medium-low,0.0,0.0,10.0,0.0,4.0,5.0,0.0,5.0,2.0,0.0,8.0,0.0,1.0,0.0,0.0,7.0,0.0,0.0,1.0,17.0,7.0,67.0,0.13


In [106]:
bbl_evictions_svi_311.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),0.8,1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0,0.266667
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),0.6,1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0,0.3
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),0.6,1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0,0.15
3,*53336/16_170279,2032510420,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),0.8,1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0,0.002273
4,*5990/17_2703,2025770038,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1.6,1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0,0.024615


In [107]:
bbl_evictions_svi_311.isna().sum().sum(), bbl_evictions_svi_311.duplicated().sum()
# nice

(np.int64(0), np.int64(0))

In [108]:
bbl_evictions_svi_311.sort_values('average_year_eviction_unit_count', ascending=True).head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,average_year_eviction_count,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count
27384,38722/17_168488,2051410120,38722/17,168488,140 DARROW PLACE,6E,2018-02-22,BRONX,10475,Not an Ejectment,Possession,40.878445,-73.833014,10.0,12.0,46201.0,2128849,Co-op City,2018,2018-02,POINT (-73.833014 40.878445),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,medium-high,26.0,24.0,8.0,0.0,99.0,46.0,35.0,96.0,58.0,1.0,178.0,2.0,431.0,0.0,11.0,282.0,4.0,0.0,8.0,384.0,230.0,1923.0,1.8e-05
28442,43988/18_170882,2051410120,43988/18,170882,20A COOPER PLACE,unknown,2019-02-15,BRONX,10475,Not an Ejectment,Possession,40.877354,-73.824798,10.0,12.0,46201.0,2124132,Co-op City,2019,2019-02,POINT (-73.824798 40.877354),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,medium-high,26.0,24.0,8.0,0.0,99.0,46.0,35.0,96.0,58.0,1.0,178.0,2.0,431.0,0.0,11.0,282.0,4.0,0.0,8.0,384.0,230.0,1923.0,1.8e-05
42092,65339/16_167003,2051410120,65339/16,167003,100 ALCOTT PLACE,17H,2017-05-10,BRONX,10475,Not an Ejectment,Possession,40.871673,-73.830392,10.0,12.0,46201.0,2095392,Co-op City,2017,2017-05,POINT (-73.830392 40.871673),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,medium-high,26.0,24.0,8.0,0.0,99.0,46.0,35.0,96.0,58.0,1.0,178.0,2.0,431.0,0.0,11.0,282.0,4.0,0.0,8.0,384.0,230.0,1923.0,1.8e-05
54962,9105/18_170233,2051410120,9105/18,170233,900 CO-OP CITY BLVD,24H,2018-11-02,BRONX,10475,Not an Ejectment,Possession,40.878284,-73.829767,10.0,12.0,46201.0,2095387,Co-op City,2018,2018-11,POINT (-73.829767 40.878284),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,medium-high,26.0,24.0,8.0,0.0,99.0,46.0,35.0,96.0,58.0,1.0,178.0,2.0,431.0,0.0,11.0,282.0,4.0,0.0,8.0,384.0,230.0,1923.0,1.8e-05
24245,32806/17_168478,2051410120,32806/17,168478,920 CO-OP CITY BLVD,16C,2018-01-30,BRONX,10475,Not an Ejectment,Possession,40.878472,-73.830429,10.0,12.0,46201.0,2128851,Co-op City,2018,2018-01,POINT (-73.830429 40.878472),0.2,1969.0,D4,33.0,10914.0,RIVERBAY CORPORATION,13540113.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10475,43517.0,0.9054,0.9778,0.9837,0.988,0.9799,21.8,9.0,14.8,4.7,23.8,19.7,16.1,4.1,44.5,5.6,34.4,60.0,30.6,2.3,0.0,0.0,2.8,0.0,95.7,4.3,False,Q3,medium-high,26.0,24.0,8.0,0.0,99.0,46.0,35.0,96.0,58.0,1.0,178.0,2.0,431.0,0.0,11.0,282.0,4.0,0.0,8.0,384.0,230.0,1923.0,1.8e-05


In [109]:
bbl_evictions_svi_311.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,66397.0
mean,1.167607
std,2.263121
min,0.2
25%,0.2
50%,0.6
75%,1.2
max,35.6


In [110]:
bbl_evictions_svi_311.drop(columns=['average_year_eviction_count'], inplace=True)

In [111]:
# count evictions per building (bin) in the cleaned data, then spot-check one bin
evictions_per_building = bbl_evictions_svi_311.groupby('bin').size().reset_index(name='total_evictions')
bin_5172098 = evictions_per_building[evictions_per_building['bin'] == 5172098]
bin_5172098

Unnamed: 0,bin,total_evictions
30363,5172098,1


In [112]:
# average evictions per year: total per building divided by the 5 years covered
evictions_per_building['average_year_eviction_count'] = evictions_per_building['total_evictions'] / 5
bbl_evictions_svi_311 = bbl_evictions_svi_311.merge(evictions_per_building[['bin', 'average_year_eviction_count']], on='bin', how='left')
bbl_evictions_svi_311.sort_values('average_year_eviction_count', ascending=False)[['average_year_eviction_count', 'unitsres']].head()
# good: the max is now 14.8 instead of the stale 35.6

Unnamed: 0,average_year_eviction_count,unitsres
46067,14.8,1654.0
46066,14.8,1654.0
24820,14.8,1654.0
18546,14.8,1654.0
18545,14.8,1654.0


In [113]:
bbl_evictions_svi_311.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,66397.0
mean,1.059828
std,1.421173
min,0.2
25%,0.2
50%,0.6
75%,1.2
max,14.8
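
The recount above goes through an intermediate table and a merge. As a minimal sketch on made-up data (the `toy` frame and `NUM_YEARS` are illustrative names, not part of the notebook), the same per-building yearly average can be obtained in one step with `groupby(...).transform('size')`, which broadcasts each group's row count back onto every row:

```python
import pandas as pd

# Toy stand-in for bbl_evictions_svi_311: one row per eviction (hypothetical values).
toy = pd.DataFrame({
    "bin": [101, 101, 101, 202, 303, 303],
    "unitsres": [10.0, 10.0, 10.0, 2.0, 4.0, 4.0],
})

NUM_YEARS = 5  # assumption: same 5-year window as the `/ 5` used above

# transform('size') returns the group's row count aligned to the original rows,
# so no separate evictions_per_building table or merge is needed.
toy["average_year_eviction_count"] = (
    toy.groupby("bin")["bin"].transform("size") / NUM_YEARS
)

print(toy)
```

This keeps the row count unchanged by construction, which removes one place where a merge could silently duplicate rows.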


In [114]:
# Stop

# **average_year_eviction_unit_count**

In [115]:
# evictions per residential unit per year: the building's yearly average divided by unitsres
bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
bbl_evictions_svi_311[['bin', 'average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False).head()
# good; note the top value (1.6) is more than one eviction per unit per year, so unitsres for that bin is worth a second look

Unnamed: 0,bin,average_year_eviction_unit_count
65731,4458489,1.6
65730,4458489,1.6
65733,4458489,1.6
65401,4458489,1.6
65735,4458489,1.6
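
Because the per-unit figure is a plain division, a building recorded with zero residential units would produce `inf`. Whether such rows exist in this dataset is not shown here, so the following is only a defensive sketch on made-up data (the `toy` frame is hypothetical): zeros in `unitsres` are turned into NaN before dividing, so the result is NaN rather than `inf`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second building has unitsres == 0.
toy = pd.DataFrame({
    "bin": [1, 2, 3],
    "average_year_eviction_count": [0.6, 0.2, 1.0],
    "unitsres": [3.0, 0.0, 50.0],
})

# Replace 0 with NaN so the division yields NaN instead of inf.
units = toy["unitsres"].replace(0, np.nan)
toy["average_year_eviction_unit_count"] = (
    toy["average_year_eviction_count"] / units
)

# Count how many rows could not be given a per-unit figure.
n_undefined = int(toy["average_year_eviction_unit_count"].isna().sum())
print(toy)
print("rows without a per-unit figure:", n_undefined)
```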


In [116]:
bbl_evictions_svi_311.columns

Index(['primary_key', 'bbl', 'court_index_number', 'docket_number',
       'eviction_address', 'eviction_apartment_number', 'executed_date',
       'borough', 'zipcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'nta', 'year', 'month_year', 'geometry',
       'yearbuilt', 'bldgclass', 'numfloors', 'unitsres', 'ownername',
       'bldgarea', 'building_type', 'building_category', 'is_condo',
       'floor_category', 'rent_era', 'architectural_style', 'economic_period',
       'residential_units_category', 'is_llc', 'building_size_category',
       'size_quartile', 'decade', 'fips', 'e_totpop', 'rpl_theme1',
       'rpl_theme2', 'rpl_theme3', 'rpl_theme4', 'rpl_themes', 'ep_pov150',
       'ep_unemp', 'ep_nohsdp', 'ep_uninsur', 'ep_age65', 'ep_age17',
       'ep_disabl', 'ep_limeng', 'ep_noveh', 'ep_crowd', 'ep_hburd', 'ep_afam',
       'ep_hisp', 'ep_asian', 'ep_aian', 'ep_nhpi', 'ep_twom

# **average_year_eviction_nta_count**

In [117]:
evictions_per_nta = bbl_evictions_svi_311['nta'].value_counts().reset_index()
evictions_per_nta.columns = ['nta', 'total_evictions']
evictions_per_nta

Unnamed: 0,nta,total_evictions
0,Central Harlem North-Polo Grounds,1632
1,Crown Heights North,1599
2,Bedford Park-Fordham North,1530
3,East Concourse-Concourse Village,1438
4,Williamsbridge-Olinville,1337
...,...,...
182,Glen Oaks-Floral Park-New Hyde Park,17
183,Annadale-Huguenot-Prince's Bay-Eltingville,17
184,Arden Heights,17
185,Rossville-Woodrow,17


In [118]:
evictions_per_nta.sort_values('total_evictions', ascending=False).head()

Unnamed: 0,nta,total_evictions
0,Central Harlem North-Polo Grounds,1632
1,Crown Heights North,1599
2,Bedford Park-Fordham North,1530
3,East Concourse-Concourse Village,1438
4,Williamsbridge-Olinville,1337


In [119]:
population_per_nta = bbl_evictions_svi.drop_duplicates('nta')[['nta', 'e_totpop']]
population_per_nta.shape

(187, 2)

In [120]:
nta_rates = pd.merge(
    evictions_per_nta,
    population_per_nta,
    on='nta',
    how='left'
)
nta_rates

Unnamed: 0,nta,total_evictions,e_totpop
0,Central Harlem North-Polo Grounds,1632,29887.0
1,Crown Heights North,1599,83125.0
2,Bedford Park-Fordham North,1530,81397.0
3,East Concourse-Concourse Village,1438,88575.0
4,Williamsbridge-Olinville,1337,71862.0
...,...,...,...
182,Glen Oaks-Floral Park-New Hyde Park,17,19733.0
183,Annadale-Huguenot-Prince's Bay-Eltingville,17,63473.0
184,Arden Heights,17,63473.0
185,Rossville-Woodrow,17,34740.0


In [121]:
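num_years = 5  # number of years covered by the data
nta_rates['average_year_eviction_nta_count'] = (
    nta_rates['total_evictions'] / num_years
)
# keep only the yearly count; e_totpop is dropped here, so this is a raw count, not a per-capita rate
nta_rates = nta_rates[['nta', 'average_year_eviction_nta_count']]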
nta_rates

Unnamed: 0,nta,average_year_eviction_nta_count
0,Central Harlem North-Polo Grounds,326.4
1,Crown Heights North,319.8
2,Bedford Park-Fordham North,306.0
3,East Concourse-Concourse Village,287.6
4,Williamsbridge-Olinville,267.4
...,...,...
182,Glen Oaks-Floral Park-New Hyde Park,3.4
183,Annadale-Huguenot-Prince's Bay-Eltingville,3.4
184,Arden Heights,3.4
185,Rossville-Woodrow,3.4


In [122]:
bbl_evictions_svi_311 = pd.merge(
    bbl_evictions_svi_311,
    nta_rates,
    on='nta',
    how='left'
)
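
A left merge like the one above only preserves the row count if `nta` is unique in the right-hand table. A minimal sketch of how that could be checked, using made-up frames (`evictions` and `rates` are hypothetical stand-ins for `bbl_evictions_svi_311` and `nta_rates`): `validate="many_to_one"` makes pandas raise a `MergeError` if the right-hand keys are duplicated, and two small checks confirm that no rows were added and no NTA went unmatched.

```python
import pandas as pd

# Hypothetical stand-ins: one row per eviction, and one row per NTA.
evictions = pd.DataFrame({"nta": ["A", "A", "B", "C"], "bin": [1, 1, 2, 3]})
rates = pd.DataFrame({
    "nta": ["A", "B", "C"],
    "average_year_eviction_nta_count": [0.4, 0.2, 0.2],
})

# validate="many_to_one" raises pandas.errors.MergeError if `nta`
# appears more than once in `rates`.
merged = evictions.merge(rates, on="nta", how="left", validate="many_to_one")

rows_preserved = len(merged) == len(evictions)
missing_rates = int(merged["average_year_eviction_nta_count"].isna().sum())
print(rows_preserved, missing_rates)
```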

In [123]:
bbl_evictions_svi_311[['nta', 'average_year_eviction_nta_count']].head(30)

Unnamed: 0,nta,average_year_eviction_nta_count
0,East New York,266.0
1,Sunset Park East,38.0
2,Sunset Park West,39.6
3,Van Cortlandt Village,172.8
4,Mott Haven-Port Morris,158.8
5,Bedford Park-Fordham North,306.0
6,Bedford Park-Fordham North,306.0
7,Claremont-Bathgate,125.6
8,East New York,266.0
9,Brighton Beach,44.0


In [124]:
bbl_evictions_svi_311[['nta', 'average_year_eviction_nta_count']].sort_values('average_year_eviction_nta_count', ascending=False)

Unnamed: 0,nta,average_year_eviction_nta_count
8967,Central Harlem North-Polo Grounds,326.4
8970,Central Harlem North-Polo Grounds,326.4
8973,Central Harlem North-Polo Grounds,326.4
8983,Central Harlem North-Polo Grounds,326.4
9018,Central Harlem North-Polo Grounds,326.4
...,...,...
66318,Arden Heights,3.4
2003,Arden Heights,3.4
30939,Rossville-Woodrow,3.4
29821,Arden Heights,3.4


In [125]:
bbl_evictions_svi_311.shape
# 94 columns, as expected

(66397, 94)

# **Step 4: Save the final bbl_evictions_svi_311_merged dataset to the cloud for later use.**

### This is the thoroughly cleaned, merged df, with no NaNs or duplicates, ready for analysis.

In [126]:
bbl_evictions_svi_311.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_count,average_year_eviction_nta_count
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0,0.266667,0.8,266.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0,0.3,0.6,38.0
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0,0.15,0.6,39.6
3,*53336/16_170279,2032510420,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0,0.002273,0.8,172.8
4,*5990/17_2703,2025770038,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0,0.024615,1.6,158.8


In [127]:
bbl_evictions_svi_311.to_csv('/content/drive/My Drive/X999/bbl_evictions_311_svi_normal_times.csv', index=False)
# good: not too big, and it has all the necessary information
# ready for analysis
# if this were only for retrieval purposes, we could have kept some of the rows that had NaNs for completeness
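
The "no NaNs or duplicates" claim can be checked after reloading the saved file. The sketch below is an illustration only: it writes a small made-up frame to an in-memory buffer instead of the Drive path, and the explicit `dtype` mapping for `bbl` and `bin` is an assumption about how those identifier columns should be read back.

```python
import io
import pandas as pd

# Hypothetical miniature of the final frame.
toy = pd.DataFrame({
    "bbl": [3037420029, 2051410120],
    "bin": [3083989, 2128849],
    "average_year_eviction_count": [0.8, 0.2],
})

# Round-trip through CSV in memory (stands in for to_csv / read_csv on Drive).
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={"bbl": "int64", "bin": "int64"})

same_shape = reloaded.shape == toy.shape
no_nans = not reloaded.isna().any().any()
no_duplicate_rows = not reloaded.duplicated().any()
print(same_shape, no_nans, no_duplicate_rows)
```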

In [128]:
df = pd.read_csv('/content/drive/My Drive/X999/bbl_evictions_311_svi_normal_times.csv')
df.head()

Unnamed: 0,primary_key,bbl,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,zipcode,ejectment,eviction/legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,nta,year,month_year,geometry,yearbuilt,bldgclass,numfloors,unitsres,ownername,bldgarea,building_type,building_category,is_condo,floor_category,rent_era,architectural_style,economic_period,residential_units_category,is_llc,building_size_category,size_quartile,decade,fips,e_totpop,rpl_theme1,rpl_theme2,rpl_theme3,rpl_theme4,rpl_themes,ep_pov150,ep_unemp,ep_nohsdp,ep_uninsur,ep_age65,ep_age17,ep_disabl,ep_limeng,ep_noveh,ep_crowd,ep_hburd,ep_afam,ep_hisp,ep_asian,ep_aian,ep_nhpi,ep_twomore,ep_otherrace,ep_minrty,ep_white,invalid_zip,svi_quartile,svi_group,air_quality,animal_issues,appliances,building_exterior,doors_windows,electrical_issues,elevator_issues,floors_stairs,general_complaints,graffiti_posting,heat_hot_water,homeless_issues,noise_complaints,other_issues,pest_issues,plumbing_issues,police_matters,public_nuisance,safety_concerns,sanitation_issues,walls_ceilings,total_complaints,average_year_eviction_unit_count,average_year_eviction_count,average_year_eviction_nta_count
0,*308072/22_5865,3037420029,*308072/22,5865,356 MILLER AVE,1 AND BASEMENT,2024-12-04,BROOKLYN,11207,Not an Ejectment,Possession,40.672121,-73.891105,5.0,37.0,1152.0,3083989,East New York,2024,2024-12,POINT (-73.891105 40.672121),1930.0,C0,3.0,3.0,356 MILLER LLC,2700.0,pre-war,walk-up,False,low-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","1930-1945, great depression and WWII",3-5 units,True,small,Q3 (50-75%),1930-1939,11207,96801.0,0.9788,0.914,0.9808,0.9812,0.9839,33.9,11.1,19.1,6.0,13.8,22.5,13.8,5.3,57.8,9.1,44.7,55.9,32.8,1.5,0.0,0.0,2.9,1.6,94.7,5.3,False,Q3,medium-high,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,5.0,19.0,0.266667,0.8,266.0
1,*313639/23_5202,3057940012,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,-74.011883,7.0,38.0,118.0,3143881,Sunset Park East,2024,2024-03,POINT (-74.011883 40.635941),1920.0,B2,2.0,2.0,"A.R.M. PARKING, LLC",1204.0,pre-war,two-family,False,low-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",2-unit,True,very small,Q1 (smallest 25%),1920-1929,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0,0.3,0.6,38.0
2,*324973/22_5308,3057820030,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,-74.017068,7.0,38.0,122.0,3143435,Sunset Park West,2024,2024-08,POINT (-74.017068 40.640008),1907.0,C3,4.0,4.0,"LIN, RONG LAN",4800.0,pre-war,walk-up,False,mid-rise,"Pre-1947, pre-rent-control","1900–1920, Beaux-Arts","Pre-1929, pre-great depression",3-5 units,False,medium-small,Q4 (largest 25%),1900-1909,11220,93008.0,0.9885,0.7635,0.9594,0.9179,0.9662,37.5,7.5,37.9,11.6,13.1,25.4,8.4,40.2,61.7,23.7,43.6,1.7,40.9,40.7,0.4,0.0,1.2,0.2,85.0,15.0,False,Q3,medium-high,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,10.0,0.15,0.6,39.6
3,*53336/16_170279,2032510420,*53336/16,170279,3400 PAUL AVENUE,15D,2018-10-17,BRONX,10468,Not an Ejectment,Possession,40.87719,-73.889569,7.0,11.0,409.0,2015444,Van Cortlandt Village,2018,2018-10,POINT (-73.889569 40.87719),1967.0,D4,21.0,352.0,SCOTT TOWER HOUSING CO INC,381213.0,post-war,condo-co-op,True,high-rise,"1947–1969, rent-control","1951–1980, the International Style, Alternative Modernism","1946–1975, pst war economic boom",100+ units,False,mega,Q4 (largest 25%),1960-1969,10468,81397.0,0.9954,0.9407,0.987,0.947,0.9874,39.5,11.6,28.3,9.2,11.2,26.4,12.2,26.9,71.8,19.2,56.7,15.6,78.0,2.3,0.0,0.0,0.5,0.5,96.9,3.1,False,Q3,medium-high,6.0,0.0,0.0,0.0,9.0,0.0,2.0,5.0,5.0,0.0,23.0,0.0,145.0,0.0,2.0,41.0,0.0,0.0,1.0,5.0,4.0,248.0,0.002273,0.8,172.8
4,*5990/17_2703,2025770038,*5990/17,2703,480 CONCORD AVENUE,4E,2019-08-30,BRONX,10455,Not an Ejectment,Possession,40.811197,-73.90881,1.0,8.0,35.0,2003900,Mott Haven-Port Morris,2019,2019-08,POINT (-73.90881 40.811197),1928.0,D7,6.0,65.0,480 CONCORD AVE OWNER LLC,69102.0,pre-war,elevator,False,mid-rise,"Pre-1947, pre-rent-control","1921–1930, Art Deco Skyscrapers","Pre-1929, pre-great depression",21-100 units,True,very large,Q4 (largest 25%),1920-1929,10455,44380.0,0.9971,0.9909,0.9972,0.9499,0.9971,48.5,12.5,32.1,9.5,10.1,28.1,19.5,17.9,75.1,14.5,51.9,21.1,74.1,1.1,0.0,0.0,1.4,1.0,98.6,1.4,False,Q4 (High),high,0.0,5.0,8.0,0.0,21.0,8.0,34.0,10.0,9.0,0.0,89.0,0.0,78.0,0.0,5.0,41.0,1.0,0.0,3.0,31.0,13.0,356.0,0.024615,1.6,158.8


In [129]:
df[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False)

Unnamed: 0,bin,average_year_eviction_count
46067,2127134,14.8
46066,2127134,14.8
24820,2127134,14.8
18546,2127134,14.8
18545,2127134,14.8
...,...,...
66382,5000930,0.2
66381,5155506,0.2
66378,5001041,0.2
66377,5000779,0.2


In [130]:
evictions_per_building = bbl_evictions_svi_311.groupby('bin').size().reset_index(name='total_evictions')
evictions_per_building

Unnamed: 0,bin,total_evictions
0,1000793,9
1,1000810,2
2,1000816,1
3,1000826,1
4,1000828,8
...,...,...
30371,5172106,1
30372,5172107,1
30373,5172108,1
30374,5172109,1


In [134]:
evictions_per_building['average_year_eviction_count'] = evictions_per_building['total_evictions'] / 5
evictions_per_building[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False)

Unnamed: 0,bin,average_year_eviction_count
13457,2127134,14.8
12866,2113629,13.4
22089,3326600,13.2
12451,2093973,12.2
21613,3253907,9.8
...,...,...
10,1001014,0.2
7,1000844,0.2
6,1000831,0.2
3,1000826,0.2


In [132]:
# merge the average_year_eviction_count back to the evictions_df based on bin
# bbl_evictions_svi_311 = bbl_evictions_svi_311.merge(evictions_per_building[['bin', 'average_year_eviction_count']], on='bin', how='left')
# bbl_evictions_svi_311[['bin', 'average_year_eviction_count']].sort_values('average_year_eviction_count', ascending=False)

In [136]:
# bbl_evictions_svi_311['average_year_eviction_unit_count'] = bbl_evictions_svi_311.average_year_eviction_count / bbl_evictions_svi_311.unitsres
# bbl_evictions_svi_311[['bin', 'average_year_eviction_count', 'unitsres','average_year_eviction_unit_count']].sort_values('average_year_eviction_unit_count', ascending=False).head()