# **Introduction**

This notebook collects statistical fun facts that were left out of the previous analysis, gathered here for the write-up. We will update it periodically whenever we need extra information. Where possible, we will fold these facts back into the original analysis for better flow; the ones that stand alone fine will stay in this file.

In [None]:
# !pip install geopandas folium matplotlib seaborn scipy
# !pip install esda
# !pip install splot
# for Google Colab, had to reinstall some packages.

In [None]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt
import scipy

from sklearn.cluster import DBSCAN
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# visualization
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import seaborn as sns
import folium
from folium.plugins import HeatMap
from folium import Marker
from folium.plugins import MarkerCluster
import plotly.express as px
import plotly.io as pio

# system and utility
import warnings
import os
import io
from IPython.display import IFrame
from google.colab import files

# suppress warnings
warnings.filterwarnings('ignore')

# inline
%matplotlib inline

In [None]:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 1000)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# paths to the cleaned evictions data saved on my drive
link1 = '/content/drive/My Drive/X999/evictions_pre_post_covid.csv'
link2 = '/content/drive/My Drive/X999/evictions_covid.csv'

In [None]:
normal_df = pd.read_csv(link1)
covid_df = pd.read_csv(link2)

In [None]:
brooklyn = normal_df[normal_df['borough'] == 'BROOKLYN']
manhattan = normal_df[normal_df['borough'] == 'MANHATTAN']
queens = normal_df[normal_df['borough'] == 'QUEENS']
bronx = normal_df[normal_df['borough'] == 'BRONX']
staten_island = normal_df[normal_df['borough'] == 'STATEN ISLAND']

In [None]:
brooklyn.shape, manhattan.shape, queens.shape, bronx.shape, staten_island.shape

((21713, 22), (12060, 22), (13725, 22), (26701, 22), (2519, 22))

## **What are the average eviction counts per building per year in different boroughs?**

In [None]:
brooklyn.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,21713.0
mean,1.087118
std,3.356285
min,0.2
25%,0.2
50%,0.4
75%,0.8
max,35.6


In [None]:
manhattan.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,12060.0
mean,0.974411
std,1.517442
min,0.2
25%,0.2
50%,0.6
75%,1.0
max,13.8


In [None]:
staten_island.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,2519.0
mean,0.836284
std,1.197151
min,0.2
25%,0.2
50%,0.4
75%,0.6
max,5.6


In [None]:
queens.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,13725.0
mean,1.137647
std,2.522823
min,0.2
25%,0.2
50%,0.4
75%,0.8
max,20.6


In [None]:
bronx.average_year_eviction_count.describe()

Unnamed: 0,average_year_eviction_count
count,26701.0
mean,1.576795
std,2.553449
min,0.2
25%,0.4
50%,1.0
75%,1.8
max,27.8
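
The per-borough `describe()` calls above could also be collapsed into a single table. This is only a quick sketch, assuming the same `borough` and `average_year_eviction_count` columns; the helper name `describe_by_borough` and the tiny `toy` frame are made up for illustration and are not the real data.

```python
import pandas as pd

def describe_by_borough(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of summary statistics per borough."""
    return df.groupby("borough")["average_year_eviction_count"].describe()

# hypothetical toy data, only to show the shape of the result
toy = pd.DataFrame({
    "borough": ["BRONX", "BRONX", "QUEENS", "QUEENS"],
    "average_year_eviction_count": [0.4, 1.6, 0.2, 0.6],
})
summary = describe_by_borough(toy)
print(summary)
```

On the real data this would be `describe_by_borough(normal_df)`, which puts all five boroughs side by side for easier comparison.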


In [None]:
borough_dict = {'Brooklyn': brooklyn.average_year_eviction_count.mean(), 'Manhattan': manhattan.average_year_eviction_count.mean(),
                'Queens': queens.average_year_eviction_count.mean(), 'Bronx': bronx.average_year_eviction_count.mean(),
                'Staten Island': staten_island.average_year_eviction_count.mean()}

In [None]:
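# sort borough names by their mean, highest first
# (a plain sorted(borough_dict) would order the names alphabetically)
sorted_list = sorted(borough_dict, key=borough_dict.get, reverse=True)

In [None]:
sorted_list

['Bronx', 'Queens', 'Brooklyn', 'Manhattan', 'Staten Island']

### **In short, the Bronx has the highest average eviction count per building per year (about 1.58), followed by Queens (1.14), Brooklyn (1.09), Manhattan (0.97), and Staten Island (0.84).**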
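
To rank boroughs by the value of the mean rather than by name, a `groupby` sorted on its values is one option. A minimal sketch, again assuming the `borough` and `average_year_eviction_count` columns; `rank_boroughs_by_mean` and the `toy` frame are hypothetical and only there to make the example runnable.

```python
import pandas as pd

def rank_boroughs_by_mean(df: pd.DataFrame) -> pd.Series:
    """Mean eviction count per borough, highest first."""
    return (
        df.groupby("borough")["average_year_eviction_count"]
        .mean()
        .sort_values(ascending=False)
    )

# hypothetical toy data, not the real evictions file
toy = pd.DataFrame({
    "borough": ["BROOKLYN", "BRONX", "QUEENS", "BRONX", "BROOKLYN"],
    "average_year_eviction_count": [0.2, 1.0, 1.0, 2.0, 0.4],
})
ranking = rank_boroughs_by_mean(toy)
print(ranking.index.tolist())
```

This keeps the borough name and its mean together in one Series, so the order and the numbers behind it can be read off at once.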

## **Check out the exact total records before 2025**

In [None]:
link3 = '/content/drive/My Drive/X999/Evictions.csv'
raw_evictions = pd.read_csv(link3)

In [None]:
raw_evictions.shape

(104457, 20)

In [None]:
raw_evictions.columns

Index(['Court Index Number', 'Docket Number ', 'Eviction Address',
       'Eviction Apartment Number', 'Executed Date', 'Marshal First Name',
       'Marshal Last Name', 'Residential/Commercial', 'BOROUGH',
       'Eviction Postcode', 'Ejectment', 'Eviction/Legal Possession',
       'Latitude', 'Longitude', 'Community Board', 'Council District',
       'Census Tract', 'BIN', 'BBL', 'NTA'],
      dtype='object')

In [None]:
raw_evictions['Executed Date'] = pd.to_datetime(raw_evictions['Executed Date'], errors='coerce')

In [None]:
raw_evictions['year'] = raw_evictions['Executed Date'].dt.year
# need to explicitly declare year

In [None]:
raw_evictions.year.value_counts()

Executed Date,count
2017,22522
2018,21830
2019,18715
2024,16755
2023,13447
2022,5067
2020,3422
2025,2431
2021,268
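
`value_counts()` orders by frequency, which makes the COVID dip harder to spot. If a chronological view is wanted, sorting by the index is one way to get it. A small sketch; `yearly_counts` is a hypothetical helper and the dates below are made up.

```python
import pandas as pd

def yearly_counts(dates: pd.Series) -> pd.Series:
    """Count records per year, ordered by year instead of by frequency."""
    years = pd.to_datetime(dates, errors="coerce").dt.year
    return years.value_counts().sort_index()

# hypothetical dates, only for illustration
toy_dates = pd.Series(["2019-07-01", "2017-01-05", "2017-03-10"])
counts = yearly_counts(toy_dates)
print(counts)
```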


In [None]:
raw_evictions.columns

Index(['Court Index Number', 'Docket Number ', 'Eviction Address',
       'Eviction Apartment Number', 'Executed Date', 'Marshal First Name',
       'Marshal Last Name', 'Residential/Commercial', 'BOROUGH',
       'Eviction Postcode', 'Ejectment', 'Eviction/Legal Possession',
       'Latitude', 'Longitude', 'Community Board', 'Council District',
       'Census Tract', 'BIN', 'BBL', 'NTA', 'year'],
      dtype='object')

In [None]:
raw_evictions = raw_evictions[raw_evictions['year'] != 2025]

In [None]:
raw_evictions.shape
# used in write up

(102026, 21)
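
One thing to keep in mind: with `errors='coerce'`, any date that fails to parse becomes `NaT`, and a `!= 2025` filter keeps those rows because a missing year never equals 2025. The yearly counts above add up to the full 104,457 rows, so nothing was coerced here, but a stricter "before 2025" filter could look like the sketch below. The helper `records_before` and the toy rows are hypothetical.

```python
import pandas as pd

def records_before(df: pd.DataFrame, cutoff_year: int,
                   date_col: str = "Executed Date"):
    """Keep rows dated strictly before cutoff_year.

    Also returns how many dates could not be parsed, so they can be checked.
    """
    dates = pd.to_datetime(df[date_col], errors="coerce")
    n_unparsed = int(dates.isna().sum())
    # comparisons against a missing year are False, so unparsed rows are dropped
    kept = df[dates.dt.year < cutoff_year]
    return kept, n_unparsed

# hypothetical rows, only for illustration
toy = pd.DataFrame({"Executed Date": ["2024-12-31", "2025-01-02", "not a date"]})
kept, n_unparsed = records_before(toy, 2025)
print(kept.shape, n_unparsed)
```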

# **Clustering Patterns**

## **SVI Stuff**