## Introduction

This notebook asks two questions:

1) Which neighborhoods are most prone to evictions? 
2) What drives evictions in those neighborhoods: societal factors, policy, individuals, corporate investment, or historical reasons?

One hypothesis we would like to test (with additional literature): 

1) The neighborhoods that saw the most evictions also went through drastic economic improvement or business investment.

In [2]:
# !pip install geopandas folium matplotlib seaborn scipy
# !pip install esda
# !pip install splot
# For Google Colab, some packages had to be reinstalled.

In [None]:
# !pip install geopandas folium matplotlib seaborn scipy esda splot

In [3]:
import pandas as pd
import geopandas as gpd
import numpy as np
import datetime as dt
import scipy

from sklearn.cluster import DBSCAN
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# visualization
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import seaborn as sns
import folium
from folium.plugins import HeatMap
from folium import Marker
from folium.plugins import MarkerCluster
import plotly.express as px
import plotly.io as pio

# spatial statistics
from esda.moran import Moran
from esda import Moran_Local
from esda.getisord import G_Local
from shapely.geometry import Point
from libpysal.weights import Queen, Rook

# system and utility
import warnings
import os
import io
from IPython.display import IFrame
from google.colab import files

from splot.esda import moran_scatterplot

# suppress warnings
warnings.filterwarnings('ignore')

# inline
%matplotlib inline

# Part 1: Get the data

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
# data source:
file_path = '/content/drive/My Drive/X999/top_evictions_bbl_merged_df.csv'

In [6]:
# data source:
file_path2 = '/content/drive/My Drive/X999/evictions_cleaned_lisa.csv'

In [7]:
# data source:
file_path3 = '/content/drive/My Drive/X999/BBL.csv'

In [8]:
# data source:
file_path4 = '/content/drive/My Drive/X999/evictions_df_cleaned.csv'
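The four paths above assume the notebook runs in Google Colab with Drive mounted. A minimal sketch of a fallback for running elsewhere follows; the local `data` folder and the names `DATA_DIR`, `csv_names`, and `csv_paths` are assumptions made for illustration and do not appear in the original notebook.

```python
from pathlib import Path

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    drive.mount('/content/drive')
    DATA_DIR = Path('/content/drive/My Drive/X999')
except ImportError:
    # Hypothetical local folder holding copies of the same CSV files.
    DATA_DIR = Path('data')

csv_names = [
    'top_evictions_bbl_merged_df.csv',
    'evictions_cleaned_lisa.csv',
    'BBL.csv',
    'evictions_df_cleaned.csv',
]

# Map each file name to its full path under whichever directory was chosen.
csv_paths = {name: DATA_DIR / name for name in csv_names}
```

With this in place, a read would look like `pd.read_csv(csv_paths['BBL.csv'])` in either environment.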

In [9]:
evictions_df_raw = pd.read_csv(file_path4)

In [10]:
evictions_df = evictions_df_raw.copy()

In [11]:
bbl_df_raw = pd.read_csv(file_path3)

In [12]:
bbl_df = bbl_df_raw.copy()

In [13]:
evictions_cleaned_raw = pd.read_csv(file_path2)
evictions_cleaned = evictions_cleaned_raw.copy()

In [14]:
top_evictions_bbl_merged_df_raw = pd.read_csv(file_path)

In [15]:
top_evictions_bbl_merged_df = top_evictions_bbl_merged_df_raw.copy()

In [16]:
top_evictions_bbl_merged_df.head(2)

Unnamed: 0,bbl,average_year_eviction_count,borough,block,lot,community board,census tract 2010,ownername,ownertype,lotarea,...,unitstotal,assessland,assesstot,landuse,yearbuilt,latitude,longitude,zonedist1,zonedist2,postcode
0,2028820229,36.285714,BX,2882,229,205.0,53.0,"RIVER PARK BRONX APARTMENTS, INC.",X,856800.0,...,1660.0,11566800.0,55282050.0,4.0,1973.0,40.85187,-73.922649,M2-1,,10453.0
1,2051410120,27.428571,BX,5141,120,210.0,462.01,RIVERBAY CORPORATION,X,5048550.0,...,10948.0,25285500.0,218048850.0,3.0,1969.0,40.875013,-73.828362,R6,,10475.0


In [17]:
top_evictions_bbl_merged_df.columns

Index(['bbl', 'average_year_eviction_count', 'borough', 'block', 'lot',
       'community board', 'census tract 2010', 'ownername', 'ownertype',
       'lotarea', 'bldgarea', 'numbldgs', 'numfloors', 'unitsres',
       'unitstotal', 'assessland', 'assesstot', 'landuse', 'yearbuilt',
       'latitude', 'longitude', 'zonedist1', 'zonedist2', 'postcode'],
      dtype='object')

In [18]:
evictions_cleaned.head(2)

Unnamed: 0,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,borough,eviction_postcode,ejectment,eviction/legal_possession,latitude,...,eviction_count,year,average_year_eviction_count,cluster,cluster_k,same_cluster,lisa_cluster_rook,lisa_pvalue_rook,lisa_cluster_queen,lisa_pvalue_queen
0,*313639/23,5202,710 61ST STREET,2ND FLOOR,2024-03-04,BROOKLYN,11220,Not an Ejectment,Possession,40.635941,...,3,2024,3.0,0,0,True,4,0.241,4,0.24
1,*324973/22,5308,462 60TH STREET,FOURTH FLOOR APT AKA,2024-08-13,BROOKLYN,11220,Not an Ejectment,Possession,40.640008,...,3,2024,3.0,0,0,True,4,0.201,4,0.211


In [19]:
evictions_cleaned.columns

Index(['court_index_number', 'docket_number', 'eviction_address',
       'eviction_apartment_number', 'executed_date', 'borough',
       'eviction_postcode', 'ejectment', 'eviction/legal_possession',
       'latitude', 'longitude', 'community_board', 'council_district',
       'census_tract', 'bin', 'bbl', 'nta', 'geometry', 'eviction_count',
       'year', 'average_year_eviction_count', 'cluster', 'cluster_k',
       'same_cluster', 'lisa_cluster_rook', 'lisa_pvalue_rook',
       'lisa_cluster_queen', 'lisa_pvalue_queen'],
      dtype='object')

# Part 2: Top 20 buildings with the most evictions, and their boroughs

Borough is too broad a unit, so we want NTA (Neighborhood Tabulation Area, i.e. neighborhood names) instead.

In [20]:
top_borough_counts = top_evictions_bbl_merged_df['borough'].value_counts()
top_borough_counts


borough,count
BX,11
QN,3
MN,2
BK,2
SI,1


In [21]:
borough_counts = evictions_cleaned['borough'].value_counts()
borough_counts

borough,count
BRONX,26659
BROOKLYN,22184
QUEENS,13481
MANHATTAN,11570
STATEN ISLAND,2590


In [23]:
neighborhood_counts = evictions_cleaned['nta'].value_counts()
neighborhood_counts

nta,count
East New York,1737
Crown Heights North,1714
Central Harlem North-Polo Grounds,1588
Bedford Park-Fordham North,1534
East Concourse-Concourse Village,1530
...,...
Windsor Terrace,40
Douglas Manor-Douglaston-Little Neck,38
Rossville-Woodrow,36
Glen Oaks-Floral Park-New Hyde Park,33


In [24]:
bbl_df.postcode

Unnamed: 0,postcode
0,10454.0
1,10007.0
2,11223.0
3,11379.0
4,10018.0
...,...
858184,11433.0
858185,11433.0
858186,11214.0
858187,11214.0


In [25]:
len(evictions_cleaned['nta'].unique()), len(bbl_df.postcode.unique())

(190, 216)

#### Short summary:

The number of distinct NTAs (190) and the number of distinct postcodes (216) are similar, so the two units offer comparable granularity. We therefore use NTA.
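One caveat about the comparison above: `len(series.unique())` counts a missing value as its own category, while `nunique()` excludes it by default, so the two figures can each be off by one if either column contains NaN. A minimal sketch on made-up values (the toy series is illustrative and is not taken from the eviction files):

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative data only).
toy = pd.Series(['East New York', 'Corona', np.nan, 'Corona'])

n_with_nan = len(toy.unique())   # NaN is counted as a distinct value
n_without_nan = toy.nunique()    # NaN is excluded by default

print(n_with_nan, n_without_nan)
```

Whether this matters for `nta` and `postcode` depends on whether those columns actually contain missing values, which the cells above do not show.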

# Part 3: Get the 20 NYC neighborhoods with the most evictions

### Goal: get the annual eviction counts per NTA and the annual eviction counts per BBL within each NTA, and keep the borough info. To do this, we start with the `evictions_cleaned` dataframe, group by `bbl` and `year`, define new columns, and merge the resulting dataframes.

In [30]:
# executed_date was not converted to datetime earlier, so do it here
evictions_cleaned['executed_date'] = pd.to_datetime(evictions_cleaned['executed_date'])

In [31]:
# Steps:
# 1. group by bbl and year, and count the evictions per building per year
# 2. group by bbl and average those yearly counts for each building
# 3. merge the nta information back into the dataframe

evictions_cleaned['year'] = evictions_cleaned['executed_date'].dt.year
eviction_counts = evictions_cleaned.groupby(['bbl', 'year']).size().reset_index(name='eviction_count')
yearly_avg_evictions_per_bbl = eviction_counts.groupby('bbl')['eviction_count'].mean().reset_index(name='average_year_eviction_count')
yearly_avg_evictions_per_bbl = yearly_avg_evictions_per_bbl.merge(evictions_cleaned[['bbl', 'nta']].drop_duplicates(), on='bbl', how='left')
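One property of the two-step groupby above is worth spelling out: `groupby(['bbl', 'year']).size()` produces no row for a year in which a building had zero evictions, so the per-building mean is taken only over years with at least one eviction. The sketch below shows this on made-up records and contrasts it with dividing by the number of years in the data; the toy values and the `avg_all_years` alternative are assumptions for illustration, not the notebook's method.

```python
import pandas as pd

# Toy records: building 1 has 2 evictions in 2020 and 4 in 2021;
# building 2 has a single eviction in 2020 and none in 2021.
toy = pd.DataFrame({
    'bbl':  [1, 1, 1, 1, 1, 1, 2],
    'year': [2020, 2020, 2021, 2021, 2021, 2021, 2020],
})

# Same two steps as the notebook cell above.
counts = toy.groupby(['bbl', 'year']).size().reset_index(name='eviction_count')
avg_active_years = counts.groupby('bbl')['eviction_count'].mean()

# Alternative (assumption): divide total evictions by every year observed in the data.
n_years = toy['year'].nunique()
avg_all_years = toy.groupby('bbl').size() / n_years

print(avg_active_years)
print(avg_all_years)
```

Building 2 averages 1.0 under the first definition and 0.5 under the second, so the choice affects buildings with sporadic evictions most.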

In [32]:
# 1. group by nta and average the per-building yearly averages within each neighborhood
# 2. count the total evictions per nta per year
# 3. average those yearly totals for each nta (overall count, not per bbl)

average_evictions_per_nta_bbl = yearly_avg_evictions_per_bbl.groupby('nta')['average_year_eviction_count'].mean().reset_index(name='average_eviction_count_per_bbl_in_nta')
total_yearly_evictions_per_nta = evictions_cleaned.groupby(['nta', 'year']).size().reset_index(name='total_eviction_count_per_year')
average_yearly_evictions_per_nta = total_yearly_evictions_per_nta.groupby('nta')['total_eviction_count_per_year'].mean().reset_index(name='average_eviction_count_per_year_nta')


In [33]:
# build an nta-to-borough lookup so borough information can be merged back
nta_borough = evictions_cleaned[['nta', 'borough']].drop_duplicates()

In [34]:
# merge all the results together
merged_eviction_data = average_evictions_per_nta_bbl.merge(average_yearly_evictions_per_nta, on='nta', how='left')
merged_eviction_data = merged_eviction_data.merge(nta_borough, on='nta', how='left')
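Because `nta_borough` holds one row per distinct (`nta`, `borough`) pair, any NTA recorded under more than one borough label is duplicated by the left merge. The borough counts for the full table in Part 3 sum to 196, more than the 190 unique `nta` values counted earlier, which suggests this may be happening here. A minimal sketch of how to detect it, on made-up labels (`toy_lookup` and `toy_metrics` are hypothetical stand-ins):

```python
import pandas as pd

# Toy lookup in which neighborhood 'B' appears under two boroughs.
toy_lookup = pd.DataFrame({
    'nta': ['A', 'B', 'B'],
    'borough': ['BRONX', 'QUEENS', 'BROOKLYN'],
})
toy_metrics = pd.DataFrame({'nta': ['A', 'B'], 'metric': [1.0, 2.0]})

# Rows whose nta occurs more than once in the lookup.
ambiguous = toy_lookup[toy_lookup.duplicated('nta', keep=False)]

# The left merge repeats the metrics row for 'B' once per borough label.
merged = toy_metrics.merge(toy_lookup, on='nta', how='left')

print(ambiguous)
print(len(toy_metrics), '->', len(merged))
```

Running the same `duplicated('nta', keep=False)` check on `nta_borough` would show which neighborhoods, if any, are affected.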


In [48]:
# sort by the average eviction count per bbl in the nta,
# breaking ties by the average eviction count per year for the nta

sorted_merged_eviction_data = merged_eviction_data.sort_values(by=['average_eviction_count_per_bbl_in_nta', 'average_eviction_count_per_year_nta'], ascending=False).reset_index(drop=True)


In [49]:
sorted_merged_eviction_data.head(20)

# the top 20 neighborhoods

Unnamed: 0,nta,average_eviction_count_per_bbl_in_nta,average_eviction_count_per_year_nta,borough
0,Stuyvesant Town-Cooper Village,5.5,9.166667,MANHATTAN
1,Starrett City,4.602976,27.714286,BROOKLYN
2,Parkchester,2.938384,72.428571,BRONX
3,park-cemetery-etc-Bronx,2.5625,7.666667,BRONX
4,Co-op City,1.795908,52.142857,BRONX
5,Fresh Meadows-Utopia,1.779206,15.714286,QUEENS
6,Corona,1.642485,93.285714,QUEENS
7,University Heights-Morris Heights,1.61736,193.142857,BRONX
8,Oakwood-Oakwood Beach,1.603535,18.0,STATEN ISLAND
9,Seagate-Coney Island,1.584612,54.0,BROOKLYN
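The ranking above is driven by the per-building average, which favors neighborhoods made up of a few very large properties. As a rough illustration, the sketch below re-sorts only the ten rows printed above by the neighborhood-level yearly average; the values are copied from that output, the shortened column names are ours, and this is not a re-ranking of all neighborhoods.

```python
import pandas as pd

# The ten rows shown in the output above (values copied as printed).
top10 = pd.DataFrame({
    'nta': [
        'Stuyvesant Town-Cooper Village', 'Starrett City', 'Parkchester',
        'park-cemetery-etc-Bronx', 'Co-op City', 'Fresh Meadows-Utopia',
        'Corona', 'University Heights-Morris Heights',
        'Oakwood-Oakwood Beach', 'Seagate-Coney Island',
    ],
    'per_bbl': [5.5, 4.602976, 2.938384, 2.5625, 1.795908,
                1.779206, 1.642485, 1.61736, 1.603535, 1.584612],
    'per_year_nta': [9.166667, 27.714286, 72.428571, 7.666667, 52.142857,
                     15.714286, 93.285714, 193.142857, 18.0, 54.0],
})

# Re-rank the same rows by total yearly evictions in the neighborhood.
by_total = top10.sort_values('per_year_nta', ascending=False).reset_index(drop=True)

print(by_total[['nta', 'per_year_nta']])
```

Among these ten rows, the neighborhood ranked first by per-building average falls near the bottom by yearly total, so "most prone to evictions" depends on which measure is used.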


In [50]:
sorted_merged_eviction_data.head(20).borough.value_counts()

borough,count
BRONX,9
QUEENS,5
BROOKLYN,3
STATEN ISLAND,2
MANHATTAN,1


In [51]:
sorted_merged_eviction_data.head(20).nta.to_list()

['Stuyvesant Town-Cooper Village',
 'Starrett City',
 'Parkchester',
 'park-cemetery-etc-Bronx',
 'Co-op City',
 'Fresh Meadows-Utopia',
 'Corona',
 'University Heights-Morris Heights',
 'Oakwood-Oakwood Beach',
 'Seagate-Coney Island',
 'Springfield Gardens North',
 'West Brighton',
 'West Concourse',
 'Bronxdale',
 'Claremont-Bathgate',
 'Hammels-Arverne-Edgemere',
 'Grymes Hill-Clifton-Fox Hills',
 'Kew Gardens Hills',
 'Mount Hope',
 'Van Cortlandt Village']

In [52]:
sorted_merged_eviction_data.borough.value_counts()

borough,count
QUEENS,57
BROOKLYN,51
BRONX,40
MANHATTAN,30
STATEN ISLAND,18
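Raw counts in the top 20 can reflect how many neighborhoods each borough contributes to the table in the first place. A minimal sketch that divides the top-20 counts by the per-borough row counts, using the numbers printed in the two outputs above (the `share` name is ours):

```python
import pandas as pd

# Counts copied from the two value_counts() outputs above.
top20 = pd.Series({'BRONX': 9, 'QUEENS': 5, 'BROOKLYN': 3,
                   'STATEN ISLAND': 2, 'MANHATTAN': 1})
all_rows = pd.Series({'QUEENS': 57, 'BROOKLYN': 51, 'BRONX': 40,
                      'MANHATTAN': 30, 'STATEN ISLAND': 18})

# Fraction of each borough's rows that land in the top 20 (aligned on index).
share = (top20 / all_rows).sort_values(ascending=False)

print(share.round(3))
```

On these figures the Bronx still leads (9 of 40 rows), while Staten Island moves ahead of Queens once the number of rows per borough is taken into account.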


# Part 4: Answers to the questions and hypothesis


1) Which neighborhoods are most prone to evictions? 

The top 20, ranked by average yearly eviction count per building (BBL) within each NTA, are:
- Stuyvesant Town-Cooper Village
- Starrett City
- Parkchester
- park-cemetery-etc-Bronx
- Co-op City
- Fresh Meadows-Utopia
- Corona
- University Heights-Morris Heights
- Oakwood-Oakwood Beach
- Seagate-Coney Island
- Springfield Gardens North
- West Brighton
- West Concourse
- Bronxdale
- Claremont-Bathgate
- Hammels-Arverne-Edgemere
- Grymes Hill-Clifton-Fox Hills
- Kew Gardens Hills
- Mount Hope
- Van Cortlandt Village

Most are in the Bronx (9) and Queens (5); the rest are in Brooklyn (3), Staten Island (2), and Manhattan (1).

2) What drives evictions in those neighborhoods: societal factors, policy, individuals, corporate investment, or historical reasons?

Sources:
https://www.nyc.gov/office-of-the-mayor/news/638-24/mayor-adams-celebrates-city-council-approval-plan-create-approximately-7-000-new-homes-and

https://furmancenter.org/stateofthecity/view/the-geography-of-new-housing


One hypothesis we would like to test (with additional literature): 

1) The neighborhoods that saw the most evictions also went through drastic economic improvement or business investment.
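One way such a hypothesis could be checked, once a neighborhood-level investment measure is available, is a rank correlation between eviction intensity and that measure. The sketch below computes Spearman's rho as the Pearson correlation of ranks. Everything in it is a placeholder: the `investment_index` column is hypothetical, no such column exists in this notebook's data, and the numbers are made up purely to show the mechanics.

```python
import pandas as pd

# Made-up placeholder values; NOT results from the eviction data.
toy = pd.DataFrame({
    'nta': ['A', 'B', 'C', 'D', 'E'],
    'avg_evictions': [1.0, 2.0, 3.0, 4.0, 5.0],
    'investment_index': [2.0, 1.0, 4.0, 3.0, 5.0],  # hypothetical measure
})

# Spearman's rho equals the Pearson correlation of the rank-transformed columns.
ranks = toy[['avg_evictions', 'investment_index']].rank()
spearman_rho = ranks['avg_evictions'].corr(ranks['investment_index'])

print(round(spearman_rho, 3))
```

A correlation of this kind would only describe association across neighborhoods; it would not by itself show that investment causes evictions or the reverse.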

## Additionally, can we get the list of BBLs and their owner names? (This will need additional resources later.)

In [53]:
top_20_neighborhoods = sorted_merged_eviction_data.head(20)['nta']
# get the distinct bbls located in those neighborhoods
top_20_neighborhoods_bbls = evictions_cleaned[evictions_cleaned['nta'].isin(top_20_neighborhoods)][['bbl', 'nta']].drop_duplicates()


In [54]:
# go back to the BBL.csv data (bbl_df)
# and merge on bbl to get the owner names
top_20_neighborhoods_bbls_owners = top_20_neighborhoods_bbls.merge(bbl_df[['bbl', 'ownername']], on='bbl', how='left')
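The `bbl` key shows up as a float in the output of this merge (for example `2.032510e+09`). A minimal sketch of casting the key to an integer dtype on both sides and checking how many rows found an owner follows. The BBL values and owner names are made up, and `validate` and `indicator` are standard `DataFrame.merge` options used here as a suggestion rather than something the notebook does.

```python
import pandas as pd

# Made-up keys and names for illustration only.
left = pd.DataFrame({'bbl': [2032510001.0, 5028670002.0], 'nta': ['X', 'Y']})
right = pd.DataFrame({'bbl': [2032510001, 4160010003],
                      'ownername': ['OWNER ONE', 'OWNER TWO']})

# Use the same integer dtype on both sides so keys compare and print as whole numbers.
left['bbl'] = left['bbl'].astype('Int64')
right['bbl'] = right['bbl'].astype('Int64')

# validate raises if a bbl is repeated on the right; indicator adds a '_merge' column.
checked = left.merge(right, on='bbl', how='left',
                     validate='many_to_one', indicator=True)
match_counts = checked['_merge'].value_counts()

print(checked)
print(match_counts)
```

Rows marked `left_only` are buildings with no owner record in the lookup, which is useful to know before interpreting the owner list.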

In [57]:
top_20_neighborhoods_bbls_owners.shape

(2384, 3)

In [56]:
top_20_neighborhoods_bbls_owners

Unnamed: 0,bbl,nta,ownername
0,2.032510e+09,Van Cortlandt Village,SCOTT TOWER HOUSING CO INC
1,2.030500e+09,Claremont-Bathgate,2235 BASSFORD PARTNERS LLC
2,2.028910e+09,Mount Hope,J R BRONZE CORP
3,2.032560e+09,Van Cortlandt Village,2753 EAST KINGSBRIDGE TERRACE INC.
4,4.160010e+09,Hammels-Arverne-Edgemere,NYC HOUSING AUTHORITY
...,...,...,...
2379,5.028670e+09,Grymes Hill-Clifton-Fox Hills,"YANG, HSIO-SUNG"
2380,5.042650e+09,Oakwood-Oakwood Beach,"SONITIS, DINO"
2381,5.006150e+09,Grymes Hill-Clifton-Fox Hills,MAXIE CT VENTURES LLC
2382,5.028820e+09,Grymes Hill-Clifton-Fox Hills,RPM CONCORD HOLDING CORP.
