---
format: 
  html:
    toc: true
    page-layout: full
execute:
    warning: false
    echo: true
    eval: true
---

## **Clustering Risk Factors: Choosing 3 Clusters**

***


While the **Elbow Method** suggested that **2 clusters** might be optimal based on the inertia plot, we chose **3 clusters** for the following reasons:

1. **Avoiding Underclassification**: Limiting the clustering to two groups risks oversimplifying the data and overlooking important distinctions between neighborhoods. A third cluster lets us better reflect the diversity of risk factors present in the dataset.

2. **Nuanced Insights**: Urban environments like Chicago are marked by diverse socioeconomic, demographic, and spatial characteristics. A third cluster captures more of this variability and gives a more nuanced picture of crime, socioeconomic conditions, and other risk factors across different areas of the city.

3. **Flexibility in Analysis**: Three clusters give more flexibility in subsequent analysis and interpretation. The extra group can expose trends or local conditions that are relevant for targeted policy-making or community development and that a two-cluster solution could mask.

4. **Consensus Among Indices**: Although the Elbow Method and other indices pointed to 2 clusters as the best choice, 6 indices still supported 3 clusters. This agreement among several indices suggests that three clusters remain a valid and effective choice for segmenting the risk factors.

5.	**Contextual Relevance**: Our decision to select 3 clusters was informed by the context of Chicago’s neighborhoods. There are well-documented differences in socioeconomic factors, crime rates, and housing characteristics across different parts of the city. Given this, choosing 3 clusters aligns with our understanding of the urban landscape, where more than two categories might be necessary to capture the full range of neighborhood variations.

6. **Exploratory Nature of Clustering**: Clustering is inherently exploratory, and it is common practice to test several cluster counts and compare the results. Moving beyond the two-cluster solution lets us examine an additional grouping that may reveal patterns in the spatial distribution of risk factors.
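
The comparison of cluster counts discussed above can be sketched with a few internal validity indices from scikit-learn. This is a minimal illustration, not the procedure used in this analysis: the data are synthetic points from `make_blobs`, the helper `score_k` is hypothetical, and the three indices shown are only examples of the kind of index that can "vote" for a cluster count.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for projected x/y coordinates (assumption, not the Chicago data)
points, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)


def score_k(data, k, seed=42):
    """Fit k-means with k clusters and return three internal validity indices."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(data)
    return {
        "silhouette": float(silhouette_score(data, labels)),  # higher is better
        "calinski_harabasz": float(calinski_harabasz_score(data, labels)),  # higher is better
        "davies_bouldin": float(davies_bouldin_score(data, labels)),  # lower is better
    }


# These indices need at least 2 clusters, so k starts at 2
scores = {k: score_k(points, k) for k in range(2, 7)}

# Each index "votes" for the cluster count it rates best
votes = {
    "silhouette": max(scores, key=lambda k: scores[k]["silhouette"]),
    "calinski_harabasz": max(scores, key=lambda k: scores[k]["calinski_harabasz"]),
    "davies_bouldin": min(scores, key=lambda k: scores[k]["davies_bouldin"]),
}

for k, s in scores.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
print("votes:", votes)
```

When the indices disagree, as they did here between 2 and 3 clusters, the tally is best read as evidence to weigh alongside domain knowledge rather than as a decision rule.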


```{python}
#| code-fold: true

import geopandas as gpd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from geopy.distance import geodesic
import numpy as np

# Reprojection to EPSG:26971 (NAD83 / Illinois East), left disabled here
# variable_net = variable_net.to_crs(epsg=26971)
# Assault21 = Assault21.to_crs(epsg=26971)

# Use the point coordinates as the clustering features
variable_net['x'] = variable_net.geometry.x
variable_net['y'] = variable_net.geometry.y
features = variable_net[['x', 'y']]

# Fit k-means for k = 1..10 and record the inertia
# (within-cluster sum of squared distances)
k_values = range(1, 11)
inertia = []
for k in k_values:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(features)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(k_values, inertia, marker='o', color='#777181', linewidth=2.5)
ax.grid(axis='y', linestyle='-', alpha=0.1)
ax.set_xlabel('Number of Clusters')
ax.set_ylabel('Inertia')
ax.set_title('Elbow Method')
plt.xticks(rotation=0, ha='center', fontsize=8)
plt.yticks(fontsize=8)
ax.set_facecolor('white')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('grey')
ax.spines['bottom'].set_color('grey')
plt.show()
```

![](../images/elbow.jpeg){width=75%}
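
Reading the elbow off the plot is a visual judgment. One simple way to make it explicit is to look for the largest discrete second difference of the inertia curve, which marks the sharpest bend. The sketch below is a heuristic under stated assumptions: the function name `elbow_by_second_difference` is hypothetical, the inertia values are made up for illustration and are not the results of this analysis, and the cluster counts are assumed to be evenly spaced.

```python
def elbow_by_second_difference(ks, inertia):
    """Return (best_k, bends), where best_k is the k at the sharpest bend.

    The bend at position i is inertia[i-1] - 2*inertia[i] + inertia[i+1].
    The first and last k have no bend value because they lack a neighbor.
    """
    if len(ks) != len(inertia) or len(ks) < 3:
        raise ValueError("need at least 3 matching (k, inertia) pairs")
    bends = {
        ks[i]: inertia[i - 1] - 2 * inertia[i] + inertia[i + 1]
        for i in range(1, len(ks) - 1)
    }
    best_k = max(bends, key=bends.get)
    return best_k, bends


# Hypothetical inertia values for k = 1..6 (illustrative only)
ks = list(range(1, 7))
example_inertia = [1000.0, 400.0, 250.0, 180.0, 150.0, 135.0]

best_k, bends = elbow_by_second_difference(ks, example_inertia)
print("sharpest bend at k =", best_k)
print(bends)
```

With the `inertia` list computed in the cell above, the call would be `elbow_by_second_difference(list(range(1, 11)), inertia)`. A bend at 2 with a smaller but still visible bend at 3 would be consistent with treating 3 clusters as a defensible alternative.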