---
format: 
  html:
    toc: true
    page-layout: full
execute:
    warning: false
    echo: true
    eval: true
---

## **K-Means Clustering of Risk Factors**

***


With the optimal number of clusters set at three, we applied the **K-Means clustering** algorithm to partition the risk-factor points into distinct groups based on their spatial coordinates. The key steps and results of this analysis are as follows:

1. **Fitting K-Means Clustering**
We fit K-Means with the three clusters identified earlier, using the spatial coordinates (x, y) of the risk-factor points (e.g., graffiti, street light outages, and liquor retail stores) as input. K-Means assigns each point to its nearest cluster centroid, so each cluster groups risk factors that lie close together in space.

2. **Calculating Cluster Centroids**
Once the clusters were formed, we calculated the centroids for each cluster. These centroids represent the mean position of all points within a given cluster. We then converted the centroid coordinates into a GeoDataFrame to enable visualization and further spatial analysis.

3. **Analyzing Distance to Cluster Centroids**
To relate assault incidents to the clusters, we calculated the Euclidean distance from each assault point to every cluster centroid and kept the nearest one. This gives each assault a nearest cluster and a distance that measures its proximity to that risk cluster.

4. **Cluster Analysis**
   We grouped the assault incidents by nearest cluster and counted the assaults in each group. The distribution across the three clusters is:

   - Cluster 0: 6,089 assaults
   - Cluster 1: 8,622 assaults
   - Cluster 2: 5,542 assaults

   Assaults are not evenly distributed across the clusters: Cluster 1 has the most incidents and Cluster 2 the fewest.

5. **Visualization**
Finally, we visualized the K-Means clustering results by plotting the clusters and their centroids on a map. Each cluster is represented by a different color, and the centroids are marked with black "X" symbols. The assault incidents are also visualized in the plot to observe their proximity to the clusters.
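
Steps 1 through 4 can be illustrated on toy data. The sketch below is a minimal pure-Python version of Lloyd's algorithm, the assign-then-update iteration behind K-Means, followed by a nearest-centroid count. It is not the scikit-learn implementation used in this analysis, and the coordinates, the starting centroids, and the helper names `nearest` and `lloyd_kmeans` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def lloyd_kmeans(points, init_centroids, max_iter=50):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(init_centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [nearest(p, centroids)[0] for p in points]
        # Update step: each centroid becomes the mean position of its members
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                updated.append((mean_x, mean_y))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids)[0] for p in points]
    return centroids, labels


# Hypothetical risk-factor points forming three well-separated groups
risk_points = [(0, 0), (1, 0), (0, 1),
               (10, 10), (11, 10), (10, 11),
               (20, 0), (21, 0), (20, 1)]
# Hypothetical starting centroids (scikit-learn picks its own by default)
start = [(1, 0), (10, 11), (21, 0)]

centroids, labels = lloyd_kmeans(risk_points, start)

# Hypothetical incident locations, assigned to the nearest centroid and counted
incidents = [(0.5, 0.5), (9, 9), (12, 12), (19, 1), (22, -1), (10, 12)]
assignments = [nearest(p, centroids) for p in incidents]
counts = Counter(idx for idx, _ in assignments)

print(centroids)
print(sorted(counts.items()))
```

The assignment and counting at the end mirror steps 3 and 4: an incident belongs to the cluster whose centroid is closest, regardless of how far it is from the individual risk-factor points in that cluster.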


```{python}
#| code-fold: true

import pandas as pd
import geopandas as gpd
from sklearn.cluster import KMeans

# Step 1: Fit K-Means on the risk-factor coordinates
# (`features`, `variable_net`, and `Assault21` are assumed to be defined in an earlier section)
optimal_k = 3  # Replace with your chosen k based on the Elbow Method
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
variable_net['cluster'] = kmeans.fit_predict(features)

# Step 2: Store the centroids as a GeoDataFrame in the same CRS as the input points
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=['x_centroid', 'y_centroid'])
centroids_gdf = gpd.GeoDataFrame(
    centroids,
    geometry=gpd.points_from_xy(centroids['x_centroid'], centroids['y_centroid']),
    crs=variable_net.crs,
)

# Drop empty geometries so that .x and .y are defined for every row
Assault21 = Assault21[~Assault21.geometry.is_empty].copy()
centroids_gdf = centroids_gdf[~centroids_gdf.geometry.is_empty].copy()

# Step 3: Find the nearest centroid for each assault
def calculate_nearest_cluster_projected(row, centroids):
    # Euclidean distance; assumes both layers share the same projected CRS
    distances = centroids.apply(lambda c: ((row.geometry.x - c.geometry.x) ** 2 + (row.geometry.y - c.geometry.y) ** 2) ** 0.5, axis=1)
    return distances.idxmin(), distances.min()


Assault21[['nearest_cluster', 'distance_to_cluster']] = Assault21.apply(
    lambda row: calculate_nearest_cluster_projected(row, centroids_gdf), axis=1, result_type='expand'
)
# `result_type='expand'` yields floats; store the cluster label as an integer
Assault21['nearest_cluster'] = Assault21['nearest_cluster'].astype(int)

# Step 4: Count assaults per nearest cluster
assaults_per_cluster = Assault21.groupby('nearest_cluster').size()
print(assaults_per_cluster)
```
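
To put the counts on a common scale, the following sketch converts the per-cluster counts reported above into percentage shares and compares them with the share each cluster would hold under a perfectly even split. The variable names are illustrative, and the counts are copied from the results listed earlier rather than recomputed.

```python
# Assault counts per nearest cluster, copied from the reported results
reported_counts = {0: 6089, 1: 8622, 2: 5542}

total = sum(reported_counts.values())
shares = {c: 100 * n / total for c, n in reported_counts.items()}
even_share = 100 / len(reported_counts)  # share per cluster if assaults were spread evenly

for c, n in reported_counts.items():
    print(f"Cluster {c}: {n:,} assaults, {shares[c]:.1f}% of {total:,} "
          f"(even split: {even_share:.1f}%)")
```

Only Cluster 1 exceeds the even-split share; the other two clusters fall below it.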

```{python}
#| code-fold: true

import matplotlib.pyplot as plt

custom_colors3 = ['#d2e673', '#d0c7e1', '#777181']  # Add as many colors as clusters

# Plot each cluster's risk-factor points in its own color
fig, ax = plt.subplots(figsize=(10, 8))
for cluster_id, color in zip(range(optimal_k), custom_colors3):
    cluster_points = variable_net[variable_net['cluster'] == cluster_id]
    ax.scatter(cluster_points['x'], cluster_points['y'], label=f'Cluster {cluster_id}', color=color)

# Mark the cluster centroids
ax.scatter(centroids_gdf.geometry.x, centroids_gdf.geometry.y, color='#27232e', label='Centroids', marker='X', s=100)

ax.set_xlabel('X Coordinate (meters)')
ax.set_ylabel('Y Coordinate (meters)')
ax.set_title('K-Means Clustering with Assault Cases', fontsize=20)
ax.legend()

# Hide the frame and ticks for a cleaner, map-like figure
for side in ('top', 'right', 'left', 'bottom'):
    ax.spines[side].set_visible(False)
ax.set_xticks([])
ax.set_yticks([])
plt.show()
```

![](../images/kmeans.jpeg){width=75%}