# Module 6 – Episode 23: Clustering for Hotspot Detection (DBSCAN)

In this episode, you'll learn how to detect accident **hotspots** using the DBSCAN clustering algorithm.
We’ll cluster accident points in Lisbon, treating accidents within **50 meters** of each other as neighbors, and visualize the results on an interactive `Folium` map.

## Install Required Libraries

```python
!pip install folium geopandas scikit-learn matplotlib --quiet
```

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import folium
from sklearn.cluster import DBSCAN
import numpy as np
```

## Load and Convert the Dataset

```python
# URL to the raw CSV file
url = "https://raw.githubusercontent.com/tamagusko/geospatial-data-science-course/main/data/Dummy_Accident_Dataset_Lisbon.csv"

# Load CSV file (dummy data)
df = pd.read_csv(url)

# Convert to GeoDataFrame
geometry = [Point(xy) for xy in zip(df["Longitude"], df["Latitude"])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

# Preview
gdf.head()
```

|   | ID | Latitude | Longitude | Severity | geometry |
|---|---|---|---|---|---|
| 0 | 1 | 38.731971 | -9.127767 | 1 | POINT (-9.12777 38.73197) |
| 1 | 3 | 38.713751 | -9.14639 | 2 | POINT (-9.14639 38.71375) |
| 2 | 4 | 38.711161 | -9.153781 | 3 | POINT (-9.15378 38.71116) |
| 3 | 5 | 38.736824 | -9.121711 | 1 | POINT (-9.12171 38.73682) |
| 4 | 6 | 38.733835 | -9.146536 | 2 | POINT (-9.14654 38.73384) |

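DBSCAN's `eps` is measured in the same units as the input coordinates, so `eps=50` only means 50 meters once the points are in a metric CRS. A quick way to see why raw degrees will not do: at Lisbon's latitude (about 38.7°, as in the preview above), one degree of longitude is much shorter than one degree of latitude. The sketch below estimates both with the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km (an approximation); the helper `haversine_m` is illustrative, not part of any library used here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

lat, lon = 38.72, -9.14  # roughly central Lisbon
deg_lat_m = haversine_m(lat, lon, lat + 1, lon)  # length of 1 degree northward
deg_lon_m = haversine_m(lat, lon, lat, lon + 1)  # length of 1 degree eastward
print(f"1 degree of latitude  ~ {deg_lat_m:,.0f} m")
print(f"1 degree of longitude ~ {deg_lon_m:,.0f} m")
print(f"50 m ~ {50 / deg_lat_m:.6f} deg lat, {50 / deg_lon_m:.6f} deg lon")
```

Because the two lengths differ by roughly a fifth, no single `eps` in degrees corresponds to 50 m in every direction, which is why the next step projects the points before clustering.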

## Apply DBSCAN to Detect Clusters

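```python
# Project to meters: EPSG:3763 is ETRS89 / Portugal TM06 (not UTM), units in meters
gdf_meters = gdf.to_crs(epsg=3763)
coords = np.column_stack([gdf_meters.geometry.x, gdf_meters.geometry.y])

# Run DBSCAN: 50 m radius. min_samples=2 includes the point itself, so any
# accident with at least one other accident within 50 m is a core point
db = DBSCAN(eps=50, min_samples=2, metric="euclidean").fit(coords)

# Store labels as strings; "-1" marks noise (points not assigned to any cluster)
gdf["Cluster"] = db.labels_.astype(str)

# Check result: number of accidents per cluster label
gdf["Cluster"].value_counts()
```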

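| Cluster | count |
|---|---|
| -1 | 15 |
| 2 | 5 |
| 1 | 3 |
| 0 | 2 |

Label `-1` marks noise: with `min_samples=2`, these 15 accidents have no other accident within 50 meters. The remaining 10 accidents form three clusters (labels `0`, `1`, and `2`).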

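To see what these labels mean, here is a small synthetic example with the same `eps=50, min_samples=2` settings. The coordinates are made up and already in meters; they are not the Lisbon data. Three points spaced 40 m apart form one cluster even though the two end points are 80 m apart, because DBSCAN grows clusters by chaining the neighborhoods of core points. A pair 30 m apart forms a second cluster, and an isolated point becomes noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points in a metric CRS (x, y in meters)
toy_coords = np.array([
    [0, 0], [40, 0], [80, 0],  # chain: consecutive points 40 m apart
    [500, 500], [530, 500],    # pair 30 m apart
    [1000, 0],                 # isolated point, far from everything
])

toy_labels = DBSCAN(eps=50, min_samples=2).fit(toy_coords).labels_
print(toy_labels)
```

The chain receives label `0`, the pair label `1`, and the isolated point `-1`. So a DBSCAN hotspot is not necessarily a set of points that are all within 50 m of each other; it can stretch along a road as long as consecutive accidents stay within `eps`.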

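Projecting to EPSG:3763 is one way to make `eps` a distance in meters. An alternative, shown here only as a sketch and not as the method used in this episode, is scikit-learn's `haversine` metric, which works directly on (latitude, longitude) pairs in radians. `eps` then becomes an angle, so 50 m is divided by an assumed mean Earth radius of 6,371 km. The three coordinates below are hypothetical points near Lisbon.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)
EPS_M = 50                  # neighborhood radius in meters

# Hypothetical accidents as (latitude, longitude) in degrees
latlon_deg = np.array([
    [38.72000, -9.14000],
    [38.72030, -9.14000],  # about 33 m north of the first point
    [38.73000, -9.12000],  # about 2 km away
])

# The haversine metric expects [lat, lon] in radians; eps is an angle in radians
db_hav = DBSCAN(
    eps=EPS_M / EARTH_RADIUS_M,
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
).fit(np.radians(latlon_deg))
print(db_hav.labels_)
```

At city scale both approaches should group points very similarly. Projecting has the practical advantage that the metric coordinates can be reused for other distance or area calculations.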
## Plot Clusters on a `Folium` Map

```python
import matplotlib
import matplotlib.colors as colors

# Generate a base map centered on the mean accident location in Lisbon
lisbon_center = [gdf["Latitude"].mean(), gdf["Longitude"].mean()]
m = folium.Map(location=lisbon_center, zoom_start=14, tiles="CartoDB Positron")

# Color palette: the first 8 Set1 colors for clusters (cycled if there are more),
# with gray reserved for noise ("-1"). matplotlib.colormaps replaces the
# deprecated cm.get_cmap.
palette = matplotlib.colormaps["Set1"].colors[:8]
cluster_ids = sorted(c for c in gdf["Cluster"].unique() if c != "-1")
cluster_colors = {c: colors.rgb2hex(palette[i % len(palette)]) for i, c in enumerate(cluster_ids)}
cluster_colors["-1"] = "#999999"

# Add one circle marker per accident, colored by its cluster label
for _, row in gdf.iterrows():
    folium.CircleMarker(
        location=[row["Latitude"], row["Longitude"]],
        radius=6,
        color=cluster_colors[row["Cluster"]],
        fill=True,
        fill_opacity=0.7,
        popup=f"ID: {row['ID']}<br>Severity: {row['Severity']}<br>Cluster: {row['Cluster']}",
    ).add_to(m)

# Show the map (named m to avoid shadowing Python's built-in map)
m
```


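The map shows where the clusters are, but ranking hotspots needs numbers. A minimal sketch, assuming a DataFrame with the same `Cluster` (string labels), `Severity`, `Latitude`, and `Longitude` columns as `gdf`: drop noise points, then compute each cluster's size, mean severity, and mean location. The helper name `summarize_hotspots` and the demo table are hypothetical.

```python
import pandas as pd

def summarize_hotspots(df: pd.DataFrame) -> pd.DataFrame:
    """Per-cluster accident count, mean severity, and mean location; noise ("-1") is excluded."""
    clustered = df[df["Cluster"] != "-1"]
    summary = clustered.groupby("Cluster").agg(
        accidents=("Severity", "count"),
        mean_severity=("Severity", "mean"),
        center_lat=("Latitude", "mean"),
        center_lon=("Longitude", "mean"),
    )
    # Largest clusters first; ties broken by higher mean severity
    return summary.sort_values(["accidents", "mean_severity"], ascending=False)

# Hypothetical demo data with string labels, as produced by .astype(str)
demo = pd.DataFrame({
    "Cluster":   ["0", "0", "1", "1", "1", "-1"],
    "Severity":  [1, 3, 2, 2, 1, 3],
    "Latitude":  [38.7100, 38.7102, 38.7300, 38.7301, 38.7302, 38.7500],
    "Longitude": [-9.1400, -9.1401, -9.1200, -9.1201, -9.1202, -9.1000],
})
print(summarize_hotspots(demo))
```

In the notebook, `summarize_hotspots(gdf)` could be run after the DBSCAN cell to list the clusters found there.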

## Summary
In this episode, you learned how to detect spatial **accident hotspots** using the `DBSCAN` clustering algorithm.

You projected geographic coordinates to meters, applied `DBSCAN` to group accident points that lie within 50 meters of a neighboring accident, and assigned a cluster label to each observation, with `-1` marking isolated points (noise).  
Then, using `Folium`, you visualized the results as an interactive map with color-coded markers representing different clusters.