# 04. Optimization Model: Weighted K-Means

## Objective
We want to identify the **Optimal Locations** for new EV Charging Hubs.

**Method: Weighted K-Means Clustering (`scikit-learn`)**
*   **Data Points**: Neighborhood centroids.
*   **Weights**: Each neighborhood's `Unmet_Demand`, i.e. its `Demand_Score` reduced by existing charger supply (higher weight = stronger pull on the hub locations).
*   **Result**: $N$ hub locations, each placed at the demand-weighted mean of the neighborhoods assigned to it, so hubs gravitate toward areas with the highest unmet demand.
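To make the weighting concrete: weighted K-Means minimizes $\sum_i w_i \lVert x_i - \mu_{c(i)} \rVert^2$, where $w_i$ is a neighborhood's weight and $\mu_{c(i)}$ is its nearest center, so each center ends up at the weighted mean of the points assigned to it. The sketch below is a minimal NumPy version of the weighted Lloyd iteration on made-up toy points. It is illustrative only and is not scikit-learn's implementation, which adds k-means++ initialization and multiple restarts; `weighted_kmeans` is a hypothetical helper.

```python
import numpy as np


def weighted_kmeans(X, weights, init_centers, max_iter=100):
    """Weighted Lloyd iteration (illustrative sketch, not scikit-learn's code)."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the weighted mean of its points.
        new_centers = centers.copy()
        for k in range(len(centers)):
            mask = labels == k
            if weights[mask].sum() > 0:  # leave clusters with no weight in place
                new_centers[k] = np.average(X[mask], axis=0, weights=weights[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


# Toy data: two groups; the point at x=1 carries three times the weight.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
w_toy = np.array([1.0, 3.0, 1.0, 1.0])
centers, labels = weighted_kmeans(X_toy, w_toy, init_centers=[[0.0, 0.0], [10.0, 0.0]])
print(centers)  # first center is pulled to x=0.75 instead of the unweighted 0.5
```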

## Steps
1.  **Load Data**: Get the processed `barrios_with_demand.geojson`.
2.  **Feature Prep**: Extract Latitude/Longitude and Demand Score.
3.  **Optimization**: Run weighted K-Means with `n_clusters=N_HUBS` (set to 50 in the optimization cell).
4.  **Analysis**: See which neighborhoods these new hubs serve.
5.  **Visualization**: Plot the proposed network.

In [None]:
import pandas as pd
import geopandas as gpd
import folium
from sklearn.cluster import KMeans
import numpy as np
import os

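# 1. Load Processed Data
# Try both relative locations (running from a subfolder vs. from the project root)
CANDIDATE_PATHS = [
    '../data/processed/barrios_with_demand.geojson',
    'data/processed/barrios_with_demand.geojson',
]
DATA_PATH = next((p for p in CANDIDATE_PATHS if os.path.exists(p)), None)
if DATA_PATH is None:
    raise FileNotFoundError(f"Could not find barrios_with_demand.geojson; tried {CANDIDATE_PATHS}")

gdf = gpd.read_file(DATA_PATH)

# Fail early if a column used later in this notebook is missing
missing = {'NOM', 'Demand_Score', 'EV_Count', 'Charger_Count', 'Barri_ID'} - set(gdf.columns)
if missing:
    raise KeyError(f"Missing expected columns: {sorted(missing)}")

print(f"Loaded {len(gdf)} neighborhoods.")
display(gdf[['NOM', 'Demand_Score', 'EV_Count']].sort_values(by='Demand_Score', ascending=False).head())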

## 2. Prepare Data for Scikit-Learn
We calculate the **Centroid** of each neighborhood and then adjust for **Unmet Demand**.
We want to prioritize areas with high demand but LOW supply.
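The unmet-demand adjustment in the next cell can be checked by hand. This sketch applies the same formula (min-max scale `Charger_Count`, subtract `Norm_Supply * SUPPLY_IMPACT` from `Demand_Score`, clip at zero) to three hypothetical neighborhoods; the values are made up, not taken from the dataset.

```python
import pandas as pd

SUPPLY_IMPACT = 80  # same heuristic penalty as in the notebook

# Hypothetical neighborhoods (toy values, not real data).
toy = pd.DataFrame({
    "NOM": ["A", "B", "C"],
    "Demand_Score": [90.0, 60.0, 50.0],
    "Charger_Count": [0, 10, 5],
})

# Min-max scaling by hand: (x - min) / (max - min), giving values in [0, 1].
supply = toy["Charger_Count"].fillna(0)
toy["Norm_Supply"] = (supply - supply.min()) / (supply.max() - supply.min())

# Unmet demand = demand minus the supply penalty, floored at zero.
toy["Unmet_Demand"] = (toy["Demand_Score"] - toy["Norm_Supply"] * SUPPLY_IMPACT).clip(lower=0)
print(toy)
```

Neighborhood B shows the effect of the heuristic: it has the most chargers, so its demand of 60 is fully offset by the 80-point penalty and it drops out of the weighting, while C keeps 10 points.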

In [None]:
from sklearn.preprocessing import MinMaxScaler

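# Calculate centroids (in lat/lon)
# The data is in EPSG:4326, where GeoPandas warns that centroids are likely inaccurate.
# Compute them in a projected CRS instead (EPSG:25831, ETRS89 / UTM 31N, which covers Barcelona)
# and convert the points back to lat/lon.
gdf['centroid'] = gdf.geometry.to_crs(epsg=25831).centroid.to_crs(epsg=4326)
gdf['lat'] = gdf['centroid'].y
gdf['lng'] = gdf['centroid'].x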

# --- Calculate Unmet Demand ---
# Target areas with high demand but low existing supply.
# 1. Normalize existing charger count to [0, 1]
scaler = MinMaxScaler()
gdf['Norm_Supply'] = scaler.fit_transform(gdf[['Charger_Count']].fillna(0)).ravel()

# 2. Calculate Unmet Demand
# Logic: Unmet_Demand = Demand_Score - (Norm_Supply * SUPPLY_IMPACT)
# Heuristic: the best-supplied neighborhood (Norm_Supply = 1) loses 80 points of priority.
SUPPLY_IMPACT = 80

gdf['Unmet_Demand'] = gdf['Demand_Score'] - (gdf['Norm_Supply'] * SUPPLY_IMPACT)
gdf['Unmet_Demand'] = gdf['Unmet_Demand'].clip(lower=0) # No negative demand

print("Top Neighborhoods by UNMET Demand:")
display(gdf[['NOM', 'Unmet_Demand', 'Demand_Score', 'Charger_Count']].sort_values(by='Unmet_Demand', ascending=False).head())

# Create the Feature Matrix (X)
X = gdf[['lat', 'lng']].values

# Create the weights (sample_weight): unmet demand, not raw demand
weights = gdf['Unmet_Demand'].fillna(0).values

print(f"Feature matrix shape: {X.shape}")
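One caveat with `X`: K-Means uses Euclidean distance, and at Barcelona's latitude (about 41.4°N) a degree of longitude spans only about cos(41.4°) ≈ 0.75 of the ground distance of a degree of latitude, so clustering raw degrees stretches east-west distances. A simple correction, sketched below under the equirectangular approximation (an assumption that is reasonable at city scale), is to scale longitude by the cosine of a reference latitude before clustering and undo the scaling on the resulting centers. `to_scaled_xy`, `from_scaled_xy`, and `LAT0` are hypothetical names.

```python
import numpy as np


def to_scaled_xy(lat, lng, lat0):
    """Equirectangular approximation: scale longitude so both axes use comparable units."""
    k = np.cos(np.radians(lat0))
    return np.column_stack([np.asarray(lat, dtype=float), np.asarray(lng, dtype=float) * k])


def from_scaled_xy(xy, lat0):
    """Invert to_scaled_xy, e.g. for cluster centers fitted on scaled features."""
    k = np.cos(np.radians(lat0))
    xy = np.asarray(xy, dtype=float)
    return np.column_stack([xy[:, 0], xy[:, 1] / k])


LAT0 = 41.3851  # reference latitude (the map center used later in this notebook)
lat = np.array([41.38, 41.40])
lng = np.array([2.15, 2.19])
X_scaled = to_scaled_xy(lat, lng, LAT0)
roundtrip = from_scaled_xy(X_scaled, LAT0)
print(np.cos(np.radians(LAT0)))  # roughly 0.75
```

In the notebook this would mean fitting on `to_scaled_xy(gdf['lat'], gdf['lng'], LAT0)` and passing `kmeans.cluster_centers_` through `from_scaled_xy`; projecting the centroids to a metric CRS with GeoPandas and clustering in metres is an alternative.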

## 3. Run Optimization (Weighted K-Means)
Let's propose **50 new hubs** (`N_HUBS` in the cell below).
Each neighborhood's `sample_weight` is its `Unmet_Demand`.
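Because `Unmet_Demand` is clipped at zero, every neighborhood whose supply already offsets its demand gets `sample_weight=0` and has no influence on where the centers land. The sketch below checks this on toy data with scikit-learn's `KMeans` (a single cluster, so the center should equal the weighted mean) and adds a simple guard, our own assumption rather than part of the original notebook, that warns when fewer points carry weight than hubs are requested.

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans

# Toy coordinates; the far-away last point has zero weight.
X_toy = np.array([[0.0, 0.0], [2.0, 0.0], [100.0, 100.0]])
w_toy = np.array([1.0, 3.0, 0.0])

km = KMeans(n_clusters=1, n_init=1, random_state=0).fit(X_toy, sample_weight=w_toy)
center = km.cluster_centers_[0]
expected = np.average(X_toy, axis=0, weights=w_toy)  # x: (0*1 + 2*3) / 4 = 1.5
print(center, expected)

# Hypothetical guard: with fewer weighted points than hubs, some hubs serve no demand.
n_hubs = 2
n_weighted = int((w_toy > 0).sum())
if n_weighted < n_hubs:
    warnings.warn(f"Only {n_weighted} weighted points for {n_hubs} hubs")
```

In the notebook the same check would compare `(weights > 0).sum()` against `N_HUBS`.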

In [None]:
N_HUBS = 50

# Initialize KMeans
kmeans = KMeans(n_clusters=N_HUBS, random_state=42, n_init=10)

# Run the algorithm
# We pass 'sample_weight' to bias the centers towards high unmet demand
kmeans.fit(X, sample_weight=weights)

# Extract the optimized locations
new_hubs = kmeans.cluster_centers_

# Convert to DataFrame for easy viewing
df_hubs = pd.DataFrame(new_hubs, columns=['Lat', 'Lng'])
df_hubs['Hub_ID'] = range(1, N_HUBS + 1)

print("Optimization Complete! Proposed Locations:")
display(df_hubs)
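Step 4 of the plan (which neighborhoods each hub serves) can be read from `kmeans.labels_`, which holds the cluster index of every input row in the same order as `X`, including zero-weight neighborhoods, which are still assigned to their nearest hub. The sketch below uses a made-up table and label array as stand-ins for `gdf` and `kmeans.labels_`; in the notebook the same `groupby` would run on `gdf` after setting `gdf['Hub_ID'] = kmeans.labels_ + 1`, matching the 1-based `Hub_ID` above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for gdf and kmeans.labels_ (toy values).
toy = pd.DataFrame({
    "NOM": ["A", "B", "C", "D"],
    "Unmet_Demand": [40.0, 10.0, 0.0, 25.0],
})
labels = np.array([0, 0, 1, 1])  # cluster index per row, as kmeans.labels_ would return

toy["Hub_ID"] = labels + 1  # 1-based, consistent with df_hubs['Hub_ID']
served = toy.groupby("Hub_ID").agg(
    Neighborhoods=("NOM", list),
    Total_Unmet=("Unmet_Demand", "sum"),
)
print(served)
```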

## 4. Visualize Results
Let's see where these hubs landed.

In [None]:
# Base Map
m = folium.Map(location=[41.3851, 2.1734], zoom_start=12)

# Drop the extra 'centroid' geometry column before mapping: GeoJSON export handles
# only the active geometry column, and the second one would not serialize
gdf_map = gdf.drop(columns=['centroid'], errors='ignore').copy()

# 1. Add Neighborhoods (Choropleth by UNMET Demand Score)
folium.Choropleth(
    geo_data=gdf_map,
    data=gdf_map,
    columns=['Barri_ID', 'Unmet_Demand'],
    key_on='feature.properties.Barri_ID',
    fill_color='YlOrRd',  # yellow to red (red = high unmet demand)
    fill_opacity=0.6,
    line_opacity=0.2,
    legend_name='Unmet Demand Score'
).add_to(m)

# 2. Add New Optimized Hubs (Black Stars)
for idx, row in df_hubs.iterrows():
    folium.Marker(
        location=[row['Lat'], row['Lng']],
        popup=f"<b>Proposed Hub {int(row['Hub_ID'])}</b>",
        icon=folium.Icon(color='black', icon='star', prefix='fa')
    ).add_to(m)

# Save Map
map_file = f"optimization_results_{N_HUBS}_hubs.html"
m.save(map_file)
print(f"Map saved to {map_file}")
display(m)

## 5. Reverse Geocoding (Optional/Heuristic)
Find which neighborhood polygon contains each proposed hub.
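Despite the section title, this step is a point-in-polygon spatial join rather than address lookup. A weighted cluster center is an average of centroids, so it can land in a gap between polygons or exactly on a shared border; with `predicate='within'` such a hub gets `NaN` for `NOM`, since a boundary point is not "within" a polygon. The toy example below uses two made-up square neighborhoods to show that behavior. Switching to `predicate='intersects'` would also match border points, at the cost of possible duplicate rows.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical neighborhoods as unit squares side by side.
barrios = gpd.GeoDataFrame(
    {"NOM": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# One hub inside West, one outside both polygons.
hubs = gpd.GeoDataFrame(
    {"Hub_ID": [1, 2]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

joined = gpd.sjoin(hubs, barrios, how="left", predicate="within")
print(joined[["Hub_ID", "NOM"]])
```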

In [None]:
# We can use a spatial join again to see which polygon contains the point
gdf_hubs = gpd.GeoDataFrame(
    df_hubs,
    geometry=gpd.points_from_xy(df_hubs.Lng, df_hubs.Lat),
    crs="EPSG:4326"
)

result = gpd.sjoin(gdf_hubs, gdf[['NOM', 'geometry', 'Unmet_Demand']], how='left', predicate='within')

print("Proposed Locations by Neighborhood:")
display(result[['Hub_ID', 'NOM', 'Unmet_Demand']].sort_values(by='Unmet_Demand', ascending=False))