# AI-Powered Site Selection for Solar Farms

**Copyright (c) 2026 Shrikara Kaudambady. All rights reserved.**

This notebook demonstrates a workflow for identifying candidate solar farm locations using Multi-Criteria Decision Analysis (MCDA) and K-Means clustering. We score each map cell on solar irradiance, terrain slope, and grid proximity, and we exclude cells based on land use and protected-area constraints.

### 1. Setup and Library Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from scipy.ndimage import distance_transform_edt

sns.set_theme(style="white")

### 2. Data Simulation
We generate five synthetic raster layers on a 500 × 500 grid to stand in for a real-world landscape. Each layer represents a key factor or constraint in site selection.

In [None]:
MAP_SIZE = 500
np.random.seed(42)

# Layer 1: Solar Irradiance (higher is better)
x, y = np.mgrid[0:MAP_SIZE, 0:MAP_SIZE]
irradiance = (x + y) + np.random.normal(0, 50, (MAP_SIZE, MAP_SIZE))

# Layer 2: Terrain Slope (lower is better)
slope = np.random.rand(MAP_SIZE, MAP_SIZE) * 20 # Slope in degrees
slope[100:200, 300:400] = np.random.rand(100, 100) * 45 + 20 # A hilly area

# Layer 3: Grid Proximity (lower distance is better)
grid_line = np.zeros((MAP_SIZE, MAP_SIZE), dtype=bool)
grid_line[np.arange(MAP_SIZE), np.arange(MAP_SIZE)] = True # A diagonal power line
grid_proximity = distance_transform_edt(np.logical_not(grid_line))

# Layer 4: Land Use (1:Forest, 2:Urban, 3:Barren, 4:Water)
land_use = np.full((MAP_SIZE, MAP_SIZE), 3) # Default to barren
land_use[400:, :200] = 1 # Forest
land_use[:100, :100] = 2 # Urban
land_use[250:350, 250:350] = 4 # Water body

# Layer 5: Protected Areas (True where development is forbidden)
protected_areas = np.full((MAP_SIZE, MAP_SIZE), False, dtype=bool)
protected_areas[350:, 350:] = True
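
The grid-proximity layer relies on `distance_transform_edt`, which returns, for every nonzero cell, the Euclidean distance to the nearest zero cell. Inverting the power-line mask therefore yields each cell's distance to the line. A minimal sketch on a hypothetical 5 × 5 grid (the `demo_` names are illustrative and not part of the notebook) shows the behavior:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Hypothetical 5 x 5 grid with a diagonal "power line", mirroring the layer above
demo_line = np.zeros((5, 5), dtype=bool)
demo_line[np.arange(5), np.arange(5)] = True

# Cells on the line become zeros after inversion, so they get distance 0;
# every other cell gets its distance (in pixels) to the nearest line cell
demo_dist = distance_transform_edt(~demo_line)
print(np.round(demo_dist, 2))
```

Distances are in pixel units, measured between cell centers. For real rasters, the function's `sampling` argument can supply the cell size so that the result comes out in ground units.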

### 3. Multi-Criteria Analysis: Factors, Constraints, and Weights
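Here we define the three ingredients of the analysis: exclusionary constraints that rule cells out entirely, suitability factors normalized to a common 0 to 1 scale, and weights that set each factor's relative importance.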

In [None]:
# Step 1: Define Exclusionary Constraints
# We cannot build on forests, urban areas, water, or protected land.
land_use_exclusions = {1, 2, 4} # Corresponds to Forest, Urban, Water

exclusion_mask = np.isin(land_use, list(land_use_exclusions)) | protected_areas

# Step 2: Normalize Suitability Factors
# All factors are scaled from 0 to 1, where 1 is always the most suitable.
scaler = MinMaxScaler()

# Higher irradiance is better (no inversion needed)
norm_irradiance = scaler.fit_transform(irradiance.reshape(-1, 1)).reshape(MAP_SIZE, MAP_SIZE)

# Lower slope is better (invert the relationship)
norm_slope = 1 - scaler.fit_transform(slope.reshape(-1, 1)).reshape(MAP_SIZE, MAP_SIZE)

# Lower distance to grid is better (invert the relationship)
norm_grid = 1 - scaler.fit_transform(grid_proximity.reshape(-1, 1)).reshape(MAP_SIZE, MAP_SIZE)

# Step 3: Define Weights
# These represent the relative importance of each factor.
weights = {
    'irradiance': 0.4,
    'slope': 0.3,
    'grid': 0.3
}
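
The normalization step can also be written directly in NumPy, which makes the inversion for "lower is better" factors explicit. The sketch below is an illustration under assumptions, not the notebook's method: `minmax_normalize` and the `demo_` values are hypothetical, and the final check only confirms that the weights sum to 1.

```python
import math
import numpy as np

def minmax_normalize(layer, invert=False):
    """Scale a raster to [0, 1]; invert=True makes low raw values score high."""
    layer = np.asarray(layer, dtype=float)
    lo, hi = np.nanmin(layer), np.nanmax(layer)
    if hi == lo:
        # A constant layer carries no ranking information; returning zeros
        # is an arbitrary choice made for this sketch
        return np.zeros_like(layer)
    scaled = (layer - lo) / (hi - lo)
    return 1.0 - scaled if invert else scaled

# Hypothetical slopes in degrees: the flattest cell should score 1, the steepest 0
demo_slope = np.array([[0.0, 10.0], [20.0, 40.0]])
demo_norm_slope = minmax_normalize(demo_slope, invert=True)

# Weights that sum to 1 keep the combined score on the same 0 to 1 scale
demo_weights = {'irradiance': 0.4, 'slope': 0.3, 'grid': 0.3}
weights_ok = math.isclose(sum(demo_weights.values()), 1.0)
```

Because min-max scaling depends on the extremes of each layer, a single outlier cell can compress the scores of all other cells. Clipping to percentiles before scaling is one possible mitigation.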

### 4. Calculate Final Suitability Score
We combine the normalized factors as a weighted sum and set excluded cells to NaN to get the final suitability map. Because the weights sum to 1 and every factor lies in [0, 1], the score also stays in [0, 1].

In [None]:
suitability_score = (
    norm_irradiance * weights['irradiance'] + 
    norm_slope * weights['slope'] + 
    norm_grid * weights['grid']
)

# Apply the exclusion mask
final_suitability = np.copy(suitability_score)
final_suitability[exclusion_mask] = np.nan # Use NaN for excluded areas
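
The same weighted overlay can be packaged as a small reusable function. This is a minimal sketch under assumptions: `weighted_overlay` and the 2 × 2 `demo_` arrays are hypothetical, and rescaling the weights by their total is an added safeguard that the notebook does not apply (it changes nothing when the weights already sum to 1).

```python
import numpy as np

def weighted_overlay(factors, weights, exclusion=None):
    """Weighted sum of normalized factor rasters; excluded cells become NaN."""
    total = sum(weights.values())
    # Dividing by the total keeps the score in [0, 1] even if weights do not sum to 1
    score = sum(factors[name] * (w / total) for name, w in weights.items())
    score = np.array(score, dtype=float)
    if exclusion is not None:
        score[exclusion] = np.nan
    return score

# Hypothetical 2 x 2 factor rasters, already normalized so that 1 is best
demo_factors = {
    'irradiance': np.array([[1.0, 0.5], [0.0, 1.0]]),
    'slope': np.array([[1.0, 0.0], [1.0, 1.0]]),
    'grid': np.array([[1.0, 1.0], [0.0, 0.0]]),
}
demo_weights = {'irradiance': 0.4, 'slope': 0.3, 'grid': 0.3}
demo_exclusion = np.array([[False, False], [False, True]])

demo_final = weighted_overlay(demo_factors, demo_weights, demo_exclusion)
```

Keeping excluded cells as NaN rather than 0 separates "not allowed" from "allowed but poor", which matters for percentile calculations that use the `nan`-aware NumPy functions.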

### 5. AI for Site Identification: K-Means Clustering
We use K-Means to group the most suitable pixels into 5 spatial clusters. Each cluster center is treated as a candidate site; K-Means itself does not rank the clusters by suitability.

In [None]:
# Find the locations of the top 5% most suitable pixels
threshold = np.nanpercentile(final_suitability, 95)
top_locations = np.argwhere(final_suitability >= threshold)

# Run K-Means to find 5 distinct clusters (candidate sites)
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
kmeans.fit(top_locations)
cluster_centers = kmeans.cluster_centers_
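
A K-Means center is the mean of its members' coordinates, so it can fall on a pixel that is not itself highly suitable, or even on an excluded one. One possible refinement, sketched here on a small hypothetical raster (all `demo_` names and `site_summary` are illustrative, not part of the notebook), snaps each center to its nearest member pixel and ranks the clusters by mean score:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 60 x 60 suitability raster: a diagonal gradient plus noise,
# with an excluded (NaN) corner, loosely mirroring the notebook's layers
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:60, 0:60]
demo_score = (rows + cols) / 118 + rng.normal(0, 0.05, (60, 60))
demo_score[50:, 50:] = np.nan

# Top 5% of non-excluded pixels; comparisons against NaN evaluate to False
demo_threshold = np.nanpercentile(demo_score, 95)
demo_top = np.argwhere(demo_score >= demo_threshold)

demo_kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(demo_top)

site_summary = []
for label, center in enumerate(demo_kmeans.cluster_centers_):
    members = demo_top[demo_kmeans.labels_ == label]
    # Snap the centroid to the closest pixel that actually belongs to the cluster
    nearest = members[np.argmin(np.linalg.norm(members - center, axis=1))]
    mean_score = float(demo_score[members[:, 0], members[:, 1]].mean())
    site_summary.append((label, (int(nearest[0]), int(nearest[1])), mean_score, len(members)))

# Rank candidate sites by the mean suitability of their member pixels
site_summary.sort(key=lambda site: site[2], reverse=True)
for label, pixel, mean_score, size in site_summary:
    print(f"cluster {label}: pixel {pixel}, mean score {mean_score:.3f}, {size} pixels")
```

Ranking by mean score is only one option. Cluster size, which approximates available area, may matter just as much for a real project.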

### 6. Visualization
Let's visualize the input data, the exclusion mask, and the final suitability map with the identified candidate sites.

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 11), sharex=True, sharey=True)
fig.suptitle('Solar Farm Site Selection Analysis', fontsize=20)

# Input Layers
axes[0, 0].imshow(irradiance, cmap='viridis')
axes[0, 0].set_title('1. Solar Irradiance')
axes[0, 1].imshow(slope, cmap='plasma')
axes[0, 1].set_title('2. Terrain Slope')
axes[0, 2].imshow(grid_proximity, cmap='magma')
axes[0, 2].set_title('3. Grid Proximity')

# Land Use and Exclusions
axes[1, 0].imshow(land_use, cmap='tab10')
axes[1, 0].set_title('4. Land Use')
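# exclusion_mask is True where building is forbidden; 'binary' draws False as white and True as black
axes[1, 1].imshow(exclusion_mask, cmap='binary')
axes[1, 1].set_title('5. Exclusion Mask (White = Usable)')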

# Final Map
im = axes[1, 2].imshow(final_suitability, cmap='YlGn', vmin=0, vmax=1)
axes[1, 2].scatter(cluster_centers[:, 1], cluster_centers[:, 0], c='red', marker='X', s=100, label='Candidate Sites (cluster centers)')
axes[1, 2].set_title('6. Final Suitability Map & Candidate Sites')
axes[1, 2].legend()

for ax in axes.flat:
    ax.set_xticks([])
    ax.set_yticks([])

fig.colorbar(im, ax=axes, orientation='horizontal', fraction=0.03, pad=0.04).set_label('Suitability Score (1 = Best)')
plt.tight_layout(rect=[0, 0.05, 1, 0.96])
plt.show()