# AI-Powered Viability Analysis for Windmill Sites

**Copyright (c) 2026 Shrikara Kaudambady. All rights reserved.**

This notebook uses time-series forecasting to assess the viability of potential windmill installation sites. We analyze synthetic wind data for five candidate locations, forecast one year of energy production for each, and rank the sites by two key performance metrics: Annual Energy Production (AEP) and capacity factor.

### 1. Setup and Library Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

sns.set_theme(style="whitegrid", palette="viridis")

### 2. Simulation of Site Wind Data and Turbine Power Curve

First, we simulate hourly wind speed data for 5 candidate sites over 3 years. Each site has a different wind profile. Then, we define a function to model a generic wind turbine's power output.
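
To make the shared seasonal term concrete, the sketch below evaluates the same expression the simulation uses, `10 * (1 - cos(2π · dayofyear / 365.25))`, over one non-leap year. The variable names `days` and `seasonal` are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Day-of-year values (1..365) for a single non-leap year
days = pd.date_range("2023-01-01", "2023-12-31", freq="D").dayofyear.to_numpy()

# Same seasonal expression as in the simulation cell
seasonal = 10 * (1 - np.cos(2 * np.pi * days / 365.25))

print(f"min: {seasonal.min():.4f} m/s")
print(f"max: {seasonal.max():.4f} m/s")
print(f"day of year with the peak: {days[seasonal.argmax()]}")
```

The term stays between 0 and 20 m/s, sitting near zero around New Year and peaking in early July (day 183). Each site scales it by its own factor and adds a constant offset, so site_2 (factor 1.2, offset 5) averages about 29 m/s at the mid-year peak, which is above the 25 m/s cut-out speed used by the turbine model.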

In [None]:
np.random.seed(42)

# Create a 3-year hourly time series
time_index = pd.date_range(start='2023-01-01', end='2025-12-31 23:00:00', freq='h')
num_hours = len(time_index)
df = pd.DataFrame(index=time_index)

# Simulate wind speed for 5 sites with different characteristics
# Base seasonal variation
seasonal_variation = 10 * (1 - np.cos(2 * np.pi * df.index.dayofyear / 365.25))
df['site_1'] = seasonal_variation * 0.5 + np.random.normal(0, 2, num_hours) + 8 # Low but steady
df['site_2'] = seasonal_variation * 1.2 + np.random.normal(0, 4, num_hours) + 5 # High but volatile
df['site_3'] = seasonal_variation * 0.9 + np.random.normal(0, 1, num_hours) + 7 # Very consistent
df['site_4'] = seasonal_variation * 0.2 + np.random.normal(0, 5, num_hours) + 6 # Erratic
df['site_5'] = seasonal_variation * 1.0 + np.random.normal(0, 3, num_hours) + 6 # Good average
df = df.clip(lower=0) # Wind speed can't be negative

# Define a simplified power curve for a generic 2.5 MW turbine
def calculate_power(wind_speed, cut_in_speed=3.5, rated_speed=12, cut_out_speed=25, rated_power_kw=2500):
    """Return power output in kW for an array of wind speeds in m/s."""
    wind_speed = np.asarray(wind_speed, dtype=float)
    # Output stays at zero below cut-in and at or above cut-out
    power = np.zeros_like(wind_speed)
    # Power production between cut-in and rated speed (cubic ramp)
    in_range = (wind_speed >= cut_in_speed) & (wind_speed < rated_speed)
    power[in_range] = rated_power_kw * ((wind_speed[in_range] - cut_in_speed) / (rated_speed - cut_in_speed)) ** 3
    # Rated power between rated and cut-out speed
    at_rated = (wind_speed >= rated_speed) & (wind_speed < cut_out_speed)
    power[at_rated] = rated_power_kw
    return power

# Calculate power output for each site
for i in range(1, 6):
    df[f'site_{i}_power_kw'] = calculate_power(df[f'site_{i}'].values)

print("Simulated Wind Speed (m/s) and Power Output (kW):")
df.head()
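
As a quick sanity check on the power curve, the sketch below re-declares the same piecewise function so it runs on its own and evaluates it at a few hand-picked speeds. The test speeds in `speeds` are chosen for illustration and do not come from the simulated data.

```python
import numpy as np

def calculate_power(wind_speed, cut_in_speed=3.5, rated_speed=12, cut_out_speed=25, rated_power_kw=2500):
    """Piecewise power curve: zero, cubic ramp, rated, then zero at cut-out."""
    wind_speed = np.asarray(wind_speed, dtype=float)
    power = np.zeros_like(wind_speed)
    in_range = (wind_speed >= cut_in_speed) & (wind_speed < rated_speed)
    power[in_range] = rated_power_kw * (
        (wind_speed[in_range] - cut_in_speed) / (rated_speed - cut_in_speed)
    ) ** 3
    at_rated = (wind_speed >= rated_speed) & (wind_speed < cut_out_speed)
    power[at_rated] = rated_power_kw
    return power

# Calm, cut-in, halfway up the ramp, rated, just below cut-out, cut-out
speeds = np.array([0.0, 3.5, 7.75, 12.0, 24.9, 25.0])
output_kw = calculate_power(speeds)

for v, p in zip(speeds, output_kw):
    print(f"{v:5.2f} m/s -> {p:7.1f} kW")
```

At 7.75 m/s, halfway between cut-in and rated speed, the cubic ramp gives 0.5³ = 12.5% of rated power (312.5 kW), which shows how strongly this curve penalizes moderate winds. At exactly 25 m/s the output drops to zero because the turbine is modeled as shut down.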

### 3. Feature Engineering and Model Training

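We train a separate Gradient Boosting model for each site. Because the only inputs are calendar features (hour, month, and day of week), each model learns the site's average seasonal and daily power pattern rather than its hour-to-hour fluctuations.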

In [None]:
models = {}

# Create time-based features
df['hour'] = df.index.hour
df['month'] = df.index.month
df['dayofweek'] = df.index.dayofweek

features = ['hour', 'month', 'dayofweek']

for i in range(1, 6):
    print(f"--- Training Model for Site {i} ---")
    site_df = df[[f'site_{i}_power_kw'] + features].copy()
    
    X = site_df[features]
    y = site_df[f'site_{i}_power_kw']
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False) # Time-series split
    
    model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    
    # Evaluate model
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"Model for Site {i} trained. Test RMSE: {rmse:.2f} kW\n")
    models[f'site_{i}'] = model
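
The `shuffle=False` argument is what makes the split above chronological. The sketch below shows this on a hypothetical 10-day toy series (the names `idx` and `toy` are not part of the notebook): the first 80% of rows go to training and the final 20% to testing, so the model is never evaluated on hours that precede its training data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 240 hourly rows with a placeholder target
idx = pd.date_range("2023-01-01", periods=240, freq="h")
toy = pd.DataFrame({"hour": idx.hour, "power_kw": np.arange(240, dtype=float)}, index=idx)

X_train, X_test, y_train, y_test = train_test_split(
    toy[["hour"]], toy["power_kw"], test_size=0.2, shuffle=False
)

# With shuffle=False the split is a single cut in time
print("train rows:", len(X_train), "| test rows:", len(X_test))
print("last training hour:", X_train.index.max())
print("first test hour:   ", X_test.index.min())
```

For the notebook's three years of data, the same rule means the test period is roughly the final seven months, so the reported RMSE reflects only that part of the seasonal cycle.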

### 4. Forecasting and Viability Analysis

Now we use our trained models to forecast the energy production for the next year and calculate key viability metrics.

In [None]:
forecast_horizon_days = 365
future_index = pd.date_range(start='2026-01-01', periods=forecast_horizon_days * 24, freq='h')
forecast_df = pd.DataFrame(index=future_index)

forecast_df['hour'] = forecast_df.index.hour
forecast_df['month'] = forecast_df.index.month
forecast_df['dayofweek'] = forecast_df.index.dayofweek

results = []

for i in range(1, 6):
    model = models[f'site_{i}']
    forecast_power = model.predict(forecast_df[features])
    # model.predict returns a NumPy array, which has no `lower` keyword; use np.clip
    forecast_df[f'site_{i}_forecast_kw'] = np.clip(forecast_power, 0, None)
    
    # Calculate metrics
    # Each hourly kW value equals one kWh, so the sum is in kWh; divide by 1e6 for GWh
    annual_energy_gwh = forecast_df[f'site_{i}_forecast_kw'].sum() / 1_000_000
    rated_power_kw = 2500
    max_possible_gwh = (rated_power_kw * 24 * forecast_horizon_days) / 1_000_000
    capacity_factor = (annual_energy_gwh / max_possible_gwh) * 100
    
    results.append({
        'Site': f'Site {i}',
        'Annual Energy (GWh)': annual_energy_gwh,
        'Capacity Factor (%)': capacity_factor
    })

results_df = pd.DataFrame(results).sort_values(by='Annual Energy (GWh)', ascending=False).reset_index(drop=True)
print("Forecasted Viability Ranking:")
results_df
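
The two metrics can be checked by hand on a simple case. The sketch below wraps the same arithmetic in a hypothetical helper, `energy_and_capacity_factor` (not defined in the notebook), and applies it to a turbine assumed to run at exactly half its rated power for a full year.

```python
import numpy as np

def energy_and_capacity_factor(hourly_kw, rated_power_kw=2500):
    """Return (energy in GWh, capacity factor in %) for hourly power in kW."""
    hourly_kw = np.asarray(hourly_kw, dtype=float)
    # One hour at P kW delivers P kWh, so the sum of hourly kW is energy in kWh
    energy_gwh = hourly_kw.sum() / 1_000_000
    # Energy the turbine would deliver at rated power over the same hours
    max_gwh = rated_power_kw * hourly_kw.size / 1_000_000
    return energy_gwh, 100 * energy_gwh / max_gwh

# Assumed input: constant 1250 kW (half of 2500 kW) for 365 days
half_load = np.full(365 * 24, 1250.0)
aep_gwh, cf_pct = energy_and_capacity_factor(half_load)

print(f"AEP: {aep_gwh:.2f} GWh, capacity factor: {cf_pct:.1f}%")
```

Because every site here uses the same 2.5 MW turbine and the same forecast horizon, capacity factor is just AEP divided by a constant (21.9 GWh), so the two metrics always produce the same ranking.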

### 5. Comparative Analysis and Visualization

Finally, we visualize the results to make a clear, data-driven recommendation.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.suptitle('Windmill Site Viability Comparison', fontsize=16)

# Bar chart for Annual Energy Production
sns.barplot(x='Annual Energy (GWh)', y='Site', data=results_df, ax=axes[0], orient='h')
axes[0].set_title('Forecasted Annual Energy Production (AEP)')

# Bar chart for Capacity Factor
sns.barplot(x='Capacity Factor (%)', y='Site', data=results_df, ax=axes[1], orient='h')
axes[1].set_title('Forecasted Capacity Factor')

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

# Plotting the power profile of the best and worst sites for a week
# results_df is sorted by energy, so the first row is the best site and the last row the worst
best_site = results_df['Site'].iloc[0].replace(' ', '_').lower()
worst_site = results_df['Site'].iloc[-1].replace(' ', '_').lower()

plt.figure(figsize=(16, 6))
forecast_df[f'{best_site}_forecast_kw'].head(24*7).plot(label=f'Best: {best_site.replace("_", " ").title()}', legend=True)
forecast_df[f'{worst_site}_forecast_kw'].head(24*7).plot(label=f'Worst: {worst_site.replace("_", " ").title()}', legend=True, linestyle='--')
plt.title('Forecasted Power Output for One Week (Best vs. Worst Site)')
plt.ylabel('Power (kW)')
plt.xlabel('Date')
plt.show()

print(f"\nRecommendation: {results_df.loc[0, 'Site']} is the most promising location: it has the highest forecasted annual energy production and capacity factor.")