# 🌬️ Wind Energy Forecasting using SARIMA and SARIMAX (NZ)
This notebook models weekly wind energy generation in New Zealand using SARIMA and SARIMAX models.
We add lagged meteorological features as regressors to answer two research questions:
1. **RQ1**: Which model gives more accurate forecasts?
2. **RQ2**: Do lagged climate variables improve accuracy?

All analyses use **weekly-aggregated data**, combining the **North and South Islands** into a single national series (generation is summed across the islands).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load weekly-aggregated wind generation and climate data (merged by Island)
weekly_df = pd.read_csv('merged_weekly_wind_climate.csv', parse_dates=['Date'], index_col='Date')

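# Combine both islands into one national weekly series: generation is summed,
# while climate variables are averaged (summing temperatures or pressures across
# islands is not physically meaningful). Non-numeric columns such as an island
# label are dropped.
numeric_cols = weekly_df.select_dtypes(include='number').columns
agg_rules = {col: 'mean' for col in numeric_cols}
agg_rules['GENERATION'] = 'sum'
weekly_df = weekly_df.groupby(level='Date').agg(agg_rules)

# Later cells expect the target column to be named 'GENERATION'; rename it here
# if the source file uses a different label
assert 'GENERATION' in weekly_df.columns, "expected a 'GENERATION' column"
weekly_df.head()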
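Both the 1-week `shift(1)` lags and the 52-week seasonal period used later assume exactly one row per week with no gaps. A minimal sketch of a spacing check, assuming the index is a `DatetimeIndex`; the helper name `check_weekly_index` is hypothetical and not part of the original notebook:

```python
import pandas as pd

def check_weekly_index(index: pd.DatetimeIndex) -> dict:
    """Report whether a DatetimeIndex is sorted, unique and spaced exactly 7 days apart."""
    # Hypothetical helper: time differences between consecutive dates
    gaps = index.to_series().diff().dropna()
    # Any step that is not exactly one week breaks lag and seasonal assumptions
    irregular = gaps[gaps != pd.Timedelta(days=7)]
    return {
        "sorted": bool(index.is_monotonic_increasing),
        "unique": bool(index.is_unique),
        "n_irregular_steps": int(len(irregular)),
        "first_irregular": irregular.index[0] if len(irregular) else None,
    }

# Toy index with one missing week (2021-01-17 is absent)
toy = pd.DatetimeIndex(["2021-01-03", "2021-01-10", "2021-01-24", "2021-01-31"])
report = check_weekly_index(toy)
print(report)
```

In the notebook this would be called as `check_weekly_index(weekly_df.index)`; a non-zero `n_irregular_steps` would suggest reindexing or filling missing weeks before creating lags.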

## 🔍 Correlation and Lagged Climate Features
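We examine how weekly wind generation correlates with the climate variables WS50M (wind speed at 50 m), T2M (air temperature at 2 m), PS (surface pressure), RH2M (relative humidity at 2 m) and PRECTOTCORR (corrected total precipitation), and create 1-week lagged versions of these variables for use as SARIMAX regressors.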

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(weekly_df.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Between Weekly Wind Generation and Climate Variables')
plt.show()
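The heatmap only shows same-week correlations. To check whether a 1-week lag is a reasonable choice, one could compare the correlation between generation and each climate variable at several lags. A minimal sketch, assuming a DataFrame with a `GENERATION` column as above; the helper `lagged_correlations` and the synthetic data are illustrative only:

```python
import numpy as np
import pandas as pd

def lagged_correlations(df: pd.DataFrame, target: str, features: list, max_lag: int = 4) -> pd.DataFrame:
    """Pearson correlation between df[target] and each feature shifted by 0..max_lag rows."""
    rows = {}
    for lag in range(max_lag + 1):
        # shift(lag) pairs this week's target with the feature value from `lag` weeks earlier;
        # Series.corr ignores the NaN rows created by shifting
        rows[lag] = {f: df[target].corr(df[f].shift(lag)) for f in features}
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("lag")

# Synthetic data: "generation" driven by last week's wind speed plus noise
rng = np.random.default_rng(0)
ws = pd.Series(rng.normal(8, 2, 200))
demo = pd.DataFrame({"WS50M": ws, "GENERATION": 50 * ws.shift(1) + rng.normal(0, 5, 200)})
table = lagged_correlations(demo, "GENERATION", ["WS50M"], max_lag=3)
print(table.round(2))
```

On the real data this would be run before the lag columns are added, e.g. `lagged_correlations(weekly_df, 'GENERATION', ['WS50M', 'T2M', 'PS', 'RH2M', 'PRECTOTCORR'])`.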

In [None]:
# Create 1-week lagged climate features (excluding wind direction)
weekly_df['T2M_lag1'] = weekly_df['T2M'].shift(1)
weekly_df['PS_lag1'] = weekly_df['PS'].shift(1)
weekly_df['WS50M_lag1'] = weekly_df['WS50M'].shift(1)
weekly_df['RH2M_lag1'] = weekly_df['RH2M'].shift(1)
weekly_df['PRECTOTCORR_lag1'] = weekly_df['PRECTOTCORR'].shift(1)

# Drop rows with missing values: the first week has no lagged values (any other incomplete rows are dropped too)
weekly_df.dropna(inplace=True)
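The cell above builds one lag per variable. If RQ2 were extended to longer lags, the same `shift` pattern generalises. The helper `add_lag_features` below is a hypothetical sketch; the `_lag{k}` naming simply follows the convention used above:

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, columns: list, lags=(1,)) -> pd.DataFrame:
    """Return a copy of df with `<col>_lag<k>` columns, dropping rows left incomplete by shifting."""
    out = df.copy()
    for col in columns:
        for k in lags:
            # Row t receives the value observed k rows (weeks) earlier
            out[f"{col}_lag{k}"] = out[col].shift(k)
    # Only the first max(lags) rows lack history, so trim those instead of calling dropna()
    return out.iloc[max(lags):]

toy = pd.DataFrame({"T2M": [10.0, 11.0, 12.0, 13.0]},
                   index=pd.date_range("2021-01-03", periods=4, freq="7D"))
res = add_lag_features(toy, ["T2M"], lags=(1, 2))
print(res)
```

Trimming with `iloc` rather than `dropna()` is a design choice: it removes only the rows made incomplete by shifting, so genuinely missing observations elsewhere stay visible.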

## 📈 SARIMA Model (Univariate - RQ1)
This model uses only past wind generation data to forecast future values. It serves as the baseline model for comparison.

In [None]:
# 80/20 split
split_index = int(len(weekly_df) * 0.8)
train_sarima = weekly_df['GENERATION'].iloc[:split_index]
test_sarima = weekly_df['GENERATION'].iloc[split_index:]

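# Fit SARIMA with a fixed (1,1,1)x(1,1,1,52) order (52 = annual seasonality in weekly data).
# Ideally the order would be selected by an AIC grid search on the training set.
# With D=1 and s=52, the training set should span several years of weeks.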
sarima_model = SARIMAX(train_sarima, order=(1,1,1), seasonal_order=(1,1,1,52))
sarima_result = sarima_model.fit(disp=False)

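# Forecast the test horizon, re-indexed onto the test dates so the element-wise
# MAPE below compares matching weeks even if statsmodels returns a different index
forecast_sarima = pd.Series(
    np.asarray(sarima_result.forecast(steps=len(test_sarima))), index=test_sarima.index
)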

# Evaluation
mape = np.mean(np.abs((test_sarima - forecast_sarima) / test_sarima)) * 100
mae = mean_absolute_error(test_sarima, forecast_sarima)
rmse = np.sqrt(mean_squared_error(test_sarima, forecast_sarima))
print(f"SARIMA - MAPE: {mape:.2f}%, MAE: {mae:.2f}, RMSE: {rmse:.2f}")

## 🌦️ SARIMAX Model with Lagged Climate Features (RQ2)
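This model adds the 1-week-lagged meteorological features as exogenous regressors. It uses the same (1,1,1)(1,1,1,52) order as the SARIMA baseline, so any change in accuracy can be attributed to the climate inputs.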

In [None]:
# Define endogenous and exogenous variables
endog = weekly_df['GENERATION']
exog = weekly_df[['T2M_lag1', 'PS_lag1', 'WS50M_lag1', 'RH2M_lag1', 'PRECTOTCORR_lag1']]

# Split (same 80/20 cut-off as the SARIMA baseline)
split_index = int(len(endog) * 0.8)
train_endog, test_endog = endog.iloc[:split_index], endog.iloc[split_index:]
train_exog, test_exog = exog.iloc[:split_index], exog.iloc[split_index:]

# Fit SARIMAX
sarimax_model = SARIMAX(train_endog, exog=train_exog, order=(1,1,1), seasonal_order=(1,1,1,52))
sarimax_result = sarimax_model.fit(disp=False)

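# Forecast with the test-period lagged climate values, re-indexed onto the test dates
# so the element-wise MAPE below compares matching weeks
forecast_sarimax = pd.Series(
    np.asarray(sarimax_result.forecast(steps=len(test_endog), exog=test_exog)),
    index=test_endog.index,
)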

# Evaluation
mape_sx = np.mean(np.abs((test_endog - forecast_sarimax) / test_endog)) * 100
mae_sx = mean_absolute_error(test_endog, forecast_sarimax)
rmse_sx = np.sqrt(mean_squared_error(test_endog, forecast_sarimax))
print(f"SARIMAX (Lagged Climate) - MAPE: {mape_sx:.2f}%, MAE: {mae_sx:.2f}, RMSE: {rmse_sx:.2f}")
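To answer RQ1 and RQ2 side by side, the two sets of printed metrics can be collected into one table. A minimal sketch, assuming actual and forecast values are aligned positionally (as in the cells above); the helper `compare_forecasts` is hypothetical, and the toy numbers only illustrate the output format:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

def compare_forecasts(actual, forecasts: dict) -> pd.DataFrame:
    """Tabulate MAPE (%), MAE and RMSE for each named forecast, lowest RMSE first."""
    y = np.asarray(actual, dtype=float)
    rows = {}
    for name, pred in forecasts.items():
        p = np.asarray(pred, dtype=float)  # positional alignment, ignoring any index
        rows[name] = {
            "MAPE (%)": np.mean(np.abs((y - p) / y)) * 100,  # undefined if any actual is 0
            "MAE": mean_absolute_error(y, p),
            "RMSE": np.sqrt(mean_squared_error(y, p)),
        }
    return pd.DataFrame.from_dict(rows, orient="index").sort_values("RMSE")

table = compare_forecasts([100.0, 200.0],
                          {"model_a": [110.0, 180.0], "model_b": [100.0, 210.0]})
print(table.round(2))
```

With the notebook's variables this would be `compare_forecasts(test_sarima, {'SARIMA': forecast_sarima, 'SARIMAX (lagged climate)': forecast_sarimax})`; both models share the same test weeks because they use the same split.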