# 🌬️ Wind Forecasting by Island: SARIMA & SARIMAX (NZ)
This notebook forecasts weekly wind generation separately for **South Island** and **North Island** using SARIMA and SARIMAX models.
RQ1 is addressed with SARIMA (no exogenous inputs) and RQ2 with SARIMAX, which adds climate features as exogenous regressors (wind direction is excluded).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load weekly wind and climate data by island
df = pd.read_csv('merged_weekly_wind_climate_by_island.csv', parse_dates=['Date'], index_col='Date')

# Drop wind direction if present
df = df.drop(columns=[col for col in df.columns if 'WD50M' in col], errors='ignore')

# Create South Island dataset ('Date' is already the index, so only the SOUTH columns are selected)
south = df[[col for col in df.columns if 'SOUTH' in col]].copy()
south.rename(columns=lambda x: x.replace('_SOUTH', ''), inplace=True)

# Create North Island dataset (same approach for the NORTH columns)
north = df[[col for col in df.columns if 'NORTH' in col]].copy()
north.rename(columns=lambda x: x.replace('_NORTH', ''), inplace=True)

print('South Island features:', south.columns.tolist())
print('North Island features:', north.columns.tolist())

## 🧭 South Island Forecasting
### SARIMA (RQ1) + SARIMAX with Lagged Climate Features (RQ2)

## 🔍 Correlation Analysis
To understand which climate features influence wind generation, we compute correlation matrices for each island.

In [None]:
# Correlation matrix for South Island
plt.figure(figsize=(8, 5))
sns.heatmap(south.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('South Island: Correlation with Wind Generation')
plt.show()

In [None]:
# Correlation matrix for North Island
plt.figure(figsize=(8, 5))
sns.heatmap(north.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('North Island: Correlation with Wind Generation')
plt.show()

## ✅ Feature Selection Based on Correlation
We keep only the lagged climate features whose absolute correlation with wind generation exceeds **0.3** (that is, |r| > 0.3).
This step reduces noise and focuses on more relevant predictors.

In [None]:
# Compute correlation with generation for South
corr_south = south.corr()['GENERATION'].drop('GENERATION')
selected_features_south = corr_south[abs(corr_south) > 0.3].index.tolist()
print('Selected features for SARIMAX (South):', selected_features_south)
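
The text refers to lagged climate features, while the cells shown here use the columns exactly as loaded from the CSV. If the lags are not already present in that file, they could be built with `shift` before the correlations are computed. This is a sketch under that assumption: the helper `add_lags`, the lag set `(1, 2)`, and the toy column name `WS50M` are hypothetical.

```python
import pandas as pd


def add_lags(df, columns, lags=(1, 2)):
    """Return a copy of df with <col>_lag<k> columns; rows lost to shifting are dropped."""
    out = df.copy()
    for col in columns:
        for k in lags:
            out[f'{col}_lag{k}'] = out[col].shift(k)
    return out.dropna()


# Toy data standing in for one island's weekly frame
toy = pd.DataFrame({'GENERATION': [10.0, 12.0, 9.0, 14.0, 11.0],
                    'WS50M': [5.0, 6.0, 4.5, 7.0, 5.5]})
lagged = add_lags(toy, ['WS50M'], lags=(1, 2))
print(lagged)
```

Dropping the first rows keeps the endogenous and exogenous series aligned, which matters because `SARIMAX` does not accept missing values in `exog`.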

In [None]:
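# SARIMAX South with filtered features
# NOTE: the cell that defines split_s, train_endog and test_endog is not shown in this notebook.
# The 80/20 chronological split below is an assumption; adjust it to match the SARIMA baseline.
endog = south['GENERATION']
split_s = int(len(endog) * 0.8)
train_endog, test_endog = endog.iloc[:split_s], endog.iloc[split_s:]

exog = south[selected_features_south]
train_exog = exog[:split_s]
test_exog = exog[split_s:]
sarimax_model = SARIMAX(train_endog, exog=train_exog, order=(1,1,1), seasonal_order=(1,1,1,52))
sarimax_res = sarimax_model.fit(disp=False)
pred_sarimax = sarimax_res.forecast(steps=len(test_endog), exog=test_exog)
# Compare by position so a forecast index that differs from the test index cannot produce NaNs
mape_sarimax = np.mean(np.abs((test_endog.values - np.asarray(pred_sarimax)) / test_endog.values)) * 100
print(f"South Island SARIMAX (Filtered Features) MAPE: {mape_sarimax:.2f}%")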
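
MAPE divides each error by the observed value, so weeks with zero or near-zero generation can dominate or break the score. A small sketch of reporting MAE and RMSE next to a MAPE that skips zero actuals is shown below. The helper name `evaluate_forecast` and the toy numbers are hypothetical; the metrics are computed with NumPy here, and the `mean_absolute_error` and `mean_squared_error` imports at the top of the notebook would give the same MAE and MSE.

```python
import numpy as np


def evaluate_forecast(y_true, y_pred):
    """Return MAE, RMSE and MAPE (%), comparing values by position."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    nonzero = y_true != 0  # MAPE is undefined where the actual value is zero
    mape = np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100
    return {'MAE': np.mean(np.abs(err)),
            'RMSE': np.sqrt(np.mean(err ** 2)),
            'MAPE': mape}


# Toy actuals and forecasts, including one zero-generation week
scores = evaluate_forecast([100.0, 200.0, 0.0, 50.0], [110.0, 180.0, 5.0, 50.0])
print(scores)
```

In the notebook this could be called as `evaluate_forecast(test_endog, pred_sarimax)`, giving scale-dependent errors in the units of `GENERATION` alongside the percentage error.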

## 🧭 North Island Forecasting
### SARIMA (RQ1) + SARIMAX with Lagged Climate Features (RQ2)

In [None]:
# Compute correlation with generation for North
corr_north = north.corr()['GENERATION'].drop('GENERATION')
selected_features_north = corr_north[abs(corr_north) > 0.3].index.tolist()
print('Selected features for SARIMAX (North):', selected_features_north)
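
Both selection cells compute correlations over the full sample, so the test weeks also influence which features are chosen. One way to avoid that is to restrict the calculation to the training rows. The sketch below assumes an 80/20 chronological split; `select_features_train_only` and the toy columns `WS50M` and `NOISE` are hypothetical names.

```python
import numpy as np
import pandas as pd


def select_features_train_only(df, target='GENERATION', train_frac=0.8, threshold=0.3):
    """Select columns whose |Pearson r| with the target exceeds threshold on training rows."""
    split = int(len(df) * train_frac)  # assumed chronological split
    corr = df.iloc[:split].corr()[target].drop(target)
    return corr[corr.abs() > threshold].index.tolist()


# Toy data: one feature that drives generation and one unrelated feature
rng = np.random.default_rng(1)
n = 100
wind = rng.normal(8, 2, n)
toy = pd.DataFrame({'GENERATION': 3 * wind + rng.normal(0, 1, n),
                    'WS50M': wind,
                    'NOISE': rng.normal(0, 1, n)})

selected = select_features_train_only(toy)
print(selected)
```

With the notebook's frames, `select_features_train_only(north)` would take the place of the full-sample calculation, provided `train_frac` matches the split used for model fitting.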

In [None]:
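# SARIMAX North with filtered features
# NOTE: the cell that defines split_n, train_endog_n and test_endog_n is not shown in this notebook.
# The 80/20 chronological split below is an assumption; adjust it to match the SARIMA baseline.
endog_n = north['GENERATION']
split_n = int(len(endog_n) * 0.8)
train_endog_n, test_endog_n = endog_n.iloc[:split_n], endog_n.iloc[split_n:]

exog_n = north[selected_features_north]
train_exog_n = exog_n[:split_n]
test_exog_n = exog_n[split_n:]
sarimax_model_n = SARIMAX(train_endog_n, exog=train_exog_n, order=(1,1,1), seasonal_order=(1,1,1,52))
sarimax_res_n = sarimax_model_n.fit(disp=False)
pred_sarimax_n = sarimax_res_n.forecast(steps=len(test_endog_n), exog=test_exog_n)
# Compare by position so a forecast index that differs from the test index cannot produce NaNs
mape_sarimax_n = np.mean(np.abs((test_endog_n.values - np.asarray(pred_sarimax_n)) / test_endog_n.values)) * 100
print(f"North Island SARIMAX (Filtered Features) MAPE: {mape_sarimax_n:.2f}%")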

## 📊 Visualizing Selected Features vs. Wind Generation
These plots help interpret the relationship between selected climate features and weekly wind generation.

In [None]:
# Plot selected features vs. wind generation (South)
for col in selected_features_south:
    plt.figure(figsize=(6, 4))
    plt.scatter(south[col], south['GENERATION'], alpha=0.6)
    plt.xlabel(col)
    plt.ylabel('Wind Generation')
    plt.title(f'South Island: {col} vs. Generation')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

In [None]:
# Plot selected features vs. wind generation (North)
for col in selected_features_north:
    plt.figure(figsize=(6, 4))
    plt.scatter(north[col], north['GENERATION'], alpha=0.6)
    plt.xlabel(col)
    plt.ylabel('Wind Generation')
    plt.title(f'North Island: {col} vs. Generation')
    plt.grid(True)
    plt.tight_layout()
    plt.show()