# Forecasting Weekly Hydro Energy Generation in New Zealand
**Modeling with SARIMA and SARIMAX**

This notebook addresses:
- **RQ1**: How well does a SARIMA model forecast weekly hydro energy generation?
- **RQ2**: Does incorporating weekly-aggregated climate variables into SARIMAX improve forecasting accuracy?

The analysis uses weekly-summed and weekly-averaged NASA climate data, aligned with operational hydro planning cycles.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
plt.style.use('ggplot')  # built-in Matplotlib style sheet

## 📆 Weekly Aggregation of Hydro Generation and Climate Data
For both RQ1 and RQ2, we now aggregate all data to a **weekly** level. This reduces daily noise and aligns with common operational planning cycles.
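
As a minimal sketch of the two aggregation rules used below, the following applies `resample('W')` to synthetic daily data. The `daily` frame and its values are made up for illustration and are not taken from the dataset. Quantities that accumulate over a week are summed, and quantities that describe a state are averaged. The pandas `'W'` alias labels each bin by its Sunday end date, so a first or last week that the data only partly covers would yield an understated sum.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 14 days starting on a Monday (two full Mon-Sun weeks)
idx = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.DataFrame(
    {
        "GENERATION": np.arange(1.0, 15.0),  # made-up daily energy values
        "T2M": np.full(14, 15.0),            # made-up daily temperature
    },
    index=idx,
)

# Sum what accumulates over the week; average what describes a state
weekly = pd.DataFrame(
    {
        "GENERATION": daily["GENERATION"].resample("W").sum(),
        "T2M": daily["T2M"].resample("W").mean(),
    }
)
print(weekly)
```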

In [None]:
# Re-aggregate hydro generation data to weekly frequency
hydro_weekly = hydro_df[['GENERATION']].resample('W').sum()
hydro_weekly.plot(title='Weekly Hydro Energy Generation', figsize=(12,4))
plt.ylabel('MWh/week')
plt.show()

In [None]:
# Weekly mean for average-based variables
t2m_weekly = t2m.resample('W').mean()
ps_weekly = ps.resample('W').mean()
ws_weekly = ws.resample('W').mean()
rh2m_weekly = rh2m.resample('W').mean()

# Weekly sum for precipitation and evapotranspiration
precip_weekly = precip.resample('W').sum()
evland_weekly = evland.resample('W').sum()

# Merge all climate features with weekly hydro
weekly_climate = hydro_weekly.copy()
weekly_climate['T2M'] = t2m_weekly['T2M']
weekly_climate['PS'] = ps_weekly['PS']
weekly_climate['WS50M'] = ws_weekly['WS50M']
weekly_climate['RH2M'] = rh2m_weekly['RH2M']
weekly_climate['PRECTOTCORR'] = precip_weekly['PRECTOTCORR']
weekly_climate['EVLAND'] = evland_weekly['EVLAND']

# Drop NA values
weekly_climate = weekly_climate.dropna()
weekly_climate.head()

## 📊 RQ1: SARIMA Model on Weekly Hydro Generation (No Exogenous Variables)
This serves as the baseline univariate time series model for evaluating forecasting accuracy.
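
The hold-out scheme used for this baseline (train on everything except the final 10 weeks, test on those 10 weeks) can be sketched as a small helper. The function name `split_last_n` and the synthetic series are hypothetical and only illustrate the idea. Note that a seasonal period of 52 with seasonal differencing consumes the first year of observations, so the training portion should span several years.

```python
import numpy as np
import pandas as pd

def split_last_n(data, n_test):
    """Chronological split: all but the last n_test rows train, the rest test."""
    if not 0 < n_test < len(data):
        raise ValueError("n_test must be between 1 and len(data) - 1")
    return data.iloc[:-n_test], data.iloc[-n_test:]

# Hypothetical weekly series standing in for the GENERATION column
idx = pd.date_range("2022-01-02", periods=120, freq="W")
y = pd.Series(np.arange(120.0), index=idx, name="GENERATION")

train, test = split_last_n(y, n_test=10)

# Every training week must precede every test week (no shuffling, no leakage)
print(len(train), len(test), train.index.max() < test.index.min())
```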

In [None]:
# Chronological train-test split: hold out the last 10 weeks for testing
train_wk = weekly_climate['GENERATION'].iloc[:-10]
test_wk = weekly_climate['GENERATION'].iloc[-10:]

# SARIMA (weekly data, no exogenous vars)
sarima_model_wk = SARIMAX(train_wk, order=(1,1,1), seasonal_order=(1,1,1,52), 
                          enforce_stationarity=False, enforce_invertibility=False)
sarima_result_wk = sarima_model_wk.fit(disp=False)

# Forecast
sarima_forecast_wk = sarima_result_wk.get_forecast(steps=10)
sarima_pred_wk = sarima_forecast_wk.predicted_mean

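# Evaluation (MAPE compares by position, so a mismatched forecast index cannot produce NaNs)
mae_wk = mean_absolute_error(test_wk, sarima_pred_wk)
rmse_wk = np.sqrt(mean_squared_error(test_wk, sarima_pred_wk))
mape_wk = np.mean(np.abs((test_wk.values - sarima_pred_wk.values) / test_wk.values)) * 100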

print(f'SARIMA (Weekly) MAE: {mae_wk:.2f}')
print(f'SARIMA (Weekly) RMSE: {rmse_wk:.2f}')
print(f'SARIMA (Weekly) MAPE: {mape_wk:.2f}%')

## 🌦️ RQ2: SARIMAX Model with Weekly Climate Features
This evaluates whether integrating weekly climate variables improves forecasting performance.
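
Forecasting with exogenous regressors needs one row of regressor values per forecast step, with the same columns as in training. The check below is a minimal sketch of that requirement; `check_future_exog` and the example frames are hypothetical, not part of statsmodels or of this notebook. In the cell that follows, the hold-out regressors are the observed climate values for those weeks, so the resulting scores describe accuracy when the weather is known. An operational forecast would have to rely on forecast climate inputs instead.

```python
import pandas as pd

def check_future_exog(exog_train, exog_future, steps):
    """Raise if the future exogenous block cannot drive a `steps`-ahead forecast."""
    if list(exog_future.columns) != list(exog_train.columns):
        raise ValueError("exogenous columns differ between training and forecast data")
    if len(exog_future) != steps:
        raise ValueError(f"expected {steps} rows of future exog, got {len(exog_future)}")
    if exog_future.isna().any().any():
        raise ValueError("future exog contains missing values")
    return True

# Made-up regressors: 30 weeks, two climate columns
idx = pd.date_range("2023-01-01", periods=30, freq="W")
exog = pd.DataFrame({"T2M": 12.0, "PRECTOTCORR": 20.0}, index=idx)

exog_train, exog_future = exog.iloc[:-10], exog.iloc[-10:]
print(check_future_exog(exog_train, exog_future, steps=10))
```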

In [None]:
# Weekly SARIMAX with climate features; exogenous rows use the same 10-week split as the target
exog_train_wk = weekly_climate.drop(columns='GENERATION').iloc[:-10]
exog_test_wk = weekly_climate.drop(columns='GENERATION').iloc[-10:]

sarimax_model_wk = SARIMAX(train_wk, exog=exog_train_wk, 
                           order=(1,1,1), seasonal_order=(1,1,1,52), 
                           enforce_stationarity=False, enforce_invertibility=False)
sarimax_result_wk = sarimax_model_wk.fit(disp=False)

# Forecast
sarimax_forecast_wk = sarimax_result_wk.get_forecast(steps=10, exog=exog_test_wk)
sarimax_pred_wk = sarimax_forecast_wk.predicted_mean

# Evaluation (MAPE compares by position, so a mismatched forecast index cannot produce NaNs)
mae_sx_wk = mean_absolute_error(test_wk, sarimax_pred_wk)
rmse_sx_wk = np.sqrt(mean_squared_error(test_wk, sarimax_pred_wk))
mape_sx_wk = np.mean(np.abs((test_wk.values - sarimax_pred_wk.values) / test_wk.values)) * 100

print(f'SARIMAX (Weekly) MAE: {mae_sx_wk:.2f}')
print(f'SARIMAX (Weekly) RMSE: {rmse_sx_wk:.2f}')
print(f'SARIMAX (Weekly) MAPE: {mape_sx_wk:.2f}%')

### 📌 Summary: Weekly SARIMA vs SARIMAX
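Working with **weekly-aggregated data** lets both research questions be evaluated on the same 10-week hold-out set:
- **RQ1**: SARIMA models the autoregressive and annual seasonal structure of generation but has no meteorological context.
- **RQ2**: SARIMAX adds the weekly climate variables as exogenous regressors. It improves forecasting accuracy only if its MAE, RMSE, and MAPE reported above are lower than the SARIMA values.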
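
To make the comparison explicit, the three error measures can be collected into one table. This is a sketch under stated assumptions: `forecast_metrics` is a hypothetical helper, and the numbers are made up purely to show the layout, so they say nothing about the actual models.

```python
import numpy as np
import pandas as pd

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE and MAPE (%), compared by position; assumes no zeros in y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE_%": np.mean(np.abs(err / y_true)) * 100,
    }

# Made-up values for illustration only
actual = [100.0, 200.0, 400.0]
pred_a = [110.0, 190.0, 380.0]
pred_b = [105.0, 195.0, 390.0]

comparison = pd.DataFrame(
    {
        "model_a": forecast_metrics(actual, pred_a),
        "model_b": forecast_metrics(actual, pred_b),
    }
).T
print(comparison.round(2))
```

In this notebook the same helper could be called with `test_wk` and each of `sarima_pred_wk` and `sarimax_pred_wk`. With only 10 test weeks, small differences between the two rows should be read with caution.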

These results will be compared with ANN-based forecasts in the next modeling phase.