# Forecasting Weekly Hydro Energy Generation in New Zealand
**Modeling with SARIMA and SARIMAX**

This notebook addresses:
- **RQ1**: How well does a SARIMA model forecast weekly hydro energy generation?
- **RQ2**: Does incorporating weekly-aggregated climate variables into SARIMAX improve forecasting accuracy?

The analysis uses weekly-summed and weekly-averaged NASA climate data, aligned with operational hydro planning cycles.

In [None]:
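import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # needed for the correlation heatmap (sns.heatmap)
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
# 'seaborn-vintage' is not a style shipped with Matplotlib; 'ggplot' is built in
plt.style.use('ggplot')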

## 📆 Weekly Aggregation of Hydro Generation and Climate Data
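For both RQ1 and RQ2, all data are aggregated to a **weekly** level. This smooths day-to-day noise and matches the weekly cycle commonly used in operational planning. Note that pandas' `'W'` frequency labels each week by the Sunday on which it ends.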

In [None]:
# Re-aggregate hydro generation data to weekly frequency
hydro_weekly = hydro_df[['GENERATION']].resample('W').sum()
hydro_weekly.plot(title='Weekly Hydro Energy Generation', figsize=(12,4))
plt.ylabel('MWh/week')
plt.show()
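One detail worth checking after a weekly sum is whether the first and last weeks are complete, because a week with fewer than seven daily records produces an artificially low total. The sketch below uses synthetic daily values (not the notebook's `hydro_df`) and assumes a daily `DatetimeIndex`; the filtering step is a suggestion, not part of the original pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic daily generation (hypothetical values), deliberately starting mid-week
days = pd.date_range("2021-01-06", "2021-02-03", freq="D")  # Wednesday to Wednesday
daily = pd.DataFrame({"GENERATION": np.full(len(days), 100.0)}, index=days)

# 'W' is shorthand for 'W-SUN': each bin covers Monday..Sunday, labelled by the Sunday
weekly_sum = daily["GENERATION"].resample("W").sum()
days_per_week = daily["GENERATION"].resample("W").count()

# Keep only weeks that contain all seven daily observations
complete = weekly_sum[days_per_week == 7]

print(weekly_sum)   # first and last bins hold 5 and 3 days, so their sums are low
print(complete)     # the three full weeks remain
```

If partial weeks are kept, the first and last points of the weekly series can look like sudden drops in generation that are purely an artifact of the binning.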

In [None]:
# Weekly mean for average-based variables
t2m_weekly = t2m.resample('W').mean()
ps_weekly = ps.resample('W').mean()
ws_weekly = ws.resample('W').mean()
rh2m_weekly = rh2m.resample('W').mean()

# Weekly sum for precipitation and evapotranspiration
precip_weekly = precip.resample('W').sum()
evland_weekly = evland.resample('W').sum()

# Merge all climate features with weekly hydro
weekly_climate = hydro_weekly.copy()
weekly_climate['T2M'] = t2m_weekly['T2M']
weekly_climate['PS'] = ps_weekly['PS']
weekly_climate['WS50M'] = ws_weekly['WS50M']
weekly_climate['RH2M'] = rh2m_weekly['RH2M']
weekly_climate['PRECTOTCORR'] = precip_weekly['PRECTOTCORR']
weekly_climate['EVLAND'] = evland_weekly['EVLAND']

# Drop NA values
weekly_climate = weekly_climate.dropna()
weekly_climate.head()
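The mean-versus-sum aggregation above can also be expressed as a single mapping passed to `resample().agg()`, followed by an inner join so that only weeks present in both tables survive. This is a sketch on synthetic data; it assumes the daily climate variables live in one DataFrame (`climate_daily` is a hypothetical name, whereas the notebook keeps them in separate objects).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2022-01-03", periods=28, freq="D")  # four full Monday-Sunday weeks

# Hypothetical daily inputs standing in for the notebook's climate and hydro data
climate_daily = pd.DataFrame(
    {"T2M": rng.normal(12, 3, 28), "PRECTOTCORR": rng.gamma(2.0, 2.0, 28)},
    index=days,
)
hydro_daily = pd.DataFrame({"GENERATION": rng.normal(60000, 5000, 28)}, index=days)

# State variables are averaged; flux-like variables are summed
agg_rules = {"T2M": "mean", "PRECTOTCORR": "sum"}
climate_weekly = climate_daily.resample("W").agg(agg_rules)
hydro_weekly = hydro_daily.resample("W").sum()

# Inner join keeps only weeks available in both tables
weekly = hydro_weekly.join(climate_weekly, how="inner")
print(weekly)
```

Keeping the aggregation rules in one dictionary makes it harder to accidentally average a variable that should be summed when new features are added.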

## 📊 RQ1: SARIMA Model on Weekly Hydro Generation (No Exogenous Variables)
This serves as the baseline univariate time series model for evaluating forecasting accuracy.

In [None]:
# Chronological 80/20 train-test split (no shuffling: the test set is the most recent 20% of weeks)
split_index = int(len(weekly_climate) * 0.8)
train_wk = weekly_climate['GENERATION'].iloc[:split_index]
test_wk = weekly_climate['GENERATION'].iloc[split_index:]

## 🔗 Correlation Analysis Between Climate Features and Hydro Generation
To support **RQ2**, we analyze how strongly each weekly climate variable correlates with hydro generation.
This helps assess the relevance of including these features in the SARIMAX model.

In [None]:
# Compute correlation matrix
correlation_matrix = weekly_climate.corr()

# Plot heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Pearson Correlation (Weekly Aggregated)')
plt.show()

### 🧠 Interpretation
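- Positive correlations indicate that higher values of a feature tend to coincide with higher hydro generation in the same week; negative correlations indicate the opposite.
- **PRECTOTCORR** is expected to relate positively to generation because rainfall raises inflows. **EVLAND** may also correlate positively, since both depend on water availability, but evapotranspiration removes water from the catchment, so its sign should be read from the heatmap rather than assumed.
- Pearson correlation captures only linear, same-week association. Features that show a meaningful correlation are candidates for the SARIMAX model used to address **RQ2**.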
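Same-week correlation can understate a feature's relevance when its effect arrives with a delay, as rainfall may when it takes time to reach a reservoir. The sketch below computes correlations at several lags on synthetic data in which generation is constructed to respond to precipitation two weeks earlier; the helper `lagged_corr` and the two-week delay are illustrative assumptions, not findings from the notebook's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
idx = pd.date_range("2019-01-06", periods=n, freq="W")

# Toy data: generation depends on precipitation from two weeks earlier (assumption)
precip = pd.Series(rng.gamma(2.0, 10.0, n), index=idx, name="PRECTOTCORR")
generation = (400 + 3.0 * precip.shift(2) + rng.normal(0, 5, n)).rename("GENERATION")

def lagged_corr(target, feature, max_lag):
    """Pearson correlation between target[t] and feature[t - lag], lag = 0..max_lag."""
    return pd.Series(
        {lag: target.corr(feature.shift(lag)) for lag in range(max_lag + 1)},
        name=feature.name,
    )

lag_table = lagged_corr(generation, precip, max_lag=6)
best_lag = lag_table.abs().idxmax()

print(lag_table.round(2))
print("Lag with strongest correlation:", best_lag)
```

If a lag other than zero stands out on the real data, the shifted feature could be supplied as the exogenous regressor instead of the same-week value.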

## 🌦️ RQ2: SARIMAX Model with Weekly Climate Features
This evaluates whether integrating weekly climate variables improves forecasting performance.

### 🧾 Climate Feature Descriptions (Exogenous Variables)

| Feature         | Description                                       | Aggregation   |
|----------------|---------------------------------------------------|---------------|
| `T2M`           | Temperature at 2 meters (°C)                      | Weekly **average** |
| `PS`            | Surface air pressure (Pa)                         | Weekly **average** |
| `WS50M`         | Wind speed at 50 meters (m/s)                     | Weekly **average** |
| `RH2M`          | Relative humidity at 2 meters (%)                 | Weekly **average** |
| `PRECTOTCORR`   | Precipitation (corrected) (mm)                    | Weekly **sum**     |
| `EVLAND`        | Land surface evapotranspiration (mm)              | Weekly **sum**     |

These features are selected based on their known influence on hydro energy generation, as supported by climate and energy forecasting literature.

In [None]:
# Exogenous train-test split for SARIMAX (same split_index as the target series)
exog_wk = weekly_climate.drop(columns='GENERATION')
exog_train_wk = exog_wk.iloc[:split_index]
exog_test_wk = exog_wk.iloc[split_index:]

### 📌 Summary: Weekly SARIMA vs SARIMAX
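By working with **weekly-aggregated data**, both research questions are evaluated on the same series and the same 80/20 chronological split:
- **RQ1**: SARIMA captures autoregressive and seasonal patterns in generation but has no meteorological context.
- **RQ2**: SARIMAX adds exogenous climate variables; whether this improves accuracy is judged by comparing its test-set errors (e.g. RMSE and MAE) against the SARIMA baseline.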

These results will be compared with ANN-based forecasts in the next modeling phase.
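One way to summarize the comparison in a single number is the relative RMSE reduction of a candidate model over a baseline. The sketch below uses made-up forecast values purely to show the arithmetic; `relative_improvement` is a hypothetical helper, and the same function could later score ANN forecasts against the SARIMA baseline.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def relative_improvement(actual, baseline_pred, candidate_pred):
    """Fraction by which the candidate lowers RMSE relative to the baseline.

    Positive values favour the candidate; negative values favour the baseline.
    Undefined if the baseline RMSE is zero.
    """
    base = rmse(actual, baseline_pred)
    cand = rmse(actual, candidate_pred)
    return (base - cand) / base

# Made-up numbers for illustration only (not results from this notebook)
actual = np.array([100.0, 110.0, 120.0, 130.0])
sarima_pred = np.array([110.0, 120.0, 130.0, 140.0])   # every error is +10
sarimax_pred = np.array([105.0, 115.0, 125.0, 135.0])  # every error is +5

gain = relative_improvement(actual, sarima_pred, sarimax_pred)
print(f"Relative RMSE reduction: {gain:.0%}")
```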