# SARIMA Modeling for Hydro Energy Forecasting
**Objective**: Evaluate SARIMA model performance for forecasting hydro energy generation in New Zealand.

This notebook contributes to **RQ1**: _Which model (SARIMA or ANN) provides the more accurate forecast for renewable energy generation in New Zealand?_

We focus on univariate SARIMA modeling using historical hydro generation data.
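
For reference, the standard multiplicative SARIMA$(p,d,q)(P,D,Q)_s$ form can be sketched in backshift notation as follows. Here $y_t$ stands for daily hydro generation; the orders and the seasonal period $s$ are not fixed at this point in the notebook and are left symbolic.

$$
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

$B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts, $d$ and $D$ are the non-seasonal and seasonal differencing orders, and $\varepsilon_t$ is white noise.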

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
plt.style.use('ggplot')  # built-in Matplotlib style sheet

## Load Hydro Generation Data

In [None]:
# Load dataset
hydro_df = pd.read_csv('hydro_data.csv', parse_dates=['DATE'])
hydro_df = hydro_df.sort_values('DATE')
hydro_df.set_index('DATE', inplace=True)
hydro_df = hydro_df.asfreq('D')  # Regular daily index; dates absent from the file become NaN

# Preview
hydro_df['GENERATION'].plot(title='Daily Hydro Energy Generation', figsize=(12,4))
plt.ylabel('MWh')
plt.show()
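
Because `asfreq('D')` fills any dates missing from the file with `NaN`, it may be worth checking how many gaps there are before modeling. The sketch below is self-contained and uses a small synthetic series as a stand-in for `hydro_df['GENERATION']`; the helper `summarize_gaps` and the interpolation limit of 3 days are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def summarize_gaps(series: pd.Series) -> dict:
    """Count missing values and the longest run of consecutive NaNs."""
    is_missing = series.isna()
    # Assign an id to each run of consecutive identical flags,
    # then sum the flags per run to get the length of each NaN run.
    run_id = (is_missing != is_missing.shift()).cumsum()
    run_lengths = is_missing.groupby(run_id).sum()
    return {
        "n_missing": int(is_missing.sum()),
        "longest_gap": int(run_lengths.max()) if len(run_lengths) else 0,
    }


# Hypothetical stand-in data: 10 calendar days with two dates absent
dates = pd.date_range("2020-01-01", periods=10, freq="D").delete([3, 4])
demo = pd.Series(np.arange(8, dtype=float), index=dates).asfreq("D")

gaps = summarize_gaps(demo)
print(gaps)

# One possible treatment (an assumption): time-based linear interpolation,
# capped at 3 consecutive days so long outages are not silently filled.
demo_filled = demo.interpolate(method="time", limit=3)
print(demo_filled.isna().sum())
```

On the real data, the same helper could be called as `summarize_gaps(hydro_df['GENERATION'])`. Whether to interpolate at all is a modeling choice, since `SARIMAX` can also be fitted to a series that contains missing values.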

## Stationarity Check using Augmented Dickey-Fuller Test

In [None]:
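# ADF null hypothesis: the series has a unit root (i.e. is non-stationary)
result = adfuller(hydro_df['GENERATION'].dropna())
print(f'ADF Statistic: {result[0]:.4f}')
print(f'p-value: {result[1]:.4f}')
if result[1] < 0.05:
    print('✅ Unit root rejected at the 5% level: series appears stationary')
else:
    print('⚠️ Unit root not rejected: series may be non-stationary, consider differencing')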