# ANN Forecasting of Weekly Hydro Energy Generation in New Zealand

This notebook addresses:
- **RQ1**: How accurately can an ANN forecast weekly hydro energy generation compared to SARIMA?
- **RQ2**: Do lagged climate features improve ANN forecasting performance?

Hydro generation data is merged with NASA climate variables and aggregated to weekly frequency.
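
The merge and weekly aggregation described above could look roughly like the sketch below. It is illustrative only: the synthetic daily frames, the climate column names (`PRECIP`, `TEMP`), the sum/mean aggregation choices, and the Sunday-ending week are assumptions, not the preparation actually used for `weekly_hydro_climate.csv`.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the real sources (values are made up)
days = pd.date_range('2023-01-02', periods=28, freq='D', name='DATE')  # starts on a Monday
rng = np.random.default_rng(0)
hydro_daily = pd.DataFrame({'GENERATION': rng.uniform(50, 80, len(days))}, index=days)
climate_daily = pd.DataFrame({'PRECIP': rng.uniform(0, 20, len(days)),
                              'TEMP': rng.uniform(5, 25, len(days))}, index=days)

# Align the two sources on the date index; the inner join keeps only days present in both
daily = hydro_daily.join(climate_daily, how='inner')

# Aggregate to weekly frequency: total generation and rainfall, mean temperature
weekly_example = daily.resample('W').agg({'GENERATION': 'sum', 'PRECIP': 'sum', 'TEMP': 'mean'})
print(weekly_example.shape)
```

Summing generation and averaging temperature is one plausible choice; whichever aggregation is used should match the one applied for the SARIMA models so the comparison stays fair.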

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# 'seaborn-vintage' is not a Matplotlib style; apply a seaborn theme instead
sns.set_theme(style='whitegrid')

## Load and Prepare Weekly Aggregated Hydro + Climate Dataset

In [None]:
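# Load the weekly dataset (assumed to be prepared the same way as for the SARIMA analysis)
weekly_df = pd.read_csv('weekly_hydro_climate.csv', parse_dates=['DATE'], index_col='DATE')
# Keep rows in chronological order (the later split relies on it) and drop incomplete weeks
weekly_df = weekly_df.sort_index().dropna()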
weekly_df.head()
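
RQ2 concerns lagged climate features. If `weekly_df` does not already contain them, they could be derived with `shift`, as in this sketch. The column names `PRECIP` and `TEMP`, the lag choices, and the helper `add_lags` are hypothetical; the real file may use different columns.

```python
import numpy as np
import pandas as pd

def add_lags(df, columns, lags):
    """Return a copy of df with <col>_lag<k> columns holding the value from k weeks earlier."""
    out = df.copy()
    for col in columns:
        for k in lags:
            out[f'{col}_lag{k}'] = out[col].shift(k)
    # The first max(lags) rows have no history, so drop them
    return out.dropna()

# Toy weekly frame with made-up values
idx = pd.date_range('2023-01-01', periods=8, freq='W', name='DATE')
toy = pd.DataFrame({'GENERATION': np.arange(100.0, 108.0),
                    'PRECIP': np.arange(8.0),
                    'TEMP': np.arange(10.0, 18.0)}, index=idx)

lagged = add_lags(toy, columns=['PRECIP', 'TEMP'], lags=[1, 2])
print(lagged.columns.tolist())
```

Training the network once with and once without the lag columns, on the same split, would give the comparison RQ2 asks for.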

## Feature Scaling and Train-Test Split

In [None]:
# Define features and target
X = weekly_df.drop(columns='GENERATION')
y = weekly_df['GENERATION']

# Chronological split: hold out the last 10 weeks for testing
X_train_raw, X_test_raw = X.iloc[:-10], X.iloc[-10:]
y_train, y_test = y.iloc[:-10], y.iloc[-10:]

# Fit the scaler on the training weeks only, so test-period ranges do not leak into training
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

## ANN Model Training

In [None]:
# Build simple ANN
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mse')
es = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

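# Train model; Keras takes validation_split from the last 20% of the training rows (the most recent weeks)
history = model.fit(X_train, y_train.values, validation_split=0.2, epochs=200, batch_size=8, callbacks=[es], verbose=0)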

## Forecast Evaluation and RQ1 Analysis

In [None]:
# Predict and evaluate
y_pred = model.predict(X_test).flatten()
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100

print(f'ANN MAE: {mae:.2f}')
print(f'ANN RMSE: {rmse:.2f}')
print(f'ANN MAPE: {mape:.2f}%')

In [None]:
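plt.figure(figsize=(12,4))
# Plot against the date index so the x-axis shows the actual test weeks
plt.plot(y_test.index, y_test.values, label='Actual')
plt.plot(y_test.index, y_pred, label='Predicted')
plt.title('ANN Prediction vs Actual (Weekly Hydro Generation)')
plt.xlabel('Week')
plt.ylabel('Generation')
plt.legend()
plt.show()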

### 🔍 Interpretation (RQ1)
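The ANN's MAE, RMSE, and MAPE on the 10-week hold-out are compared against those of the SARIMA and SARIMAX models.
The ANN counts as the better forecasting tool only if its errors are lower than theirs on the same test weeks and under the same data conditions; with only 10 test weeks, small differences should be read cautiously.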
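
One way to make that comparison concrete is to score every model's forecast against the same hold-out values and tabulate the results. The helper `compare_forecasts` and the numbers below are illustrative placeholders, not results from this notebook; in practice `forecasts` would hold `y_pred` and the SARIMA/SARIMAX predictions for the same 10 weeks.

```python
import numpy as np
import pandas as pd

def compare_forecasts(y_true, forecasts):
    """Score each forecast in `forecasts` (name -> array) against y_true; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    rows = {}
    for name, pred in forecasts.items():
        err = y_true - np.asarray(pred, dtype=float)
        rows[name] = {
            'MAE': np.mean(np.abs(err)),
            'RMSE': np.sqrt(np.mean(err ** 2)),
            'MAPE_%': np.mean(np.abs(err / y_true)) * 100,  # assumes no zero actuals
        }
    # One row per model, best (lowest RMSE) first
    return pd.DataFrame.from_dict(rows, orient='index').sort_values('RMSE')

# Made-up numbers purely to show the shape of the output
actual = [100.0, 110.0, 120.0, 130.0]
table = compare_forecasts(actual, {
    'model_a': [102.0, 108.0, 123.0, 129.0],
    'model_b': [110.0, 100.0, 130.0, 120.0],
})
print(table.round(2))
```

Sorting by RMSE is an arbitrary choice here; if the metrics disagree on the ranking, that disagreement is itself worth reporting.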