# Day 4, Session 2 · Student Notebook
### LSTM drought forecasting with Google Earth Engine NDVI

## How to use this notebook

1. Work through the parts in order. Cells marked **TODO** need your implementation.
2. Connect to Google Earth Engine (GEE) to fetch real Sentinel-2 NDVI. If GEE is unreachable or the fetch fails, the notebook falls back to the cached sample CSV.
3. Check your answers against the instructor notebook after attempting each section.

## Learning objectives

- Retrieve NDVI time series for Mindanao provinces using Google Earth Engine.
- Convert the NDVI archive into supervised sliding-window data ready for LSTMs.
- Build and train a PyTorch LSTM with temporal validation.
- Evaluate forecasts and issue drought alerts using NDVI thresholds.

## Part 1 · Environment setup

In [None]:
# Optional: install the Earth Engine API if it is not available yet
# !pip install earthengine-api --quiet

Authenticate with Earth Engine. Run `ee.Authenticate()` only if you have not done so on this machine before.

In [None]:
import os
import ee

# TODO: replace 'your-ee-project-id' with the project that has Earth Engine enabled, or set EE_PROJECT env var.
GEE_PROJECT = os.environ.get('EE_PROJECT') or os.environ.get('GEE_PROJECT') or 'your-ee-project-id'

if GEE_PROJECT == 'your-ee-project-id':
    raise ValueError('Set GEE_PROJECT to your Earth Engine project ID before proceeding.')

try:
    ee.Initialize(project=GEE_PROJECT)
    print(f'Connected to Earth Engine project: {GEE_PROJECT}')
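except Exception as exc:
    # Show the underlying error so authentication problems can be told apart from other failures.
    print(f'Earth Engine initialisation failed: {exc}')
    print('If authentication is required, uncomment the line below after confirming GEE_PROJECT.')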
    # ee.Authenticate(auth_mode='notebook', project=GEE_PROJECT)
    raise


## Part 2 · Core imports

In [None]:
import math
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

plt.style.use('seaborn-v0_8-whitegrid')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

## Part 3 · Define Mindanao regions (GEE geometries)

In [None]:
REGION_GEOMETRIES = {
    "Bukidnon": ee.Geometry.Polygon([
        [124.36, 8.84],
        [124.36, 7.05],
        [125.63, 7.05],
        [125.63, 8.84],
        [124.36, 8.84]
    ]),
    "South Cotabato": ee.Geometry.Polygon([
        [124.28, 6.88],
        [124.28, 5.68],
        [125.30, 5.68],
        [125.30, 6.88],
        [124.28, 6.88]
    ])
}

REGION_GEOMETRIES

## Part 4 · TODO – build a helper that fetches monthly NDVI from GEE

In [None]:
def fetch_monthly_ndvi(regions, start_date='2018-01-01', end_date='2023-12-31', cloud_pct=35, scale=20):
    """Return a wide DataFrame with monthly NDVI columns (NDVI_<region>)."""
    # TODO: implement the following steps
    # 1. Build a monthly date range between start/end using pandas (freq='MS').
    # 2. Construct a Sentinel-2 SR image collection filtered by date and CLOUDY_PIXEL_PERCENTAGE <= cloud_pct.
    # 3. Apply Scene Classification (SCL) masking to drop clouds, shadows, snow/ice, then add an NDVI band.
    # 4. For each region/month, filter to [month_start, next_month_start) and reduce the masked composite to mean NDVI.
    # 5. Assemble the results into a tidy DataFrame, pivot to wide format, and fill small gaps via interpolation.
    raise NotImplementedError("Implement fetch_monthly_ndvi using Earth Engine.")
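
Steps 1 and 5 involve no Earth Engine calls, so they can be prototyped offline before wiring in the image collection. The sketch below is one possible shape, not the reference solution. It assumes the per-month reducer results have been collected as a list of dicts with the hypothetical keys `month`, `region`, and `ndvi`. The NDVI numbers are invented for illustration, and `limit=2` is an arbitrary choice for what counts as a "small" gap.

```python
import pandas as pd

# Step 1: month starts between the requested dates (freq='MS' means month start).
months = pd.date_range('2018-01-01', '2018-06-30', freq='MS')

# Hypothetical stand-in for the Earth Engine reducer output.
# Values are made up; None marks a month with no cloud-free pixels.
fake_ndvi = {
    'Bukidnon': [0.71, 0.69, None, 0.62, 0.66, 0.70],
    'South Cotabato': [0.64, 0.61, 0.58, None, 0.60, 0.63],
}
records = [
    {'month': month, 'region': region, 'ndvi': value}
    for region, values in fake_ndvi.items()
    for month, value in zip(months, values)
]

# Step 5: tidy frame -> wide frame with NDVI_<region> columns.
tidy = pd.DataFrame(records)
tidy['ndvi'] = tidy['ndvi'].astype(float)  # None becomes NaN
wide = tidy.pivot(index='month', columns='region', values='ndvi')
wide.columns = [f'NDVI_{name}' for name in wide.columns]

# Fill gaps of at most two consecutive months by linear interpolation.
wide = wide.interpolate(method='linear', limit=2).reset_index()
print(wide)
```

The resulting frame keeps a `month` column, which the loading cell in Part 5 expects when it calls `pd.to_datetime(ndvi_df['month'])`.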


## Part 5 · Load NDVI data (live from GEE or cached sample)

In [None]:
CACHE_DIR = Path('day4/data')
CACHE_DIR.mkdir(parents=True, exist_ok=True)  # parents=True also creates 'day4' if it is missing
SAMPLE_PATH = CACHE_DIR / 'mindanao_ndvi_sample.csv'
LATEST_EXPORT_PATH = CACHE_DIR / 'mindanao_ndvi_gee.csv'

USE_GEE = True  # Keep True to pull live Sentinel-2 NDVI once your helper is implemented

try:
    if USE_GEE:
        ndvi_df = fetch_monthly_ndvi(
            REGION_GEOMETRIES,
            start_date='2018-01-01',
            end_date='2023-12-31',
            cloud_pct=35,
            scale=20
        )
        ndvi_df.to_csv(LATEST_EXPORT_PATH, index=False)
        print(f'Fetched {len(ndvi_df)} monthly observations from Earth Engine (project: {GEE_PROJECT}).')
    else:
        print('USE_GEE is False – reading cached sample export. Set back to True for live data.')
        ndvi_df = pd.read_csv(SAMPLE_PATH)
except NotImplementedError:
    raise
except Exception as exc:
    print(f'Earth Engine fetch failed ({exc}). Falling back to cached sample CSV...')
    ndvi_df = pd.read_csv(SAMPLE_PATH)

ndvi_df['month'] = pd.to_datetime(ndvi_df['month'])
ndvi_df.head()


## Part 6 · TODO – explore NDVI trends

Tasks:
- Plot NDVI series for each province with a 0.40 drought threshold line.
- Comment on any obvious drought periods.
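
To back up the written comment with something concrete, it can help to list the months that fall below the threshold. This is a minimal sketch on a toy frame with invented NDVI values; the name `DROUGHT_THRESHOLD` is an assumption, and the column name follows the `NDVI_<region>` pattern from Part 4.

```python
import pandas as pd

DROUGHT_THRESHOLD = 0.40  # threshold named in the task list

# Toy wide-format frame; the NDVI values are invented for illustration.
toy = pd.DataFrame({
    'month': pd.date_range('2019-01-01', periods=6, freq='MS'),
    'NDVI_Bukidnon': [0.55, 0.42, 0.38, 0.36, 0.47, 0.60],
})

# Keep only the months whose NDVI is strictly below the threshold.
below = toy.loc[toy['NDVI_Bukidnon'] < DROUGHT_THRESHOLD, 'month']
drought_months = below.dt.strftime('%Y-%m').tolist()
print(drought_months)  # ['2019-03', '2019-04']
```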

In [None]:
# TODO: visualise NDVI series per province and highlight the drought threshold.
raise NotImplementedError("Plot NDVI time series with drought threshold annotations.")


## Part 7 · TODO – seasonal statistics

1. Compute mean NDVI for wet vs dry seasons per province, and state which months you assign to each season.
2. Identify the lowest NDVI value and month per province.
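
The two tasks above could be sketched as follows. The season definition is an assumption made only for this example (December to May treated as dry, June to November as wet); adjust it to the climatology you are using. The NDVI values are invented and `seasonal_summary` is a hypothetical helper name.

```python
import pandas as pd

# ASSUMPTION: December-May counts as "dry"; every other month counts as "wet".
DRY_MONTHS = {12, 1, 2, 3, 4, 5}

# Toy one-year series; values are invented for illustration.
toy = pd.DataFrame({
    'month': pd.date_range('2019-01-01', periods=12, freq='MS'),
    'NDVI_Bukidnon': [0.50, 0.46, 0.40, 0.38, 0.44, 0.58,
                      0.66, 0.70, 0.72, 0.68, 0.62, 0.56],
})

def seasonal_summary(df, column):
    """Return dry/wet means plus the lowest NDVI value and its month."""
    is_dry = df['month'].dt.month.isin(DRY_MONTHS)
    lowest_idx = df[column].idxmin()
    return {
        'dry_mean': df.loc[is_dry, column].mean(),
        'wet_mean': df.loc[~is_dry, column].mean(),
        'lowest_ndvi': df.loc[lowest_idx, column],
        'lowest_month': df.loc[lowest_idx, 'month'],
    }

# One row per province, one column per statistic.
summary = pd.DataFrame.from_dict(
    {'Bukidnon': seasonal_summary(toy, 'NDVI_Bukidnon')}, orient='index'
)
print(summary)
```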

In [None]:
# TODO: build a summary DataFrame with dry_mean, wet_mean, lowest_ndvi, lowest_month.
raise NotImplementedError("Create seasonal summary statistics for each province.")


## Part 8 · TODO – prepare sliding-window sequences

In [None]:
LOOKBACK = 12  # months of history
HORIZON = 1    # predict 1 month ahead

Create the supervised dataset:
- For each province, collect 12 months of NDVI history.
- Predict the NDVI of the following month.
- Store location, target_month, sequence (np.array), and target NDVI.
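
One way to build these rows is sketched below on a toy 15-month series. It assumes a wide frame sorted by `month` with `NDVI_<region>` columns; `make_sequences` is a hypothetical helper name, and the values are synthetic.

```python
import numpy as np
import pandas as pd

LOOKBACK = 12  # months of history
HORIZON = 1    # predict 1 month ahead

def make_sequences(df, lookback=LOOKBACK, horizon=HORIZON, prefix='NDVI_'):
    """Slide a lookback window over each NDVI column of a month-sorted frame."""
    rows = []
    for column in [c for c in df.columns if c.startswith(prefix)]:
        values = df[column].to_numpy(dtype=float)
        # `end` is the exclusive end of the input window.
        for end in range(lookback, len(values) - horizon + 1):
            # The target sits `horizon` steps after the last input month.
            target_idx = end + horizon - 1
            rows.append({
                'location': column[len(prefix):],
                'target_month': df['month'].iloc[target_idx],
                'sequence': values[end - lookback:end],
                'target': values[target_idx],
            })
    return pd.DataFrame(rows)

# Toy data: 15 months of synthetic, steadily rising NDVI.
toy = pd.DataFrame({
    'month': pd.date_range('2018-01-01', periods=15, freq='MS'),
    'NDVI_Bukidnon': np.linspace(0.30, 0.72, 15),
})
sequence_df = make_sequences(toy)
print(sequence_df[['location', 'target_month', 'target']])
```

With 15 months and a 12-month lookback, only three windows fit, so the first target month is January 2019.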

In [None]:
# TODO: populate sequence_df with location, target_month, sequence, target columns.
raise NotImplementedError("Create sequence_df using LOOKBACK/HORIZON configuration.")


## Part 9 · TODO – temporal train/validation/test split

In [None]:
train_end = pd.Timestamp('2021-12-31')
val_end = pd.Timestamp('2022-12-31')

# TODO: define boolean masks train_mask, val_mask, test_mask based on target_month and print sample counts.
raise NotImplementedError("Create temporal masks for train/val/test splits.")
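
A possible form for the masks is sketched here on a toy stand-in for `sequence_df` (the name `toy_sequences` is hypothetical). The split is by `target_month` only, so a validation window may still contain input months from the training period, while its target never does.

```python
import pandas as pd

train_end = pd.Timestamp('2021-12-31')
val_end = pd.Timestamp('2022-12-31')

# Toy stand-in for sequence_df: only the target_month column matters here.
toy_sequences = pd.DataFrame({
    'target_month': pd.date_range('2021-07-01', periods=30, freq='MS'),
})

target_month = toy_sequences['target_month']
train_mask = target_month <= train_end
val_mask = (target_month > train_end) & (target_month <= val_end)
test_mask = target_month > val_end

print(f'train={train_mask.sum()}, val={val_mask.sum()}, test={test_mask.sum()}')
```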


## Part 10 · TODO – scaling using the training period only

In [None]:
# TODO: compute ndvi_min and ndvi_max from the training period and define scale()/invert() helpers.
raise NotImplementedError("Derive scaling functions from the training NDVI range.")
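
A minimal min-max sketch follows, assuming `train_values` holds NDVI observations from the training period only (the numbers here are invented). Because the range comes from training data, later months can scale to values slightly outside [0, 1]; that is expected and avoids leaking validation or test information into the scaler.

```python
import numpy as np

# Toy NDVI values from the training period only (invented for illustration).
train_values = np.array([0.35, 0.48, 0.62, 0.75])

ndvi_min = float(train_values.min())
ndvi_max = float(train_values.max())

def scale(x):
    """Map NDVI to roughly [0, 1] using the training-period range."""
    return (np.asarray(x, dtype=float) - ndvi_min) / (ndvi_max - ndvi_min)

def invert(x_scaled):
    """Undo scale(): map scaled values back to NDVI units."""
    return np.asarray(x_scaled, dtype=float) * (ndvi_max - ndvi_min) + ndvi_min

print(scale(train_values))
print(invert(scale(train_values)))
```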


In [None]:
# TODO: implement a helper similar to stack_split(mask) that returns scaled features/targets and metadata.
raise NotImplementedError("Transform raw sequences into scaled arrays for train/val/test splits.")


## Part 11 · TODO – create PyTorch datasets and data loaders

In [None]:
# TODO: implement a SequenceDataset class and build DataLoader objects for train/val splits.
raise NotImplementedError("Wrap the scaled arrays into PyTorch Dataset/DataLoader objects.")
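
One possible shape for the dataset class is sketched below. It assumes features arrive as a `(n_samples, lookback)` array and targets as `(n_samples,)`; the random arrays `toy_X` and `toy_y` are placeholders for your scaled splits, and the batch size of 4 is arbitrary.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):
    """Wrap (n_samples, lookback) features and (n_samples,) targets as tensors."""

    def __init__(self, features, targets):
        # Add a trailing feature dimension so each sample is (lookback, 1),
        # which is what an LSTM with input_size=1 and batch_first=True expects.
        self.features = torch.as_tensor(features, dtype=torch.float32).unsqueeze(-1)
        self.targets = torch.as_tensor(targets, dtype=torch.float32).unsqueeze(-1)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]

# Placeholder arrays standing in for the scaled training split.
rng = np.random.default_rng(0)
toy_X = rng.random((10, 12))
toy_y = rng.random(10)

train_loader = DataLoader(SequenceDataset(toy_X, toy_y), batch_size=4, shuffle=True)
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```

Shuffling is usually kept for the training loader only; validation and test loaders can iterate in order.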


## Part 12 · TODO – define the LSTM model

In [None]:
# TODO: create an NDVIForecaster nn.Module (2 LSTM layers + dense head) and choose loss/optimizer.
raise NotImplementedError("Instantiate the PyTorch LSTM model, loss, and optimizer.")
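
A minimal sketch of a two-layer LSTM with a dense head is shown below. The hidden size, dropout rate, head width, learning rate, and the choice of MSE loss with Adam are all assumptions made for illustration, not values prescribed by this notebook.

```python
import torch
from torch import nn

class NDVIForecaster(nn.Module):
    """Two stacked LSTM layers followed by a small dense head."""

    def __init__(self, hidden_size=32, num_layers=2, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1,            # one NDVI value per time step
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,         # applied between the stacked LSTM layers
            batch_first=True,        # inputs are (batch, time, features)
        )
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        output, _ = self.lstm(x)
        # Use the hidden state of the last time step as the sequence summary.
        return self.head(output[:, -1, :])

model = NDVIForecaster()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Shape check with a dummy batch: 4 sequences of 12 months each.
dummy = torch.zeros(4, 12, 1)
print(model(dummy).shape)
```

In the notebook itself the model would also be moved to `device` with `model.to(device)` before training.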


## Part 13 · TODO – training loop

In [None]:
# TODO: implement the training loop (e.g., using a helper run_epoch). Capture train/val loss & MAE history.
raise NotImplementedError("Train the model and store the learning history.")
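
The `run_epoch` helper mentioned in the TODO could look roughly like this. To stay self-contained, the sketch trains a tiny linear stand-in model on random tensors rather than the LSTM; the names `toy_model` and `toy_loader`, the epoch count, and the learning rate are placeholders. Device transfers are omitted for brevity.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, optimizer=None):
    """One pass over loader; trains if an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, total_mae, n_samples = 0.0, 0.0, 0
    with torch.set_grad_enabled(training):
        for xb, yb in loader:
            preds = model(xb)
            loss = criterion(preds, yb)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight by batch size so the epoch average is per sample.
            total_loss += loss.item() * len(xb)
            total_mae += (preds - yb).abs().sum().item()
            n_samples += len(xb)
    return total_loss / n_samples, total_mae / n_samples

# Placeholder model and data; the same loader doubles as "validation" here.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(12, 1))
toy_loader = DataLoader(
    TensorDataset(torch.rand(16, 12, 1), torch.rand(16, 1)), batch_size=8
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

history = {'train_loss': [], 'train_mae': [], 'val_loss': [], 'val_mae': []}
for epoch in range(3):
    train_loss, train_mae = run_epoch(toy_model, toy_loader, criterion, optimizer)
    val_loss, val_mae = run_epoch(toy_model, toy_loader, criterion)
    history['train_loss'].append(train_loss)
    history['train_mae'].append(train_mae)
    history['val_loss'].append(val_loss)
    history['val_mae'].append(val_mae)
    print(f'epoch {epoch + 1}: train MSE {train_loss:.4f}, val MSE {val_loss:.4f}')
```

Passing no optimizer switches the same function to evaluation mode, which keeps the training and validation metrics computed in an identical way.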


## Part 14 · TODO – plot learning curves

In [None]:
# TODO: visualise train vs validation MSE/MAE using the recorded history.
raise NotImplementedError("Plot train/validation metrics across epochs.")


## Part 15 · TODO – evaluate on held-out months

In [None]:
# TODO: generate predictions on the test set, invert scaling, and assemble a results DataFrame.
raise NotImplementedError("Evaluate the model on held-out months and prepare results_df.")


In [None]:
# TODO: compute MAE and RMSE on the test period.
raise NotImplementedError("Report MAE and RMSE for the test set.")
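
Both metrics reduce to a couple of NumPy operations once predictions are back in NDVI units. The arrays below are invented placeholders for the `actual` and `predicted` values of the test period.

```python
import numpy as np

# Invented placeholder values in NDVI units (after inverting the scaling).
actual = np.array([0.52, 0.41, 0.38, 0.45])
predicted = np.array([0.50, 0.44, 0.35, 0.49])

errors = predicted - actual
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

print(f'MAE:  {mae:.4f}')
print(f'RMSE: {rmse:.4f}')
```

RMSE is never smaller than MAE on the same errors; a large gap between the two points to a few months with unusually large misses.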


In [None]:
# TODO: plot actual vs predicted NDVI for each province with the drought threshold line.
raise NotImplementedError("Visualise forecasts alongside actual NDVI trajectories.")


## Part 16 · TODO – drought alert metrics

Implement a binary drought alert (NDVI < 0.40) and report precision/recall.

In [None]:
# TODO: derive drought alerts from your predictions and compute confusion matrix + precision/recall.
raise NotImplementedError("Translate NDVI forecasts into drought alert metrics.")
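
The alert logic could be sketched as below, treating "drought" as the positive class. The arrays are invented placeholders for actual and predicted NDVI, and returning NaN when a denominator is zero is one convention among several.

```python
import numpy as np

DROUGHT_THRESHOLD = 0.40

# Invented placeholder values in NDVI units.
actual = np.array([0.52, 0.38, 0.36, 0.45, 0.39, 0.61])
predicted = np.array([0.50, 0.37, 0.42, 0.39, 0.35, 0.58])

actual_alert = actual < DROUGHT_THRESHOLD
predicted_alert = predicted < DROUGHT_THRESHOLD

# Confusion matrix cells with "drought" as the positive class.
tp = int(np.sum(predicted_alert & actual_alert))
fp = int(np.sum(predicted_alert & ~actual_alert))
fn = int(np.sum(~predicted_alert & actual_alert))
tn = int(np.sum(~predicted_alert & ~actual_alert))

# Guard against empty denominators (no predicted or no actual droughts).
precision = tp / (tp + fp) if (tp + fp) else float('nan')
recall = tp / (tp + fn) if (tp + fn) else float('nan')

print(f'TP={tp} FP={fp} FN={fn} TN={tn}')
print(f'precision={precision:.2f}, recall={recall:.2f}')
```

For an early-warning use case, recall (droughts that were caught) often matters more than precision, since a missed drought is usually costlier than a false alarm.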


## Reflection

- What challenges did you face when moving from synthetic to Earth Engine data?
- How sensitive was the model to the scaling range?
- What additional covariates would you add next (rainfall, ONI, soil moisture)?