# Time series sequential model (LSTM): classification and regression

This notebook walks through an end-to-end workflow for training **LSTM** models on county-level annual hydroclimate features to predict **damage outcomes**.

We include two modeling strategies:

1. **Single-stage regression** (baseline): predict `log1p(damage)` directly.
2. **Two-part (hurdle) model** (recommended for zero-inflated damages):
   - **Classifier**: predict whether damage occurs (`damage > 0`).
   - **Regressor**: predict magnitude conditional on damage occurring (trained on positive-damage years only).
   - Final prediction: \(\hat D = P(D>0)\times E[D\mid D>0]\)

**Inputs (CSV):**
- `flood_model_table_clean_1996_2023.csv`
- `drought_model_table_clean_1996_2023.csv`
- `climate_for_LSTM_1996_2059.csv` (for future projections)

> **Note:** Paths are configured in the **Configuration** cell. Update `BASE` to your local directory.


In [None]:
# -------------------------
# Configuration
# -------------------------
import os

# Update to your local folder
BASE = r"C:\Users\adi10136\OneDrive - Iowa State University\CMIP6_Data\CMIP6_NEW"

FLOOD_HIST_CSV   = os.path.join(BASE, "flood_model_table_clean_1996_2023.csv")
DROUGHT_HIST_CSV = os.path.join(BASE, "drought_model_table_clean_1996_2023.csv")
CLIMATE_FUTURE_CSV = os.path.join(BASE, "climate_for_LSTM_1996_2059.csv")

SEQ_LEN = 5  # sequence/window length (years); must match training & prediction
RANDOM_SEED = 42

In [None]:
# -------------------------
# Imports & reproducibility
# -------------------------
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras import layers, models

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from IPython.display import display

# Optional for plots/diagnostics
import matplotlib.pyplot as plt

tf.keras.utils.set_random_seed(RANDOM_SEED)

In [None]:
# -------------------------
# Load historical tables
# -------------------------
flood_df = pd.read_csv(FLOOD_HIST_CSV)
drought_df = pd.read_csv(DROUGHT_HIST_CSV)

display(flood_df.head())

## Data format

Each historical table is expected to include (at minimum) the following columns:

- `county`
- `year`
- `precip_ann` (annual precipitation)
- `runoff_ann` (annual runoff)
- `damage_property`
- `damage_crops`

The workflow builds sliding windows of length `SEQ_LEN` for each county, using `[precip_ann, runoff_ann]` as the model inputs.
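
As a rough illustration of the windowing described above, the sketch below builds windows for a single hypothetical county. All values are made up, and `SEQ_LEN_DEMO`, `years`, and `features` are illustrative names that do not appear elsewhere in this notebook.

```python
import numpy as np

SEQ_LEN_DEMO = 3  # illustrative window length (the notebook uses SEQ_LEN)

# Made-up annual values for one hypothetical county
years = np.array([2000, 2001, 2002, 2003, 2004])
features = np.array([
    [800.0, 200.0],   # [precip_ann, runoff_ann] for 2000
    [950.0, 310.0],
    [700.0, 150.0],
    [1020.0, 400.0],
    [880.0, 260.0],
])

windows, target_years = [], []
for end_idx in range(SEQ_LEN_DEMO - 1, len(years)):
    start_idx = end_idx - SEQ_LEN_DEMO + 1
    windows.append(features[start_idx:end_idx + 1])  # rows start_idx..end_idx
    target_years.append(int(years[end_idx]))         # target is the window's last year

X_demo = np.stack(windows, axis=0)
print(X_demo.shape)   # (3, 3, 2): samples, window length, features
print(target_years)   # [2002, 2003, 2004]
```

Each window ends at its target year, so the first `SEQ_LEN - 1` years of a county never appear as targets. Consecutive windows also share most of their rows.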


In [None]:
def percent_error_and_accuracy(y_true, y_pred):
    """
    Percent error on TOTAL:
      PE = 100 * (pred_sum - obs_sum) / obs_sum
    Accuracy = 100 - |PE|
    """
    obs_sum = float(np.sum(y_true))
    pred_sum = float(np.sum(y_pred))
    if obs_sum == 0:
        return np.nan, np.nan
    pe = 100.0 * (pred_sum - obs_sum) / obs_sum
    acc = 100.0 - abs(pe)
    return pe, acc

In [None]:
def build_sequences_from_table(df, seq_len=5):
    """
    df columns:
      county, year, precip_ann, runoff_ann, damage_property, damage_crops

    Returns:
      X       : (N_samples, seq_len, 2)   [precip, runoff]
      y_prop  : (N_samples,)              property damage
      y_crops : (N_samples,)              crops damage
      meta    : DataFrame with ['county', 'year'] per sample (target year)
    """
    df = df.sort_values(["county", "year"]).copy()

    df["precip_ann"] = pd.to_numeric(df["precip_ann"], errors="coerce")
    df["runoff_ann"] = pd.to_numeric(df["runoff_ann"], errors="coerce")
    df["damage_property"] = pd.to_numeric(df["damage_property"], errors="coerce").fillna(0.0)
    df["damage_crops"]    = pd.to_numeric(df["damage_crops"], errors="coerce").fillna(0.0)

    X_list = []
    y_p_list = []
    y_c_list = []
    meta_rows = []

    for county, g in df.groupby("county"):
        g = g.sort_values("year").reset_index(drop=True)

        if len(g) < seq_len:
            continue

        for end_idx in range(seq_len - 1, len(g)):
            start_idx = end_idx - seq_len + 1
            window = g.iloc[start_idx:end_idx+1]

            if window["precip_ann"].isna().any() or window["runoff_ann"].isna().any():
                continue

            X_win = window[["precip_ann", "runoff_ann"]].values.astype(float)
            y_prop = float(g.iloc[end_idx]["damage_property"])
            y_crops = float(g.iloc[end_idx]["damage_crops"])
            year_t = int(g.iloc[end_idx]["year"])

            X_list.append(X_win)
            y_p_list.append(y_prop)
            y_c_list.append(y_crops)
            meta_rows.append({"county": county, "year": year_t})

    if not X_list:
        raise ValueError("No sequences constructed. Check seq_len and data.")

    X = np.stack(X_list, axis=0)
    y_p = np.array(y_p_list, dtype=float)
    y_c = np.array(y_c_list, dtype=float)
    meta = pd.DataFrame(meta_rows)

    return X, y_p, y_c, meta

In [None]:
# -------------------------
# Build sequences for flood / drought
# -------------------------
X_flood, y_p_flood, y_c_flood, meta_flood = build_sequences_from_table(flood_df, seq_len=SEQ_LEN)
X_drought, y_p_drought, y_c_drought, meta_drought = build_sequences_from_table(drought_df, seq_len=SEQ_LEN)

print("FLOOD sequences:", X_flood.shape, "| target years:", meta_flood["year"].min(), "-", meta_flood["year"].max())
print("DROUGHT sequences:", X_drought.shape, "| target years:", meta_drought["year"].min(), "-", meta_drought["year"].max())

## Baseline: single-stage regression LSTM

We predict `log1p(damage)` to stabilize variance and reduce sensitivity to outliers:

- Train on all samples (including zeros).
- Back-transform with `expm1()` for interpretation in dollars.
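
A minimal sketch of the transform and its inverse, using made-up dollar amounts (the array `damage` is hypothetical):

```python
import numpy as np

# Made-up damages in dollars, including a zero-loss year
damage = np.array([0.0, 1_000.0, 250_000.0, 12_000_000.0])

y_log = np.log1p(damage)      # log(1 + damage); zero stays exactly zero
recovered = np.expm1(y_log)   # exp(y_log) - 1 undoes the transform

print(np.round(y_log, 2))
print(np.allclose(recovered, damage))  # True
```

The four values span seven orders of magnitude in dollars but only about 0 to 16 on the transformed scale, which is what keeps a few extreme years from dominating the MSE loss.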

The metric below reports **percent error on the total** across the test split:
\[
PE = 100\times\frac{\sum\hat y - \sum y}{\sum y}
\]
and **Accuracy** is defined as \(100-|PE|\).
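
To make the metric concrete, here is a small sketch with made-up numbers. The function `percent_error_total` is a standalone copy of the same formula written for this example. It is not a replacement for `percent_error_and_accuracy`.

```python
import numpy as np

def percent_error_total(y_true, y_pred):
    """Percent error on the summed totals, plus 100 - |PE|."""
    obs_sum = float(np.sum(y_true))
    pred_sum = float(np.sum(y_pred))
    if obs_sum == 0:
        return float("nan"), float("nan")
    pe = 100.0 * (pred_sum - obs_sum) / obs_sum
    return pe, 100.0 - abs(pe)

y_true = np.array([0.0, 100.0, 300.0])       # made-up observed damages

# Case 1: predicted total is 440 against an observed total of 400
pe1, acc1 = percent_error_total(y_true, np.array([50.0, 150.0, 240.0]))
print(pe1, acc1)   # 10.0 90.0

# Case 2: every sample is wrong, but the errors cancel in the total
pe2, acc2 = percent_error_total(y_true, np.array([200.0, 0.0, 200.0]))
print(pe2, acc2)   # 0.0 100.0
```

The second case shows that this metric measures agreement of totals only: over- and under-predictions offset each other, so a high value does not imply accurate per-county predictions.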


In [None]:
def build_lstm_model(seq_len, n_features=2):
    """
    LSTM for regression on log1p(damage).
    Input:  (seq_len, n_features)
    Output: scalar (log1p(damage))
    """
    inputs = layers.Input(shape=(seq_len, n_features))
    x = layers.LSTM(32, return_sequences=False)(inputs)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(16, activation="relu")(x)
    outputs = layers.Dense(1, activation="linear")(x)

    model = models.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="mse"
    )
    return model


def train_lstm_for_target(X, y_raw,
                          test_size=0.2,
                          n_epochs=50,
                          batch_size=32,
                          random_state=42,
                          verbose=1):
    """
    Train LSTM for ONE target (property or crops).

    X: (N, seq_len, 2)
    y_raw: (N,) damage in dollars
    """
    y_log = np.log1p(y_raw)

    # scale features (note: the scaler is fit on ALL samples, before the train/test split)
    N, T, F = X.shape
    X_flat = X.reshape(N * T, F)

    scaler = StandardScaler()
    X_flat_scaled = scaler.fit_transform(X_flat)
    X_scaled = X_flat_scaled.reshape(N, T, F)

    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y_log, test_size=test_size, random_state=random_state
    )

    seq_len = X.shape[1]
    model = build_lstm_model(seq_len=seq_len, n_features=F)

    history = model.fit(
        X_train, y_train,
        validation_data=(X_test, y_test),
        epochs=n_epochs,
        batch_size=batch_size,
        verbose=verbose
    )

    # predict
    y_pred_log = model.predict(X_test).ravel()
    y_pred = np.expm1(y_pred_log)
    y_true = np.expm1(y_test)

    pe, acc = percent_error_and_accuracy(y_true, y_pred)

    return model, scaler, (y_true, y_pred, pe, acc)

In [None]:
# -------------------------
# Flood baseline regression
# -------------------------
print("Training baseline LSTM (regression) — FLOOD property")
flood_prop_model, flood_prop_scaler, (yt_p, yp_p, pe_p, acc_p) = train_lstm_for_target(
    X_flood, y_p_flood,
    test_size=0.2, n_epochs=50, batch_size=32,
    random_state=123, verbose=1
)
print("FLOOD property — percent error:", pe_p, " | accuracy:", acc_p)

print("\nTraining baseline LSTM (regression) — FLOOD crops")
flood_crops_model, flood_crops_scaler, (yt_c, yp_c, pe_c, acc_c) = train_lstm_for_target(
    X_flood, y_c_flood,
    test_size=0.2, n_epochs=100, batch_size=32,
    random_state=456, verbose=1
)
print("FLOOD crops — percent error:", pe_c, " | accuracy:", acc_c)

In [None]:
# Quick diagnostic plot (baseline regression, FLOOD property)
plt.figure()
plt.scatter(yt_p, yp_p, s=10)
plt.xlabel("True damage (property)")
plt.ylabel("Predicted damage (property)")
plt.title("Baseline LSTM regression — FLOOD property (test split)")
plt.show()

## Two-part model: classification + regression (recommended)

Damages are often **zero-inflated** (many county-years have zero loss).  
A two-part model explicitly separates:

1) **Occurrence**: `damage > 0` (binary classification)  
2) **Magnitude**: predict `log1p(damage)` on **positive-damage** samples only

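Final prediction: \(\hat D = p\times \mu\), where \(p = P(D>0)\) is the classifier output and \(\mu = E[D\mid D>0]\) is the regressor output back-transformed with `expm1()`.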
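
The combination step can be sketched with hypothetical model outputs. The arrays `p` and `mu_log` below are made-up stand-ins for the classifier and regressor predictions, not results from this notebook.

```python
import numpy as np

# Hypothetical classifier outputs: P(damage > 0) for three county-years
p = np.array([0.05, 0.50, 0.90])

# Hypothetical regressor outputs on the log1p scale
mu_log = np.log1p(np.array([20_000.0, 20_000.0, 1_000_000.0]))

# Back-transform to dollars and guard against small negative values
mu = np.clip(np.expm1(mu_log), 0, None)

# Two-part prediction: probability of occurrence times conditional magnitude
d_hat = p * mu
print(np.round(d_hat))  # approximately 1000, 10000, 900000
```

The first two rows have the same conditional magnitude, but the low occurrence probability shrinks the first prediction by a factor of ten relative to the second.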


In [None]:
def build_lstm_classifier(seq_len, n_features=2):
    inputs = layers.Input(shape=(seq_len, n_features))
    x = layers.LSTM(32, return_sequences=False)(inputs)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(16, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability

    model = models.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"]
    )
    return model


def train_lstm_classifier(X, y_raw,
                          test_size=0.2,
                          n_epochs=40,
                          batch_size=32,
                          random_state=42,
                          verbose=1):
    """
    y_raw: damage in dollars; we convert to 0/1.
    """
    y_bin = (y_raw > 0).astype(int)

    # scale X (note: the scaler is fit on ALL samples, before the train/test split)
    N, T, F = X.shape
    X_flat = X.reshape(N * T, F)
    scaler = StandardScaler()
    X_flat_scaled = scaler.fit_transform(X_flat)
    X_scaled = X_flat_scaled.reshape(N, T, F)

    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y_bin, test_size=test_size, random_state=random_state
    )

    seq_len = X.shape[1]
    model = build_lstm_classifier(seq_len=seq_len, n_features=F)

    history = model.fit(
        X_train, y_train,
        validation_data=(X_test, y_test),
        epochs=n_epochs,
        batch_size=batch_size,
        verbose=verbose
    )

    # predicted probabilities on test
    p_test = model.predict(X_test).ravel()

    return model, scaler, (y_test, p_test)

In [None]:
def train_lstm_regressor_positive(X, y_raw,
                                  test_size=0.2,
                                  n_epochs=50,
                                  batch_size=32,
                                  random_state=42,
                                  verbose=1):
    """
    Train LSTM regressor ONLY on positive damage years.
    """
    mask_pos = y_raw > 0
    X_pos = X[mask_pos]
    y_pos = y_raw[mask_pos]

    if X_pos.shape[0] < 10:
        raise ValueError("Too few positive samples to train regressor.")

    y_log = np.log1p(y_pos)

    # scale features (note: the scaler is fit on all POSITIVE samples, before the train/test split)
    N, T, F = X_pos.shape
    X_flat = X_pos.reshape(N * T, F)
    scaler = StandardScaler()
    X_flat_scaled = scaler.fit_transform(X_flat)
    X_scaled = X_flat_scaled.reshape(N, T, F)

    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y_log, test_size=test_size, random_state=random_state
    )

    seq_len = X_pos.shape[1]
    model = build_lstm_model(seq_len=seq_len, n_features=F)

    history = model.fit(
        X_train, y_train,
        validation_data=(X_test, y_test),
        epochs=n_epochs,
        batch_size=batch_size,
        verbose=verbose
    )

    y_pred_log = model.predict(X_test).ravel()
    y_pred = np.expm1(y_pred_log)
    y_true = np.expm1(y_test)

    pe, acc = percent_error_and_accuracy(y_true, y_pred)

    return model, scaler, (y_true, y_pred, pe, acc)

In [None]:
def evaluate_two_part_lstm(X, y_raw,
                           clf_model, clf_scaler,
                           reg_model, reg_scaler):
    """
    Compute two-part LSTM prediction and overall accuracy for ALL samples:
      D_hat = P(damage>0) * E[damage | damage>0]

    X: (N, seq_len, 2)
    y_raw: (N,) true damage in dollars
    """
    N, T, F = X.shape

    # --- classifier branch ---
    X_flat = X.reshape(N * T, F)
    X_flat_clf = clf_scaler.transform(X_flat)
    X_scaled_clf = X_flat_clf.reshape(N, T, F)

    p_pos_all = clf_model.predict(X_scaled_clf).ravel()  # P(damage>0)

    # --- regressor branch ---
    X_flat_reg = reg_scaler.transform(X_flat)
    X_scaled_reg = X_flat_reg.reshape(N, T, F)

    mu_log_all = reg_model.predict(X_scaled_reg).ravel()
    mu_all = np.expm1(mu_log_all)
    mu_all = np.clip(mu_all, 0, None)

    # --- two-part prediction ---
    y_pred = p_pos_all * mu_all
    y_true = y_raw

    pe, acc = percent_error_and_accuracy(y_true, y_pred)
    return y_true, y_pred, pe, acc

### Train and evaluate two-part models (Flood, Drought; Property, Crops)

For each target variable:
- Train classifier on all samples
- Train regressor on positive years only
- Evaluate combined two-part prediction on all samples (training and test together, so the reported score is not a held-out estimate)


In [None]:
# -------------------------
# FLOOD — property (two-part)
# -------------------------
print("FLOOD property — classifier")
clf_flood_prop_model, clf_flood_prop_scaler, (y_bin_fp_test, p_fp_test) = train_lstm_classifier(
    X_flood, y_p_flood,
    test_size=0.3, n_epochs=50, batch_size=5,
    random_state=123, verbose=1
)

print("\nFLOOD property — regressor (positive years only)")
reg_flood_prop_model, reg_flood_prop_scaler, (yt_fp_pos, yp_fp_pos, pe_fp_pos, acc_fp_pos) = train_lstm_regressor_positive(
    X_flood, y_p_flood,
    test_size=0.3, n_epochs=100, batch_size=5,
    random_state=123, verbose=1
)
print("FLOOD property positive-years — percent error:", pe_fp_pos, " | accuracy:", acc_fp_pos)

y_true_fp, y_pred_fp, pe_fp_all, acc_fp_all = evaluate_two_part_lstm(
    X_flood, y_p_flood,
    clf_flood_prop_model, clf_flood_prop_scaler,
    reg_flood_prop_model, reg_flood_prop_scaler
)
print("\nFLOOD property (two-part, all years) — percent error:", pe_fp_all, " | accuracy:", acc_fp_all)

# -------------------------
# FLOOD — crops (two-part)
# -------------------------
print("\nFLOOD crops — classifier")
clf_flood_crops_model, clf_flood_crops_scaler, (y_bin_fc_test, p_fc_test) = train_lstm_classifier(
    X_flood, y_c_flood,
    test_size=0.3, n_epochs=220, batch_size=3,
    random_state=456, verbose=1
)

print("\nFLOOD crops — regressor (positive years only)")
reg_flood_crops_model, reg_flood_crops_scaler, (yt_fc_pos, yp_fc_pos, pe_fc_pos, acc_fc_pos) = train_lstm_regressor_positive(
    X_flood, y_c_flood,
    test_size=0.3, n_epochs=100, batch_size=3,
    random_state=456, verbose=1
)
print("FLOOD crops positive-years — percent error:", pe_fc_pos, " | accuracy:", acc_fc_pos)

y_true_fc, y_pred_fc, pe_fc_all, acc_fc_all = evaluate_two_part_lstm(
    X_flood, y_c_flood,
    clf_flood_crops_model, clf_flood_crops_scaler,
    reg_flood_crops_model, reg_flood_crops_scaler
)
print("\nFLOOD crops (two-part, all years) — percent error:", pe_fc_all, " | accuracy:", acc_fc_all)

# -------------------------
# DROUGHT — property (two-part)
# -------------------------
print("\nDROUGHT property — classifier")
clf_drought_prop_model, clf_drought_prop_scaler, (y_bin_dp_test, p_dp_test) = train_lstm_classifier(
    X_drought, y_p_drought,
    test_size=0.3, n_epochs=200, batch_size=3,
    random_state=123, verbose=1
)

print("\nDROUGHT property — regressor (positive years only)")
reg_drought_prop_model, reg_drought_prop_scaler, (yt_dp_pos, yp_dp_pos, pe_dp_pos, acc_dp_pos) = train_lstm_regressor_positive(
    X_drought, y_p_drought,
    test_size=0.3, n_epochs=100, batch_size=3,
    random_state=123, verbose=1
)
print("DROUGHT property positive-years — percent error:", pe_dp_pos, " | accuracy:", acc_dp_pos)

y_true_dp, y_pred_dp, pe_dp_all, acc_dp_all = evaluate_two_part_lstm(
    X_drought, y_p_drought,
    clf_drought_prop_model, clf_drought_prop_scaler,
    reg_drought_prop_model, reg_drought_prop_scaler
)
print("\nDROUGHT property (two-part, all years) — percent error:", pe_dp_all, " | accuracy:", acc_dp_all)

# -------------------------
# DROUGHT — crops (two-part)
# -------------------------
print("\nDROUGHT crops — classifier")
clf_drought_crops_model, clf_drought_crops_scaler, (y_bin_dc_test, p_dc_test) = train_lstm_classifier(
    X_drought, y_c_drought,
    test_size=0.3, n_epochs=200, batch_size=3,
    random_state=123, verbose=1
)

print("\nDROUGHT crops — regressor (positive years only)")
reg_drought_crops_model, reg_drought_crops_scaler, (yt_dc_pos, yp_dc_pos, pe_dc_pos, acc_dc_pos) = train_lstm_regressor_positive(
    X_drought, y_c_drought,
    test_size=0.3, n_epochs=100, batch_size=3,
    random_state=123, verbose=1
)
print("DROUGHT crops positive-years — percent error:", pe_dc_pos, " | accuracy:", acc_dc_pos)

y_true_dc, y_pred_dc, pe_dc_all, acc_dc_all = evaluate_two_part_lstm(
    X_drought, y_c_drought,
    clf_drought_crops_model, clf_drought_crops_scaler,
    reg_drought_crops_model, reg_drought_crops_scaler
)
print("\nDROUGHT crops (two-part, all years) — percent error:", pe_dc_all, " | accuracy:", acc_dc_all)

In [None]:
# Diagnostic plot (two-part prediction vs truth — example: DROUGHT crops)
plt.figure()
plt.scatter(y_true_dc, y_pred_dc, s=10)
plt.xlabel("True damage (drought crops)")
plt.ylabel("Predicted damage (drought crops)")
plt.title("Two-part LSTM — DROUGHT crops (all sequences)")
plt.show()

## Future predictions (2024–2059)

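We use `climate_for_LSTM_1996_2059.csv` to build sequences for years beyond 2023.  
Damage columns are merged from the historical tables. For future years they are filled with 0 as placeholders only: the models take `[precip_ann, runoff_ann]` as inputs, so these zeros do not affect the predictions.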

Outputs:
- `future_LSTM_flood_damage_2024_2059.csv`
- `future_LSTM_drought_damage_2024_2059.csv`
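
The merge-and-fill step can be sketched on a toy table. The frames `climate` and `damage` below contain made-up values for one hypothetical county.

```python
import pandas as pd

# Made-up climate rows spanning the historical/future boundary
climate = pd.DataFrame({
    "county": ["A"] * 4,
    "year": [2022, 2023, 2024, 2025],
    "precip_ann": [810.0, 905.0, 870.0, 990.0],
    "runoff_ann": [210.0, 300.0, 250.0, 380.0],
})

# Made-up historical damages (no rows after 2023)
damage = pd.DataFrame({
    "county": ["A", "A"],
    "year": [2022, 2023],
    "damage_property": [5000.0, 0.0],
    "damage_crops": [0.0, 1200.0],
})

# Left join keeps every climate row; future years get NaN damages
merged = climate.merge(damage, on=["county", "year"], how="left")

# Fill the missing future damages with placeholder zeros
damage_cols = ["damage_property", "damage_crops"]
merged[damage_cols] = merged[damage_cols].fillna(0.0)

future_mask = merged["year"] >= 2024
print(merged.loc[future_mask, ["year"] + damage_cols])
```

Because the join is a left join on the climate table, the number of rows is set by the climate data alone, and only the damage columns can be missing before the fill.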


In [None]:
# -------------------------
# Load climate projections (1996–2059) and historical damage tables (1996–2023)
# -------------------------
climate_future = pd.read_csv(CLIMATE_FUTURE_CSV)

flood_hist   = pd.read_csv(FLOOD_HIST_CSV)
drought_hist = pd.read_csv(DROUGHT_HIST_CSV)

display(climate_future.head())

# -------------------------
# Helper to merge climate + historical damages and build sequences
# -------------------------
def build_sequences_for_future(climate_df, damage_df, seq_len=5):
    """
    climate_df: county, year, precip_ann, runoff_ann (1996–2059)
    damage_df : county, year, damage_property, damage_crops (1996–2023)

    Returns:
        X_all, y_prop_all, y_crops_all, meta_all, future_mask
    where future_mask is True for sequences whose target year >= 2024.
    """
    merged = climate_df.merge(
        damage_df[["county", "year", "damage_property", "damage_crops"]],
        on=["county", "year"],
        how="left"
    )

    merged[["damage_property", "damage_crops"]] = merged[
        ["damage_property", "damage_crops"]
    ].fillna(0.0)

    # Reuse build_sequences_from_table (defined earlier in this notebook)
    X_all, y_p_all, y_c_all, meta_all = build_sequences_from_table(
        merged, seq_len=seq_len
    )

    future_mask = meta_all["year"] >= 2024
    return X_all, y_p_all, y_c_all, meta_all, future_mask

# -------------------------
# Two-part prediction helper (no y needed)
# -------------------------

def two_part_predict_only(X,
                          clf_model, clf_scaler,
                          reg_model, reg_scaler):
    """
    Two-part prediction WITHOUT needing y:
      D_hat = P(damage>0) * E[damage | damage>0]
    """
    N, T, F = X.shape

    # classifier branch
    X_flat = X.reshape(N * T, F)
    X_flat_clf = clf_scaler.transform(X_flat)
    X_scaled_clf = X_flat_clf.reshape(N, T, F)
    p_pos = clf_model.predict(X_scaled_clf).ravel()

    # regressor branch
    X_flat_reg = reg_scaler.transform(X_flat)
    X_scaled_reg = X_flat_reg.reshape(N, T, F)
    mu_log = reg_model.predict(X_scaled_reg).ravel()
    mu = np.expm1(mu_log)
    mu = np.clip(mu, 0, None)

    return p_pos * mu

In [None]:
# Build future sequences for FLOOD and select target years >= 2024
X_all_flood, y_p_all_flood, y_c_all_flood, meta_all_flood, future_mask_flood = build_sequences_for_future(
    climate_future, flood_hist, seq_len=SEQ_LEN
)

X_flood_future = X_all_flood[future_mask_flood]
meta_flood_future = meta_all_flood[future_mask_flood].reset_index(drop=True)
print("Flood future target years:", meta_flood_future["year"].min(), "-", meta_flood_future["year"].max())

In [None]:
# Predict FLOOD damages (two-part models)
pred_flood_prop_future = two_part_predict_only(
    X_flood_future,
    clf_flood_prop_model, clf_flood_prop_scaler,
    reg_flood_prop_model, reg_flood_prop_scaler
)

pred_flood_crops_future = two_part_predict_only(
    X_flood_future,
    clf_flood_crops_model, clf_flood_crops_scaler,
    reg_flood_crops_model, reg_flood_crops_scaler
)

flood_future_pred = meta_flood_future.copy()
flood_future_pred["flood_property_LSTM"] = pred_flood_prop_future
flood_future_pred["flood_crops_LSTM"]    = pred_flood_crops_future

display(flood_future_pred.head())

In [None]:
out_flood = os.path.join(BASE, "future_LSTM_flood_damage_2024_2059.csv")
flood_future_pred.to_csv(out_flood, index=False)
print("Saved:", out_flood)

In [None]:
# Build future sequences for DROUGHT and select target years >= 2024
X_all_drought, y_p_all_drought, y_c_all_drought, meta_all_drought, future_mask_drought = build_sequences_for_future(
    climate_future, drought_hist, seq_len=SEQ_LEN
)

X_drought_future = X_all_drought[future_mask_drought]
meta_drought_future = meta_all_drought[future_mask_drought].reset_index(drop=True)
print("Drought future target years:", meta_drought_future["year"].min(), "-", meta_drought_future["year"].max())

In [None]:
# Predict DROUGHT damages (two-part models)
pred_drought_prop_future = two_part_predict_only(
    X_drought_future,
    clf_drought_prop_model, clf_drought_prop_scaler,
    reg_drought_prop_model, reg_drought_prop_scaler
)

pred_drought_crops_future = two_part_predict_only(
    X_drought_future,
    clf_drought_crops_model, clf_drought_crops_scaler,
    reg_drought_crops_model, reg_drought_crops_scaler
)

drought_future_pred = meta_drought_future.copy()
drought_future_pred["drought_property_LSTM"] = pred_drought_prop_future
drought_future_pred["drought_crops_LSTM"]    = pred_drought_crops_future

display(drought_future_pred.head())

In [None]:
out_drought = os.path.join(BASE, "future_LSTM_drought_damage_2024_2059.csv")
drought_future_pred.to_csv(out_drought, index=False)
print("Saved:", out_drought)

---  
### Notes for supplementary materials

- Report `SEQ_LEN`, model architecture (LSTM units + dense layers), and the two-part formulation.
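- Document scaling: as written, each `StandardScaler` is fit on all samples before the train/test split and then reused unchanged for prediction. To keep test data out of the scaling statistics, fit the scaler on the training split only.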
- If you want time-aware splits (e.g., train on 1996–2015 and test on 2016–2023), replace the random `train_test_split` with a year-based split using `meta_*["year"]`.
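
A minimal sketch of such a year-based split is shown below, on synthetic arrays. The names `years`, `X`, `y`, and `SPLIT_YEAR` are assumptions for illustration; in the notebook, `years` would come from `meta_*["year"]`. The standardization uses training statistics only and is written with plain NumPy rather than `StandardScaler`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 3 hypothetical counties per target year, 2000-2023
years = np.repeat(np.arange(2000, 2024), 3)
X = rng.normal(size=(years.size, 5, 2))   # (samples, seq_len, features)
y = rng.gamma(1.0, size=years.size)

SPLIT_YEAR = 2016  # assumption: train on target years before 2016

train_mask = years < SPLIT_YEAR
test_mask = ~train_mask

X_train, X_test = X[train_mask], X[test_mask]
y_train, y_test = y[train_mask], y[test_mask]

# Standardize each feature with statistics from the training split only
n_features = X.shape[2]
mean = X_train.reshape(-1, n_features).mean(axis=0)
std = X_train.reshape(-1, n_features).std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train.shape, X_test.shape)   # (48, 5, 2) (24, 5, 2)
```

With this split no target year appears in both sets. The earliest test windows still contain input years before `SPLIT_YEAR`, so their inputs overlap with training windows even though their targets do not.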
