
Models ML

Felipe Santibañez-Leal edited this page Jun 17, 2026 · 2 revisions

Models — Analytical / ML Approaches (index)

A rigorous, skeptical survey of the analytical and machine-learning model family relevant to a conditional probabilistic seismic forecasting product. This page is an index: each method now lives on its own deep sub-page with intuition and history, governing equations and derivation, training/estimation, strengths and limitations, role in operational forecasting, a diagram, and a References section with DOIs.

The verdict, stated up front (honesty over hype). As of 2026, no machine-learning model has been shown to reliably beat a well-fit ETAS for short-term earthquake forecasting under fair, prospective, CSEP-style testing. This is not a vibe — it is the explicit conclusion of the most rigorous benchmark to date (EarthquakeNPP; see RECAST-and-FERN and Detection-vs-Forecasting). ML does add genuine value, but in specific, honest places: better catalogs upstream (detection), multivariate covariate ingestion, learned spatial anisotropy, and inference speed. This is why the product ships an ETAS-class core (see Models-Classical and Models-Employed) with any neural model gated as a challenger that must beat ETAS in our own prospective Evaluation-and-Tests and be calibrated before it reaches the public map.


The forecasting target (shared by every method)

A seismicity catalog is a realization of a marked spatio-temporal point process. Every method, statistical or neural, is ultimately estimating the conditional intensity function $\lambda^*(t,x,y,m \mid \mathcal{H}_t)$, the instantaneous expected rate of events given the history $\mathcal{H}_t$. The log-likelihood over $[0,T]$, written here in its temporal form with the location and magnitude arguments suppressed,

$$\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda^*(t_i) \;-\; \int_0^T \lambda^*(\tau)\, d\tau,$$

contains a compensator / survival term (the integral) that makes a model probabilistic and calibratable rather than a regressor. A forecasting system that does not evaluate this term is not doing point-process forecasting — the structural root of most ML-forecasting failures. The full treatment is in Temporal-Point-Processes.
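
To make the role of the compensator concrete, here is a minimal sketch of the log-likelihood above for a purely temporal Hawkes process with an exponential kernel. The kernel choice and the parameter names `mu`, `alpha`, and `beta` are assumptions made for illustration, not the product's ETAS parameterization; the point is only that the sum of log-intensities and the integral term are both evaluated.

```python
import math

def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] of a temporal Hawkes process with intensity
    lambda*(t) = mu + sum_{t_j < t} alpha * beta * exp(-beta * (t - t_j)).
    Illustrative kernel only; `times` must be sorted and lie inside [0, T]."""
    log_sum = 0.0
    excite = 0.0  # running value of sum_{j < i} exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Recursive update: decay the old sum and add the previous event.
            excite = math.exp(-beta * (t - prev)) * (excite + 1.0)
        log_sum += math.log(mu + alpha * beta * excite)
        prev = t
    # Compensator: closed-form integral of lambda* over [0, T].
    compensator = mu * T + sum(
        alpha * (1.0 - math.exp(-beta * (T - t))) for t in times
    )
    return log_sum - compensator
```

With `alpha = 0` this reduces to the homogeneous Poisson log-likelihood, `n * log(mu) - mu * T`, which is a convenient sanity check. Dropping the `compensator` term would reward any model that simply inflates its intensity everywhere, which is why a system that skips it is not scoring a probabilistic forecast.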

flowchart TD
    TPP[Temporal Point Processes<br/>conditional intensity + likelihood] --> RMTPP[RMTPP<br/>RNN intensity]
    TPP --> NHP[Neural Hawkes<br/>continuous-time LSTM]
    TPP --> THP[Transformer Hawkes<br/>attention]
    RMTPP --> EQ[Earthquake-specific neural TPPs]
    NHP --> EQ
    THP --> EQ
    EQ --> RECAST[RECAST and FERN]
    CNN[CNN spatial models<br/>DeVries cautionary tale] -. spatial .-> EQ
    GRN[Graph and Recurrent networks] -. structure .-> EQ
    DET[Detection vs Forecasting<br/>the hard line] -. upstream catalogs .-> TPP
    RECAST --> GATE{Beats ETAS in prospective CSEP<br/>AND calibrated?}
    GATE -- yes --> PUB[Reaches the public map]
    GATE -- no --> ETAS[ETAS-class core stays]

The methods — one line each

  • Temporal-Point-Processes — the unifying framework: conditional intensity $\lambda^*$, the compensator, the log-likelihood, thinning/simulation, and residual analysis. Every other method is a special case.
  • RMTPP — Recurrent Marked Temporal Point Process (Du et al., 2016): the first neural TPP, an RNN that embeds event history into a vector and parameterizes the intensity.
  • Neural-Hawkes-Process — Mei & Eisner (2017): a continuous-time LSTM whose hidden state decays between events, generalizing the Hawkes self-excitation to learned, non-additive dynamics.
  • Transformer-Hawkes-Process — Zuo et al. (2020) and self-attentive Hawkes: attention over the event history for long-range dependencies, with the same likelihood/compensator machinery.
  • RECAST-and-FERN — the two neural TPPs built specifically for earthquakes; the honest benchmark evidence (EarthquakeNPP) on where they match, and where they do not yet beat, ETAS.
  • CNN-Spatial-Models — CNN spatial forecasting and the canonical cautionary tale: DeVries (2018) vs. Mignan & Broccardo (2019) — why a single neuron matched a deep net, and the leakage/AUC lessons.
  • Graph-and-Recurrent-Networks — GNN/RNN/LSTM approaches; where graph structure and recurrence genuinely help (associations, upstream catalogs) and where they underwhelm for forecasting.
  • Detection-vs-Forecasting — the hard line: deep learning is transformative for detection and phase-picking (PhaseNet, EQTransformer) but that is not forecasting; the two must not be conflated.
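
The thinning/simulation step mentioned under Temporal-Point-Processes can be sketched as follows. This is a hedged illustration of Ogata-style thinning for a temporal Hawkes process with an exponential kernel; the kernel, the function name `simulate_hawkes_thinning`, and the parameters `mu`, `alpha`, `beta` are assumptions for the example and are not taken from any sub-page. It relies on the fact that, for this kernel, the intensity only decays between events, so the intensity just after the current time is a valid upper bound.

```python
import math
import random

def simulate_hawkes_thinning(mu, alpha, beta, T, seed=0):
    """Simulate event times on [0, T) by thinning, for the intensity
    lambda*(t) = mu + sum_{t_j < t} alpha * beta * exp(-beta * (t - t_j)).
    Requires mu > 0; alpha < 1 keeps the process from exploding."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excite = 0.0  # self-exciting part of the intensity at the current time t
    while True:
        lam_bar = mu + excite              # bound valid until the next event
        wait = rng.expovariate(lam_bar)    # candidate from a rate-lam_bar Poisson process
        if t + wait >= T:
            break
        t += wait
        excite *= math.exp(-beta * wait)   # decay the excitation to the candidate time
        # Accept the candidate with probability lambda*(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excite:
            events.append(t)
            excite += alpha * beta         # each accepted event raises the intensity
    return events

catalog = simulate_hawkes_thinning(mu=0.5, alpha=0.5, beta=1.0, T=100.0, seed=1)
```

Synthetic catalogs produced this way are useful mainly as a check on estimation code: fit the model to a simulated catalog and confirm the known parameters are recovered before trusting a fit to real data.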

Where ML helps, where it does not (honest synthesis)

| ML genuinely helps | ML has not beaten the classical baseline |
| --- | --- |
| Detection / phase-picking → better, more complete catalogs upstream (Detection-vs-Forecasting) | Short-term rate forecasting vs. a well-fit ETAS under fair prospective CSEP testing |
| Ingesting multivariate covariates a parametric ETAS cannot easily absorb | Any claim resting on AUC / classification framings (calibration-blind; banned as a primary metric) |
| Learned spatial anisotropy and flexible kernels | Anything trained or evaluated with temporal leakage (the DeVries lesson) |
| Inference speed and scalable conditioning | Deterministic "yes/no" prediction (impossible, never claimed) |
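
The "calibration-blind" label on AUC can be illustrated with a toy example. The labels and probabilities below are made-up numbers, and the helper names `auc` and `mean_log_score` are hypothetical; this is a sketch of the general property, not an excerpt from the product's evaluation code. AUC depends only on how the forecasts rank the cells, so any strictly increasing distortion of the probabilities leaves it unchanged, while a proper score such as the mean log score penalizes the distortion.

```python
import math

def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count one half). Depends only on the ordering of the scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_log_score(probs, labels):
    """Mean log-probability assigned to what actually happened (higher is better)."""
    return sum(
        math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, labels)
    ) / len(labels)

# Toy data (assumed): 1 = at least one event occurred in the cell, 0 = none.
labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
forecast = [0.05, 0.10, 0.15, 0.60, 0.20, 0.70, 0.10, 0.30, 0.80, 0.25]
# Strictly increasing transform: same ranking, badly inflated probabilities.
inflated = [p ** 0.25 for p in forecast]

same_auc = auc(forecast, labels) == auc(inflated, labels)
log_score_drops = mean_log_score(inflated, labels) < mean_log_score(forecast, labels)
```

Both flags evaluate to true here: the inflated forecast ranks the cells identically yet assigns far too much probability to cells where nothing happened, and only the likelihood-based score notices.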

The discipline that keeps this honest is in Evaluation-and-Tests and Honest-Limits.


See also: Models-Classical · Models-Employed · Temporal-Point-Processes · Methodology-History · Evaluation-and-Tests · Honest-Limits · References · Glossary
