Skip to content

Models Employed

Felipe Santibañez-Leal edited this page Jun 17, 2026 · 2 revisions

Models — Employed (Our Architecture)

The model stack this product actually runs: a calibrated ETAS reference as the defensible core, a smoothed-seismicity Poisson null as the mandatory baseline, a Reasenberg–Jones transparent fallback, and a gated, context-conditioned neural temporal point process (with a CNN as the spatial-context encoder, never a standalone classifier) as a challenger that only ships if it beats ETAS in our own prospective CSEP harness. This page records the architecture, the regime/tile partitioning, the target definition, and how calibration and bounds are produced.

Framing (non-negotiable). This is a forecaster, never a predictor. It outputs bounded, calibrated, conditional probabilities scoped to region × magnitude band × horizon, always shown next to a long-term baseline, evaluated CSEP-style. No deterministic call, no alarm, no countdown, no "safe" state. The product creed, carried verbatim into the app:

Earthquakes cannot be predicted, but their probability can be forecast — reported honestly, with uncertainty, evaluated against reality, never as an alarm and never as a promise of safety.

The building-block models referenced below are defined in full on the Models — Classical and Models — Analytical / ML pages; this page is about how they are assembled.


Table of contents

  1. The model stack at a glance
  2. Layer 1 — ETAS reference (the v0 product and the floor)
  3. Layer 2 — Smoothed-seismicity Poisson null (the mandatory baseline)
  4. Layer 3 — Reasenberg–Jones transparent fallback
  5. Layer 4 — The gated, context-conditioned neural TPP
  6. Regime and tile partitioning
  7. Target definition
  8. Calibration and bounds
  9. The daily-inference flow
  10. Why this beats a simplistic baseline, by design
  11. References

1. The model stack at a glance

The system is a layered conditional estimator. Every layer emits a conditional intensity $\lambda(t, x, y \mid \mathcal{H}_t)$ on the same fine grid; the layers are combined and gated so that nothing un-calibrated or un-beaten reaches the public map.

flowchart TD
    subgraph Inputs["Inputs (snapshotted at issue time t)"]
        CAT["Catalog spine<br/>ComCat + regional low-Mc network<br/>(full, un-declustered)"]
        DCAT["Declustered catalog<br/>(Zaliapin–Ben-Zion / Gardner–Knopoff)"]
        ENR["Static enrichers (gated)<br/>Slab2 · faults · GNSS strain · tidal ΔCFS"]
    end

    subgraph Hygiene["Hygiene pipeline"]
        MC["Time-varying Mc(x,y,t)<br/>MAXC+GFT / EMR"]
        MW["Mw homogenization<br/>(ISC-GEM / GCMT anchor)"]
        BV["b-value (Aki–Utsu, binned)"]
    end

    subgraph Models["Model layers (same 0.1 grid)"]
        NULL["L2 · Smoothed-seismicity<br/>Poisson NULL  (mu(x,y))"]
        ETAS["L1 · Space-time ETAS<br/>REFERENCE / v0 core"]
        RJ["L3 · Reasenberg–Jones<br/>transparent FALLBACK"]
        NPP["L4 · Gated neural TPP<br/>(Hawkes bias + CNN context encoder)<br/>FEATURE-FLAGGED"]
    end

    subgraph Gate["CSEP gate + calibration"]
        CSEP["Prospective CSEP harness<br/>N/M/S/CL consistency + IGPE vs Poisson AND ETAS"]
        CAL["Calibration (isotonic/Platt)<br/>reliability diagram — release blocker"]
    end

    OUT["Public artifact<br/>per-cell P(>=1 event >= M*) for 1d/2d/7d<br/>+ P10/median/P90 bounds + baseline"]

    CAT --> MC --> MW --> BV
    DCAT --> NULL
    BV --> ETAS
    BV --> RJ
    NULL --> ETAS
    ENR -.gated.-> NPP
    BV --> NPP
    ETAS --> CSEP
    NULL --> CSEP
    RJ --> CSEP
    NPP -.must beat ETAS.-> CSEP
    CSEP --> CAL --> OUT
Loading
Layer Model Status Role
L1 Space–time ETAS (region-refit) Ships in v0 The calibrated, defensible reference and primary estimator.
L2 Smoothed-seismicity Poisson Always on The time-independent spatial background $\mu(x,y)$ and the mandatory null every time-dependent model must beat.
L3 Reasenberg–Jones Always on Transparent operational fallback and sanity check (the shape USGS OAF runs).
L4 Context-conditioned neural TPP Gated / feature-flagged A challenger; reaches the public map only if it beats ETAS in our prospective CSEP harness and is calibrated.

2. Layer 1 — ETAS reference (the v0 product and the floor)

v0 ships ETAS-class only. A region-refit space–time ETAS with the full hygiene pipeline that simplistic baselines omit — per-region $M_c$, $M_w$ homogenization, background-only declustering, propagated parameter uncertainty, and full CSEP testability on a fine grid. This alone is materially stronger by design than any hand-binned Hawkes or coarse-rectangle approach, because it is likelihood-fit, calibrated, and S-testable.

The space–time conditional intensity (canonical Ogata-1998 separable kernels):

$$\lambda(t, x, y \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:,t_i &lt; t} k(m_i), g(t - t_i), f(x - x_i,, y - y_i \mid m_i)$$

  • Utsu productivity: $k(m) = K, e^{\alpha(m - M_0)}$
  • Omori–Utsu temporal decay: $g(t) = \dfrac{p-1}{c}\left(1 + \dfrac{t}{c}\right)^{-p}$
  • Spatial kernel (Ogata 1998): $f(r \mid m) = \dfrac{q-1}{\pi,\zeta(m)^2}\left(1 + \dfrac{r^2}{\zeta(m)^2}\right)^{-q}$, with $\zeta(m) = D, e^{\gamma(m - M_0)}$
  • Magnitudes follow Gutenberg–Richter, $f(m) = \beta e^{-\beta(m - M_c)}$, $\beta = b\ln 10$.

Two stability gates, kept separate (rejecting a fit on either):

  1. $\alpha &lt; \beta$ is required for the productivity × magnitude integral to converge (finite branching ratio $n$).
  2. Given that, $n &lt; 1$ is the subcritical / stationary condition. Reject any fit with $n \ge 1$ (supercritical = mis-fit).

The model is fit by maximizing the point-process log-likelihood

$$\ln L = \sum_i \ln \lambda(t_i, x_i, y_i \mid \mathcal{H}_{t_i})

  • \int_0^T!!\int_A \lambda(t, x, y), dx, dy, dt$$

on the full, un-declustered catalog (triggering is the predictable signal), with the background $\mu(x,y)$ recovered from Layer 2 / stochastic declustering. Fitting is MLE or Bayesian (INLAbru gives a fast posterior, used for bounds in §8). It runs in seconds-to-minutes of CPU — no GPU required for the reference layer.

(Full theory: Models — Classical §3. Ogata 1988, doi:10.1080/01621459.1988.10478560; Ogata 1998, doi:10.1023/A:1003403601725.)


3. Layer 2 — Smoothed-seismicity Poisson null (the mandatory baseline)

A stationary, time-independent estimate of where earthquakes occur, obtained by smoothing a declustered catalog with an adaptive kernel (Helmstetter–Kagan–Jackson):

$$\mu(x, y) = \sum_i K_{d_i}(r), \qquad K_d(r) = \frac{C(d)}{(r^2 + d^2)^{s}},$$

with bandwidth $d_i$ set to the distance from event $i$ to its $n$-th nearest neighbour ($n \approx 6$, a region-tuned hyperparameter).

This layer plays a dual role, which is why it is always on:

  1. It is the spatial background $\mu(x,y)$ that feeds the ETAS reference.
  2. It is the mandatory Poisson null — the time-independent reference that any time-dependent model (ETAS or neural) must beat in a comparison test before it can claim skill. It is also the principled cold-start floor: where recent seismicity is sparse or zero, the conditional rate floors to $\mu(x,y)$ rather than a hard-coded per-day constant. A blank cell never reads as "safe."

Implementation note (do not hard-code the exponent). The kernel exponent $s$ and normalization $C(d)$ vary across the Helmstetter–Kagan–Jackson family ($s = 1$ and $s = 3/2$ both appear). Pin $s$, $C(d)$, and the neighbour count to a specific reference implementation (HKJ 2007 / Werner 2011 or the pyCSEP/floatCSEP code) before coding; do not treat $s = 3/2$ as a verified universal constant.

(Full theory: Models — Classical §7. Helmstetter, Kagan & Jackson 2007, doi:10.1785/gssrl.78.1.78.)


4. Layer 3 — Reasenberg–Jones transparent fallback

The most transparent "tomorrow's earthquakes" model, kept as a sanity check and a transparent operational baseline (the shape USGS OAF runs):

$$\lambda(t, M) = \frac{10^{,a + b(M_m - M)}}{(t + c)^{p}}, \qquad N = \int \lambda, dt, \qquad P(\ge 1) = 1 - e^{-N}.$$

Its purpose in the stack is interpretability and robustness: when a felt mainshock occurs, R-J gives a directly auditable "probability of a larger event in the next N days" that any seismologist can reproduce by hand, which cross-checks the ETAS reference. The STEP production system (Gerstenberger et al. 2005) — which wraps R-J + a background term into gridded short-interval shaking-probability maps — is the closest analogue to this product's daily gridded output shape.

(Full theory: Models — Classical §4–5. Reasenberg & Jones 1989, doi:10.1126/science.243.4895.1173; Page et al. 2016, doi:10.1785/0120160073.)


5. Layer 4 — The gated, context-conditioned neural TPP

The "stronger model" is never the default. It is delivered as a feature-flagged challenger and reaches the public map only if it beats ETAS in our prospective CSEP harness (positive IGPE, T-test CI excluding zero) and is calibrated. Otherwise it stays behind the flag.

5.1 Architecture — a Hawkes inductive bias, not a black box

A conditional spatio-temporal Neural Point Process in the FERN spirit (Zlydenko et al. 2023): keep the additive background + summed-triggering skeleton of ETAS, and replace the fixed kernels with small learned components:

$$\lambda(x, y, t \mid \mathcal{H}_t) = \mu(x, y)

  • \sum_{i:,t_i < t} T_\theta(t - t_i); S_\phi(x - x_i, y - y_i;, m_i,, \mathbf{c}_i),$$

where $T_\theta$ (a small MLP / attention block) learns the temporal response and $S_\phi$ learns the spatial response conditioned on context $\mathbf{c}_i$. Crucially, this challenger models magnitude explicitly — a real gap in most NPPs flagged by EarthquakeNPP — rather than treating it as an afterthought.

5.2 The CNN is a spatial-context encoder, not a classifier

This is the load-bearing design decision that avoids the DeVries trap (see Models — Analytical / ML §5). A CNN is used only to encode spatial context $\mathbf{c}i$ — gridded fields such as Slab2 subduction geometry, distance-to-fault, GNSS strain rate, and smoothed background density — into a feature vector that conditions the point-process intensity $S\phi$. It is never a standalone per-cell "will there be an aftershock here?" binary classifier scored by AUC. The survival-term / point-process likelihood is retained end-to-end, so the output remains a proper, calibratable probability — exactly what a per-cell CNN classifier throws away.

flowchart LR
    GRID["Gridded spatial context<br/>Slab2 · faults · GNSS strain · mu(x,y)"] --> CNN["CNN spatial-context encoder"]
    CNN --> CVEC["context vector c_i"]
    HIST["Event history H_t<br/>(t_i, x_i, y_i, m_i)"] --> ENC["Triggering encoder<br/>T_theta, S_phi (Hawkes skeleton)"]
    CVEC --> ENC
    ENC --> LAM["Conditional intensity<br/>lambda(t,x,y | H_t)"]
    LAM --> LL["Point-process log-likelihood<br/>(survival term retained)"]
    LL --> PROB["Proper, calibratable probability"]
Loading

5.3 Where the neural value comes from (honest grounding)

FERN's reported 4–12% IGPE gain came mostly from (a) ingesting sub-$M_c$ events and (b) learned spatial anisotropy aligned with fault traces — the two real gaps in classical ETAS — not from network depth. So neural R&D targets exactly those two proven levers; covariate availability for the chosen region (not the architecture) is the binding resource. FERN's own caveats are release-blockers for us: it was not CSEP-tested, gave no uncertainty quantification, and its test period ended before Tohoku $M_w$ 9.0. RECAST (Dascher-Cousineau et al. 2023) is a valid alternative backbone but improves on temporal ETAS only when the training catalog is large ($\gtrsim 10^4$ events).

5.4 The gate (a hard release rule)

flowchart TD
    A["Neural challenger fit (temporal split)"] --> B{"Passes CSEP consistency?<br/>N / M / S / CL"}
    B -- No --> X["Stays behind feature flag"]
    B -- Yes --> C{"Beats ETAS on IGPE?<br/>paired T-test CI excludes 0"}
    C -- No --> X
    C -- Yes --> D{"Calibrated?<br/>reliability diagram / PIT"}
    D -- No --> X
    D -- Yes --> E["Eligible for the public map"]
Loading

(Full theory: Models — Analytical / ML §4–5, §9. Zlydenko et al. 2023, doi:10.1038/s41598-023-38033-9; Dascher-Cousineau et al. 2023, doi:10.1029/2023GL103909; Stockman, Lawson & Werner 2026, arXiv:2410.08226.)


6. Regime and tile partitioning

The map is fit fine, reported coarse, and parameters are regionalized rather than global.

  • Fine fit / score grid: regular 0.1° × 0.1° spatial cells with 0.1-magnitude bins (matching the CSEP California convention), so the S-test and M-test can resolve where and what size. This is the resolution at which ETAS is fit and CSEP-tested.
  • Coarse display: the UI aggregates the fine grid into H3 hexbins / tectonic polygons. Fit fine, report coarse.
  • Tectonic regionalization (the regime partition): parameters are not global. Subduction zones violate the isotropic-kernel / point-source assumptions of generic ETAS for great earthquakes, so a subduction build (e.g. Chile) needs Slab2 geometry, possibly anisotropic / finite-fault triggering, and its own network's $M_c$California generic parameters are never reused. Each tectonic regime gets its own fitted ETAS and its own region-appropriate $M^*$ / $M_{\max}$.
  • Borrow strength where data are sparse: rather than shrinking cells until they are empty, sparse regions borrow strength via the smoothed-seismicity background (Layer 2) and hierarchical / empirical-Bayes pooling with regionalized priors, so cells with few events inherit a sensible prior from their tectonic neighbourhood.

7. Target definition

The binding design decision. The model produces the full conditional magnitude distribution (via the GR / $b$-value term), and the public scalar is derived as an exceedance probability at one or more region-appropriate thresholds — a fixed global threshold is brittle (a Chile $M \ge 6$ is routine; a UK $M \ge 6$ is unheard-of).

Spatial cell: 0.1° × 0.1° (fit/score), aggregated to region for display (§6).

Magnitude: forecast the distribution, threshold for display. The exceedance probability — the public number — is

$$P(\ge 1 \text{ event} \ge M^_) = 1 - e^{-N_{\ge M^_}}, \qquad N_{\ge M^_} = \int!!\int \lambda, \Phi(M^_), dx, dy, dt, \qquad \Phi(M^_) = 10^{-b,(M^_ - M_c)}.$$

  • Forecasting the full distribution (not a single binary) keeps the GR information the model already computes, enables the M-test and CRPS, and lets the UI offer multiple thresholds.
  • $M_{\max}$ is specified per region (sourced from regional hazard models): it bounds the exceedance integral and sets the tail probability of the rare large events that dominate impact. It is an explicit, documented assumption with sensitivity reported.
  • $b$-value spatial/temporal variation propagates into the tail and is carried with uncertainty.

Horizon: three rolling horizons re-issued daily1 day, 2 days, 7 days — each at the region's $M^*$ band. The horizon selector is always visible and visibly recolors the field. Honest caveat baked into copy: long (1-week) horizons have near-zero gain outside active sequences; quiet days correctly read near-climatology.

The public formula $P = 1 - e^{-N}$ never changes. Only the quality of $\lambda$ improves as the model improves. The number is always shown next to its long-term baseline, so a user reads "X% vs. Y% baseline," never an unanchored figure.


8. Calibration and bounds

8.1 Calibration (a release blocker)

The public probability is recalibrated (isotonic / Platt) and validated with a reliability diagram per horizon ("when we said 5%, it happened ~5% of the time"). An uncalibrated probability does not ship. The number is always rendered next to the climatological / Poisson baseline. Because the reliability diagram is dominated by quiet cells, calibration is validated specifically in the cold-start regime, not only during active sequences. The public reliability diagram is the single most credibility-building artifact the product ships.

8.2 Bounds must be real (not a cosmetic Poisson interval)

The UI ships an optimistic (P10) / expected / pessimistic (P90) triad — the empirically best uncertainty design (Schneider et al. 2022). Those bounds are a genuine epistemic + aleatory decomposition, sourced from:

  1. ETAS parameter uncertainty — MLE covariance / bootstrap, or a Bayesian posterior (INLAbru gives a fast posterior, not just speed).
  2. $M_c$ and $b$-value estimation uncertainty, propagated through the exceedance integral.
  3. Structural / model-selection uncertainty (ETAS-variant choice).
  4. Over-dispersion. Regional seismicity is over-dispersed relative to Poisson (variance ≫ mean, from clustering), so the pessimistic bound must be wider than a naive Poisson quantile (negative-binomial behaviour; handled via the catalog-based number test in pyCSEP). A Poisson-only band systematically under-warns at the tail — and the Schneider result holds only if the bounds are real.

(Schneider et al. 2022, doi:10.5194/nhess-22-1499-2022; Kagan 2017, doi:10.1093/gji/ggx300. Scoring and over-dispersion handling: Evaluation.)


9. The daily-inference flow

A strict forecast clock makes temporal leakage structurally impossible: at each daily issue time $t$ the model is handed only the catalog slice $(-\infty, t)$, the forecast is sealed, then the clock advances.

flowchart TD
    S1["1 · Snapshot inputs at issue time t<br/>ComCat + regional network, slice (-inf, t)<br/>persist immutable, versioned artifact"]
    S2["2 · Hygiene<br/>Mc(x,y,t) -> Mw homogenize -> dual-catalog decluster<br/>+ post-mainshock incompleteness correction"]
    S3["3 · Fit/update region-refit ETAS<br/>0.1 grid; enforce alpha<beta; reject n>=1"]
    S4["4 · Generate >=10,000 synthetic next-day catalogs<br/>-> 1d/2d/7d probabilities at M* + P10/median/P90"]
    S5["5 · Calibrate (isotonic/Platt) + QA-gate"]
    S6["6 · Write one compact artifact<br/>per-cell rates + baseline + bounds + coverage mask + versions"]
    S7["7 · Log immutably (forecast + input snapshot)"]
    S8["8 · Serve stateless/read-only via a thin static viewer"]
    S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8
Loading

Two operational guards carried from the design:

  • Post-mainshock incompleteness. Right after a large mainshock — exactly when the forecast matters most — the catalog is grossly incomplete and naive ETAS underestimates productivity. The daily job uses an incompleteness-aware likelihood with a time-dependent $M_c(t)$, not a flat threshold. Under-forecasting here is both a credibility and an (indirect) safety-communication failure.
  • QA gating. A single bad / duplicated / retracted event near $M^*$ can swing a public probability. Auto-publish only if input-catalog sanity, artifact integrity, and a rolling-window N-test drift monitor on the forecaster itself all pass. On a data outage or a failed run, serve "unavailable" with a staleness banner — never silently serve a stale or corrupted artifact.

10. Why this beats a simplistic baseline, by design

A coarse-rectangle, hand-binned-Hawkes, AUC-scored classifier fails on the exact axes this design fixes:

Failure mode of a simplistic baseline This design
No $M_c$, flat thresholds, network-bias contamination Rolling per-region $M_c(x,y,t)$ (MAXC+GFT/EMR), incompleteness-aware post-mainshock
No declustering decision Explicit dual-catalog rule (Zaliapin–Ben-Zion background / full catalog for triggering)
Mixed $m_b$ / $M_s$ / $M_w$ $M_w$ homogenization anchored on ISC-GEM / GCMT
Binary ROC-AUC, no null, calibration-blind CSEP N/M/S/L/CL consistency + IGPE T/W comparison vs. Poisson and ETAS; reliability diagram; AUC banned as a primary metric
Coarse rectangles (un-S-testable) Fine 0.1° grid (S-testable); region-level only for display
Hand-set constants, no CIs MLE / Bayesian fit, propagated parameter + structural + over-dispersion uncertainty
Arbitrary per-day floor Principled smoothed-seismicity background + hierarchical pooling for cold-start
Soft feature/label leak Forecast-clock causal cutoff (features $&lt; t$, label $[t, t{+}H)$) + immutable input snapshot

Information gain over a Poisson reference is state-dependent (reported in nats, the CSEP unit, not bits): positive and large during active aftershock sequences (probability gains up to orders of magnitude on peak days), near zero in quiet periods, with a modest all-period average. For context, time-independent model-vs-smoothed-seismicity IGPE in prospective California CSEP is only about −0.7 to +0.5 nats. We report the gain honestly as state-dependent — never as a fabricated round figure — and always in nats. The public exceedance formula $1 - e^{-N}$ is unchanged; only the quality of $\lambda$ improves.


References

  1. Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. JASA 83(401), 9–27. doi:10.1080/01621459.1988.10478560
  2. Ogata, Y. (1998). Space–time point-process models for earthquake occurrences. Ann. Inst. Statist. Math. 50(2), 379–402. doi:10.1023/A:1003403601725
  3. Reasenberg, P.A. & Jones, L.M. (1989). Earthquake hazard after a mainshock in California. Science 243(4895), 1173–1176. doi:10.1126/science.243.4895.1173
  4. Page, M.T. et al. (2016). Three ingredients for improved global aftershock forecasts. BSSA 106(5), 2290–2301. doi:10.1785/0120160073
  5. Gerstenberger, M.C., Wiemer, S., Jones, L.M. & Reasenberg, P.A. (2005). Real-time forecasts of tomorrow's earthquakes in California. Nature 435, 328–331. doi:10.1038/nature03622
  6. Helmstetter, A., Kagan, Y.Y. & Jackson, D.D. (2007). High-resolution time-independent grid-based forecast for M ≥ 5 earthquakes in California. SRL 78(1), 78–86. doi:10.1785/gssrl.78.1.78
  7. Werner, M.J., Helmstetter, A., Jackson, D.D. & Kagan, Y.Y. (2011). High-resolution long-term and short-term earthquake forecasts for California. BSSA 101, 1630–1648. doi:10.1785/0120090340
  8. Zhuang, J., Ogata, Y. & Vere-Jones, D. (2002). Stochastic declustering of space–time earthquake occurrences. JASA 97(458), 369–380. doi:10.1198/016214502760046925
  9. Zaliapin, I. & Ben-Zion, Y. (2020). Earthquake declustering using the nearest-neighbor approach in space-time-magnitude domain. JGR Solid Earth 125, e2018JB017120. doi:10.1029/2018JB017120
  10. Zlydenko, O. et al. (2023). A neural encoder for earthquake rate forecasting. Scientific Reports 13. doi:10.1038/s41598-023-38033-9
  11. Dascher-Cousineau, K., Shchur, O., Brodsky, E.E. & Günnemann, S. (2023). Using deep learning for flexible and scalable earthquake forecasting (RECAST). GRL 50, e2023GL103909. doi:10.1029/2023GL103909
  12. Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. TMLR. arXiv:2410.08226
  13. Hayes, G.P. et al. (2018). Slab2, a comprehensive subduction zone geometry model. Science 362:58–61. doi:10.1126/science.aat4723
  14. Schneider, M. et al. (2022). Bridging the gap between earthquake forecasts and uncertainty communication. NHESS 22(4), 1499–1518. doi:10.5194/nhess-22-1499-2022
  15. Kagan, Y.Y. (2017). Worldwide earthquake forecasts. GJI 211(1), 335–345. doi:10.1093/gji/ggx300
  16. Wiemer, S. & Wyss, M. (2000). Minimum magnitude of completeness in earthquake catalogs. BSSA 90(4), 859–869. doi:10.1785/0119990114
  17. Jordan, T.H. et al. (2011). Operational Earthquake Forecasting (ICEF Report). Annals of Geophysics 54(4), 315–391. doi:10.4401/ag-5350

See also: Models — Classical · Models — Analytical / ML · Evaluation · Methodology · Data & Pipelines.

Clone this wiki locally