RECAST and FERN
This page covers the two most important neural temporal-point-process (neural-TPP) models built specifically for earthquake-rate forecasting: RECAST (Dascher-Cousineau et al., 2023) and FERN (Zlydenko et al., 2023). Both are honest, peer-reviewed results that earn a place in this product's design reasoning, not because they "beat" the classical baseline, but because they show precisely where a neural model can add skill (covariate ingestion, learned spatial anisotropy, inference speed) and, just as importantly, where it cannot (calibration, uncertainty, prospective generalization). The page treats each model in depth and ends with the honest ETAS-vs-neural-point-process verdict that governs how, and whether, a neural challenger ships here.
The one-line framing (honesty over hype). As of 2026, no neural-TPP has been shown to reliably beat a well-fit ETAS for short-term earthquake forecasting under fair, prospective, CSEP-style testing. RECAST and FERN are the strongest, most honest pro-neural results in the literature, and even they win only in narrow, well-characterized settings. We treat them as evidence for a gated challenger, never as a replacement for the calibrated classical core.
- Why these two models matter
- The shared object: a conditional intensity, learned
- FERN — a neural encoder that generalizes ETAS
- RECAST — a recurrent encoder–decoder neural TPP
- Side-by-side: what each one actually buys you
- The honest caveats (their authors', not ours)
- The ETAS-vs-NPP verdict and what it means here
- Worked illustration: reading an IGPE gain honestly
- Role in operational earthquake forecasting
- References
Most "AI predicts earthquakes" headlines fail one of three tests: they abandon the point-process likelihood (and become un-calibratable), they evaluate on shuffled rather than chronological data (and leak the future into the past), or they never compare against a real ETAS baseline. RECAST and FERN fail none of these. They are:
- Genuine temporal point processes — they estimate a conditional intensity and evaluate the survival (compensator) term, so their outputs are proper probabilities, not regression guesses.
- Compared against ETAS — the right, defensible baseline, not a strawman.
- Honest about their limits — each paper states, in its own words, exactly where it does not yet replace the classical model.
That combination is rare, and it is why these two — not the louder CNN-classifier papers (see CNN Spatial Models) — are the neural results this product takes seriously.
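A seismicity catalog is a realization of a marked spatio-temporal point process. Every method, classical or neural, ultimately estimates the conditional intensity function

$$\lambda^*(t, x, y) = \lambda(t, x, y \mid \mathcal{H}_t),$$

the instantaneous expected rate of events at time $t$ and location $(x, y)$, given the history $\mathcal{H}_t$ of all events before $t$. The model is fit and scored with the point-process log-likelihood over an observation window $[0, T]$ and region $S$:

$$\log L = \sum_{i=1}^{N} \log \lambda^*(t_i, x_i, y_i) - \int_0^T \!\!\iint_S \lambda^*(t, x, y)\,dx\,dy\,dt .$$

The second term (the integral, or compensator) is the survival penalty. It is what forces the model to spend probability mass honestly over time: a model cannot simply assign high intensity everywhere, because the integral term debits it for the rate it never "uses." A forecasting system that does not evaluate this term is not doing point-process forecasting, and omitting it is the structural root of most ML-forecasting failures discussed on the ML overview and CNN pages.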
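To make the compensator's role concrete, here is a minimal sketch under the simplest possible assumption: a constant-rate (homogeneous Poisson) temporal model, whose log-likelihood reduces to $n \log \lambda - \lambda T$. The function name and the toy catalog numbers are illustrative, not taken from either paper.

```python
import math

def poisson_process_loglik(rate, n_events, duration):
    """Log-likelihood of a constant-rate temporal point process.

    event_term:  sum of log-intensities at the observed events.
    compensator: integral of the rate over [0, duration].
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    event_term = n_events * math.log(rate)
    compensator = rate * duration
    return event_term - compensator

# Toy catalog (illustrative numbers): 50 events in 100 days.
n_events, duration = 50, 100.0
best_rate = n_events / duration  # maximum-likelihood rate for this model

ll_best = poisson_process_loglik(best_rate, n_events, duration)
ll_inflated = poisson_process_loglik(10 * best_rate, n_events, duration)

# Scores with the compensator dropped: only the event term remains.
score_best_no_comp = n_events * math.log(best_rate)
score_inflated_no_comp = n_events * math.log(10 * best_rate)
```

With the compensator, the inflated rate scores far worse than the maximum-likelihood rate; with it dropped, the inflated rate scores better, which is the failure mode described above.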
ETAS (Ogata 1988, 1998) is the parametric, physics-informed member of this family:
$$\lambda(t, x, y \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:\,t_i < t} \underbrace{A\,e^{\alpha(m_i - m_0)}}_{\text{Utsu productivity }\kappa(m_i)} \,\underbrace{\frac{K}{(t - t_i + c)^p}}_{\text{Omori–Utsu }g(t-t_i)} \,\underbrace{h(x - x_i, y - y_i)}_{\text{spatial kernel}} .$$
RECAST and FERN are the learned members of the same family: they keep this conditional-intensity target and likelihood, but replace some or all of the hand-designed kernels with neural networks. That is exactly why they are testable against ETAS on equal footing — and why their results are interpretable as "did the learned kernel beat the hand-built kernel, on held-forward time?"
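As a rough illustration of the parametric form, the sketch below evaluates the temporal part of the ETAS intensity (background plus Utsu productivity times Omori–Utsu decay), with the spatial kernel omitted for brevity. The parameter values are placeholders chosen for readability, not fitted values from any catalog.

```python
import math

def etas_temporal_intensity(t, history, mu=0.1, A=0.5, alpha=1.0,
                            K=1.0, c=0.01, p=1.1, m0=3.0):
    """Temporal ETAS rate at time t (days) given past (t_i, m_i) events.

    All default parameters are illustrative assumptions.
    """
    rate = mu  # stationary background
    for t_i, m_i in history:
        if t_i < t:  # only events strictly before t contribute
            productivity = A * math.exp(alpha * (m_i - m0))  # kappa(m_i)
            omori = K / (t - t_i + c) ** p                   # g(t - t_i)
            rate += productivity * omori
    return rate

# Toy history: an M6.0 at day 0 and an M4.0 at day 1.
history = [(0.0, 6.0), (1.0, 4.0)]
rate_early = etas_temporal_intensity(0.5, history)   # shortly after the M6.0
rate_late = etas_temporal_intensity(30.0, history)   # a month later
```

The rate is highest just after the large event and decays toward the background, and the event at day 1 has no effect on the rate at day 0.5.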
flowchart TD
H["Event history H_t<br/>(t_i, x_i, y_i, m_i)"]
subgraph ETAS["ETAS — parametric"]
K1["Fixed kernels:<br/>Omori g, Utsu kappa, spatial h"]
end
subgraph FERN["FERN — generalizes ETAS"]
K2["MLP over event features<br/>(Deep-Sets, permutation-invariant)<br/>+ learned location encoder"]
end
subgraph RECAST["RECAST — recurrent encoder-decoder"]
K3["GRU encoder over history<br/>+ neural-density decoder"]
end
H --> K1 --> L["Conditional intensity<br/>lambda*(t,x,y|H_t)"]
H --> K2 --> L
H --> K3 --> L
L --> LL["Point-process log-likelihood<br/>(survival term retained)"]
LL --> P["Proper, calibratable rate"]
Zlydenko, Elidan, Hassidim, Kukliansky, Matias, Meade, Molchanov, Nevo & Bar-Sinai (2023), A neural encoder for earthquake rate forecasting, Scientific Reports 13 (doi:10.1038/s41598-023-38033-9). FERN is the cleanest demonstration that the value of a neural model is concentrated in two specific ETAS gaps — not in network depth.
FERN keeps the ETAS skeleton — an additive stationary background plus a sum of triggering contributions from past events — but replaces the fixed response function with learned components:
$$\lambda(x, y, t \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:\,t_i < t} T(t - t_i)\; S(x - x_i,\, y - y_i;\, M_i).$$
The pieces:
- A Recent Earthquakes Encoder — a small multilayer perceptron (MLP) that maps each past event's features (magnitude, elapsed time, distance, depth) to its contribution. This replaces ETAS's hand-tuned Omori/Utsu/spatial kernels with a function the network learns from data.
- A long-term seismicity encoder and a learned location encoder for the background $\mu(x,y)$.
- A permutation-invariant summation (Deep-Sets style): contributions from all past events are summed, so the model is invariant to the order in which past events are listed and naturally handles a variable number of them. No RNN and no transformer is used.
The Deep-Sets design is the elegant part: by summing per-event MLP outputs, FERN inherits ETAS's "each past event independently raises future rate" inductive bias, while letting the shape of that contribution be learned rather than assumed.
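The summation idea can be sketched in a few lines. This is not FERN's architecture or its trained weights: the tiny one-hidden-layer network, its random weights, and the three input features are stand-ins chosen only to show that summing per-event outputs is order-independent and accepts any number of events.

```python
import math
import random

def make_weights(seed=0, n_features=3, n_hidden=4):
    """Random placeholder weights (hypothetical, untrained)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def event_contribution(magnitude, elapsed_days, distance_km, weights):
    """Map one past event to a non-negative rate contribution."""
    features = (magnitude, math.log1p(elapsed_days), math.log1p(distance_km))
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(weights["W1"], weights["b1"])]
    pre = sum(w * h for w, h in zip(weights["w2"], hidden)) + weights["b2"]
    return math.log1p(math.exp(pre))  # softplus keeps the output positive

def triggered_rate(events, weights):
    """Deep-Sets style pooling: sum the per-event network outputs."""
    return math.fsum(event_contribution(m, dt, r, weights)
                     for m, dt, r in events)

weights = make_weights()
events = [(5.5, 0.2, 3.0), (4.1, 2.0, 12.0), (3.2, 10.0, 40.0)]
rate_forward = triggered_rate(events, weights)
rate_reversed = triggered_rate(list(reversed(events)), weights)
```

Because pooling is a plain sum, listing the events in a different order gives the same rate, and appending one more event can only raise it.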
- A 4–12% improvement in Information Gain Per Earthquake (IGPE) over ETAS, but only in the FERN+ variant, which adds sub-completeness-magnitude events (events below the catalog's nominal $M_c$) as extra input. Reported region endpoints (e.g. Region A $2.278 \to 2.395$, Region C $1.395 \to 1.803$ IGPE) are ETAS → FERN+ values; plain FERN, without the extra small events, is lower. The average gain is on the order of 0.1 information bits per earthquake.
- It learned anisotropic spatial structure aligned with fault traces, without being given any fault geometry as input. The MLP discovered that aftershocks cluster along strike, which ETAS's isotropic kernel cannot represent unless fault geometry is hand-coded.
- Runtime ~1 s vs. ~10 h for an equivalent ETAS simulation — roughly a 1000× inference speedup, which matters for a cheap daily public app.
FERN's gain is real but modest, retrospective, region-specific, and uncalibrated. Critically, the gain traces almost entirely to two things that are not "deep-learning magic":
- Ingesting more data — the sub-$M_c$ events that ETAS conventionally discards. This is a data win, available to any model that can consume the extra events.
- Learning spatial anisotropy — flexibility ETAS lacks by construction.
Both are exactly the ETAS limitations that motivate a challenger. This is the good news for the product: neural value is concentrated in covariate ingestion + spatial flexibility, and both can be added incrementally on top of an ETAS skeleton — which is precisely the design on the Models — Employed page.
Dascher-Cousineau, Shchur, Brodsky & Günnemann (2023), Using deep learning for flexible and scalable earthquake forecasting (RECAST), Geophysical Research Letters 50, e2023GL103909 (doi:10.1029/2023GL103909). RECAST takes the opposite architectural bet from FERN: instead of a permutation-invariant sum, it uses a recurrent encoder–decoder built from modern neural-TPP components.
- An encoder (a GRU, gated recurrent unit) reads the event history sequentially and compresses it into a hidden state that summarizes "where the sequence is", analogous to how the RMTPP / Neural-Hawkes family encodes history (see ML overview §3).
- A neural-density decoder turns that hidden state into a flexible conditional density for the time to the next event, from which the conditional intensity follows.
Where FERN's inductive bias is "sum of independent per-event triggers" (very ETAS-like), RECAST's is "a learned recurrent summary of the whole sequence" (very modern-TPP-like). This makes RECAST more flexible but also more data-hungry: the GRU has more capacity to fit, and therefore more capacity to overfit when the catalog is small.
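For contrast with a permutation-invariant sum, the following is a minimal scalar GRU sketch. It is not RECAST's network: the single hidden unit, the hand-picked weights, and the choice of log inter-event time as the only input are assumptions made to keep the example small. It uses the convention $h_t = (1 - z)\,h_{t-1} + z\,\tilde h_t$.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hand-picked placeholder weights (hypothetical, untrained).
GRU_WEIGHTS = {"wz": 0.5, "uz": 0.3, "bz": 0.0,
               "wr": 0.4, "ur": 0.2, "br": 0.0,
               "wh": 1.0, "uh": 0.5, "bh": 0.0}

def gru_step(h, x, w=GRU_WEIGHTS):
    """One scalar GRU update of hidden state h given input x."""
    z = sigmoid(w["wz"] * x + w["uz"] * h + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h + w["br"])  # reset gate
    h_candidate = math.tanh(w["wh"] * x + w["uh"] * (r * h) + w["bh"])
    return (1.0 - z) * h + z * h_candidate

def encode_history(inter_event_days, h0=0.0):
    """Read inter-event times in order and return the final hidden state."""
    h = h0
    for dt in inter_event_days:
        h = gru_step(h, math.log(dt))
    return h

# Same two gaps, opposite order: a short gap then a long one, and vice versa.
h_short_then_long = encode_history([0.1, 5.0])
h_long_then_short = encode_history([5.0, 0.1])
```

The two hidden states differ, so the encoder's summary depends on the order of events. That sensitivity is the extra capacity the paragraph above describes, and it is also what has to be constrained by data.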
RECAST's headline, honest finding:
RECAST improves on temporal ETAS only when the training catalog is large ($\gtrsim 10^4$ events). On smaller catalogs, it merely matches ETAS.
This is one of the most useful sentences in the whole neural-forecasting literature, because it makes the trade-off explicit and falsifiable. The neural model's extra flexibility is an asset only once there are enough events to constrain it; below that, the parsimonious ETAS is at least as good and far more interpretable. RECAST is primarily a temporal model (rate through time), and its scalability claim is about handling very large catalogs efficiently — not about beating ETAS spatially.
RECAST is a valid alternative backbone to FERN if the product ever ships a neural challenger — it
is principled, peer-reviewed, and built on solid neural-TPP foundations. But its data-hunger means it
only earns its keep in data-rich regions (dense networks, low
| | FERN (Zlydenko et al. 2023) | RECAST (Dascher-Cousineau et al. 2023) |
|---|---|---|
| Family | Neural TPP, ETAS-generalizing | Neural TPP, recurrent encoder–decoder |
| Backbone | MLP + Deep-Sets (permutation-invariant sum); no RNN/transformer | GRU encoder + neural-density decoder |
| Inductive bias | Sum of independent per-event triggers (very ETAS-like) | Learned recurrent summary of the sequence |
| Headline win | 4–12% IGPE over ETAS, only as FERN+ (sub-$M_c$ events) | Beats temporal ETAS only on large catalogs ($\gtrsim 10^4$ events) |
| Where the gain comes from | Sub-$M_c$ data ingestion + learned anisotropy | Capacity to fit large, rich sequences |
| Spatial structure | Learns fault-aligned anisotropy without fault inputs | Primarily temporal |
| Runtime | ~1 s vs. ~10 h ETAS (≈1000×) | Scalable; efficient on large catalogs |
| Calibration / uncertainty | None provided (authors' own caveat) | Not the focus |
| CSEP-tested? | No | No |
| Honest verdict | Modest, retrospective, region-specific, uncalibrated | Data-hungry; matches ETAS below ~$10^4$ events |
The complementary lesson: FERN tells you the win is in data + anisotropy; RECAST tells you the cost is data-hunger. Together they bound the realistic expectation for a neural challenger here.
The strongest evidence that these are honest papers is that the caveats are stated by the authors themselves, not extracted by critics. For FERN:
- Not tested under CSEP. No N/M/S/CL consistency tests; the gain is measured by IGPE on a retrospective split, not by the field's prospective standard.
- No uncertainty quantification at all — in the authors' words, "we do not provide any uncertainty estimates." For a product that publishes a public probability, uncertainty bands are not optional (see Models — Employed §8), so this is a release-blocker as-is.
- Test period ends before the 2011 Tohoku $M_w$ 9.0 sequence, so the single most stress-testing event in the modern catalog is excluded.
- Parameters are not interpretable, and performance varies by region with possible train/test distribution shift.
For RECAST: the advantage is conditional on catalog size, and the model is primarily temporal — it does not, on its own, deliver the spatial calibration a gridded public map requires.
Why this matters for shipping. Each of these caveats maps onto a hard release rule in this product: CSEP-tested or it does not ship; calibrated with a reliability diagram or it does not ship; validated through a great earthquake, not around it. A neural model that cannot clear those rules stays behind a feature flag — exactly the gating architecture the product adopts.
The decisive context for both models is the field's first rigorous benchmark:
Stockman, Lawson & Werner (accepted TMLR 2026; arXiv:2410.08226), EarthquakeNPP — benchmarked five modern neural point processes (NSTPP, DeepSTPP, AutoSTPP, DSTPP, SMASH) against ETAS on California catalogs (1971–2021) with strict chronological splits and CSEP consistency tests, with each model generating 24-hour forecasts via 10,000 simulated catalogs per day. The headline:
None of the five NPPs outperformed ETAS. ETAS won spatial log-likelihood "consistently," and led the CSEP pass rates (on ComCat, ETAS passed at ~95.8–97.6% across the number / pseudo-likelihood tests, vs. the best NPP at ~86–88% — and only ~68.6% on the spatial test, which is exactly where forecasting value lives). The authors' conclusion: "current NPP implementations are not yet suitable for practical earthquake forecasting."
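For readers unfamiliar with the number test, the sketch below shows one common catalog-based form: the observed event count is compared with the distribution of counts across the simulated catalogs. The function names, the two-sided $\alpha/2$ pass rule, and the toy count distribution are assumptions for illustration; this is not pyCSEP's API and not the benchmark's exact procedure.

```python
def number_test_quantiles(simulated_counts, observed_count):
    """Return (delta_1, delta_2) for a catalog-based number test.

    delta_1: fraction of simulated catalogs with at least the observed count.
    delta_2: fraction of simulated catalogs with at most the observed count.
    """
    n = len(simulated_counts)
    delta_1 = sum(c >= observed_count for c in simulated_counts) / n
    delta_2 = sum(c <= observed_count for c in simulated_counts) / n
    return delta_1, delta_2

def passes_number_test(simulated_counts, observed_count, alpha=0.05):
    """Two-sided rule (assumed): fail if either tail falls below alpha / 2."""
    delta_1, delta_2 = number_test_quantiles(simulated_counts, observed_count)
    return delta_1 >= alpha / 2 and delta_2 >= alpha / 2

# Toy forecast for one day: event counts from 10,000 simulated catalogs.
simulated_counts = [0] * 6000 + [1] * 3000 + [2] * 900 + [3] * 100

quiet_day = number_test_quantiles(simulated_counts, observed_count=1)
busy_day_ok = passes_number_test(simulated_counts, observed_count=5)
```

One observed event sits comfortably inside the simulated distribution, while five observed events were never simulated, so the forecast under-predicts and fails that day. A pass rate is the fraction of forecast days on which the test passes.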
Crucially, EarthquakeNPP repaired a data-leakage flaw in earlier neural-TPP-for-earthquakes work
(non-chronological / alternating splits that "artificially inflate performance measures due to the
nature of earthquake triggering", plus the exclusion of the Tohoku
Where this leaves FERN and RECAST. They are the honest exceptions that prove the rule: each shows a real but conditional edge (FERN with sub-$M_c$ augmentation; RECAST on large catalogs), and each is candid that it is not yet CSEP-validated or calibrated. The blunt verdict for 2026:
A well-fit ETAS is the floor a neural model must clear, not a ceiling it can ignore. Neural models add value mainly as an ETAS-generalizing conditional intensity that ingests covariates and learns spatial flexibility — the two FERN wins — not through architectural depth alone.
Scope honesty. "NPPs do not beat ETAS" is established on the California benchmark to date (1971–2021), not proven as a universal law. It justifies shipping an ETAS-class core for v0 and gating any neural model behind a prospective CSEP win — but we do not claim ML can never add skill.
Suppose a neural challenger reports an Information-Gain-Per-Earthquake of
IGPE is the mean per-event log-rate difference (in nats, natural-log units — not bits):
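$$I_N(A, B) = \frac{1}{N}\sum_{i=1}^{N}\Big(\log \lambda_A(k_i) - \log \lambda_B(k_i)\Big) - \frac{\hat N_A - \hat N_B}{N},$$

where $\lambda_A(k_i)$ is the rate that model $A$ forecasts in the space-time bin $k_i$ containing event $i$, $\hat N_A$ and $\hat N_B$ are the total event counts each model expects, and $N$ is the number of observed events.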
But a point estimate is not a release decision. The product's rule:
- Significance, not just sign. The paired T-test confidence interval on $I_N$ must exclude zero, corroborated by the W-test (Wilcoxon). A $+0.1$ that is statistically indistinguishable from $0$ buys nothing.
- State-dependence. IGPE over a baseline is large during active aftershock sequences and near zero in quiet periods. For scale, time-independent contrasts in prospective California CSEP give only about $-0.7$ to $+0.5$ nats. A headline "+0.1 average" must be reported as state-dependent, never as a flat steady-state number.
- Calibration. Even a significant, positive IGPE does not ship unless the reliability diagram says "when we said 5%, it happened ~5% of the time" (see Evaluation).
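The significance rule can be sketched as follows. The code computes $I_N$ from per-event log-rates and a confidence interval from the spread of the per-event differences. Two assumptions to flag: the interval uses a normal quantile as a large-$N$ stand-in for the Student-$t$ quantile of the T-test proper, and the four-event toy data are invented purely to show the arithmetic.

```python
import math
from statistics import NormalDist, mean, stdev

def igpe(log_rates_a, log_rates_b, expected_a, expected_b):
    """Information gain per earthquake of model A over model B, in nats."""
    n = len(log_rates_a)
    diffs = [a - b for a, b in zip(log_rates_a, log_rates_b)]
    return mean(diffs) - (expected_a - expected_b) / n

def igpe_confidence_interval(log_rates_a, log_rates_b,
                             expected_a, expected_b, level=0.95):
    """Approximate CI on I_N (normal quantile; assumes N is large)."""
    n = len(log_rates_a)
    diffs = [a - b for a, b in zip(log_rates_a, log_rates_b)]
    centre = igpe(log_rates_a, log_rates_b, expected_a, expected_b)
    quantile = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = quantile * stdev(diffs) / math.sqrt(n)
    return centre - half_width, centre + half_width

# Toy data: log-rates at four observed events, equal expected counts.
log_a = [0.3, -0.1, 0.5, 0.1]
log_b = [0.0, 0.0, 0.0, 0.0]
gain = igpe(log_a, log_b, expected_a=4.0, expected_b=4.0)
low, high = igpe_confidence_interval(log_a, log_b, 4.0, 4.0)
clears_significance = low > 0.0
```

Here the point estimate is positive, but the interval straddles zero, so under the first rule this gain would not count. The count term also matters: if model A expected two more events than model B over the same four observations, the same log-rates would yield a negative $I_N$.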
So a
Operational earthquake forecasting (OEF) — issuing recurrent, calibrated probabilities of future seismicity — has, for decades, run on ETAS-class and Reasenberg–Jones-class models precisely because they are interpretable, generative, and CSEP-testable. RECAST and FERN are the most credible candidates to augment that core, but neither is yet an OEF-ready replacement:
- FERN's contribution to OEF is a template for principled covariate ingestion (sub-$M_c$ events, and by extension geodesy / strain / multiple catalogs) and learned spatial anisotropy — added on top of an ETAS skeleton, behind a CSEP gate.
- RECAST's contribution is a scalable temporal backbone for the rare data-rich regions where catalog size justifies the extra capacity.
In this product, both inform the gated, context-conditioned neural challenger of Models — Employed §5: keep the additive-background-plus-summed-triggering Hawkes skeleton, replace the fixed kernels with small learned components, model magnitude explicitly (a gap EarthquakeNPP flagged in most NPPs), and let it reach the public map only after it beats ETAS in our own prospective CSEP harness and is calibrated. Until then, the classical ETAS reference is the product.
- Zlydenko, O., Elidan, G., Hassidim, A., Kukliansky, D., Matias, Y., Meade, B., Molchanov, A., Nevo, A. & Bar-Sinai, Y. (2023). A neural encoder for earthquake rate forecasting. Scientific Reports 13, 12350. doi:10.1038/s41598-023-38033-9
- Dascher-Cousineau, K., Shchur, O., Brodsky, E.E. & Günnemann, S. (2023). Using deep learning for flexible and scalable earthquake forecasting (RECAST). Geophysical Research Letters 50, e2023GL103909. doi:10.1029/2023GL103909
- Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. Transactions on Machine Learning Research (TMLR). arXiv:2410.08226
- Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association 83(401), 9–27. doi:10.1080/01621459.1988.10478560
- Ogata, Y. (1998). Space–time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50(2), 379–402. doi:10.1023/A:1003403601725
- Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R. & Smola, A. (2017). Deep Sets. NeurIPS 2017. arXiv:1703.06114
- Schultz, R. (2026). Forecasting the Rate of Induced Seismicity as a Neural Temporal Point Process. JGR: Machine Learning and Computation. doi:10.1029/2025JH001052
- Rhoades, D.A., Schorlemmer, D., Gerstenberger, M.C., Christophersen, A., Zechar, J.D. & Imoto, M. (2011). Efficient testing of earthquake forecasting models. Acta Geophysica 59(4), 728–747. doi:10.2478/s11600-011-0013-5
- Serafini, F., Bayona, J.A., Silva, F., Savran, W., Stockman, S., Maechling, P.J. & Werner, M.J. (2025). A benchmark database of ten years of prospective next-day earthquake forecasts in California from CSEP. Scientific Data 12, 1501. doi:10.1038/s41597-025-05766-3
- CSEP / pyCSEP — Collaboratory for the Study of Earthquake Predictability. https://cseptesting.org · https://github.com/SCECcode/pycsep
See also: Models — Classical · Models — ML · Models — Employed · CNN Spatial Models · Graph & Recurrent Networks · Detection vs. Forecasting · Evaluation · Honest Limits.
⚠️ Disclaimer: read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity, never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast. It complements, and does not replace or speak for, official agencies: always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile; SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.
CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
Conditional probabilistic seismic forecasting — forecasts, never predictions.