RECAST and FERN

RECAST and FERN — Neural Temporal Point Processes Built for Earthquakes

This page covers the two most important neural temporal-point-process (neural-TPP) models built specifically for earthquake-rate forecasting: RECAST (Dascher-Cousineau et al., 2023) and FERN (Zlydenko et al., 2023). Both are honest, peer-reviewed results that earn a place in this product's design reasoning, not because they "beat" the classical baseline, but because they show precisely where a neural model can add skill (covariate ingestion, learned spatial anisotropy, inference speed) and, just as importantly, where it cannot (calibration, uncertainty, prospective generalization). The page treats each model in depth and ends with the honest ETAS-vs-neural-point-process verdict that governs how, and whether, a neural challenger ships here.

The one-line framing (honesty over hype). As of 2026, no neural-TPP has been shown to reliably beat a well-fit ETAS for short-term earthquake forecasting under fair, prospective, CSEP-style testing. RECAST and FERN are the strongest, most honest pro-neural results in the literature, and even they win only in narrow, well-characterized settings. We treat them as evidence for a gated challenger, never as a replacement for the calibrated classical core.

Why these two models matter
The shared object: a conditional intensity, learned
FERN — a neural encoder that generalizes ETAS
RECAST — a recurrent encoder–decoder neural TPP
Side-by-side: what each one actually buys you
The honest caveats (their authors', not ours)
The ETAS-vs-NPP verdict and what it means here
Worked illustration: reading an IGPE gain honestly
Role in operational earthquake forecasting
References

1. Why these two models matter

Most "AI predicts earthquakes" headlines fail one of three tests: they abandon the point-process likelihood (and become un-calibratable), they evaluate on shuffled rather than chronological data (and leak the future into the past), or they never compare against a real ETAS baseline. RECAST and FERN fail none of these. They are:

Genuine temporal point processes — they estimate a conditional intensity and evaluate the survival (compensator) term, so their outputs are proper probabilities, not regression guesses.
Compared against ETAS — the right, defensible baseline, not a strawman.
Honest about their limits — each paper states, in its own words, exactly where it does not yet replace the classical model.

That combination is rare, and it is why these two — not the louder CNN-classifier papers (see CNN Spatial Models) — are the neural results this product takes seriously.

2. The shared object: a conditional intensity, learned

A seismicity catalog is a realization of a marked spatio-temporal point process. Every method, classical or neural, ultimately estimates the conditional intensity function

$$\lambda^*(t, x, y, m \mid \mathcal{H}_t),$$

the instantaneous expected rate of events at time $t$, location $(x,y)$, magnitude $m$, given the history $\mathcal{H}_t = \{(t_i, x_i, y_i, m_i) : t_i < t\}$. The $*$ denotes conditioning on history. A model is trained by maximizing the point-process log-likelihood over $[0, T]$,

$$\log \mathcal{L} = \sum_{i=1}^{n} \log \lambda^*(t_i) \;-\; \int_0^T \lambda^*(\tau)\, d\tau .$$

The second term (the integral, or compensator) is the survival penalty. It is what forces the model to spend probability mass honestly over time: a model cannot simply assign high intensity everywhere, because the integral term debits it for the rate it never "uses." A forecasting system that does not evaluate this term is not doing point-process forecasting, and omitting it is the structural root of most ML-forecasting failures discussed on the ML overview and CNN pages.
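To make the role of the compensator concrete, here is a minimal sketch of the log-likelihood for a purely temporal intensity. The function name `log_likelihood`, the midpoint-rule integrator, and the toy catalog are illustrative assumptions, not part of any model discussed on this page.

```python
import math

def log_likelihood(event_times, intensity, t_end, n_grid=10_000):
    """Point-process log-likelihood on [0, t_end] for a temporal intensity.

    `intensity` is any callable t -> rate. The compensator is approximated
    with a midpoint rule (an illustrative choice, not a production integrator).
    """
    # First term: reward intensity at the times events actually occurred.
    event_term = sum(math.log(intensity(t)) for t in event_times)
    # Second term: the compensator, i.e. the expected event count on [0, t_end].
    dt = t_end / n_grid
    compensator = sum(intensity((k + 0.5) * dt) for k in range(n_grid)) * dt
    return event_term - compensator

# Toy catalog (made-up numbers): 5 events in 10 days.
events = [0.7, 2.1, 2.4, 6.3, 9.0]
T = 10.0

matched = log_likelihood(events, lambda t: 0.5, T)    # rate = 5 events / 10 days
inflated = log_likelihood(events, lambda t: 50.0, T)  # "high intensity everywhere"
print(matched, inflated)
```

The inflated model earns a larger first term but is debited for 500 expected events that never occurred, so its total log-likelihood is far lower than that of the matched rate.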

ETAS (Ogata 1988, 1998) is the parametric, physics-informed member of this family:

$$\lambda(t, x, y \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:\,t_i < t} \underbrace{A\,e^{\alpha(m_i - m_0)}}_{\text{Utsu productivity }\kappa(m_i)} \,\underbrace{\frac{K}{(t - t_i + c)^p}}_{\text{Omori–Utsu }g(t-t_i)} \,\underbrace{h(x - x_i, y - y_i)}_{\text{spatial kernel}} .$$
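As a rough illustration of the ETAS form, the sketch below evaluates a temporal-only version (the spatial kernel $h$ is dropped) and its closed-form compensator. The function names and all parameter values are hypothetical placeholders, and the closed form assumes $p \neq 1$ and event times $t_i \ge 0$.

```python
import math

def etas_intensity(t, history, mu, A, alpha, m0, K, c, p):
    """Temporal ETAS rate at time t; history is a list of (t_i, m_i)."""
    rate = mu  # stationary background
    for t_i, m_i in history:
        if t_i < t:  # only strictly earlier events contribute
            productivity = A * math.exp(alpha * (m_i - m0))  # Utsu term
            omori = K / (t - t_i + c) ** p                   # Omori-Utsu decay
            rate += productivity * omori
    return rate

def etas_compensator(t_end, history, mu, A, alpha, m0, K, c, p):
    """Closed-form integral of the rate over [0, t_end], valid for p != 1."""
    total = mu * t_end
    for t_i, m_i in history:
        if t_i < t_end:
            productivity = A * math.exp(alpha * (m_i - m0))
            # Integral of K / (s - t_i + c)^p for s from t_i to t_end.
            omori_integral = K * (c ** (1 - p) - (t_end - t_i + c) ** (1 - p)) / (p - 1)
            total += productivity * omori_integral
    return total

# Hypothetical parameters and a two-event history (time in days, magnitude).
params = dict(mu=0.2, A=1.0, alpha=1.0, m0=3.0, K=0.05, c=0.01, p=1.1)
history = [(1.0, 5.0), (1.5, 4.0)]
print(etas_intensity(1.01, history, **params), etas_compensator(10.0, history, **params))
```

Having the compensator in closed form is part of what makes ETAS cheap to fit by maximum likelihood; a learned kernel generally has to integrate numerically or be designed so the integral stays tractable.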

RECAST and FERN are the learned members of the same family: they keep this conditional-intensity target and likelihood, but replace some or all of the hand-designed kernels with neural networks. That is exactly why they are testable against ETAS on equal footing — and why their results are interpretable as "did the learned kernel beat the hand-built kernel, on held-forward time?"

flowchart TD
    H["Event history H_t<br/>(t_i, x_i, y_i, m_i)"]
    subgraph ETAS["ETAS — parametric"]
        K1["Fixed kernels:<br/>Omori g, Utsu kappa, spatial h"]
    end
    subgraph FERN["FERN — generalizes ETAS"]
        K2["MLP over event features<br/>(Deep-Sets, permutation-invariant)<br/>+ learned location encoder"]
    end
    subgraph RECAST["RECAST — recurrent encoder-decoder"]
        K3["GRU encoder over history<br/>+ neural-density decoder"]
    end
    H --> K1 --> L["Conditional intensity<br/>lambda*(t,x,y|H_t)"]
    H --> K2 --> L
    H --> K3 --> L
    L --> LL["Point-process log-likelihood<br/>(survival term retained)"]
    LL --> P["Proper, calibratable rate"]

3. FERN — a neural encoder that generalizes ETAS

Zlydenko, Elidan, Hassidim, Kukliansky, Matias, Meade, Molchanov, Nevo & Bar-Sinai (2023), A neural encoder for earthquake rate forecasting, Scientific Reports 13 (doi:10.1038/s41598-023-38033-9). FERN is the cleanest demonstration that the value of a neural model is concentrated in two specific ETAS gaps — not in network depth.

3.1 The architecture (intuition)

FERN keeps the ETAS skeleton — an additive stationary background plus a sum of triggering contributions from past events — but replaces the fixed response function with learned components:

$$\lambda(x, y, t \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:\,t_i < t} T(t - t_i)\; S(x - x_i,\, y - y_i;\, M_i).$$

The pieces:

A Recent Earthquakes Encoder — a small multilayer perceptron (MLP) that maps each past event's features (magnitude, elapsed time, distance, depth) to its contribution. This replaces ETAS's hand-tuned Omori/Utsu/spatial kernels with a function the network learns from data.
A long-term seismicity encoder and a learned location encoder for the background $\mu(x,y)$.
A permutation-invariant summation (Deep-Sets style): contributions from all past events are summed, so the model is invariant to the order in which past events are listed and naturally handles a variable number of them — no RNN and no transformer is used.

The Deep-Sets design is the elegant part: by summing per-event MLP outputs, FERN inherits ETAS's "each past event independently raises future rate" inductive bias, while letting the shape of that contribution be learned rather than assumed.
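The permutation-invariant sum can be sketched in a few lines. This is not FERN's actual network: the one-hidden-layer MLP, the random weights, the softplus output, and the four-feature event tuples are all assumptions chosen only to show why summing per-event outputs is order-independent and handles a variable number of events.

```python
import math
import random

def per_event_contribution(features, w1, b1, w2, b2):
    """Tiny one-hidden-layer MLP: event features -> positive rate contribution."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    raw = sum(w * h for w, h in zip(w2, hidden)) + b2
    return math.log1p(math.exp(raw))  # softplus keeps the contribution positive

def triggered_rate(events, weights):
    """Deep-Sets-style aggregation: sum the per-event MLP outputs."""
    return sum(per_event_contribution(e, *weights) for e in events)

random.seed(0)
n_features, n_hidden = 4, 8  # (magnitude, elapsed time, distance, depth)
weights = (
    [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)],
    [random.gauss(0, 1) for _ in range(n_hidden)],
    [random.gauss(0, 1) for _ in range(n_hidden)],
    0.0,
)

# Made-up past events; the second list is the same set in a different order.
events = [(5.1, 0.3, 12.0, 8.0), (4.2, 1.7, 3.5, 15.0), (6.0, 9.0, 40.0, 22.0)]
shuffled = [events[2], events[0], events[1]]
print(triggered_rate(events, weights), triggered_rate(shuffled, weights))
```

Both calls give the same rate up to floating-point rounding, and the same function accepts two events or two thousand without any change to the network.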

3.2 What it actually won

A 4–12% improvement in Information Gain Per Earthquake (IGPE) over ETAS — but only in the FERN+ variant, which adds sub-completeness-magnitude events (events below the catalog's nominal $M_c$) as extra input. Reported region endpoints (e.g. Region A $2.278 \to 2.395$, Region C $1.395 \to 1.803$ IGPE) are ETAS → FERN+ values; plain FERN, without the extra small events, is lower. The average gain is on the order of ~0.1 information bits per earthquake.
It learned anisotropic spatial structure aligned with fault traces — without being given any fault geometry as input. The MLP discovered that aftershocks cluster along strike, which ETAS's isotropic kernel cannot represent unless fault geometry is hand-coded.
Runtime ~1 s vs. ~10 h for an equivalent ETAS simulation — roughly a 1000× inference speedup, which matters for a cheap daily public app.

3.3 The honest reading

FERN's gain is real but modest, retrospective, region-specific, and uncalibrated. Critically, the gain traces almost entirely to two things that are not "deep-learning magic":

Ingesting more data — the sub-$M_c$ events that ETAS conventionally discards. This is a data win, available to any model that can consume the extra events.
Learning spatial anisotropy — flexibility ETAS lacks by construction.

Both are exactly the ETAS limitations that motivate a challenger. This is the good news for the product: neural value is concentrated in covariate ingestion + spatial flexibility, and both can be added incrementally on top of an ETAS skeleton — which is precisely the design on the Models — Employed page.

4. RECAST — a recurrent encoder–decoder neural TPP

Dascher-Cousineau, Shchur, Brodsky & Günnemann (2023), Using deep learning for flexible and scalable earthquake forecasting (RECAST), Geophysical Research Letters 50, e2023GL103909 (doi:10.1029/2023GL103909). RECAST takes the opposite architectural bet from FERN: instead of a permutation-invariant sum, it uses a recurrent encoder–decoder built from modern neural-TPP components.

4.1 The architecture (intuition)

An encoder (a GRU, gated recurrent unit) reads the event history sequentially and compresses it into a hidden state that summarizes "where the sequence is" — analogous to how the RMTPP / Neural-Hawkes family encode history (see ML overview §3).
A neural-density decoder turns that hidden state into a flexible conditional density for the time to the next event, from which the conditional intensity follows.

Where FERN's inductive bias is "sum of independent per-event triggers" (very ETAS-like), RECAST's is "a learned recurrent summary of the whole sequence" (very modern-TPP-like). This makes RECAST more flexible but also more data-hungry: the GRU has more capacity to fit, and therefore more capacity to overfit when the catalog is small.
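The step from a conditional density for the next inter-event time to a conditional intensity is the standard hazard relation $\lambda^*(\tau) = f(\tau) / S(\tau)$, where $S$ is the survival function. The sketch below uses a single Weibull density as a stand-in for the decoder output; that choice, the function names, and the parameter values are assumptions for illustration, not a description of RECAST's actual decoder.

```python
import math

def weibull_pdf(tau, shape, scale):
    """Density of the time to the next event (illustrative decoder output)."""
    z = tau / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_survival(tau, shape, scale):
    """Probability that no event has occurred by elapsed time tau."""
    return math.exp(-((tau / scale) ** shape))

def intensity_from_density(tau, shape, scale):
    """Hazard relation: intensity = density / survival."""
    return weibull_pdf(tau, shape, scale) / weibull_survival(tau, shape, scale)

# Hypothetical parameters that a decoder might emit from the hidden state.
shape, scale = 0.7, 2.0
early = intensity_from_density(0.1, shape, scale)
late = intensity_from_density(1.0, shape, scale)
print(early, late)
```

With a shape parameter below 1 the implied intensity decays with elapsed time, which is qualitatively the behaviour expected after a triggering event. In a recurrent model the parameters would change with every new event because the hidden state changes.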

4.2 What it actually won — and the data-hunger result

RECAST's headline, honest finding:

RECAST improves on temporal ETAS only when the training catalog is large ($\gtrsim 10^4$ events). On smaller catalogs, it merely matches ETAS.

This is one of the most useful sentences in the whole neural-forecasting literature, because it makes the trade-off explicit and falsifiable. The neural model's extra flexibility is an asset only once there are enough events to constrain it; below that, the parsimonious ETAS is at least as good and far more interpretable. RECAST is primarily a temporal model (rate through time), and its scalability claim is about handling very large catalogs efficiently — not about beating ETAS spatially.

4.3 The honest reading

RECAST is a valid alternative backbone to FERN if the product ever ships a neural challenger — it is principled, peer-reviewed, and built on solid neural-TPP foundations. But its data-hunger means it only earns its keep in data-rich regions (dense networks, low $M_c$, long catalogs). In a data-sparse subduction build, RECAST would offer no advantage over ETAS — a fact the regionalization strategy on Models — Employed §6 accounts for directly.

5. Side-by-side: what each one actually buys you

| | FERN (Zlydenko et al. 2023) | RECAST (Dascher-Cousineau et al. 2023) |
|---|---|---|
| Family | Neural TPP, ETAS-generalizing | Neural TPP, recurrent encoder–decoder |
| Backbone | MLP + Deep-Sets (permutation-invariant sum); no RNN/transformer | GRU encoder + neural-density decoder |
| Inductive bias | Sum of independent per-event triggers (very ETAS-like) | Learned recurrent summary of the sequence |
| Headline win | 4–12% IGPE over ETAS, only as FERN+ (sub-$M_c$ events) | Beats temporal ETAS only on large catalogs ($\gtrsim 10^4$) |
| Where the gain comes from | Sub-$M_c$ data ingestion + learned anisotropy | Capacity to fit large, rich sequences |
| Spatial structure | Learns fault-aligned anisotropy without fault inputs | Primarily temporal |
| Runtime | ~1 s vs. ~10 h ETAS (≈1000×) | Scalable; efficient on large catalogs |
| Calibration / uncertainty | None provided (authors' own caveat) | Not the focus |
| CSEP-tested? | No | No |
| Honest verdict | Modest, retrospective, region-specific, uncalibrated | Data-hungry; matches ETAS below ~$10^4$ events |

The complementary lesson: FERN tells you the win is in data + anisotropy; RECAST tells you the cost is data-hunger. Together they bound the realistic expectation for a neural challenger here.

6. The honest caveats (their authors', not ours)

The strongest evidence that these are honest papers is that the caveats are stated by the authors themselves, not extracted by critics. For FERN:

Not tested under CSEP. No N/M/S/CL consistency tests; the gain is measured by IGPE on a retrospective split, not by the field's prospective standard.
No uncertainty quantification at all — in the authors' words, "we do not provide any uncertainty estimates." For a product that publishes a public probability, uncertainty bands are not optional (see Models — Employed §8), so this is a release-blocker as-is.
Test period ends before the 2011 Tohoku $M_w$ 9.0 sequence — the single most stress-testing event in the modern catalog is excluded.
Parameters are not interpretable, and performance varies by region with possible train/test distribution shift.

For RECAST: the advantage is conditional on catalog size, and the model is primarily temporal — it does not, on its own, deliver the spatial calibration a gridded public map requires.

Why this matters for shipping. Each of these caveats maps onto a hard release rule in this product: CSEP-tested or it does not ship; calibrated with a reliability diagram or it does not ship; validated through a great earthquake, not around it. A neural model that cannot clear those rules stays behind a feature flag — exactly the gating architecture the product adopts.

7. The ETAS-vs-NPP verdict and what it means here

The decisive context for both models is the field's first rigorous benchmark:

Stockman, Lawson & Werner (accepted TMLR 2026; arXiv:2410.08226), EarthquakeNPP — benchmarked five modern neural point processes (NSTPP, DeepSTPP, AutoSTPP, DSTPP, SMASH) against ETAS on California catalogs (1971–2021) with strict chronological splits and CSEP consistency tests, with each model generating 24-hour forecasts via 10,000 simulated catalogs per day. The headline:

None of the five NPPs outperformed ETAS. ETAS won spatial log-likelihood "consistently," and led the CSEP pass rates (on ComCat, ETAS passed at ~95.8–97.6% across the number / pseudo-likelihood tests, vs. the best NPP at ~86–88% — and only ~68.6% on the spatial test, which is exactly where forecasting value lives). The authors' conclusion: "current NPP implementations are not yet suitable for practical earthquake forecasting."

Crucially, EarthquakeNPP repaired a data-leakage flaw in earlier neural-TPP-for-earthquakes work (non-chronological / alternating splits that "artificially inflate performance measures due to the nature of earthquake triggering", plus the exclusion of the Tohoku $M_w$ 9.0 sequence). With temporal splits and the big sequences restored, the apparent neural advantage evaporated.
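The difference between a chronological split and a leaky one is easy to show in code. The helper names, the cut-off times, and the five-event catalog below are made up for illustration; they are not taken from EarthquakeNPP.

```python
def chronological_split(catalog, train_end, val_end):
    """Split events (t, x, y, m) by time: train < train_end <= val < val_end <= test."""
    ordered = sorted(catalog, key=lambda ev: ev[0])
    train = [ev for ev in ordered if ev[0] < train_end]
    val = [ev for ev in ordered if train_end <= ev[0] < val_end]
    test = [ev for ev in ordered if ev[0] >= val_end]
    return train, val, test

def is_leak_free(train, test):
    """True when every training event precedes every test event."""
    return max(ev[0] for ev in train) < min(ev[0] for ev in test)

# Made-up catalog: (time in days, x, y, magnitude).
catalog = [(5.0, 0, 0, 3.1), (1.0, 0, 0, 4.0), (9.0, 0, 0, 3.5),
           (3.0, 0, 0, 5.2), (7.0, 0, 0, 3.3)]

train, val, test = chronological_split(catalog, train_end=4.0, val_end=8.0)

# An alternating split interleaves train and test in time, so aftershocks of a
# "test" mainshock can sit in the training set.
ordered = sorted(catalog, key=lambda ev: ev[0])
alt_train, alt_test = ordered[::2], ordered[1::2]
print(is_leak_free(train, test), is_leak_free(alt_train, alt_test))
```

Because earthquakes trigger one another, an interleaved split lets the model see the consequences of test-period events during training, which is the inflation mechanism the benchmark authors describe.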

Where this leaves FERN and RECAST. They are the honest exceptions that prove the rule: each shows a real but conditional edge (FERN with sub-$M_c$ augmentation; RECAST on large catalogs), and each is candid that it is not yet CSEP-validated or calibrated. The blunt verdict for 2026:

A well-fit ETAS is the floor a neural model must clear, not a ceiling it can ignore. Neural models add value mainly as an ETAS-generalizing conditional intensity that ingests covariates and learns spatial flexibility — the two FERN wins — not through architectural depth alone.

Scope honesty. "NPPs do not beat ETAS" is established on the California benchmark to date (1971–2021), not proven as a universal law. It justifies shipping an ETAS-class core for v0 and gating any neural model behind a prospective CSEP win — but we do not claim ML can never add skill.

8. Worked illustration: reading an IGPE gain honestly

Suppose a neural challenger reports an Information-Gain-Per-Earthquake of $+0.1$ over ETAS on a region. What does that mean, and is it enough to ship?

IGPE is the mean per-event log-rate difference (in nats, natural-log units — not bits):

$$I_N(A, B) = \frac{1}{N}\sum_{i=1}^{N}\Big(\log \lambda_A(k_i) - \log \lambda_B(k_i)\Big) - \frac{\hat N_A - \hat N_B}{N},$$

with model $A$ = challenger, $B$ = ETAS. A value of $+0.1$ nats means the challenger assigned, on average, $e^{0.1} \approx 1.105\times$ the probability density to events that actually occurred — about a 10% per-event likelihood improvement. That is genuinely the same order as FERN's reported gain.
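A minimal sketch of that calculation follows, using the Rhoades et al. (2011) form in which the difference in expected counts is subtracted from the mean log-rate difference. The function name `igpe` and the four log-rates are made up for illustration.

```python
import math

def igpe(log_rate_a, log_rate_b, n_hat_a, n_hat_b):
    """Information gain per earthquake of model A over model B, in nats.

    log_rate_a / log_rate_b: log-rates each model assigned to the bins of the
    N observed events; n_hat_a / n_hat_b: each model's total expected count.
    """
    n = len(log_rate_a)
    mean_diff = sum(a - b for a, b in zip(log_rate_a, log_rate_b)) / n
    # Count correction: a model is not rewarded for simply forecasting more events.
    return mean_diff - (n_hat_a - n_hat_b) / n

# Made-up example: the challenger assigns 0.1 nats more log-rate to every
# observed event while forecasting the same total number of events.
etas_log_rates = [-2.3, -1.7, -3.0, -2.1]
challenger_log_rates = [r + 0.1 for r in etas_log_rates]

gain = igpe(challenger_log_rates, etas_log_rates, n_hat_a=4.0, n_hat_b=4.0)
ratio = math.exp(gain)  # per-event likelihood ratio implied by the gain
print(gain, ratio)
```

If the challenger had reached the same per-event log-rates by forecasting more events overall, the count-correction term would reduce the gain accordingly.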

But a point estimate is not a release decision. The product's rule:

Significance, not just sign. The paired T-test confidence interval on $I_N$ must exclude zero, corroborated by the W-test (Wilcoxon). A $+0.1$ that is statistically indistinguishable from $0$ buys nothing.
State-dependence. IGPE over a baseline is large during active aftershock sequences and near zero in quiet periods — for scale, time-independent contrasts in prospective California CSEP give only about $-0.7$ to $+0.5$ nats. A headline "+0.1 average" must be reported as state-dependent, never as a flat steady-state number.
Calibration. Even a significant, positive IGPE does not ship unless the reliability diagram says "when we said 5%, it happened ~5% of the time" (see Evaluation).

So a $+0.1$-nat challenger is promising, not shippable — it enters the gate, it does not skip it.
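The significance rule can be sketched as a confidence interval on the mean per-event log-rate difference. This sketch uses a normal quantile as a large-$N$ approximation to the Student-t quantile used by the paired T-test, it does not implement the W-test, and the function names and synthetic differences are assumptions for illustration.

```python
import math
import statistics

def igpe_confidence_interval(diffs, n_hat_a, n_hat_b, level=0.95):
    """Approximate CI for the IGPE from per-event log-rate differences.

    Uses a normal quantile in place of Student-t, so it is only reasonable
    for large N; a small-N decision should use the t quantile instead.
    """
    n = len(diffs)
    gain = statistics.fmean(diffs) - (n_hat_a - n_hat_b) / n
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return gain - z * std_err, gain + z * std_err

def passes_significance(diffs, n_hat_a, n_hat_b):
    """True when the interval lies entirely above zero."""
    low, _ = igpe_confidence_interval(diffs, n_hat_a, n_hat_b)
    return low > 0.0

# Synthetic differences: mean near +0.1 nats with substantial per-event scatter.
many = [0.1 + 0.5 * math.sin(i) for i in range(400)]
few = many[:20]

print(passes_significance(many, 0.0, 0.0), passes_significance(few, 0.0, 0.0))
```

Both samples have a mean gain close to $+0.1$ nats, but only the 400-event sample excludes zero. The same headline number can therefore be a pass or a fail depending on how many events support it.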

9. Role in operational earthquake forecasting

Operational earthquake forecasting (OEF) — issuing recurrent, calibrated probabilities of future seismicity — has, for decades, run on ETAS-class and Reasenberg–Jones-class models precisely because they are interpretable, generative, and CSEP-testable. RECAST and FERN are the most credible candidates to augment that core, but neither is yet an OEF-ready replacement:

FERN's contribution to OEF is a template for principled covariate ingestion (sub-$M_c$ events, and by extension geodesy / strain / multiple catalogs) and learned spatial anisotropy — added on top of an ETAS skeleton, behind a CSEP gate.
RECAST's contribution is a scalable temporal backbone for the rare data-rich regions where catalog size justifies the extra capacity.

In this product, both inform the gated, context-conditioned neural challenger of Models — Employed §5: keep the additive-background-plus-summed-triggering Hawkes skeleton, replace the fixed kernels with small learned components, model magnitude explicitly (a gap EarthquakeNPP flagged in most NPPs), and let it reach the public map only after it beats ETAS in our own prospective CSEP harness and is calibrated. Until then, the classical ETAS reference is the product.

References

Zlydenko, O., Elidan, G., Hassidim, A., Kukliansky, D., Matias, Y., Meade, B., Molchanov, A., Nevo, A. & Bar-Sinai, Y. (2023). A neural encoder for earthquake rate forecasting. Scientific Reports 13, 12350. doi:10.1038/s41598-023-38033-9
Dascher-Cousineau, K., Shchur, O., Brodsky, E.E. & Günnemann, S. (2023). Using deep learning for flexible and scalable earthquake forecasting (RECAST). Geophysical Research Letters 50, e2023GL103909. doi:10.1029/2023GL103909
Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. Transactions on Machine Learning Research (TMLR). arXiv:2410.08226
Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association 83(401), 9–27. doi:10.1080/01621459.1988.10478560
Ogata, Y. (1998). Space–time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50(2), 379–402. doi:10.1023/A:1003403601725
Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R. & Smola, A. (2017). Deep Sets. NeurIPS 2017. arXiv:1703.06114
Schultz, R. (2026). Forecasting the Rate of Induced Seismicity as a Neural Temporal Point Process. JGR: Machine Learning and Computation. doi:10.1029/2025JH001052
Rhoades, D.A., Schorlemmer, D., Gerstenberger, M.C., Christophersen, A., Zechar, J.D. & Imoto, M. (2011). Efficient testing of earthquake forecasting models. Acta Geophysica 59(4), 728–747. doi:10.2478/s11600-011-0013-5
Serafini, F., Bayona, J.A., Silva, F., Savran, W., Stockman, S., Maechling, P.J. & Werner, M.J. (2025). A benchmark database of ten years of prospective next-day earthquake forecasts in California from CSEP. Scientific Data 12, 1501. doi:10.1038/s41597-025-05766-3
CSEP / pyCSEP — Collaboratory for the Study of Earthquake Predictability. https://cseptesting.org · https://github.com/SCECcode/pycsep

See also: Models — Classical · Models — ML · Models — Employed · CNN Spatial Models · Graph & Recurrent Networks · Detection vs. Forecasting · Evaluation · Honest Limits.

⚠️ Disclaimer — read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity — never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast.

It complements, and does not replace or speak for, official agencies — always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile, SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.

CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
