Graph and Recurrent Networks
Two large families of neural networks are routinely applied to seismicity: graph neural networks (GNNs), which exploit the natural graph of a seismic-station network, and recurrent networks (RNNs / LSTMs / GRUs), which process event or rate sequences in time. This page treats both in depth: the intuition and governing equations, what they are genuinely good at, the recurring traps when they are pointed at forecasting rather than characterization, and the honest verdict on why — for a calibrated, testable forecasting product — they sit upstream or behind a gate, not at the core.
The framing. GNNs and RNNs are real, useful tools. But their established wins are overwhelmingly on the detection / characterization side (where the answer is about events that already happened), not the forecasting side (where the answer is a calibrated probability of events that have not). Keeping that line explicit is the whole point — see Detection vs. Forecasting.
- Two graphs, two sequence problems — orienting the families
- Graph neural networks — intuition and equations
- GNNs on seismicity — where they win, where they don't
- Recurrent networks — RNN, LSTM, GRU
- RNN/LSTM seismicity-rate regression — the failure modes
- The right recurrent design: a recurrent point process
- Honest verdict and role in this product
- References
It helps to fix what the network is a function of before judging it:
- A GNN operates on a graph $G = (V, E)$. In seismology the most productive choice is $V$ = seismic stations and $E$ = station adjacency: the network is the station array. Here the GNN fuses multi-station waveforms to detect, associate, and locate events. A different, harder choice is $V$ = spatial cells or faults, used to forecast, and this is where results weaken.
- An RNN / LSTM / GRU operates on a sequence. In seismology that sequence is either a waveform (samples in time, for detection) or an event/rate series (a catalog or binned counts, for forecasting). Again, the waveform/detection use is strong; the rate-regression forecasting use is where the traps live.
The recurring pattern across both families: the graph or sequence over the station network / waveform is a forecasting-irrelevant strength; the graph or sequence over future seismicity is a forecasting-relevant weakness.
```mermaid
flowchart TD
subgraph Strong["Strong, mature — DETECTION / CHARACTERIZATION"]
G1["GNN over station network<br/>(V = stations)"] --> S1["association · location · source params"]
R1["RNN/LSTM over waveform<br/>(samples in time)"] --> S2["phase picking · detection"]
end
subgraph Weak["Weak / unproven — FORECASTING"]
G2["GNN over spatial cells / faults<br/>(V = cells)"] --> W1["magnitude / occurrence forecast<br/>'remains weak for all models'"]
R2["LSTM over binned rate series"] --> W2["next-bin rate regression<br/>un-calibratable"]
end
```
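A GNN learns representations of nodes by passing messages along edges. Each node aggregates features from its neighbours, transforms the result, and repeats over several layers, so that after $k$ layers a node's representation summarizes its $k$-hop neighbourhood. In the general message-passing form (Gilmer et al. 2017), layer $k$ updates node $v$ as

$$h_v^{(k+1)} = \phi^{(k)}\Big(h_v^{(k)},\ \bigoplus_{u \in \mathcal{N}(v)} \psi^{(k)}\big(h_v^{(k)}, h_u^{(k)}, e_{uv}\big)\Big)$$

where $\mathcal{N}(v)$ is the set of neighbours of $v$, $e_{uv}$ an optional edge feature, $\psi^{(k)}$ a learned message function, $\bigoplus$ a permutation-invariant aggregation (sum, mean, or max), and $\phi^{(k)}$ a learned update function. The graph convolutional network (GCN) of Kipf & Welling (2017) is the simplest widely used instance:

$$H^{(k+1)} = \sigma\big(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H^{(k)} W^{(k)}\big)$$

with $\tilde A = A + I$ the adjacency matrix plus self-loops, $\tilde D_{ii} = \sum_j \tilde A_{ij}$ its degree matrix, $H^{(k)}$ the stacked node features, $W^{(k)}$ a trainable weight matrix, and $\sigma$ a nonlinearity such as ReLU.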
Why this fits a station array. A seismic network is literally a graph: stations are nodes, geographic/operational proximity defines edges, and an event is observed at many stations at once. A GNN respects that the same wavefront hits the array — it can fuse the multi-station picture in a way a single-station model cannot. Spatio-temporal GNNs add a temporal module (a recurrence or temporal convolution) on top, giving message passing in space and propagation in time.
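A minimal NumPy sketch of the station-array idea: build the graph from station coordinates, then apply one Kipf & Welling (2017) GCN step, $H' = \mathrm{ReLU}(\tilde D^{-1/2}\tilde A \tilde D^{-1/2} H W)$ with $\tilde A = A + I$. Linking every pair of stations closer than a fixed radius is only one simple assumption for "proximity defines edges"; real systems may use nearest neighbours, operational links, or learned edges. The helper names, coordinates, radius, and random features are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (NumPy broadcasting)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def station_adjacency(lats, lons, radius_km):
    """0/1 adjacency linking every pair of stations closer than radius_km (no self-loops)."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    dist = haversine_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    A = (dist < radius_km).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def gcn_layer(A, H, W):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])                 # self-loops: each station keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # diagonal of D^-1/2
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalisation
    return np.maximum(A_hat @ H @ W, 0.0)            # aggregate neighbours, transform, ReLU

# Hypothetical 4-station line along the equator; 0.5 degrees of longitude is about 55.6 km there
lats = [0.0, 0.0, 0.0, 0.0]
lons = [0.0, 0.5, 1.0, 5.0]
A = station_adjacency(lats, lons, radius_km=60.0)    # edges 0-1 and 1-2; station 3 isolated

rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 3))   # 3 hypothetical per-station features (e.g. waveform summaries)
W0 = rng.normal(size=(3, 2))   # trainable weights, random here
H1 = gcn_layer(A, H0, W0)
print(A)
print(H1.shape)
```

After one layer, station 0 has mixed in only its own features and those of station 1. Stacking a second layer would let it see station 2, which is the $k$-hop growth of the receptive field.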
Where GNNs are genuinely strong. When the graph is the station network, GNNs are a natural and effective fit for:
- Phase association — deciding which picks across many stations belong to the same event.
- Earthquake location — jointly using multi-station arrivals on the array graph.
- Source characterization — fusing station observations to estimate source parameters.
This is active and credible 2024–2025 research (e.g. spatio-temporal graph convolutional networks for source characterization; graph-based association). It is detection-side work: it concerns events that have already occurred.
Where GNNs underwhelm — forecasting. When the graph is re-purposed as spatial cells or faults to forecast future magnitude or occurrence, the literature converges on a blunt finding:
Depth and magnitude prediction "remain weak for all tested models."
The reasons are structural, not incidental:
- Forecasting future magnitude is fighting Gutenberg–Richter: given that an event occurs, its magnitude is approximately memoryless, so there is little learnable signal in "what size is next" (see Honest Limits).
- A cell-graph GNN that outputs a per-cell occurrence label inherits the DeVries trap — correlated cells, classification metrics, no survival term (see CNN Spatial Models).
- No GNN forecaster has passed prospective CSEP testing against ETAS.
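The Gutenberg–Richter point can be made concrete with a synthetic catalog. Above $M_c$, the law $\log_{10} N = a - bM$ makes magnitudes exponential with rate $\beta = b \ln 10$. If magnitudes are drawn independently, as ETAS assumes, the previous event's size carries no information about the next one. The sketch below shows what that assumption implies for a "next size" regressor; it does not test whether a real catalog obeys it. The $b$-value, $M_c$, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
b_value, m_c, n = 1.0, 2.5, 200_000
beta = b_value * np.log(10.0)                        # GR law in natural-log form
mags = m_c + rng.exponential(1.0 / beta, size=n)     # i.i.d. GR magnitudes above M_c

prev, nxt = mags[:-1], mags[1:]
lag1_corr = np.corrcoef(prev, nxt)[0, 1]             # linear predictability of the next size
after_big = nxt[prev >= m_c + 2.0].mean()            # mean next magnitude after an M >= 4.5
after_small = nxt[prev < m_c + 0.5].mean()           # mean next magnitude after an M < 3.0
print(f"corr={lag1_corr:.4f}  after_big={after_big:.3f}  after_small={after_small:.3f}")
```

Both conditional means sit near $M_c + 1/\beta$, so the best a regressor can do is predict that constant.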
The honest summary: GNNs help where the graph is the station network (detection-side), not where you need a calibrated future rate. A 2025 hybrid spatio-temporal GNN line of work (Frontiers in AI) is interesting research, not a shipping forecaster.
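A recurrent network maintains a hidden state $h_t$ that it updates at every step from the new input $x_t$ and its own previous state:

$$h_t = \tanh\big(W x_t + U h_{t-1} + b\big).$$

Vanilla RNNs suffer vanishing/exploding gradients over long sequences. The LSTM (Hochreiter & Schmidhuber 1997) fixes this with a gated cell state $c_t$. In its standard modern form,

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde c_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde c_t, \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}$$

where $\sigma$ is the logistic sigmoid and $\odot$ the elementwise product. The forget gate $f_t$ decides how much of the previous cell state to keep. Because $c_t$ is updated additively rather than by repeated matrix multiplication, gradients can flow across many steps without vanishing. The GRU (Cho et al. 2014) is a lighter variant that merges the forget and input gates into a single update gate and drops the separate cell state.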
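A minimal NumPy sketch of one LSTM step in the standard gated form: forget gate $f$, input gate $i$, output gate $o$, candidate content $g$, and the additive cell update $c_t = f \odot c_{t-1} + i \odot g$. The stacked-weight layout and the gate order `[f, i, o, g]` are choices made for this sketch, not any library's convention. The demo saturates the forget gate open and the input gate shut. This shows why the additive cell path can carry memory across many steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,), gate order [f, i, o, g]."""
    z = W @ x_t + U @ h_prev + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c_t = f * c_prev + i * g       # additive cell update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 3, 2
W = 0.1 * rng.normal(size=(4 * H, D))
U = 0.1 * rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
b[0:H] = 20.0        # forget gate saturated open
b[H:2 * H] = -20.0   # input gate saturated shut
c0 = np.array([0.7, -1.3])
h, c = np.zeros(H), c0.copy()
for _ in range(100):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(np.allclose(c, c0, atol=1e-4))  # True: the cell state survives 100 noisy steps
```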
A very common — and very flawed — design feeds binned seismicity-rate or magnitude time series into an LSTM to "predict the next bin." It looks reasonable and fails for structural reasons:
- It drops the point-process survival term. Regressing a binned rate abandons the compensator integral $\int_0^T \lambda^*(\tau)\,d\tau$ that makes a forecast a proper probability. The output is un-calibratable: there is no honest probability to publish (see RECAST and FERN §2).
- Class imbalance defeats it. Large events are rare, so a model trained to minimize average error learns to predict "no large event" essentially always. It scores beautifully on accuracy and fails on exactly the events that matter. This is the same imbalance pathology that makes accuracy useless here.
- Shuffled splits leak the future. Random train/test shuffling of a clustered catalog lets the model see aftershocks of a sequence whose mainshock is in the test set — the data-leakage flaw EarthquakeNPP identified, which "artificially inflates performance measures due to the nature of earthquake triggering." Metrics that look strong under shuffling evaporate under chronological splits.
- No built-in seismological physics. An LSTM's memory decay biases it toward the most recent bin; it has no built-in Omori–Utsu decay or Gutenberg–Richter magnitude law, so it must re-learn from scarce data what ETAS encodes for free.
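Dropping the survival term is easiest to see in the simplest point process. The point-process log-likelihood is $\log L = \sum_i \log \lambda^*(t_i) - \int_0^T \lambda^*(\tau)\,d\tau$. For a homogeneous Poisson process with constant rate $\mu$ and $n$ events on $[0, T]$, this reduces to $n \log \mu - \mu T$, which is maximized at $\hat\mu = n/T$. Remove the compensator and the objective $n \log \mu$ grows without bound, so the "best" rate is whatever the largest allowed value is. The event times and grid below are hypothetical.

```python
import numpy as np

def poisson_loglik(mu, event_times, T):
    """Homogeneous Poisson log-likelihood on [0, T]: sum of log-intensities minus compensator."""
    return len(event_times) * np.log(mu) - mu * T

def loglik_without_survival(mu, event_times):
    """The same objective with the compensator integral dropped."""
    return len(event_times) * np.log(mu)

times = np.array([0.4, 1.1, 2.9, 3.3, 7.8, 8.0])   # hypothetical event times (days)
T = 10.0
grid = np.linspace(0.05, 5.0, 1000)                 # candidate rates (events/day)
mu_full = grid[np.argmax([poisson_loglik(m, times, T) for m in grid])]
mu_trunc = grid[np.argmax([loglik_without_survival(m, times) for m in grid])]
print(mu_full, mu_trunc)   # about n/T = 0.6, versus the top of the grid
```

With the compensator, the fit lands on the observed rate. Without it, the fit runs to the edge of whatever range it is allowed, and no probability can be read off the result.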
The net effect: binned-rate LSTM "forecasters" tend to report impressive retrospective numbers that do not survive a fair, prospective, calibration-aware evaluation.
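The shuffled-versus-chronological point fits in a few lines. In the sketch below, the event times are uniform and synthetic for brevity. A real catalog's clustering makes the leak worse, because aftershocks in the training set carry information about a test-period sequence. `chronological_split` is a hypothetical helper. Events before the cutoff can still serve as conditioning history during evaluation; the point is that no parameter is fit on anything after the cutoff.

```python
import numpy as np

def chronological_split(times, cutoff):
    """Split event indices at a time cutoff: fit on t < cutoff, evaluate on t >= cutoff."""
    times = np.asarray(times, dtype=float)
    train = np.flatnonzero(times < cutoff)
    test = np.flatnonzero(times >= cutoff)
    return train, test

rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0.0, 365.0, size=500))   # hypothetical event times (days)

# Shuffled 80/20 split, the leaky design: many training events postdate test events
perm = rng.permutation(len(times))
shuf_train, shuf_test = perm[:400], perm[400:]
n_future_in_train = int(np.sum(times[shuf_train] > times[shuf_test].min()))

# Chronological split: nothing used for fitting postdates the start of the test window
train, test = chronological_split(times, cutoff=300.0)
print(n_future_in_train, times[train].max() < times[test].min())
```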
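Recurrence is not the problem; recurrence used to regress a rate is. The principled use of an RNN in forecasting is to encode history inside a temporal point process, so the survival term is retained. The foundational example is RMTPP (Du et al. 2016). An RNN encodes the history after the $j$-th event into a hidden state $\mathbf{h}_j$, and for $t_j < t \le t_{j+1}$ the conditional intensity is

$$\lambda^*(t) = \exp\big(\mathbf{v}^\top \mathbf{h}_j + w\,(t - t_j) + b\big),$$

where the first term carries the influence of past events, the second lets the intensity grow or decay with the time elapsed since the last event, and $b$ is a base intensity. The Neural Hawkes Process (Mei & Eisner 2017) replaces the discrete RNN with a continuous-time LSTM whose state evolves between events. It sets $\lambda_k(t) = f_k\big(\mathbf{w}_k^\top \mathbf{h}(t)\big)$ with $f_k$ a scaled softplus. This lets past events even lower future intensity (inhibition), something a classical Hawkes process cannot represent. RECAST (Dascher-Cousineau et al. 2023) is the earthquake-specific member of this lineage: a GRU encoder plus a neural-density decoder, which beats temporal ETAS only on large catalogs (…).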
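The RMTPP intensity $\lambda^*(t) = \exp(\mathbf{v}^\top \mathbf{h}_j + w\,(t - t_j) + b)$ has a closed-form compensator. That is exactly what a binned regressor lacks. With $a = \mathbf{v}^\top \mathbf{h}_j + b$ and $\delta = t - t_j$, the compensator is $\Lambda(\delta) = (e^{a + w\delta} - e^{a})/w$. The log-density of the next event time is then $a + w\delta - \Lambda(\delta)$.

The sketch assumes a vanilla tanh RNN over past inter-event times as the encoder. Du et al. also feed marks such as magnitude, which are omitted here. The helper names and parameter values are hypothetical. The test checks that the density integrates to 1 when $w > 0$. When $w < 0$, the missing mass $\exp(-e^{a}/|w|)$ is the probability of no further event.

```python
import numpy as np

def rnn_encode(gaps, W_in, W_h, b_h):
    """Vanilla RNN over past inter-event times; returns the final hidden state h_j."""
    h = np.zeros(W_h.shape[0])
    for g in gaps:
        h = np.tanh(W_in * g + W_h @ h + b_h)   # scalar input per event, W_in has shape (H,)
    return h

def rmtpp_log_density(delta, h_j, v, w, b):
    """log f*(t_j + delta) = log lambda*(t) minus the integral of lambda* from t_j to t (w != 0)."""
    a = v @ h_j + b                              # past influence plus base intensity
    log_intensity = a + w * delta                # current influence: grows/decays with elapsed time
    compensator = (np.exp(a + w * delta) - np.exp(a)) / w   # closed-form survival term
    return log_intensity - compensator

rng = np.random.default_rng(7)
H = 4
W_in, W_h, b_h = rng.normal(size=H), 0.3 * rng.normal(size=(H, H)), np.zeros(H)
v, w, b = 0.5 * rng.normal(size=H), 0.3, -1.0
past_gaps = [0.8, 0.1, 0.05, 2.4]                # hypothetical inter-event times (days)
h_j = rnn_encode(past_gaps, W_in, W_h, b_h)
print(rmtpp_log_density(0.5, h_j, v, w, b))      # log-density of the next event 0.5 days later
```

RECAST differs in its details (a GRU encoder and a mixture-style density decoder). The shared ingredient is the one shown here: the model outputs a normalized density over the next event time, so its likelihood can be scored and its probabilities calibrated.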
The distinction in one line. An LSTM that regresses a binned rate is un-calibratable and leaks; an LSTM (or GRU) that parameterizes a conditional intensity with the survival term is a legitimate, testable forecaster. This product only ever considers the second kind, and even then behind a CSEP gate.
- GNNs: kept upstream, on the detection side, where the graph is the station network (association, location, source characterization). They build better, more complete catalogs (a lower, more stable $M_c$), which is the single biggest realizable near-term lever for both ETAS and any neural forecaster. They are not used as a cell-graph occurrence classifier.
- RNN/LSTM/GRU: never as a binned-rate regressor. The only admissible recurrent forecaster is a recurrent neural point process (RECAST-style), and only as the gated challenger of Models — Employed §5. It reaches the public map solely if it beats ETAS in our own prospective CSEP harness and is calibrated.
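The "beats ETAS" half of the gate can be sketched as a paired comparison on a prospective window. It follows the spirit of the information-gain-per-earthquake comparisons used in CSEP testing. Using the point-process log-likelihood, the gain of model A over model B per observed event is the mean per-event difference in log-intensity minus the difference in expected counts divided by $N$. The function names, the `t_crit=2.0` threshold, and the example numbers are hypothetical. This is not the product's harness, and the calibration requirement is a separate check not shown here.

```python
import math

def information_gain(log_rate_a, log_rate_b, n_expected_a, n_expected_b):
    """Per-event information gain of model A over model B, with a paired t statistic.

    log_rate_a, log_rate_b: each model's log conditional intensity at every observed event.
    n_expected_a, n_expected_b: each model's expected event count over the test window.
    """
    n = len(log_rate_a)
    if n < 2:
        raise ValueError("need at least two observed events")
    diffs = [la - lb for la, lb in zip(log_rate_a, log_rate_b)]
    mean_diff = sum(diffs) / n
    gain = mean_diff - (n_expected_a - n_expected_b) / n   # compensator difference enters here
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("per-event differences are identical; t statistic undefined")
    return gain, gain / math.sqrt(var / n)

def admit_challenger(gain, t_stat, t_crit=2.0):
    """Hypothetical gate: positive and significant gain over the reference model."""
    return gain > 0 and t_stat > t_crit

gain, t_stat = information_gain([-1.0, -0.5, -2.0, -1.2], [-1.5, -1.0, -2.1, -1.6], 4.2, 4.0)
print(round(gain, 3), round(t_stat, 2), admit_challenger(gain, t_stat))
```

A real prospective test would use many more events, and its significance threshold should come from the chosen test design rather than the placeholder here.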
This places both families exactly where the evidence supports them: GNNs upstream building the catalog, recurrent point processes as a gated challenger, and the calibrated classical ETAS reference as the shipping core. The line between detection and forecasting is kept explicit throughout — see Detection vs. Forecasting.
- Kipf, T.N. & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017. arXiv:1609.02907
- Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O. & Dahl, G.E. (2017). Neural Message Passing for Quantum Chemistry. ICML 2017. arXiv:1704.01212
- Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation 9(8), 1735–1780. doi:10.1162/neco.1997.9.8.1735
- Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP 2014. arXiv:1406.1078
- Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M. & Song, L. (2016). Recurrent Marked Temporal Point Processes. KDD 2016. doi:10.1145/2939672.2939875
- Mei, H. & Eisner, J. (2017). The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process. NeurIPS 2017. arXiv:1612.09328
- Dascher-Cousineau, K., Shchur, O., Brodsky, E.E. & Günnemann, S. (2023). Using deep learning for flexible and scalable earthquake forecasting (RECAST). Geophysical Research Letters 50, e2023GL103909. doi:10.1029/2023GL103909
- Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. TMLR. arXiv:2410.08226
- McBrearty, I.W. & Beroza, G.C. (2023). Earthquake phase association with graph neural networks. Bulletin of the Seismological Society of America 113(2), 524–547. doi:10.1785/0120220182
- Mousavi, S.M. & Beroza, G.C. (2022). Deep-learning seismology. Science 377, eabm4470. doi:10.1126/science.abm4470
See also: Models — Classical · Models — ML · Models — Employed · RECAST and FERN · CNN Spatial Models · Detection vs. Forecasting · Evaluation · Honest Limits.
⚠️ Disclaimer — read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity — never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast. It complements, and does not replace or speak for, official agencies — always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile, SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.
CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
Conditional probabilistic seismic forecasting — forecasts, never predictions.