Graph and Recurrent Networks

Graph and Recurrent Networks for Seismicity — GNNs, RNNs, and LSTMs

Two large families of neural networks are routinely applied to seismicity: graph neural networks (GNNs), which exploit the natural graph of a seismic-station network, and recurrent networks (RNNs / LSTMs / GRUs), which process event or rate sequences in time. This page treats both in depth: the intuition and governing equations, what they are genuinely good at, the recurring traps when they are pointed at forecasting rather than characterization, and the honest verdict on why — for a calibrated, testable forecasting product — they sit upstream or behind a gate, not at the core.

The framing. GNNs and RNNs are real, useful tools. But their established wins are overwhelmingly on the detection / characterization side (where the answer is about events that already happened), not the forecasting side (where the answer is a calibrated probability of events that have not). Keeping that line explicit is the whole point — see Detection vs. Forecasting.

Two graphs, two sequence problems — orienting the families
Graph neural networks — intuition and equations
GNNs on seismicity — where they win, where they don't
Recurrent networks — RNN, LSTM, GRU
RNN/LSTM seismicity-rate regression — the failure modes
The right recurrent design: a recurrent point process
Honest verdict and role in this product
References

1. Two graphs, two sequence problems — orienting the families

It helps to fix what the network is a function of before judging it:

A GNN operates on a graph $G = (V, E)$. In seismology the most productive choice is $V$ = seismic stations, $E$ = station adjacency — the network is the station array. Here the GNN fuses multi-station waveforms to detect, associate, and locate events. A different, harder choice is $V$ = spatial cells or faults, used to forecast — and this is where results weaken.
An RNN / LSTM / GRU operates on a sequence. In seismology that sequence is either a waveform (samples in time, for detection) or an event/rate series (a catalog or binned counts, for forecasting). Again: the waveform/detection use is strong; the rate-regression forecasting use is where the traps live.

The recurring pattern across both families: when the graph or sequence is built over the station network or the waveform, the result is a real strength, but one that does not bear on forecasting; when it is built over future seismicity, it is a weakness, and one that does bear on forecasting.

flowchart TD
    subgraph Strong["Strong, mature — DETECTION / CHARACTERIZATION"]
        G1["GNN over station network<br/>(V = stations)"] --> S1["association · location · source params"]
        R1["RNN/LSTM over waveform<br/>(samples in time)"] --> S2["phase picking · detection"]
    end
    subgraph Weak["Weak / unproven — FORECASTING"]
        G2["GNN over spatial cells / faults<br/>(V = cells)"] --> W1["magnitude / occurrence forecast<br/>'remains weak for all models'"]
        R2["LSTM over binned rate series"] --> W2["next-bin rate regression<br/>un-calibratable"]
    end

2. Graph neural networks — intuition and equations

A GNN learns representations of nodes by passing messages along edges. Each node aggregates features from its neighbours, transforms the result, and repeats over several layers, so that after $k$ layers a node "sees" its $k$-hop neighbourhood. The canonical message-passing update for node $v$ at layer $\ell$ is

$$h_v^{(\ell+1)} = \phi\!\Big( h_v^{(\ell)},\; \bigoplus_{u \in \mathcal{N}(v)} \psi\big(h_v^{(\ell)}, h_u^{(\ell)}, e_{uv}\big) \Big),$$

where $\mathcal{N}(v)$ is the neighbour set, $\psi$ is a learnable message function, $\bigoplus$ a permutation-invariant aggregator (sum / mean / max), $\phi$ a learnable update, and $e_{uv}$ optional edge features. The graph convolutional special case (Kipf & Welling 2017) is

$$H^{(\ell+1)} = \sigma\!\Big( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(\ell)}\, W^{(\ell)} \Big),$$

with $\tilde{A} = A + I$ the adjacency-plus-self-loops, $\tilde{D}$ its degree matrix, $H^{(\ell)}$ the node-feature matrix, $W^{(\ell)}$ the learnable weights, and $\sigma$ a nonlinearity. The $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ factor is a symmetric normalization of the neighbour average.
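A minimal sketch of one GCN layer, written in plain Python so that each factor of the formula is visible. The function names and the toy three-station path graph are illustrative assumptions; a production model would use a tensor library rather than nested lists.

```python
import math

def relu(x):
    return max(0.0, x)

def gcn_layer(A, H, W, activation=relu):
    """One GCN layer: activation(D~^-1/2 (A + I) D~^-1/2 H W).

    A: n x n symmetric 0/1 adjacency, H: n x f node features, W: f x g weights,
    all given as lists of lists.
    """
    n = len(A)
    # A~ = A + I: self-loops so each node keeps its own features.
    A_t = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # D~ is diagonal with the row sums (degrees) of A~; never zero thanks to self-loops.
    deg = [sum(row) for row in A_t]
    # Symmetric normalization: entry (i, j) scaled by 1 / sqrt(d_i * d_j).
    A_hat = [[A_t[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Neighbour aggregation: A_hat @ H.
    f = len(H[0])
    AH = [[sum(A_hat[i][k] * H[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Shared linear transform (A_hat H) @ W, then the elementwise nonlinearity.
    g = len(W[0])
    return [[activation(sum(AH[i][c] * W[c][d] for c in range(f))) for d in range(g)]
            for i in range(n)]

# Toy path graph of three stations, 0 - 1 - 2, with a signal only at station 0.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0], [0.0], [0.0]]
W = [[1.0]]
out = gcn_layer(A, H, W)
```

After one layer, station 1 receives $1/\sqrt{6}$ of station 0's signal while station 2, two hops away, receives nothing; a second layer would be needed to reach it, consistent with $k$ layers seeing the $k$-hop neighbourhood.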

Why this fits a station array. A seismic network is literally a graph: stations are nodes, geographic/operational proximity defines edges, and an event is observed at many stations at once. A GNN respects that the same wavefront hits the array — it can fuse the multi-station picture in a way a single-station model cannot. Spatio-temporal GNNs add a temporal module (a recurrence or temporal convolution) on top, giving message passing in space and propagation in time.
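One simple way to turn "geographic proximity defines edges" into an actual graph is a distance threshold on great-circle separation. This is a sketch under that assumption: real station graphs may instead use $k$ nearest neighbours, operational links, or learned edges, and the threshold, helper names and station coordinates below are hypothetical. The returned distances can serve as the optional edge features $e_{uv}$ of the message-passing update.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def station_graph(coords, max_km):
    """Return (adjacency, edge_distances) linking stations at most max_km apart.

    coords: list of (lat, lon) tuples in degrees.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    dist = {}
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*coords[i], *coords[j])
            if d <= max_km:
                adj[i][j] = adj[j][i] = 1
                dist[(i, j)] = dist[(j, i)] = d
    return adj, dist

# Hypothetical stations on the equator at longitudes 0, 1 and 5 degrees.
stations = [(0.0, 0.0), (0.0, 1.0), (0.0, 5.0)]
adj, dist = station_graph(stations, max_km=150.0)
```

Stations 0 and 1 (about 111 km apart) are linked; station 2 is isolated at this threshold, which is one reason sparse networks often use a nearest-neighbour rule instead.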

3. GNNs on seismicity — where they win, where they don't

Where GNNs are genuinely strong. When the graph is the station network, GNNs are a natural and effective fit for:

Phase association — deciding which picks across many stations belong to the same event.
Earthquake location — jointly using multi-station arrivals on the array graph.
Source characterization — fusing station observations to estimate source parameters.

This is active and credible 2024–2025 research (e.g. spatio-temporal graph convolutional networks for source characterization; graph-based association). It is detection-side work: it concerns events that have already occurred.

Where GNNs underwhelm — forecasting. When the graph is re-purposed as spatial cells or faults to forecast future magnitude or occurrence, the literature converges on a blunt finding:

Depth and magnitude prediction "remain weak for all tested models."

The reasons are structural, not incidental:

Forecasting future magnitude is fighting Gutenberg–Richter: given that an event occurs, its magnitude is, to a good approximation, drawn from the Gutenberg–Richter distribution independently of the preceding events, so there is little learnable signal in "what size is next" (see Honest Limits).
A cell-graph GNN that outputs a per-cell occurrence label inherits the DeVries trap — correlated cells, classification metrics, no survival term (see CNN Spatial Models).
No GNN forecaster has passed prospective CSEP testing against ETAS.
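The Gutenberg–Richter point can be made concrete with a synthetic catalog in which magnitudes are independent by construction. This is a sketch under that assumption, not a test on real data, which only approximates it. The helper names are hypothetical; the b-value estimator is Aki's (1965) maximum-likelihood formula $\hat b = \log_{10} e / (\bar M - M_c)$ for continuous magnitudes.

```python
import math
import random

def sample_gr_magnitudes(n, b=1.0, mc=2.0, seed=0):
    """Draw n independent magnitudes above completeness mc from Gutenberg-Richter.

    Above mc, GR with slope b means M - mc is exponential with rate beta = b ln 10.
    """
    rng = random.Random(seed)
    beta = b * math.log(10.0)
    return [mc + rng.expovariate(beta) for _ in range(n)]

def aki_b_value(mags, mc):
    """Aki (1965) maximum-likelihood b-value for continuous (unbinned) magnitudes."""
    mean_excess = sum(mags) / len(mags) - mc
    return math.log10(math.e) / mean_excess

def lag1_autocorrelation(x):
    """Sample correlation between consecutive values x[i] and x[i + 1]."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

mags = sample_gr_magnitudes(20000, b=1.0, mc=2.0)
b_hat = aki_b_value(mags, mc=2.0)   # close to the true b = 1.0
rho = lag1_autocorrelation(mags)    # close to 0: the last size says nothing about the next
```

When magnitudes behave like this, the best any sequence model can do for "size of the next event" is reproduce the marginal distribution, which is exactly what the fitted b-value already provides.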

The honest summary: GNNs help where the graph is the station network (detection-side), not where you need a calibrated future rate. A 2025 hybrid spatio-temporal GNN line of work (Frontiers in AI) is interesting research, not a shipping forecaster.

4. Recurrent networks — RNN, LSTM, GRU

A recurrent network maintains a hidden state $h_t$ that it updates as it consumes a sequence, giving it memory of the past. The vanilla RNN update is

$$h_t = \tanh\!\big(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h\big), \qquad y_t = W_{hy}\, h_t + b_y.$$
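The vanilla update maps directly onto a few lines of plain Python. The scalar helper also tracks $\partial h_T / \partial h_0$ through the chain rule, which makes the vanishing-gradient problem discussed next visible as a product of per-step factors. Function names and the toy weights are illustrative assumptions.

```python
import math

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h, W_hy, b_y):
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h);  y_t = W_hy h_t + b_y."""
    pre = [r + s + b for r, s, b in zip(matvec(W_hh, h_prev), matvec(W_xh, x_t), b_h)]
    h_t = [math.tanh(p) for p in pre]
    y_t = [r + b for r, b in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t

def scalar_rnn_grad(h0, xs, w_hh, w_xh, b_h=0.0):
    """1-D RNN: return h_T and dh_T/dh_0.

    By the chain rule each step multiplies the gradient by
    dh_t/dh_{t-1} = w_hh * (1 - h_t**2), whose magnitude is below 1 when |w_hh| < 1.
    """
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w_hh * h + w_xh * x + b_h)
        grad *= w_hh * (1.0 - h * h)
    return h, grad

# Twenty steps with w_hh = 0.5: the gradient reaching h_0 is below 0.5**20 (about 1e-6).
h_T, g = scalar_rnn_grad(0.1, [0.0] * 20, w_hh=0.5, w_xh=1.0)
```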

Vanilla RNNs suffer vanishing/exploding gradients over long sequences. The LSTM (Hochreiter & Schmidhuber 1997) fixes this with a gated cell state $c_t$ and input/forget/output gates:

$$ \begin{aligned} f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f), &\quad i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i), \\ \tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c), &\quad o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o), \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &\quad h_t &= o_t \odot \tanh(c_t). \end{aligned} $$

The forget gate $f_t$ lets the cell retain information across long gaps, mitigating vanishing gradients. The GRU (Cho et al. 2014) is a lighter two-gate variant (update + reset) that RECAST uses as its encoder. All three are excellent sequence encoders — the question is always what sequence, predicting what target.
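The gate equations translate line by line into code. The sketch below is a plain-Python LSTM step with hypothetical names; the demo uses a hand-set forget bias (an assumption for illustration, not a trained value) to show the cell holding a stored value across 100 empty steps, which is the mechanism the forget gate provides.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, b, v):
    """W v + b for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM step following the gate equations; p maps names to weights/biases."""
    z = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]        # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]        # input gate
    c_hat = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate cell
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]        # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

def forget_gate_demo(steps=100):
    """One-unit LSTM with zero weights. The candidate is tanh(0) = 0, so
    c_t = f_t * c_{t-1}; a forget bias of +10 keeps f_t near 1."""
    zero = [[0.0, 0.0]]  # 1 hidden unit; input [h_{t-1}, x_t] has size 2
    p = {"W_f": zero, "b_f": [10.0], "W_i": zero, "b_i": [-10.0],
         "W_c": zero, "b_c": [0.0], "W_o": zero, "b_o": [0.0]}
    h, c = [0.0], [1.0]
    for _ in range(steps):
        h, c = lstm_step(h, c, [0.0], p)
    return c[0]

retained = forget_gate_demo(100)  # about sigmoid(10)**100, roughly 0.995
```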

5. RNN/LSTM seismicity-rate regression — the failure modes

A very common — and very flawed — design feeds binned seismicity-rate or magnitude time series into an LSTM to "predict the next bin." It looks reasonable and fails for structural reasons:

It drops the point-process survival term. Regressing a binned rate abandons the compensator integral $\int_0^T \lambda^*(\tau)\,d\tau$ that makes a forecast a proper probability. The output is un-calibratable: there is no honest probability to publish (see RECAST and FERN §2).
Class imbalance defeats it. Large events are rare, so a model trained to minimize average error learns to predict "no large event" essentially always. It scores beautifully on accuracy and fails on exactly the events that matter — the same imbalance pathology that makes accuracy useless here.
Shuffled splits leak the future. Random train/test shuffling of a clustered catalog lets the model see aftershocks of a sequence whose mainshock is in the test set — the data-leakage flaw EarthquakeNPP identified, which "artificially inflates performance measures due to the nature of earthquake triggering." Metrics that look strong under shuffling evaporate under chronological splits.
No built-in seismological physics. An LSTM's memory decay biases it toward the most recent bin; it has no built-in Omori–Utsu decay or Gutenberg–Richter magnitude law, so it must re-learn from scarce data what ETAS encodes for free.
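The class-imbalance failure is easy to reproduce. Under Gutenberg–Richter with $b = 1$, events at least two magnitude units above completeness make up $10^{-2}$ of the catalog, so a model that never predicts a large event is about 99% accurate while catching none of them. The synthetic catalog, the threshold and the helper name are assumptions for illustration; labels here are per event rather than per time bin, which changes the exact rate but not the pathology.

```python
import math
import random

def accuracy_and_recall(y_true, y_pred):
    """Accuracy over all labels, and recall on the positive ('large event') class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(1 for t in y_true if t)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    recall = hits / positives if positives else float("nan")
    return correct / len(y_true), recall

# Synthetic GR catalog (b = 1, Mc = 2): "large" means M >= Mc + 2, about 1% of events.
rng = random.Random(1)
beta = 1.0 * math.log(10.0)
mags = [2.0 + rng.expovariate(beta) for _ in range(50000)]
y_true = [m >= 4.0 for m in mags]
y_pred = [False] * len(mags)  # the degenerate "never a large event" model
acc, rec = accuracy_and_recall(y_true, y_pred)  # acc near 0.99, rec exactly 0.0
```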

The net effect: binned-rate LSTM "forecasters" tend to report impressive retrospective numbers that do not survive a fair, prospective, calibration-aware evaluation.
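The shuffled-split leak can be checked mechanically. The sketch below, with hypothetical helper names and placeholder event times, splits a catalog either randomly or chronologically and counts training events that occur after the first test event. That count is only a crude proxy for the leakage EarthquakeNPP describes, since any such event carries information from the test period's future; a chronological split makes it zero by construction.

```python
import random

def shuffled_split(times, frac_train=0.7, seed=0):
    """Random split that ignores time order, as in the flawed design."""
    pool = list(times)
    random.Random(seed).shuffle(pool)
    k = int(len(pool) * frac_train)
    return pool[:k], pool[k:]

def chronological_split(times, frac_train=0.7):
    """Train on the earliest events, test on everything after them."""
    ordered = sorted(times)
    k = int(len(ordered) * frac_train)
    return ordered[:k], ordered[k:]

def future_leak_count(train, test):
    """Number of training events that occur after the first test event."""
    first_test = min(test)
    return sum(1 for t in train if t > first_test)

event_times = [float(t) for t in range(1000)]  # placeholder event times in days
leak_shuffled = future_leak_count(*shuffled_split(event_times))
leak_chrono = future_leak_count(*chronological_split(event_times))
```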

6. The right recurrent design: a recurrent point process

Recurrence is not the problem — recurrence used to regress a rate is. The principled use of an RNN in forecasting is to encode history inside a temporal point process, so the survival term is retained. The foundational example is RMTPP (Du et al. 2016): an RNN encodes the history after the $j$-th event into a hidden state $h_j$, and the conditional intensity between events is

$$\lambda^*(t) = \exp\!\big(\mathbf{v}^{\top} h_j + w\,(t - t_j) + b\big),$$

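where the $w\,(t - t_j)$ term makes the log-intensity linear in the time since the last event (for $w < 0$ an exponential decay, a rough stand-in for the power-law Omori decay) and $h_j$ carries the marked history. The Neural Hawkes process (Mei & Eisner 2017) goes further with a continuous-time LSTM whose cell state decays between events,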

$$\lambda^*(t) = \mathrm{softplus}\!\big(\mathbf{v}^{\top} h(t)\big),$$

which can even let past events lower future intensity (inhibition) — something a classical Hawkes process cannot represent. RECAST (Dascher-Cousineau et al. 2023) is the earthquake-specific member of this lineage: a GRU encoder + neural-density decoder, which beats temporal ETAS only on large catalogs ($\gtrsim 10^4$ events) and otherwise matches it (see RECAST and FERN).
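The inhibition claim can be checked with a scalar sketch of the continuous-time LSTM state. Following the decay form of Mei & Eisner (2017), the cell relaxes from its post-event value $c$ toward a target $\bar c$ as $c(t) = \bar c + (c - \bar c)\,e^{-\delta (t - t_i)}$, with $h(t) = o \tanh(c(t))$. All numeric values below, the helper names, and the omission of the per-type softplus scale parameter are simplifying assumptions.

```python
import math

def softplus(x):
    """Numerically stable log(1 + e^x); always > 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ct_lstm_intensity(dt, c_after, c_target, decay, o=1.0, v=1.0):
    """Scalar intensity dt time units after the last event.

    c(t) relaxes exponentially from c_after toward c_target,
    h(t) = o * tanh(c(t)), lambda(t) = softplus(v * h(t)).
    """
    c_t = c_target + (c_after - c_target) * math.exp(-decay * dt)
    h_t = o * math.tanh(c_t)
    return softplus(v * h_t)

# Hypothetical values: the event pushed the cell to -2, below its target of 0.
lam_just_after = ct_lstm_intensity(0.0, c_after=-2.0, c_target=0.0, decay=1.0)
lam_long_after = ct_lstm_intensity(20.0, c_after=-2.0, c_target=0.0, decay=1.0)
```

An event that leaves the cell below its target makes the intensity dip below its long-run level (here about $\ln 2$) and then recover, while the softplus keeps it positive throughout; a classical Hawkes process with non-negative kernels can only raise the intensity after an event.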

The distinction in one line. An LSTM that regresses a binned rate is un-calibratable and leaks; an LSTM (or GRU) that parameterizes a conditional intensity with the survival term is a legitimate, testable forecaster. This product only ever considers the second kind, and even then behind a CSEP gate.
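For the RMTPP intensity given earlier in this section, the survival term has a closed form. With $\tau = t - t_j$ and $a = \mathbf{v}^{\top} h_j + b$, the compensator is $\Lambda(\tau) = (e^{a + w\tau} - e^{a})/w$ and the log-density of the next inter-event time is $a + w\tau - \Lambda(\tau)$. The sketch assumes the $a$ values have already been produced by some recurrent encoder (not shown); function names are hypothetical and this is not Du et al.'s reference code.

```python
import math

def rmtpp_compensator(tau, a, w):
    """Integral of lambda*(t_j + s) = exp(a + w s) for s in [0, tau].

    a stands for v^T h_j + b, the history-dependent part from the RNN.
    """
    if abs(w) < 1e-12:  # w -> 0 limit: constant intensity exp(a)
        return math.exp(a) * tau
    return (math.exp(a + w * tau) - math.exp(a)) / w

def rmtpp_log_density(tau, a, w):
    """log f(tau) = log lambda*(t_j + tau) - Lambda(tau); -Lambda is the survival term."""
    return a + w * tau - rmtpp_compensator(tau, a, w)

def rmtpp_cdf(tau, a, w):
    """P(next event within tau of t_j) = 1 - exp(-Lambda(tau))."""
    return 1.0 - math.exp(-rmtpp_compensator(tau, a, w))

def rmtpp_sequence_loglik(event_times, a_values, w, t_end):
    """Log-likelihood of the events after event_times[0], observed up to t_end.

    a_values[j] belongs to the interval after event j (len(event_times) entries).
    The last term is the probability of surviving to t_end with no further event.
    """
    ll = 0.0
    for j in range(len(event_times) - 1):
        ll += rmtpp_log_density(event_times[j + 1] - event_times[j], a_values[j], w)
    ll -= rmtpp_compensator(t_end - event_times[-1], a_values[-1], w)
    return ll
```

For $w < 0$ the total compensator is finite, $e^{a}/|w|$, so the density integrates to $1 - \exp(-e^{a}/|w|) < 1$: this parameterization leaves a nonzero probability that no further event occurs. A binned-rate regressor has no counterpart to the subtracted compensator, which is the substance of the distinction above.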

7. Honest verdict and role in this product

GNNs: kept upstream, on the detection side, where the graph is the station network — association, location, source characterization. They build better, more complete catalogs (a lower, more stable $M_c$), which is the single biggest realizable near-term lever for both ETAS and any neural forecaster. They are not used as a cell-graph occurrence classifier.
RNN/LSTM/GRU: never as a binned-rate regressor. The only admissible recurrent forecaster is a recurrent neural point process (RECAST-style), and only as the gated challenger described in Models — Employed §5. It reaches the public map solely if it beats ETAS in our own prospective CSEP harness and is calibrated.

This places both families exactly where the evidence supports them: GNNs upstream building the catalog, recurrent point processes as a gated challenger, and the calibrated classical ETAS reference as the shipping core. The line between detection and forecasting is kept explicit throughout — see Detection vs. Forecasting.

References

Kipf, T.N. & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017. arXiv:1609.02907
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O. & Dahl, G.E. (2017). Neural Message Passing for Quantum Chemistry. ICML 2017. arXiv:1704.01212
Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation 9(8), 1735–1780. doi:10.1162/neco.1997.9.8.1735
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP 2014. arXiv:1406.1078
Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M. & Song, L. (2016). Recurrent Marked Temporal Point Processes. KDD 2016. doi:10.1145/2939672.2939875
Mei, H. & Eisner, J. (2017). The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process. NeurIPS 2017. arXiv:1612.09328
Dascher-Cousineau, K., Shchur, O., Brodsky, E.E. & Günnemann, S. (2023). Using deep learning for flexible and scalable earthquake forecasting (RECAST). Geophysical Research Letters 50, e2023GL103909. doi:10.1029/2023GL103909
Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. TMLR. arXiv:2410.08226
McBrearty, I.W. & Beroza, G.C. (2023). Earthquake phase association with graph neural networks. Bulletin of the Seismological Society of America 113(2), 524–547. doi:10.1785/0120220182
Mousavi, S.M. & Beroza, G.C. (2022). Deep-learning seismology. Science 377, eabm4470. doi:10.1126/science.abm4470

See also: Models — Classical · Models — ML · Models — Employed · RECAST and FERN · CNN Spatial Models · Detection vs. Forecasting · Evaluation · Honest Limits.

⚠️ Disclaimer — read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity — never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast.

It complements, and does not replace or speak for, official agencies — always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile, SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.

CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
