Neural Hawkes Process
One topic, in depth. The Neural Hawkes Process (NHP) is the neural temporal point process that fixed RMTPP's biggest rigidity: instead of freezing the history vector between events, it lets the hidden state evolve continuously in time via a continuous-time LSTM. This grants the model an ability a classical Hawkes process structurally lacks — inhibition, where a past event can lower the future rate. This page derives the continuous-time cell, the intensity, and the likelihood, and gives an honest account of its relevance to seismic forecasting.
Honest framing up front. NHP was introduced and validated on non-seismic event streams (retail, social media, electronic health records, synthetic self-modulating processes). It is a foundational neural-TPP architecture, not an earthquake model. Its log-likelihood wins on those domains do not automatically transfer to seismicity: under fair, prospective, CSEP-style testing on earthquake catalogs, no neural point process has robustly beaten a well-fit ETAS as of 2026 (see Models — ML §4). NHP belongs in this wiki as a key conceptual step between RMTPP and Transformer Hawkes, and because its self-modulation idea (events can excite or inhibit) is genuinely relevant to seismicity's mix of triggering and quiescence.
- Intuition: a self-modulating Hawkes process
- From discrete-state to continuous-time LSTM
- The conditional intensity
- Excitation and inhibition (what classical Hawkes cannot do)
- The likelihood and its Monte-Carlo compensator
- Parameter estimation and practicalities
- Strengths
- Limitations — and why they matter for seismicity
- Role in operational earthquake forecasting
- Worked illustration
- References
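A classical Hawkes process (the point-process skeleton of ETAS) is purely self-exciting: every past event adds a non-negative kick to the intensity,

$$\lambda(t) = \mu + \sum_{t_j < t} \phi(t - t_j), \qquad \phi \ge 0 .$$

Mei & Eisner (2017) generalized this into a neurally self-modulating point process. The idea: let a recurrent hidden state $h(t)$ evolve continuously in time and read the intensity off that state through a positive link function, so that a past event can raise or lower the future rate.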
For seismicity this is conceptually attractive: real catalogs show both excitation (Omori aftershock bursts) and effective quiescence (rate drops, stress shadows, post-seismic relaxation patterns) that a purely additive, non-negative kernel cannot represent.
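To make the "non-negative kick" concrete, here is a minimal sketch of a classical Hawkes intensity. The exponential kernel and the values of `mu`, `alpha` and `beta` are assumptions chosen for illustration (ETAS itself uses an Omori-type power-law kernel); the point is only that the intensity can never drop below the background `mu`.

```python
import math

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.5):
    """mu + sum over past events of alpha * exp(-beta * (t - t_j)).

    The exponential kernel and the default values are illustrative
    assumptions, not parameters taken from this page.
    """
    kicks = (alpha * math.exp(-beta * (t - tj)) for tj in event_times if tj < t)
    return mu + sum(kicks)

events = [1.0, 1.3, 4.0]  # hypothetical event times
for t in (0.5, 1.1, 1.4, 3.0, 4.1):
    print(f"t={t:.1f}  lambda={hawkes_intensity(t, events):.3f}")
```

Every term in the sum is non-negative, so no choice of event history can push the rate under `mu`. That is the structural limit the self-modulating design removes.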
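A standard LSTM updates a cell state $c$ only at discrete steps and holds it fixed in between. The continuous-time LSTM (cLSTM) instead lets the cell relax exponentially between events. After event $j$ it carries three quantities:

- $c_j$: the cell state immediately after event $j$ (the usual LSTM input/forget/cell update, applied at the event).
- $\bar{c}_j$: the target the cell decays toward between events (a second, learned cell vector).
- $\delta_j > 0$: a learned per-dimension decay rate (output of a softplus gate). Large $\delta$ means fast relaxation to baseline; small $\delta$ means long memory.

For $t \in (t_j, t_{j+1}]$ the cell follows

$$c(t) = \bar{c}_j + (c_j - \bar{c}_j)\, e^{-\delta_j (t - t_j)} .$$

The hidden state is then

$$h(t) = o_j \odot \tanh\big(c(t)\big) .$$

At each event the cLSTM performs a standard gated update (input, forget, output gates, plus the extra decay-rate gate producing $\delta_j$); between events nothing is recomputed and the state simply decays.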
flowchart LR
subgraph "Between events: continuous decay"
C0["cⱼ (post-event)"] -->|"c(t)=c̄ⱼ+(cⱼ−c̄ⱼ)e^(−δⱼ(t−tⱼ))"| HT["h(t)=oⱼ⊙tanh(c(t))"]
end
HT --> INT["λ*ₖ(t)=softplus(wₖᵀ h(t))"]
EV["event (tⱼ,kⱼ)"] -->|gated cLSTM update| C0
EV -.->|learned δⱼ via softplus| C0
INT --> LL["log-likelihood<br/>Σ log λ* − ∫ λ* dτ (Monte-Carlo)"]
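The between-event decay in the diagram can be sketched in a few lines. This is an illustrative implementation of the two formulas only (cell decay and gated readout); the function names and the numeric values of `c_j`, `c_bar_j`, `delta_j` and `o_j` are hypothetical, and the gated update that would produce them at an event is not shown.

```python
import math

def cell_state(t, t_j, c_j, c_bar_j, delta_j):
    """Elementwise c(t) = c_bar + (c_j - c_bar) * exp(-delta * (t - t_j))."""
    dt = t - t_j
    return [cb + (c - cb) * math.exp(-d * dt)
            for c, cb, d in zip(c_j, c_bar_j, delta_j)]

def hidden_state(t, t_j, c_j, c_bar_j, delta_j, o_j):
    """Elementwise h(t) = o_j * tanh(c(t))."""
    c_t = cell_state(t, t_j, c_j, c_bar_j, delta_j)
    return [o * math.tanh(c) for o, c in zip(o_j, c_t)]

# Hypothetical post-event values for a two-dimensional state.
t_j = 10.0
c_j = [1.5, -0.5]      # cell right after the event
c_bar_j = [0.2, 0.4]   # decay targets
delta_j = [3.0, 0.1]   # one fast dimension, one slow dimension
o_j = [0.9, 0.7]       # output gate, held fixed until the next event

for elapsed in (0.0, 0.5, 2.0, 20.0):
    h = hidden_state(t_j + elapsed, t_j, c_j, c_bar_j, delta_j, o_j)
    print(elapsed, [round(x, 3) for x in h])
```

With these values the first dimension has almost reached its target after half a time unit while the second has barely moved, which is the "controllable horizon" behaviour that per-dimension decay rates provide.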
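For each event type (mark) $k$, the conditional intensity is

$$\lambda^*_k(t) = f_k\big(\mathbf{w}_k^{\top} h(t)\big), \qquad f_k(x) = s_k \log\big(1 + e^{x / s_k}\big),$$

where $\mathbf{w}_k$ is a learned per-type weight vector and $s_k > 0$ is a learned scale for the softplus $f_k$.

Why softplus rather than RMTPP's exponential? Both keep the intensity strictly positive, but the softplus grows only linearly for large inputs, so a strongly positive pre-activation does not blow the rate up exponentially; as $s_k \to 0$ it approaches a ReLU.

The mark distribution at an event is obtained from the per-type intensities, $P(k \mid t) = \lambda^*_k(t) \,/\, \sum_{k'} \lambda^*_{k'}(t)$, a natural multivariate competing-risks form.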
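As a sketch of the readout step, the snippet below maps a hidden state to per-type intensities through a scaled softplus and then normalizes them into mark probabilities. The weight matrix `W`, the scales and the hidden state `h` are made-up numbers; the softplus is written in a numerically stable form, which is an implementation choice rather than something this page specifies.

```python
import math

def scaled_softplus(x, s=1.0):
    """s * log(1 + exp(x / s)), written to avoid overflow for large |x|."""
    z = x / s
    return s * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))

def intensities(h, W, scales):
    """Per-type intensity: scaled softplus of the dot product w_k . h."""
    return [scaled_softplus(sum(w_i * h_i for w_i, h_i in zip(w_k, h)), s_k)
            for w_k, s_k in zip(W, scales)]

def mark_probs(lams):
    """P(k | t) = lambda_k / sum over k' of lambda_k'."""
    total = sum(lams)
    return [lam / total for lam in lams]

# Hypothetical values: two event types, two hidden dimensions.
h = [0.6, -0.2]
W = [[1.0, -0.5], [0.3, 0.8]]
scales = [1.0, 0.5]

lams = intensities(h, W, scales)
probs = mark_probs(lams)
print([round(x, 3) for x in lams], [round(p, 3) for p in probs])
```

Because every intensity is strictly positive, the normalization is always well defined, and the probabilities sum to one by construction.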
This is the headline capability. In a classical Hawkes process the kernel $\phi$ is non-negative, so an event can only raise the intensity. NHP drops that constraint in two ways:

- An event updates the cell state $c_j$ and the decay target $\bar{c}_j$. If the update drives $\mathbf{w}_k^{\top} h(t)$ downward, the intensity for type $k$ decreases after that event: inhibition. So "this event makes the next one less likely / later" is representable.
- The baseline itself is self-modulating: $h(t)$ drifts toward a learned target, so the effective background rate is not a fixed $\mu$ but a state-dependent, evolving quantity.
For seismicity, inhibition and a modulating baseline map (loosely) onto phenomena a purely excitatory ETAS handles only by hand: stress shadows, post-mainshock rate deficits in some regions, and non-stationary background driven by transients. NHP can represent these in principle. Whether it estimates them reliably from limited earthquake data — and whether that helps a calibrated forecast — is a different question, answered honestly in §8.
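A toy comparison shows what inhibition means at the moment of an event. The one-dimensional intensity, the cell values before and after the event, and the Hawkes jump size are all hypothetical; in a trained model the post-event cell value would come from the gated update, not be set by hand.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def nhp_intensity(c, o=1.0, w=1.0):
    """One-dimensional lambda = softplus(w * o * tanh(c))."""
    return softplus(w * o * math.tanh(c))

def hawkes_jump(lam_before, alpha):
    """A classical Hawkes event adds alpha >= 0 to the intensity."""
    if alpha < 0:
        raise ValueError("a Hawkes kernel must be non-negative")
    return lam_before + alpha

c_before = 0.8   # cell value just before the event (hypothetical)
c_after = -1.2   # post-event cell value written by the gated update (hypothetical)

lam_before = nhp_intensity(c_before)
lam_after = nhp_intensity(c_after)
print(f"NHP-style: {lam_before:.3f} -> {lam_after:.3f}")
print(f"Hawkes:    {lam_before:.3f} -> {hawkes_jump(lam_before, 0.5):.3f}")
```

The NHP-style intensity falls across the event yet stays positive, while the Hawkes update can only move the rate up. Whether a fitted model learns such drops from a real catalog is a separate, empirical matter.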
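NHP is trained on the standard point-process log-likelihood of a sequence observed on $[0, T]$:

$$\log \mathcal{L} = \sum_{j} \log \lambda^*_{k_j}(t_j) \;-\; \int_0^T \lambda^*(\tau)\, d\tau, \qquad \lambda^*(\tau) = \sum_k \lambda^*_k(\tau).$$

Unlike RMTPP, the compensator $\int_0^T \lambda^*(\tau)\, d\tau$ has no closed form, because $h(\tau)$ passes through $\tanh$ and the softplus. It is estimated by Monte Carlo: draw $L$ times $\tau_\ell \sim \mathrm{Uniform}(0, T)$ and use $\frac{T}{L} \sum_{\ell=1}^{L} \lambda^*(\tau_\ell)$.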
This is the practical price of continuous-time expressiveness — the survival penalty is now an estimated quantity rather than an analytic one. It remains the same compensator that makes the model calibratable: it penalizes intensity placed where no event occurred. (This is the term that classification/regression framings discard, the structural failure noted on Models — ML.)
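The structure of the objective (event term minus an estimated compensator) can be sketched as follows. Here `intensity_k(t, k)` is a stand-in callable supplied by the caller; in a real NHP it would be the history-dependent cLSTM readout. The function names, the uniform sampling over the whole window and the sample count are assumptions for illustration.

```python
import math
import random

def mc_compensator(total_intensity, t_start, t_end, num_samples, rng):
    """Monte-Carlo estimate of the integral of total_intensity on [t_start, t_end]."""
    width = t_end - t_start
    draws = (total_intensity(rng.uniform(t_start, t_end)) for _ in range(num_samples))
    return width * sum(draws) / num_samples

def log_likelihood(events, intensity_k, num_types, horizon, num_samples=2000, seed=0):
    """Sum of log lambda_{k_j}(t_j) minus the estimated integral of sum_k lambda_k."""
    rng = random.Random(seed)
    event_term = sum(math.log(intensity_k(t, k)) for t, k in events)

    def total(tau):
        return sum(intensity_k(tau, k) for k in range(num_types))

    return event_term - mc_compensator(total, 0.0, horizon, num_samples, rng)

# Sanity check on a case with a known answer: constant (Poisson) rates.
rates = [0.5, 0.25]                 # hypothetical per-type rates
events = [(1.0, 0), (2.5, 1)]       # (time, type) pairs
ll = log_likelihood(events, lambda t, k: rates[k], num_types=2, horizon=4.0)
print(round(ll, 4))
```

With constant rates the estimator has zero variance, so the result matches the closed form `log(0.5) + log(0.25) - 0.75 * 4`. For a time-varying intensity the second term becomes noisy, and the noise shrinks as the number of samples grows.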
- Parameters. All cLSTM gate weights (input/forget/output, plus the decay-rate gate producing $\delta_j$), the two cell vectors per step ($c_j$, $\bar{c}_j$), the per-type intensity weights $\mathbf{w}_k$ and scales $s_k$, and the mark embeddings, trained end-to-end by stochastic gradient ascent (BPTT) on the Monte-Carlo log-likelihood.
- Compensator samples ($L$). A bias/variance knob: too few samples gives a noisy survival penalty; too many is slow. Implementations tune $L$ per interval length.
- Cost. Heavier than RMTPP (continuous-state decay + MC integration), but still cheap at inference relative to large ETAS simulations.
- Data hunger / overfitting. Like every neural TPP, NHP has many parameters and needs many effectively-independent sequences. Seismic catalogs supply few independent large sequences — the exact condition under which over-parameterized models overfit (the DeVries–Mignan lesson, Models — ML §5). The product's response is unchanged: any neural model must beat ETAS in a strictly temporal, prospective CSEP harness and pass calibration before it ships.
- Inhibition / self-modulation. Can represent an event lowering the future rate and a time-varying baseline — strictly more expressive than a non-negative-kernel Hawkes/ETAS.
- Non-monotone intra-interval intensity. Because $h(t)$ evolves continuously, the intensity can rise and fall between events, fixing RMTPP's monotone-interval rigidity.
- Continuous-time memory with controllable horizon. The learned decay rates $\delta_j$ let the model keep some dimensions long-lived (slow context) and others short-lived (recent dynamics).
- Principled probabilistic object. Trained on the true point-process likelihood with a (Monte-Carlo) compensator, so it yields a proper, calibratable intensity, not a classifier.
- Validated off-domain. NHP's gains are on retail/EHR/social/synthetic streams. They do not auto-transfer to earthquakes. On earthquake catalogs under fair temporal splits, neural TPPs of this family have not robustly beaten ETAS (EarthquakeNPP; see Models — ML §4).
- Recurrent memory bottleneck. A single evolving state must summarize all history; very long-range or cross-fault dependencies can still be lost to recurrence decay — the gap that motivates the attention-based Transformer Hawkes Process.
- No native spatial kernel. NHP is a multivariate temporal process (event types), not a continuous spatial one. Seismic forecasting needs a spatial density over $(x,y)$; discretizing space into "types" is crude and scales poorly.
- No built-in seismic physics. No Omori law, no Gutenberg–Richter magnitude distribution, no branching-ratio subcriticality constraint unless added by hand. A free NHP can drift from seismologically sensible behaviour.
- Costlier, approximate likelihood. The Monte-Carlo compensator adds variance and compute relative to ETAS's analytic integrals and RMTPP's closed form.
- Interpretability. Parameters are not seismologically meaningful (no $p$, $c$, $\alpha$, $b$), making expert review and uncertainty propagation harder than for ETAS.
Net assessment for this product. NHP's self-modulation is a genuinely interesting capability, and it is studied in the neural-challenger research track. But it is never a default forecaster. If a seismic adaptation of this idea is pursued, it keeps the ETAS skeleton (additive background + summed triggering), models magnitude and space explicitly, and must clear the hard gate: a prospective CSEP win over a well-fit ETAS plus a passing reliability diagram.
NHP has no direct operational role in CAOS_SEISMIC today. Its contributions are conceptual:
- Self-modulation as a design idea. The notion that the background and the event-influence can be learned and time-varying — including inhibition — is a useful lens on non-stationary seismicity (transients, stress shadows) that a fixed-$\mu$, non-negative-kernel ETAS handles only by hand.
- Continuous-time state as a template. The cLSTM's continuously evolving hidden state is the recurrent counterpart to attention-based history summaries, both of which a seismic neural challenger might use on top of an ETAS inductive bias rather than as a free-form replacement.
- A cautionary data-point. That such an expressive, principled model still does not beat ETAS on earthquakes under fair testing is exactly why the product ships an ETAS-class core and gates all neural work behind prospective CSEP skill + calibration.
In OEF terms, NHP widens what an intensity can express; the seismic evidence keeps the burden of proof on demonstrated prospective skill, not expressiveness.
Take a single intensity dimension (drop the type subscript) with learned scale
The intensity is
| elapsed | | |
|---|---|---|
| 0.0 | | |
| 0.5 | | |
| 1.0 | | |
| 2.0 | | |
The intensity starts elevated (just under 1/day), falls below where it would settle, and relaxes
toward a baseline of
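The probability of at least one event within a window of length $\Delta$, given no intervening event, is $1 - \exp\big(-\int_{t}^{t+\Delta} \lambda(\tau)\, d\tau\big)$.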
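The calculation in this illustration can be reproduced with a short script. The parameter values below are hypothetical stand-ins, not the values originally used on this page; they were picked so that the intensity starts just under 1/day and relaxes toward a lower baseline. The window probability uses a simple midpoint rule, which is an implementation choice for this sketch.

```python
import math

def softplus(x, s=1.0):
    """Scaled softplus s * log(1 + exp(x / s)), numerically stable."""
    z = x / s
    return s * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))

# All values are hypothetical, chosen only for illustration.
S, W, O = 1.0, 1.0, 0.6                    # scale, readout weight, output gate
C_POST, C_TARGET, DELTA = 1.0, -1.0, 1.0   # post-event cell, decay target, rate (1/day)

def intensity(elapsed):
    """Intensity (events/day) a given number of days after the last event."""
    c = C_TARGET + (C_POST - C_TARGET) * math.exp(-DELTA * elapsed)
    return softplus(W * O * math.tanh(c), S)

def prob_at_least_one(window, steps=1000):
    """1 - exp(-integral of the intensity over [0, window]), midpoint rule."""
    dt = window / steps
    integral = sum(intensity((i + 0.5) * dt) for i in range(steps)) * dt
    return 1.0 - math.exp(-integral)

for elapsed in (0.0, 0.5, 1.0, 2.0):
    print(f"{elapsed:.1f} d  lambda = {intensity(elapsed):.3f} /day")
print(f"P(at least one event within 1 day) = {prob_at_least_one(1.0):.3f}")
```

The probability is conditional on no further event arriving inside the window, since any new event would trigger a gated update and change the decay path.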
- Mei, H. & Eisner, J. (2017). The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process. Advances in Neural Information Processing Systems (NeurIPS) 30. arXiv:1612.09328
- Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M. & Song, L. (2016). Recurrent Marked Temporal Point Processes. KDD 2016. doi:10.1145/2939672.2939875
- Hawkes, A.G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90. doi:10.1093/biomet/58.1.83
- Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. JASA 83(401), 9–27. doi:10.1080/01621459.1988.10478560
- Shchur, O., Türkmen, A.C., Januschowski, T. & Günnemann, S. (2021). Neural Temporal Point Processes: A Review. IJCAI 2021 Survey Track. arXiv:2104.03528
- Zuo, S., Jiang, H., Li, Z., Zhao, T. & Zha, H. (2020). Transformer Hawkes Process. ICML 2020, PMLR v119, 11692–11702.
- Stockman, S., Lawson, D. & Werner, M.J. (2026, accepted). EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes. TMLR. arXiv:2410.08226
See also: Temporal Point Processes · RMTPP · Transformer Hawkes Process · Models — ML · Models — Classical · Honest-Limits.
⚠️ Disclaimer: read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity, never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast. It complements, and does not replace or speak for, official agencies; always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile; SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.
CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
Conditional probabilistic seismic forecasting — forecasts, never predictions.