Temporal Point Processes
One topic, in depth. This page is the mathematical spine of every forecasting model in CAOS_SEISMIC. It defines the object all of them estimate — the conditional intensity function
$\lambda^*(t \mid \mathcal{H}_t)$ — and the point-process log-likelihood that turns an intensity into a calibratable probability. Classical models (ETAS, Omori–Utsu, Reasenberg–Jones) and neural models (RMTPP, Neural Hawkes, Transformer Hawkes) are all special cases of what is defined here. If you read only one page on the why of the model family, read this one.
Honest framing up front. Everything below describes how to estimate rates and probabilities of
seismicity, not how to predict individual earthquakes. A point-process model never says "an
earthquake will happen." It says "given the history, the expected rate over the next window is
- Intuition: a catalog is a realization of a point process
- The counting process and the conditional intensity
- From intensity to a full probability model
- The point-process log-likelihood (the heart of it)
- Marks: magnitude and space
- Self-exciting (Hawkes) processes — the seismic special case
- Parameter estimation: MLE, the compensator, and residual analysis
- Simulation: from intensity to forecasts
- Why this framework, and its limits
- Role in operational earthquake forecasting
- Worked illustration
- References
An earthquake catalog is a list of events
The key intuition is that we do not try to model each
This is the same mathematical object that governs queue arrivals, neuron firing, financial trades, and social-media cascades. What makes seismic point processes special is the physics baked into the intensity (Omori decay, Gutenberg–Richter magnitudes, self-excitation) — but the scaffolding is universal, which is precisely why the neural architectures on the sibling pages, originally built for non-seismic streams, are even candidates here.
Let
denote the history: everything observed strictly before
The superscript $*$ is the standard Daley–Vere-Jones notation reminding us that the intensity is
conditioned on history; without it, $\lambda$ would be a deterministic (Poisson) rate. Reading
the definition: in a tiny window $[t, t+\Delta t)$, the probability of exactly one event is
$\lambda^*(t)\,\Delta t + o(\Delta t)$, and the probability of two or more is $o(\Delta t)$.
A few immediate consequences:
- Non-negativity. $\lambda^*(t) \ge 0$ always, because it is a rate. Neural parameterizations enforce this with a `softplus` or `exp` output (see Neural Hawkes and RMTPP).
- History dependence is the whole game. If $\lambda^*$ did not depend on $\mathcal{H}_t$, the process would be an inhomogeneous Poisson process and earthquakes would be "memoryless" in their occurrence, which contradicts the observed Omori-law clustering. The dependence on $\mathcal{H}_t$ is what lets a mainshock raise the future rate.
- The compensator. The integral $\Lambda(t) = \int_0^t \lambda^*(\tau)\,d\tau$ is the compensator (cumulative intensity). It is the expected number of events by time $t$ and is the engine of both the likelihood (§4) and residual diagnostics (§7).
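The conditional intensity determines the entire distribution of the next event. Define the
conditional survival between the last event at $t_n$ and a later time $t$ as

$$S^*(t) = \exp\!\Big(-\int_{t_n}^{t} \lambda^*(\tau)\,d\tau\Big).$$

The probability density of the next event time is then

$$f^*(t) = \lambda^*(t)\,S^*(t).$$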
This factorization, an instantaneous hazard $\lambda^*(t)$ multiplied by the survival up to that instant, is the single most important identity in the theory. It says the model is fully specified by $\lambda^*$ alone: give me the intensity and I can write down the density of every inter-event time, hence the joint density of the whole catalog.
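Crucially, the probability of at least one event in a forecast window $[t, t+H)$ is

$$P(\ge 1 \text{ event}) = 1 - \exp\!\Big(-\int_{t}^{t+H} \lambda^*(\tau)\,d\tau\Big) = 1 - e^{-\Lambda}.$$

This is exactly the exceedance probability the product publishes (restricted to magnitudes $\ge M^*$).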
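The chain from intensity to compensator to exceedance probability can be sketched numerically. This is a minimal illustration, not the product's implementation: `toy_rate` is a hypothetical decaying rate invented for the example, and the intensity is evaluated along the path on which no new event occurs inside the window, which is the path the survival identity integrates over.

```python
import numpy as np

def compensator(intensity, t0, t1, n_grid=10_001):
    """Integrate a non-negative intensity over [t0, t1) with the trapezoid rule."""
    grid = np.linspace(t0, t1, n_grid)
    vals = intensity(grid)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

def prob_at_least_one(intensity, t0, t1):
    """P(>=1 event in [t0, t1)) = 1 - exp(-Lambda)."""
    return 1.0 - np.exp(-compensator(intensity, t0, t1))

def toy_rate(t):
    """Hypothetical rate in events/day: constant background plus a power-law decay."""
    return 0.02 + 0.5 / (1.0 + t) ** 1.1

# Probabilities for 1-day, 2-day, and 7-day windows starting now (t = 0).
for horizon in (1.0, 2.0, 7.0):
    print(horizon, prob_at_least_one(toy_rate, 0.0, horizon))
```

Because the output is `1 - exp(-Lambda)` with `Lambda >= 0`, it is bounded in [0, 1) by construction and grows monotonically with the window length.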
flowchart LR
H["History H_t<br/>(t_i, x_i, y_i, m_i)"] --> L["Conditional intensity<br/>λ*(t | H_t)"]
L --> C["Compensator<br/>Λ = ∫ λ* dτ"]
C --> S["Survival<br/>S*(t) = exp(−Λ)"]
S --> P["Exceedance probability<br/>P(≥1 event) = 1 − e^(−Λ)"]
L --> LL["Log-likelihood<br/>Σ log λ*(t_i) − ∫ λ* dτ"]
LL --> FIT["MLE / training"]
L --> SIM["Ogata thinning<br/>→ synthetic catalogs"]
SIM --> FC["Forecast bands<br/>(CSEP tests)"]
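Given an observed catalog with event times $t_1 < t_2 < \dots < t_n$ in the window $[0, T]$, the log-likelihood is

$$\log\mathcal{L} = \sum_{i=1}^{n} \log\lambda^*(t_i) - \int_0^T \lambda^*(\tau)\,d\tau.$$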
Read the two terms separately, because the contrast is the whole point:
- The sum $\sum_i \log \lambda^*(t_i)$ rewards placing high intensity exactly where events actually occurred. A model that puts a tall spike on every observed event maximizes this term.
- The integral $\int_0^T \lambda^*(\tau)\,d\tau$, the compensator, is the survival penalty. It punishes putting high intensity where events did not occur. A model that is "always alarmed" pays a large integral penalty.
These two terms are in tension, and that tension is what forces a model to output an honest rate
rather than a binary "alarm / no alarm." A classifier ("will there be an
Operational note. The same log-likelihood is reused at test time as the CSEP L-test (likelihood test) and pseudo-likelihood score — so the quantity we fit on is the quantity we are later graded on. See Evaluation.
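The tension between the sum and the integral can be seen in a small sketch. It assumes, purely for illustration, a self-exciting intensity with an exponential kernel, $\lambda^*(t) = \mu + \sum_{t_i < t} \alpha\, e^{-\omega (t - t_i)}$; the names `alpha` and `decay` (for $\omega$) and the toy catalog are hypothetical, and the exponential kernel is chosen only because its compensator has a closed form.

```python
import numpy as np

def hawkes_exp_loglik(times, T, mu, alpha, decay):
    """Log-likelihood: sum_i log lambda*(t_i) minus the compensator over [0, T]."""
    times = np.asarray(times, dtype=float)
    log_sum = 0.0
    excitation = 0.0  # sum over earlier events of alpha * exp(-decay * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Recursive update: decay the old excitation and add the previous kick.
            excitation = (excitation + alpha) * np.exp(-decay * (t - prev))
        log_sum += np.log(mu + excitation)
        prev = t
    # Closed-form integral of the intensity over [0, T].
    compensator = mu * T + (alpha / decay) * np.sum(1.0 - np.exp(-decay * (T - times)))
    return float(log_sum - compensator)

# Toy catalog (days): two short clusters in a 10-day window.
catalog = [1.0, 1.2, 1.3, 6.0, 6.1]
honest = hawkes_exp_loglik(catalog, 10.0, mu=0.2, alpha=0.8, decay=1.5)
always_alarmed = hawkes_exp_loglik(catalog, 10.0, mu=50.0, alpha=0.0, decay=1.5)
print(honest, always_alarmed)
```

The "always alarmed" setting collects a large $\sum_i \log\lambda^*(t_i)$ but pays $\mu T = 500$ in the compensator, so it scores far below the modest self-exciting fit.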
For the inhomogeneous Poisson special case (
Seismic forecasting needs where and how large, not just when. The marked conditional intensity factorizes (under the usual separability assumption) into a ground intensity times mark densities:

$$\lambda^*(t, x, y, m) = \lambda^*(t)\, f(x, y \mid \cdot)\, f(m).$$
- Magnitude mark $f(m)$. Standard seismology takes magnitude as (approximately) independent of history given that an event occurs: the Gutenberg–Richter law, $f(m) = \beta\, e^{-\beta(m - M_c)}$ with $\beta = b \ln 10$, for $m \ge M_c$. This "magnitude is memoryless" property is why deterministically predicting the size of the next event is not supported, and why the product forecasts the distribution of magnitudes, not a number.
- Spatial mark $f(x,y\mid\cdot)$. A density over location, typically a magnitude-scaled power-law kernel centered on triggering events (classical ETAS) or a learned spatial field (FERN-style neural encoders).
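The Gutenberg–Richter magnitude mark is simple enough to sketch directly. The density below is the one stated in the list; the exceedance fraction follows by integrating it from a threshold upward, and the sampler uses the fact that $m - M_c$ is exponential with rate $\beta$. Function names and the example values of `b` and `m_c` are illustrative assumptions.

```python
import numpy as np

def gr_pdf(m, b, m_c):
    """Gutenberg-Richter density f(m) = beta * exp(-beta * (m - m_c)) for m >= m_c."""
    beta = b * np.log(10.0)
    m = np.asarray(m, dtype=float)
    return np.where(m >= m_c, beta * np.exp(-beta * (m - m_c)), 0.0)

def gr_exceedance(m_star, b, m_c):
    """Fraction of events with magnitude >= m_star, i.e. 10^(-b * (m_star - m_c))."""
    return min(1.0, 10.0 ** (-b * (m_star - m_c)))

def sample_gr(n, b, m_c, rng):
    """Draw n magnitudes: m_c plus an exponential variate with mean 1 / beta."""
    beta = b * np.log(10.0)
    return m_c + rng.exponential(scale=1.0 / beta, size=n)

rng = np.random.default_rng(0)
mags = sample_gr(200_000, b=1.0, m_c=3.0, rng=rng)
print(gr_exceedance(4.0, b=1.0, m_c=3.0), np.mean(mags >= 4.0))
```

With `b = 1`, one unit of magnitude above completeness keeps one event in ten, which is why a display threshold well above $M_c$ shrinks the published probability so sharply.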
The marked log-likelihood simply adds the mark log-densities at observed events and integrates the ground intensity over space as well as time:
$$\log\mathcal{L} = \sum_{i=1}^{n}\Big[\log\lambda^*(t_i) + \log f(x_i,y_i\mid\cdot) + \log f(m_i)\Big] - \int_0^T\!\!\int_A \lambda^*(t,x,y)\,dx\,dy\,dt.$$
The marked exceedance probability at a display threshold
with the GR exceedance factor $\Phi(M^*) = 10^{-b\,(M^* - M_c)}$. The neural pages keep
A Hawkes process is the temporal point process whose intensity is a constant background plus a sum of decaying "kicks," one per past event:
where
ETAS is precisely a marked Hawkes process with seismologically motivated kernels: an Omori–Utsu
temporal decay
A central quantity is the branching ratio
the expected number of direct aftershocks per event. The process is subcritical / stationary only
if
| Generalization | What it relaxes | Which page |
|---|---|---|
| Kernel becomes a learned RNN function of history | | RMTPP |
| Background + kicks become a continuous-time LSTM with decay | allows inhibition (…) | Neural Hawkes |
| History summarized by self-attention | long-range dependencies without recurrence memory decay | Transformer Hawkes |
Each is a strictly more expressive intensity plugged into the same log-likelihood of §4. That shared likelihood is why they are directly comparable to ETAS — and why, despite the extra expressiveness, none has robustly beaten a well-fit ETAS in prospective CSEP testing as of 2026.
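For the classical self-exciting case of this section, the "background plus decaying kicks" intensity and its branching ratio can be sketched as follows. The exponential kernel $\alpha\, e^{-\omega s}$ is an assumption made for simplicity (ETAS uses an Omori–Utsu decay instead), and `alpha` and `decay` are hypothetical parameter names. For this kernel the branching ratio is $\alpha/\omega$, and a stationary process exists only when it is below 1, in which case the long-run mean rate is $\mu/(1-n)$.

```python
import numpy as np

def hawkes_intensity(t, history, mu, alpha, decay):
    """Background mu plus one exponentially decaying kick per event strictly before t."""
    past = np.asarray(history, dtype=float)
    past = past[past < t]
    return float(mu + alpha * np.sum(np.exp(-decay * (t - past))))

def branching_ratio(alpha, decay):
    """Expected number of direct offspring per event: the integral of the kernel."""
    return alpha / decay

def stationary_rate(mu, alpha, decay):
    """Long-run mean rate mu / (1 - n); defined only in the subcritical regime n < 1."""
    n = branching_ratio(alpha, decay)
    if n >= 1.0:
        raise ValueError("branching ratio >= 1: no stationary rate exists")
    return mu / (1.0 - n)

print(hawkes_intensity(1.0, [0.0, 0.5], mu=0.1, alpha=0.5, decay=1.0))
print(branching_ratio(0.5, 1.0), stationary_rate(0.1, 0.5, 1.0))
```

A branching ratio of 0.5 means half of all events are triggered rather than background, so the long-run rate is twice $\mu$.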
Maximum likelihood. Parameters
Residual analysis (the random-time-change theorem). A deep, useful fact: if
turns the catalog into a unit-rate Poisson process in the transformed time
Uncertainty. Parameter uncertainty comes from the inverse Hessian (Fisher information) of the log-likelihood, from bootstrap, or from a Bayesian posterior. The product propagates this into the published P10/median/P90 bands — a Poisson-only interval is not sufficient because regional seismicity is over-dispersed relative to Poisson (see Evaluation).
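The residual check described in this section can be sketched in a few lines: map each event time through the fitted compensator and test whether the transformed gaps look like unit-rate exponential variates. The exponential-kernel intensity and the parameter names `alpha` and `decay` are illustrative assumptions, and the Kolmogorov–Smirnov distance is computed by hand to keep the example self-contained.

```python
import numpy as np

def rescaled_times(times, mu, alpha, decay):
    """Transformed times tau_i = Lambda(t_i) for an exponential-kernel intensity."""
    times = np.asarray(times, dtype=float)
    out = np.empty_like(times)
    for i, t in enumerate(times):
        earlier = times[:i]
        out[i] = mu * t + (alpha / decay) * np.sum(1.0 - np.exp(-decay * (t - earlier)))
    return out

def rescaled_gaps(times, mu, alpha, decay):
    """Gaps between consecutive transformed times, starting from tau = 0."""
    tau = rescaled_times(times, mu, alpha, decay)
    return np.diff(np.concatenate([[0.0], tau]))

def ks_against_unit_exponential(gaps):
    """Kolmogorov-Smirnov distance between the empirical CDF and 1 - exp(-x)."""
    x = np.sort(np.asarray(gaps, dtype=float))
    n = x.size
    cdf = 1.0 - np.exp(-x)
    upper = np.arange(1, n + 1) / n - cdf
    lower = cdf - np.arange(0, n) / n
    return float(max(upper.max(), lower.max()))

# Synthetic check: a rate-2 Poisson catalog scored under the right and wrong rate.
rng = np.random.default_rng(1)
catalog = np.cumsum(rng.exponential(0.5, size=2000))
print(ks_against_unit_exponential(rescaled_gaps(catalog, 2.0, 0.0, 1.0)))
print(ks_against_unit_exponential(rescaled_gaps(catalog, 1.0, 0.0, 1.0)))
```

A small distance is consistent with a well-specified intensity; a large one, as with the wrong rate in the second call, flags a misfit that the likelihood value alone would not localize.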
Because the model is generative, we forecast by simulation rather than by a closed-form formula whenever the window contains self-triggering. The standard method is Ogata's thinning algorithm:
- Propose candidate event times from a homogeneous Poisson process with rate $\bar\lambda \ge \sup_{[t,t+H)} \lambda^*$ (an upper bound on the intensity over the window).
- Accept each candidate $t^\star$ with probability $\lambda^*(t^\star)/\bar\lambda$.
- Each accepted event is appended to the history, raising the intensity for later candidates (this is what reproduces aftershock cascades).
- Repeat to the horizon $H$; draw each accepted event's magnitude from $f(m)$ and location from the spatial kernel.
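The thinning steps above can be sketched for a temporal-only process. This is an illustration under stated assumptions rather than the product's simulator: the intensity uses an exponential kernel (hypothetical parameters `alpha` and `decay`), the forecast window starts at time 0 with past events at non-positive times, and magnitude and location marks are omitted. Because the kernel only decays between events, the intensity just after the current time is a valid upper bound until the next accepted event.

```python
import numpy as np

def simulate_hawkes_thinning(mu, alpha, decay, horizon, history=(), rng=None):
    """Simulate event times in [0, horizon) by thinning, given past events at times <= 0."""
    if mu <= 0.0:
        raise ValueError("mu must be positive so the proposal rate is never zero")
    rng = np.random.default_rng() if rng is None else rng
    past = np.asarray(history, dtype=float)
    # Excitation carried into the window by past events, evaluated at t = 0.
    excitation = alpha * float(np.sum(np.exp(decay * past)))
    t, events = 0.0, []
    while True:
        lam_bar = mu + excitation              # upper bound until the next event
        wait = rng.exponential(1.0 / lam_bar)  # candidate from a rate-lam_bar Poisson process
        t += wait
        if t >= horizon:
            break
        excitation *= np.exp(-decay * wait)    # decay the excitation to the candidate time
        if rng.uniform() * lam_bar <= mu + excitation:
            events.append(t)                   # accept: the event joins the history
            excitation += alpha                # and raises the rate for later candidates
    return np.array(events)

def forecast_counts(n_sims, mu, alpha, decay, horizon, history=(), seed=0):
    """Summarize an ensemble of synthetic catalogs over the forecast window."""
    rng = np.random.default_rng(seed)
    counts = np.array([
        simulate_hawkes_thinning(mu, alpha, decay, horizon, history, rng).size
        for _ in range(n_sims)
    ])
    p10, median, p90 = np.percentile(counts, [10, 50, 90])
    return {"p_at_least_one": float(np.mean(counts >= 1)),
            "p10": float(p10), "median": float(median), "p90": float(p90)}

# Seven-day window following a hypothetical event one hour (1/24 day) ago.
print(forecast_counts(2000, mu=0.05, alpha=0.8, decay=1.0, horizon=7.0, history=[-1 / 24]))
```

The ensemble fraction with at least one event estimates the window probability, and the count percentiles give a simulation-based band that reflects clustering rather than a Poisson interval.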
Running this
Strengths.
- One object, everything derived. Counts, probabilities, simulations, and scores all flow from $\lambda^*$. No separate "probability calibration model" is bolted on: the survival identity is the probability model.
- Calibratable by construction. The compensator/survival penalty is what makes the output a proper probability, gradeable by CSEP consistency tests and reliability diagrams.
- Unifies classical and neural. ETAS, Reasenberg–Jones, RMTPP, Neural Hawkes, and Transformer Hawkes are all the same framework with different intensities, directly comparable on the same likelihood and the same simulation protocol.
Limits and assumptions (stated honestly).
- Separability (time ⟂ space ⟂ magnitude) is an approximation; real catalogs show magnitude–space–time coupling that simple separable kernels miss.
- Completeness. The likelihood assumes events below $M_c$ are simply absent, not mis-recorded. A drifting or post-mainshock-spiking $M_c$ silently biases the fit; this is handled by a time-varying $M_c(x,y,t)$ (see Data Sources and Methodology).
- Stationarity of the background $\mu$ over the fit window, often violated by transients (slow-slip, swarms, induced seismicity).
- The framework forecasts rates, not individual events. No amount of intensity modeling makes the next earthquake's time/size deterministically knowable: the GR memorylessness of magnitude and the nonlinearity of rupture nucleation forbid it. See Honest-Limits.
Operational Earthquake Forecasting (OEF) is the practice of issuing authoritative, regularly updated probabilities of future seismicity (Jordan et al. 2011). The conditional-intensity framework is its mathematical foundation:
- The daily public number is the marked exceedance probability $1 - e^{-\Lambda_{\ge M^*}}$ of §5, computed from a fitted intensity over a 1d / 2d / 7d window.
- The information gain of one model over another (the CSEP comparison metric, in nats) is the difference in log-likelihood per event, a direct consequence of the §4 likelihood. This is how the product certifies that any neural challenger genuinely beats ETAS before it reaches the map.
- The honesty guarantees (bounded probabilities, no alarms, a climatological baseline always shown alongside) are properties of the math, not editorial choices: the survival identity bounds the output in $[0,1]$, and the compensator forces it to track observed frequency.
In short, the conditional intensity is what the product estimates; the log-likelihood is how it is fit and graded; and the survival identity is why the published number is an honest probability and never a prediction.
Consider a single isolated
with illustrative values
Plugging in:
To get the probability of at least one aftershock of
Two honest readings of this worked number: (1) the relative gain over a quiet-day climatology (which
might be $\sim$0.01%) is two-to-three orders of magnitude — the genuine value of conditional
forecasting; (2) the absolute probability is still small (~2%), so if no
- Daley, D.J. & Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Vol. I: Elementary Theory and Methods (2nd ed.). Springer. doi:10.1007/b97277
- Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association 83(401), 9–27. doi:10.1080/01621459.1988.10478560
- Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50(2), 379–402. doi:10.1023/A:1003403601725
- Ogata, Y. (1981). On Lewis' simulation method for point processes (the thinning algorithm). IEEE Transactions on Information Theory 27(1), 23–31. doi:10.1109/TIT.1981.1056305
- Hawkes, A.G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90. doi:10.1093/biomet/58.1.83
- Rasmussen, J.G. (2018). Lecture Notes: Temporal Point Processes and the Conditional Intensity Function. arXiv:1806.00221
- Reinhart, A. (2018). A review of self-exciting spatio-temporal point processes and their applications. Statistical Science 33(3), 299–318. doi:10.1214/17-STS629
- Schoenberg, F.P. (2003). Multidimensional residual analysis of point process models for earthquake occurrences. Journal of the American Statistical Association 98(464), 789–795. doi:10.1198/016214503000000710
- Jordan, T.H., Chen, Y.-T., Gasparini, P., Madariaga, R., Main, I., Marzocchi, W., Papadopoulos, G., Sobolev, G., Yamaoka, K. & Zschau, J. (2011). Operational earthquake forecasting: state of knowledge and guidelines for utilization. Annals of Geophysics 54(4), 315–391. doi:10.4401/ag-5350
See also: Models — Classical · Models — ML · RMTPP · Neural Hawkes Process · Transformer Hawkes Process · Evaluation · Glossary · Honest-Limits.
⚠️ Disclaimer: read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity, never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast. It complements, and does not replace or speak for, official agencies; always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile, SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.
CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
Conditional probabilistic seismic forecasting — forecasts, never predictions.