Skip to content

ETAS Model

Felipe Santibañez-Leal edited this page Jun 17, 2026 · 1 revision

ETAS — Epidemic-Type Aftershock Sequence model

The model to beat. ETAS is a self-exciting point process in which every earthquake — main, fore, or after — can trigger its own offspring, which in turn trigger theirs, so that observed seismicity is a superposition of a stationary background and an "epidemic" of cascading triggered events. It is the de-facto operational baseline for short-term earthquake forecasting: any candidate model, classical or neural, must demonstrate positive, significant information gain over a well-fit ETAS in prospective CSEP testing before it can claim forecasting skill. v0 of this product ships an ETAS-class core (see Models-Employed).

Honest framing. ETAS produces a conditional intensity — an instantaneous expected rate, given the history — never a deterministic prediction. Integrated over a horizon and combined with the Gutenberg–Richter magnitude term, it yields a bounded, calibrated probability shown next to its long-term baseline. Even at the peak of an active sequence, the absolute probability of a large event in the next day is usually well under a few percent; the relative gain over background can be large, but the absolute number stays small.

Conventions. $\mathcal{H}_t = {(t_i, x_i, y_i, m_i): t_i < t}$ is the history available at time $t$; magnitudes are homogenized to $M_w$; $M_0$ is a reference magnitude; $M_c$ is completeness; $b$ is the Gutenberg–Richter slope and $\beta = b\ln 10$.


Table of contents

  1. Intuition and history
  2. The conditional intensity
  3. Background term $\mu(x,y)$
  4. Utsu productivity $k(m)$
  5. Omori–Utsu temporal kernel $g(t)$
  6. Spatial kernel $f(r\mid m)$
  7. Branching ratio and the two stability gates
  8. Estimation — MLE and stochastic declustering
  9. The dual-catalog rule
  10. From intensity to the published probability
  11. Assumptions and honest limits
  12. Role in operational earthquake forecasting
  13. Worked illustration
  14. Structure of the process (diagram)
  15. References

1. Intuition and history

The classical Omori–Utsu law describes the aftershocks of one mainshock. But aftershocks have aftershocks, foreshocks are simply mainshocks of smaller events that came first, and real catalogs are a tangle of overlapping cascades. Yosihiko Ogata's insight (1988) was to stop labelling events as "main" or "after" and instead model seismicity as a self-exciting point process — a Hawkes process — in which every event, regardless of label, contributes a burst of elevated rate to its space–time neighborhood, and those bursts superpose.

The name is deliberately epidemiological. Each earthquake is like an infected individual: it "infects" its surroundings with a temporarily elevated probability of further earthquakes (offspring), each of which can infect again. The whole observed sequence is an epidemic of triggering on top of a steady background of spontaneous (independent) events driven by long-term tectonic loading. Ogata (1998) extended the temporal model to space–time, adding a spatial triggering kernel so the model forecasts where as well as how many and how big.

ETAS is "physics-free" in the sense that it encodes no fault mechanics directly — only the empirical Gutenberg–Richter, Omori–Utsu, and Utsu productivity scaling laws. Yet it remains, decades on, the model that neural and physics-based forecasters are measured against, because it captures the dominant, most predictable feature of seismicity — clustering — with a handful of interpretable parameters (see Methodology-History and Models-ML).


2. The conditional intensity

ETAS is defined by its conditional intensity $\lambda(t, x, y \mid \mathcal{H}_t)$: the instantaneous expected rate of events at time $t$ and location $(x,y)$, given everything that has happened before $t$. It is the sum of a background term and the triggered contributions of all past events:

$$\boxed{;\lambda(t, x, y \mid \mathcal{H}_t) = \mu(x,y) + \sum_{i:,t_i < t} k(m_i), g(t - t_i), f(x - x_i,, y - y_i \mid m_i);}$$

Each past event $i$ at $(t_i, x_i, y_i, m_i)$ adds a separable burst built from three factors:

  • $k(m_i)$productivity: how many offspring an event of magnitude $m_i$ seeds (§4),
  • $g(t - t_i)$temporal decay: the Omori–Utsu kernel, how that burst fades in time (§5),
  • $f(x - x_i, y - y_i \mid m_i)$spatial spread: where the offspring land relative to the parent (§6),

on top of $\mu(x,y)$, the time-independent background rate of spontaneous events (§3). The magnitude of each event is drawn independently from the Gutenberg–Richter density $f(m) = \beta e^{-\beta(m - M_c)}$ — magnitude is separable from the time–space triggering (a standard ETAS assumption).

The temporal-only special case (Ogata 1988), useful for intuition and for 1-D testing, is

$$\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{i:,t_i < t} \frac{K, e^{\alpha(m_i - M_0)}}{(t - t_i + c)^{p}} .$$

References. Ogata, Y. (1988), Statistical models for earthquake occurrences and residual analysis for point processes, J. Am. Stat. Assoc. 83(401), 9–27, doi:10.1080/01621459.1988.10478560; Ogata, Y. (1998), Space–time point-process models for earthquake occurrences, Ann. Inst. Statist. Math. 50(2), 379–402, doi:10.1023/A:1003403601725.


3. Background term $\mu(x,y)$

The background is the rate of spontaneous earthquakes — those not triggered by any catalogued predecessor, driven by long-term tectonic loading. It is taken time-independent and factored as

$$\mu(x,y) = \mu_0, u(x,y),$$

with $\mu_0$ the total background rate and $u(x,y)$ a normalized spatial density of where background events occur. In practice $u(x,y)$ is supplied by adaptive smoothed seismicity on a declustered catalog (Helmstetter–Kagan–Jackson; see Models-Classical §7), and $\mu_0$ either fit jointly or recovered by stochastic declustering (§8). The background is also the floor to which the conditional rate relaxes when no recent events are triggering — and the mandatory null any time-dependent model must beat (see Evaluation-and-Tests). On quiet days, $\lambda$ correctly reads near this climatological background.


4. Utsu productivity $k(m)$

The number of direct offspring an event seeds grows exponentially with its magnitude (the Utsu productivity law):

$$\boxed{,k(m) = A, e^{,\alpha (m - M_0)} \quad(\text{equivalently } K_0, e^{\alpha(m - M_0)}),}$$

  • $A$ (or $K_0$) — the aftershock-occurrence rate at the reference magnitude $m = M_0$,
  • $\alpha$ — the productivity parameter: how strongly a larger event seeds more aftershocks. A larger $\alpha$ means big events dominate the triggering; a smaller $\alpha$ means even small events contribute appreciably.
  • $M_0$ — a reference / threshold magnitude (often set at or above $M_c$).

$\alpha$ is one of the two parameters that govern stability (§7). Physically it is bounded: $\alpha$ is typically found near $\beta = b\ln 10$, and the regime $\alpha < \beta$ versus $\alpha \ge \beta$ is exactly the finite-branching condition derived below.


5. Omori–Utsu temporal kernel $g(t)$

The burst from each past event decays in time as the modified Omori law. Written as a normalized density in elapsed time (integrating to 1 over $t \in [0,\infty)$ for $p > 1$):

$$g(t) = \frac{p-1}{c}\left(1 + \frac{t}{c}\right)^{-p}, \qquad \int_0^\infty g(t),\mathrm{d}t = 1 \ \ (p > 1).$$

  • $c$ — the Omori–Utsu time offset (hours to a day),
  • $p$ — the decay exponent ($p \approx 1$–1.3 typically).

This is the same kernel as in the standalone Omori–Utsu law; ETAS's contribution is to attach one such decaying burst to every event and sum them, which is what produces secondary aftershocks and bursts that a single Omori law cannot. The same short-term incompleteness caveat applies: $c$ and $p$ are biased if the post-mainshock catalog is incomplete, so an incompleteness-aware likelihood / time-dependent $M_c(t)$ is used (see Omori-Utsu-Law §incompleteness).


6. Spatial kernel $f(r\mid m)$

Offspring cluster around their parent, within a zone that grows with the parent's magnitude (a larger rupture seeds aftershocks over a larger area). Ogata's (1998) inverse-power spatial kernel, normalized to integrate to 1 over the plane, is

$$\boxed{,f(x - x_i,, y - y_i \mid m_i) = \frac{q - 1}{\pi,\zeta^2}\left(1 + \frac{(x - x_i)^2 + (y - y_i)^2}{\zeta^2}\right)^{-q}, \qquad \zeta = D, e^{,\gamma (m_i - M_0)},}$$

  • $D$ — the characteristic triggering length at the reference magnitude $M_0$,
  • $\gamma$ — the magnitude scaling of the aftershock-zone size,
  • $q$ — the tail exponent, controlling how far triggering reaches (larger $q$ = tighter clustering).

The bandwidth $\zeta = D,e^{\gamma(m_i - M_0)}$ encodes the empirical scaling of rupture/aftershock dimension with magnitude. A Gaussian spatial kernel is an alternative used in some implementations; the inverse-power form has heavier tails and is common operationally. The product treats the inverse-power-vs-Gaussian choice, and anisotropic/finite-fault triggering for great earthquakes, as explicit design decisions (see Models-Employed).


7. Branching ratio and the two stability gates

The branching ratio $n$ is the single most important diagnostic of an ETAS fit: the average number of direct offspring per event, integrated over all magnitudes and all time. With the magnitude density $\beta e^{-\beta(m - M_0)}$:

$$n = \int_{M_0}^{M_{\max}}!!\int_{0}^{\infty} k(m), g(\tau), \beta e^{-\beta(m - M_0)}, \mathrm{d}\tau, \mathrm{d}m .$$

Because $g$ is normalized ($\int_0^\infty g,\mathrm{d}\tau = 1$ for $p>1$), the time integral is 1 and $n$ reduces to the magnitude integral of the productivity:

$$n = A \int_{M_0}^{\infty} e^{\alpha(m - M_0)}, \beta e^{-\beta(m - M_0)}, \mathrm{d}m = A,\beta \int_{M_0}^{\infty} e^{(\alpha - \beta)(m - M_0)}, \mathrm{d}m .$$

This integral converges only if $\alpha < \beta$, in which case $n = A,\beta / (\beta - \alpha)$. This exposes two logically distinct stability gates that the product enforces separately:

  1. Finite branching ($\alpha < \beta$). If $\alpha \ge \beta$ the productivity × magnitude integral diverges — the largest events dominate without bound and the model is improper. The constraint $\alpha < \beta$ (with $\beta = b\ln 10$) is a genuine real-world fitting gotcha; $\alpha$ and $b$ must be estimated consistently.
  2. Subcriticality / stationarity ($n < 1$). Given $\alpha < \beta$, the branching ratio must satisfy $n < 1$ for the cascade to die out:
    • $n < 1$subcritical: each generation is smaller; the sequence is stationary and the expected total number of descendants per event is finite, $1/(1-n)$.
    • $n = 1$critical.
    • $n > 1$supercritical: explosive, non-physical for a real finite catalog; a fit with $n \ge 1$ is rejected as a mis-fit.

The expected family size $1/(1-n)$ also shows why $n$ near 1 means strong, long-lived sequences and why estimating it accurately matters: a small error in $\alpha$ or $b$ moves $n$ — and hence the forecast amplitude — substantially.


8. Estimation — MLE and stochastic declustering

Maximum likelihood. The parameters $(\mu_0, A, \alpha, c, p, D, \gamma, q)$ are fit by maximizing the space–time point-process log-likelihood on the full, un-declustered catalog:

$$\ln L = \sum_{i} \ln \lambda(t_i, x_i, y_i \mid \mathcal{H}_{t_i})

  • \int_{0}^{T}!!\int_{A} \lambda(t, x, y \mid \mathcal{H}_t), \mathrm{d}x, \mathrm{d}y, \mathrm{d}t .$$

The first (sum) term rewards high intensity where events occurred; the second (compensator / integral) term penalizes total predicted intensity over the whole space–time volume and is what makes the fit a calibrated probability model rather than a regressor. The same log-likelihood is reused as the L-test score in CSEP evaluation.

Stochastic declustering / EM. A second, complementary estimator (Zhuang, Ogata & Vere-Jones 2002) assigns to each event a probability of being background versus triggered:

$$\rho_i = \frac{\mu(x_i, y_i)}{\lambda(t_i, x_i, y_i \mid \mathcal{H}_{t_i})} \quad(\text{background probability}), \qquad 1 - \rho_i = \text{triggered probability}.$$

Iterating (an EM scheme) between these weights and the parameters both recovers the background field $u(x,y)$ and decouples background from clustering — a principled, soft alternative to window-based declustering. It is the standard route to a self-consistent $\mu(x,y)$ for ETAS.

Bayesian / scalable inference. For uncertainty quantification and scalability the product can use a Bayesian posterior (e.g. INLAbru, or simulation-based ETAS), giving the full parameter covariance needed for honest forecast bounds rather than a single point estimate (see Technical-Architecture and Evaluation-and-Tests). Software in the ecosystem includes the R ETAS/bayesianETAS packages and pyetas.

References. Ogata, Y. (1988), doi:10.1080/01621459.1988.10478560; Zhuang, J., Ogata, Y. & Vere-Jones, D. (2002), Stochastic declustering of space–time earthquake occurrences, J. Am. Stat. Assoc. 97(458), 369–380, doi:10.1198/016214502760046925.


9. The dual-catalog rule

A subtle but decisive pipeline rule distinguishes a correct ETAS deployment from a naive one:

  • Background $\mu(x,y)$ is estimated on a declustered catalog (Zaliapin–Ben-Zion nearest-neighbour, or Gardner–Knopoff windowing as a cross-check), so the background reflects independent mainshocks, not aftershock-inflated rates.
  • The conditional triggering term is fit on the full, un-declustered catalog, because aftershock/foreshock triggering is the predictable short-term signal — declustering it away would discard exactly what ETAS exists to model.

Forecasts are scored on the non-declustered catalog, because the product deliberately forecasts clustering. A consequence that must be stated, not hidden: because most scored events are aftershocks, a model that merely reproduces Omori decay already passes the consistency tests — so skill is established only by winning a comparison test against a real ETAS baseline (positive information gain per earthquake; see Evaluation-and-Tests), never by passing consistency tests alone (see Models-Employed).

Reference. Zaliapin, I. & Ben-Zion, Y. (2020), Earthquake declustering using the nearest-neighbor approach in space-time-magnitude domain, J. Geophys. Res. Solid Earth 125, e2018JB017120, doi:10.1029/2018JB017120.


10. From intensity to the published probability

ETAS gives the conditional intensity $\lambda$. The public number is an exceedance probability at a region-appropriate threshold $M^\ast$ over a horizon $[0, T]$, built by integrating $\lambda$, applying the Gutenberg–Richter tail, and wrapping in the Poisson survival:

$$N_{\ge M^\ast} = \int_{0}^{T}!!\int_{A} \lambda(t, x, y \mid \mathcal{H}_t), \Phi(M^\ast), \mathrm{d}x, \mathrm{d}y, \mathrm{d}t, \qquad \Phi(M^\ast) = 10^{-b(M^\ast - M_c)},$$

$$\boxed{,P(\ge 1\ \text{event} \ge M^\ast) = 1 - e^{-N_{\ge M^\ast}},}$$

In practice the integral is evaluated by simulating a large ensemble of synthetic catalogs (≥ 10,000 per day) from the fitted intensity, which also yields the over-dispersion-honest, catalog-based uncertainty bounds that a naive Poisson interval would understate (see Evaluation-and-Tests and Models-Employed). The $1 - e^{-N}$ wrapper never changes; only the quality of $\lambda$ improves as the model improves, and quiet days correctly read near the climatological background.


11. Assumptions and honest limits

  • Isotropic, magnitude-scaled triggering. The standard spatial kernel is circular, whereas real aftershock zones are elongated along the rupture. For great earthquakes this is a known simplification; anisotropic / finite-fault ETAS variants exist and matter for subduction megathrusts (a flagged design decision for a Chile build; see Models-Employed).
  • Separable magnitude. Magnitudes are drawn from a fixed GR distribution independent of history; in reality $b$ varies in space and time, and that variation is carried separately with uncertainty.
  • Stationary background. $\mu(x,y)$ is assumed time-independent — violated near swarms, slow-slip episodes, and induced/injection-driven seismicity, where the background itself moves.
  • Post-mainshock incompleteness. The highest-stakes moment is exactly where the catalog is worst; without an incompleteness-aware likelihood, ETAS under-forecasts aftershocks right after a large event (see Omori-Utsu-Law §incompleteness).
  • Not a predictor. ETAS forecasts a rate; whether a small rupture cascades into a great earthquake depends on unmeasurably fine details of the crust, and a single outcome neither validates nor invalidates a probabilistic forecast (see Honest-Limits).

12. Role in operational earthquake forecasting

ETAS is the gold-standard physics-free short-term baseline and the v0 core of this product:

  • It is what real operational systems run: USGS Operational Aftershock Forecasting and OEF-Italy (an ensemble that includes ETAS) publish calibrated probabilities with uncertainty as scheduled services.
  • It is the benchmark gate: any neural temporal point process or hybrid must beat a well-fit ETAS in prospective CSEP testing — and, as of 2026, no pure neural model has robustly done so on the public benchmarks to date (see Models-ML). This is a requirement, not a claim that ML can never help.
  • It supplies the conditional intensity that, combined with GR and the Poisson wrapper, produces the daily, region-scoped, horizon-specific probabilities the product publishes (see Models-Employed and Pipeline).

The product ships a region-refit space–time ETAS with the full hygiene pipeline — per-region $M_c(x,y,t)$, $M_w$ homogenization, dual-catalog declustering, propagated parameter uncertainty, and full CSEP testability on a fine 0.1° grid — which is materially stronger by design than any hand-binned or coarse-rectangle Hawkes approximation.


13. Worked illustration

Consider a temporal ETAS with reference $M_0 = 3.0$, productivity $A = 0.2$, productivity scaling $\alpha = 1.5$, and a GR slope $b = 1.0$ so $\beta = \ln 10 \approx 2.303$.

Stability check (gate 1). Is $\alpha < \beta$? $1.5 < 2.303$ ✓ — the branching ratio integral converges. The branching ratio (gate 2):

$$n = \frac{A,\beta}{\beta - \alpha} = \frac{0.2 \times 2.303}{2.303 - 1.5} = \frac{0.4606}{0.803} \approx 0.57 .$$

Since $n = 0.57 < 1$ the process is subcritical — accepted. The expected total family size per event is $1/(1-n) = 1/0.43 \approx 2.3$ direct-plus-indirect descendants, a moderately clustering regime. Had a fit returned, say, $\alpha = 2.4 > \beta$, gate 1 would reject it outright (divergent); had it returned $n = 1.2$, gate 2 would reject it as supercritical.

Forecast amplitude. Suppose an $M,5.0$ event just occurred. Its productivity is $k(5.0) = A,e^{\alpha(5.0 - 3.0)} = 0.2,e^{1.5\times 2} = 0.2,e^{3} \approx 4.0$ expected direct offspring (above $M_0$). With an Omori–Utsu kernel ($c = 0.01$ d, $p = 1.1$), the fraction of those offspring expected in the first day is $\int_0^1 g(t),\mathrm{d}t = 1 - (1 + 1/c)^{-(p-1)} = 1 - (101)^{-0.1} \approx 1 - 0.63 = 0.37$, so roughly $4.0 \times 0.37 \approx 1.5$ events $\ge M_0$ are expected in the next day from this one parent, added on top of the background and the contributions of all other recent events. Folding in the GR exceedance to, say, $M^\ast = 5$ and the Poisson wrapper (§10) then yields the small absolute probability of a further large event — large relative to the quiet-day baseline, but still low in absolute terms, exactly the honest message the product conveys.


14. Structure of the process (diagram)

flowchart TD
    BG[Background mu of x,y<br/>spontaneous events<br/>smoothed seismicity, declustered] --> SUM[Conditional intensity<br/>lambda of t,x,y given history]
    subgraph TRIG[Triggering: summed over every past event i]
      P[Productivity k of m = A exp alpha m - M0]
      T[Temporal g of t = Omori–Utsu kernel]
      S[Spatial f of r given m, zeta = D exp gamma m - M0]
      P --> KER[k x g x f burst per parent]
      T --> KER
      S --> KER
    end
    KER --> SUM
    SUM --> GATE{Stability gates}
    GATE -- gate 1: alpha < beta --> OK1[finite branching]
    GATE -- gate 2: n < 1 --> OK2[subcritical / stationary]
    GATE -- n >= 1 or alpha >= beta --> REJ[reject fit: mis-fit]
    OK1 --> SIM[Simulate >= 10,000 catalogs over horizon]
    OK2 --> SIM
    SIM --> N[Expected count x GR exceedance Phi of M*]
    N --> PR[Probability P = 1 - exp - N<br/>+ over-dispersion-honest bounds]
    PR --> CSEP[Scored prospectively in CSEP<br/>must beat smoothed-seismicity AND ETAS]
Loading

Each past event contributes a separable burst (productivity × temporal × spatial) on top of the background; the summed intensity, gated for stability and simulated forward, becomes the published probability and is scored in CSEP.


References

  1. Ogata, Y. (1988), Statistical models for earthquake occurrences and residual analysis for point processes, J. Am. Stat. Assoc. 83(401), 9–27, doi:10.1080/01621459.1988.10478560.
  2. Ogata, Y. (1998), Space–time point-process models for earthquake occurrences, Ann. Inst. Statist. Math. 50(2), 379–402, doi:10.1023/A:1003403601725.
  3. Zhuang, J., Ogata, Y. & Vere-Jones, D. (2002), Stochastic declustering of space–time earthquake occurrences, J. Am. Stat. Assoc. 97(458), 369–380, doi:10.1198/016214502760046925.
  4. Helmstetter, A., Kagan, Y. Y. & Jackson, D. D. (2007), High-resolution time-independent grid-based forecast for M ≥ 5 earthquakes in California, Seismol. Res. Lett. 78(1), 78–86, doi:10.1785/gssrl.78.1.78.
  5. Zaliapin, I. & Ben-Zion, Y. (2020), Earthquake declustering using the nearest-neighbor approach in space-time-magnitude domain, J. Geophys. Res. Solid Earth 125, e2018JB017120, doi:10.1029/2018JB017120.
  6. Reasenberg, P. A. & Jones, L. M. (1989), Earthquake hazard after a mainshock in California, Science 243(4895), 1173–1176, doi:10.1126/science.243.4895.1173.
  7. Jordan, T. H. et al. (2011), Operational earthquake forecasting: state of knowledge and guidelines for utilization (ICEF Report), Annals of Geophysics 54(4), 315–391, doi:10.4401/ag-5350.
  8. Mizrahi, L., Nandan, S. & Wiemer, S. (2024), Developing, testing, and communicating earthquake forecasts: current practices and future directions, Reviews of Geophysics 62, doi:10.1029/2023RG000823.

See also: Gutenberg-Richter-Law · Omori-Utsu-Law · Models-Classical · Models-ML · Models-Employed · Evaluation-and-Tests · Glossary

Clone this wiki locally