-
Notifications
You must be signed in to change notification settings - Fork 0
Smoothed Seismicity
A single-topic deep page. Smoothed seismicity is the model that answers "where do earthquakes happen?" by turning a catalog of past small events into a smooth, stationary map of the long-term rate. It is the most consequential "boring" model in the whole system, because it plays two jobs at once: it is the spatial background
$\mu(x,y)$ that the time-dependent ETAS model needs, and it is the stationary, time-independent Poisson reference — the mandatory null that every time-dependent or machine-learning forecaster must demonstrably beat in prospective CSEP testing before it can claim any forecasting skill at all.
Honest framing. This page describes a forecast, not a prediction. Smoothed seismicity
produces a conditional rate density — a number of expected events per unit area per unit time
per magnitude bin — never a deterministic statement that an event will or will not occur. It is
time-independent by construction: it says nothing about when, only about the long-run where
and how often. The published quantity is always a bounded probability in
- Intuition and history
- The governing model — kernel density of a point process
- The adaptive power-law kernel
- Why adaptive bandwidth, not fixed bandwidth
- From a spatial density to a space–rate–magnitude forecast
- Declustering: the long-term map must not be a snapshot of a burst
- Parameter estimation and retrospective optimization
- The dual role: background for ETAS, and the mandatory null
- Assumptions, strengths and limitations
- A worked illustration
- Role in operational earthquake forecasting
- References
The founding empirical observation is simple and robust: large earthquakes tend to nucleate where
small earthquakes have clustered in the past. Microseismicity (down to
The naïve way to use this observation — counting events in fixed grid cells — is badly behaved: cells are arbitrary, the map is discontinuous at cell boundaries, an empty cell next to a busy cell reads as "impossible" when it is merely "undersampled", and the resolution is fixed regardless of how much data there is locally. Kernel smoothing fixes all of these. Instead of binning, each past event is replaced by a smooth "bump" (a kernel) centred on its epicentre, and the bumps are summed to give a continuous density field. Where events are dense the bumps overlap and the density is high; where events are sparse the density tapers smoothly to a low, non-zero floor.
The decisive methodological step — and what makes this a genuinely competitive forecast rather than a textbook exercise — is the adaptive bandwidth introduced by Helmstetter, Kagan & Jackson (2007). The width of each bump is not fixed; it is set by the local data density, so that well-sampled fault traces are imaged sharply and sparsely-sampled regions are smoothed broadly. This single idea let an unassuming, parameter-light, time-independent model become the best-performing time-independent forecast in the RELM/CSEP California experiments — beating several far more elaborate models. Werner et al. (2011) later extended the same machinery to time-dependent adaptive smoothing, and the testing/evaluation discipline around it grew into the modern CSEP framework. The model is now a standard ingredient of national hazard models and the default background term of operational ETAS implementations.
flowchart LR
A["Catalog of past<br/>small events (M ≥ 2)"] --> B["Decluster<br/>(remove aftershock bursts)"]
B --> C["Place an adaptive kernel<br/>on each event"]
C --> D["Sum kernels →<br/>smooth density μ(x,y)"]
D --> E["× Gutenberg–Richter<br/>magnitude tail"]
E --> F["× time scaling"]
F --> G["Space–rate–magnitude<br/>forecast λ(x,y,m)"]
G --> H1["Role 1:<br/>background μ for ETAS"]
G --> H2["Role 2:<br/>stationary Poisson NULL<br/>(model to beat)"]
Treat the past epicentres
where
so that if we sum
The same object can be written as the background term of the general
conditional intensity of a marked space–time
point process. In a pure smoothed-seismicity (time-independent) forecast, the triggering sum is
absent and the intensity is just
Helmstetter, Kagan & Jackson (2007) use an isotropic power-law kernel rather than a Gaussian, because earthquake epicentres have heavy spatial tails — neighbouring faults are correlated over long distances and a Gaussian decays far too fast. The kernel has the form
-
$r$ — horizontal distance from the kernel's centre (the event) to the evaluation point. -
$d$ — the bandwidth (smoothing length) of this kernel. -
$s$ — the power-law exponent controlling how fast the kernel decays at large$r$ . -
$C(d)$ — the normalization constant, chosen so the kernel integrates to 1 over the plane.
For a power-law
so that
Implementation note — do not hard-code the exponent. The exponent
$s$ and the matching normalization$C(d)$ are not a single universal constant across the Helmstetter–Kagan–Jackson family of papers. Both a power-$1$ form ($s = 1$ , $\propto 1/(r^2 + d^2)$) and a power-$3/2$ form ($s = 3/2$ ,$\propto 1/(r^2 + d^2)^{3/2}$ ) appear, depending on the specific kernel, the normalization convention, and whether the smoothing is done on a sphere or a plane. The correct engineering practice is to pin$s$ ,$C(d)$ and the neighbour count to one specific reference implementation (Helmstetter–Kagan–Jackson 2007 / Werner et al. 2011, or the pyCSEP / floatCSEP reference code) and validate it on a known catalog, rather than treating$s = 3/2$ as a verified universal value. Throughout this page$s$ is kept symbolic for exactly this reason.
The defining feature of the model is that each event gets its own bandwidth
with
-
Where data are dense (an active, well-imaged fault trace), the
$n$ -th neighbour is close, so$d_i$ is small and the kernel is narrow — the fault is imaged sharply. -
Where data are sparse (a quiet region, the edge of a network), the
$n$ -th neighbour is far, so$d_i$ is large and the kernel is broad — the rate is spread out and no single isolated event produces a spurious hot-spot.
This automatic, data-driven trade-off is exactly what a fixed-bandwidth estimator cannot do. A fixed bandwidth narrow enough to resolve a busy fault would produce noisy, spiky garbage in sparse regions; a fixed bandwidth broad enough to smooth the sparse regions would blur every fault into mush. The adaptive bandwidth resolves the resolution-vs-stability tension locally, and it is the single reason the method is competitive.
flowchart TB
subgraph Dense["Dense region (active fault)"]
D1["n-th neighbour is CLOSE"] --> D2["small bandwidth d_i"] --> D3["NARROW kernel<br/>sharp fault image"]
end
subgraph Sparse["Sparse region (quiet area)"]
S1["n-th neighbour is FAR"] --> S2["large bandwidth d_i"] --> S3["BROAD kernel<br/>smooth, stable floor"]
end
A subtlety worth stating: a fixed bandwidth makes the smoothing time-independent and stationary by construction, but the adaptive bandwidth introduces a mild dependence on the catalog used to define the neighbours. For the canonical time-independent forecast, the bandwidths are computed once from a long learning catalog and then held fixed, preserving stationarity. Werner et al. (2011) deliberately relax this to build a time-dependent adaptive-smoothing variant that updates as new events arrive — a different, short-term product.
A CSEP-testable forecast is a rate in every space–magnitude bin over a fixed horizon. The
smoothed density
The expected number of events in spatial cell
where
The single number a user sees is then the exceedance probability at a region-appropriate target
magnitude
This is the same
A smoothed-seismicity background must reflect the long-term average rate of independent events, not the transient inflation of an ongoing aftershock sequence. If the raw catalog is smoothed directly, a single recent large mainshock and its thousands of aftershocks would paint a huge, spurious hot-spot that has nothing to do with the long-run loading — the map would be a photograph of last month's burst, not the decade's tectonics. Therefore the catalog is declustered first: aftershock and foreshock bursts are stripped so that only (approximately) independent mainshocks remain, and the smoothing then images the stationary spatial structure.
This is one half of the system's dual-catalog rule, which is worth stating crisply because it is the most common pipeline mistake in the field:
| Quantity | Catalog used | Why |
|---|---|---|
| Smoothed background |
Declustered | Must reflect independent long-term rate, not aftershock inflation. |
| ETAS / triggering term | Full (un-declustered) | Aftershock/foreshock triggering is the predictable short-term signal. |
Declustering methods range from the classic Gardner–Knopoff space–time windowing (transparent, easy to audit) to the modern Zaliapin–Ben-Zion nearest-neighbour method (statistically principled, based on a rescaled space–time–magnitude proximity). The choice of declustering is itself an assumption that propagates into the background map, so it is treated as a documented, versioned decision and ideally cross-checked across methods. (Declustering is deceptively dangerous: it is the step where it is easiest to silently delete real signal.)
The smoothed-seismicity model is refreshingly parameter-light. The quantities to set are:
-
The neighbour count
$n$ (which controls all bandwidths$d_i$ ). Typically$n \approx 6$ , but it is optimized retrospectively: the learning catalog is split into a fitting period and a target period, the forecast is generated for a grid of candidate$n$ values, and the$n$ that maximizes the out-of-sample log-likelihood of the target events is selected. -
The kernel exponent
$s$ (and the matching normalization $C(d)$) — pinned to the chosen reference implementation (§3). -
The completeness magnitude
$M_c$ — the lowest magnitude included in the smoothing catalog (small events dominate the count, so a stable, low$M_c$ is what gives the map its resolution). -
The Gutenberg–Richter
$b$ -value — for the magnitude factor; estimated by the binning-corrected Aki–Utsu MLE, never hard-coded to 1.
The optimization objective is the point-process log-likelihood of the held-out target events
under the forecast intensity. For a Poisson forecast with binned rates
summed over space–magnitude bins
Crucially, all hyperparameters must be frozen on data strictly before the forecast period — the forecast-clock causal cutoff — so that the reported skill is genuinely prospective and not a hindsight artifact.
Smoothed seismicity earns its central place because it serves two structurally different purposes.
Role 1 — the spatial background
Role 2 — the stationary Poisson null. A time-independent forecast is, by definition, the hypothesis "the future rate equals the smoothed past rate, with no time-dependence." This is the honest null hypothesis for the whole product. Any model that adds time-dependence — ETAS, a neural point process, a tidal covariate — must show, in prospective CSEP comparison tests, a positive and statistically significant information gain per earthquake over this null. If it cannot, the added machinery is adding noise, not skill, and must not touch a public number.
flowchart LR
SS["Smoothed seismicity<br/>μ(x,y)"]
SS -->|"Role 1: background term"| ETAS["ETAS conditional intensity<br/>λ = μ + triggering"]
SS -->|"Role 2: stationary null"| NULL["Time-independent<br/>Poisson reference"]
ETAS -->|"must beat (prospective IGPE > 0)"| NULL
ML["Neural / hybrid model"] -->|"must beat"| NULL
ML -->|"must beat"| ETAS
This is a sharp, honest bar. In prospective California CSEP, even the spread of time-independent models against a smoothed-seismicity reference is only of order a fraction of a nat per earthquake — the null is genuinely hard to beat, and saying so openly is part of the product's credibility.
Core assumptions.
- The future resembles the smoothed past locations. Future large events occur (in a smoothed sense) where past small events have occurred.
- Stationarity. The spatial rate does not change over the forecast horizon (true to good approximation over years, by construction in the time-independent variant).
-
Separability of magnitude. The Gutenberg–Richter distribution is the same everywhere (a
simplification —
$b$ genuinely varies in space, which a refined version maps locally). - Isotropy of the kernel. Each event smears out in a circularly symmetric bump, whereas real seismicity is elongated along fault strike (an anisotropic kernel does better but adds parameters).
Strengths.
- Best-performing time-independent forecast in the RELM/CSEP California experiments — a strong, hard-to-beat baseline despite its simplicity.
-
Parameter-light and transparent. One main hyperparameter (
$n$ ), no hidden tuning, fully auditable. This is exactly the property you want in a null. - Naturally regularized. The adaptive bandwidth prevents both over-fitting in dense regions and spurious hot-spots in sparse ones.
- Reusable. The same map is both the ETAS background and the null, so it is computed once and serves two purposes.
Limitations (stated honestly).
- Blind to silent faults. A fault that is fully locked and has produced no recent microseismicity gets a low rate — yet it may be precisely where the next great earthquake strikes. This is the model's deepest weakness, and it is why geodetic strain (which images loading on silent faults) is a valuable complementary input to the background term.
- No time information. By design it cannot raise the rate during an active sequence — that is ETAS's job. On its own it under-warns during aftershock storms and over-warns on the day after they end.
-
Catalog-quality sensitive. A change in network detection capability changes
$M_c$ and the apparent spatial density; an apparent "hot-spot" can be a network artifact, so$M_c(x,y,t)$ must be monitored and homogenized. - Declustering-dependent. The background inherits whatever assumptions the declustering method makes; an over-aggressive declusterer erases real clustered loading.
Consider a simplified region with a declustered catalog of
-
Bandwidths. For an event on the busy trace, its
$n = 6$ -th nearest neighbour sits at, say,$d_i \approx 2$ km, so its kernel is narrow. An isolated event off-fault might have$d_i \approx 30$ km, smearing its contribution broadly. -
Spatial weight. Summing the kernels and integrating over our target cell gives a spatial probability weight, say
$\int_{\text{cell}}\mu \approx 3\times 10^{-3}$ (the cell holds 0.3 % of the region's long-term rate). On a quiet cell far from any trace this might be$10^{-6}$ — small, but never exactly zero, which is the point of smoothing. -
Temporal rate. The total rate
$\ge M_c$ is$N/T = 10{,}000/20 = 500$ independent events per year region-wide. -
Magnitude factor. With
$b = 1.0$ , the Gutenberg–Richter fraction of events reaching$M \ge 6$ relative to$M \ge M_c = 2$ is$10^{-b(6-2)} = 10^{-4}$ . -
Expected number in the cell, per year. $$N_{\ge 6} = \underbrace{3\times 10^{-3}}{\text{spatial}} \times \underbrace{500}{\text{rate/yr}} \times \underbrace{10^{-4}}_{\text{GR}} = 1.5\times 10^{-4}.$$
-
Annual probability.
$$P(\ge 1\ M!\ge!6 \text{ in the cell, 1 yr}) = 1 - e^{-1.5\times 10^{-4}} \approx 1.5\times 10^{-4} ;;(\approx 0.015%).$$
The number is deliberately tiny — a single small cell over a single year is a rare-event problem, and an honest time-independent forecast says so. The value of the model is not any single cell's number but the relative field: this busy-trace cell carries a rate three orders of magnitude higher than a quiet off-fault cell, and that spatial contrast — calibrated and testable — is the information content. A genuine time-dependent model is then judged by whether it can sharpen this field and track when, beating the flat-in-time null it started from.
In an operational system that issues one forecast per day, smoothed seismicity is the always-on floor:
- It is the cold-start / quiescent-cell rate. Where recent seismicity is sparse or absent, the
conditional rate does not collapse to zero and does not jump to an arbitrary hard-coded floor — it
floors to the long-term smoothed background
$\mu(x,y)$ . This is what makes a "quiet" cell read as low-but-real climatology rather than impossible (and, critically, a blank cell must never be allowed to read as "safe"). - It is the denominator of every skill claim. The daily pipeline scores the live forecaster's information gain against this null. Reporting "we beat climatology by X nats during the sequence" is only meaningful because the null is fixed, transparent and pre-registered.
- It anchors the public number. Every probability is shown next to its smoothed-seismicity / climatological baseline, so a user reads "today: X % vs. baseline Y %," never an unanchored figure that invites alarm.
Because most of any map and most of any year are quiet, the calibration of the product is dominated by these background-rate cells. The reliability diagram ("when we said 2 %, it happened ~2 % of the time") must therefore be validated specifically in the cold-start regime, not only during the rare, dramatic active sequences. A model that is well-calibrated only during aftershock storms but mis-calibrated on quiet days is mis-calibrated most of the time. Getting the boring background right is, quietly, most of the job.
- Helmstetter, A., Kagan, Y.Y. & Jackson, D.D. (2007). High-resolution time-independent grid-based forecast for M ≥ 5 earthquakes in California. Seismological Research Letters 78(1), 78–86. doi:10.1785/gssrl.78.1.78
- Werner, M.J., Helmstetter, A., Jackson, D.D. & Kagan, Y.Y. (2011). High-resolution long-term and short-term earthquake forecasts for California. Bulletin of the Seismological Society of America 101(4), 1630–1648. doi:10.1785/0120090340
- Zechar, J.D., Schorlemmer, D., Werner, M.J., Gerstenberger, M.C., Rhoades, D.A. & Jordan, T.H. (2013). Regional Earthquake Likelihood Models I: First-order results. Bulletin of the Seismological Society of America 103(2A), 787–798. doi:10.1785/0120120186
- Schorlemmer, D., Gerstenberger, M.C., Wiemer, S., Jackson, D.D. & Rhoades, D.A. (2007). Earthquake likelihood model testing. Seismological Research Letters 78(1), 17–29. doi:10.1785/gssrl.78.1.17
- Wiemer, S. & Wyss, M. (2000). Minimum magnitude of completeness in earthquake catalogs: examples from Alaska, the western United States, and Japan. Bulletin of the Seismological Society of America 90(4), 859–869. doi:10.1785/0119990114
- Ogata, Y. (1998). Space–time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50(2), 379–402. doi:10.1023/A:1003403601725
- Zaliapin, I. & Ben-Zion, Y. (2020). Earthquake declustering using the nearest-neighbor approach in space–time–magnitude domain. Journal of Geophysical Research: Solid Earth 125, e2018JB017120. doi:10.1029/2018JB017120
- Jordan, T.H., Chen, Y.-T., Gasparini, P., Madariaga, R., Main, I., Marzocchi, W., Papadopoulos, G., Sobolev, G., Yamaoka, K. & Zschau, J. (2011). Operational Earthquake Forecasting: state of knowledge and guidelines for utilization. Annals of Geophysics 54(4), 315–391. doi:10.4401/ag-5350
See also: Models — Classical · Brownian Passage Time · Rate-and-State & Coulomb · Evaluation & Tests · Data Sources · Methodology & History. All equations are transcribed from the primary/authoritative sources cited above.
⚠️ Disclaimer — read this. CAOS_SEISMIC produces probabilistic forecasts, not predictions. It is an independent research and education tool. It is NOT an official earthquake early-warning or civil-protection system, it does NOT predict when, where, or how large an earthquake will be, and it must NOT be used for life-safety, emergency, or evacuation decisions. Every number it publishes is a bounded, calibrated probability conditioned on the present state of seismicity — never an alarm, a countdown, or a "safe" state. A single outcome neither confirms nor refutes a probabilistic forecast.It complements, and does not replace or speak for, official agencies — always follow your national seismological and civil-protection authorities (e.g. USGS, INGV, CSN (Chile, SENAPRED for civil protection), GeoNet, JMA). The software is provided "as is", without warranty of any kind (MIT License); the authors accept no liability for its use. Data are courtesy of their providers (USGS/ANSS, ISC/ISC-GEM, Global CMT, EMSC, CSN, and others) under their respective licenses and attribution terms. See Honest-Limits for the full epistemic context.
CAOS_SEISMIC · seismic.fasl-work.com · source · MIT
Conditional probabilistic seismic forecasting — forecasts, never predictions.
Overview
Methodology & History
Classical models
- Models-Classical · index
- Gutenberg-Richter-Law
- Omori-Utsu-Law
- ETAS-Model
- Reasenberg-Jones-Model
- STEP-Model
- EEPAS-Model
- Smoothed-Seismicity
- Brownian-Passage-Time
- Rate-and-State-and-Coulomb
ML & analytical methods
- Models-ML · index
- Temporal-Point-Processes
- RMTPP
- Neural-Hawkes-Process
- Transformer-Hawkes-Process
- RECAST-and-FERN
- CNN-Spatial-Models
- Graph-and-Recurrent-Networks
- Detection-vs-Forecasting
Models employed
Data
Architecture
Evaluation
Progress
Reference