Changelog and Progress

Felipe Santibañez-Leal edited this page Jun 18, 2026 · 2 revisions

A chronological, honest account of how CAOS_SEISMIC was built and what was decided along the way — including what is done, what is in progress, and what is not yet finished.

The project's value proposition is "calibrated, evaluated against reality." That is only credible if the record is honest about its own state, so this page deliberately keeps a sharp distinction between what is built and verified and what is not yet real. In particular, the headline scientific claims (real-data training, multi-region CSEP back-analysis with real numbers, a neural challenger that has actually beaten ETAS) are not yet finished at the time of writing, and this page says so plainly.

Reminder on framing. CAOS_SEISMIC is a conditional probabilistic forecaster, not an earthquake predictor. Deterministic prediction is impossible. Every milestone below is measured against calibration and information gain, never against "calling a quake."

Contents

  1. Status at a glance
  2. Phase 1 — Deep research
  3. Phase 2 — The repository build
  4. The hosting decision: GitHub Pages
  5. The global re-scope
  6. The web rebuild
  7. Governance and communication posture
  8. What is DONE vs PENDING
  9. Open follow-ups
  10. The roadmap from here

1. Status at a glance

| Area | State |
| --- | --- |
| Deep research (methodology, models, evaluation, data, tidal triggering, web design) | Done |
| Public repository scaffolded, built end-to-end, and published | Done |
| Forecasting core (ETAS reference + smoothed-seismicity null + Reasenberg–Jones fallback) | Done (fit on the real global catalog) |
| Catalog hygiene ($M_c$ / b-value, moment-magnitude homogenization, dual-catalog declustering) | Done (97,230 real events, 1990–2026) |
| Forecast-clock daily inference (P10/P90 bounds, isotonic calibration, QA gate) | Done (real global artifact live) |
| pyCSEP evaluation + pseudo-prospective back-analysis harness | Done (real 8-view numbers live) |
| Static web viewer (i18n EN/ES, light/dark, probability-field map + heatmap mode, six pages) | Done |
| Hosting decision (GitHub Pages, custom domain seismic.fasl-work.com) | Done + live |
| Global re-scope (one global field, many-country inference, bias evaluation) | Done |
| Real-data depth: full catalog + full ETAS fit + real multi-region CSEP back-analysis | Done + live |
| Neural context-conditioned challenger trained and gated on a CSEP win over ETAS | Trained; gate ≈ 0 (seismicity-only context) until covariates wired |
| Efficiency (tractable global pipeline; cadenced back-analysis) | Done (79→12 s inference; 13× faster back-analysis) |

The honest one-line summary: the headline scientific results are now produced and live. The global field trains on 97k real events (195 ETAS tiles, b=1.337) and the 8-view pseudo-prospective back-analysis is published. The central numbers:

  • Global context contribution — the global conditioned field gains 0.0835 nats/eq in information over the Poisson null (1-day, prospective), more than any single country (the global field aggregates worldwide triggering).
  • High-vs-low-seismicity bias — skill concentrates in active zones: the high−low gap is +0.0184 nats/eq (Japan leads at +0.072; low-seismicity interiors sit at 0 — the honest statement that a self-exciting model has nothing to add without aftershock sequences).
  • Neural challenger — trained, but ≈ ETAS by construction until real geodetic/stress covariates are wired; consistent with the field, where no neural point process beats ETAS prospectively as of 2026 (see the cited improvement-evidence).
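
The nats/eq figures above are information gains per earthquake. As a rough illustration of how such a number can be computed for two gridded Poisson forecasts, here is a minimal sketch. The function name and inputs are hypothetical and this is not the project's pyCSEP-based implementation; it only shows the arithmetic (per-event log rate ratio minus the difference in total forecast counts, divided by the number of observed events).

```python
import math


def information_gain_per_eq(rate_model, rate_null, observed):
    """Information gain per earthquake, in nats, of a model over a null.

    rate_model, rate_null: expected counts per cell (same cell order).
    observed: observed event counts per cell.
    Rates must be > 0 in every cell that contains an observed event.
    """
    n_obs = sum(observed)
    if n_obs == 0:
        raise ValueError("no observed events: gain per earthquake is undefined")
    # One log rate ratio per observed event (k events in a cell count k times).
    log_ratio = sum(
        k * (math.log(a) - math.log(b))
        for a, b, k in zip(rate_model, rate_null, observed)
        if k
    )
    # Penalty for forecasting more events in total than the null does.
    count_diff = sum(rate_model) - sum(rate_null)
    return (log_ratio - count_diff) / n_obs
```

This equals the difference in joint Poisson log-likelihoods divided by the event count, so identical forecasts give exactly zero. That matches the statement above that low-seismicity interiors sit at 0 when the conditioned field has nothing to add.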

Calibration is the honest open item: the forecast does not yet pass the N-test (the forecast event count still needs calibrating). This is surfaced here rather than hidden.
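
For readers unfamiliar with the N-test, the sketch below shows the standard Poisson form of the check: it asks whether the observed event count is plausible given the forecast's total expected count. The function names are illustrative and the project runs the test through pyCSEP; this standard-library version only shows the logic.

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson variable; fine for moderate means (exp(-mean) must not underflow)."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def n_test(n_forecast, n_observed):
    """Return (delta1, delta2): P(N >= n_observed) and P(N <= n_observed)."""
    delta1 = 1.0 - poisson_cdf(n_observed - 1, n_forecast)
    delta2 = poisson_cdf(n_observed, n_forecast)
    return delta1, delta2


def passes_n_test(n_forecast, n_observed, alpha=0.05):
    """Two-sided check: fail if the forecast count is too low or too high."""
    delta1, delta2 = n_test(n_forecast, n_observed)
    return delta1 >= alpha / 2 and delta2 >= alpha / 2
```

A very small `delta1` means the forecast under-predicts the count, and a very small `delta2` means it over-predicts. Either is a count-calibration failure of the kind this page reports.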


2. Phase 1 — Deep research

The project began with a deep, multi-source literature and tooling survey, treated as the authoritative specification for everything that followed: methodology, model design, evaluation, data sources, tidal triggering, and web/monitoring design.

The research produced cited reports spanning, in order:

  1. Problem framing and predictability epistemics — Operational Earthquake Forecasting (OEF) as the mainstream scientific framing; the hard limit that deterministic prediction is impossible; what "skill" can and cannot mean.
  2. Classical statistical seismology models — Gutenberg–Richter, Omori–Utsu, ETAS (Ogata 1998), EEPAS, BPT — with real equations and references.
  3. Modern / machine-learning approaches — and the honest verdict that, as of writing, no pure neural point process has robustly beaten a well-fit ETAS in prospective CSEP testing.
  4. Evaluation and testing framework — CSEP / pyCSEP, the consistency and comparison tests, proper scoring rules, the Molchan / Area-Skill-Score view, and the strict pseudo-prospective protocol. (See Evaluation and Tests.)
  5. Data sources — global and regional catalogs (USGS ComCat, ISC-GEM, GCMT, national networks), FDSN/ObsPy access, magnitude-of-completeness handling, declustering, and geophysical enrichers (slab geometry, plate boundaries, GNSS geodesy).
  6. Tidal triggering — at maximum detail: theory, effect sizes, tools, and the honest verdict that the tidal effect is real but small, concentrated in specific regimes, and usable only as a regularized covariate — never a standalone predictor.
  7. Web-app architecture and monitoring-section design — including the model-size and deploy feasibility analysis.

These were consolidated into a synthesis spec covering methodology, model design, the evaluation plan, data and pipelines, and the web-app spec. A directive fixed the posture: the research output is the authoritative spec — implement its definitions without re-asking what the research already resolved.

Several research conclusions were corrected and locked in during this phase, and they persist through the rest of the project:

  • Information gain is reported in nats, not bits.
  • Information gain over a Poisson baseline is state-dependent (large in active sequences, near zero in quiet periods) — there is no stable steady-state value to quote.
  • Skill must be established against ETAS, not just against a Poisson null, because both the model and ETAS already capture aftershock clustering.
  • The MAXC + 0.2 completeness correction is California-tuned, not universal, and must be re-validated per region.
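
To make the completeness point concrete, here is a minimal sketch of a Maximum-Curvature $M_c$ estimate with the additive correction exposed as a parameter, followed by the Aki–Utsu maximum-likelihood b-value. Function names and defaults are illustrative assumptions, not the repository's API, and the Goodness-of-Fit cross-check is omitted.

```python
import math
from collections import Counter


def mc_maxc(mags, bin_width=0.1, correction=0.2):
    """Maximum-Curvature Mc: the most populated magnitude bin plus a correction.

    The correction (0.2 here) is the value that must be re-validated per region.
    """
    # Integer bin indices avoid floating-point drift in the histogram.
    bins = Counter(round(m / bin_width) for m in mags)
    # Highest count wins; ties resolve to the lowest magnitude bin.
    mode_bin = min(bins, key=lambda b: (-bins[b], b))
    return mode_bin * bin_width + correction


def b_value_aki_utsu(mags, mc, bin_width=0.1):
    """Aki-Utsu MLE b-value for binned magnitudes at or above mc."""
    complete = [m for m in mags if m >= mc - 1e-9]
    if len(complete) < 2:
        raise ValueError("too few events above Mc")
    mean_mag = sum(complete) / len(complete)
    # The half-bin shift corrects for magnitude binning.
    b = math.log10(math.e) / (mean_mag - (mc - bin_width / 2))
    sigma = b / math.sqrt(len(complete))  # first-order uncertainty (Aki 1965)
    return b, sigma
```

Because `correction` is an argument rather than a constant, a per-region validation can sweep it and check how stable the resulting b-value is.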

Phase 1 outcome: complete. The authoritative spec exists and is adopted.


3. Phase 2 — The repository build

The product graduated from research into a real, public repository — built as a real, functional, scalable system, not a mockup. The build was verified end-to-end.

What was built (real implementations, not stubs):

  • Forecasting core: space–time ETAS (MLE fit, stability gates, simulation) + Reasenberg–Jones transparent fallback + a smoothed-seismicity Poisson null (the mandatory baseline every challenger must beat).
  • Catalog hygiene (load-bearing order): magnitude of completeness and b-value estimation (Maximum-Curvature + Goodness-of-Fit Test; Aki–Utsu MLE for b) → moment-magnitude homogenization (total-least-squares) → dual-catalog declustering (Gardner–Knopoff and Zaliapin–Ben-Zion).
  • Forecast-clock daily inference: the leakage-free driver that hands the model only the past catalog slice, emits the forecast, and seals it — producing an ensemble, P10/P90 bounds, isotonic calibration, and an operational QA gate.
  • Compact artifact writer — the few-hundred-KB-to-few-MB forecast file the web reads.
  • pyCSEP evaluation + pseudo-prospective back-analysis harness.
  • CLI (a typer command line) plus reproducible PowerShell and shell scripts, including the daily job (scoped publish), a Windows Task Scheduler entry, and a portable systemd timer.
  • Documentation and diagrams, and the static web app (see §6).
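
As an illustration of why the Reasenberg–Jones fallback counts as transparent, the sketch below evaluates its closed-form aftershock rate, $\lambda(t, M) = 10^{a + b(M_m - M)}(t + c)^{-p}$, integrated over a forecast window. The default parameters are the generic California values from Reasenberg and Jones (1989), used here only as placeholders. The per-regime values the project actually uses are not stated on this page, and the function names are hypothetical.

```python
import math


def rj_expected_count(mainshock_mag, min_mag, t_start, t_end,
                      a=-1.67, b=0.91, c=0.05, p=1.08):
    """Expected number of aftershocks with magnitude >= min_mag in [t_start, t_end] days."""
    productivity = 10.0 ** (a + b * (mainshock_mag - min_mag))
    if abs(p - 1.0) < 1e-12:
        # Limit of the Omori-Utsu integral when p equals 1.
        time_integral = math.log((t_end + c) / (t_start + c))
    else:
        time_integral = ((t_end + c) ** (1.0 - p)
                         - (t_start + c) ** (1.0 - p)) / (1.0 - p)
    return productivity * time_integral


def rj_probability(mainshock_mag, min_mag, t_start, t_end, **params):
    """Probability of at least one such aftershock, assuming Poisson occurrence."""
    expected = rj_expected_count(mainshock_mag, min_mag, t_start, t_end, **params)
    return 1.0 - math.exp(-expected)
```

Every quantity is a handful of arithmetic operations on four parameters, which is what makes this model a useful fallback when an ETAS fit is unavailable or unstable.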

Verification (the honest part): an end-to-end smoke test ran the whole short pipeline on a small slice of real catalog data — fetch → clean → estimate $M_c$ → condition ETAS → write artifact, with the QA gate passing. All modules imported cleanly, the unit-test suite passed, the web build was clean, and a secret-scan came back clean. (See the software-tests section of Evaluation and Tests.)

Honest caveat recorded at build time: the build ran across multiple work sessions; one intermediate run was cut mid-way and a second completed it, and the integration was verified manually afterward. This is noted because honest provenance is part of the project's posture.

Phase 2 outcome: the repository is live, public, and verified end-to-end on a small real slice. Full-scale fitting and real multi-region evaluation are Phase 3+ (pending).


4. The hosting decision: GitHub Pages

After a decision-focused review, hosting for this product was settled on GitHub Pages with a custom domain. The base web is live and serving.

The reasoning, in full:

  • The web app is a pure static viewer with no processing backend. It fetches a committed JSON artifact and renders maps and charts; there is no server that computes anything.
  • Compute (training + daily inference) runs off the web host entirely — on an always-on machine with a GPU — and publishes by committing the compact artifact to the public repo (the "git-as-data" pattern). Pages then auto-publishes on push, which is one fewer moving part than a VPS that has to pull.
  • This crystallized as an architecture-decision rule: static + public + no-backend + git-updated → GitHub Pages; anything with a backend or anything private → VPS. Pages is a deliberately narrow exception class; backend services stay on a VPS.

```mermaid
flowchart LR
    A["Always-on GPU machine<br/>(scheduled daily job)"] -->|"fetch -> hygiene -> condition ETAS<br/>-> simulate >=10k catalogs<br/>-> calibrate -> ONE compact artifact"| B["scoped: git add results/ -> commit -> push"]
    B --> C["Public repo<br/>(git-as-data)"]
    C -->|"Pages auto-publishes on push"| D["GitHub Pages<br/>static SPA viewer"]
    D --> E["Public web<br/>(maps + charts, no backend)"]
```

Operational shape of the publish: a scheduled daily job fetches the catalog delta, runs hygiene, re-conditions ETAS, simulates the synthetic-catalog ensemble, calibrates, writes one compact artifact, then scoped-commits it (git add results/ — an explicit allowlist, never git add -A) and pushes. The site rebuilds automatically. Content updates once per day as one small commit.
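
The scoped-commit step can be sketched as follows. This is a hypothetical helper, not the project's actual PowerShell or shell script. It assumes the artifact lives under `results/` as described above, and it builds the git command list separately from running it so the allowlist can be checked in isolation.

```python
import subprocess

ALLOWLIST = ["results/"]  # the only path the publish job may stage


def build_publish_commands(message, allowlist=ALLOWLIST):
    """Return the git commands for a scoped publish; reject option-like or catch-all paths."""
    for path in allowlist:
        if path.startswith("-") or path in (".", "*"):
            raise ValueError(f"refusing non-scoped pathspec: {path!r}")
    return [
        ["git", "add", "--", *allowlist],  # explicit paths only, never -A
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def publish(repo_dir, message):
    """Run the scoped publish; any failing step raises CalledProcessError."""
    for cmd in build_publish_commands(message):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Note that `git commit` exits non-zero when nothing is staged, so with `check=True` an unchanged artifact stops the job before the push rather than creating an empty commit.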

Why this is the right call: it is cheaper, simpler, and drops the runtime backend entirely, while matching the project's local-first research discipline (heavy data and model weights stay outside git; only the compact artifact is committed). A VPS remains an optional fallback compute host — the daily job is portable (a parallel shell script + systemd timer exists) and can move there unchanged if higher uptime is ever needed. The VPS is never a web backend.

Known trade-off, handled honestly: a laptop has lower uptime than a VPS — sleep, reboots, a closed lid — so a run can be missed and the forecast can go stale. This is mitigated by the scheduler's wake + missed-run catch-up, and, crucially, by the UI degrading honestly: a prominent staleness banner ("generated … · next run …") and a coverage mask make a late/stale forecast obvious. This is acceptable precisely because the product is an independent research/education tool, not an official civil-protection alarm.


5. The global re-scope

The project's geographic scope was re-aimed from a single focus region to global. The reasoning was direct: a region-only model is the opposite of the product's purpose.

What "global" means here:

  • Train on global seismicity plus complementary global covariates, feasible to update daily for inference.
  • Run inference across many countries — spanning high-seismicity regions (Chile, Japan, Indonesia, Mexico, Turkey, California, New Zealand) and low-seismicity regions (e.g. the UK, Germany, Australia, Brazil).
  • Explicitly compare across the seismicity spectrum to detect bias toward high-seismicity zones. This is a first-class evaluation goal, not a side effect: if the model is well calibrated only where data are abundant and silently mis-calibrated in quiet regions, the reliability diagrams and per-region CSEP scores are designed to reveal it. (See Evaluation and Tests §11.)

Architecturally, "any country is a view into one global field." The model side gained the machinery for this:

  • Tectonic-regime assignment — five regimes (subduction interface, intraslab, crustal / strike-slip, intraplate, ridge), assigned from slab geometry and plate-boundary data where available, with a self-contained heuristic fallback when those grids are absent (the source is always recorded). Per-regime ETAS priors are anchored on the published USGS tectonic-regime study (Page et al. 2016).
  • Global tiling — interior + halo tiles, where the halo gives edge-correct triggering and the interior defines ownership for aggregation, so the global field can be assembled without an intractable all-pairs computation.
  • A tiled forecaster — fits ETAS (plus its smoothed background) per tile on halo events with the regime prior, enforces the ETAS stability gates per tile (a thin or supercritical tile falls back to its smoothed null, recorded transparently), then routes every global cell to its owning tile and concatenates the per-tile expected counts into one global field. This stays the calibrated reference that any neural challenger must beat.
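
The interior/halo split can be illustrated with a deliberately simplified sketch: square latitude/longitude tiles, no dateline wrap, and no special handling at the poles. All names are hypothetical and the project's real tiling may differ. The point is only that each cell has exactly one owner, while a tile is fit on events from a wider halo box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    lon_min: float
    lat_min: float
    size_deg: float
    halo_deg: float

    def owns(self, lon, lat):
        """Interior test (half-open), so every point has exactly one owner."""
        return (self.lon_min <= lon < self.lon_min + self.size_deg
                and self.lat_min <= lat < self.lat_min + self.size_deg)

    def in_halo_box(self, lon, lat):
        """Interior plus halo: the events used to fit this tile."""
        h = self.halo_deg
        return (self.lon_min - h <= lon < self.lon_min + self.size_deg + h
                and self.lat_min - h <= lat < self.lat_min + self.size_deg + h)


def make_tiles(size_deg=30, halo_deg=5.0):
    """Cover the globe with square tiles (size_deg must divide 360 and 180)."""
    return [Tile(-180 + i * size_deg, -90 + j * size_deg, size_deg, halo_deg)
            for i in range(360 // size_deg) for j in range(180 // size_deg)]


def owning_tile(tiles, lon, lat):
    """Route a cell centre to the single tile whose interior contains it."""
    owners = [t for t in tiles if t.owns(lon, lat)]
    if len(owners) != 1:
        raise ValueError("point has no unique owner (e.g. lat == 90 or lon == 180)")
    return owners[0]


def fitting_events(tile, events):
    """Select the (lon, lat) events inside the tile's halo box."""
    return [(lon, lat) for lon, lat in events if tile.in_halo_box(lon, lat)]
```

An event just outside a tile's interior still contributes to that tile's fit through the halo, which is what keeps triggering edge-correct, but its own forecast cell is reported by the neighbouring tile that owns it.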

The model design is: a global ETAS reference + smoothed Poisson null as the calibrated baseline and safe default, with the thesis model being a context-conditioned neural temporal point process that ingests global covariates. (A convolutional network is used only as the spatial-context encoder — not as a standalone aftershock predictor, an approach the research recorded as a refuted lesson.) The neural model is gated: it reaches the public forecast field only on a CSEP win over ETAS plus calibration.
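
The gating rule can be sketched as a small decision function. The version below is a simplification under stated assumptions: it takes per-event log rate differences (challenger minus ETAS) and uses a normal-approximation confidence interval rather than the Student-t interval of the CSEP paired T-test. The names are hypothetical and the calibration check is passed in as a precomputed flag.

```python
import math
from statistics import NormalDist, mean, stdev


def challenger_gate(log_rate_diffs, count_diff, calibration_ok, alpha=0.05):
    """Decide whether the challenger may replace ETAS in the public field.

    log_rate_diffs: ln(rate_challenger) - ln(rate_etas) at each observed event.
    count_diff: total expected count of challenger minus that of ETAS.
    Returns (passes, (lower, upper)) where the interval is for the gain in nats/eq.
    """
    n = len(log_rate_diffs)
    if n < 2:
        return False, (float("nan"), float("nan"))
    gain = mean(log_rate_diffs) - count_diff / n
    # Normal approximation to the t quantile; adequate only for large n.
    z = NormalDist().inv_cdf(1.0 - alpha / 2)
    half_width = z * stdev(log_rate_diffs) / math.sqrt(n)
    lower, upper = gain - half_width, gain + half_width
    # Both conditions are required: a significant win AND passing calibration.
    return bool(lower > 0.0 and calibration_ok), (lower, upper)
```

A gain interval that straddles zero, as expected for a challenger with seismicity-only context, keeps ETAS as the shipped model.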

Status: the global re-scope is in progress. The regime and tiling machinery has landed and is tested; the full re-scope of configs, fetch, enrichers, model, and evaluation, plus the global data download, is underway. Deliverables are full versions only: tiny or partial runs serve as internal validation and are never shipped as a deliverable.


6. The web rebuild

The web app is a React + Vite + TypeScript single-page app, styled in a dark-technical palette, with internationalization (English-first, then Spanish) and light/dark themes. It is a pure static viewer: it renders the committed forecast artifact and computes nothing server-side.

Pages (six):

  • Introduction — what the system is and is not.
  • The problem — the honest epistemics: prediction is impossible; we give bounded conditional probabilities reflecting the best understanding under given conditions.
  • Methodology — tabbed, with the classical theories (Gutenberg–Richter, Omori–Utsu, ETAS, EEPAS, BPT, …), each with real equations and references.
  • Implementation — the model architecture and an SVG flow diagram of the pipeline.
  • Back-analysis — back-test results across diverse regions and periods (populated as real CSEP results land).
  • Monitoring — the default view is a world probability FIELD (a perceptually-uniform H3-tiled map rendered with MapLibre + deck.gl), with per-country drill-down and a no-map accessible summary.

Monitoring design principles (deliberate and honest):

  • No alarm dots. The default is a continuous probability field, never discrete "danger" markers.
  • Bounds triad — P10 / median / P90 always shown together; "the pessimistic view is a plausible bad case, not a prediction."
  • Always-visible horizon selector (1 d / 2 d / 7 d) and magnitude threshold, with every number shown next to its long-term baseline (ratio + absolute count).
  • The only place red appears is the calibration badge — red signals model quality, never danger.
  • Staleness banner + coverage mask so that blank never reads as "safe."
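
As a sketch of how the bounds triad and the baseline comparison can be derived from a simulated ensemble, the snippet below computes P10, median, and P90 from per-catalog event counts and reports the median next to a long-term baseline. The names and the returned dictionary layout are illustrative assumptions, not the artifact's actual schema.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    if not sorted_values:
        raise ValueError("empty ensemble")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def bounds_triad(ensemble_counts, baseline_count):
    """Summarize simulated counts as P10 / median / P90 plus ratio to baseline."""
    values = sorted(ensemble_counts)
    p10, median, p90 = (percentile(values, q) for q in (10, 50, 90))
    ratio = median / baseline_count if baseline_count > 0 else None
    return {
        "p10": p10,
        "median": median,
        "p90": p90,
        "baseline": baseline_count,        # absolute long-term count
        "ratio_to_baseline": ratio,        # None where the baseline is zero
    }
```

Returning the three bounds and the baseline in one record makes it harder for a UI component to display one of them without the others.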

The web build is clean (TypeScript + Vite, with the map code lazily split out of the main bundle so the no-WebGL summary path stays light).

Net change from the original research default: the research had assumed a thin FastAPI API serving the forecast. The adopted architecture removes the backend entirely — the "API" becomes static asset paths (e.g. latest.json, forecast/{date}.json.gz, calibration.json). FastAPI, if present at all, survives only as an optional local preview server, never on the production request path.
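
On the producer side, the static "API" amounts to writing files at predictable paths. The sketch below uses the example paths named above. The ISO date format, the contents of `latest.json`, and the function names are assumptions made for illustration.

```python
import gzip
import json
from datetime import date
from pathlib import Path


def artifact_paths(root, issue_date):
    """Map logical asset names to static file paths under the publish root."""
    return {
        "latest": root / "latest.json",
        "forecast": root / "forecast" / f"{issue_date.isoformat()}.json.gz",
        "calibration": root / "calibration.json",
    }


def write_artifact(root, issue_date, forecast):
    """Write the gzipped forecast and a small pointer file the viewer reads first."""
    paths = artifact_paths(root, issue_date)
    paths["forecast"].parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(paths["forecast"], "wt", encoding="utf-8") as fh:
        json.dump(forecast, fh, separators=(",", ":"))  # compact encoding
    pointer = {
        "date": issue_date.isoformat(),
        "path": paths["forecast"].relative_to(root).as_posix(),
    }
    paths["latest"].write_text(json.dumps(pointer), encoding="utf-8")
    return paths
```

With this layout the viewer needs only plain HTTP GETs of static files, so nothing on the production request path computes anything.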


7. Governance and communication posture

Honest technical calibration is necessary but not sufficient for a public live-number product. The communication-governance question was flagged as the highest real-world-harm dimension and was addressed explicitly.

The core risk is the field's cautionary tale, L'Aquila (2009): the harm there was a communication failure — false reassurance, not a failure to predict, and it produced criminal trials of scientists. A public site publishing live earthquake probabilities inherits that exact surface, with two symmetric failure modes: "you said 2% and it happened" (under-warning) and "you said elevated and nothing happened" (crying wolf). A single calibrated probabilistic forecast is not wrong in either case — but being technically right does not prevent out-of-context screenshots or press misreading.

Safeguards built in by design (within our control, implemented now):

  1. Never an alarm, never a "safe" state — no alarm dots, no countdown, no binary call; every number is shown against its long-term baseline with horizon and magnitude threshold attached.
  2. Complement, never compete — copy states plainly that this is an independent research/education tool that complements official OEF systems (USGS, INGV, national networks) and is not an authoritative civil-protection alarm; defer to official agencies for action.
  3. Calibration-first credibility — the live reliability diagram ("when we said 5%, it happened ~5%") is the headline artifact, and every forecast is logged at issue time and scored prospectively.
  4. Honest, not cosmetic, uncertainty — P10/median/P90 bounds, over-dispersion-aware; staleness banner + coverage mask.
  5. A visible, plain-language disclaimer and "how to read this" on every forecast surface, plus a methodology/limits page carrying the honest creed verbatim.
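
The reliability diagram named in safeguard 3 reduces to a simple binning of issued probabilities against outcomes. The following is a minimal sketch with hypothetical function names and equal-width bins. A production version would likely want finer or logarithmic bins, since most earthquake probabilities are small.

```python
def reliability_bins(forecast_probs, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, happened in zip(forecast_probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[idx] += p
        hits[idx] += 1 if happened else 0
        counts[idx] += 1
    return [(sums[i] / counts[i], hits[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i]]


def brier_score(forecast_probs, outcomes):
    """Mean squared difference between issued probability and binary outcome."""
    pairs = list(zip(forecast_probs, outcomes))
    return sum((p - (1.0 if y else 0.0)) ** 2 for p, y in pairs) / len(pairs)
```

A well-calibrated forecaster produces bins whose mean forecast and observed frequency lie close to the diagonal, which is the "when we said 5%, it happened ~5%" statement in numeric form.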

The terms-of-use / liability disclaimer is approved and implemented (research/education purpose, no warranty, not for life-safety decisions, limitation of liability — present in the LICENSE disclaimer, the README disclaimer, and the web app's framing). A lawyer's review remains advisable before a high-visibility launch given the L'Aquila precedent.

What still gates the public launch of live numbers (none of which blocks building, and none of which blocks publishing the historical back-analysis, which carries no real-time public harm surface):

  • the explicit go / no-go on publishing live public numbers vs launching back-analysis-only first (the recommended sequencing);
  • the coordination stance with the relevant official authority (credit + complement, never speak for or compete with);
  • a prepared press / social-misuse posture and a screenshot-resilient UI (every shared frame carries horizon + threshold + baseline + timestamp + "not an alarm" so an out-of-context crop still self-explains).

8. What is DONE vs PENDING

Done

  • Deep research consolidated into an authoritative spec (methodology, model design, evaluation plan, data and pipelines, web-app spec).
  • Public repository scaffolded, built, and published — real implementations of ETAS, Reasenberg–Jones, the smoothed null, catalog hygiene, the forecast-clock inference, the QA gate, the compact-artifact writer, the pyCSEP back-analysis harness, the CLI, and the reproducible scripts.
  • End-to-end verification on a small slice of real catalog data (smoke test passing, unit suite passing, clean web build, clean secret-scan).
  • Hosting decided (GitHub Pages, custom domain) with the base web live.
  • Web app built — six pages, i18n EN/ES, light/dark, the probability-field monitoring map.
  • Governance posture decided and the disclaimer/terms approved and implemented.
  • Global re-scope machinery — tectonic-regime assignment, global tiling, and the tiled forecaster — landed and tested.

Pending (honestly not yet finished)

  • Real-data depth at scale: the full catalog fetch, full feature build, and full ETAS fit (the smoke test verified the pipeline, not a production-scale fit).
  • Real multi-region CSEP back-analysis: running the pseudo-prospective harness across ≥4 regions and periods to populate the Back-analysis page with real numbers. The harness exists; the results do not yet.
  • The global re-scope, completed: configs, fetch, enrichers, model, and evaluation fully re-scoped to global, plus the global data download.
  • The neural challenger, trained and gated: the context-conditioned neural temporal point process trained on the GPU and shown to beat ETAS in our own pseudo-prospective CSEP harness (positive IGPE, T-test CI excluding zero) plus passing calibration — the only condition under which it reaches the public map. Until then, ETAS is what ships.
  • Wiring the live daily job end-to-end (scheduler + scoped publish + least-privilege deploy credential) for the live Monitoring numbers.
  • Public launch of live numbers — gated on the governance go/no-go (back-analysis can go public first).

The single most important honesty statement: the real training and real evaluation are not yet complete. The Back-analysis page shows real numbers only when the multi-region pseudo-prospective CSEP run has actually been performed — never in-progress or partial results dressed up as final.


9. Open follow-ups

Captured openly so the record stays honest:

Scientific / methodological:

  • The predictability ceiling is real — the best operational models beat a time-independent Poisson baseline by a modest information gain, mostly in aftershock-rich windows. Any claim beyond "modest, well-calibrated conditional probabilities" is overclaiming, and the web app must say so plainly.
  • ML must prove it beats physics. A famous deep-learning aftershock result was later shown to be largely reproducible by a two-parameter physical model under proper evaluation. The neural challenger therefore has to demonstrate a CSEP win over ETAS — it is never assumed.
  • Tidal triggering is a weak standalone signal — real but small, concentrated in specific regimes. It is used only as a regularized covariate; the web app must not imply tides "cause" forecastable quakes.
  • Catalog artifacts — completeness varies in space and time; declustering choices bias models — are handled explicitly and documented, never swept under the rug.

Engineering / product:

  • Auto-commit leakage is the #1 engineering risk — the publish job runs on a machine that has the raw data and environment, so the commit is scoped to an explicit allowlist (git add results/), with .gitignore as a second line and a pre-push guard as a third; the push credential is a dedicated, least-privilege, single-repo credential.
  • Laptop reliability — mitigated by scheduler wake + missed-run catch-up and the honest staleness UI; the job is portable to a VPS if uptime ever needs to improve.
  • No silent corrupt publishes — the QA gate refuses to commit an anomalous or stale artifact and leaves the last-good one in place.
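
A publish gate of the kind described in the last bullet can be sketched as a pure function that returns a verdict plus its reasons. Every field name (`generated_at`, `probabilities`, `expected_count`, `baseline_count`) and every threshold below is a hypothetical placeholder. The real gate's checks are not specified on this page.

```python
import math
from datetime import datetime, timedelta, timezone


def qa_gate(artifact, now, max_age=timedelta(hours=36), max_ratio=100.0):
    """Return (ok, reasons); the publish job commits only when ok is True."""
    reasons = []
    # Timestamps must be ISO 8601 with an explicit offset, e.g. "+00:00".
    generated_at = datetime.fromisoformat(artifact["generated_at"])
    if now - generated_at > max_age:
        reasons.append("stale")
    probs = artifact["probabilities"]
    if not probs:
        reasons.append("empty field")
    elif not all(math.isfinite(p) and 0.0 <= p <= 1.0 for p in probs):
        reasons.append("probability non-finite or outside [0, 1]")
    expected = artifact["expected_count"]
    baseline = artifact["baseline_count"]
    if not (math.isfinite(expected) and expected >= 0.0):
        reasons.append("invalid expected count")
    elif baseline > 0 and expected / baseline > max_ratio:
        reasons.append("expected count implausibly far above baseline")
    return (not reasons), reasons
```

When the gate fails, the caller simply skips the commit, which leaves the last-good artifact in place. The returned reasons can be logged for the operator.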

Open product/research decisions and pre-publication checks:

  • Real-time vs final-catalog input policy, documenting the optimistic bias where historical real-time states cannot be reconstructed.
  • Whether to publish a full predictive count distribution (enabling CRPS) in addition to the binary exceedance scalar (Brier-friendly).
  • The compute/storage budget for ≥10,000 synthetic catalogs per day per region per horizon over multi-year back-tests.
  • Whether to pursue true prospective CSEP registration (depositing forecasts with a testing center) for the strongest possible credibility, alongside the pseudo-prospective back-analysis and a live prospective tail.
  • Pinning the exact smoothed-seismicity and ETAS baseline implementations so the comparison is fair and the baselines are themselves defensible.
  • A small set of pre-publication citation checks to confirm from full text before they appear on the public site.

10. The roadmap from here

  1. Deep research — done.
  2. Data pipeline + environment — done (built and smoke-verified; full-scale fetch pending).
  3. Model + training (on the GPU machine) — ETAS reference + Poisson null first (built), then the gated neural challenger (pending). Compute is no longer the ceiling; the design target is the strongest validated model, not the biggest network.
  4. Back-analysis + evaluation — pseudo-prospective CSEP across ≥4 regions and periods (harness built; real numbers pending). This feeds the web Back-analysis section with honest, real results. (See Evaluation and Tests.)
  5. Web app (static viewer) — built; refreshed as real results land.
  6. Daily inference + publish — scheduled job → scoped commit → push → the static site updates. The public launch of live numbers is gated on the governance sign-off; back-analysis can go public first.

Public technical documentation. This page is kept deliberately honest about what is built and verified versus what is not yet finished — because "evaluated against reality" is only a credible claim if the record of progress is itself truthful.
