███████╗ █████╗ ████████╗██╗ ██╗ ██████╗ ███╗ ███╗
██╔════╝██╔══██╗╚══██╔══╝██║ ██║██╔═══██╗████╗ ████║
█████╗ ███████║ ██║ ███████║██║ ██║██╔████╔██║
██╔══╝ ██╔══██║ ██║ ██╔══██║██║ ██║██║╚██╔╝██║
██║ ██║ ██║ ██║ ██║ ██║╚██████╔╝██║ ╚═╝ ██║
╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝
SAE feature decomposition reveals a cognitive property — commitment intensity — that is statistically significant, beats standard uncertainty baselines, and is completely invisible to three independent raw-activation probes across four transformer architectures.
SAEs are not interpretability aids. They are the spectroscopes of artificial cognition.
This repo is the research lab. For the production runtime that puts this instrument on live LLM agents, see fathom-lab/styxx, a one-line drop-in for openai, langchain, crewai, and autogen.
cross-architecture    d = 0.584, Fisher p = 0.00018
beats logit entropy   AUC 0.663 vs 0.607 (p = 0.013)
2D cognitive map      65% hallucination rate in the danger zone vs 22% in safe
SAE necessity         3 probes × 4 architectures, all null
architectures         Gemma-2-2B, Llama-3.2-1B confirmed
patents               3 filed: US 64/020,489 · 64/021,113 · 64/026,964
┌─────────────────────────────────────────────┐
│ S = max(CΔ) / K_τ                           │
│                                             │
│ CΔ(t) = coherence_late - coherence_early    │
│ K_τ   = count of spikes above threshold     │
│                                             │
│ mathematically equivalent to:               │
│   S = M × IPR(event_locations)              │
│                                             │
│ IPR = inverse participation ratio           │
│       (condensed-matter physics, 70+ years) │
└─────────────────────────────────────────────┘
high S ──► few intense commitment events ──► attractor lock-in
low S ──► many distributed events ──► exploration
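The scoring rule in the box above can be sketched in a few lines of plain Python. `commitment_score` and `ipr` are hypothetical helper names, not the repo's API, and the IPR here uses one common convention (sum of squared normalized weights):

```python
def commitment_score(c_delta, tau):
    """S = max(C_delta) / K_tau: peak coherence gain divided by the
    count of spikes above threshold tau (hypothetical sketch)."""
    k_tau = sum(1 for c in c_delta if c > tau)  # K_tau
    return max(c_delta) / max(k_tau, 1)         # guard: no spikes above tau

def ipr(weights):
    """Inverse participation ratio of a non-negative weight vector,
    one common convention: sum of squared normalized weights.
    Few intense events -> IPR near 1; many equal events -> IPR near 1/N."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)
```

A single intense spike gives high S (attractor lock-in); the same coherence gain spread over many small events gives low S (exploration).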
MODEL                   LAYERS  WINDOW    d       AUC    p
────────────────────────────────────────────────────────────────
Gemma-2-2B-IT           26      [0, 7)    +0.535  0.663  0.013
Llama-3.2-1B-Instruct   16      [0, 20)   +0.635  0.641  0.005
────────────────────────────────────────────────────────────────
POOLED                                    +0.584         0.00018
commitment window scales with model depth. fewer layers → more tokens to settle.
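The pooled row combines the per-model p-values with Fisher's method. A generic stdlib sketch of that combination (a hypothetical helper; the repo's exact pooling procedure, including which tests enter the pool, is not shown here):

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom under the global null.
    For even degrees of freedom the survival function has the closed
    form  exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # x/2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
```

For example, combining two tests at p = 0.05 each yields a pooled p of roughly 0.017, i.e. independent moderate evidence compounds.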
SIGNAL           AUC    p      SOURCE
──────────────────────────────────────────────
S_early (ours)   0.663  0.013  ◄── SAE coherence
logit entropy    0.607  0.053  standard
logprob          0.559  0.291  standard
top-2 margin     0.477  0.624  standard
──────────────────────────────────────────────
S is the ONLY feature reaching significance.
r(S, entropy) = -0.17 — nearly independent signals.
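The AUC column is the standard threshold-free ranking statistic: the probability that a hallucinated example scores higher on the signal than a faithful one. A pure-Python sketch via the Mann-Whitney view (an illustrative helper, not the evaluation code in this repo):

```python
def auc(scores, labels):
    """Area under the ROC curve as the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs the score ranks
    correctly, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the signal is no better than chance at ranking hallucinations above faithful completions; 1.0 means a perfect ranking.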
PROBE                  ARCHITECTURES  FISHER p  RESULT
──────────────────────────────────────────────────────────
top-k IPR              4              0.619     null
cross-layer cosine     4              0.948     null
cognitive fingerprint  4              —         AUC 0.566
──────────────────────────────────────────────────────────
SAE-based S (Gemma)    1              0.013     ✓ confirmed
SAE-based S (Llama)    1              0.005     ✓ confirmed
──────────────────────────────────────────────────────────
SAE decomposition is doing irreplaceable work.
Production runtime (recommended). Use styxx, a one-line drop-in wrapper that carries the same centroids and the same classifier, and reads cognitive state off any openai / langchain / crewai / autogen agent without loading a local model:

pip install "styxx[openai]"

from styxx import OpenAI

client = OpenAI()
r = client.chat.completions.create(model="gpt-4o", messages=[...])
print(r.vitals.gate)  # "pass" / "warn" / "fail"

Research pipeline. To run the SAE measurement engine directly from this repo (requires a local model checkpoint + transformer-lens + sae-lens):
cd api/
python - <<'PY'
from coherence_steerer_ext import CoherenceSteererExt

# load the SAE measurement engine on a local Gemma checkpoint
steerer = CoherenceSteererExt(model_name="google/gemma-2-2b-it")

# generate while recording per-token coherence deltas and logit entropy
result = steerer.generate_with_entropy(
    "Q: What is the capital of France?\nA:",
    max_tokens=20,
)

# one (C_delta, entropy) pair per generated token
for i, (cd, ent) in enumerate(zip(
    result.c_delta_trajectory,
    result.entropy_trajectory,
)):
    print(f"  t={i}: C_delta={cd:+.4f}  entropy={ent:.3f}")
PY

the classifier maps every token to one of four zones in a 2D (commitment, entropy) space:
SAFE       low commitment  + low entropy     grounded retrieval
UNCERTAIN  low commitment  + high entropy    exploring
RISKY      high commitment + low entropy     confident, verify
DANGER     high commitment + high entropy    hallucination signature
at inference time this becomes the hallucination / reasoning / adversarial / refusal signal that styxx emits on every call. the production classifier is frozen from the atlas centroids below.
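A minimal sketch of the zone assignment, assuming simple axis cutoffs. The threshold values and function name here are placeholders for illustration; the production classifier derives its boundaries from the frozen atlas centroids, not fixed cutoffs:

```python
def classify_zone(commitment, entropy, c_cut=0.5, h_cut=0.5):
    """Map a per-token (commitment, entropy) reading to one of the
    four zones. c_cut / h_cut are hypothetical placeholder thresholds."""
    high_c = commitment >= c_cut
    high_h = entropy >= h_cut
    if high_c and high_h:
        return "DANGER"      # hallucination signature
    if high_c:
        return "RISKY"       # confident, verify
    if high_h:
        return "UNCERTAIN"   # exploring
    return "SAFE"            # grounded retrieval
```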
the first public, cross-architecture cognitive state atlas of
open-weight language models.
v0.3 — H1 supported (pre-registered replication)
n = 6 model family pairs (Gemma-2, Gemma-3, Llama-3.2-3B,
Llama-3.2-1B, Qwen2.5-3B, Qwen2.5-1.5B)
mean LOO cos = +0.769 bootstrap CI [+0.571, +0.869]
permutation p = 0.0315 (one-sided, 2000 shuffles)
pre-registered in PREREG_v0.3_attractor_replication.md before
any v0.3 data was captured. analysis ran without modification.
→ full writeup: atlas/FINDINGS_v0.3.md
→ atlas overview + roadmap: atlas/README.md
→ sealed pre-registration: atlas/PREREG_v0.3_attractor_replication.md
→ rigor audit: atlas/FINDINGS_bulletproof_audit.md
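The headline statistic, mean leave-one-out cosine, asks whether each model family's attractor direction agrees with the consensus of the others. A sketch under the assumption that each family contributes one centroid vector (the atlas pipeline's actual preprocessing lives in atlas/analysis/ and may differ):

```python
import math

def mean_loo_cosine(vectors):
    """For each vector, cosine similarity to the mean of the remaining
    vectors; return the average over all leave-one-out folds."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    n = len(vectors)
    sims = []
    for i, v in enumerate(vectors):
        rest = [vectors[j] for j in range(n) if j != i]
        centroid = [sum(col) / len(rest) for col in zip(*rest)]
        sims.append(cos(v, centroid))
    return sum(sims) / n
```

Identical directions give a mean LOO cosine of +1; unrelated directions give values near 0, which is what the permutation null estimates.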
fathom-lab/fathom/
│
├── README.md                 you are here
├── coherence_steerer.py      core SAE measurement engine
├── depth_scorer.py           depth scoring via circuit attribution
├── fathom_oversight.py       D-axis + K/C/S measurement pipeline
├── requirements.txt          dependencies
│
├── paper/                    ICML 2026 workshop submission (LaTeX)
├── atlas/                    cognitive atlas v0.3 (see above)
│   ├── probes/               probe_set_v0.1.json, hash-pinned
│   ├── captures/             12 versioned per-model JSONs
│   ├── analysis/             bootstrap, permutation, estimator validation
│   ├── FINDINGS_*.md         v0.1 → v0.2 → v0.2.1 → v0.3 trail
│   └── PREREG_v0.3_*.md      sealed pre-registration
│
├── analysis/                 D-axis + S-axis analysis scripts
├── runners/                  experiment drivers
├── probes/                   SAE-free cross-architecture probes
├── verification/             reproducibility assertions
├── api/                      Fathom Scan API v2 + interactive demo
│
├── findings/                 documented results
├── prereg/                   OSF pre-registrations (locked before data)
├── docs/                     strategy, vision, applications
│
├── figures/                  publication figures (PDF + PNG)
└── truthfulqa_results/       experimental data (JSON)
python verification/verify_all_claims.py # 17 assertions from raw data
python verification/verify_s_axis.py      # 14 S-axis specific assertions

all claims in the paper reproduce exactly from the saved JSON data.
@article{rodabaugh2026fathom,
title = {Fathom: Cognitive Measurement Instruments for
Transformer Internals via SAE Feature Coherence Geometry},
author = {Rodabaugh, Alexander},
year = {2026},
note = {Zenodo concept DOI. doi:10.5281/zenodo.19504993}
}
@misc{rodabaugh2026atlas,
title = {The Fathom Cognitive Atlas v0.3: Pre-registered
Cross-Architecture Replication of the RLHF Attractor},
author = {Rodabaugh, Alexander},
year = {2026},
note = {Zenodo concept DOI. doi:10.5281/zenodo.19504993}
}

MIT on code. CC-BY-4.0 on the atlas data. patent pending on the underlying methodology: US Provisional 64/020,489 · 64/021,113 · 64/026,964.
Fathom Intelligence · fathom.darkflobi.com · Alexander Rodabaugh · 2026
