rca-probe

Geometric diagnostics for transformer hidden states.

rca-probe is an open-source reference implementation of the Relational Coherence Attractor (RCA) geometric probes for decoder-only causal language models. It computes per-token, per-layer invariants (effective rank ratio k99/d, spectral slope β, layer drift, and discrete curvature) and has been validated on four trained models from three vendors (EleutherAI, OpenAI, Microsoft), plus a random-init negative control.

License: Apache-2.0
Author: Philipp Pikula-Albring (ORCID 0009-0009-2795-2494), Berlin
Status: v0.1 — measurement only. No training, no fine-tuning, no model surgery.

Headline findings

Five empirical runs on an Apple M1 Pro CPU (Pythia-160m, Pythia-1.4b, GPT-2 124M, Phi-1.5, GPT-2 random-init) jointly establish two results:

(F1) Spectral-slope universality. All four trained models cluster at β ≈ 7–8 across all intermediate layers. The random-init baseline starts at β ≈ 0.5 (isotropic noise) and approaches the trained band only on the final layer.

(F2) Late-readout signature. Median layer drift is moderate (~10–30) through the bulk of the network and then spikes by 5–25× on the final layer in every trained model. This spike is essentially absent in the random-init control.
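F1 and F2 rest on two quantities: the spectral slope β of the hidden-state covariance and the drift between adjacent layers. The README does not spell out the estimators, so the sketch below fixes plausible conventions as assumptions rather than describing rca-probe's code: β is the negative slope of a least-squares fit of log λ_i against log i, and the final-layer "spike" is the median final-layer drift divided by the median drift over all earlier layers. All function names are hypothetical, not the rca_probe API.

```python
import numpy as np


def covariance_eigenvalues(h: np.ndarray) -> np.ndarray:
    """Eigenvalues of the token covariance of h (shape: n_tokens x d), largest first."""
    centered = h - h.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(h.shape[0] - 1, 1)
    eigs = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return np.clip(eigs, 0.0, None)  # clip tiny negative round-off


def spectral_slope(eigs: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed definition: beta such that lambda_i ~ i**(-beta), fitted in log-log space."""
    eigs = eigs[eigs > eps]  # log of a (near-)zero eigenvalue is undefined
    ranks = np.arange(1, eigs.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigs), deg=1)
    return float(-slope)


def layer_drift(hs: np.ndarray) -> np.ndarray:
    """Per-token drift ||h_l - h_{l-1}|| for l = 1..L.

    hs has shape (L + 1, n_tokens, d), with hs[0] assumed to be the embedding output.
    Returns an array of shape (L, n_tokens).
    """
    return np.linalg.norm(np.diff(hs, axis=0), axis=-1)


def final_layer_spike(hs: np.ndarray) -> float:
    """Median final-layer drift divided by the median drift over all earlier layers."""
    drift = layer_drift(hs)
    return float(np.median(drift[-1]) / np.median(drift[:-1]))
```

With Hugging Face transformers, `hs` for one sequence can be assembled by calling the model with `output_hidden_states=True` and stacking the returned `hidden_states` tuple (embedding output first, then one entry per layer). When a prompt has fewer tokens than hidden dimensions, the covariance is rank-deficient; the sketch simply drops the near-zero eigenvalues before fitting.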

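Together, these results falsify the premise of naïve early-exit speedups on modern decoder-only LLMs (consistent with Apple, March 2026): because the final layer still moves representations substantially, exiting early changes the prediction. They also confirm that β and layer drift are signatures of learned representations, not architecture artefacts.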

Metric	Layer 1, random-init GPT-2	Layer 1, trained GPT-2	Discrimination (trained / random)
Spectral slope β	0.48	7.11	15×
Median layer drift	2.16	55.63	26×
Median KL to final	0.22	4.77	22×
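The Discrimination column is consistent with the ratio of the trained value to the random-init value, rounded to the nearest integer. A quick check using only the numbers in the table:

```python
# Layer-1 values from the table: (random-init GPT-2, trained GPT-2).
layer1 = {
    "spectral slope beta": (0.48, 7.11),
    "median layer drift": (2.16, 55.63),
    "median KL to final": (0.22, 4.77),
}

# Discrimination = trained value / random-init value.
ratios = {metric: trained / random_init for metric, (random_init, trained) in layer1.items()}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x (rounded: {round(ratio)}x)")
```

This prints ratios of about 14.8, 25.8 and 21.7, which round to the 15×, 26× and 22× reported above.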

See examples/fig_spectral_beta.png and examples/fig_layer_drift.png for the universality plots.

Install

git clone https://github.com/ppa1983/rca-probe.git
cd rca-probe
pip install -e .

PyPI release planned for v0.2.

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer
from rca_probe import RCAProbe

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
mdl = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

probe = RCAProbe(mdl, tok)
records = probe.analyze("Heat kernels concentrate energy because")
for r in records[:3]:
    print(r.layer, round(r.kl_to_full, 3), round(r.k99_over_d, 4), round(r.curvature, 3))
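The quick start prints `kl_to_full` for each record. The README does not say how the early-exit distribution at layer ℓ is produced; a common choice, assumed here, is the so-called logit lens (apply the model's final norm and unembedding to h_ℓ). Given such early-exit logits and the final logits, KL(p_L ‖ p_ℓ) and the top-10 overlap could be computed as below. Defining the overlap as shared tokens divided by k is also an assumption, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def kl_to_final(final_logits: np.ndarray, early_logits: np.ndarray) -> np.ndarray:
    """KL(p_L || p_l) per token, in nats; inputs have shape (n_tokens, vocab)."""
    log_p_final = log_softmax(final_logits)
    log_p_early = log_softmax(early_logits)
    return (np.exp(log_p_final) * (log_p_final - log_p_early)).sum(axis=-1)


def top_k_overlap(final_logits: np.ndarray, early_logits: np.ndarray, k: int = 10) -> np.ndarray:
    """Per token: fraction of the final top-k token ids also in the early-exit top-k."""
    top_final = np.argsort(final_logits, axis=-1)[:, -k:]
    top_early = np.argsort(early_logits, axis=-1)[:, -k:]
    return np.array([len(set(a) & set(b)) / k for a, b in zip(top_final, top_early)])
```

KL is asymmetric; this direction weights disagreements by the final-layer distribution, matching the KL(p_L ‖ p_ℓ) notation in "What it measures".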

CLI

rca-probe --model EleutherAI/pythia-160m --prompts prompts.txt --out results/

Outputs:

records.csv — per-token, per-layer measurements
summary.json — by-layer aggregates
by_layer.csv — same as JSON, tabular
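summary.json and by_layer.csv are described only as by-layer aggregates. Since the headline findings report medians, a by-layer median is a natural candidate; the sketch below computes one from per-token, per-layer rows using only the standard library. The column names (`layer` and the metric column) are hypothetical; check the actual header of records.csv before relying on them.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def by_layer_medians(csv_text: str, metric: str) -> dict[int, float]:
    """Median of `metric` per layer from per-token, per-layer CSV rows.

    Assumes (hypothetically) a header with a 'layer' column and a numeric `metric` column.
    """
    values: dict[int, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[int(row["layer"])].append(float(row[metric]))
    return {layer: median(vals) for layer, vals in sorted(values.items())}
```

For example, `by_layer_medians(open("results/records.csv", newline="").read(), "curvature")` would return one median per layer, assuming a `curvature` column exists.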

What it measures

Metric	Symbol	Interpretation
Effective rank ratio	k99/d	Intrinsic dimensionality of the local hidden-state manifold (Stage-4 F2'' invariant).
Spectral slope	β	Power-law decay of covariance eigenvalues. Trained ≈ 7–8, random init ≈ 0.5.
Layer drift	‖h_ℓ − h_{ℓ−1}‖	How much a token's representation moves between adjacent layers. Spikes on the read-out layer.
Discrete curvature	‖Δ²h‖ / ‖Δh‖	Second-derivative ratio along the layer axis. Penultimate-layer spikes precede read-out.
KL to final	KL(p_L ‖ p_ℓ)	Divergence of the early-exit prediction at layer ℓ from the final-layer prediction.
Top-10 overlap	overlap@10	Set overlap of top-10 candidate tokens between final and early-exit.
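The table gives the formulas but leaves some conventions open. The sketch below fixes them as assumptions: k99 is the smallest number of leading covariance eigenvalues (sorted largest first) that explain 99 % of the variance, and the curvature at layer ℓ divides ‖Δ²h_ℓ‖ by the incoming step ‖h_ℓ − h_{ℓ−1}‖. Function names are hypothetical, not the rca_probe API.

```python
import numpy as np


def k99_over_d(eigs: np.ndarray, frac: float = 0.99) -> float:
    """Smallest k whose top-k eigenvalues explain `frac` of the variance, divided by d.

    `eigs` are covariance eigenvalues sorted largest first; d = len(eigs).
    """
    cumulative = np.cumsum(eigs) / eigs.sum()
    k = min(int(np.searchsorted(cumulative, frac)) + 1, eigs.size)  # guard round-off
    return k / eigs.size


def discrete_curvature(hs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-token ||D2 h_l|| / ||D h_l|| for interior layers l = 1..L-1.

    hs has shape (L + 1, n_tokens, d). D h_l = h_l - h_{l-1} and
    D2 h_l = h_{l+1} - 2 h_l + h_{l-1}. Returns shape (L - 1, n_tokens).
    """
    first = np.diff(hs, axis=0)        # first[l - 1] = h_l - h_{l-1}
    second = np.diff(hs, n=2, axis=0)  # second[l - 1] = h_{l+1} - 2 h_l + h_{l-1}
    num = np.linalg.norm(second, axis=-1)
    den = np.linalg.norm(first[:-1], axis=-1)
    return num / np.maximum(den, eps)
```

A representation that moves at constant velocity across layers has zero curvature; a sudden jump at layer ℓ + 1 shows up as a large value at layer ℓ, which is how a penultimate-layer spike can precede the read-out.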

Reproducing the universality figures

cd examples
python plot_universality.py    # writes three PNGs into examples/

Source CSVs are versioned in examples/:

pythia160m_by_layer.csv — 12 layers, trained
gpt2_by_layer.csv — 12 layers, trained
pythia1_4b_by_layer.csv — 24 layers, trained
phi1_5_by_layer.csv — 24 layers, trained
gpt2_random_by_layer.csv — 12 layers, identical architecture, random init (negative control)

Why this exists

Mechanistic-interpretability tooling for hidden-state geometry, as distinct from attention or activation patching, is fragmented. rca-probe exposes a small, theory-driven set of invariants in a single dependency-light package. It is the reference implementation for the broader RCA research programme.

Scope & falsifications

rca-probe describes geometric invariants of trained transformer hidden states. The findings are robust within that scope. We have explicitly tested — and falsified — generalizations beyond it.

What holds (validated, this repo):

Spectral slope β ≈ 7–8 across 4 trained models (Pythia-160m, GPT-2-124M, Pythia-1.4b, Phi-1.5) vs. β ≈ 0.5 for random-init GPT-2, a 15× discrimination at layer 1 (26× for median layer drift, 22× for median KL to final).
Late-readout drift spike on the final layer: present in all trained models, absent in the random-init control.

What does NOT generalize (tested, falsified):

Hypothesis	Test	Result
β as a fine-tuning pathology detector ("Coherence Score")	5-model audit, coefficient of variation across healthy trained models	CoV = 2.73 % → no usable headroom
β predicts BTC realised volatility (cross-domain regime indicator)	2,880 one-minute windows, Coinbase BTC-USD	Pearson r = 0.063, Δ-Pearson = −0.172 (sign flip)
Last-layer drift predicts hallucinations on TruthfulQA MC1	Pythia-160m + GPT-2, 200 questions	AUROC 0.42 / 0.50, argmin-drift ≤ random baseline
EEG spectral β tracks meditation state or expert/novice group	Delorme BIDS, 24 subjects, 886 probe-trials	H1 d = +0.003 (p = 0.99); H2 d = −0.12 (p = 0.78)
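Two statistics in the table, the coefficient of variation and AUROC, have standard definitions, sketched below to make the reported numbers easier to read. Whether the audits used sample or population standard deviation, and which drift direction was scored as "hallucinated", is not stated; the choices here are assumptions. An AUROC of 0.5 is chance level for the chosen score orientation.

```python
import numpy as np


def coefficient_of_variation(values) -> float:
    """Standard deviation divided by the mean; sample std (ddof=1) is an assumption."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean())


def auroc(scores, labels) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2.

    This pairwise form equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))
```

The pairwise comparison uses O(n_pos · n_neg) memory, which is fine for a 200-question benchmark; for large samples a rank-based implementation is preferable.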

These negative results are reported in the spirit of "open notebook" research. They define the boundary of what rca-probe measures: a property of the trained-transformer geometry, not a universal consciousness/coherence/regime detector.

Citation

@software{pikula_albring_rca_probe_2026,
  author  = {Pikula-Albring, Philipp},
  title   = {{rca-probe}: Geometric diagnostics for transformer hidden states},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/ppa1983/rca-probe},
  orcid   = {0009-0009-2795-2494},
}

Contributing

Issues and pull requests welcome. See CONTRIBUTING.md.

License

Apache-2.0. See LICENSE.
