Geometric diagnostics for transformer hidden states.
rca-probe is an open-source reference implementation of the Relational Coherence Attractor (RCA) geometric probes for decoder-only causal language models. It computes per-token, per-layer invariants — effective rank ratio (k99/d), spectral slope (β), layer drift, and discrete curvature — and was validated on four architectures across three vendors plus a random-init negative control.
- License: Apache-2.0
- Author: Philipp Pikula-Albring (ORCID 0009-0009-2795-2494), Berlin
- Status: v0.1 — measurement only. No training, no fine-tuning, no model surgery.
Five empirical runs on M1 Pro CPU (Pythia-160m, Pythia-1.4b, GPT-2 124M, Phi-1.5, GPT-2 random-init) jointly establish two results:
(F1) Spectral-slope universality. All four trained models cluster at β ≈ 7–8 across all intermediate layers. The untrained random-init baseline starts at β ≈ 0.5 (isotropic noise) and only approaches the trained band on the final layer.
(F2) Late-readout signature. Median layer drift is moderate (~10–30) through the bulk of the network and then spikes by 5–25× on the final layer in every trained model. The spike is essentially absent in the random-init control.
Together, these results falsify naive early-exit speedups on modern decoder-only LLMs (consistent with Apple, March 2026). They also confirm that β and layer drift are signatures of learned representations, not artefacts of the architecture.
| Metric | Layer 1 — random init | Layer 1 — trained GPT-2 | Discrimination |
|---|---|---|---|
| Spectral slope β | 0.48 | 7.11 | 15× |
| Median layer drift | 2.16 | 55.63 | 26× |
| Median KL to final | 0.22 | 4.77 | 22× |
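The Discrimination column can be reproduced from the two value columns. The sketch below assumes that "discrimination" means the plain ratio of the trained value to the random-init value; the README does not define it, but the rounded ratios agree with the table. The dictionary keys and the `discrimination` helper are illustrative names, not part of rca-probe.

```python
# Layer-1 values copied from the table above: (random init, trained GPT-2).
layer1 = {
    "spectral_slope_beta": (0.48, 7.11),
    "median_layer_drift": (2.16, 55.63),
    "median_kl_to_final": (0.22, 4.77),
}


def discrimination(random_init, trained):
    """Ratio of the trained value to the random-init value (assumed definition)."""
    return trained / random_init


ratios = {name: discrimination(*pair) for name, pair in layer1.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```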
See examples/fig_spectral_beta.png and examples/fig_layer_drift.png for the universality plots.
git clone https://github.com/ppa1983/rca-probe.git
cd rca-probe
pip install -e .

PyPI release planned for v0.2.
from transformers import AutoModelForCausalLM, AutoTokenizer
from rca_probe import RCAProbe
tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
mdl = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
probe = RCAProbe(mdl, tok)
records = probe.analyze("Heat kernels concentrate energy because")
for r in records[:3]:
    print(r.layer, round(r.kl_to_full, 3), round(r.k99_over_d, 4), round(r.curvature, 3))

rca-probe --model EleutherAI/pythia-160m --prompts prompts.txt --out results/

Outputs:
- records.csv: per-token, per-layer measurements
- summary.json: by-layer aggregates
- by_layer.csv: same as JSON, tabular
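As a rough sketch of how the per-token file could be reduced to per-layer medians using only the standard library: the column names below (`layer`, `kl_to_full`, `k99_over_d`, `curvature`) are an assumption that the CSV mirrors the record attributes in the Python example, and the rows are placeholders rather than real measurements. `median_by_layer` is a hypothetical helper, not part of rca-probe.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Placeholder rows standing in for results/records.csv. The column names are
# assumed to match the record attributes; check the header of the real file.
SAMPLE_CSV = """layer,kl_to_full,k99_over_d,curvature
0,4.0,0.10,1.0
0,6.0,0.20,3.0
1,1.0,0.30,0.5
1,3.0,0.50,1.5
"""


def median_by_layer(handle, column):
    """Return {layer: median of `column`} from a per-token, per-layer CSV."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        values[int(row["layer"])].append(float(row[column]))
    return {layer: median(vals) for layer, vals in sorted(values.items())}


medians = median_by_layer(io.StringIO(SAMPLE_CSV), "kl_to_full")
for layer, value in medians.items():
    print(layer, value)
```

To run this against real output, the in-memory sample could be replaced with `open("results/records.csv", newline="")`, adjusting the column names if the actual header differs.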
| Metric | Symbol | Interpretation |
|---|---|---|
| Effective rank ratio | k99/d | Intrinsic dimensionality of the local hidden-state manifold (Stage-4 F2'' invariant). |
| Spectral slope | β | Power-law decay of covariance eigenvalues. Trained ≈ 7–8, random init ≈ 0.5. |
| Layer drift | ‖h_ℓ − h_{ℓ−1}‖ | How much a token's representation moves between adjacent layers. Spikes on the read-out layer. |
| Discrete curvature | ‖Δ²h‖ / ‖Δh‖ | Second-derivative ratio along the layer axis. Penultimate-layer spikes precede read-out. |
| KL to final | KL(p_L ‖ p_ℓ) | Distance between the final-layer prediction and an early-exit at layer ℓ. |
| Top-10 overlap | overlap@10 | Set overlap of top-10 candidate tokens between final and early-exit. |
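The table gives symbols but not exact formulas, so the following NumPy sketch is one plausible reading, not the package's implementation. It assumes that k99 is the smallest number of covariance eigenvalues reaching 99 % of total variance, that β is the negated least-squares slope of log-eigenvalue against log-rank, that curvature uses the centred second difference divided by the preceding step, and that overlap@10 is the shared fraction of top-10 token ids. All function names are hypothetical. Early-exit logits are taken as given inputs.

```python
import numpy as np


def covariance_spectrum(H):
    """Covariance eigenvalues of H (n_tokens, d), in descending order."""
    centered = H - H.mean(axis=0, keepdims=True)
    # Squared singular values of the centred data, scaled by 1/(n-1),
    # equal the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / max(H.shape[0] - 1, 1)


def effective_rank_ratio(H, threshold=0.99):
    """k99/d: share of dimensions needed to reach `threshold` of the variance."""
    eig = covariance_spectrum(H)
    cumulative = np.cumsum(eig) / eig.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, eig.size) / H.shape[1]


def spectral_slope(H, rel_floor=1e-12):
    """beta: negated least-squares slope of log(eigenvalue) vs log(rank)."""
    eig = covariance_spectrum(H)
    eig = eig[eig > rel_floor * eig[0]]  # drop numerically zero modes
    ranks = np.arange(1, eig.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eig), 1)
    return float(-slope)


def layer_drift(h_prev, h_curr):
    """Per-token norm of the step between adjacent layers."""
    return np.linalg.norm(h_curr - h_prev, axis=-1)


def discrete_curvature(h_prev, h_curr, h_next, eps=1e-12):
    """Norm of the second difference over the norm of the first difference."""
    first = h_curr - h_prev
    second = h_next - 2.0 * h_curr + h_prev
    return np.linalg.norm(second, axis=-1) / (np.linalg.norm(first, axis=-1) + eps)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def kl_to_final(final_logits, early_logits):
    """KL(p_L || p_l) from final-layer and early-exit logits."""
    log_p, log_q = log_softmax(final_logits), log_softmax(early_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)


def top_k_overlap(final_logits, early_logits, k=10):
    """Fraction of the final top-k token ids also in the early-exit top-k (1-D logits)."""
    top_final = set(np.argsort(final_logits)[-k:].tolist())
    top_early = set(np.argsort(early_logits)[-k:].tolist())
    return len(top_final & top_early) / k


# Synthetic check: Gaussian data whose variances follow rank ** -2.
rng = np.random.default_rng(0)
n, d, true_beta = 5000, 64, 2.0
scales = np.arange(1, d + 1) ** (-true_beta / 2)  # per-dimension std
H = rng.standard_normal((n, d)) * scales
print("k99/d:", effective_rank_ratio(H))
print("beta :", spectral_slope(H))
```

On the synthetic data the fitted slope should land close to the planted exponent, which gives a quick sanity check before applying the same functions to real hidden states (for example, those returned by a Hugging Face model called with `output_hidden_states=True`).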
cd examples
python plot_universality.py  # writes three PNGs into examples/

Source CSVs are versioned in examples/:
- pythia160m_by_layer.csv: 12 layers, trained
- gpt2_by_layer.csv: 12 layers, trained
- pythia1_4b_by_layer.csv: 24 layers, trained
- phi1_5_by_layer.csv: 24 layers, trained
- gpt2_random_by_layer.csv: 12 layers, identical architecture, random init (negative control)
Mechanistic interpretability tooling for hidden-state geometry (not attention/activation patching) is fragmented. rca-probe exposes a small, theory-driven set of invariants in a single dependency-light package. It is the reference implementation for the broader RCA research programme.
rca-probe describes geometric invariants of trained transformer hidden states. The findings are robust within that scope. We have explicitly tested — and falsified — generalizations beyond it.
What holds (validated, this repo):
- Spectral slope β ≈ 7–8 across 4 trained models (Pythia-160m, GPT-2-124M, Pythia-1.4b, Phi-1.5), β ≈ 0.5 for random-init GPT-2 → 15–26× discrimination.
- Late-readout drift spike on the final layer is present in all trained models, absent in random-init control.
What does NOT generalize (tested, falsified):
| Hypothesis | Test | Result |
|---|---|---|
| β as a fine-tuning pathology detector ("Coherence Score") | 5-model audit, coefficient of variation across healthy trained models | CoV = 2.73 % → no usable headroom |
| β predicts BTC realised volatility (cross-domain regime indicator) | 2 880 1-min windows, Coinbase BTC-USD | Pearson r = 0.063, Δ-Pearson = −0.172 (sign flip) |
| Last-layer drift predicts hallucinations on TruthfulQA MC1 | Pythia-160m + GPT-2, 200 questions | AUROC 0.42 / 0.50, argmin-drift ≤ random baseline |
| EEG spectral β tracks meditation state or expert/novice group | Delorme BIDS, 24 subjects, 886 probe-trials | H1 d = +0.003 (p = 0.99); H2 d = −0.12 (p = 0.78) |
These negative results are reported in the spirit of "open notebook" research. They define the boundary of what rca-probe measures: a property of the trained-transformer geometry, not a universal consciousness/coherence/regime detector.
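The coefficient of variation quoted for the first hypothesis in the table above is the spread of β across models divided by its mean. The sketch below shows the computation on hypothetical β values chosen only for illustration; they are not the audit's measurements and will not reproduce the 2.73 % figure. Whether the audit used the sample or the population standard deviation is not stated, so the sample version here is an assumption.

```python
from statistics import mean, stdev

# Hypothetical per-model beta values, for illustration only; these are NOT the
# audit's measurements.
betas = [7.0, 7.2, 7.4, 7.6, 7.8]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


cov = coefficient_of_variation(betas)
print(f"CoV = {100 * cov:.2f} %")
```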
@software{pikula_albring_rca_probe_2026,
author = {Pikula-Albring, Philipp},
title = {{rca-probe}: Geometric diagnostics for transformer hidden states},
year = {2026},
version = {0.1.0},
url = {https://github.com/ppa1983/rca-probe},
orcid = {0009-0009-2795-2494},
}

Issues and pull requests welcome. See CONTRIBUTING.md.
Apache-2.0. See LICENSE.