Skip to content
Peter Lodri edited this page Jun 10, 2026 · 1 revision

FAQ

Short, honest answers about what this project does and does not claim. If a question here is wrong or incomplete, corrections are welcome — open an issue or email cabotage@pm.me.

See also: Home · The Research · Results and Findings · Engine and API · Dataset · Reproduce · Contributing and Experiments


Is the Barendregt–Geuvers–Klop (BGK) conjecture solved?

No. The BGK conjecture — does weak normalization (WN) imply strong normalization (SN) in pure type systems? — is open, and has been since the 1990s. It is TLCA Open Problem #9. Nothing in this repository resolves it. If you came here hoping for a proof or refutation, this is not that; it is an open dataset and an experimental engine that study the mechanism around the conjecture.

So is this a proof of anything?

No — it corroborates a mechanism, it does not prove the conjecture. The accepted intuition is that erasure is the obstruction to WN ⟹ SN: a function that throws away its argument (an erasing $\lambda K$ redex) can hide a non-terminating subterm and still reach a normal form. We make that concrete and measurable:

  • In the erasure-free $\lambda I$-calculus, WN ⟺ SN is a theorem (Church 1941; Barendregt Thm 9.1.5) — there is no gap to find. Our BGK lab checks this empirically: 0 separators across 3.13 M $\lambda I$ terms.
  • The instant erasure is allowed (the full $\lambda K$-calculus), separators appear. The smallest closed WN-but-not-SN term in our census has natural size 13: $(\lambda a.\lambda b.,b),\Omega$ — an erasing function applied to $\Omega$.

These are empirical measurements and an exhaustive check of a known theorem, not a new theorem. We are careful throughout to separate proven (cited) from empirical (computed here), and to state the undecided remainder explicitly. See The Research for the full framing.

What exactly is a "separator"?

A separator is a closed term that is weakly normalizing but not strongly normalizing — i.e. it sits in the WN ∖ SN gap. Some reduction strategy reaches a true normal form, but another reduction sequence runs forever. Separators are exactly the witnesses that would be relevant to a WN-vs-SN distinction. Our four classes per term are SN, SEP (separator = WN ∧ ¬SN), NWN (no normal form at all), and UND (undecided within the engine's caps). See Results and Findings.

Why does the decided-SN fraction stay so high (≥ 99% up to n = 55) if SN-density is proven to go to 0?

This is the most important honest caveat, and the answer is the decidability horizon.

Bendkowski–Grygiel–Lescanne–Zaionc (2017, Cor. 4) prove that in exactly the natural-size de Bruijn model we use, SN-density → 0 — asymptotically almost every term is not strongly normalizing (indeed not typeable, not in normal form). Yet our decided SN fraction stays above 99% through $n = 55$. There is no contradiction:

  • SN-ness is undecidable, and our engine only decides the easy terms, which are overwhelmingly SN at these sizes.
  • The non-SN mass that the theorem promises lives in the undecided (UND) tail, which our data shows growing monotonically with $n$ (from ~0% toward ~10% at $n = 55$).

So our numbers are consistent with SN-density → 0; they simply cannot witness the asymptotic collapse directly. What they do witness is where the difficulty concentrates — the decidable frontier and the growing hard core. The high decided-SN figure is a statement about decidability, not about the true SN-density.

Why does the answer "flip" between papers — SN-density → 0 vs → 1?

Because the size convention controls the answer. Two results that look contradictory are measuring two different probability spaces:

Model Variable cost Result
Natural-size de Bruijn (this repo; Bendkowski–Grygiel–Lescanne–Zaionc 2017) $\lvert k\rvert = k+1$ (a de Bruijn index of depth $k$ costs $k+1$) SN-density → 0
David–Grygiel–Kozik–Raffalli–Theyssier–Zaionc (2013) variables cost 0 SN-density → 1

When variables are free, large terms are dominated by cheap variable structure and tend to terminate; when each variable carries its de Bruijn depth as a cost, the measure shifts toward the non-terminating bulk. Neither is "the" right convention — they answer different questions. We use the natural-size convention because it matches OEIS A275057 (Lescanne 2016), which our closed-term counts reproduce term-for-term. Worth stressing: the asymptotic WN density and the size of the WN ∖ SN gap are not computed in either paper — they remain open, and that gap is what our separator-density curve probes empirically. See The Research and Dataset.

Why did the Monte-Carlo run stall at n = 70?

The run hit the decidability horizon for real, not just in theory. As $n$ grows, the fraction of terms the bounded engine cannot decide rises, and the per-term classification cost is dominated by those hard terms — the engine explores larger and larger bounded reduction graphs before giving up. Around $n = 70$ the per-term cost exploded and the run became intractable, so we report through $n = 55$. It is an accidental but vivid demonstration of the very phenomenon the data describes: the hard, non-SN mass concentrating in the tail. (Mechanically the caps are node_cap = 3000, size_cap = 200; see Engine and API.)

Is the size-13 separator really new?

As far as we can tell, yes — it appears unrecorded, and we'd genuinely like to be proven wrong. The smallest closed separator we find is at natural size 13: $(\lambda a.\lambda b.,b),\Omega$, in de Bruijn form $\lambda\lambda 0;(\lambda(0,0);\lambda(0,0))$ — an erasing function applied to $\Omega$ (which is itself the smallest closed non-WN term, at natural size 9). The combinatorics-of-$\lambda$-terms literature tabulates SN-density and term counts, but we did not find the WN ∖ SN gap or its smallest witness recorded anywhere.

Caveat on the number "13": it is tied to our size convention (natural-size, closed, de Bruijn) and our enumeration order. The companion BGK lab reports a separator onset at size 12 under its own ad-hoc $\lambda K$ enumeration — a different counting scheme, so the numbers are not directly comparable. If you know a prior reference for either, please tell us; a citation is a better outcome than a claim of novelty.

How confident are the Monte-Carlo numbers?

The exact rows ($n \le 16$) are a full enumeration — no sampling error, and validated against OEIS A275057. The sampled rows ($n \in {17,\dots,55}$) use uniform rank-unranking Monte-Carlo with $K = 50{,}000$ samples per size, and each sampled density in the dataset carries a Wilson 95% confidence interval (in the JSON). The separator-density curve rises from roughly 0.014% at $n = 13$ to about 0.4% of decided terms at $n = 55$. See Dataset and Reproduce.

Can I trust the classifier? How does it decide SN / WN / NWN?

The engine builds a bounded, α-quotiented (de Bruijn-keyed) reduction graph and reads normalization off its shape: SN ⟺ finite acyclic; ¬SN ⟺ a β-cycle exists; WN ⟺ a true normal form is reachable. One subtlety bit us and we fixed it: a node whose reducts were all dropped by the size_cap is not a normal form — counting it as one would falsely flag WN. Whenever the graph hits a cap before deciding, the term is honestly reported as UND rather than guessed. Details and the term representation are in Engine and API.

This is "machine-assisted research" — what does that mean for trust?

It means the experiments were designed, run, verified, and written up with substantial AI orchestration, then cross-checked against primary sources (cited in docs/). Every quantitative claim is computed, not asserted, and the code is pure-stdlib Python with self-tests you can run yourself (Reproduce). We deliberately distinguish proven theorems (with citations) from our empirical measurements, and we always state the undecided remainder. Reproduce it, break it, and report what you find — that is the point.

What is crabcc?

crabcc is a separate tool by the same author: a Rust symbol-index and call-graph engine built for AI agents (github.com/crabcc-labs/crabcc). It is not part of the normalization census — it appears here only as a bonus panel: crabcc v6.3 indexed the Linux 6.18 LTS kernel (3.87 M symbols, 68,875 files, ~177 s; kmalloc → 4,889 callers; 328 mutual-recursion SCCs), and those stats ship in data/crabcc_kernel_index.json and the interactive report. It is included as a demonstration of large-scale, agent-oriented static analysis tooling, not as evidence about the λ-calculus.

What is Vaked?

Vaked is another project by the same author, mentioned alongside crabcc and this research in the shared opt-in newsletter / cross-project hub at crabcc.app/about. It is not part of this repository and has no bearing on the census results; we list it only so the cross-project context is clear. For details, see the newsletter page.

What are the licenses, and how do I cite this?

Code is MIT; data and docs are CC BY 4.0. Please cite this repository (see CITATION.cff) and the primary sources — Bendkowski–Grygiel–Lescanne–Zaionc (2017); David et al. (2013); Lescanne / OEIS A275057. More in Contributing and Experiments.

How do I contribute, report a correction, or get in touch?