-
Notifications
You must be signed in to change notification settings - Fork 0
The Research
This page explains, from the ground up, what this project is about. You do not need any background in type theory or the λ-calculus — just a little basic computer science (you know what a function and an infinite loop are). We'll build up the ideas gently, keep them correct, and be honest about what is proven, what is measured, and what is still open.
If you want the numbers and plots, jump to Results and Findings. If you want to run it yourself, see Reproduce. For the data files, see Dataset; for the code, see Engine and API. New here? Start at Home.
The λ-calculus is the smallest interesting programming language. It has exactly three kinds of expression (called terms):
-
A variable — a name, like
x. -
A function —
λx. Mmeans "a function that takes an argumentxand returns the bodyM." (Theλis just the way we write "function of".) -
An application —
M Nmeans "call the functionMwith the argumentN."
That's the whole language. No numbers, no if, no loops — and yet it is Turing-complete: anything a computer can compute, you can express here. It is the theoretical ancestor of every functional programming language (Lisp, Haskell, the lambdas in Python/JavaScript).
Running a program means simplifying it by a single rule called β-reduction ("beta reduction"). When a function meets an argument, you substitute:
(λx. M) N→Mwith everyxreplaced byN.
That's it. You keep applying this rule wherever you can. A term you can no longer simplify is in normal form — that's the "answer."
Here's the subtle part, and it's the heart of everything. In a real program there's often more than one place you could take a step. The order can matter. This gives two different notions of "this program terminates":
-
Weakly Normalizing (WN): there exists at least one way of choosing steps that reaches a normal form. "There is a path to an answer." Think of it as: a clever scheduler could finish this program.
-
Strongly Normalizing (SN): every possible way of choosing steps eventually reaches a normal form. "All paths lead to an answer, you cannot get stuck looping no matter what you do." Think of it as: this program finishes no matter how you run it.
Clearly, SN is the stronger property: if every path terminates, then certainly some path does. So SN ⟹ WN always. In set terms, SN ⊆ WN.
The interesting question is the reverse: if a program can finish (WN), must it always finish (SN)? In general, no — and the reason why is the whole story.
Consider the famous looping term:
Read it as: "a function that calls its argument on itself," applied to a copy of itself. Take one β-step and you get...
Now consider this term, where we apply a constant function to
The function λx. y ignores its argument entirely — it always returns y, throwing x away. So this term has two very different behaviors depending on the order you reduce in:
-
One scheduler says: "fire the outer call first." The function discards
$\Omega$ and returnsyin a single step. Done. So the term is WN — a path to an answer exists. -
Another scheduler says: "let me simplify the argument
$\Omega$ first." It reduces$\Omega \to \Omega \to \Omega \to \cdots$ forever and never gets to fire the outer call. So the term is not SN — some path runs forever.
This is a separator: a term that is WN but not SN. It separates the two notions. And the mechanism that makes it possible is erasure — the function λx. y erases its argument. Erasure lets a non-terminating subterm get "parked" in a position that will be thrown away. A patient scheduler discards it (finishes); an eager scheduler dives in (loops forever).
The slogan of this whole field: erasure is the only way a reduction step can lose an infinite computation. Take away the ability to discard arguments, and the WN/SN distinction collapses.
What happens if we forbid erasure? That is the λI-calculus ("lambda-I"): the sub-language where every function λx. M is required to actually use its argument (x must appear in the body M). No function is allowed to throw its input away.
In this no-erasure world, something beautiful happens — the Conservation Theorem (originally Church, 1941; it appears as Theorem 9.1.5 in Barendregt's standard textbook):
In the λI-calculus, WN ⟺ SN. If a no-erasure term can finish, then it must finish, on every path.
The intuition is exactly the dual of §3: if you can never discard a subterm, then a looping subterm can never be hidden away and quietly dropped. Wherever a divergence lives, every scheduler must eventually confront it, so "some path finishes" forces "all paths finish." The deep machinery behind this (Bergstra–Klop's perpetual reductions, the Finite Developments theorem) makes the slogan precise: a non-erasing step can neither create nor destroy an infinite reduction.
There's even a translation, Klop's ι-translation, that takes an erasing term and rewrites it into a non-erasing one by remembering whatever would have been erased (it stashes the discarded argument in a little "memory pair" [T, U] instead of throwing it out). This transports the well-behaved λI world's guarantees back toward the general calculus, and is the technical engine behind many of the known partial results. We test it empirically in the BGK Lab.
So in the untyped world the situation is completely understood: erasure is exactly the obstruction, and removing it makes WN and SN coincide.
So far so good for untyped programs. But most languages have types — disciplines that rule out nonsensical programs ("you can't add a string to a function"). The richest, most general framework for studying type systems uniformly is Pure Type Systems (PTS) — a single template that, with different settings, gives you simply-typed λ-calculus, System F, the Calculus of Constructions (the basis of proof assistants like Coq/Lean), and a whole infinite family of others.
The Barendregt–Geuvers–Klop (BGK) conjecture asks the §2 question for every type system at once:
Conjecture (BGK). In every pure type system, weak normalization implies strong normalization. That is: there is no typed setting in which some program can finish but might loop forever.
In plain terms: the conjecture says that the type discipline always closes the erasure loophole. Even though the function λx. y can erase its argument, a good type system should make it impossible to type-check a program that exploits erasure to be WN-but-not-SN. Every typable program that can finish, must finish — on every path.
This conjecture has been open since the 1990s. It is problem #9 on the TLCA List of Open Problems, and listed among the unsolved problems in computer science. It is proven for several restricted classes (Sørensen 1997; Barthe–Hatcliff–Sørensen 2001; Roux–van Doorn 2014; Mull 2022/23), but no general proof and no counterexample is known.
Why it's hard, in one sentence: the known proofs work by translating the type system into a non-erasing (λI-style) world where Conservation (§4) gives you SN for free — but this translation must preserve types, and that breaks down in the presence of dependent types (where types can mention the very values that are still being reduced), making the translation ill-defined. Closing that gap is the live obstacle. (A full technical account, with citations, lives in the deep-dive doc.)
Nothing in this project resolves the conjecture. It is open, and we don't claim otherwise. What we do is corroborate the mechanism — that erasure is the obstruction — with exhaustive, reproducible computation, and contribute new open data along the way.
The conjecture is about typed programs, and it's a yes/no question. We turn to a neighboring, quantitative question in the untyped world that we can compute, and that nobody seems to have measured before:
Among all closed λ-terms of a given size, how common are separators — terms that are WN but not SN?
To ask this precisely you need a notion of "size." We use the standard natural size in de Bruijn notation (a way of writing terms without variable names, using numeric indices). Every piece of syntax costs 1, and a de Bruijn index k costs k+1:
A census then means: for each size n, enumerate all closed terms of that size and sort each one into a bucket:
| Class | Meaning |
|---|---|
| SN | strongly normalizing — finishes on every path |
| SEP | a separator: WN but not SN (the "WN∖SN gap") |
| NWN | not even weakly normalizing — no normal form at all (e.g. Ω) |
| UND | undecided — our bounded analyzer couldn't settle it within its limits |
The headline facts the census establishes (full numbers in Results and Findings):
-
Our closed-term counts are exactly OEIS A275057 (Lescanne, 2016):
0, 1, 1, 3, 6, 17, 41, 116, 313, 895, 2550, 7450, 21881, …, validated term-for-term. This proves our model is the standard one, not a private variant. - The smallest closed non-WN term is Ω, at natural size 9.
-
The smallest closed separator appears at natural size 13: it is
$(\lambda a.\lambda b., b),\Omega$ — an erasing two-argument function applied to Ω, the §3 villain in its smallest closed form. This constant appears to be undocumented in the literature.
Why size 13? Because 13 = 3 (smallest closed erasing function) + 9 (smallest looping term, Ω) + 1 (the application). The arithmetic of the smallest separator is literally "an eraser, plus a loop, glued together."
How does a program tell whether a term is SN, WN, or neither, when the underlying question (does this loop?) is in general undecidable? The trick is to build a bounded reduction graph:
- Each node is a term (stored in a name-free, de Bruijn-keyed canonical form so that terms equal "up to renaming" are recognized as the same node).
- Each edge is one β-step. We explore outward up to safety limits (a cap on number of nodes and on term size).
Then we read off the answer from the shape of the graph:
- SN ⟺ the graph is finite and acyclic — there's nowhere to loop, so every path terminates.
- ¬SN ⟺ there's a β-cycle — a term that reduces back to itself, a guaranteed infinite path.
- WN ⟺ some true normal form is reachable — a path to an answer exists.
- UND ⟺ we hit a cap before settling the question.
A real bug we caught (and fixed): a node whose every reduct was dropped because it exceeded the size cap is not a normal form — it still has reducts, we just refused to expand them. An early version mistook such nodes for normal forms and falsely reported terms as WN. The fix: only count a node as a normal form if it genuinely has no β-redex. This kind of subtlety is exactly why every quantitative claim here is computed and self-tested, not asserted. Details in Engine and API.
For sizes up to n=16 we enumerate every closed term exactly. Beyond that the population explodes, so for n in 17..55 we draw uniform random samples (50,000 per size) using a rank/unrank sampler over the counting recurrence — uniform sampling is what makes the resulting densities meaningful. (Past n≈70 even per-term classification becomes intractable — an accidental but vivid demonstration of the "decidability horizon" described next.)
Across the whole range, the decided-SN fraction stays above 99% — almost every term we can decide is strongly normalizing. Meanwhile, as size grows, both the separator density rises (≈0.014% of decided terms at n=13, climbing toward ≈0.4% by n=55) and the undecided fraction climbs (from 0 toward ~10% at n=55).
There is an apparent paradox here, and we resolve it honestly:
- Bendkowski–Grygiel–Lescanne–Zaionc (2017) prove that in this exact natural-size model, the SN-density goes to 0 asymptotically — almost every large term is not SN.
- Yet our decided-SN fraction stays ≥ 99%.
These are reconciled by the decidability horizon: SN-ness is undecidable, and our bounded engine can only settle the easy terms, which are overwhelmingly SN at these sizes. The non-SN mass that the theorem promises hides in the growing undecided tail — exactly the UND fraction we watch climb. Our data is therefore consistent with SN-density → 0; it simply can't witness the collapse directly. It maps the frontier of decidability and quantifies where the hard cases concentrate.
The size convention controls the answer. Tellingly, David et al. (2013) prove the opposite — SN-density → 1 — but in a different model where variables cost nothing. When variables are free, they can float arbitrarily far from their binders, which forces benign (SN) shapes; in our natural model deep indices are expensive, so variables localize and loop-laden subterms like Ω come to dominate. The asymptotic density of WN itself — and hence the size of the WN∖SN gap in the limit — is not computed in either paper. It remains open. Our census contributes the first exact small-
nmeasurements of that gap.
Because this is machine-assisted research, we are explicit about the epistemic status of every claim:
- Proven (cited theorems, not our work): Church's Conservation Theorem (λI: WN ⟺ SN); Bergstra–Klop perpetual reductions; the OEIS A275057 counting sequence; SN-density → 0 in the natural model (BGKZ 2017); SN-density → 1 in the variables-free model (David et al. 2013); the partial BGK results (Sørensen, BHS, Roux–van Doorn, Mull).
- Empirical (our computed contributions): the census buckets and densities; the smallest closed separator at natural size 13; the separator-density curve; the decidability-horizon observation; the BGK Lab experiments (e.g. λI exhaustively yields 0 separators across 3.13M terms, corroborating Conservation; the perpetual-strategy law holds with 0 violations / 762k terms; the ι-translation preserves SN with 0 violations / 711k terms).
- Open (genuinely unresolved): the BGK conjecture itself; the asymptotic density of WN (and the limiting WN∖SN gap) in the natural model; whether the separator-density curve has a closed form.
Every quantitative figure here is computed and self-checked (the engine's selftest validates counts against A275057). We distinguish what is proven from what we measured, and we always state the undecided remainder.
- Results and Findings — the full tables, densities, the smallest-separator witness, the curve.
- Engine and API — how the term representation and bounded reduction-graph analyzer work, with the deep-dive citations.
-
Dataset — the open data files (
census_dataset.{json,csv}, BGK lab results) and their schema. - Reproduce — run the census and lab yourself (pure Python 3.9+, no dependencies).
- Contributing and Experiments — the seven BGK Lab experiments and how to add your own.
- FAQ — quick answers, and Home for the overview.
This is open research. Code is MIT; data and docs are CC BY 4.0. Questions, corrections, and contributions are warmly welcome — see Contributing and Experiments or reach out at cabotage@pm.me. Cross-project updates: crabcc.app/about. If this is useful to you, sponsorship supports continued open compute and data.