Reproduce

ary# Reproduce

Everything in this project is pure Python 3.9+ standard library — no pip install, no virtualenv, no compiled extensions, no network access. If you have a working python3, you can reproduce every number in the Dataset and every experiment in the BGK lab from scratch.

This page is the exact, step-by-step recipe. It covers:

Prerequisites
Self-test — validate the engine against OEIS A275057 (seconds)
The census run — classify closed λ-terms by natural size
Build the dataset — JSON + CSV with Wilson CIs and the separator witness
The BGK conservation lab — the 7 companion experiments
View the report — the self-contained offline page
Notes: parallelism and the n ≥ 70 horizon

Honesty note up front: the underlying Barendregt–Geuvers–Klop (BGK) conjecture is open (does weak normalization imply strong normalization in pure type systems? — TLCA problem #9, open since the 1990s). Nothing you run here resolves it; the experiments corroborate the mechanism (erasure as the obstruction to WN ⟹ SN). What you reproduce is computed data, with the undecided remainder stated explicitly. See The Research for what is proven vs. empirical.

0. Prerequisites

Python 3.9 or newer. Check with python3 --version.
Nothing else. No dependencies. The interactive report's libraries (ECharts, Cytoscape, KaTeX) are vendored under report/vendor/ and load offline.

Clone the repo:

git clone https://github.com/peterlodri-sec/lambda-normalization-census
cd lambda-normalization-census

All commands below are run from the repo root.

1. Self-test (validate the engine against OEIS)

Always start here. The self-test is fast (seconds) and proves the counter, enumerator, and sampler are correct before you trust any classification.

python3 src/debruijn_census.py selftest

It performs three checks:

Closed-term counts. Prints L(n,0) for n = 1..20 and asserts that the exhaustive enumerator's term count equals the recurrence's count for n ≤ 12. These counts are exactly OEIS A275057 (Lescanne 2016, natural-size de Bruijn): 0,1,1,3,6,17,41,116,313,895,2550,7450,21881,… — validated term-for-term.
Sampler sanity. Draws 3000 samples at each of n ∈ {10,14,18,30} and confirms every sample has the requested natural size and is closed.
Classification preview. Tabulates SN / SEP / NWN / UND counts for n = 2..12.

If the asserts pass and the count line matches A275057, the engine is sound. (The natural-size convention: |index k| = k+1, |λM| = 1+|M|, |MN| = 1+|M|+|N|. The size convention is load-bearing — it controls the asymptotics; see The Research.)

The four normalization classes used everywhere:

Class	Meaning
SN	strongly normalizing (every reduction path terminates)
SEP	separator = WN ∧ ¬SN (weakly normalizing but not strongly — the WN∖SN gap)
NWN	not weakly normalizing (no normal form at all)
UND	undecided within the engine's bounded caps (`node_cap=3000`, `size_cap=200`)

A second, independent micro-validation harness reduces a curated family of λ-terms by executable β-reduction (no asserted claims):

python3 src/normalize_check.py

It demonstrates the canonical separator (λx.y)Ω is WN but not SN, that SN ⟺ the α-quotiented reduction graph is finite and acyclic, the perpetual strategy walking into an erasable diverging argument, and conservation on small λI terms. See Engine and API for the internals.

2. The census run

The census classifies closed untyped λ-terms (de Bruijn, natural size) into SN / SEP / NWN / UND, then writes one JSON document to stdout:

python3 src/debruijn_census.py census > my_census.json

The default run (defined at the bottom of src/debruijn_census.py) does:

Exact, exhaustive enumeration for n = 2..16 (ex_max=16) — every closed term of each size is classified.
Uniform Monte-Carlo sampling for n ∈ {17,18,20,22,25,28,32,38,45,55} with K = 50000 samples per size (rank-unranking gives genuinely uniform draws over closed terms of exact natural size n).

Progress is streamed to stderr (so it doesn't pollute the JSON on stdout), one line per size, e.g.:

[ex n=13] tot=  21881 SN=21803 SEP=3 NWN=51 UND=24 (…s)
[mc n= 55] K=50000 SN=44739 SEP=197 NWN=224 UND=4840 (…s)

What the run measures (the headline findings, all reproducible):

Smallest closed non-WN term = Ω at natural size 9.
Smallest closed separator = natural size 13: (λa.λb.b) Ω, i.e. λλ0 (λ(0 0) λ(0 0)) — an erasing function applied to Ω. Apparently undocumented in the combinatorics-of-λ literature.
Decided-SN stays ≥ 99% through n=55, while the undecided fraction climbs (0 → ~10% at n=55) and separator density rises (~0.014% of decided at n=13 → ~0.4% at n=55).

This is consistent with Bendkowski–Grygiel–Lescanne–Zaionc (2017), who prove SN-density → 0 asymptotically in this natural-size model (Cor. 4): our high decided-SN reflects a decidability horizon — the non-SN mass hides in the growing undecided tail. It is the opposite of David et al. (2013), whose convention (variables cost 0) gives SN-density → 1. See Results and Findings.

Tuning the run

The plan lives at the bottom of src/debruijn_census.py in the run_census(...) call. To push further or run faster, edit those arguments:

ex_max=16 — the largest size enumerated exactly. Raising it increases cost super-exponentially (A275057 grows fast).
mc_plan=[…] — the list of Monte-Carlo sizes.
K=50000 — samples per Monte-Carlo size. Larger K → tighter Wilson confidence intervals.

The engine caps (node_cap=3000, size_cap=200) bound each term's reduction-graph analysis; raising them shrinks the undecided fraction at the cost of time. Improving the deciders / caps is the single biggest open lever for contributors (see Contributing and Experiments).

3. Build the dataset (JSON + CSV)

src/build_census_dataset.py assembles the publishable Dataset: per-size densities, Wilson 95% confidence intervals on the Monte-Carlo rows, the closed-term counts (A275057), and the freshly recomputed smallest-separator witness.

Important — staged paths. The builder is wired to read/write under /tmp/bgk: it does sys.path.insert(0, '/tmp/bgk') to import debruijn_census, optionally reads /tmp/bgk/census_results.json (the full census run output, which supplies the n=45..90 tail when present), and writes /tmp/bgk/census_dataset.json and /tmp/bgk/census_dataset.csv. If that staged run JSON is absent it falls back to a set of in-hand rows baked into the script (n=2..55). So stage the files first:

mkdir -p /tmp/bgk
cp src/debruijn_census.py /tmp/bgk/
cp my_census.json /tmp/bgk/census_results.json     # optional: use YOUR fresh run for the tail
python3 src/build_census_dataset.py

It prints the smallest separator, the row count and n range, then writes the two files. Copy them back into the repo's data/ to update the published dataset:

cp /tmp/bgk/census_dataset.json /tmp/bgk/census_dataset.csv data/

CSV columns: n, method, total, SN, SEP, NWN, UND, decided, dens_SN_of_total, dens_SN_of_decided, dens_SEP_of_decided, frac_undecided. method is exact (full enumeration, n ≤ 16) or montecarlo (K=50000 uniform samples). The JSON additionally carries Wilson 95% CIs (SN_of_decided_ci95, SEP_of_decided_ci95) on the Monte-Carlo rows, the A275057 counts, and the witness. Full column spec on the Dataset page.

Per the contributing ground rules: don't hand-edit the CSV/JSON — always regenerate them through this builder.

4. The BGK conservation lab

The companion lab (src/bgk_lab.py) runs 7 experiments around weak vs. strong normalization. Its single argument is the wall-clock budget per experiment, in seconds, and it emits one JSON document on stdout (and incrementally persists to /tmp/bgk_results.json after each experiment).

# full run, 240s per experiment
python3 src/bgk_lab.py 240 > bgk_results.json

# quick smoke test, tiny budget (good for a first check)
python3 src/bgk_lab.py 8

The seven experiments (E1–E7):

	Experiment	What it checks (reproduced results)
E1	λI conservation	exhaustive λI enumeration: 0 separators / 3.13 M terms (WN ⟺ SN holds)
E2	λK separator onset	smallest λK separator at size 12 (this lab's own ad-hoc enumeration — distinct from the census's closed-de-Bruijn size 13)
E3	Reduction geometry	longest vs. shortest reductions, de Vrijer's L(·)
E4	Perpetual-strategy law	`diverges ⟺ ¬SN`: 0 violations / 762 k
E5	ι-translation transfer	Klop's ι-translation: `SN(M) ⟺ SN(ι(M))`: 0 violations / 711 k
E6	Parallel speedup	multiprocessing speedup benchmark
E7	Reduction-graph zoo	small reduction graphs for the report viz

E1–E5 run inside a process Pool (all cores); E6/E7 run separately (E7 is capped at min(budget, 60) seconds). Because the lab is wall-clock-budgeted, exact counts vary with the budget and core count — give it more seconds for more coverage. The published data/bgk_lab_results.json is a full run on a 16-core box.

Why this matters: conservation (Church 1941; Barendregt Thm 9.1.5) says WN ⟺ SN in the erasure-free λI-calculus, and Klop's ι-translation transports erasing λK terms into the non-erasing λI₍,₎ calculus (a memory pair, the π-rule). E1/E5 are the executable corroboration of exactly that mechanism. See The Research for the theory.

5. View the interactive report

The report is a single self-contained, offline, dark-theme page. No server needed — just open the file:

open report/index.html         # macOS
xdg-open report/index.html     # Linux
# or just double-click report/index.html in a file browser

It vendors ECharts, Cytoscape, and KaTeX under report/vendor/ (see report/vendor/THIRD_PARTY.md for their licenses), so it works fully offline. It shows the conservation experiments, the separator-density curve, the reduction-graph zoo, and a crabcc × Linux-kernel bonus panel.

To regenerate the report's data, the page reads report/data.js (shape: window.BGK_DATA = {...}). Produce fresh dataset + lab outputs via the steps above, then update data.js to match that shape. The plotting/UX code is plain HTML/JS/CSS in report/{index.html,app.js,style.css} — design PRs welcome (see Contributing and Experiments).

6. Notes: parallelism and the n ≥ 70 horizon

Embarrassingly parallel. Both the census and the lab use Python multiprocessing and a Pool(cpu_count()). Classification of distinct terms is independent — each term's bounded reduction-graph analysis is a self-contained unit of work — so it scales near-linearly with core count. The published runs used a 16-core box, but the code adapts to any core count automatically (no flags to set). More cores → proportionally faster; results are identical regardless of core count (Monte-Carlo draws use a seeded RNG).

The n ≥ 70 decidability horizon. The Monte-Carlo plan stops at n=55 on purpose. As terms grow, the share that the bounded engine can decide shrinks (the undecided tail grows), and the per-term reduction-graph analysis itself becomes intractable: the run became impractical around n=70. This is the same "decidability horizon" that explains why decided-SN stays high even though true SN-density → 0 asymptotically (Bendkowski–Grygiel–Lescanne–Zaionc 2017) — the non-SN mass is hiding in the part the engine can't yet decide. Pushing past it cleanly requires stronger SN/WN deciders, not just more compute; that is the headline open lever for contributors. See Results and Findings and FAQ.

Quick reference

# 1. validate the engine (seconds)
python3 src/debruijn_census.py selftest
python3 src/normalize_check.py

# 2. run the census  -> JSON on stdout, progress on stderr
python3 src/debruijn_census.py census > my_census.json

# 3. build the dataset (staged under /tmp/bgk)
mkdir -p /tmp/bgk
cp src/debruijn_census.py /tmp/bgk/
cp my_census.json /tmp/bgk/census_results.json    # optional: your fresh tail
python3 src/build_census_dataset.py
cp /tmp/bgk/census_dataset.{json,csv} data/

# 4. run the BGK lab (arg = seconds/experiment)
python3 src/bgk_lab.py 240 > bgk_results.json

# 5. view the report (offline)
open report/index.html

Reproduced something? Found a result we call "unrecorded" already in the literature, or built a stronger decider that shrinks the undecided tail? Please open an issue or PR — see Contributing and Experiments. Code is MIT, data and docs are CC BY 4.0. Contact: cabotage@pm.me · newsletter and cross-project info at crabcc.app/about · sponsor. Back to Home.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reproduce

0. Prerequisites

1. Self-test (validate the engine against OEIS)

2. The census run

Tuning the run

3. Build the dataset (JSON + CSV)

4. The BGK conservation lab

5. View the interactive report

6. Notes: parallelism and the n ≥ 70 horizon

Quick reference

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally