Skip to content

Engine and API

Peter Lodri edited this page Jun 10, 2026 · 1 revision

Engine and API

A developer's tour of src/. This page explains the term representation, the bounded reduction-graph analyzer analyze() (including the SN / WN / ¬WN decision rules and the one genuinely tricky subtlety we had to fix), the de Bruijn census counter/sampler/classifier, and the seven BGK lab experiments. There are short, copy-pasteable examples of importing and calling the key functions.

Everything is pure Python 3.9+ standard library — no dependencies, no pip install. If you want the project at a glance first, start at Home; for the mathematics see The Research; for the numbers see Results and Findings; to run it yourself see Reproduce; for the data layout see Dataset; to add your own experiment see Contributing and Experiments.

Honesty note up front: this engine decides SN / WN on a bounded reduction graph. Inside the bound the answers are exact (proven by construction below). Outside the bound it answers None ("undecided"). It does not, and cannot, settle the open Barendregt–Geuvers–Klop conjecture — it gathers evidence about the mechanism. Where a claim is empirical we say so.

Files in src/

File Role
bgk_lab.py The core λ-calculus engine: term ops, β-reduction, the analyze() reduction-graph decider, plus the seven experiments (E1–E7).
debruijn_census.py The headline open-data contribution: a natural-size de Bruijn counter, exact enumerator, uniform sampler, and classifier that reuses analyze().
build_census_dataset.py Assembles the publishable dataset (per-size densities, Wilson CIs, smallest-separator witness) into JSON + CSV. See Dataset.
normalize_check.py A standalone, narrated validation harness (the C1–C4 checks) that demonstrates the conservation story by executable reduction, not assertion.

bgk_lab.py is the engine; debruijn_census.py imports from it. normalize_check.py is an earlier, self-contained sibling of the same machinery used for the human-readable walkthrough — note its analyze() carries the older size-cap behaviour (see the subtlety below), so treat bgk_lab.analyze as canonical.

The term representation

A λ-term is a small tagged tuple. Four constructors:

('v', name)        # variable, name is a string
('l', x, body)     # abstraction  λx.body
('a', f, x)        # application  (f x)
('p', T, U)        # Klop "memory pair" [T,U] — only used by the ι-translation / λI[,] calculus

bgk_lab.py exposes named constructors so call sites read like math:

from bgk_lab import V, L, A, show, free_vars, size

I     = L('x', V('x'))                    # λx.x
Delta = L('x', A(V('x'), V('x')))         # λx.(x x)
Omega = A(Delta, Delta)                   # (λx.xx)(λx.xx)  — the canonical non-terminating term
print(show(Omega))                        # λx.x x (λx.x x)
print(free_vars(Omega), size(Omega))      # set()  7

Notes:

  • free_vars, subst, db_key, etc. all handle the 'p' pair case so the extended λI[,] calculus shares one code path.
  • size(t) here is the structural size (variables, λ, app each contribute, by 1 + …). This is not the de Bruijn natural size used by the census — that lives in debruijn_census.dbsize (|index k| = k+1). Keep the two straight: size is the engine's internal node measure; natural size is the census's classification axis.
  • Capture-avoiding substitution (subst) renames a binder via fresh() only when the substituend's free variables would be captured, so α-equivalence is respected.

α-canonical keys (db_key)

To compare terms up to α-equivalence, the engine maps each term to a de Bruijn key:

('b', i)            # bound variable, i = de Bruijn index into the binder env
('f', name)         # free variable
('l', body_key)
('a', f_key, x_key)
('p', T_key, U_key)

Two α-equivalent terms produce the same db_key. This key is what the reduction graph is keyed on — so the graph is an α-quotient: distinct-but-α-equal reducts collapse to one node.

β-reduction

from bgk_lab import is_redex, contract, one_step

is_redex(('a', ('l', 'x', body), arg))   # True: a top-level (λx.body) arg
contract(redex)                            # β-contract the top redex: body[x := arg]
one_step(t)                                # list of ALL terms reachable by one β-step (any redex)

one_step does full (non-deterministic) β-reduction: it returns every one-step reduct, contracting the head redex if present and recursing into every sub-position. This is what lets analyze build the entire reachable graph rather than following a single strategy.

For strategy-specific walks there are also leftmost-outermost (lo_path / reduce_at / normal_order_len) and the perpetual strategy (perpetual_step), discussed under E4 below.

analyze() — the bounded reduction-graph decider

This is the heart of the engine.

from bgk_lab import analyze
a = analyze(term, node_cap=4000, size_cap=220)
# a = {'sn': True|False|None, 'wn': True|False|None,
#      'nodes': int, 'trunc': bool, 'nodes_map': {...}, 'succ': {...}, 'nf': [...]}

What it builds

graph(t, node_cap, size_cap) performs a breadth-first exploration from t's α-key, following one_step. It records a successor map succ (α-key → list of α-keys) and a node map (α-key → representative term). Two caps keep it finite:

  • size_cap — drop any reduct whose structural size exceeds the cap (and flag trunc).
  • node_cap — stop once the graph exceeds this many nodes (and flag trunc).

The graph is α-quotiented (keyed on db_key), so cycles in reduction become real cycles in the graph.

The decision rules (and why they are correct within the bound)

SN  ⟺  the reachable reduction graph is finite and acyclic
¬SN ⟺  the graph contains a β-cycle           (a cycle is a witness of an infinite reduction)
WN  ⟺  a TRUE normal form is reachable
¬WN ⟺  the graph is fully explored (not truncated) and contains no normal form

Concretely:

  • SN. has_cycle(succ) runs an iterative (stack-based, recursion-safe) DFS. If it finds a back-edge to a node on the current DFS stack, there is a cycle → sn = False. If there is no cycle and the graph was fully explored (not trunc), the reachable graph is a finite DAG, so every reduction sequence terminates → sn = True. If there's no cycle but we truncated, we cannot rule out divergence beyond the cap → sn = None.
  • WN. If any explored node is a true normal form, wn = True (we have witnessed a normalizing path). If there is no normal form and we fully explored, then no normal form exists → wn = False. If there is no normal form but we truncated → wn = None.

A None in either field means "the caps were hit before a decision could be forced." The census reports such terms as UND (undecided).

The size-cap normal-form subtlety (a real bug we fixed)

This is the one place where a naive implementation is silently wrong, so it deserves emphasis for contributors.

A graph node with no recorded successors looks like a normal form. But it might only look terminal because all of its reducts were dropped by size_cap. Such a node is not a normal form — it still has a redex; we just refused to follow it because the result was too big. Counting it as a normal form would falsely set wn = True, manufacturing phantom weakly-normalizing terms.

The fix in bgk_lab.analyze re-checks each apparently-terminal node:

true_nf = []
for k in succ:
    if not succ[k]:
        if one_step(nodes[k]):     # it HAD reducts — they were pruned by size_cap
            trunc = True           # so this node is not terminal; mark truncated
        else:
            true_nf.append(k)      # genuinely no redex → a real normal form
wn = True if true_nf else (None if trunc else False)

So WN is decided only from true normal forms (one_step returns []), and a node whose reducts were merely too large flips the result to truncated/None rather than to a false True. The earlier analyze in normalize_check.py predates this fix and treats any successor-less node as an NF — fine for its curated, hand-checked examples, but not safe for blind enumeration. The census uses the corrected bgk_lab.analyze.

Worked examples

from bgk_lab import V, L, A, analyze

Delta = L('x', A(V('x'), V('x'))); Omega = A(Delta, Delta)
analyze(Omega)['sn'], analyze(Omega)['wn']      # (False, False) — Ω→Ω is a 1-cycle: no NF, not SN

KIO = A(L('x', V('y')), Omega)                  # (λx.y)Ω  — erasing fn applied to Ω
r = analyze(KIO)
r['wn'], r['sn']                                # (True, False) — the canonical SEPARATOR

(λx.y)Ω is weakly normalizing (leftmost-outermost erases the diverging argument and reaches y) but not strongly normalizing (you can also reduce inside Ω forever). That WN ∧ ¬SN combination is what we call a separator, and erasure (a bound variable not occurring in the body) is exactly what manufactures one. See The Research for why this is the crux of the conjecture.

debruijn_census.py — counter, enumerator, sampler, classifier

The census classifies closed untyped λ-terms by natural size in the de Bruijn model:

|index k| = k + 1        (0-based de Bruijn index)
|λ M|     = 1 + |M|
|M N|     = 1 + |M| + |N|

Internal de Bruijn term repr (distinct from the engine's named repr): ('i', k) | ('l', M) | ('a', M, N).

Counting — L(n, m)

L(n, m) is the number of de Bruijn terms of natural size n valid in a context of m binders (indices < m), via the obvious recurrence (memoized with functools.lru_cache). Closed terms of size n are L(n, 0):

from debruijn_census import closed_count
[closed_count(n) for n in range(1, 14)]
# [0, 1, 1, 3, 6, 17, 41, 116, 313, 895, 2550, 7450, 21881]

These counts are exactly OEIS A275057 (Lescanne 2016), validated term-for-term by the selftest. (Note: closed_count(1) = 0 — there is no closed term of natural size 1, since the smallest index 0 already has size 1 but is not closed.)

Exact enumeration — enum(n, m)

from debruijn_census import enum
terms13 = enum(13, 0)          # every closed de Bruijn term of natural size 13
len(terms13)                   # 21881  (== closed_count(13))

len(enum(n, 0)) == closed_count(n) is asserted in the selftest for n ≤ 12.

Uniform sampling — sample(n, m, rng)

For sizes too large to enumerate, sample draws a term uniformly at random among all size-n terms in context m, top-down by the same counts (L) used for counting — a standard rank/unrank construction. sample_idx is the deterministic enumeration-rank companion kept consistent with sample.

import random
from debruijn_census import sample, dbsize
rng = random.Random(12345)
t = sample(40, 0, rng)
dbsize(t)                      # 40 — every sample has exactly the requested natural size

Classification — classify(dbterm, node_cap=3000, size_cap=200)

Bridges to the engine: to_named converts the de Bruijn term to the engine's named repr (using a binder stack so de Bruijn 0 is the innermost binder), then calls bgk_lab.analyze, then maps (sn, wn) to one of four labels:

from debruijn_census import classify, enum
classify(enum(9, 0)[0])        # e.g. 'SN'
Label Meaning Decided from
SN strongly normalizing sn is True
SEP separator = WN ∧ ¬SN sn is False and wn is True
NWN no normal form (not weakly normalizing) sn is False and wn is False
UND undecided within engine caps otherwise (a None)

The run and the selftest

python3 src/debruijn_census.py selftest    # counts vs A275057, sampler size checks, n≤12 classification preview
python3 src/debruijn_census.py census      # full run → JSON on stdout (uses all cores)

run_census(ex_max, mc_plan, K, seed) enumerates exhaustively for n up to ex_max (default 16) and Monte-Carlo samples (default K = 50000) for the sizes in mc_plan (default up to n = 55); it uses a multiprocessing.Pool across all cores. The plan stops at 55 because the bounded analysis becomes intractable around n = 70 — the decidability horizon. The run also records the smallest separator witness. See Results and Findings and Reproduce.

Implementation detail for contributors: debruijn_census.py imports bgk_lab via sys.path.insert(0, '/tmp/bgk') / '/tmp'. If you run from a checkout, make sure bgk_lab.py is importable (run from src/, set PYTHONPATH=src, or adjust the path insert). See Reproduce.

bgk_lab.py — the seven experiments (E1–E7)

python3 src/bgk_lab.py [seconds_per_idea] runs all seven, each wall-clock-budgeted (default 540s per experiment), and emits one JSON document {meta, E1..E7} on stdout (also incrementally persisted to /tmp/bgk_results.json). Most experiments fan out over a multiprocessing.Pool. The headline run was on a 16-core box.

# Function What it checks Headline outcome
E1 E1_conservation Exhaustively enumerate small λI terms (no erasing abstractions) and classify each. Conservation predicts no separators in λI. 0 separators across ~3.13M λI terms.
E2 E2_separators Exhaustive λK enumeration by size; record separator onset and density. Separator onset at size 12 (in this experiment's own ad-hoc named enumeration — distinct from the census's closed-de-Bruijn natural size 13).
E3 E3_geometry For SN terms, measure reduction geometry: longest vs. shortest path to NF (de Vrijer-style L(·)), max term blow-up, graph size. Scatter + per-size stats for the report.
E4 E4_perpetual Test the law "perpetual strategy diverges ⟺ ¬SN" over enumerated terms. 0 violations / ~762k checks.
E5 E5_transfer Apply Klop's ι-translation and test SN(M) ⟺ SN(ι(M)). 0 violations / ~711k checks.
E6 E6_speedup Parallel-speedup benchmark of the analyze workload across core counts. Speedup/efficiency table.
E7 E7_zoo Build small reduction graphs for a curated zoo (I, K, S, Ω, the separator (λx.y)Ω, S K K, …) for the interactive report. Node/edge graphs per term.

The exact counts above (3.13M, 762k, 711k; onset sizes) are empirical results of a particular run with the given caps and budget — see Results and Findings. Reproductions will match the qualitative findings (zero violations, the onset sizes); the totals depend on how far each budget gets.

Supporting machinery used by the experiments

  • Perpetual strategy. perpetual_step implements a maximal/perpetual one-step strategy: at an outermost erasing redex (λx.A)B with x ∉ FV(A) and B not yet normal, it steps into the doomed argument B instead of discarding it — exposing the infinite path that normal-order reduction hides. perpetual_diverges(t, bound) reports whether that strategy fails to normalize. E4 checks it agrees with analyze's SN verdict on every term.
  • ι-translation. iota(t) transports an erasing λK term into the non-erasing λI[,] calculus: an erasing λx.body becomes λx.[ι(body), x] — the memory pair ('p', …) "parks" the otherwise-discarded x, making the abstraction a legal λI term. is_lambda_I(t) checks the λI property; one_step_ext / analyze_ext add the π-rule [T,U] T' → [T T', U] for reducing the extended calculus. E5 checks SN is preserved across the translation.
  • λI enumeration. gen_by_size(n, alphabet, lambda_I) (built on gen_terms) enumerates named terms of structural size n, optionally restricted to λI (every binder's variable must occur in its body). E1/E2 drive this.
  • Graph metrics. longest_path (DP over the DAG; only meaningful for SN terms), shortest_to_nf (BFS), and the nodes_map / succ returned by analyze feed E3 and E7.

Calling a single experiment in isolation

from multiprocessing import Pool, cpu_count
import bgk_lab
with Pool(cpu_count()) as pool:
    res = bgk_lab.E1_conservation(budget=60.0, pool=pool, alpha=['y'])
print(res['total_terms'], res['total_separators'])   # expect (..., 0)

normalize_check.py — the narrated walkthrough

A standalone script (python3 src/normalize_check.py) that prints the conservation story step by step from executable reductions: C1/C2 show (λx.y)Ω is the canonical WN-but-not-SN separator and Ω is a 1-cycle; C3 shows the perpetual strategy walking into the erased argument; C4 shows λI is separator-free (exhaustive, consistent with Conservation) while a curated λK family produces separators. It is a good first read for contributors who want the intuition before diving into analyze. (Its in-file analyze is the pre-fix version — see the subtlety above; use bgk_lab.analyze for any quantitative work.)

Where to go next


Code is MIT; data and docs are CC BY 4.0. Contact: cabotage@pm.me. Newsletter & cross-project info: crabcc.app/about. Support: github.com/sponsors/peterlodri-sec.