-
Notifications
You must be signed in to change notification settings - Fork 0
Contributing and Experiments
This is a small, dependency-free research repo — pure Python 3.9+, standard library only, no build step — so it is easy to hack on. Contributions of code, data, experiments, and corrections are all welcome.
Two ways in:
- Open a PR / issue — for anyone. Push the census higher, add a size model, sharpen the deciders, fix the report, or correct a citation.
- Sponsor the $25 "Contributor" tier — submit one experiment per month and we run it on the cluster and publish the open data with credit to you.
If you are new here, read The Research for the BGK conjecture context, Results and Findings for what we have measured, and Engine and API plus Reproduce for the code.
This project lives or dies on being honest about what is proven versus what is measured. Every contribution — yours or ours — must clear four rules. These are the same rules stated in the repo's CONTRIBUTING.md, restated here because they are the whole point.
-
Computed, not asserted. Every quantitative claim must be reproducible from code in
src/. Include the exact command that produces the number. No hand-typed counts, no "approximately", no rounding that you cannot regenerate. -
State the undecided remainder. Never report a density without its decided / undecided split, and — for any Monte-Carlo number — a confidence interval. Our engine has bounded caps (
node_cap=3000,size_cap=200), so some terms come back UND (undecided); that fraction is data, not an embarrassment to hide. See Results and Findings on the "decidability horizon." - Cite primary sources for any theorem, and clearly separate proven (cited) from empirical (your measurement). The BGK conjecture is open (TLCA problem #9, open since the 1990s). Nothing in this repo resolves it; the experiments corroborate the mechanism (erasure as the obstruction to WN ⟹ SN). Keep that line crisp.
- Don't conflate models. Always say which size convention a number is in. The asymptotics flip depending on it: Bendkowski–Grygiel–Lescanne–Zaionc (2017) prove SN-density → 0 in the natural-size de Bruijn model we use; David et al. (2013) prove SN-density → 1 in a model where variables cost 0. A density with no stated model is meaningless here.
If a number cannot meet all four, it does not go in the Dataset.
We especially want these three, because each one extends the science rather than polishing it:
Add other size conventions so densities can be compared across models in one place — e.g. the variables-cost-0 convention of David et al. (2013), or binary-λ-calculus / Tromp size. The headline tension in this field is that the size convention controls the asymptotic answer, and having multiple models side-by-side, computed by the same engine, is exactly the comparison the literature does not currently lay out together. A new model means: a counter (analogous to L(n,m) in debruijn_census.py), an enumerator, a uniform sampler, and a selftest that pins the counts to a known reference sequence (ours matches OEIS A275057 term-for-term).
This is the single biggest open lever. The undecided fraction climbs with size (0 at small n → roughly 10% by n=55), and every term we cannot decide is non-SN mass we cannot see directly. A stronger SN/WN decider — perpetual-reduction bounds, type-based shortcuts, smarter cycle detection, or just larger caps run on more cores — directly improves the dataset and tightens the WN∖SN-gap measurements. Watch the load-bearing subtlety in analyze: a node whose reducts were all dropped by size_cap is not a normal form, and counting it as one falsely flags WN. That was a real bug we fixed; any new decider must respect it.
The exact (exhaustive) enumeration currently goes to n ≤ 16; beyond that we switch to uniform Monte-Carlo (K=50000) up to n=55, and the MC run itself became intractable around n=70 (the decidability horizon). Pushing the exact boundary even one or two sizes higher removes sampling error from those rows entirely. The census is embarrassingly parallel (multiprocessing) — ours ran on a 16-core box but scales to any core count.
Also welcome:
-
New experiments / theory checks — anything that computes a normalization-theoretic quantity and ships the data (the BGK lab in
src/bgk_lab.pyis the template: λI exhaustive, λK separator onset, perpetual-strategy law, Klop's ι-translation transfer, reduction geometry). -
Report and viz —
report/is plain HTML/JS/CSS with vendored ECharts / Cytoscape / KaTeX; UX and design PRs welcome. -
Citations and corrections — if a result we call "apparently undocumented" (e.g. the size-13 smallest closed separator
(λa.λb.b) Ω) is in fact published, open an issue with the reference and we will correct it gladly.
No dependencies, Python 3.9+. Clone, then run the self-test before you touch anything.
python3 src/debruijn_census.py selftest # MUST pass: counts match OEIS A275057, sampler exact
python3 src/bgk_lab.py 8 # quick smoke (tiny per-experiment budget, in seconds)Never hand-edit the data files. New census rows are regenerated through the build script, which recomputes the smallest-separator witness locally and attaches Wilson 95% confidence intervals to the Monte-Carlo rows:
python3 src/build_census_dataset.py # writes census_dataset.{json,csv}See Reproduce for the full command list and Engine and API for the term representation and the analyze / classify contract.
-
python3 src/debruijn_census.py selftestpasses. - Any new data was regenerated with
build_census_dataset.py— the CSV/JSON are not hand-edited. - Every claim cites a source or a reproducing command.
- The model / size convention is stated explicitly for every number.
- Code stays stdlib-only (no new dependencies) unless discussed in an issue first.
Match the surrounding code: small pure functions, explicit term tuples (('v',name) | ('l',x,body) | ('a',f,x) | ('p',T,U)), and comments where a step is subtle. The two genuinely load-bearing subtleties — the bounded reduction-graph caps, and the "true normal form vs size-capped dead-end" distinction in analyze — are documented inline; preserve those notes.
By contributing you agree your code is MIT-licensed and your data and docs are CC BY 4.0 — the same split as the rest of the repo.
We have access to a 16-core research box and run a weekly experiment cadence. The 🔧 Contributor tier ($25/mo) on GitHub Sponsors turns that into a service for you:
Submit one research experiment per month — we run it on the cluster (experiments execute on a weekly cadence) and publish the open data and a write-up, crediting you. Plus priority issue triage and roadmap voting.
- Sponsor at the $25 tier at github.com/sponsors/peterlodri-sec.
-
Propose an experiment — open an issue describing the normalization-theoretic quantity you want measured: the size model, the term class, what gets classified, and what the output table should look like. The more it reads like an
src/bgk_lab.pyexperiment, the faster it runs. - We run it on the weekly cadence on the cluster, with the engine's caps (or larger caps where feasible).
- We publish the open data and write-up — same four research-bar rules apply: computed not asserted, undecided remainder stated, sources cited, model named — and you are credited.
The output lands as open data (CC BY 4.0) just like the existing Dataset, so your experiment becomes a citable, reproducible artifact, not a one-off favor.
- A density curve in a new size model for direct comparison against the natural-size results.
- Pushing a specific quantity (e.g. the WN∖SN separator density, which is not computed asymptotically in either of the two key papers — it is open) to higher
nthan we have published. - A targeted decider improvement measured against the current UND fraction at a fixed size.
- Replicating a BGK-lab invariant (perpetual-strategy law, ι-translation transfer) on a larger or different term population.
The $25 tier is the experiment tier. Higher tiers ($100 Backer, $500 Sponsor, $2,500 Partner) add README credit, early dataset access, office hours, and direction input; $5 Supporter and one-time tips help keep the compute and open-data hosting running. Full tier list on the Sponsors profile.
- Sponsor: github.com/sponsors/peterlodri-sec
- Newsletter and cross-project news: crabcc.app/about
- Contact: cabotage@pm.me
-
Data practices: minimal collection, no ad trackers, removal on request — see the repo
PRIVACY.md.
See also: Home · The Research · Results and Findings · Engine and API · Dataset · Reproduce · FAQ