A Python implementation of the concepts from "Artificial Intelligence and the Structure of Mathematics" by Barkeshli, Douglas & Freedman (2026) — arXiv:2604.06107.
This framework models mathematics as a directed acyclic hypergraph and implements an Automated Mathematical Discovery (AMD) agent that explores, proves, and abstracts within formal systems.
| Metric | Value |
|---|---|
| Propositional Logic: 3 axioms, depth 3 | 6,798 nodes |
| Growth ratios (per layer) | 4.0 → 8.5 → 65.5 |
| Backward search: proves 2 + 3 = 5 | Success |
| Auto-discovered abstractions | 5 lemmas, 4.91x avg compression |
| AMD evaluation criteria (Figure 1) | 9/10 PASS |
The explosive growth ratios (4 → 8.5 → 65.5) empirically confirm the double-exponential proliferation predicted by the paper.
```
src/
  hypergraph.py     # Universal Proof Hypergraph (U) & Structural Hypergraph (S)
  complexity.py     # Proof complexity c(G), efficiency E(P), interestingness
  axioms.py         # Propositional Logic & Peano Arithmetic axiom systems
  proof_search.py   # Forward/backward chaining proof search
  abstraction.py    # Pattern detection, compression, auto-abstraction
  amd_agent.py      # AMD agent with goal generation → proof → learn → curate loop
  visualization.py  # Hypergraph, growth, and discovery progress plots
main.py             # Runs all 4 demos end-to-end
```
Models all provable propositions as vertices and deduction rules as hyperedges in a directed acyclic hypergraph. Axioms sit at depth 0; each layer applies rules to produce the next.
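This representation can be sketched in a few lines. The class below is a simplified, hypothetical illustration, not the actual `src/hypergraph.py` API: vertices are propositions, and each hyperedge records a rule application from a set of premises to a conclusion.

```python
from dataclasses import dataclass, field

# Minimal sketch of a proof hypergraph (illustrative only; the real
# src/hypergraph.py tracks depth layers, costs, and more).
@dataclass
class Hypergraph:
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (rule, frozenset(premises), conclusion)

    def add_axiom(self, statement):
        # axioms sit at depth 0
        self.vertices.add(statement)

    def apply_rule(self, rule, premises, conclusion):
        # a hyperedge fires only if all premises are already derived
        if all(p in self.vertices for p in premises):
            self.edges.append((rule, frozenset(premises), conclusion))
            self.vertices.add(conclusion)
            return True
        return False

g = Hypergraph()
g.add_axiom("P")
g.add_axiom("Q")
g.apply_rule("AND-intro", ["P", "Q"], "(P AND Q)")
print(sorted(g.vertices))  # → ['(P AND Q)', 'P', 'Q']
```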
- Proof complexity c(G): total cost of edges and inputs in a proof subgraph
- Minimal complexity m(s): cheapest known proof of statement s
- Efficiency E(P) = m(P) / l(P): high ratio = "deep" theorem (short statement, long proof)
- Hub / bottleneck scores: structural importance in the graph
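The first two metrics can be sketched directly from the definitions above. The function names below are illustrative placeholders, not the actual `src/complexity.py` API; unit edge and input costs are an assumption for the example.

```python
# Hedged sketch of c(G) and E(P); assumes unit costs per edge and per input.
def proof_complexity(proof_edges, edge_cost=1, input_cost=1):
    """c(G): total cost of edges plus distinct leaf inputs in a proof subgraph."""
    conclusions = {c for _, _, c in proof_edges}
    inputs = set()
    for _, premises, _ in proof_edges:
        # leaf inputs are premises that no edge in the subgraph derives
        inputs |= {p for p in premises if p not in conclusions}
    return edge_cost * len(proof_edges) + input_cost * len(inputs)

def efficiency(min_complexity, statement_length):
    """E(P) = m(P) / l(P): deep theorems pack a long proof into a short statement."""
    return min_complexity / statement_length

proof = [("AND-intro", frozenset({"P", "Q"}), "(P AND Q)"),
         ("OR-intro", frozenset({"(P AND Q)"}), "((P AND Q) OR R)")]
c = proof_complexity(proof)
print(c, efficiency(c, len("((P AND Q) OR R)")))  # → 4 0.25
```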
| System | Axioms | Rules |
|---|---|---|
| Propositional Logic | Atomic propositions P, Q, R, ... | AND-intro, OR-intro, Modus Ponens, Double Negation |
| Peano Arithmetic | Zero, Successor | Addition, Multiplication, Equality Symmetry, Commutativity |
- Forward search: breadth-first layer expansion from axioms, demonstrating combinatorial explosion
- Backward search: goal-directed — checks existing graph, then iteratively deepens
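A compact way to picture the backward search is recursive goal decomposition with iterative deepening. This is a self-contained sketch, not the real `src/proof_search.py`: `rules` here maps each conclusion to the premise tuples that can produce it.

```python
# Illustrative goal-directed backward search with iterative deepening.
def backward_search(goal, known, rules, max_depth=5):
    """Return True if `goal` is derivable from `known` facts within `max_depth` steps."""
    def prove(g, depth):
        if g in known:          # check the existing graph first
            return True
        if depth == 0:
            return False
        # try every rule instance whose conclusion matches the goal
        return any(all(prove(p, depth - 1) for p in premises)
                   for premises in rules.get(g, []))
    # iterative deepening: prefer shallow proofs before searching deeper
    return any(prove(goal, d) for d in range(max_depth + 1))

rules = {"2+3=5": [("2+3=S(2+2)", "2+2=4")], "2+2=4": [("2+2=S(2+1)",)]}
known = {"2+3=S(2+2)", "2+2=S(2+1)"}
print(backward_search("2+3=5", known, rules))  # → True
```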
Detects recurring deduction patterns and compresses them into reusable lemma nodes, reducing proof complexity. Inspired by DreamCoder/Lilo library learning.
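One simple form of pattern detection is counting recurring rule sequences across proofs and promoting frequent ones to lemma candidates. This is a deliberately minimal sketch; the detector in `src/abstraction.py` is more sophisticated.

```python
from collections import Counter

# Illustrative pattern miner: consecutive rule pairs that recur across proofs
# become candidates for compression into reusable lemmas.
def find_patterns(proofs, min_count=2):
    pairs = Counter()
    for rule_seq in proofs:
        pairs.update(zip(rule_seq, rule_seq[1:]))
    return [p for p, n in pairs.items() if n >= min_count]

proofs = [["AND-intro", "MP", "OR-intro"],
          ["AND-intro", "MP", "DN"],
          ["MP", "OR-intro"]]
print(find_patterns(proofs))  # → [('AND-intro', 'MP'), ('MP', 'OR-intro')]
```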
Implements the paper's discovery loop:
```
repeat:
  1. Generate goals — conjectures from random combination, frontier, analogy
  2. Attempt proofs — backward search with time budget
  3. Learn & abstract — detect patterns, create lemmas
  4. Curate — rank by interestingness, prune knowledge base
```
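The four steps above can be expressed as a simple driver loop. Everything below is a stub sketch: the agent class, its method names, and its heuristics are placeholders, not the real `src/amd_agent.py`.

```python
import random

# Stub agent standing in for the real AMD agent; every method is a placeholder.
class StubAgent:
    def __init__(self):
        self.theorems, self.lemmas = set(), []

    def generate_goals(self):
        return [f"conj_{random.randint(0, 9)}" for _ in range(3)]

    def try_prove(self, goal, budget):
        # stand-in for backward search under a time budget
        return hash(goal) % 2 == 0

    def abstract_patterns(self, proved):
        if len(proved) >= 2:
            self.lemmas.append(tuple(proved[:2]))

    def curate(self):
        # keep a bounded, ranked knowledge base (ranking elided here)
        self.theorems = set(sorted(self.theorems)[:100])

def amd_loop(agent, iterations=5, budget=1.0):
    for _ in range(iterations):
        goals = agent.generate_goals()                             # 1. generate goals
        proved = [g for g in goals if agent.try_prove(g, budget)]  # 2. attempt proofs
        agent.theorems.update(proved)
        agent.abstract_patterns(proved)                            # 3. learn & abstract
        agent.curate()                                             # 4. curate

agent = StubAgent()
amd_loop(agent)
```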
Evaluated against the 10 AMD criteria from Figure 1 of the paper.
```shell
# Requirements: Python 3.10+, matplotlib, numpy
pip install matplotlib numpy

# Run all demos
python main.py
```

Output visualizations are saved to `output/`.
3 axioms (P, Q, R) expand into 6,798 nodes across 3 depth layers. The tree-like fan-out visually demonstrates combinatorial explosion.
Layer sizes: {0: 3, 1: 12, 2: 102, 3: 6681} — each layer grows faster than the last, approaching the double-exponential bound k^(2^j) predicted by the paper.
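The quoted growth ratios and node count follow directly from these layer sizes:

```python
# Per-layer growth ratios and total node count from the demo's layer sizes.
layer_sizes = {0: 3, 1: 12, 2: 102, 3: 6681}
ratios = [layer_sizes[j + 1] / layer_sizes[j] for j in range(3)]
print(ratios, sum(layer_sizes.values()))  # → [4.0, 8.5, 65.5] 6798
```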
Starting from 0 and S, the system derives addition and multiplication tables, commutativity instances, and successfully proves 2 + 3 = 5 via backward search.
The agent autonomously generates conjectures, proves novel theorems, and discovers abstractions over multiple iterations.
Barkeshli, M., Douglas, M. R., & Freedman, M. H. (2026). Artificial Intelligence and the Structure of Mathematics. arXiv:2604.06107 [cs.AI].
The paper proposes modeling mathematics as a hypergraph to understand:
- What is the structure of all provable mathematics?
- What characterizes the small region explored by humans?
- Can AI discover and expand this region?
This implementation provides a concrete, runnable prototype of these ideas.
MIT