Create a structured, interlinked Markdown knowledge base that converts all Swarm publications from papers.ethswarm.org into wiki pages — maintained by LLM agents, not humans. The bee repo would reference this knowledge base via AGENTS.md or CLAUDE.md, giving AI coding agents deep Swarm context when working on the codebase.
Motivation
AI coding agents (Claude Code, Codex, Cursor, etc.) are increasingly used to contribute to open-source projects. These agents work best when they have structured domain knowledge — not just code, but the concepts, architecture, protocols, and design rationale behind the code.
Swarm has excellent source material across multiple publications, but today this knowledge is locked in PDFs and scattered docs. An AI agent working on bee has no efficient way to understand why things are designed the way they are, what trade-offs were made, or how the protocol layers interact. A developer asking an AI to work on the redistribution game, for example, would benefit enormously from the agent having access to the formal definitions (Definitions 23–43 from the spec), the design rationale (Book of Swarm Chapter 3), and how the game interacts with postage stamps, reserve sampling, and the price oracle — all cross-referenced in one place.
How It Would Work
The approach follows the LLM Wiki pattern described by Andrej Karpathy — a persistent, compounding knowledge base maintained by LLM agents rather than humans.
Architecture — Three Layers
Layer 1 — Raw Sources: All PDFs from papers.ethswarm.org converted to Markdown (using tools like Datalab/Marker). These are immutable — the LLM reads but never modifies them.
Layer 2 — Wiki Pages: Structured pages organized by type (concepts, protocols, incentives, integration, papers, queries). Each page has YAML frontmatter, cross-references, glossary terms, formal definitions where applicable, and source citations.
Layer 3 — Schema: A CLAUDE.md that defines the wiki's structure, page format conventions, and three core workflows:
Ingest — process a source document into wiki pages, update cross-references, update index, log the operation
Query — answer questions by reading relevant wiki pages; file valuable answers back as new pages
Lint — two-layer health check: automated structural checks, then LLM semantic analysis (contradictions, gaps, missing connections)
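The structural half of the lint workflow can be quite mechanical. As a sketch, the broken-link check (one of the automated checks; the real wiki_lint.py would cover more) might look like:

```python
import re
from pathlib import Path

# Relative markdown link targets, e.g. [postage stamps](postage-stamps.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def broken_links(wiki_dir: str) -> list:
    """Return (page, target) pairs where a page links to a missing .md file."""
    wiki = Path(wiki_dir)
    existing = {p.name for p in wiki.rglob("*.md")}
    problems = []
    for page in sorted(wiki.rglob("*.md")):
        for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://")):
                continue  # external URLs are out of scope for this sketch
            if target.endswith(".md") and Path(target).name not in existing:
                problems.append((page.name, target))
    return problems
```

The LLM semantic pass would then run over the pages this check leaves clean, looking for contradictions and gaps that no regex can find.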
What this produces
Starting from 9 source documents (~13,000 lines of raw material from papers.ethswarm.org), the ingestion process would build roughly 17 wiki pages (~2,250 lines of structured content) from the first two major sources alone (the Book of Swarm and Formal Specification Chapter 1). The full set of sources would produce significantly more.
Key qualities of the output
Cross-referenced: Every page links to related pages. A redistribution page would link to postage stamps, price oracle, DISC, chunks, Kademlia, and pull-sync. An agent reading any page discovers the full context web.
Formally enriched: Mathematical definitions from the Formal Specification get integrated into the relevant concept pages. For instance, chunks.md would include the formal BMT hash definition (Δ[H,n]), CAC/SOC/PAC constructors, and segment inclusion proof structures — woven into the concept explanation, not isolated in a separate spec page.
Glossary-enriched: The ~180 formal terms from the Book of Swarm glossary get distributed across all topic pages, each term placed where it's most relevant.
Auditable: Every page has YAML frontmatter (title, type, sources, tags, last_updated), a Sources section citing the exact raw documents and sections it was derived from, and the operation log records every ingestion step.
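Concretely, a page under these conventions might look like the following skeleton (every field value here is illustrative, not prescribed):

```markdown
---
title: Postage Stamps
type: concept
sources: [book-of-swarm.md, formal-spec-ch1.md]
tags: [incentives, storage]
last_updated: 2024-01-01
---

# Postage Stamps

Concept explanation, with formal definitions woven in and
[cross-references](redistribution.md) to related pages.

## Sources

- Book of Swarm, Chapter 3 (design rationale)
- Formal Specification, Chapter 1 (formal definitions)
```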
Suggested approach for ingestion
Given that the Book of Swarm is ~6,500 lines and cannot fit in a single LLM context window, the ingestion should be split into chapter-level steps. A practical plan:
Phase 1 — Book of Swarm (8 steps): Ingest chapter by chapter. Each step reads one chapter, creates/updates the relevant concept/protocol/incentive pages, updates cross-references, and runs the linter. The first step creates the backbone (overview, index, paper summary); subsequent steps enrich it.
Phase 2 — Formal Specification (3 steps): Ingest by chapter. Chapter 1 (Definitions 1–43) enriches existing concept pages with mathematical formalism. Chapter 2 (Definitions 44–102) provides implementation-level data types and algorithms. Appendices add density estimation, randomness analysis, and parameter constants.
Phase 3 — Remaining papers (7 steps): Each smaller paper (whitepaper, protocol spec, erasure coding papers, price oracle, batch utilisation, DREAM) is ingested in a single step, creating its paper summary page and enriching relevant wiki pages.
After each step: update index.md, update overview.md, append to log.md, run the structural linter, commit.
Tooling
Three scripts support the workflow:
convert_docs.py — PDF/DOCX to Markdown conversion (e.g., via Datalab API)
wiki_lint.py — 9 automated structural checks (broken links, orphan pages, stale content, missing frontmatter, placeholder detection, tag consistency)
wiki_search.py — BM25 search with title/tag/body boosting; CLI and optional Flask web UI
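For the search script, a minimal version of boosted BM25 might look like this (k1, b, and the per-field boost weights are illustrative defaults, not the proposal's final parameters):

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75
BOOSTS = {"title": 3.0, "tags": 2.0, "body": 1.0}  # illustrative field weights

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def index_pages(pages):
    """pages: dicts with 'title', 'tags', 'body'. Returns (docs, df, avgdl)."""
    docs = []
    for page in pages:
        tf = Counter()
        for field, boost in BOOSTS.items():
            for t in tokens(page.get(field, "")):
                tf[t] += boost  # field boosting via weighted term frequency
        docs.append((page, tf, sum(tf.values())))
    df = Counter()
    for _, tf, _ in docs:
        df.update(tf.keys())
    avgdl = sum(dl for _, _, dl in docs) / len(docs)
    return docs, df, avgdl

def search(query, docs, df, avgdl):
    """Rank page titles by BM25 score against the query."""
    n = len(docs)
    results = []
    for page, tf, dl in docs:
        score = 0.0
        for t in tokens(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (K1 + 1) / (tf[t] + K1 * (1 - B + B * dl / avgdl))
        if score > 0:
            results.append((score, page["title"]))
    return [title for _, title in sorted(results, reverse=True)]
```

A Flask UI, if added, would just expose `search` over HTTP; the ranking logic stays this small.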
Integration with bee
Add an AGENTS.md (or CLAUDE.md) to the bee repository pointing to the knowledge base repo. When an AI agent opens bee, it discovers the knowledge base and can:
Understand protocol design rationale before modifying code
Look up formal specifications when implementing features
Understand the relationship between protocol layers (e.g., how push-sync interacts with postage stamp validation)
Get context on Swarm-specific concepts (chunks, neighborhoods, postage, redistribution, etc.)
Answer "why" questions — not just "what does this function do" but "why was it designed this way"
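The bee-side integration could be as small as a short AGENTS.md section (the URL placeholder and example path below are hypothetical; only the mechanism is being illustrated):

```markdown
## Swarm knowledge base

Protocol concepts, formal definitions, and design rationale are maintained
in a separate wiki repository: <knowledge-base-repo-url>. Before modifying
protocol code, read the relevant concept page (e.g. a concepts/chunks.md
page) and follow its cross-references.
```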
Proposed Scope
Phase 1 — Foundation
Create the knowledge base repo with the three-layer structure
Convert all 9 papers from papers.ethswarm.org to Markdown
Write the schema (CLAUDE.md) with page format, workflows, conventions
Build tooling (lint, search, converter)
Ingest the Book of Swarm (all chapters + glossary, ~8 steps)
Phase 2 — Protocol Pages and Integration
Ingest the remaining sources
Add AGENTS.md to the bee repo
Phase 3 — Community and Automation