PTG v0.3.0 — Structured lateral exchange (validated)
The first release with a statistically validated answer to PTG's core
question: does decentralized lateral exchange between cortical columns improve
answer quality over a monolithic equal-compute baseline?
Short answer: the raw lateral-text medium is quality-neutral to negative
and does not scale. The structured medium (bounded claim-excerpts + a
synthesis directive), on a 4B-class model, is quality-positive and stable across
4 → 150 columns (~80–85% win over the equal-call no-lateral control, p ≈ 0).
This is a research release, not a production claim. Every number is directional,
pre-registered, and caveated in the findings docs.
Highlights
The mechanism: structured lateral exchange
- LateralContextMode::{Raw, Structured} in ptg-runtime. Structured mode
injects a bounded, char-safe claim-excerpt of each neighbor's prediction plus
a synthesis directive, never the full verbatim prediction.
- --lateral-mode raw|structured in ptg and ptg-bench.
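To make the difference between the two modes concrete, here is a minimal Rust sketch of what a bounded, char-safe excerpt plus a synthesis directive could look like. Only the enum and its two variant names come from this release; `bounded_excerpt`, `lateral_context`, the `max_chars` parameter, and the directive wording are hypothetical and are not the ptg-runtime implementation.

```rust
/// Stand-in for the runtime's mode enum. Only the variant names are taken
/// from the release notes; everything else here is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LateralContextMode {
    Raw,
    Structured,
}

/// Truncate `text` to at most `max_chars` Unicode scalar values.
/// Slicing at a `char_indices` boundary never splits a multi-byte
/// UTF-8 sequence, which is what "char-safe" is assumed to mean here.
pub fn bounded_excerpt(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already within the bound
    }
}

/// Build the lateral context a column receives from its neighbors.
pub fn lateral_context(
    mode: LateralContextMode,
    neighbors: &[&str],
    max_chars: usize,
) -> String {
    match mode {
        // Raw: full verbatim predictions, unbounded.
        LateralContextMode::Raw => neighbors.join("\n\n"),
        // Structured: one bounded excerpt per neighbor, then a directive.
        LateralContextMode::Structured => {
            let mut out = String::new();
            for (i, prediction) in neighbors.iter().enumerate() {
                let excerpt = bounded_excerpt(prediction, max_chars);
                out.push_str(&format!("Neighbor {} claims: {}\n", i + 1, excerpt));
            }
            // Hypothetical directive text.
            out.push_str("Synthesize: keep claims you can support, correct the rest, and do not copy neighbors verbatim.");
            out
        }
    }
}
```

In this sketch the structured context grows with the number of neighbors times `max_chars`, not with the length of each neighbor's full prediction, which is one plausible reason a bounded medium would behave better at 150 columns than raw text.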
The evidence arc (raw → structured, 4 → 150 cols)
| Run | lateral win | echo | notes |
|---|---|---|---|
| raw, 4-col e2b | coin flip (11v12) | 25% | mechanism activates, no quality gain |
| raw, 150-col e2b | 14% | 57% | catastrophic at scale |
| structured, 4-col e4b | 78.4% | 11% | the medium, not the concept |
| structured, 50-col e4b (powered) | 85.1% | 6.7% | p ≈ 10⁻⁶ |
| structured, 150-col e4b (powered) | 82.4% | 8.4% | p ≈ 0, CI [78%, 86%] |
Length-confounding ruled out at every scale (lateral wins even when its draft is
shorter). The effect saturates at ~80–85%; it does not keep strengthening
past ~50 columns.
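For readers who want to sanity-check win rates of this kind, the sketch below computes a Wilson score interval and a one-sided exact sign test against a 50% null. These notes do not state which interval or test produced the reported CI and p-values, so both are swapped-in standard choices, and any counts passed in are hypothetical rather than the actual pair counts from the runs.

```rust
/// Wilson score interval for a binomial proportion (wins out of trials).
/// `z` is the normal quantile, e.g. 1.96 for a two-sided 95% interval.
pub fn wilson_interval(wins: u32, trials: u32, z: f64) -> (f64, f64) {
    assert!(trials > 0 && wins <= trials);
    let n = trials as f64;
    let p = wins as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    (center - half, center + half)
}

/// One-sided exact sign test: P(X >= wins) for X ~ Binomial(trials, 0.5).
/// Uses the recurrence pmf(i+1) = pmf(i) * (n - i) / (i + 1).
/// Note: 0.5^trials underflows f64 for trials above roughly 1000.
pub fn sign_test_p(wins: u32, trials: u32) -> f64 {
    assert!(wins <= trials);
    let n = trials as f64;
    let mut pmf = 0.5f64.powi(trials as i32); // P(X = 0)
    let mut tail = 0.0;
    for i in 0..=trials {
        if i >= wins {
            tail += pmf;
        }
        pmf *= (n - i as f64) / (i as f64 + 1.0);
    }
    tail.min(1.0)
}
```

With a hypothetical 80 wins out of 100 judged pairs, `wilson_interval(80, 100, 1.96)` gives an interval of roughly 71% to 87%, and `sign_test_p(80, 100)` is far below 10⁻⁶. The width of the interval depends on the number of judged pairs, which is why the powered runs are the ones to cite.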
Infrastructure: bounded column concurrency
The 150-column ceiling that blocked e4b for most of development was not
server capacity — it was unbounded client fan-out (join_all over all columns
fired 150 concurrent requests at a 4-slot server). Fixed with
CorticalMesh.max_concurrent_column_ticks + ptg-bench --column-concurrency.
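The shape of the fix can be illustrated with a standard-library analogue. The sketch below uses scoped threads and a shared work counter instead of the async join_all path, so it is not the code in CorticalMesh; `run_bounded`, its signature, and the use of threads are assumptions made to keep the example self-contained. It shows the property that matters: no more than `max_concurrent` column ticks are in flight at once, however many columns exist.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `tick` once per column id with at most `max_concurrent` calls in
/// flight. Returns results in column order and the peak in-flight count.
pub fn run_bounded<F>(columns: usize, max_concurrent: usize, tick: F) -> (Vec<usize>, usize)
where
    F: Fn(usize) -> usize + Sync,
{
    let next = AtomicUsize::new(0); // next column id to claim
    let in_flight = AtomicUsize::new(0); // ticks currently running
    let peak = AtomicUsize::new(0); // highest in-flight count seen
    let results = Mutex::new(vec![0usize; columns]);

    // Spawn a fixed pool of workers; each pulls column ids until none remain.
    // The pool size, not the column count, bounds concurrent requests.
    thread::scope(|s| {
        for _ in 0..max_concurrent.max(1).min(columns) {
            s.spawn(|| loop {
                let id = next.fetch_add(1, Ordering::SeqCst);
                if id >= columns {
                    break;
                }
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                let out = tick(id); // stands in for one column's request
                in_flight.fetch_sub(1, Ordering::SeqCst);
                results.lock().unwrap()[id] = out;
            });
        }
    });

    (results.into_inner().unwrap(), peak.load(Ordering::SeqCst))
}
```

Calling `run_bounded(150, 4, ...)` processes all 150 columns while the observed peak never exceeds 4, matching a 4-slot server. An unbounded fan-out corresponds to setting `max_concurrent` equal to the column count.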
Benchmark + judge methodology
ptg-bench (conditions, per-tick observability, routing, scale flags) and
ptg-judge (a programmatic perturbation delta as the primary metric, a blind LLM
judge as corroboration, plus an echo screen, a determinism gate, and a length
control). Pre-registered decision bars were set before every run.
Honest caveats (please read before citing)
- Survivorship at 150 cols: 3/15 mesh runs failed (persistent HTTP 500 in
MATH columns, retry-exhausted) and were excluded from the powered judge. If
those would have been low-quality, exclusion inflates the 82.4%. This is the
most important caveat and the top open item.
- Single model: only gemma-4-e4b tested at scale.
- Temperature-0 nondeterminism: the server is not perfectly deterministic at
temp 0; some control pairs were excluded as unstable.
- The 150-col 1p1r run's 93% was small-sample optimism; the powered 82.4% is the
figure to cite.
- ptg-belief (typed belief/evidence layer) is deferred; structured text
exchange works without it.
What's next
- Survivorship follow-up (a 0-mesh-failure run).
- A4 explicit self-revision control (lateral exchange vs "reconsider your
answer" at equal call budget).
- Semantic embedding convergence (§9.3), blocked on the embeddings endpoint.
Full evidence + methodology: see docs/ROADMAP.md and the
docs/STRUCTURED_LATERAL_*.md series.