PTG v0.3.0 — Structured lateral exchange (validated)

@dirvine released this 28 Jun 08:25

The first release with a statistically validated answer to PTG's core
question: does decentralized lateral exchange between cortical columns improve
answer quality over a monolithic equal-compute baseline?

Short answer: the raw lateral-text medium is quality-neutral to negative
and does not scale. The structured medium (bounded claim-excerpts plus a
synthesis directive), on a 4B-class model, is quality-positive and stable from
4 to 150 columns (~80–85% win rate over the equal-call no-lateral control, p ≈ 0).

This is a research release, not a production claim. Every number is directional,
was judged against decision bars pre-registered before the run, and is caveated
in the findings docs.

Highlights

The mechanism: structured lateral exchange

  • LateralContextMode::{Raw, Structured} in ptg-runtime. Structured mode
    injects a bounded, char-safe claim-excerpt of each neighbor's prediction plus
    a synthesis directive — never the full verbatim prediction.
  • --lateral-mode raw|structured in ptg and ptg-bench.
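A minimal sketch of what a bounded, char-safe excerpt could look like. Only the LateralContextMode enum and its two variants come from the notes above; the function names, the 80-character bound, and the directive wording are hypothetical and are not taken from ptg-runtime.

```rust
/// Mirrors the two modes named in the release notes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LateralContextMode {
    Raw,
    Structured,
}

// Hypothetical values, chosen for illustration only.
const MAX_EXCERPT_CHARS: usize = 80;
const SYNTHESIS_DIRECTIVE: &str =
    "Weigh the neighbor claims above against your own reasoning, then answer.";

/// Keep at most `max_chars` characters, cutting on a char boundary so that
/// multi-byte UTF-8 sequences are never split.
fn claim_excerpt(prediction: &str, max_chars: usize) -> &str {
    match prediction.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &prediction[..byte_idx],
        None => prediction, // already within the bound
    }
}

/// Build the lateral context a column would see from its neighbors.
pub fn lateral_context(mode: LateralContextMode, neighbors: &[&str]) -> String {
    match mode {
        // Raw: full verbatim predictions, unbounded.
        LateralContextMode::Raw => neighbors.join("\n"),
        // Structured: one bounded excerpt per neighbor, then the directive.
        LateralContextMode::Structured => {
            let mut out = String::new();
            for (i, p) in neighbors.iter().enumerate() {
                let excerpt = claim_excerpt(p, MAX_EXCERPT_CHARS);
                out.push_str(&format!("[neighbor {}] {}\n", i + 1, excerpt));
            }
            out.push_str(SYNTHESIS_DIRECTIVE);
            out
        }
    }
}

fn main() {
    let long = "é".repeat(200); // multi-byte chars exercise the boundary logic
    let neighbors = ["The answer is 42.", long.as_str()];
    println!("{}", lateral_context(LateralContextMode::Structured, &neighbors));
}
```

Counting characters rather than bytes is what makes the cut safe: slicing a &str at an arbitrary byte offset panics if the offset falls inside a multi-byte character.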

The evidence arc (raw → structured, 4 → 150 cols)

Run                               | Lateral win       | Echo | Notes
----------------------------------|-------------------|------|-------------------------------------
raw, 4-col e2b                    | coin flip (11v12) | 25%  | mechanism activates, no quality gain
raw, 150-col e2b                  | 14%               | 57%  | catastrophic at scale
structured, 4-col e4b             | 78.4%             | 11%  | the medium, not the concept
structured, 50-col e4b (powered)  | 85.1%             | 6.7% | p ≈ 10⁻⁶
structured, 150-col e4b (powered) | 82.4%             | 8.4% | p ≈ 0, CI [78%, 86%]

Length-confounding ruled out at every scale (lateral wins even when its draft is
shorter). The effect saturates at ~80–85%; it does not keep strengthening
past ~50 columns.
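One way to read the length-control claim is as a stratified win rate: restrict to pairs where the lateral draft is shorter than the control draft and check that lateral still wins most of them. The sketch below illustrates that check only. The Pair struct, its field names, and the sample data are hypothetical and do not reproduce ptg-judge or its results.

```rust
/// One judged pair (hypothetical shape): draft lengths in characters and
/// whether the judge preferred the lateral draft.
struct Pair {
    lateral_len: usize,
    control_len: usize,
    lateral_won: bool,
}

/// Lateral win rate within the pairs selected by `keep`.
/// Returns None when no pair matches, to avoid dividing by zero.
fn win_rate(pairs: &[Pair], keep: impl Fn(&Pair) -> bool) -> Option<f64> {
    let (mut wins, mut total) = (0usize, 0usize);
    for p in pairs {
        if !keep(p) {
            continue;
        }
        total += 1;
        if p.lateral_won {
            wins += 1;
        }
    }
    if total == 0 {
        None
    } else {
        Some(wins as f64 / total as f64)
    }
}

fn main() {
    // Synthetic data for illustration only, not measurements from this release.
    let pairs = vec![
        Pair { lateral_len: 100, control_len: 200, lateral_won: true },
        Pair { lateral_len: 90, control_len: 150, lateral_won: true },
        Pair { lateral_len: 80, control_len: 120, lateral_won: false },
        Pair { lateral_len: 300, control_len: 100, lateral_won: true },
    ];
    let overall = win_rate(&pairs, |_| true);
    let when_shorter = win_rate(&pairs, |p| p.lateral_len < p.control_len);
    println!("overall: {:?}, lateral shorter: {:?}", overall, when_shorter);
}
```

If the win rate in the "lateral shorter" stratum stays well above 50%, a judge preference for longer drafts cannot be the whole explanation.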

Infrastructure: bounded column concurrency

The 150-column ceiling that blocked e4b for most of development was not
server capacity — it was unbounded client fan-out (join_all over all columns
fired 150 concurrent requests at a 4-slot server). Fixed with
CorticalMesh.max_concurrent_column_ticks + ptg-bench --column-concurrency.
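The idea behind the fix can be sketched with the standard library alone. This is not the ptg-runtime implementation: it swaps the async join_all setting for scoped OS threads so the example runs without external crates, and run_column_ticks is a hypothetical name. A fixed number of workers pull column indices from a shared counter, so no more than max_concurrent ticks are ever in flight.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Run `tick` once per column index with at most `max_concurrent` in flight.
/// Returns the peak number of simultaneously running ticks that was observed.
fn run_column_ticks<F>(columns: usize, max_concurrent: usize, tick: F) -> usize
where
    F: Fn(usize) + Sync,
{
    let next = AtomicUsize::new(0); // next column index to hand out
    let in_flight = AtomicUsize::new(0); // ticks currently running
    let peak = AtomicUsize::new(0); // highest in_flight value seen
    let workers = max_concurrent.max(1).min(columns.max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let col = next.fetch_add(1, Ordering::SeqCst);
                if col >= columns {
                    break; // every column has been claimed
                }
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                tick(col);
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    peak.load(Ordering::SeqCst)
}

fn main() {
    let done = AtomicUsize::new(0);
    // 150 columns against a 4-slot server: cap the fan-out at 4.
    let peak = run_column_ticks(150, 4, |_col| {
        thread::sleep(Duration::from_millis(1)); // stand-in for one model request
        done.fetch_add(1, Ordering::SeqCst);
    });
    println!(
        "completed {} ticks, peak in-flight {}",
        done.load(Ordering::SeqCst),
        peak
    );
}
```

In an async codebase the same bound is commonly expressed with a semaphore or a buffered stream instead of worker threads. Whichever form is used, the point is that the client, not the server, enforces the limit.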

Benchmark + judge methodology

Results come from ptg-bench (conditions, per-tick observability, routing, scale
flags) and ptg-judge (a programmatic perturbation delta as the primary measure,
corroborated by a blind LLM judge, with an echo screen, determinism gate, and
length control). Decision bars were pre-registered before every run.

Honest caveats (please read before citing)

  • Survivorship at 150 cols: 3/15 mesh runs failed (persistent HTTP 500 in
    MATH columns, retry-exhausted) and were excluded from the powered judge. If
    those would have been low-quality, exclusion inflates the 82.4%. This is the
    most important caveat and the top open item.
  • Single model: only gemma-4-e4b tested at scale.
  • Temperature-0 nondeterminism: the server is not perfectly deterministic at
    temp 0; some control pairs were excluded as unstable.
  • The 150-col 1p1r run's 93% was small-sample optimism; the powered 82.4% is the
    figure to cite.
  • ptg-belief (typed belief/evidence layer) is deferred — structured text
    exchange works without it.
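To make the survivorship caveat concrete, a worst-case bound can be worked out under two loudly stated assumptions that are not from the findings docs: every run contributes the same number of judged pairs, and every pair from the 3 excluded runs would have been a loss. The helper name is hypothetical.

```rust
/// Worst-case pooled win rate if every excluded run had been a pure loss.
/// Assumption (hypothetical): each run contributes the same number of pairs,
/// so the observed rate is simply diluted by the share of included runs.
fn worst_case_win_rate(observed: f64, included_runs: u32, excluded_runs: u32) -> f64 {
    let total_runs = (included_runs + excluded_runs) as f64;
    observed * included_runs as f64 / total_runs
}

fn main() {
    // 15 mesh runs at 150 columns, 3 excluded, 82.4% observed on the remaining 12.
    let bound = worst_case_win_rate(0.824, 12, 3);
    println!("worst-case lower bound: {:.1}%", bound * 100.0);
}
```

Under those assumptions the bound is 0.824 × 12/15 ≈ 0.659. It is an illustration of how much exclusion could matter, not a measured result. The 0-mesh-failure rerun is what would actually settle the question.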

What's next

  • Survivorship follow-up (a 0-mesh-failure run).
  • A4 explicit self-revision control (lateral exchange vs "reconsider your
    answer" at equal call budget).
  • Semantic embedding convergence (§9.3), blocked on the embeddings endpoint.

Full evidence + methodology: see docs/ROADMAP.md and the
docs/STRUCTURED_LATERAL_*.md series.