Releases: saorsa-labs/brain
PTG v0.3.1 — Setup & serve phases
A focused follow-on to v0.3.0: the released ptg binary can now prepare its own
runtime environment, so running a mesh is a one-command affair rather than a
manual server/model setup.
What's new
ptg setup --yes # detect llama-server, download the gated Gemma QAT model (~2.7 GB), write config
ptg serve # launch the inference server in the foreground (config is remembered)
ptg --probe # then just run a mesh; no --vllm-url / --model flags needed

- ptg setup detects llama-server (flag → PTG_LLAMA_SERVER → LLAMA_SERVER →
  ~/.cache/ptg/bin → ~/llama-spike/.../build/bin → PATH), downloads the
  verified Gemma QAT model into the cache, and writes
  ~/.config/ptg/config.toml (%APPDATA% on Windows). Model download prefers
  the hf/huggingface-cli tool (handles Gemma gating best) and falls back
  to a native resumable download (HF_TOKEN bearer auth, .part → rename) if
  the CLI is absent or broken. The token is used only for the download and is
  never persisted.
- ptg serve reads the config, short-circuits if the server is already
  running, else launches llama-server in the foreground with the exact verified
  flags. --dry-run prints the command. It does not daemonize.
- Config-aware run path: the mesh command (ptg with no subcommand) now
  resolves server URL and model as flag → env → setup config → bundled default,
  so "setup once, then just ptg" works. The subcommand is optional, so all
  existing invocations are unchanged.
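
The flag → env → setup config → bundled default order can be pictured as a first-match lookup. The sketch below is illustrative only: the function and enum names are hypothetical, treating blank values as unset is an assumption, and it is not taken from ptg's actual resolver.

```rust
/// Which layer supplied the winning value (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Flag,
    Env,
    SetupConfig,
    BundledDefault,
}

/// Treat missing or blank values as "not set" (an assumption of this sketch).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Return the first value that is set, checking flag, then env, then the
/// setup config, and finally falling back to the bundled default.
pub fn resolve(
    flag: Option<&str>,
    env: Option<&str>,
    config: Option<&str>,
    bundled_default: &str,
) -> (String, Source) {
    let layers = [
        (flag, Source::Flag),
        (env, Source::Env),
        (config, Source::SetupConfig),
    ];
    for (value, source) in layers {
        if let Some(found) = non_empty(value) {
            return (found.to_string(), source);
        }
    }
    (bundled_default.to_string(), Source::BundledDefault)
}
```

Returning the source alongside the value makes it easy to log which layer won, which helps when a stale environment variable silently overrides the setup config.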
Honest prerequisites (documented in the README, not hidden)
- llama-server must be installed. ptg setup detects it and prints exact
  install instructions if missing, but deliberately does not build it
  (GPU/platform-specific). Install from llama.cpp.
- Gemma is gated. Accept the license at the model's HuggingFace page, then
  authenticate (hf login or set HF_TOKEN).
Validation
- 114 unit/integration tests pass; clippy/doc clean; panic-free (-D warnings
  with panic/unwrap/expect denies).
- Live-tested locally: setup --dry-run (detects the real server),
  serve --dry-run (exact server flags), config-aware run, and a real native
  download (broken hf shim → graceful fallback → file downloaded + renamed).
- The release workflow now smoke-tests ptg setup --dry-run on all four shipped
  targets (Linux, macOS arm64/x86_64, Windows), gating packaging on the setup
  command actually running on every platform.
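
The ".part → rename" step of the native download can be sketched with the standard library alone. This is a minimal illustration, not ptg's downloader: the function names are hypothetical, the bytes come from an in-memory iterator instead of an HTTP response, and resuming (reopening the .part file and requesting the remaining byte range) is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Temporary sibling path, e.g. `model.gguf` -> `model.gguf.part`.
pub fn part_path(dest: &Path) -> PathBuf {
    let mut name = dest.as_os_str().to_os_string();
    name.push(".part");
    PathBuf::from(name)
}

/// Write every chunk to `<dest>.part`, flush it to disk, then rename it onto
/// `dest`. Readers therefore never observe a half-written `dest`.
pub fn write_then_rename<'a, I>(dest: &Path, chunks: I) -> io::Result<()>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let part = part_path(dest);
    // File::create truncates; a resumable variant would append instead.
    let mut file = File::create(&part)?;
    for chunk in chunks {
        file.write_all(chunk)?;
    }
    file.sync_all()?;
    drop(file); // close the handle before renaming
    fs::rename(&part, dest)
}
```

If the process dies mid-download, only the .part file is left behind, so a later run can tell an incomplete download from a finished one by file name alone.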
No research changes
This release adds tooling only. The benchmarked structured-lateral findings and
the runtime concurrency fix from v0.3.0 are unchanged.
PTG v0.3.0 — Structured lateral exchange (validated)
The first release with a statistically validated answer to PTG's core
question: does decentralized lateral exchange between cortical columns improve
answer quality over a monolithic equal-compute baseline?
Short answer: the raw lateral-text medium is quality-neutral to negative
and does not scale. The structured medium (bounded claim-excerpts + a
synthesis directive), on a 4B-class model, is quality-positive and stable across
4 → 150 columns (~80–85% win over the equal-call no-lateral control, p ≈ 0).
This is a research release, not a production claim. Every number is directional,
pre-registered, and caveated in the findings docs.
Highlights
The mechanism: structured lateral exchange
- LateralContextMode::{Raw, Structured} in ptg-runtime. Structured mode
  injects a bounded, char-safe claim-excerpt of each neighbor's prediction plus
  a synthesis directive, never the full verbatim prediction.
- --lateral-mode raw|structured in ptg and ptg-bench.
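
To make "bounded, char-safe claim-excerpt" concrete, here is a small sketch. Only the enum name and its two variants come from this release; the function names, the excerpt format, and the wording of the synthesis directive are assumptions made for illustration and will differ from what ptg-runtime actually emits.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LateralContextMode {
    Raw,
    Structured,
}

/// Cut `text` to at most `max_chars` characters without ever slicing through
/// a multi-byte UTF-8 sequence (byte slicing at a fixed offset could panic).
pub fn claim_excerpt(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_index, _)) => &text[..byte_index],
        None => text, // already short enough
    }
}

/// Build the lateral context a column would see from its neighbors.
/// The output format here is hypothetical.
pub fn lateral_context(
    mode: LateralContextMode,
    neighbor_predictions: &[&str],
    max_chars: usize,
) -> String {
    match mode {
        // Raw: neighbors' full predictions, verbatim.
        LateralContextMode::Raw => neighbor_predictions.join("\n"),
        // Structured: one bounded excerpt per neighbor plus a directive.
        LateralContextMode::Structured => {
            let mut out = String::new();
            for (i, prediction) in neighbor_predictions.iter().enumerate() {
                let excerpt = claim_excerpt(prediction.trim(), max_chars);
                out.push_str(&format!("- neighbor {} claims: {}\n", i + 1, excerpt));
            }
            out.push_str("Weigh these claims against your own view; do not copy them.");
            out
        }
    }
}
```

The bound matters at scale: the structured context grows with the number of neighbors times the excerpt limit, not with how verbose each neighbor happened to be.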
The evidence arc (raw → structured, 4 → 150 cols)
| Run | lateral win | echo | notes |
|---|---|---|---|
| raw, 4-col e2b | coin flip (11v12) | 25% | mechanism activates, no quality gain |
| raw, 150-col e2b | 14% | 57% | catastrophic at scale |
| structured, 4-col e4b | 78.4% | 11% | the medium, not the concept |
| structured, 50-col e4b (powered) | 85.1% | 6.7% | p ≈ 10⁻⁶ |
| structured, 150-col e4b (powered) | 82.4% | 8.4% | p ≈ 0, CI [78%, 86%] |
Length-confounding ruled out at every scale (lateral wins even when its draft is
shorter). The effect saturates at ~80–85%; it does not keep strengthening
past ~50 columns.
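
For readers who want to sanity-check a row such as "coin flip (11v12)", one common choice is an exact two-sided sign test against a 50% win rate. The release does not say which test ptg-judge uses, so treat the sketch below as an assumption; the function names are hypothetical and the arithmetic is only suitable for small samples.

```rust
/// Fraction of decided pairs won by the lateral condition, or None when
/// there are no decided pairs.
pub fn win_rate(wins: u32, losses: u32) -> Option<f64> {
    let n = wins + losses;
    if n == 0 {
        None
    } else {
        Some(f64::from(wins) / f64::from(n))
    }
}

/// Exact two-sided sign-test p-value under H0 "wins and losses are equally
/// likely". Ties are assumed to have been dropped beforehand.
pub fn sign_test_p(wins: u32, losses: u32) -> f64 {
    let n = wins + losses;
    if n == 0 {
        return 1.0;
    }
    let k = wins.min(losses);
    // Sum C(n, 0) ..= C(n, k), building each coefficient from the previous one.
    let mut coeff = 1.0_f64;
    let mut tail = 0.0_f64;
    for i in 0..=k {
        tail += coeff;
        coeff = coeff * f64::from(n - i) / f64::from(i + 1);
    }
    // Double the smaller tail and cap at 1.
    (2.0 * tail * 0.5_f64.powi(n as i32)).min(1.0)
}
```

With the 11 v 12 split from the first table row, sign_test_p(11, 12) evaluates to 1.0, which matches the "coin flip" reading. The per-run counts behind the other rows are not given here, so they are not reproduced.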
Infrastructure: bounded column concurrency
The 150-column ceiling that blocked e4b for most of development was not
server capacity — it was unbounded client fan-out (join_all over all columns
fired 150 concurrent requests at a 4-slot server). Fixed with
CorticalMesh.max_concurrent_column_ticks + ptg-bench --column-concurrency.
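
The difference between unbounded fan-out and a bounded one can be shown with a small worker pool. The runtime itself appears to be async (the text mentions join_all), so this thread-based version is a stand-in for the idea, not the actual fix; run_bounded and its parameters are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Apply `work` to every item with at most `max_concurrent` calls in flight.
/// Results are returned in input order.
pub fn run_bounded<T, R, F>(items: Vec<T>, max_concurrent: usize, work: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let total = items.len();
    // Never start more workers than the limit, and at least one.
    let workers = max_concurrent.max(1).min(total.max(1));
    let queue: Mutex<VecDeque<(usize, T)>> =
        Mutex::new(items.into_iter().enumerate().collect());
    let results: Mutex<Vec<Option<R>>> = Mutex::new((0..total).map(|_| None).collect());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Take the next job; the lock is released before the work runs.
                // unwrap() is used for brevity in this sketch.
                let job = queue.lock().unwrap().pop_front();
                let Some((index, item)) = job else { break };
                let output = work(item);
                results.lock().unwrap()[index] = Some(output);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every queued item produces a result"))
        .collect()
}
```

With 150 columns and a limit of 4, the server sees at most four requests at a time, matching its slot count, while the remaining columns wait in the client-side queue instead of timing out on the server.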
Benchmark + judge methodology
ptg-bench (conditions, per-tick observability, routing, scale flags) and
ptg-judge (programmatic perturbation delta primary + blind LLM corroborating
judge, echo screen, determinism gate, length control). Pre-registered decision
bars set before every run.
Honest caveats (please read before citing)
- Survivorship at 150 cols: 3/15 mesh runs failed (persistent HTTP 500 in
  MATH columns, retry-exhausted) and were excluded from the powered judge. If
  those would have been low-quality, exclusion inflates the 82.4%. This is the
  most important caveat and the top open item.
- Single model: only gemma-4-e4b tested at scale.
- Temperature-0 nondeterminism: the server is not perfectly deterministic at
  temp 0; some control pairs were excluded as unstable.
- The 150-col 1p1r run's 93% was small-sample optimism; the powered 82.4% is the
  figure to cite.
- ptg-belief (typed belief/evidence layer) is deferred; structured text
  exchange works without it.
What's next
- Survivorship follow-up (a 0-mesh-failure run).
- A4 explicit self-revision control (lateral exchange vs "reconsider your
  answer" at equal call budget).
- Semantic embedding convergence (§9.3), blocked on the embeddings endpoint.
Full evidence + methodology: see docs/ROADMAP.md and the
docs/STRUCTURED_LATERAL_*.md series.
PTG v0.2.0 — Phase 3: convergence depth + lateral routing
⚠️ Pre-release. A research artifact for internal team experimentation, not a stable API. PTG requires a live llama-server and a gated Gemma model.
Phase 3 lands two new mechanisms built around the lateral homogenization signal we surfaced in v0.1.0: the mesh tends to converge toward the dominant interpretation and erase minority frames. v0.2.0 gives you tools to measure that and control it.
What's new since v0.1.0
1. Diversity-preserving lateral routing (--routing-policy) — Phase 3B
Until now every column heard from all its neighbors. That homogenizes: a confident column overwrites its neighbors' opinions. v0.2.0 lets a column listen selectively:
# only hear the 2 most confident neighbors
--routing-policy confidence-top-k --routing-k 2
# diversity mode: hear neighbors that say DIFFERENT things, not just confident ones
--routing-policy diversity --routing-k 2

Diversity mode is the homogenization mitigation. It anchors on the most confident neighbor, then deliberately picks neighbors that disagree (by token overlap) with what it's already heard, so dissident/niche frames survive instead of being voted away. Every routing decision is captured per-tick in tick_outputs.routes (route_weight + confidence per source), so homogenization can now be measured, not just observed.
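
The anchor-then-disagree selection can be sketched as a greedy loop over token-overlap scores. The release describes the real policy as MMR via token-Jaccard; the version below is a simplified variant that ignores confidence after the anchor is chosen, and every type and function name in it is hypothetical.

```rust
use std::collections::HashSet;

/// One neighbor's latest output (hypothetical shape, not the runtime's type).
pub struct Neighbor {
    pub id: usize,
    pub confidence: f64,
    pub prediction: String,
}

fn token_set(text: &str) -> HashSet<String> {
    text.split_whitespace().map(str::to_lowercase).collect()
}

/// Jaccard overlap of two token sets: shared tokens / all distinct tokens.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0; // two empty predictions are treated as identical
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Pick up to `k` neighbor ids: the most confident first, then whichever
/// remaining neighbor overlaps least with everything already picked.
pub fn select_diverse(neighbors: &[Neighbor], k: usize) -> Vec<usize> {
    let sets: Vec<HashSet<String>> =
        neighbors.iter().map(|n| token_set(&n.prediction)).collect();
    let mut picked: Vec<usize> = Vec::new(); // positions in `neighbors`

    while picked.len() < k.min(neighbors.len()) {
        let candidates = (0..neighbors.len()).filter(|i| !picked.contains(i));
        let next = if picked.is_empty() {
            candidates
                .max_by(|&a, &b| neighbors[a].confidence.total_cmp(&neighbors[b].confidence))
        } else {
            // Redundancy = highest overlap with any already-picked neighbor.
            let redundancy = |i: usize| {
                picked.iter().map(|&p| jaccard(&sets[i], &sets[p])).fold(0.0, f64::max)
            };
            candidates.min_by(|&a, &b| redundancy(a).total_cmp(&redundancy(b)))
        };
        match next {
            Some(i) => picked.push(i),
            None => break,
        }
    }
    picked.into_iter().map(|i| neighbors[i].id).collect()
}
```

With k = 2, a second high-confidence neighbor that merely rephrases the anchor loses its slot to a lower-confidence neighbor that shares no tokens with it, which is the behavior diversity mode is meant to produce.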
2. Prediction-stability convergence (--min-prediction-similarity) — Phase 3A
The mesh used to stop when self-reported confidence said "I'm sure." But a confident model just says 0.95 every tick, so the old signal couldn't tell a settled mesh from a stuck one. v0.2.0 adds a second stop signal based on whether predictions have actually stopped changing (word overlap), independent of confidence:
--min-prediction-similarity 0.85

The CLI now prints which criterion stopped the epoch:
convergence: prediction token-similarity stabilized
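
A prediction-stability check of this kind can be sketched as a word-overlap comparison between consecutive ticks. How PTG tokenizes, and whether it averages across columns or requires every column to clear the bar, is not stated here; the sketch assumes whitespace tokens and a mean over columns, and its function names are hypothetical.

```rust
use std::collections::HashSet;

/// Word-overlap (Jaccard) similarity between two predictions.
fn token_jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // both empty: nothing changed
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// True when, averaged over columns, this tick's predictions overlap the
/// previous tick's by at least `min_similarity` (e.g. 0.85).
/// `previous[i]` and `current[i]` belong to the same column.
pub fn predictions_stabilized(previous: &[&str], current: &[&str], min_similarity: f64) -> bool {
    if previous.is_empty() || previous.len() != current.len() {
        return false; // nothing to compare yet
    }
    let mean = previous
        .iter()
        .zip(current)
        .map(|(p, c)| token_jaccard(p, c))
        .sum::<f64>()
        / previous.len() as f64;
    mean >= min_similarity
}
```

Because the check never looks at confidence values, a model that reports 0.95 on every tick while still rewriting its answer does not trigger it.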
Pilot signal (single run — a lead, not a result)
On the operator/automation-fault prompt, the 9-column torus behaved differently under the two routing policies:
- all: the psych column collapsed to physics "catastrophic failure threshold" language at 0.98 conf
- diversity --routing-k 2: the psych column retained its own operator frame at 0.92 conf
That's the central Phase 3 hypothesis — topology/routing controls homogenization — showing up on a live run. Do not cite this as a finding; it's one run. It's a lead that needs the A3 scaled benchmark to become evidence.
Quick start (unchanged from v0.1.0)
git clone https://github.com/saorsa-labs/brain && cd brain
git checkout v0.2.0
scripts/start-gemma4-qat.sh # http://127.0.0.1:18136
cargo run -p ptg-cli --bin ptg -- \
--vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat \
--column-pack examples/column-packs/abstraction-ladder-9.toml \
--topology torus --torus-width 3 --torus-height 3 --columns 9 \
--routing-policy diversity --routing-k 2 \
--min-ticks 2 --ticks 3 --max-tokens 2048 --temperature 0 \
--input "<your prompt>"

Three things to try
# A) Baseline: everyone hears everyone (the old behavior)
--topology torus --torus-width 3 --torus-height 3 --columns 9
# B) Same prompt, diversity routing — do niche columns survive?
--topology torus --torus-width 3 --torus-height 3 --columns 9 \
--routing-policy diversity --routing-k 2
# C) Deliberately ambiguous prompt (jumbled words) — does diversity keep the
# mesh from collapsing to one reading?

What we'd love to hear back: does diversity routing help on your prompts (more useful, less groupthink) or hurt (columns can't agree)? Your prompts will tell us more than ours did.
What's in this release (cumulative)
- 5-crate Rust workspace, panic-free, clippy-clean (-D warnings), 95 tests
- ptg CLI: --topology {default,ring,ring-bi,torus,fully-connected,small-world}, --column-pack, --routing-policy {all,confidence-top-k,diversity}, --routing-k, --min-ticks, --min-prediction-similarity, --max-tokens, --temperature, --dry-run, --probe, multimodal --image-url
- Convergence: confidence (mean/delta/cosine) + model-independent prediction-stability, with convergence_reason reporting
- Routing: All / ConfidenceTopK / diversity-preserving (MMR via token-Jaccard), observable per-tick via TickOutputs.routes
- scripts/start-gemma4-qat.sh portable launcher
- ptg-bench + ptg-judge binaries
- Docs: Tutorial, Specification, Architecture, Roadmap, Benchmarking
What's NOT proven yet (being honest)
- All homogenization/routing observations are single runs, not statistics. Real on the prompts we tried; not enough runs to call a result.
- Routing decisions are in the runtime's tick_outputs.routes and will surface in ptg-bench JSON once extended. The plain ptg run shows the effect in predictions, not a route log.
- Semantic cosine convergence (§9.3) remains blocked: the live server returns HTTP 501 on /v1/embeddings.
Validation
- cargo fmt --all --check ✅
- cargo clippy --all-features --all-targets -- -D warnings -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used ✅ (0 issues)
- cargo check --workspace --all-targets ✅
- cargo test ✅ (95 passed, 0 failed)
- Live-validated against gemma-4-e2b-qat (both new mechanisms, multiple topologies).
License
Dual-licensed under MIT OR Apache-2.0.
PTG v0.1.0 — team test scaffold
⚠️ Pre-release. A research artifact for internal team experimentation, not a stable API. PTG requires a live llama-server and a gated Gemma model; it is not a drop-in library.
This is the first tagged build of Project Thousand-Gemma: a distributed, prompt-based cortical mesh simulator in Rust implementing Jeff Hawkins' Thousand Brains Theory. A teammate can now clone → start the QAT server → run a mesh → swap column prompts to experiment with abstraction levels.
Quick start
git clone https://github.com/saorsa-labs/brain && cd brain
# 1. Start the verified Gemma 4 QAT model server (downloads ~2.6 GB on first run)
scripts/start-gemma4-qat.sh # serves on http://127.0.0.1:18136
# 2. Probe it
cargo run -p ptg-cli --bin ptg -- --probe \
--vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat
# 3. See the mesh wiring (no inference)
cargo run -p ptg-cli --bin ptg -- --dry-run \
--column-pack examples/column-packs/abstraction-ladder-9.toml \
--topology torus --torus-width 3 --torus-height 3 --columns 9
# 4. Run it live (9 columns, 3 abstraction levels, 3x3 torus)
cargo run -p ptg-cli --bin ptg -- \
--vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat \
--column-pack examples/column-packs/abstraction-ladder-9.toml \
--topology torus --torus-width 3 --torus-height 3 --columns 9 \
--min-ticks 2 --ticks 3 --max-tokens 2048 --temperature 0 \
--input "A satellite in decaying orbit shows rising thermal load, decreasing altitude, and intermittent guidance resets. What will happen next?"

Full walkthrough (prerequisites, 3-tier model setup, topologies, experiment recipes, de-risk ladder, common failures): docs/TUTORIAL.md.
What's in this release
- 5-crate Rust workspace (panic-free, clippy-clean under -D warnings, 78 tests)
- ptg CLI with --topology {default,ring,ring-bi,torus,fully-connected,small-world}, --column-pack, --min-ticks, --max-tokens, --temperature, --dry-run, --probe, multimodal --image-url
- Pluggable topologies with degeneracy guardrails (Phase 3)
- Column packs (TOML) for abstraction-level experiments + abstraction-ladder-9.toml example
- scripts/start-gemma4-qat.sh portable launcher
- ptg-bench + ptg-judge binaries for benchmark methodology and the A2 mechanism-activation judge
- Docs: Specification, Architecture, Roadmap, Benchmarking methodology, Tutorial
Pilot research signal (not a benchmarked result)
From a single live QAT run on the 9-column abstraction ladder — captured as a research direction in the README:
- Confidence stratifies by abstraction level: high-level columns reported mean confidence ~0.92 vs ~0.68 for low-level columns on a causal prompt.
- Low-level drift without divergence: on ambiguous token input, low-level columns drifted toward token prediction, but lateral exchange pulled them toward the dominant physics framing.
- Topology changes propagation: a niche "context" column's framing propagated fast on a 4-neighbor torus (0.98 conf) but stayed isolated on a 1-neighbor ring (0.85 conf).
Central tension worth investigating: lateral consensus is a homogenizing force — great when the dominant frame is correct, a failure mode when the dissenting view is the one that matters. See the README's Early research signal section.
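
The "4-neighbor torus" versus "1-neighbor ring" contrast in the pilot notes comes down to how many columns each column hears from. The sketch below assumes a row-major grid for the torus and a one-directional ring (consistent with ring and ring-bi being separate options); both function names are hypothetical and the real topology code may wire columns differently.

```rust
/// Neighbors of `index` on a `width` x `height` torus laid out row-major:
/// left, right, up and down, each wrapping around the grid edges.
pub fn torus_neighbors(index: usize, width: usize, height: usize) -> Vec<usize> {
    if width == 0 || height == 0 || index >= width * height {
        return Vec::new();
    }
    let (x, y) = (index % width, index / width);
    let mut out = vec![
        y * width + (x + width - 1) % width,     // left
        y * width + (x + 1) % width,             // right
        ((y + height - 1) % height) * width + x, // up
        ((y + 1) % height) * width + x,          // down
    ];
    // On very small grids wraparound can repeat a neighbor or point back at
    // the cell itself; drop those degenerate links.
    out.sort_unstable();
    out.dedup();
    out.retain(|&n| n != index);
    out
}

/// The single downstream neighbor on a one-directional ring of `count` columns.
pub fn ring_neighbors(index: usize, count: usize) -> Vec<usize> {
    if count < 2 || index >= count {
        return Vec::new();
    }
    vec![(index + 1) % count]
}
```

On the 3x3 torus used in the quick start, every column has four distinct neighbors, so a frame can reach the whole mesh in two hops; on a nine-column one-directional ring it needs up to eight.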
Known limits
- Gated model required. Gemma is gated; you must accept the license and authenticate with hf login before the launcher can download it.
- llama-server required (built from llama.cpp). The launcher detects it but does not build it.
- No integration LLM yet. The mesh emits per-column outputs; "voting" is confidence-threshold filtering only.
- Embeddings blocked. The live server returns HTTP 501 on /v1/embeddings, so semantic-cosine convergence is deferred.
- Fail-fast. One column emitting truncated/malformed JSON aborts the epoch (use the de-risk ladder in the tutorial).
- Confidence is self-reported. An overconfident model converges prematurely; --min-ticks is the workaround.
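
Why --min-ticks helps with overconfident models can be seen in a small stop-condition sketch. The real convergence logic also uses delta and cosine signals that are not modeled here; the function name, the 1-based tick count, and the use of a plain mean against a threshold are all assumptions.

```rust
/// Decide whether an epoch may stop after `tick` (counted from 1).
/// Confidence alone cannot end the epoch before `min_ticks` ticks have run.
pub fn may_stop(tick: u32, min_ticks: u32, confidences: &[f64], threshold: f64) -> bool {
    if tick < min_ticks || confidences.is_empty() {
        return false;
    }
    let mean = confidences.iter().sum::<f64>() / confidences.len() as f64;
    mean >= threshold
}
```

With --min-ticks 2, a mesh whose columns all report 0.95 on the first tick still runs a second tick, so each column sees its neighbors' output at least once before the epoch can end.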
Validation
- cargo fmt --all --check ✅
- cargo clippy --all-features --all-targets -- -D warnings -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used ✅ (0 issues)
- cargo check --workspace --all-targets ✅
- cargo test ✅ (78 passed, 0 failed)
- End-to-end validated against a live gemma-4-e2b-qat server.
License
Dual-licensed under MIT OR Apache-2.0.