Releases: saorsa-labs/brain

PTG v0.3.1 — Setup & serve phases

28 Jun 11:07

A focused follow-on to v0.3.0: the released ptg binary can now prepare its own
runtime environment, so running a mesh is a one-command affair rather than a
manual server/model setup.

What's new

ptg setup --yes    # detect llama-server, download the gated Gemma QAT model (~2.7 GB), write config
ptg serve          # launch the inference server in the foreground (config is remembered)
ptg --probe        # then just run a mesh — no --vllm-url / --model flags needed
  • ptg setup detects llama-server (flag → PTG_LLAMA_SERVER → LLAMA_SERVER →
    ~/.cache/ptg/bin → ~/llama-spike/.../build/bin → PATH), downloads the
    verified Gemma QAT model into the cache, and writes
    ~/.config/ptg/config.toml (%APPDATA% on Windows). Model download prefers
    the hf / huggingface-cli tool (handles Gemma gating best) and falls back
    to a native resumable download (HF_TOKEN bearer auth, .part → rename) if
    the CLI is absent or broken. The token is used only for the download and is
    never persisted.
  • ptg serve reads the config, short-circuits if the server is already
    running, else launches llama-server in the foreground with the exact verified
    flags. --dry-run prints the command. It does not daemonize.
  • Config-aware run path: the mesh command (ptg with no subcommand) now
    resolves server URL and model as flag → env → setup config → bundled default,
    so "setup once, then just ptg" works. The subcommand is optional, so all
    existing invocations are unchanged.
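
The resolution order in the last bullet (flag → env → setup config → bundled default) can be sketched as a chain of fallbacks. This is a minimal illustration, not the code in ptg: the function name, the PTG_EXAMPLE_URL variable, and the placeholder values are all hypothetical; only the ordering comes from the notes above.

```rust
use std::env;

/// Resolve one setting in the order flag → env → setup config → bundled default.
/// Every name here is hypothetical; only the ordering comes from the release notes.
fn resolve(
    flag: Option<&str>,
    env_value: Option<&str>,
    config_value: Option<&str>,
    bundled_default: &str,
) -> String {
    flag.or(env_value)
        .or(config_value)
        .unwrap_or(bundled_default)
        .to_string()
}

fn main() {
    // PTG_EXAMPLE_URL is an invented variable name used only for this sketch.
    let from_env = env::var("PTG_EXAMPLE_URL").ok();
    let url = resolve(None, from_env.as_deref(), Some("value-from-config"), "bundled-default");
    println!("server url: {url}");
}
```

Because each source is an Option, a missing flag or unset variable simply falls through to the next source, which is why invocations that pass explicit flags would behave as before.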

Honest prerequisites (documented in the README, not hidden)

  • llama-server must be installed. ptg setup detects it and prints exact
    install instructions if missing, but deliberately does not build it
    (GPU/platform-specific). Install from llama.cpp.
  • Gemma is gated. Accept the license at the model's HuggingFace page, then
    authenticate (hf login or set HF_TOKEN).

Validation

  • 114 unit/integration tests pass; clippy/doc clean; panic-free (-D warnings
    with panic/unwrap/expect denies).
  • Live-tested locally: setup --dry-run (detects the real server),
    serve --dry-run (exact server flags), config-aware run, and a real native
    download (broken hf shim → graceful fallback → file downloaded + renamed).
  • The release workflow now smoke-tests ptg setup --dry-run on all four shipped
    targets (Linux, macOS arm64/x86_64, Windows), gating packaging on the setup
    command actually running on every platform.
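
The ".part → rename" pattern used by the native download fallback can be sketched with the standard library alone. This is an assumption-laden illustration: the helper names are hypothetical, the HTTP transfer is replaced by any Read source, and the real implementation in ptg may differ.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Write};
use std::path::{Path, PathBuf};

/// Path of the temporary file: "<final name>.part" next to the target.
fn part_path(dest: &Path) -> PathBuf {
    let mut name = dest.as_os_str().to_os_string();
    name.push(".part");
    PathBuf::from(name)
}

/// Bytes already present in the partial file; a resuming client would
/// request the remainder starting at this offset.
fn resume_offset(dest: &Path) -> u64 {
    fs::metadata(part_path(dest)).map(|m| m.len()).unwrap_or(0)
}

/// Append `body` (assumed to start at `resume_offset`) to the partial file,
/// then rename it into place so `dest` never holds a truncated file.
fn finish_download(dest: &Path, mut body: impl Read) -> io::Result<u64> {
    let part = part_path(dest);
    let mut file = OpenOptions::new().create(true).append(true).open(&part)?;
    io::copy(&mut body, &mut file)?;
    file.flush()?;
    let total = file.metadata()?.len();
    drop(file); // close the handle before renaming
    fs::rename(&part, dest)?;
    Ok(total)
}

fn main() -> io::Result<()> {
    let dest = std::env::temp_dir().join("ptg-sketch-model.bin");
    // Simulate an interrupted transfer that left 5 bytes behind.
    fs::write(part_path(&dest), b"hello")?;
    let offset = resume_offset(&dest);
    let total = finish_download(&dest, &b" world"[..])?;
    println!("resumed at {offset}, final size {total}");
    fs::remove_file(&dest)
}
```

Writing to a sibling .part file and renaming only after the copy succeeds means a reader of the final path sees either no file or a complete one.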

No research changes

This release adds tooling only. The benchmarked structured-lateral findings and
the runtime concurrency fix from v0.3.0 are unchanged.

PTG v0.3.0 — Structured lateral exchange (validated)

28 Jun 08:25

The first release with a statistically validated answer to PTG's core
question: does decentralized lateral exchange between cortical columns improve
answer quality over a monolithic equal-compute baseline?

Short answer: the raw lateral-text medium is quality-neutral to negative
and does not scale. The structured medium (bounded claim-excerpts + a
synthesis directive), on a 4B-class model, is quality-positive and stable across
4 → 150 columns (~80–85% win over the equal-call no-lateral control, p ≈ 0).

This is a research release, not a production claim. Every number is directional,
pre-registered, and caveated in the findings docs.

Highlights

The mechanism: structured lateral exchange

  • LateralContextMode::{Raw, Structured} in ptg-runtime. Structured mode
    injects a bounded, char-safe claim-excerpt of each neighbor's prediction plus
    a synthesis directive — never the full verbatim prediction.
  • --lateral-mode raw|structured in ptg and ptg-bench.
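
As a rough sketch of what "bounded, char-safe" could mean in structured mode: the excerpt is cut on a character boundary rather than a byte offset, so multi-byte UTF-8 text is never split. The function names, the max_chars knob, and the directive wording are assumptions made for illustration; they are not taken from ptg-runtime.

```rust
/// Bound `prediction` to at most `max_chars` characters without splitting a
/// UTF-8 code point. `max_chars` is a hypothetical knob for this sketch.
fn claim_excerpt(prediction: &str, max_chars: usize) -> &str {
    match prediction.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &prediction[..byte_idx],
        None => prediction, // already within the bound
    }
}

/// Assemble the context a column would see: one bounded excerpt per
/// neighbor, followed by a synthesis directive.
fn structured_context(neighbors: &[&str], max_chars: usize) -> String {
    let mut out = String::new();
    for (i, prediction) in neighbors.iter().enumerate() {
        out.push_str(&format!("neighbor {i}: {}\n", claim_excerpt(prediction, max_chars)));
    }
    // The directive wording is invented for illustration.
    out.push_str("Synthesize these claims with your own view; do not copy them.");
    out
}

fn main() {
    let neighbors = [
        "Orbit décays within days as drag rises.",
        "Guidance resets suggest a power fault.",
    ];
    println!("{}", structured_context(&neighbors, 16));
}
```

Slicing at a byte index returned by char_indices is what keeps the cut safe; slicing at an arbitrary byte offset could panic in the middle of a multi-byte character.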

The evidence arc (raw → structured, 4 → 150 cols)

| Run | Lateral win | Echo | Notes |
|---|---|---|---|
| raw, 4-col e2b | coin flip (11v12) | 25% | mechanism activates, no quality gain |
| raw, 150-col e2b | 14% | 57% | catastrophic at scale |
| structured, 4-col e4b | 78.4% | 11% | the medium, not the concept |
| structured, 50-col e4b (powered) | 85.1% | 6.7% | p ≈ 10⁻⁶ |
| structured, 150-col e4b (powered) | 82.4% | 8.4% | p ≈ 0, CI [78%, 86%] |

Length-confounding ruled out at every scale (lateral wins even when its draft is
shorter). The effect saturates at ~80–85%; it does not keep strengthening
past ~50 columns.

Infrastructure: bounded column concurrency

The 150-column ceiling that blocked e4b for most of development was not
server capacity — it was unbounded client fan-out (join_all over all columns
fired 150 concurrent requests at a 4-slot server). Fixed with
CorticalMesh.max_concurrent_column_ticks + ptg-bench --column-concurrency.
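
The difference between unbounded fan-out and a bounded number of in-flight column ticks can be sketched as follows. The actual runtime is async (the notes mention join_all); this illustration uses standard-library threads instead so that it has no dependencies, and the function name and the sleep that stands in for an HTTP request are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Run `columns` simulated column ticks with at most `max_concurrent` in
/// flight, and return the peak number observed in flight at once.
fn run_ticks(columns: usize, max_concurrent: usize) -> usize {
    let next = AtomicUsize::new(0); // index of the next column to tick
    let in_flight = AtomicUsize::new(0); // requests currently "at the server"
    let peak = AtomicUsize::new(0);

    thread::scope(|scope| {
        // A fixed pool of workers is what bounds the fan-out.
        for _ in 0..max_concurrent.clamp(1, columns.max(1)) {
            scope.spawn(|| loop {
                let column = next.fetch_add(1, Ordering::SeqCst);
                if column >= columns {
                    break;
                }
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(1)); // stand-in for the HTTP call
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    peak.load(Ordering::SeqCst)
}

fn main() {
    // 150 columns against a server with 4 slots: the peak never exceeds 4.
    println!("peak in flight: {}", run_ticks(150, 4));
}
```

With the bound set to the server's slot count, excess columns wait on the client side instead of piling up as concurrent requests.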

Benchmark + judge methodology

ptg-bench (conditions, per-tick observability, routing, scale flags) and
ptg-judge (programmatic perturbation delta primary + blind LLM corroborating
judge, echo screen, determinism gate, length control). Pre-registered decision
bars set before every run.

Honest caveats (please read before citing)

  • Survivorship at 150 cols: 3/15 mesh runs failed (persistent HTTP 500 in
    MATH columns, retry-exhausted) and were excluded from the powered judge. If
    those runs would have been low-quality, excluding them inflates the 82.4%.
    This is the most important caveat and the top open item.
  • Single model: only gemma-4-e4b tested at scale.
  • Temperature-0 nondeterminism: the server is not perfectly deterministic at
    temp 0; some control pairs were excluded as unstable.
  • The 150-col 1p1r run's 93% was small-sample optimism; the powered 82.4% is the
    figure to cite.
  • ptg-belief (typed belief/evidence layer) is deferred — structured text
    exchange works without it.
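
To get a feel for how much the survivorship caveat could matter, the sketch below blends the observed 82.4% with an assumed win rate for the 3 excluded runs. It is illustrative arithmetic only: it assumes every run contributes the same number of judged pairs, which these notes do not state, and the assumed rates for the excluded runs are hypothetical inputs, not measurements.

```rust
/// Blend the observed win rate of surviving runs with an assumed win rate for
/// the excluded runs, weighting every run equally (an assumption of this
/// sketch; per-run pair counts are not given in the notes).
fn adjusted_win_rate(observed: f64, survived: u32, excluded: u32, assumed_excluded_rate: f64) -> f64 {
    let total = f64::from(survived + excluded);
    (observed * f64::from(survived) + assumed_excluded_rate * f64::from(excluded)) / total
}

fn main() {
    // 12 of 15 runs survived at 82.4%; try three assumptions for the other 3.
    for assumed in [0.0, 0.5, 0.824] {
        let rate = adjusted_win_rate(0.824, 12, 3, assumed);
        println!("excluded runs at {:.1}% -> overall {:.1}%", assumed * 100.0, rate * 100.0);
    }
}
```

Under the equal-weight assumption, the extreme case where all excluded runs lose every comparison gives about 65.9%, and a coin-flip assumption gives about 75.9%. These are bounds under stated assumptions, not results.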

What's next

  • Survivorship follow-up (a 0-mesh-failure run).
  • A4 explicit self-revision control (lateral exchange vs "reconsider your
    answer" at equal call budget).
  • Semantic embedding convergence (§9.3), blocked on the embeddings endpoint.

Full evidence + methodology: see docs/ROADMAP.md and the
docs/STRUCTURED_LATERAL_*.md series.

PTG v0.2.0 — Phase 3: convergence depth + lateral routing

25 Jun 14:59

⚠️ Pre-release. A research artifact for internal team experimentation, not a stable API. PTG requires a live llama-server and a gated Gemma model.

Phase 3 lands two new mechanisms built around the lateral homogenization signal we surfaced in v0.1.0: the mesh tends to converge toward the dominant interpretation and erase minority frames. v0.2.0 gives you tools to measure that and control it.

What's new since v0.1.0

1. Diversity-preserving lateral routing (--routing-policy) — Phase 3B

Until now every column heard from all its neighbors. That homogenizes: a confident column overwrites its neighbors' opinions. v0.2.0 lets a column listen selectively:

# only hear the 2 most confident neighbors
--routing-policy confidence-top-k --routing-k 2

# diversity mode: hear neighbors that say DIFFERENT things, not just confident ones
--routing-policy diversity --routing-k 2

Diversity mode is the homogenization mitigation. It anchors on the most confident neighbor, then deliberately picks neighbors that disagree (by token overlap) with what it's already heard — so dissident/niche frames survive instead of being voted away. Every routing decision is captured per-tick in tick_outputs.routes (route_weight + confidence per source), so homogenization can now be measured, not just observed.
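
The selection described above can be sketched as a greedy pick: anchor on the most confident neighbor, then repeatedly add the neighbor whose tokens overlap least with anything already chosen. This is a simplified variant that ignores confidence after the anchor; a full MMR score would weigh relevance against overlap. The function names and the tuple layout are assumptions, not the runtime's types.

```rust
use std::collections::HashSet;

/// Token-level Jaccard similarity: |A ∩ B| / |A ∪ B| over whitespace tokens.
fn jaccard(a: &str, b: &str) -> f64 {
    let a: HashSet<&str> = a.split_whitespace().collect();
    let b: HashSet<&str> = b.split_whitespace().collect();
    let union = a.union(&b).count();
    if union == 0 {
        return 1.0; // two empty predictions are treated as identical
    }
    a.intersection(&b).count() as f64 / union as f64
}

/// Pick up to `k` neighbor indices from (prediction, confidence) pairs.
fn diversity_route(neighbors: &[(&str, f64)], k: usize) -> Vec<usize> {
    let limit = k.min(neighbors.len());
    let mut chosen: Vec<usize> = Vec::new();
    if limit == 0 {
        return chosen;
    }
    // Anchor: the most confident neighbor.
    if let Some(anchor) =
        (0..neighbors.len()).max_by(|&i, &j| neighbors[i].1.total_cmp(&neighbors[j].1))
    {
        chosen.push(anchor);
    }
    while chosen.len() < limit {
        // Highest overlap between candidate `i` and anything already heard.
        let overlap = |i: usize| -> f64 {
            chosen
                .iter()
                .map(|&c| jaccard(neighbors[i].0, neighbors[c].0))
                .fold(0.0, f64::max)
        };
        let next = (0..neighbors.len())
            .filter(|i| !chosen.contains(i))
            .min_by(|&i, &j| overlap(i).total_cmp(&overlap(j)));
        match next {
            Some(i) => chosen.push(i),
            None => break,
        }
    }
    chosen
}

fn main() {
    let neighbors = [
        ("orbit decay thermal failure", 0.98),
        ("orbit decay thermal collapse", 0.95),
        ("operator fatigue automation trust", 0.90),
    ];
    // The second pick is the dissenting frame, not the second most confident.
    println!("{:?}", diversity_route(&neighbors, 2));
}
```

In this example a confidence-top-k policy with k = 2 would pick the two near-identical physics predictions, while the diversity pick keeps the operator frame in the conversation.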

2. Prediction-stability convergence (--min-prediction-similarity) — Phase 3A

The mesh used to stop when self-reported confidence said "I'm sure." But a confident model just says 0.95 every tick, so the old signal couldn't tell a settled mesh from a stuck one. v0.2.0 adds a second stop signal based on whether predictions have actually stopped changing (word overlap), independent of confidence:

--min-prediction-similarity 0.85

The CLI now prints which criterion stopped the epoch:

convergence: prediction token-similarity stabilized
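
A prediction-stability check of this kind can be sketched by comparing each column's prediction with its own previous tick. The notes say only "word overlap"; the lowercased token-Jaccard measure, the requirement that every column clears the threshold (rather than, say, the mean), and the function names are assumptions of this sketch.

```rust
use std::collections::HashSet;

/// Word-overlap similarity between two predictions (token Jaccard, lowercased).
fn token_similarity(prev: &str, curr: &str) -> f64 {
    let tokens = |s: &str| -> HashSet<String> {
        s.split_whitespace().map(str::to_lowercase).collect()
    };
    let (a, b) = (tokens(prev), tokens(curr));
    let union = a.union(&b).count();
    if union == 0 {
        return 1.0; // two empty predictions count as unchanged
    }
    a.intersection(&b).count() as f64 / union as f64
}

/// True when every column's prediction is at least `min_similarity` similar
/// to its own prediction on the previous tick.
fn predictions_stable(prev: &[&str], curr: &[&str], min_similarity: f64) -> bool {
    prev.len() == curr.len()
        && !curr.is_empty()
        && prev
            .iter()
            .zip(curr)
            .all(|(p, c)| token_similarity(p, c) >= min_similarity)
}

fn main() {
    let tick1 = ["the orbit will decay soon", "guidance resets will continue"];
    let tick2 = ["the orbit will decay soon", "power faults will spread quickly"];
    // Confidence could read 0.95 on both ticks while the second column is
    // still changing its answer; the overlap check catches that.
    println!("stable: {}", predictions_stable(&tick1, &tick2, 0.85));
}
```

The check never looks at self-reported confidence, which is what lets it tell a settled mesh from a confident but still-moving one.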

Pilot signal (single run — a lead, not a result)

On the operator/automation-fault prompt, the 9-column torus behaved differently under the two routing policies:

  • all: the psych column collapsed to physics "catastrophic failure threshold" language at 0.98 conf
  • diversity --routing-k 2: the psych column retained its own operator frame at 0.92 conf

That's the central Phase 3 hypothesis — topology/routing controls homogenization — showing up on a live run. Do not cite this as a finding; it's one run. It's a lead that needs the A3 scaled benchmark to become evidence.

Quick start (unchanged from v0.1.0)

git clone https://github.com/saorsa-labs/brain && cd brain
git checkout v0.2.0

scripts/start-gemma4-qat.sh        # http://127.0.0.1:18136

cargo run -p ptg-cli --bin ptg -- \
    --vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat \
    --column-pack examples/column-packs/abstraction-ladder-9.toml \
    --topology torus --torus-width 3 --torus-height 3 --columns 9 \
    --routing-policy diversity --routing-k 2 \
    --min-ticks 2 --ticks 3 --max-tokens 2048 --temperature 0 \
    --input "<your prompt>"

Three things to try

# A) Baseline: everyone hears everyone (the old behavior)
--topology torus --torus-width 3 --torus-height 3 --columns 9

# B) Same prompt, diversity routing — do niche columns survive?
--topology torus --torus-width 3 --torus-height 3 --columns 9 \
  --routing-policy diversity --routing-k 2

# C) Deliberately ambiguous prompt (jumbled words) — does diversity keep the
#    mesh from collapsing to one reading?

What we'd love to hear back: does diversity routing help on your prompts (more useful, less groupthink) or hurt (columns can't agree)? Your prompts will tell us more than ours did.

What's in this release (cumulative)

  • 5-crate Rust workspace, panic-free, clippy-clean (-D warnings), 95 tests
  • ptg CLI: --topology {default,ring,ring-bi,torus,fully-connected,small-world}, --column-pack, --routing-policy {all,confidence-top-k,diversity}, --routing-k, --min-ticks, --min-prediction-similarity, --max-tokens, --temperature, --dry-run, --probe, multimodal --image-url
  • Convergence: confidence (mean/delta/cosine) + model-independent prediction-stability, with convergence_reason reporting
  • Routing: All / ConfidenceTopK / diversity-preserving (MMR via token-Jaccard), observable per-tick via TickOutputs.routes
  • scripts/start-gemma4-qat.sh portable launcher
  • ptg-bench + ptg-judge binaries
  • Docs: Tutorial, Specification, Architecture, Roadmap, Benchmarking

What's NOT proven yet (being honest)

  • All homogenization/routing observations are single runs, not statistics. The effects were real on the prompts we tried, but there are not enough runs to call them a result.
  • Routing decisions are in the runtime's tick_outputs.routes and will surface in ptg-bench JSON once extended. The plain ptg run shows the effect in predictions, not a route log.
  • Semantic cosine convergence (§9.3) remains blocked — the live server returns HTTP 501 on /v1/embeddings.

Validation

  • cargo fmt --all --check
  • cargo clippy --all-features --all-targets -- -D warnings -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used ✅ (0 issues)
  • cargo check --workspace --all-targets
  • cargo test ✅ (95 passed, 0 failed)
  • Live-validated against gemma-4-e2b-qat (both new mechanisms, multiple topologies).

License

Dual-licensed under MIT OR Apache-2.0.

PTG v0.1.0 — team test scaffold

24 Jun 22:17

Pre-release

⚠️ Pre-release. A research artifact for internal team experimentation, not a stable API. PTG requires a live llama-server and a gated Gemma model — it is not a drop-in library.

This is the first tagged build of Project Thousand-Gemma: a distributed, prompt-based cortical mesh simulator in Rust implementing Jeff Hawkins' Thousand Brains Theory. A teammate can now clone → start the QAT server → run a mesh → swap column prompts to experiment with abstraction levels.

Quick start

git clone https://github.com/saorsa-labs/brain && cd brain

# 1. Start the verified Gemma 4 QAT model server (downloads ~2.6 GB on first run)
scripts/start-gemma4-qat.sh        # serves on http://127.0.0.1:18136

# 2. Probe it
cargo run -p ptg-cli --bin ptg -- --probe \
    --vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat

# 3. See the mesh wiring (no inference)
cargo run -p ptg-cli --bin ptg -- --dry-run \
    --column-pack examples/column-packs/abstraction-ladder-9.toml \
    --topology torus --torus-width 3 --torus-height 3 --columns 9

# 4. Run it live (9 columns, 3 abstraction levels, 3x3 torus)
cargo run -p ptg-cli --bin ptg -- \
    --vllm-url http://127.0.0.1:18136 --model gemma-4-e2b-qat \
    --column-pack examples/column-packs/abstraction-ladder-9.toml \
    --topology torus --torus-width 3 --torus-height 3 --columns 9 \
    --min-ticks 2 --ticks 3 --max-tokens 2048 --temperature 0 \
    --input "A satellite in decaying orbit shows rising thermal load, decreasing altitude, and intermittent guidance resets. What will happen next?"

Full walkthrough (prerequisites, 3-tier model setup, topologies, experiment recipes, de-risk ladder, common failures): docs/TUTORIAL.md.

What's in this release

  • 5-crate Rust workspace (panic-free, clippy-clean under -D warnings, 78 tests)
  • ptg CLI with --topology {default,ring,ring-bi,torus,fully-connected,small-world}, --column-pack, --min-ticks, --max-tokens, --temperature, --dry-run, --probe, multimodal --image-url
  • Pluggable topologies with degeneracy guardrails (Phase 3)
  • Column packs (TOML) for abstraction-level experiments + abstraction-ladder-9.toml example
  • scripts/start-gemma4-qat.sh portable launcher
  • ptg-bench + ptg-judge binaries for benchmark methodology and the A2 mechanism-activation judge
  • Docs: Specification, Architecture, Roadmap, Benchmarking methodology, Tutorial

Pilot research signal (not a benchmarked result)

From a single live QAT run on the 9-column abstraction ladder — captured as a research direction in the README:

  • Confidence stratifies by abstraction level: high-level columns reported mean confidence ~0.92 vs ~0.68 for low-level columns on a causal prompt.
  • Low-level drift without divergence: on ambiguous token input, low-level columns drifted toward token prediction, but lateral exchange pulled them toward the dominant physics framing.
  • Topology changes propagation: a niche "context" column's framing propagated fast on a 4-neighbor torus (0.98 conf) but stayed isolated on a 1-neighbor ring (0.85 conf).
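
The "4-neighbor torus" versus "1-neighbor ring" contrast in the last bullet comes down to how many columns each column hears from. A minimal sketch of both wirings follows, assuming row-major indexing, nonzero dimensions, and a one-directional ring (inferred from the separate ring-bi option); the function names are hypothetical and the real topology code may differ.

```rust
/// Four neighbors (left, right, up, down) of cell `idx` on a
/// width × height torus stored row-major, wrapping at the edges.
/// Assumes `width` and `height` are nonzero.
fn torus_neighbors(idx: usize, width: usize, height: usize) -> [usize; 4] {
    let (x, y) = (idx % width, idx / width);
    [
        y * width + (x + width - 1) % width,     // left
        y * width + (x + 1) % width,             // right
        ((y + height - 1) % height) * width + x, // up
        ((y + 1) % height) * width + x,          // down
    ]
}

/// Single neighbor on a one-directional ring of `n` columns (n nonzero).
fn ring_neighbor(idx: usize, n: usize) -> usize {
    (idx + 1) % n
}

fn main() {
    // Corner cell 0 of the 3x3 torus still has four neighbors via wraparound.
    println!("torus 3x3, column 0: {:?}", torus_neighbors(0, 3, 3));
    println!("ring of 9, column 8: {}", ring_neighbor(8, 9));
}
```

On the 3x3 torus every column has four distinct neighbors, so a framing can reach the whole mesh in two hops, whereas on the one-directional ring it advances one column per tick.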

Central tension worth investigating: lateral consensus is a homogenizing force — great when the dominant frame is correct, a failure mode when the dissenting view is the one that matters. See the README's Early research signal section.

Known limits

  • Gated model required. Gemma is gated; you must accept the license and authenticate with hf login before the launcher can download it.
  • llama-server required (built from llama.cpp). The launcher detects it but does not build it.
  • No integration LLM yet. The mesh emits per-column outputs; "voting" is confidence-threshold filtering only.
  • Embeddings blocked. The live server returns HTTP 501 on /v1/embeddings, so semantic-cosine convergence is deferred.
  • Fail-fast. One column emitting truncated/malformed JSON aborts the epoch (use the de-risk ladder in the tutorial).
  • Confidence is self-reported. An overconfident model converges prematurely; --min-ticks is the workaround.

Validation

  • cargo fmt --all --check
  • cargo clippy --all-features --all-targets -- -D warnings -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used ✅ (0 issues)
  • cargo check --workspace --all-targets
  • cargo test ✅ (78 passed, 0 failed)
  • End-to-end validated against a live gemma-4-e2b-qat server.

License

Dual-licensed under MIT OR Apache-2.0.