
v1.26.5.0 fix wave: gbrain ingest writer (hybrid frontmatter) + gbrain-valid source ids #1344

Merged
garrytan merged 9 commits into main from garrytan/fix-wave-gbrain-ingest
May 7, 2026


@garrytan
Owner

@garrytan garrytan commented May 6, 2026

Summary

Two unrelated bugs have blocked everyone who ran /setup-gbrain or /sync-gbrain since v1.26.0.0 shipped. Both ride together in this fix wave because the failure shape is the same: setup finished green, but the headline v1.26 features did nothing.

  1. gstack-memory-ingest called a non-existent gbrain CLI verb (put_page instead of put <slug>), failing 153/153 transcripts on every clean install. The put_page name is the MCP tool / internal op name; the CLI surface is gbrain put <slug> with content via stdin and metadata in YAML frontmatter. Filed in #1336 "v1.26.0+ memory-ingest calls non-existent gbrain put_page CLI command (153/153 fail)", #1305 "gstack-memory-ingest: CLI call format incompatible with every released gbrain (+ note on /gstack-upgrade pin staleness)", and #1299 "gstack 1.26.0.0 memory-ingest fails: gbrain put_page not exposed in gbrain v0.18.2 CLI".
  2. gstack-gbrain-sync derived source IDs that violated gbrain's validator ([a-z0-9-]{1,32}). Every github.com/<org>/<repo> remote produced gstack-code-github.com-<org>-<repo> (38–60 chars, containing a dot). Filed in #1331 "gstack-gbrain-sync.ts auto-generates invalid source IDs from github remote URLs (>32 chars, contains dots)", #1323 "/sync-gbrain: generated source-id exceeds gbrain v0.20+ validation (32-char limit, no dots)", #1322 "gstack-gbrain-sync: cwd source-id derivation produces IDs that violate gbrain validation", and #1320 "deriveCodeSourceId in gstack-gbrain-sync.ts produces invalid gbrain source IDs for github.com remotes".

After this wave: clean-install transcripts land in gbrain with title/type/tags intact and any github-hosted repo registers a code source on the first try.

Smoke evidence (real gbrain v0.25.1)

$ git checkout origin/main; bun bin/gstack-gbrain-sync.ts --code-only --dry-run
SKIP code  would: gbrain sources add gstack-code-github.com-garrytan-gstack ...
                                                  ^^^^^^^^^^^                  (38 chars, contains '.', INVALID)

$ git checkout fix-wave-branch; bun bin/gstack-gbrain-sync.ts --code-only --dry-run
SKIP code  would: gbrain sources add gstack-code-garrytan-gstack ...
                                                                   (27 chars, valid per gbrain regex)

Memory-ingest writer correctness verified by strengthened regression tests in test/gstack-memory-ingest.test.ts — these stand up a real gbrain CLI shim on PATH, run the actual --bulk ingest pipeline against a planted Claude Code session, capture put stdin, and assert title/type/tags arrive in the frontmatter (not just agent:). The original PR #1341 tests would have passed even with title/type/tags missing — the strengthening closes the test-quality gap that let v1.26.0.0 ship green-but-broken.

Process notes

This started as a "merge two PRs" wave. The original plan was to merge #1341 (Alex Medina) as the cluster-1 base because it had better tests and a fail-fast probe, then cherry-pick the production hardening (timeout, maxBuffer, stderr surface) from #1328 (Joshua Smith). Codex outside-voice plan review caught a real ship-blocker by inspecting the actual code: buildTranscriptPage writes frontmatter without title/type/tags, and PR #1341's "wrap only when frontmatter is absent" branch silently dropped those fields on every transcript page. The plan pivoted to #1328 as the base, with a hybrid writer commit on top to handle the artifact-page case (which #1328 alone misses), plus #1341's tests strengthened to actually assert that title/type/tags arrive. Three additional bugs surfaced during the strengthen pass, including PR #1328's inject branch searching for \n---\n when buildTranscriptPage emits \n---<body> directly, and constrainSourceId returning gstack-code- for empty-slug input. All were caught and fixed.

Closes

Supersedes the originating PRs:

All three authors credited via Co-Authored-By: in the relevant commits and via Contributed by @<handle> in the CHANGELOG.

Test plan

Follow-up TODOs filed



smithjoshua and others added 9 commits May 5, 2026 07:03
`put_page` is the MCP tool name, not a CLI subcommand. The actual
gbrain verb is `put <slug>` with content via stdin and tags in YAML
frontmatter. Every transcript / memory ingest fails today on clean
installs.

Switch to the right verb and inject title/type/tags into the
frontmatter that buildTranscriptPage / buildArtifactPage already
produce.

Bundled in the same function:

- timeout: 30s → 60s. Auto-link reconciliation hits 30s once the
  brain has a few hundred pages.
- maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr
  and callers see only `Command failed:` with no detail.
- Surface stderr/stdout in the returned error instead of the bare
  exception.

Verified: bun test test/gstack-memory-ingest.test.ts -> 15/15 pass.
bun test on the three test files touching this path -> 362/362.
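
The hardening described in this commit can be sketched roughly as follows. This is an illustrative sketch only, not the code in gstack-memory-ingest: `putPage`, `describeFailure`, and `PutOutcome` are hypothetical names, and the use of Node's `spawnSync` is an assumption. Only the `gbrain put <slug>` verb, stdin delivery, the 60s timeout, and the 16MB buffer come from the commit message.

```typescript
import { spawnSync } from "node:child_process";

// Values taken from the commit message above.
const PUT_TIMEOUT_MS = 60_000;           // previously 30s
const PUT_MAX_BUFFER = 16 * 1024 * 1024; // previously 1MB

interface PutOutcome {
  ok: boolean;
  error?: string;
}

// Hypothetical helper: fold exit status, stderr, and stdout into one message
// so callers see gbrain's own diagnostics rather than a bare exception.
function describeFailure(status: number | null, stderr: string, stdout: string): string {
  const detail = [stderr.trim(), stdout.trim()].filter((s) => s.length > 0).join("\n");
  const head = `gbrain put exited with status ${status ?? "unknown"}`;
  return detail ? `${head}:\n${detail}` : head;
}

// Hypothetical wrapper around `gbrain put <slug>`.
// The page (frontmatter plus body) is delivered on stdin.
function putPage(slug: string, page: string, bin = "gbrain"): PutOutcome {
  const res = spawnSync(bin, ["put", slug], {
    input: page,
    encoding: "utf8",
    timeout: PUT_TIMEOUT_MS,
    maxBuffer: PUT_MAX_BUFFER,
  });
  const detail = describeFailure(res.status, res.stderr ?? "", res.stdout ?? "");
  if (res.error) {
    // Spawn failure, timeout, or buffer overflow.
    return { ok: false, error: `${res.error.message}\n${detail}` };
  }
  if (res.status !== 0) {
    return { ok: false, error: detail };
  }
  return { ok: true };
}
```

Keeping the message formatting in a pure helper makes the "surface stderr" behavior testable without a gbrain binary on PATH.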
…s or long names

`deriveCodeSourceId` previously concatenated the canonicalized remote with only `/`
and whitespace stripped, leaving dots from hostnames (`github.com`) and no length
cap. gbrain rejects any source id containing characters outside [a-z0-9-] or longer
than 32 chars, so `github.com/<org>/<repo>` produced `gstack-code-github.com-<org>-<repo>`
(40 chars, plus dots) and registration failed:

    code  source registration failed: Invalid source id
          "gstack-code-github.com-radubach-platform". Must be 1-32 lowercase alnum
          chars with optional interior hyphens.

Fix:
- Drop the host segment (`github.com` is the same for nearly every user and just
  consumes the 32-char budget). Use only the last two path segments (org-repo).
- Sanitize any remaining non-alnum to hyphens, then collapse and trim.
- For genuinely long org/repo names that still exceed the budget, keep the tail
  (most distinctive end of the slug) and append a 6-char sha1 hash for collision
  resistance.

Adds a regression test that spawns the CLI in temp git repos with controlled
remotes (dot in hostname, SCP-style, multi-dot host, long names forcing
hash-truncation) and asserts every derived id is ≤32 chars and matches the
gbrain validator regex.
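
The three fix steps above could look something like the sketch below. It is a hypothetical re-implementation under stated assumptions, not the actual `deriveCodeSourceId`: the URL normalization rules, the choice to hash the full slug, and the tail-then-hash ordering are all guesses. It also does not handle input that sanitizes to an empty slug, which a later commit in this wave addresses.

```typescript
import { createHash } from "node:crypto";

const MAX_ID_LEN = 32;        // gbrain's source-id length limit
const PREFIX = "gstack-code"; // prefix seen in the ids quoted above

// Hypothetical sketch; the real deriveCodeSourceId may differ in detail.
function deriveCodeSourceId(remote: string): string {
  const cleaned = remote
    .trim()
    .replace(/^[a-z+]+:\/\//i, "") // strip scheme (https://, ssh://)
    .replace(/^[^@/]+@/, "")       // strip user@ (SCP-style and ssh remotes)
    .replace(":", "/")             // host:org/repo -> host/org/repo
    .replace(/\.git$/i, "");

  const segments = cleaned.split("/").filter((s) => s.length > 0);

  // Drop the host, then keep only the last two path segments (org, repo).
  const slug = segments
    .slice(1)
    .slice(-2)
    .join("-")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // sanitize non-alnum runs to a single hyphen
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens

  const full = `${PREFIX}-${slug}`;
  if (full.length <= MAX_ID_LEN) return full;

  // Too long: keep the tail of the slug and append a 6-char sha1 prefix.
  const hash = createHash("sha1").update(slug).digest("hex").slice(0, 6);
  const budget = MAX_ID_LEN - PREFIX.length - 2 - hash.length; // 2 joining hyphens
  const tail = slug.slice(-budget).replace(/^-+|-+$/g, "");
  return `${PREFIX}-${tail}-${hash}`;
}

console.log(deriveCodeSourceId("https://github.com/garrytan/gstack.git"));
// gstack-code-garrytan-gstack
console.log(deriveCodeSourceId("git@github.com:radubach/platform.git"));
// gstack-code-radubach-platform
```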
…t frontmatter inject + 60s timeout + 16MB buffer + stderr surface)
…lability probe

PR #1328 (merged in the prior commit) correctly injects title/type/tags
into the YAML frontmatter that buildTranscriptPage already prepends. But
buildArtifactPage emits raw markdown without frontmatter, so design-docs,
learnings, and builder-profile-entries were landing in gbrain with empty
title/type/tags. Add the no-frontmatter wrap branch so artifact pages get
the same metadata the inject branch provides for transcripts.

Also bring in gbrainAvailable()'s --help probe (originally proposed in
PR #1341 by Alex Medina), with the regex tightened from /(^|\s)put(\s|$)/m
to /^\s+put\s/m. Anchoring on the indented subcommand format gbrain's
help actually uses keeps the probe from matching "put" appearing as
prose in help text, while still failing fast with one clean error if a
future gbrain renames or removes the put subcommand.

Updates the V1.5 NOTE doc block at the top of the file to describe the
current put-via-stdin shape rather than the legacy put_page flag form.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>
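
To illustrate why the tightened probe regex matters, the sketch below runs both patterns from the commit message against two sample help texts. The help texts are invented for illustration and are not real `gbrain --help` output; `hasPutSubcommand` is a hypothetical name, not the body of `gbrainAvailable()`.

```typescript
// The two patterns quoted in the commit message.
const LOOSE_PUT = /(^|\s)put(\s|$)/m;
const TIGHT_PUT = /^\s+put\s/m;

// Hypothetical help output listing an indented `put` subcommand.
const helpWithPut = [
  "Usage: gbrain <command> [options]",
  "",
  "Commands:",
  "  put <slug>    Write a page from stdin",
  "  get <slug>    Print a page",
].join("\n");

// Hypothetical help output where "put" appears only in prose.
const helpProseOnly = [
  "Usage: gbrain <command> [options]",
  "",
  "Use the MCP server to put pages into the brain.",
].join("\n");

// Hypothetical probe: true only when `put` is listed as an indented subcommand.
function hasPutSubcommand(helpText: string): boolean {
  return TIGHT_PUT.test(helpText);
}

console.log(LOOSE_PUT.test(helpProseOnly));   // true: false positive on prose
console.log(hasPutSubcommand(helpProseOnly)); // false
console.log(hasPutSubcommand(helpWithPut));   // true
```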
…malformed-close frontmatter

Imports the shim-based regression tests from PR #1341 (Alex Medina) and
strengthens them to assert title, type, and tags actually arrive in put
stdin — not just `agent: claude-code`. Asserting the metadata fields
matches the regression class that caused this fix wave: writers can
"succeed" while metadata is silently lost. The original PR #1341 tests
would have passed even with title/type/tags missing.

Strengthening the test surfaced a deeper issue. buildTranscriptPage joins
frontmatter array elements with "\n" and does not append a trailing
newline, so the close fence is "\n---<content>" directly, not "\n---\n".
PR #1328's inject branch searched for "\n---\n" and never matched —
which means even with PR #1328 alone, transcript pages were landing in
gbrain with no title/type/tags. Two-line fix: search for "\n---" only,
since the inject lands before the close fence regardless of what
follows it.

Also imports PR #1341's V1.5 NOTE doc-block update and the section
comment refresh so the prose stays accurate against the new writer
shape.

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>
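
A minimal sketch of the hybrid writer idea, assuming a page is either raw markdown or begins with a `---` fence: inject before the close fence when frontmatter exists, otherwise wrap. The names `withMetadata`, `metaLines`, and `PageMeta` are hypothetical, the JSON-quoted YAML values are an assumption, and treating a page with no close fence as "no frontmatter" is a simplification rather than the behavior of the actual writer.

```typescript
interface PageMeta {
  title: string;
  type: string;
  tags: string[];
}

// Render metadata as YAML lines. JSON.stringify yields double-quoted
// strings, which are also valid YAML scalars.
function metaLines(meta: PageMeta): string {
  return [
    `title: ${JSON.stringify(meta.title)}`,
    `type: ${JSON.stringify(meta.type)}`,
    `tags: [${meta.tags.map((t) => JSON.stringify(t)).join(", ")}]`,
  ].join("\n");
}

// Hypothetical hybrid writer: inject into existing frontmatter,
// or wrap the page when there is none.
function withMetadata(page: string, meta: PageMeta): string {
  if (page.startsWith("---\n")) {
    // Search for "\n---" only: the close fence may be followed
    // directly by body text rather than a newline.
    const close = page.indexOf("\n---", 3);
    if (close !== -1) {
      return `${page.slice(0, close)}\n${metaLines(meta)}${page.slice(close)}`;
    }
  }
  // No usable frontmatter (e.g. artifact pages): wrap the raw markdown.
  return `---\n${metaLines(meta)}\n---\n${page}`;
}

const meta: PageMeta = { title: "Session", type: "transcript", tags: ["a", "b"] };
console.log(withMetadata("---\nagent: claude-code\n---# Transcript", meta));
console.log(withMetadata("# Design doc\n", meta));
```

Searching from index 3 lets an empty frontmatter block (`---` followed immediately by the close fence) still take the inject path.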
…dd no-origin and basename-empty regression tests

PR #1330 (merged in the prior commit) addressed the dot-in-host and
length-overflow cases for source-id derivation, but constrainSourceId
silently returned "${prefix}-" when the input sanitized to an empty
slug — invalid per gbrain's `^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$`
validator on the trailing hyphen. Adds an explicit empty-slug branch
that falls back to a sha1-prefixed id ("gstack-code-<6hex>") so the
output stays gbrain-valid for every input shape.

Two new regression tests cover the corners PR #1330's coverage left
exposed:
- no-origin fallback: a cwd repo with no `origin` remote configured
  must still derive a valid id from the basename.
- basename-sanitizes-to-empty: a repo whose path basename is all
  non-alnum (e.g. "___") must produce the hash-only fallback, not
  an invalid trailing-hyphen id.

Both run the CLI inside temp git repos for genuine end-to-end
coverage (matches the pattern PR #1330 established for its own four
remote-shape cases).

Co-Authored-By: Richard Dubach <radubach@gmail.com>
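
The empty-slug branch can be illustrated with the sketch below, which uses the validator regex quoted in the commit message. The function signature and the choice to hash the raw input are assumptions made for illustration; the real `constrainSourceId` also caps length, which is omitted here.

```typescript
import { createHash } from "node:crypto";

// Validator regex quoted in the commit message.
const GBRAIN_SOURCE_ID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

// Hypothetical sketch of the empty-slug branch. `raw` stands for whatever
// input (remote path or directory basename) the slug is derived from.
function constrainSourceId(prefix: string, raw: string): string {
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");

  if (slug.length === 0) {
    // Without this branch the result would be `${prefix}-`,
    // which the validator rejects because of the trailing hyphen.
    const hash = createHash("sha1").update(raw).digest("hex").slice(0, 6);
    return `${prefix}-${hash}`;
  }
  return `${prefix}-${slug}`; // length capping omitted for brevity
}

console.log(GBRAIN_SOURCE_ID.test("gstack-code-")); // false
console.log(GBRAIN_SOURCE_ID.test(constrainSourceId("gstack-code", "___"))); // true
```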
PATCH bump. Three bug fixes (memory-ingest put_page CLI verb mismatch,
hybrid frontmatter writer for transcripts AND artifacts, gbrain-valid
source-id derivation for github-hosted repos), no new user capability.

CHANGELOG release-summary leads with what users can now do (clean-
install transcripts populate the brain, github-hosted repos register
code sources) and tabulates before/after numbers from real gbrain
v0.25.1 smoke output. Itemized changes credit @smithjoshua, @AZ-1224,
and @radubach for the originating PRs plus the additional hybrid
branch + strengthened tests added on top per Codex plan-review.
…ost-collision) follow-ups

Two follow-ups surfaced during the v1.26.5.0 fix-wave plan review.

P2 — Issue #1305 part 2: bin/gstack-gbrain-install pins gbrain to
v0.18.2 (commit 08b3698) but doesn't move when gstack ships features
that depend on newer gbrain ops or schema. Fresh /setup-gbrain on
v1.26.x lands users on schema 24 with v1.26 features expecting 32+.
Captured for a future fix-wave.

P3 — Codex P1.3 from the v1.26.5.0 plan review: deriveCodeSourceId
drops the host segment to fit gbrain's 32-char source-id budget,
which means github.com/acme/foo and gitlab.com/acme/foo collapse to
the same source id. Real but rare; PR #1330 author explicitly
considered this and chose budget over cross-host uniqueness. Captured
as a long-tail concern.

github-actions Bot commented May 6, 2026

E2E Evals: ✅ PASS

0/0 tests passed | $0 total cost | 12 parallel runners

Suite Result Status Cost

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit c7aefc1 into main May 7, 2026
21 of 22 checks passed