
v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills#1442

Merged

garrytan merged 5 commits into main from garrytan/santiago-v1 on May 12, 2026
Conversation

@garrytan (Owner) commented May 12, 2026

Summary

For the last 30+ versions, every gstack skill loaded learnings the same way — gstack-learnings-search --limit 10 at the top of the skill, generic top-10, no query, no refresh. Short skills are fine; long skills (/investigate, /qa, /ship) drift away from whatever was loaded at minute zero. This PR fixes that for the three long skills, plus a root-cause fix to the binary's --query semantics that was silently broken.

Headline changes:

  • bin/gstack-learnings-search --query now uses token-OR matching. Multi-word queries split on whitespace; a learning matches if ANY token appears in ANY of key/insight/files. The old whole-string substring behavior was a silent footgun — --query "debug investigation" returned nothing on real-world stores.
  • {{LEARNINGS_SEARCH:query=KEYWORD}} macro — parameterized resolver with shell-injection guard. Query value must match ^[A-Za-z0-9 _-]+$ at gen-skill-docs time, so $(), backticks, semicolons, quotes throw a build error.
  • /investigate, /qa, /ship — top-of-skill pull keyed to their theme (e.g. ship → release ship version changelog merge pr). Each gains a mid-flow refresh at the phase boundary (/investigate Phase 1→2, /qa triage→fix-loop, /ship pre-VERSION) with a concrete keyword recipe + worked examples that enforce alphanumeric-only keywords to sidestep shell quoting.
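As a rough sketch of the token-OR semantics (the key/insight/files field names come from this PR; the `Learning` type, the function name, and case-insensitive matching are assumptions, not the repo's actual code):

```typescript
// Sketch of token-OR matching as described above. Field names follow the
// PR text; the Learning type, matchesQuery, and case-folding are assumptions.
interface Learning {
  key: string;
  insight: string;
  files: string[];
}

function matchesQuery(learning: Learning, query: string): boolean {
  // Split on whitespace; a whitespace-only query yields no tokens and
  // falls through to no-query behavior (match everything).
  const tokens = query.trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return true;
  const haystack = [learning.key, learning.insight, ...learning.files]
    .join("\n")
    .toLowerCase();
  // OR semantics: ANY token appearing in ANY field is a match.
  return tokens.some((token) => haystack.includes(token.toLowerCase()));
}
```

Under the old whole-string behavior the equivalent check was effectively `haystack.includes(query)`, which is why multi-word queries silently matched nothing.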

The other 13 skills are byte-identical in their generated SKILL.md output — backwards-compat verified via diff.

Numbers (real measurements on this project's learnings.jsonl, 35 entries)

| Query | Before (substring) | After (token-OR) | Δ |
| --- | --- | --- | --- |
| "debug investigation root cause" | 0 | 5 | +5 |
| "qa testing bug regression" | 0 | 2 | +2 |
| "release ship version changelog" | 0 | 8 | +8 |
| "skill resolver" | 0 | 12 | +12 |

Without the binary fix, the rest of this PR would have shipped silently broken — the bash would run, the binary would exit 0 with no output, and the skill would render the same empty section it always rendered.

Pre-Landing Review

Eng Review (PLAN) ran on 2026-05-11: CLEAR, 2 issues found (D1 superseded by D4 root-cause binary fix, D2 covered by 5 test assertions), 0 critical gaps, 0 unresolved.

Codex Outside Voice (plan-time) caught the load-bearing substring-vs-token-OR semantics flaw that invalidated the original D1 reasoning. Adopted as D4 (root-cause binary fix).

Codex Adversarial (diff-time, this PR): caught a shell-injection vector in the resolver — queryArg was being interpolated into bash without escaping. Fixed by adding QUERY_SAFE_RE = /^[A-Za-z0-9 _-]+$/ validation at gen-skill-docs time. New test assertion covers $(whoami), backticks, ;, &, ", \, $x payloads. Recommendation upgraded from "block" to "ship".
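A minimal sketch of that guard, assuming the surrounding function shape (QUERY_SAFE_RE is the name given above; the wrapper function and its error message are illustrative):

```typescript
// QUERY_SAFE_RE is named in the review note above; the wrapper function
// and error message are illustrative assumptions.
const QUERY_SAFE_RE = /^[A-Za-z0-9 _-]+$/;

function assertSafeQuery(queryArg: string): string {
  // Whitelist check at gen-skill-docs time: anything outside
  // [A-Za-z0-9 _-] (e.g. $(), backticks, ;, quotes) fails the build
  // instead of being interpolated into generated bash.
  if (!QUERY_SAFE_RE.test(queryArg)) {
    throw new Error(`Unsafe LEARNINGS_SEARCH query: ${JSON.stringify(queryArg)}`);
  }
  return queryArg;
}
```

A whitelist is the right shape here: rather than enumerating dangerous metacharacters, it rejects everything not explicitly known to be inert in double-quoted bash.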

Test Coverage

  • test/gstack-learnings-search.test.ts (NEW): 3 assertions on token-OR semantics — multi-token, single-token, no-query backwards compat.
  • test/gen-skill-docs.test.ts: 5 NEW assertions on the resolver — no-args, claude+query=foo bar (both bash branches), codex host variant, empty value falls through, shell-injection payloads throw.
  • Golden ship snapshot tests pass (claude + codex + factory hosts).
  • Full bun test suite passes.

Plan Completion

All 8 plan items DONE (verified against /Users/garrytan/.claude/plans/system-instruction-you-are-working-immutable-sunrise.md):

  1. ✅ Binary token-OR (Step 1) — commit a0539b5
  2. ✅ Resolver args parameter (Step 2) — commit 7a4f714
  3. ✅ 3 long skill queries (Step 3) — commit 513c966
  4. ✅ Mid-flow refresh checkpoints with concrete recipe (Step 4) — commit 513c966
  5. ✅ 5 resolver assertions (Step 5) — commit 7a4f714 (4 originally planned + 1 injection guard added at ship time)
  6. ✅ 3 binary assertions (Step 6) — commit a0539b5
  7. ✅ Regenerate SKILL.md (Step 7) — commit 513c966 + 7968290 (golden refresh)
  8. ✅ VERSION + CHANGELOG bump (Step 8) — commit accc864

Scope Drift

Clean. Plan intent: token-shaped learnings retrieval for long skills + binary fix. Delivered: exactly that, plus one defense-in-depth shell-injection guard added during ship-time Codex review.

For contributors

Contributed by @Fergtic (chronicle-write-up) who flagged the load-once + no-refresh pattern and the spend-per-success data point that motivated this work.

Test plan

  • bun test test/gstack-learnings-search.test.ts — 3 pass
  • bun test test/gen-skill-docs.test.ts -t "LEARNINGS_SEARCH" — 14 pass (5 LEARNINGS_SEARCH + 9 related)
  • bun test test/host-config.test.ts -t "golden" — 3 pass
  • Manual smoke: bin/gstack-learnings-search --query "skill resolver" --limit 5 matches 12 learnings via token-OR (output capped at 5 by --limit)
  • Backwards-compat: 13 short skills' generated SKILL.md unchanged (verified via git diff after bun run gen:skill-docs --host all)

🤖 Generated with Claude Code



garrytan and others added 5 commits May 11, 2026 18:58
Split the query on whitespace into tokens; a learning matches if ANY
token appears as a substring in ANY of key/insight/files. Previously
the whole query was a single substring, so multi-word queries like
"debug investigation" only matched learnings whose insight contained
that exact contiguous phrase, which is usually nothing.

Whitespace-only query falls through to no-query (matches today's no-flag
behavior). Single-word queries behave exactly as before.

Adds test/gstack-learnings-search.test.ts: 3 assertions covering
multi-token, single-token, and no-query backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uard

The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that
gets interpolated as --query "<keyword>" into the generated bash. Empty
value falls through to no-query (principle of least surprise: a stray
{{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a
build failure). Pattern reuses the parameterized-macro parsing from
composition.ts. The 13 templates that don't pass a query stay
byte-identical in their generated SKILL.md output.

Shell-injection guard: the query value is whitelisted to
^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any $(), backticks,
semicolons, or quotes throw a loud build error instead of emitting
executable bash. Static template queries are safe by inspection;
this defends against future contributors writing dangerous values.

Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args,
claude+query=foo bar on both cross-project and project-scoped branches,
codex host variant, empty value semantics, and shell-injection payloads
($(whoami), backticks, ;, &, ", \, $x) throwing build errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
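A hedged sketch of the resolver flow this commit describes — the emitted command line and the function name are assumptions; only the empty-value fallthrough and the whitelist semantics come from the commit text:

```typescript
// Illustrative resolver: empty/missing query falls through to the bare
// command; unsafe values fail the build. The exact emitted bash line is
// an assumption, not the repo's real output.
const QUERY_SAFE_RE = /^[A-Za-z0-9 _-]+$/;

function renderLearningsSearch(queryArg?: string): string {
  if (!queryArg || queryArg.trim() === "") {
    // {{LEARNINGS_SEARCH}} and a stray {{LEARNINGS_SEARCH:query=}} both
    // get today's no-query behavior rather than a build failure.
    return "gstack-learnings-search --limit 10";
  }
  if (!QUERY_SAFE_RE.test(queryArg)) {
    throw new Error(`Unsafe LEARNINGS_SEARCH query: ${queryArg}`);
  }
  // Safe by construction: the whitelist above excludes every shell
  // metacharacter, so plain double-quoting is sufficient.
  return `gstack-learnings-search --query "${queryArg}" --limit 10`;
}
```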
…/qa /ship

The three long skills now pull learnings keyed to their theme at the
top, then re-pull at phase boundaries as work shifts to new sub-tasks.

Top-of-skill queries (5-6 token unions, token-OR matched):
- investigate: "debug investigation root cause hypothesis bug fix"
- qa: "qa testing bug regression flake fixture"
- ship: "release ship version changelog merge pr"

Mid-flow refresh blocks (concrete keyword recipe + worked examples):
- investigate: between Phase 1 (hypothesis) and Phase 2 (analysis),
  keyed to the hypothesis noun. Examples: auth-cookie, session-expiry.
- qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the
  buggy component name. Examples: checkout-button, signup-form.
- ship: just before Step 12 (VERSION bump), keyed to the headline
  feature. Examples: learnings-search, pacing, worktree-ship.

Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes,
dots, colons) so dynamic queries cannot inject shell metacharacters.

The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form.
Backwards-compat verified via diff: their generated SKILL.md output is
byte-identical to before this change.

Golden ship fixtures regenerated to match the new ship/SKILL.md output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
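The keyword recipe's constraint can be sketched as a simple predicate — the regex is inferred from "alphanumeric+hyphen only" in the commit above, and the function name is illustrative:

```typescript
// Inferred from the recipe above: keywords are alphanumeric plus hyphen,
// with no quotes, slashes, dots, or colons. Name and regex are assumptions.
const KEYWORD_RE = /^[A-Za-z0-9-]+$/;

function isSafeKeyword(keyword: string): boolean {
  return KEYWORD_RE.test(keyword);
}
```

Restricting dynamic, agent-chosen keywords to this alphabet means the worked examples (auth-cookie, checkout-button, learnings-search) pass, while anything that could break out of bash quoting fails before it reaches a shell.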
Follow-up to 513c966 — the codex and factory host outputs needed
regeneration too, missed in the initial commit because gen:skill-docs
was only run for the claude host. Now matches gen:skill-docs --host all.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

E2E Evals: ✅ PASS

13/13 tests passed | $2.13 total cost | 12 parallel runners

| Suite | Result | Cost |
| --- | --- | --- |
| e2e-qa-workflow | 2/2 | $0.7 |
| e2e-review | 2/2 | $0.46 |
| e2e-workflow | 2/2 | $0.17 |
| llm-judge | 5/5 | $0.1 |
| e2e-qa-workflow | 2/2 | $0.7 |

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 1a4f0c9 into main May 12, 2026
23 of 24 checks passed
Willardgmoore added a commit to Willardgmoore/gstack that referenced this pull request May 12, 2026
Brings in:
- v1.33.2.0: setup guard against Conductor worktree pollution (garrytan#1446)
- v1.33.1.0: learnings token-OR query + task-shaped retrieval (garrytan#1442)
- v1.33.0.0: /sync-gbrain memory stage batch-import refactor (garrytan#1432)

VERSION stays at 1.34.0.0 (no collision — queue-aware check confirms).
CHANGELOG: our [1.34.0.0] entry at top, three new main entries below.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
