Show: v022-polish (overnight) — wire retrieval/recall/dual-embed + tests + plugin polish — SLICE BEFORE MERGE by ohdearquant · Pull Request #399 · ohdearquant/khive

ohdearquant · 2026-05-25T10:35:17Z

Show: v022-polish (overnight autonomous run, 2026-05-25 02:11 → 06:32 EDT)

Multi-play DAG executed against main. 9 plays merged into show/v022-polish/integration. Per Ocean's directive 2026-05-25 02:19: this PR is intended to be sliced into 5-20 smaller PRs and codex-reviewed before merging — do not merge as-is.

Plays merged

| Play | Playbook | Status | Key delivery |
|---|---|---|---|
| recon | arch-discovery | ✅ APPROVE | 7-axis state-of-codebase report (`_recon/state.md`, 69 lines, file:line evidence) |
| cli-tests | test-coverage | ✅ APPROVE-with-fixes (2 MIN) | 79 new tests, 476 total, golden files for help text |
| plugin-polish | product-polish | ✅ APPROVE | 234/234 examples validated, 3 new SKILL.md (propose/review/withdraw), KG/GTD/Memory plugins bumped to 0.2.2 |
| wire-retrieval | feature | ✅ APPROVE (0.95) | `khive-pack-memory` consumes `khive-retrieval::fuse_search_results`; issue #309 fixed; +5 tests |
| wire-recall-pipeline | feature | ✅ APPROVE (0.92) | `top_k`/`fusion_strategy`/`score_floor` knobs on recall; ADR-033 §6 documented; +5 tests |
| python-tests | test-coverage | ⚠️ PARTIAL (timeout) | `tests/khive-contract/` pytest package, 63 tests across 11 files, golden/benchmark TODO |
| dual-embedding | feature | ⚠️ PARTIAL→manual-fix (timeout) | Multi-model registry; V16 migration; recall scoped by model; 3 test rebaselines applied by orchestrator post-timeout |
| close-issues | resolve-issues | ✅ APPROVE | 0 closures (correct verdict: wiring issues are implementation work, not closure-ready); audit log committed |
| param-tuning | test-coverage | ⚠️ PARTIAL | Grid search infra works (116 configs in 0.75s); synthetic eval set has ceiling (recall@10 = 0.93 for all configs); 3 config nudges applied |

Aggregate stats

- 18 commits ahead of main (9 feature/chore + 9 integration merges)
- Roughly +3300 LOC (recon report + cli tests + python-tests skeleton + wire code + dual-embedding + tune infra)
- Workspace tests: 66 test crates pass, 0 fail (verified post-integration)
- Worktrees pruned after each merge; only adr-001-015-alignment-integration (on this PR's HEAD) and adr-001-015-alignment-impl-c16 (deferred c16) remain

Suggested slicing (for `/codex-pr-review` workflow)

- PR 1: docs(recon): recon report (1 file, 69 lines), context for reviewers
- PR 2: test(cli): 11 cli-tests files (+963 lines)
- PR 3: chore(marketplace): plugin-polish (20 files)
- PR 4: feat(retrieval-composer): wire-retrieval (8 files, ADR-011)
- PR 5: feat(recall-knobs): wire-recall-pipeline (3 files, ADR-033)
- PR 6: test(contract): python-tests package (13 files)
- PR 7: feat(embedding-registry): dual-embedding (17 files, ADR-043 + V16 migration)
- PR 8: tune(recall): param-tuning grid + config nudges (6 files)
- PR 9: chore(audit): close-issues log (1 file)

Each slice maps to a single play's commits and can be codex-reviewed independently. Recommended merge order: sequential, as listed above (this matches the dependency chain).

Known gaps / follow-ups (not blockers)

- python-tests skeleton needs golden snapshots + benchmark baselines (timed out before that phase)
- param-tuning needs a harder eval corpus (embed-enabled, synonym queries) to actually ground defaults; the current eval set hits a corpus ceiling
- dual-embedding post-timeout test fixes applied by orchestrator (V16 migration version updates); no critic gate ran on those specific fixes, but cargo test --workspace is green
- 15 wiring-related GitHub issues remain open; recon's "wiring" category is more accurately "implementation follow-ups with cross-crate deps", not closeable from this show
- npm publish for v0.2.2 still blocked (NPM_TOKEN scope issue; out of show scope)
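The corpus-ceiling gap noted for param-tuning can be illustrated with a minimal sketch. It assumes a hypothetical two-parameter grid (`rrf_k`, `score_floor`) and a stand-in evaluator; none of these names come from the tuning code in this PR. It shows why identical recall@k across every config gives no basis for changing defaults.

```python
from itertools import product


def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant ids that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def grid_search(evaluate, grid):
    """Score every parameter combination; returns {config_items: score}."""
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results[tuple(sorted(config.items()))] = evaluate(config)
    return results


def has_signal(results, tolerance=1e-9):
    """True only if at least two configs score differently."""
    scores = list(results.values())
    return max(scores) - min(scores) > tolerance


# Hypothetical grid; the real parameter names and ranges are not in this PR text.
GRID = {"rrf_k": [10, 60], "score_floor": [0.0, 0.1]}


def saturated_eval(config):
    """Stand-in for an eval set that is too easy: the ranking never changes."""
    return recall_at_k(["a", "b", "x"], {"a", "b", "c"}, k=10)


results = grid_search(saturated_eval, GRID)
print(len(results), "configs, signal:", has_signal(results))
```

When `has_signal` is false the grid cannot rank configs, which is the situation the follow-up about a harder eval corpus is meant to address.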

How to verify locally

cd /Users/lion/khive-work/worktrees/adr-001-015-alignment-integration
git pull origin show/v022-polish/integration
cd crates && cargo build --workspace && cargo test --workspace
cd ../cli && deno test --allow-all tests/
cd ../tests/khive-contract && uv run pytest -v

🤖 Generated overnight by orchestrate:show / dynamic /loop pacing

- Add cli/tests/helpers.ts with subprocess runner, golden file comparator, JSON shape validator, and makeTempRepo() fixture with valid KG structure
- Add cli/tests/behavior/exit_code_test.ts (31 cases): exit 0/1 for all top-level flags, unknown commands, kg/pack/auth subcommands, in-repo ops
- Add cli/tests/behavior/error_test.ts (13 cases): error messages, --help hints, not-implemented stubs, invalid NDJSON, out-of-repo commands
- Add cli/tests/behavior/parse_test.ts (21 cases): flag parsing for stats (--json), validate (--format json, --quiet, --no-rules), doctor (--json), log (-n, --json), diff (--json, --name-only), pack stubs
- Add cli/tests/contract/help_test.ts (17 cases): golden file comparisons for --help at top-level, kg, pack, auth groups; content assertions
- Add cli/tests/contract/output_test.ts (8 cases): version semver check, kg stats --json shape, kg validate --format json shape, kg doctor --json
- Add golden files: help_toplevel.txt, help_kg.txt, help_pack.txt, help_auth.txt
- Add deno.json tasks: test:behavior and test:contract

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

7-axis survey (orphan crates, ADR alignment, marketplace, open issues, embedding surface, test inventory, CLI) with file:line evidence per claim and prioritized backlog for downstream plays.

Key findings:
- khive-bm25/hnsw/fusion already consumed by khive-retrieval (mfst+src)
- khive-retrieval itself is the unconsumed facade; downstream must wire it
- lattice-embed 0.2.4 has both MiniLM + paraphrase as 384-d local models + dual-write/routing/migration primitives; dual embedding is a runtime exposure problem, not a lattice gap
- khive-runtime has ONE OnceCell embedder; need a model registry
- Memory recall subhandlers exist (recall.embed/candidates/fuse/rerank/score); composability is there but not all wired
- ADR-043 schema ownership drift: spec says runtime, impl is in db

…verb surface

Audited all three marketplace plugins against the actual pack handler registrations and fixed every stale example, count, and arg reference.

KG plugin (14 files touched):
- Fixed 10 P0 broken examples: positional query() → keyword, missing kind= on update/delete, placeholder batches, unsupported filter/status/tags
- Added 3 new SKILL.md files for ADR-046 verbs: propose, review, withdraw
- Updated all stale counts: 6→8 entity kinds, 13→15 edge relations, 11→14 verbs

GTD plugin: bumped version, added start?/end? to assign docs, listed process and plan skills in README.

Memory plugin: bumped version, documented parameter aliases (importance/salience, decay_factor/decay, source_id/source).

All three plugin.json versions bumped to 0.2.2.

New tooling:
- marketplace/_validators/check_examples.py: stdlib-only validator (234 examples checked, 0 invalid)
- marketplace/INSTALL.md: installation and verification guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
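The kinds of example defects listed above (positional arguments, missing `kind=`) can be caught with a small stdlib-only check. The sketch below is not the contents of `check_examples.py`; it assumes examples are written as Python-style calls, and the `REQUIRED_KWARGS` table is a hypothetical stand-in for the real verb surface.

```python
import ast

# Hypothetical verb table: verb name -> keyword arguments that must be present.
REQUIRED_KWARGS = {
    "query": set(),
    "update": {"kind"},
    "delete": {"kind"},
}


def check_example(source):
    """Return a list of problems found in one example call (empty if valid)."""
    try:
        node = ast.parse(source.strip(), mode="eval").body
    except SyntaxError as exc:
        return [f"not parseable: {exc.msg}"]
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return ["not a simple verb call"]

    verb = node.func.id
    if verb not in REQUIRED_KWARGS:
        return [f"unknown verb {verb!r}"]

    problems = []
    if node.args:
        problems.append(f"{verb}: positional arguments are not allowed")
    given = {kw.arg for kw in node.keywords if kw.arg is not None}
    for name in sorted(REQUIRED_KWARGS[verb] - given):
        problems.append(f"{verb}: missing {name}=")
    return problems


for example in ['query("attention")', 'update(id="e1", name="x")',
                'delete(kind="entity", id="e1")']:
    print(example, "->", check_example(example) or "ok")
```

Parsing with `ast` rather than regular expressions keeps the check stdlib-only while still distinguishing positional from keyword arguments.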

Route pack-memory's fuse_candidates through khive_retrieval::fuse_search_results, making khive-retrieval a real consumed facade instead of an orphan crate.

- Add khive-retrieval dep to khive-pack-memory/Cargo.toml
- Replace direct fuse_with_strategy call with retrieval adapter (CandidateMeta side-map, HybridConfig builder, FusionStrategy conversion)
- Fix issue #309: resolve --all-features compile failures in khive-retrieval (stale SqliteStore imports, missing NodeId/LinkStore imports)
- Add 5 integration tests (3 fusion_surface, 2 pack-memory recall adapter)
- RRF k=1 discriminator test proves strategy propagation (30x score gap)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
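To see why an RRF constant of k=1 works as a discriminator, here is a minimal reciprocal rank fusion sketch. It is a generic Python illustration, not the khive-retrieval code, and the comparison value k=60 is an assumption (a commonly used RRF constant); the commit does not state which baseline the test uses.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    Ranks start at 1. Ties are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["a", "b", "c"]  # e.g. a keyword ranking
vector = ["a", "c", "d"]   # e.g. an embedding ranking

top_default = rrf_fuse([lexical, vector], k=60)[0]  # ("a", 2/61)
top_k1 = rrf_fuse([lexical, vector], k=1)[0]        # ("a", 2/2)
ratio = top_k1[1] / top_default[1]
print(top_default, top_k1, round(ratio, 1))
```

Under these assumptions the top score moves from 2/61 to 1.0, a factor of 30.5. A gap that large is visible only if the overridden constant actually reaches the fusion step, which is the property such a test would be checking.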

…recall (ADR-033 §6)

- Add three optional per-request fields to RecallParams: top_k (usize), fusion_strategy (string), and score_floor (f32)
- fusion_strategy validated against {"rrf","weighted","union"}; clear error with valid values on invalid input
- top_k overrides the result limit for a single call (capped at 100)
- score_floor applied as a post-filter on the composite score after compute_score
- Add parse_fusion_strategy_str helper; wire override into cfg.fuse_strategy before passing to fuse_candidates
- Add 4 integration tests: default_identity, top_k_override, fusion_strategy_override (including rejection), score_floor
- Document knobs in ADR-033 §6.1 with table, semantics, and example DSL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
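The knob semantics described above can be sketched in a few lines. This is a Python illustration of the stated rules, not the Rust implementation: the `RecallOverrides` and `apply_overrides` names, the default limit of 10, the inclusive floor comparison, and the case-sensitive strategy match are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

VALID_FUSION_STRATEGIES = ("rrf", "weighted", "union")  # set named in the commit
MAX_TOP_K = 100                                         # cap named in the commit


@dataclass
class RecallOverrides:
    """Optional per-request knobs; None means 'use the configured default'."""
    top_k: Optional[int] = None
    fusion_strategy: Optional[str] = None
    score_floor: Optional[float] = None


def parse_fusion_strategy(value: str) -> str:
    """Reject unknown strategies with an error that lists the valid values."""
    if value not in VALID_FUSION_STRATEGIES:
        valid = ", ".join(VALID_FUSION_STRATEGIES)
        raise ValueError(f"invalid fusion_strategy {value!r}; valid values: {valid}")
    return value


def apply_overrides(
    scored: List[Tuple[str, float]],
    overrides: RecallOverrides,
    default_limit: int = 10,        # assumed default, not from the PR
    default_strategy: str = "rrf",  # assumed default, not from the PR
) -> Tuple[str, List[Tuple[str, float]]]:
    """Resolve the strategy, then post-filter by score floor and truncate."""
    strategy = (default_strategy if overrides.fusion_strategy is None
                else parse_fusion_strategy(overrides.fusion_strategy))
    limit = (default_limit if overrides.top_k is None
             else min(overrides.top_k, MAX_TOP_K))
    kept = [(mem_id, score) for mem_id, score in scored
            if overrides.score_floor is None or score >= overrides.score_floor]
    kept.sort(key=lambda pair: -pair[1])
    return strategy, kept[:limit]


scored = [("m1", 0.9), ("m2", 0.4), ("m3", 0.7)]
strategy, hits = apply_overrides(
    scored, RecallOverrides(top_k=500, fusion_strategy="union", score_floor=0.5))
print(strategy, hits)
```

Passing an empty `RecallOverrides()` leaves the defaults untouched, which mirrors the intent of the `default_identity` test named in the commit.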

ADR-organized contract tests (63 collected, 11 files, 2433 LOC):
- test_adr_001_entity_kind.py: 8 entity kinds CRUD
- test_adr_002_edge_ontology.py: 15 edge relations + endpoint contracts
- test_adr_014_curation.py: update/delete/merge semantics
- test_adr_019_note_kind.py: 5 note kinds
- test_adr_020_request_dsl.py: single + parallel + chain ops, error envelope
- test_adr_023_verb_taxonomy.py: 15 product verb reachability
- test_adr_027_single_tool_mcp.py: only `request` tool exposed
- test_contract_behaviors.py: GQL property projection rules
- test_manifest.py: verb coverage assertions
- test_namespace_isolation.py: cross-namespace read/write boundaries
- test_smoke.py: kg/gtd/memory end-to-end happy path

Package structure (uv-managed):
- pyproject.toml + pytest.ini + README.md
- conftest.py with shared fixtures
- khive_contract/ lib (client, schema, fixtures, benchmark)

Run: `uv run pytest tests/khive-contract -v`

PARTIAL: play timed out at 1h before golden snapshots + benchmark baselines could be captured. Skeleton + ADR-organized tests are real and runnable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Multi-model embedding support landed across the runtime + storage + memory stack. Workspace dual-embedding now reachable end-to-end:

khive-runtime:
- RuntimeConfig.additional_embedding_models: Vec<EmbeddingModel>
- Replaces single OnceCell<embedder> with HashMap<model_name, embedder>
- default_embedder_name() + embedder(name) public methods
- KHIVE_ADDITIONAL_EMBEDDING_MODELS env-var parsing
- configured_embedding_models() helper enumerates active set

khive-db:
- V16 migration: add `embedding_model TEXT NOT NULL DEFAULT '<default>'` column to vectors table with backfill + composite index
- VectorStore.insert / search scoped by embedding_model

khive-storage:
- VectorRecord carries model tag
- vector search params include model scope

khive-pack-memory:
- recall + remember accept optional embedding_model arg
- validation: must be a registered model name

kkernel:
- engine list now returns real loaded models (no longer empty Vec)
- engine migrate / drift-check still return not-implemented (#380/#385)

Notes:
- 16 files changed, +582/-138 lines
- Tests rebaselined for V16 (failed_migration_rolls_back tests V17 now; store_ddl_then_event_migration_is_idempotent expects V16 head)
- Workspace: cargo build + cargo test + clippy clean + fmt clean

Lattice gap status: N/A. lattice-embed 0.2.4 already exposes both MiniLM + paraphrase as 384-d local models with EmbeddingRoutingConfig primitives. khive-runtime now uses these directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
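The storage side of this change (a model tag column with a default, plus reads and writes scoped by model) can be sketched against an in-memory SQLite database. This is an illustration only: the table layout, index columns, model names, and helper functions are hypothetical, and the real store performs vector search rather than the plain filter shown here.

```python
import sqlite3

DEFAULT_MODEL = "model-a"                   # hypothetical model names
REGISTERED_MODELS = {"model-a", "model-b"}  # stands in for the runtime registry


def migrate(conn):
    """Add the model tag; the DEFAULT backfills rows that predate the column."""
    conn.execute(
        "ALTER TABLE vectors ADD COLUMN embedding_model "
        f"TEXT NOT NULL DEFAULT '{DEFAULT_MODEL}'"
    )
    # Composite index so model-scoped lookups do not scan other models' rows.
    conn.execute(
        "CREATE INDEX idx_vectors_model_subject "
        "ON vectors (embedding_model, subject_id)"
    )


def resolve_model(name=None):
    """Fall back to the default model; reject names that are not registered."""
    if name is None:
        return DEFAULT_MODEL
    if name not in REGISTERED_MODELS:
        raise ValueError(f"unknown embedding model {name!r}")
    return name


def insert(conn, subject_id, payload, model=None):
    conn.execute(
        "INSERT INTO vectors (subject_id, payload, embedding_model) VALUES (?, ?, ?)",
        (subject_id, payload, resolve_model(model)),
    )


def subjects_for(conn, model=None):
    """Return subject ids stored under one model only."""
    rows = conn.execute(
        "SELECT subject_id FROM vectors WHERE embedding_model = ? ORDER BY subject_id",
        (resolve_model(model),),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors (subject_id TEXT NOT NULL, payload BLOB NOT NULL)")
conn.execute("INSERT INTO vectors VALUES ('old-1', x'00')")  # pre-migration row
migrate(conn)
insert(conn, "new-1", b"\x01", model="model-b")
print(subjects_for(conn), subjects_for(conn, "model-b"))
```

The pre-migration row ends up under the default model, so unscoped callers keep seeing their existing data while rows written under a second model stay isolated.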

Adds closure log for the close-issues play. Result: 0 closed, 15 skipped. All 15 wiring-category candidates verified against actual code paths — none had sufficient commit SHA + file:line proof for a safe permanent close. Three issues (#397, #385, #380) are closest to resolved but still have explicitly-deferred implementation sections in source comments.

ohdearquant · 2026-05-25T10:50:20Z

Overnight run complete — CI green ✅

- 9 plays merged into integration
- `style(adr-033): deno fmt re-pad recall knob table` cleanup committed after the first CI run flagged the format drift
- CI on PR HEAD 32f853c:
  - CI (macos-latest): ✅ pass (3m11s)
  - CI (ubuntu-latest): ✅ pass (3m15s)
  - Docs lint: ✅ pass (6s)

Ready for slicing whenever you're up. PR remains draft per directive — slice into 5-20 smaller PRs, codex-review each, sequentially merge.

Full overnight summary at $HOME/khive-work/shows/v022-polish/_overnight_summary.md.

ohdearquant · 2026-05-25T15:26:41Z

Sliced — superseded by 9 smaller PRs

Per the slicing directive in this PR body, the integration has been sliced into 9 reviewable PRs:

Independent (mergeable in parallel, targets `main`)

- #400: docs(recon): state-of-codebase report for v022-polish show
- #401: test(cli): comprehensive behavior + contract test suite (79 cases)
- #402: chore(marketplace): polish kg + gtd + memory plugins to match v0.2.2 verb surface (234/234 examples valid)
- #403: test(contract): proper Python pytest package at tests/khive-contract
- #404: chore(issues): close wiring issues, audit log only (0 closures)

Stacked chain (sequential merge order)

- #405: feat(pack-memory): wire khive-retrieval as recall composer (ADR-011/021) → targets main
- #406: feat(pack-memory): expose top_k/fusion_strategy/score_floor knobs on recall (ADR-033 §6) → targets #405
- #407: feat(embedding): dual-model registry (MiniLM + paraphrase) per ADR-043 → targets #406
- #408: tune(recall): grid search infra + PARTIAL default changes → targets #407

As each stacked PR merges and its head branch is deleted, GitHub automatically retargets the child PR to main.
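The merge order for the stacked chain follows directly from its base-branch dependencies. A minimal sketch using the standard library's `graphlib` (Python 3.9+), with the PR numbers taken from the list above and the dictionary layout being an illustrative choice:

```python
from graphlib import TopologicalSorter

# Each stacked PR maps to the set of PRs it targets (its predecessors).
stack = {
    405: set(),   # targets main directly
    406: {405},
    407: {406},
    408: {407},
}

merge_order = list(TopologicalSorter(stack).static_order())
print(merge_order)
```

For a linear chain like this one there is exactly one valid order. The independent PRs (#400 to #404) have no edges and could be merged at any point.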

Review plan

Firing codex on each slice now. Will iterate to APPROVE per /codex-pr-review skill. This PR stays open as the historical reference and will close after the slice cycle completes.

ohdearquant · 2026-05-25T17:33:41Z

Show complete — all 9 slices merged

Closing as superseded by the 9 sliced PRs that landed:

Final state

| PR | Slice | Status |
|---|---|---|
| #400 | docs(recon) | ✅ merged |
| #401 | test(cli) 79 cases | ✅ merged |
| #402 | chore(marketplace) | ✅ merged |
| #403 | test(contract) pytest | ✅ merged |
| #404 | chore(issues) audit log | ✅ merged |
| #405 | feat(pack-memory) wire-retrieval | ✅ merged (codex REJECT → fixed → APPROVE) |
| #406 | feat(pack-memory) recall knobs | ✅ merged (codex REQ-CHG → fixed) |
| #407 | feat(embedding) dual-model | ✅ merged (codex REJECT → ADR amend + 3 data-safety fixes → APPROVE round 2) |
| #408 | tune(recall) param-tuning | ✅ merged (codex REQ-CHG → defaults reverted + README) |

Show stats

- Codex contributions: 4 PRs reviewed across 2 rounds. Caught: a broken doctest in the fix closing #309 (khive-retrieval feature-gated test failures), a top_k cast overflow, an ADR-043/V16 mismatch plus 3 data-safety bugs, and unjustified default changes against a flat eval. Every finding led to a real fix landed in main.
- Verification: 5 self-reviews on docs/test PRs, codex round 1 on 4 Rust PRs, codex round 2 on #407 only. All findings addressed.
- Local cargo test: 62 + 27 = 89 tests pass on each Rust slice; full workspace clean.
- HC-7 gate: Ocean explicitly approved at three decision points (merge wave 1, codex round 2 strategy, merge-with-local-verify).

Follow-ups (tracked separately)

- ADR-043 §1.1: per-model V16 backfill + sqlite-vec preserving rebuild (operator backup warning in place)
- tests/khive-contract/ golden snapshots + benchmark baselines (skeleton in #403)
- Harder eval corpus (embed-enabled, synonyms, partial matches) to ground recall defaults (see REPORT.md in #408)
- 15 remaining "wiring" GitHub issues: these are implementation work (EmbedderRegistry trait, ProposalApplyWorker, recall.rerank model wiring, ADR-043 startup backfill); see #380, #385, #397

Thank you to codex for the substantial review work — caught real bugs that wouldn't have shown up otherwise.

ohdearquant and others added 19 commits May 25, 2026 02:25

Show v022-polish: integrate recon

9da5ad3

Show v022-polish: integrate cli-tests

0f6f5c5

Show v022-polish: integrate plugin-polish

9050bf4

Show v022-polish: integrate wire-retrieval (ADR-011/021)

941acad

Show v022-polish: integrate wire-recall-pipeline (ADR-033 §6 knobs)

13a529e

Show v022-polish: integrate python-tests (PARTIAL: skeleton + ADR tests)

06047d3

Show v022-polish: integrate dual-embedding (ADR-043 multi-model registry)

d66ec58

Show v022-polish: integrate close-issues (audit log, 0 closures; work was structural, not closure-ready)

c5db199

tune(recall): grid search infra + PARTIAL default changes

2e28339

Show v022-polish: integrate param-tuning

93b4b8d

style(adr-033): deno fmt re-pad recall knob table (post-merge cleanup)

32f853c

ohdearquant closed this May 25, 2026

ohdearquant wants to merge 19 commits into main from show/v022-polish/integration
