Show: v022-polish (overnight) — wire retrieval/recall/dual-embed + tests + plugin polish — SLICE BEFORE MERGE #399
Closed
ohdearquant wants to merge 19 commits into
- Add cli/tests/helpers.ts with subprocess runner, golden file comparator, JSON shape validator, and makeTempRepo() fixture with valid KG structure
- Add cli/tests/behavior/exit_code_test.ts (31 cases): exit 0/1 for all top-level flags, unknown commands, kg/pack/auth subcommands, in-repo ops
- Add cli/tests/behavior/error_test.ts (13 cases): error messages, --help hints, not-implemented stubs, invalid NDJSON, out-of-repo commands
- Add cli/tests/behavior/parse_test.ts (21 cases): flag parsing for stats (--json), validate (--format json, --quiet, --no-rules), doctor (--json), log (-n, --json), diff (--json, --name-only), pack stubs
- Add cli/tests/contract/help_test.ts (17 cases): golden file comparisons for --help at top-level, kg, pack, auth groups; content assertions
- Add cli/tests/contract/output_test.ts (8 cases): version semver check, kg stats --json shape, kg validate --format json shape, kg doctor --json
- Add golden files: help_toplevel.txt, help_kg.txt, help_pack.txt, help_auth.txt
- Add deno.json tasks: test:behavior and test:contract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7-axis survey (orphan crates, ADR alignment, marketplace, open issues, embedding surface, test inventory, CLI) with file:line evidence per claim and a prioritized backlog for downstream plays.
Key findings:
- khive-bm25/hnsw/fusion are already consumed by khive-retrieval (mfst+src)
- khive-retrieval itself is the unconsumed facade; downstream must wire it
- lattice-embed 0.2.4 has both MiniLM + paraphrase as 384-d local models plus dual-write/routing/migration primitives, so dual embedding is a runtime exposure problem, not a lattice gap
- khive-runtime has ONE OnceCell embedder; a model registry is needed
- Memory recall subhandlers exist (recall.embed/candidates/fuse/rerank/score); the composability is there, but not all of it is wired
- ADR-043 schema ownership drift: the spec says runtime, but the implementation lives in db
…verb surface
Audited all three marketplace plugins against the actual pack handler registrations and fixed every stale example, count, and arg reference.
KG plugin (14 files touched):
- Fixed 10 P0 broken examples: positional query() → keyword, missing kind= on update/delete, placeholder batches, unsupported filter/status/tags
- Added 3 new SKILL.md files for ADR-046 verbs: propose, review, withdraw
- Updated all stale counts: 6→8 entity kinds, 13→15 edge relations, 11→14 verbs
GTD plugin: bumped version, added start?/end? to assign docs, listed process and plan skills in README.
Memory plugin: bumped version, documented parameter aliases (importance/salience, decay_factor/decay, source_id/source).
All three plugin.json versions bumped to 0.2.2.
New tooling:
- marketplace/_validators/check_examples.py: stdlib-only validator (234 examples checked, 0 invalid)
- marketplace/INSTALL.md: installation and verification guide
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route pack-memory's fuse_candidates through khive_retrieval::fuse_search_results, making khive-retrieval a real consumed facade instead of an orphan crate.
- Add khive-retrieval dep to khive-pack-memory/Cargo.toml
- Replace direct fuse_with_strategy call with retrieval adapter (CandidateMeta side-map, HybridConfig builder, FusionStrategy conversion)
- Fix issue #309: resolve --all-features compile failures in khive-retrieval (stale SqliteStore imports, missing NodeId/LinkStore imports)
- Add 5 integration tests (3 fusion_surface, 2 pack-memory recall adapter)
- RRF k=1 discriminator test proves strategy propagation (30x score gap)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
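For readers unfamiliar with the RRF strategy mentioned above, here is a minimal Rust sketch of Reciprocal Rank Fusion. It is not khive_retrieval::fuse_search_results; the `rrf_fuse` function and its signature are hypothetical. Each input list is ordered best-first, and a document at 1-based rank r contributes 1/(k + r) to its fused score. A small k such as 1 lets top ranks dominate, while a conventional k such as 60 flattens the scores, which suggests why a k=1 setting makes a propagated strategy easy to detect in a test.

```rust
use std::collections::HashMap;

/// Hypothetical Reciprocal Rank Fusion helper, for illustration only.
/// Each list is ordered best-first; a document at 1-based rank r
/// adds 1 / (k + r) to its fused score.
pub fn rrf_fuse(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (idx, doc) in list.iter().enumerate() {
            let rank = (idx + 1) as f64; // convert 0-based index to 1-based rank
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

On the toy lists `[a, b, c]` and `[a, c, b]`, k=1 gives `a` a fused score of 1.0 against about 0.58 for `b` and `c`, while k=60 gives about 0.033 against about 0.032.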
…recall (ADR-033 §6)
- Add three optional per-request fields to RecallParams: top_k (usize),
fusion_strategy (string), and score_floor (f32)
- fusion_strategy validated against {"rrf","weighted","union"}; clear error
with valid values on invalid input
- top_k overrides the result limit for a single call (capped at 100)
- score_floor applied as a post-filter on the composite score after compute_score
- Add parse_fusion_strategy_str helper; wire override into cfg.fuse_strategy
before passing to fuse_candidates
- Add 4 integration tests: default_identity, top_k_override,
fusion_strategy_override (including rejection), score_floor
- Document knobs in ADR-033 §6.1 with table, semantics, and example DSL
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
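The knob semantics listed above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the code in this PR: only the name parse_fusion_strategy_str comes from the commit message, while the `FusionStrategy` enum shape, the `apply_recall_knobs` helper, exact-lowercase matching, and the (id, composite score) result shape are all assumptions.

```rust
/// Hypothetical stand-in for the three strategies named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FusionStrategy {
    Rrf,
    Weighted,
    Union,
}

const VALID_STRATEGIES: [&str; 3] = ["rrf", "weighted", "union"];
const MAX_TOP_K: usize = 100; // per-call top_k is capped at 100

/// Illustrative parser: accepts exactly "rrf", "weighted", or "union" and
/// otherwise returns an error that lists the valid values.
pub fn parse_fusion_strategy_str(s: &str) -> Result<FusionStrategy, String> {
    match s {
        "rrf" => Ok(FusionStrategy::Rrf),
        "weighted" => Ok(FusionStrategy::Weighted),
        "union" => Ok(FusionStrategy::Union),
        other => Err(format!(
            "invalid fusion_strategy {:?}; valid values: {}",
            other,
            VALID_STRATEGIES.join(", ")
        )),
    }
}

/// Hypothetical post-scoring step. `scored` holds (id, composite score) pairs,
/// assumed sorted best-first; `default_limit` stands in for the configured limit.
pub fn apply_recall_knobs(
    scored: Vec<(String, f32)>,
    default_limit: usize,
    top_k: Option<usize>,
    score_floor: Option<f32>,
) -> Vec<(String, f32)> {
    // top_k overrides the limit for this call only, capped at MAX_TOP_K.
    let limit = top_k.map(|k| k.min(MAX_TOP_K)).unwrap_or(default_limit);
    scored
        .into_iter()
        // score_floor drops results whose composite score falls below it.
        .filter(|(_, s)| score_floor.map_or(true, |floor| *s >= floor))
        .take(limit)
        .collect()
}
```

Because the input is assumed sorted best-first by the same composite score, applying the floor before or after the limit yields the same results; the order would only matter for unsorted input.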
ADR-organized contract tests (63 collected, 11 files, 2433 LOC):
- test_adr_001_entity_kind.py: 8 entity kinds CRUD
- test_adr_002_edge_ontology.py: 15 edge relations + endpoint contracts
- test_adr_014_curation.py: update/delete/merge semantics
- test_adr_019_note_kind.py: 5 note kinds
- test_adr_020_request_dsl.py: single + parallel + chain ops, error envelope
- test_adr_023_verb_taxonomy.py: 15 product verb reachability
- test_adr_027_single_tool_mcp.py: only `request` tool exposed
- test_contract_behaviors.py: GQL property projection rules
- test_manifest.py: verb coverage assertions
- test_namespace_isolation.py: cross-namespace read/write boundaries
- test_smoke.py: kg/gtd/memory end-to-end happy path
Package structure (uv-managed):
- pyproject.toml + pytest.ini + README.md
- conftest.py with shared fixtures
- khive_contract/ lib (client, schema, fixtures, benchmark)
Run: `uv run pytest tests/khive-contract -v`
PARTIAL: the play timed out at 1h before golden snapshots and benchmark baselines could be captured. The skeleton and ADR-organized tests are real and runnable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Multi-model embedding support landed across the runtime + storage + memory
stack. Workspace dual-embedding now reachable end-to-end:
khive-runtime:
- RuntimeConfig.additional_embedding_models: Vec<EmbeddingModel>
- Replaces single OnceCell<embedder> with HashMap<model_name, embedder>
- default_embedder_name() + embedder(name) public methods
- KHIVE_ADDITIONAL_EMBEDDING_MODELS env-var parsing
- configured_embedding_models() helper enumerates active set
khive-db:
- V16 migration: add `embedding_model TEXT NOT NULL DEFAULT '<default>'`
column to vectors table with backfill + composite index
- VectorStore.insert / search scoped by embedding_model
khive-storage:
- VectorRecord carries model tag
- vector search params include model scope
khive-pack-memory:
- recall + remember accept optional embedding_model arg
- validation: must be a registered model name
kkernel:
- engine list now returns real loaded models (no longer empty Vec)
- engine migrate / drift-check still return not-implemented (#380/#385)
Notes:
- 16 files changed, +582/-138 lines
- Tests rebaselined for V16 (failed_migration_rolls_back now exercises V17;
store_ddl_then_event_migration_is_idempotent expects V16 as head)
- Workspace: cargo build + cargo test + clippy clean + fmt clean
Lattice gap status: N/A — lattice-embed 0.2.4 already exposes both
MiniLM + paraphrase as 384-d local models with EmbeddingRoutingConfig
primitives. khive-runtime now uses these directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
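A minimal Rust sketch of the per-model embedder registry described above. Everything here is hypothetical: `Embedder` is a placeholder struct rather than a lattice-embed type, `EmbedderRegistry` and `parse_additional_models` are illustrative names, and the comma-separated format for KHIVE_ADDITIONAL_EMBEDDING_MODELS is an assumption (the commit message does not state the format). It shows the two behaviors the commit describes: a default embedder name alongside named lookups, and rejection of unregistered model names.

```rust
use std::collections::HashMap;

/// Placeholder for a real embedder handle; here it only records its dimension.
#[derive(Debug, Clone)]
pub struct Embedder {
    pub dim: usize,
}

/// Hypothetical registry keyed by model name (HashMap<model_name, embedder>).
pub struct EmbedderRegistry {
    default_name: String,
    embedders: HashMap<String, Embedder>,
}

impl EmbedderRegistry {
    pub fn new(default_name: &str, default: Embedder) -> Self {
        let mut embedders = HashMap::new();
        embedders.insert(default_name.to_string(), default);
        Self { default_name: default_name.to_string(), embedders }
    }

    pub fn register(&mut self, name: &str, embedder: Embedder) {
        self.embedders.insert(name.to_string(), embedder);
    }

    pub fn default_embedder_name(&self) -> &str {
        &self.default_name
    }

    /// Resolve an optional per-request `embedding_model`: `None` falls back to
    /// the default, and unknown names are rejected instead of silently defaulted.
    pub fn embedder(&self, name: Option<&str>) -> Result<&Embedder, String> {
        let key = name.unwrap_or(self.default_name.as_str());
        self.embedders
            .get(key)
            .ok_or_else(|| format!("embedding_model {key:?} is not a registered model"))
    }
}

/// Assumed env-var format: comma-separated model names, whitespace-trimmed,
/// with empty entries ignored.
pub fn parse_additional_models(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}
```

One reason to return an error for an unknown name rather than falling back to the default is that, once the vectors table is scoped by `embedding_model`, a silent fallback would search one model's vectors with another model's query embedding.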
Adds closure log for the close-issues play. Result: 0 closed, 15 skipped. All 15 wiring-category candidates verified against actual code paths — none had sufficient commit SHA + file:line proof for a safe permanent close. Three issues (#397, #385, #380) are closest to resolved but still have explicitly-deferred implementation sections in source comments.
…k was structural, not closure-ready)
Overnight run complete — CI green ✅
Ready for slicing whenever you're up. The PR remains a draft per directive: slice into 5-20 smaller PRs, codex-review each, and merge sequentially. Full overnight summary at
This was referenced May 25, 2026
Sliced: superseded by 9 smaller PRs
Per the slicing directive in this PR body, the integration has been sliced into 9 reviewable PRs: Independent (mergeable in parallel, targets
Show complete: all 9 slices merged
Closing as superseded by the 9 sliced PRs that landed:
Final state
Show stats
Follow-ups (tracked separately)
Thank you to codex for the substantial review work; it caught real bugs that wouldn't have shown up otherwise.
Show: v022-polish (overnight autonomous run, 2026-05-25 02:11 → 06:32 EDT)
Multi-play DAG executed against `main`. 9 plays merged into `show/v022-polish/integration`. Per Ocean's directive (2026-05-25 02:19), this PR is intended to be sliced into 5-20 smaller PRs and codex-reviewed before merging; do not merge as-is.
Plays merged
- `_recon/state.md`, 69 lines, file:line evidence)
- `khive-pack-memory` consumes `khive-retrieval::fuse_search_results`; issue #309 fixed; +5 tests
- `top_k`/`fusion_strategy`/`score_floor` knobs on recall; ADR-033 §6 documented; +5 tests
- `tests/khive-contract/` pytest package, 63 tests across 11 files, golden/benchmark TODO
Aggregate stats
`adr-001-015-alignment-integration` (on this PR's HEAD) + `adr-001-015-alignment-impl-c16` (deferred c16) remain
Suggested slicing (for `/codex-pr-review` workflow)
- `docs(recon)`: recon report (1 file, 69 lines), context for reviewers
- `test(cli)`: 11 cli-tests files (+963 lines)
- `chore(marketplace)`: plugin-polish (20 files)
- `feat(retrieval-composer)`: wire-retrieval (8 files, ADR-011)
- `feat(recall-knobs)`: wire-recall-pipeline (3 files, ADR-033)
- `test(contract)`: python-tests package (13 files)
- `feat(embedding-registry)`: dual-embedding (17 files, ADR-043 + V16 migration)
- `tune(recall)`: param-tuning grid + config nudges (6 files)
- `chore(audit)`: close-issues log (1 file)
Each slice maps to a single play's commits and can be codex-reviewed independently. The recommended sequential merge order is the order listed above, which matches the dependency chain.
Known gaps / follow-ups (not blockers)
- `python-tests` skeleton needs golden snapshots + benchmark baselines (timed out before that phase)
- `param-tuning` needs a harder eval corpus (embed-enabled, synonym queries) to actually ground defaults; the current eval set hits a corpus ceiling
- `dual-embedding` post-timeout test fixes were applied by the orchestrator (V16 migration version updates); no critic gate ran on those specific fixes, but `cargo test --workspace` is green
How to verify locally
🤖 Generated overnight by orchestrate:show / dynamic /loop pacing