[ADR-043] Embedding model registry + vector engine migration (cluster-20) #374

Merged
ohdearquant merged 2 commits into integration/v1-adr-alignment from show/adr-001-015-alignment/impl-c20 on May 25, 2026

Conversation

@ohdearquant
Owner

Summary

  • F227 (CRIT): The MIGRATIONS array previously had no embedding registry. Adds a V14 migration that creates the _embedding_models table, a one-active-per-engine partial unique index (idx_embed_models_one_active), and an engine+status composite index (idx_embed_models_engine_status), per ADR-043 §1. The migration also adds embedding_model_id to any existing regular (non-virtual) vec_<engine> tables discovered via sqlite_master at migration time.

  • F228 (CRIT): vec_<engine> tables were created without an embedding_model_id FK column. StorageBackend::vectors_for_namespace now ensures that _embedding_models exists, as a belt-and-suspenders fallback for callers that create vector stores without first calling run_migrations(). The embedding_model_id column is intentionally absent from the vec0 virtual table DDL because sqlite-vec rejects NULL TEXT metadata columns at insert time. The column will instead be added to existing vec0 tables during the startup backfill rebuild (ADR-043 §8), which is out of scope for this migration PR.
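
For illustration, the registry DDL described in the summary might look roughly like the sketch below. The table and index names are taken from this PR. The column set (id, engine, status), the 'active' status literal, and the registry_statements helper are assumptions made for the example; the authoritative schema is the one in ADR-043 §1.

```rust
/// Sketch of the registry DDL. Table and index names come from the PR
/// description; the columns and the 'active' literal are ASSUMPTIONS.
pub const EMBEDDING_MODELS_DDL: &str = "\
CREATE TABLE IF NOT EXISTS _embedding_models (
    id     TEXT PRIMARY KEY,
    engine TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_embed_models_one_active
    ON _embedding_models(engine) WHERE status = 'active';
CREATE INDEX IF NOT EXISTS idx_embed_models_engine_status
    ON _embedding_models(engine, status);
";

/// Hypothetical helper: splits the DDL into individual statements so a
/// caller can execute them one at a time.
pub fn registry_statements() -> Vec<&'static str> {
    EMBEDDING_MODELS_DDL
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}
```

Under these assumptions, the partial unique index only covers rows whose status is 'active', so each engine can hold at most one active model while keeping any number of non-active rows. Every statement uses IF NOT EXISTS, so it is safe to run again from a fallback path.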

Files changed

  • crates/khive-db/src/migrations.rs: V14 migration + build_v14_embedding_model_registry_sql() + 2 new tests + all test assertions updated to expect V14 as latest version
  • crates/khive-db/src/backend.rs: vectors_for_namespace now creates _embedding_models registry table as a fallback when called without prior run_migrations()

Test plan

  • cargo test -p khive-db -p khive-storage — 104 tests pass
  • cargo test --workspace — all pass
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • make ci (smoke test) — ALL VERB SMOKE TESTS PASSED

Dependencies satisfied

All cluster dependencies (c01, c03, c04, c05, c06, c08, c15) are on integration/v1-adr-alignment.

🤖 Generated with Claude Code

…r-20)

F227: The MIGRATIONS array previously stopped at V4 (dedupe_graph_edge_triples);
no embedding model registry existed. Adds V14 with the _embedding_models table,
one-active-per-engine partial unique index, and engine+status composite index
(ADR-043 §1). The migration also adds embedding_model_id to any existing regular
(non-virtual) vec_ tables discovered at migration time.

F228: vec_<engine> tables were created without an embedding_model_id FK column.
StorageBackend::vectors_for_namespace now ensures _embedding_models exists as a
belt-and-suspenders fallback for callers that create vector stores directly
without calling run_migrations(). New vec0 tables do not include embedding_model_id
in the vec0 DDL itself (sqlite-vec rejects NULL TEXT metadata columns at insert
time); the column is added to vec0 tables during the startup backfill rebuild
described in ADR-043 §8.

Tests: 2 new regression tests (migration_v14_creates_embedding_model_registry,
migration_v14_adds_embedding_model_id_to_existing_regular_vec_tables). All
existing tests updated to expect V14 as the latest migration version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

CRIT-1: rebase onto origin/integration/v1-adr-alignment (adcf8c3) so
c11/c14/c15 work (pack-comm, pack-schedule, pack-template, HandlerDef
category, GTD lifecycle) is not silently reverted on merge.

CRIT-2: tighten V14 sqlite_master discovery filter to exclude sqlite-vec
internal shadow tables (vec_*_chunks, _rowids, _info, _vector_chunks00)
via explicit NOT LIKE suffix clauses with ESCAPE '\\'. Add regression test
`migration_v14_does_not_alter_sqlite_vec_shadow_tables` that creates the
four shadow table shapes and asserts V14 leaves them unaltered.
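
The tightened discovery filter can be sketched as follows. This is a minimal illustration, not the migration's actual code: DISCOVERY_SQL, is_regular_vec_table, add_column_sql, the virtual-table check on the sql column, and the TEXT column type are all hypothetical. Only the four shadow-table suffixes and the use of NOT LIKE with ESCAPE come from the commit message.

```rust
/// Suffixes of the sqlite-vec shadow tables named in the CRIT-2 note.
const SHADOW_SUFFIXES: [&str; 4] = ["_chunks", "_rowids", "_info", "_vector_chunks00"];

/// Hypothetical discovery query. `_` is a single-character wildcard in LIKE,
/// so literal underscores are escaped. A raw string keeps the backslash
/// single; a normal Rust string literal would need '\\'.
pub const DISCOVERY_SQL: &str = r"SELECT name FROM sqlite_master
WHERE type = 'table'
  AND name LIKE 'vec\_%' ESCAPE '\'
  AND sql NOT LIKE 'CREATE VIRTUAL TABLE%'
  AND name NOT LIKE '%\_chunks' ESCAPE '\'
  AND name NOT LIKE '%\_rowids' ESCAPE '\'
  AND name NOT LIKE '%\_info' ESCAPE '\'
  AND name NOT LIKE '%\_vector\_chunks00' ESCAPE '\'";

/// Rust-side mirror of the name filter: true for vec_<engine> names that do
/// not end in one of the shadow-table suffixes.
pub fn is_regular_vec_table(name: &str) -> bool {
    name.starts_with("vec_") && !SHADOW_SUFFIXES.iter().any(|s| name.ends_with(*s))
}

/// Hypothetical ALTER statement for one discovered table. The nullable TEXT
/// type and the reference to _embedding_models(id) are assumptions.
pub fn add_column_sql(table: &str) -> String {
    format!(
        "ALTER TABLE \"{table}\" ADD COLUMN embedding_model_id TEXT REFERENCES _embedding_models(id);"
    )
}
```

One consequence of a suffix-based filter like this is that a regular table whose engine name happens to end in one of the four suffixes would also be skipped.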

MAJ-1: fix misleading vec0 DDL comment: embedding_model_id is NOT
present at table creation; it will be added by the ADR-043 §8 backfill
rebuild (follow-up #385). Update the comment in both backend.rs and
migrations.rs.

MIN-1: extract EMBEDDING_MODELS_DDL pub const (single source of truth);
reference it from both build_v14_embedding_model_registry_sql and
StorageBackend::vectors_for_namespace to eliminate DDL drift risk.

MIN-2: add NOTE comment to V6 explaining the "reserved_adr043" name
predates the actual ADR-043 work that landed at V14 (cluster-20).

Follow-up: #385 tracks ADR-043 §8 steps 2-4 (backfill + rebuild + events).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ohdearquant ohdearquant force-pushed the show/adr-001-015-alignment/impl-c20 branch from 3acc64c to 5769eac Compare May 25, 2026 02:32
@ohdearquant ohdearquant merged commit bed9b46 into integration/v1-adr-alignment May 25, 2026
3 checks passed
@ohdearquant ohdearquant deleted the show/adr-001-015-alignment/impl-c20 branch May 25, 2026 04:40