[ADR-043] Embedding model registry + vector engine migration (cluster-20) #374
Merged
ohdearquant merged 2 commits on May 25, 2026
…r-20)

F227: The MIGRATIONS array previously stopped at V4 (dedupe_graph_edge_triples); no embedding model registry existed. Adds V14 with the _embedding_models table, a one-active-per-engine partial unique index, and an engine+status composite index (ADR-043 §1). The migration also adds embedding_model_id to any existing regular (non-virtual) vec_ tables discovered at migration time.

F228: vec_<engine> tables were created without an embedding_model_id FK column. StorageBackend::vectors_for_namespace now ensures _embedding_models exists as a belt-and-suspenders fallback for callers that create vector stores directly without calling run_migrations(). New vec0 tables do not include embedding_model_id in the vec0 DDL itself (sqlite-vec rejects NULL TEXT metadata columns at insert time); the column is added to vec0 tables during the startup backfill rebuild described in ADR-043 §8.

Tests: 2 new regression tests (migration_v14_creates_embedding_model_registry, migration_v14_adds_embedding_model_id_to_existing_regular_vec_tables). All existing tests updated to expect V14 as the latest migration version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
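The shape of the V14 batch described in F227 can be sketched as follows. This is a minimal illustration, not the code from `migrations.rs`: the table and index names come from the PR text, but the column list (`id`, `engine`, `status`), the `'active'` status value, the `REFERENCES` clause, and the names `REGISTRY_DDL_SKETCH` and `build_registry_sql_sketch` are assumptions made for the example.

```rust
/// Hypothetical registry DDL. Only the table and index names are taken from
/// the PR description; the columns and the 'active' value are assumptions.
const REGISTRY_DDL_SKETCH: &str = "\
CREATE TABLE IF NOT EXISTS _embedding_models (
    id     TEXT PRIMARY KEY,
    engine TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_embed_models_one_active
    ON _embedding_models(engine) WHERE status = 'active';
CREATE INDEX IF NOT EXISTS idx_embed_models_engine_status
    ON _embedding_models(engine, status);
";

/// Build a V14-style SQL batch: the registry DDL followed by one ALTER TABLE
/// per regular (non-virtual) vec_ table found at migration time.
fn build_registry_sql_sketch(existing_vec_tables: &[&str]) -> String {
    let mut sql = String::from(REGISTRY_DDL_SKETCH);
    for table in existing_vec_tables {
        // Quote the identifier and double any embedded double quotes.
        let quoted = table.replace('"', "\"\"");
        sql.push_str(&format!(
            "ALTER TABLE \"{}\" ADD COLUMN embedding_model_id TEXT \
             REFERENCES _embedding_models(id);\n",
            quoted
        ));
    }
    sql
}
```

The partial unique index is what enforces "one active model per engine": uniqueness on `engine` is checked only for rows matching the `WHERE` clause, so any number of non-active rows can coexist for the same engine. Calling `build_registry_sql_sketch(&["vec_example"])` (a hypothetical table name) yields the DDL plus a single `ALTER TABLE` statement.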
CRIT-1: rebase onto origin/integration/v1-adr-alignment (adcf8c3) so c11/c14/c15 work (pack-comm, pack-schedule, pack-template, HandlerDef category, GTD lifecycle) is not silently reverted on merge.

CRIT-2: tighten V14 sqlite_master discovery filter to exclude sqlite-vec internal shadow tables (vec_*_chunks, _rowids, _info, _vector_chunks00) via explicit NOT LIKE suffix clauses with ESCAPE '\\'. Add regression test `migration_v14_does_not_alter_sqlite_vec_shadow_tables` that creates the four shadow table shapes and asserts V14 leaves them unaltered.

MAJ-1: fix misleading vec0 DDL comment: embedding_model_id is NOT present at table creation; it will be added by the ADR-043 §8 backfill rebuild (follow-up #385). Update comment in both backend.rs and migrations.rs.

MIN-1: extract EMBEDDING_MODELS_DDL pub const (single source of truth); reference it from both build_v14_embedding_model_registry_sql and StorageBackend::vectors_for_namespace to eliminate DDL drift risk.

MIN-2: add NOTE comment to V6 explaining the "reserved_adr043" name predates the actual ADR-043 work that landed at V14 (cluster-20).

Follow-up: #385 tracks ADR-043 §8 steps 2-4 (backfill + rebuild + events).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
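The CRIT-2 filter can be illustrated with a small sketch. In SQLite's `LIKE`, `_` matches any single character, which is why the suffix patterns need an `ESCAPE` clause to match a literal underscore. The four suffixes come from the commit message; the function names, the exact query text, and the `sql NOT LIKE 'CREATE VIRTUAL TABLE%'` clause used to skip virtual tables are assumptions for illustration, not the query in `migrations.rs`.

```rust
/// Suffixes of the sqlite-vec shadow tables named in CRIT-2.
const SHADOW_SUFFIXES: [&str; 4] = ["_chunks", "_rowids", "_info", "_vector_chunks00"];

/// Escape LIKE wildcards so `_` and `%` match literally under ESCAPE '\'.
fn escape_like(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for ch in literal.chars() {
        if matches!(ch, '\\' | '_' | '%') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a discovery query over sqlite_master (hypothetical shape).
fn discovery_query_sketch() -> String {
    let mut sql = String::from(
        "SELECT name FROM sqlite_master WHERE type = 'table' \
         AND name LIKE 'vec\\_%' ESCAPE '\\' \
         AND sql NOT LIKE 'CREATE VIRTUAL TABLE%'", // assumption: skips vec0 tables
    );
    for suffix in SHADOW_SUFFIXES.iter() {
        sql.push_str(&format!(
            " AND name NOT LIKE '%{}' ESCAPE '\\'",
            escape_like(suffix)
        ));
    }
    sql
}

/// Pure-Rust approximation of the name filter, for unit tests without a
/// database. Unlike SQLite's LIKE, this comparison is case-sensitive.
fn is_alterable_vec_table(name: &str) -> bool {
    name.starts_with("vec_") && !SHADOW_SUFFIXES.iter().any(|s| name.ends_with(s))
}
```

With a hypothetical engine table `vec_minilm`, `is_alterable_vec_table("vec_minilm")` holds while `vec_minilm_chunks`, `vec_minilm_rowids`, `vec_minilm_info`, and `vec_minilm_vector_chunks00` are all rejected, which mirrors what the regression test asserts about the four shadow table shapes.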
3acc64c to 5769eac
Summary
F227 (CRIT): The `MIGRATIONS` array previously had no embedding registry. Adds a V14 migration that creates the `_embedding_models` table, a one-active-per-engine partial unique index (`idx_embed_models_one_active`), and an engine+status composite index (`idx_embed_models_engine_status`) per ADR-043 §1. The migration also adds `embedding_model_id` to any existing regular (non-virtual) `vec_<engine>` tables discovered via `sqlite_master` at migration time.

F228 (CRIT): `vec_<engine>` tables were created without an `embedding_model_id` FK column. `StorageBackend::vectors_for_namespace` now ensures `_embedding_models` exists as a belt-and-suspenders fallback for callers that create vector stores without first calling `run_migrations()`. The `embedding_model_id` column is intentionally absent from the vec0 virtual table DDL: sqlite-vec rejects `NULL` TEXT metadata columns at insert time, so the column is added to existing vec0 tables during the startup backfill rebuild (ADR-043 §8), which is out of scope for this migration PR.

Files changed
- `crates/khive-db/src/migrations.rs`: V14 migration + `build_v14_embedding_model_registry_sql()` + 2 new tests + all test assertions updated to expect V14 as the latest version
- `crates/khive-db/src/backend.rs`: `vectors_for_namespace` now creates the `_embedding_models` registry table as a fallback when called without a prior `run_migrations()`

Test plan
- `cargo test -p khive-db -p khive-storage`: 104 tests pass
- `cargo test --workspace`: all pass
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --all -- --check`: clean
- `make ci` (smoke test): ALL VERB SMOKE TESTS PASSED

Dependencies satisfied
All cluster dependencies (c01, c03, c04, c05, c06, c08, c15) are on `integration/v1-adr-alignment`.

🤖 Generated with Claude Code