schema: halfvec HNSW indexes silently fail to create (no dimension spec) #95

@salishforge

Problem

Applying a fresh `schema/schema.sql` fails to create two HNSW indexes on halfvec columns:

  • line 96: `CREATE INDEX IF NOT EXISTS warm_tier_embedding_idx ON warm_tier USING hnsw (embedding halfvec_cosine_ops);`
  • line 435: `CREATE INDEX IF NOT EXISTS shared_memories_embedding_idx ON shared_memories USING hnsw (embedding halfvec_cosine_ops);`

Postgres errors with `ERROR: column does not have dimensions` because the halfvec columns are declared without a dimension spec (e.g., `embedding halfvec,` rather than `embedding halfvec(384),`).

Since schema.sql is applied via `psql -f` without `ON_ERROR_STOP`, psql logs these errors, continues with the rest of the file, and the failure goes unnoticed. The tables exist but the HNSW indexes do not. Semantic search falls back to a sequential scan, which is slow but functional.
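To make the silent-failure mode concrete, here is a minimal sketch of a schema-apply helper that sets `ON_ERROR_STOP`. The helper names and the connection-string argument are hypothetical and not part of this repo; only the `psql` options (`--dbname`, `-v`, `-f`) are real.

```python
import subprocess
from typing import List


def build_psql_command(schema_path: str, database_url: str) -> List[str]:
    """Build a psql invocation that aborts on the first SQL error.

    Hypothetical helper: the repo's actual install path may differ.
    """
    return [
        "psql",
        "--dbname", database_url,
        # Without this variable psql keeps going after a failed statement.
        "-v", "ON_ERROR_STOP=1",
        "-f", schema_path,
    ]


def apply_schema(schema_path: str, database_url: str) -> None:
    """Run psql and raise CalledProcessError if it exits nonzero."""
    subprocess.run(build_psql_command(schema_path, database_url), check=True)
```

With this in place, the `column does not have dimensions` error would fail the run instead of being skipped, so it only makes sense once the index statements themselves are fixed or moved.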

Evidence

First surfaced in the v3.0.0-beta.3 release workflow (run 24591387057) because `test-load` is PR-gated and had never been exercised until release. Integration tests have been passing throughout because they run with `EMBEDDING_PROVIDER=none`, so the missing indexes are never hit.

Latent since v2.7.0 when halfvec was introduced.

Constraints

The embedding dimension is provider-specific (`Xenova/bge-small-en-v1.5` = 384, OpenAI `text-embedding-3-small` = 1536, etc.), so it is not known at schema time. Hardcoding a default would be wrong for operators using different providers.
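One way to picture the constraint: the dimension has to be resolved from operator configuration, and an unknown model should be an error rather than a silent default. The sketch below is illustrative only. `resolve_dimension` and the lookup table are hypothetical, and the two entries are just the models named above.

```python
from typing import Optional

# Dimensions for the two models mentioned in this issue; not an exhaustive list.
KNOWN_DIMENSIONS = {
    "Xenova/bge-small-en-v1.5": 384,
    "text-embedding-3-small": 1536,
}


def resolve_dimension(model: str, override: Optional[str] = None) -> int:
    """Return the embedding dimension for a model.

    `override` stands in for an operator-supplied value such as an
    EMBEDDING_DIMENSIONS setting (assumed name); it wins when present.
    """
    if override is not None:
        dimensions = int(override)
        if dimensions <= 0:
            raise ValueError("embedding dimension must be positive")
        return dimensions
    try:
        return KNOWN_DIMENSIONS[model]
    except KeyError:
        # Refuse to guess: a wrong default would produce a mismatched column.
        raise ValueError(f"unknown embedding model {model!r}; set the dimension explicitly")
```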

Options to consider

  1. Move HNSW index creation out of schema.sql. Provide a post-install script or a management endpoint that ALTERs the halfvec column to the operator's configured dimension and creates the index. Document in DEPLOYMENT-SECURITY.md.
  2. Provide a `schema/hnsw-indexes.sql` companion script that operators run after choosing a dimension. `schema.sql` stays dimensionless; index creation is opt-in.
  3. Require `EMBEDDING_DIMENSIONS` at install time and template it into schema.sql via envsubst or similar. Adds a build step to installation.

Option 2 is lowest-friction for users who don't use embeddings (the default) while still giving a clean path for those who do.
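A rough sketch of what Option 2 could look like if the companion script were generated for a chosen dimension. The table, column, and index names come from the two statements quoted above. `render_hnsw_sql` is a hypothetical name, and the 4,000 upper bound reflects pgvector's documented HNSW limit for halfvec as I understand it, so treat it as an assumption to verify.

```python
# (table, column, index) triples taken from the failing statements in schema.sql.
HALFVEC_COLUMNS = (
    ("warm_tier", "embedding", "warm_tier_embedding_idx"),
    ("shared_memories", "embedding", "shared_memories_embedding_idx"),
)

# Assumed pgvector limit for HNSW indexes on halfvec columns.
MAX_HNSW_HALFVEC_DIMENSIONS = 4000


def render_hnsw_sql(dimensions: int) -> str:
    """Render SQL that fixes the column dimension and creates the HNSW indexes."""
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise ValueError("dimensions must be an integer")
    if not 1 <= dimensions <= MAX_HNSW_HALFVEC_DIMENSIONS:
        raise ValueError(f"dimensions must be between 1 and {MAX_HNSW_HALFVEC_DIMENSIONS}")

    statements = []
    for table, column, index in HALFVEC_COLUMNS:
        # Give the column a dimension so the index build no longer errors.
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} TYPE halfvec({dimensions});"
        )
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {index} ON {table} "
            f"USING hnsw ({column} halfvec_cosine_ops);"
        )
    return "\n".join(statements) + "\n"
```

For example, `print(render_hnsw_sql(384))` emits four statements that an operator could save as `schema/hnsw-indexes.sql` and apply with `psql`. The `ALTER` will likely fail if the table already holds embeddings of a different dimension, so changing providers would still need a re-embedding step.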

Acceptance

  • HNSW indexes are created for operators who use embeddings.
  • No errors in CI when applying fresh `schema.sql`.
  • Documentation clearly explains the dimension requirement and how to add indexes.
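The first acceptance item could be checked mechanically, for instance in a CI step that runs with embeddings enabled. The sketch below assumes the caller has already fetched index names from the standard `pg_indexes` view with whatever database client the project uses. `missing_indexes` is a hypothetical helper.

```python
from typing import Iterable, List

# Index names from the two statements in schema.sql.
EXPECTED_HNSW_INDEXES = (
    "warm_tier_embedding_idx",
    "shared_memories_embedding_idx",
)

# Query a caller could run to collect index names; pg_indexes is a built-in view.
LIST_INDEXES_SQL = "SELECT indexname FROM pg_indexes;"


def missing_indexes(found_names: Iterable[str]) -> List[str]:
    """Return the expected HNSW indexes that are absent from `found_names`."""
    found = set(found_names)
    return [name for name in EXPECTED_HNSW_INDEXES if name not in found]
```

An empty result means both indexes exist. Any returned name points at an index that was skipped, which is the state a fresh install is in today.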
