-
Notifications
You must be signed in to change notification settings - Fork 81
Closed
Description
Hi,
I'm facing a few issues with using Alembic and SQLAlchemy when creating halfvec indices.
1. The DLL generated by alembic given the index is invalid
My index:
Index(
"ix_chunks_vector",
func.cast(Chunk.vector, HALFVEC(settings.EMBEDDING_DIMS)).label("vector"),
postgresql_using="hnsw",
postgresql_with={"m": 16, "ef_construction": 64},
postgresql_ops={"vector": "halfvec_cosine_ops"},
)
Which translates to Alembic as:
op.create_index(
"ix_chunks_vector",
"chunks",
[sa.text("CAST(vector AS HALFVEC(3072))")],
unique=False,
postgresql_using="hnsw",
postgresql_with={"m": 16, "ef_construction": 64},
postgresql_ops={"vector": "halfvec_cosine_ops"},
)
Which in turn becomes:
CREATE INDEX ix_chunks_vector
ON chunks
USING hnsw (CAST(vector AS HALFVEC(3072)))
WITH (m = 16, ef_construction = 64)
As you can see, the operations are not inserted correctly into the query: hnsw (CAST(vector AS HALFVEC(3072)))
should be hnsw (CAST(vector AS HALFVEC(3072)) halfvec_cosine_ops)
2. Manually altering the DLL results in a mismatch next time I generate a migration
If I manually correct the query to be:
CREATE INDEX ix_chunks_vector
ON chunks
USING hnsw (CAST(vector AS HALFVEC(3072)) halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64)
The next time I run alembic autogenerate it will try to remove this index, and create one with the wrong DDL again:
op.drop_index(
"ix_chunks_vector",
table_name="chunks",
postgresql_with={"m": "16", "ef_construction": "64"},
postgresql_using="hnsw",
)
op.create_index(
"ix_chunks_vector",
"chunks",
[sa.text("CAST(vector AS HALFVEC(3072))")],
unique=False,
postgresql_using="hnsw",
postgresql_with={"m": 16, "ef_construction": 64},
postgresql_ops={"vector": "halfvec_cosine_ops"},
)
I'm struggling to find a way to define the index appropriately to avoid this.
Thanks.
For context, here is my ORM table definition:
class Chunk(Base):
__tablename__ = "chunks"
# I've tried setting index to True and False, no change
vector: Mapped[list[float]] = mapped_column(
Vector(settings.EMBEDDING_DIMS), unique=False, nullable=False
)
Index(
"ix_chunks_vector",
func.cast(Chunk.vector, HALFVEC(settings.EMBEDDING_DIMS)).label("vector"),
postgresql_using="hnsw",
postgresql_with={"m": 16, "ef_construction": 64},
postgresql_ops={"vector": "halfvec_cosine_ops"},
)
I am using:
[[package]]
name = "pgvector"
version = "0.3.6"
[[package]]
name = "alembic"
version = "1.14.1"
[[package]]
name = "sqlalchemy"
version = "2.0.36"
Metadata
Metadata
Assignees
Labels
No labels