Closed
First of all, really appreciate your work and this library. It's a tremendous help.
Following this example here: https://github.com/pgvector/pgvector-python/blob/master/examples/colbert/exact.py
I modified it a bit and put the embeddings in their own table instead of an array column. Here are the Django models.
from django.db import models
from pgvector.django import HalfVectorField


class Page(models.Model):
    document = models.ForeignKey(
        Document, on_delete=models.CASCADE, related_name="pages"
    )
    page_number = models.IntegerField()
    content = models.TextField(blank=True)
    img_base64 = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self) -> str:
        return f"{self.document.name} - Page {self.page_number}"


class PageEmbedding(models.Model):
    # One row per vector, instead of one array of vectors per page.
    page = models.ForeignKey(Page, on_delete=models.CASCADE, related_name="embeddings")
    embedding = HalfVectorField(dimensions=128)
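With one row per vector, the vectors for a page presumably have to be regrouped into a `halfvec[]` before they can be passed to a function that takes arrays, such as the `max_sim` function from the linked example. A minimal sketch of one way this might look, using `array_agg` over the embedding table. The table name `myapp_pageembedding`, the helper names `to_halfvec_literal` and `rank_pages`, and the way the query vectors are passed (a list of text literals cast to `halfvec[]`) are all assumptions for illustration, not something confirmed by the library:

```python
from typing import List, Sequence, Tuple

# Hypothetical table name: Django's default is "<app_label>_pageembedding",
# and the app label is not shown here, so "myapp" is only a placeholder.
PAGE_EMBEDDING_TABLE = "myapp_pageembedding"

# Regroup each page's rows into a halfvec[] and score it against the query.
# Assumes a max_sim(halfvec[], halfvec[]) function already exists in the database.
RANK_PAGES_SQL = f"""
    SELECT page_id, max_sim(array_agg(embedding), %s::halfvec[]) AS score
    FROM {PAGE_EMBEDDING_TABLE}
    GROUP BY page_id
    ORDER BY score DESC
    LIMIT %s
"""


def to_halfvec_literal(vector: Sequence[float]) -> str:
    """Format one vector in pgvector's text form, e.g. "[0.5,1.0]"."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def rank_pages(
    query_vectors: Sequence[Sequence[float]], limit: int = 5
) -> List[Tuple[int, float]]:
    """Return (page_id, score) rows, best score first."""
    # Imported lazily so the helpers above can be used without a Django project.
    from django.db import connection

    # Assumption: the database driver adapts a Python list of strings to a
    # SQL array that the ::halfvec[] cast in the query can convert.
    literals = [to_halfvec_literal(v) for v in query_vectors]
    with connection.cursor() as cursor:
        cursor.execute(RANK_PAGES_SQL, [literals, limit])
        return cursor.fetchall()
```

The order in which `array_agg` collects the rows should not matter here, since the score takes a maximum over the document vectors.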
I am wondering if you can give us a little guidance and confirm that our implementation is correct?
from django.db import migrations


class Migration(migrations.Migration):
    operations = [
        migrations.RunSQL(
            """
            CREATE OR REPLACE FUNCTION max_sim(document halfvec[], query halfvec[]) RETURNS double precision AS $$
                WITH queries AS (
                    SELECT row_number() OVER () AS query_number, * FROM (SELECT unnest(query) AS query)
                ),
                documents AS (
                    SELECT unnest(document) AS document
                ),
                similarities AS (
                    SELECT query_number, 1 - (document <=> query) AS similarity FROM queries CROSS JOIN documents
                ),
                max_similarities AS (
                    SELECT MAX(similarity) AS max_similarity FROM similarities GROUP BY query_number
                )
                SELECT SUM(max_similarity) FROM max_similarities
            $$ LANGUAGE SQL
            """
        )
    ]
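For spot-checking the SQL on small inputs, the same computation can be written in plain Python: for each query vector, take the best cosine similarity against any document vector, then sum those maxima. This is only a reference sketch. The function names are illustrative, it works in double precision rather than half precision, and it assumes non-zero vectors and at least one document vector:

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, i.e. 1 - (a <=> b) in pgvector terms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros; otherwise this divides by zero.
    return dot / (norm_a * norm_b)


def max_sim(document: Sequence[Vector], query: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(
        max(cosine_similarity(d, q) for d in document)
        for q in query
    )


# Two document vectors and two query vectors in 2 dimensions.
document = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 0.0], [1.0, 1.0]]
score = max_sim(document, query)
```

In this example the first query vector matches a document vector exactly (similarity 1), and the second sits at 45 degrees to both document vectors (similarity 1/√2), so the score is 1 + 1/√2.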
Really appreciate it!