Help: MaxSim/Colbert Calculation  #102

@Jonathan-Adly

First of all, really appreciate your work and this library. It's a tremendous help.

I am following this example: https://github.com/pgvector/pgvector-python/blob/master/examples/colbert/exact.py

I modified it a bit and stored the embeddings in their own table instead of in an array column. Here are the Django models.

from django.db import models
from pgvector.django import HalfVectorField


class Page(models.Model):
    document = models.ForeignKey(
        Document, on_delete=models.CASCADE, related_name="pages"
    )
    page_number = models.IntegerField()
    content = models.TextField(blank=True)
    img_base64 = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self) -> str:
        return f"{self.document.name} - Page {self.page_number}"


class PageEmbedding(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE, related_name="embeddings")
    embedding = HalfVectorField(dimensions=128)
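
With one row per embedding, the vectors for a page have to be gathered back into a single collection before they can be scored together as one document. The sketch below shows that grouping step in plain Python, assuming the rows arrive as (page_id, vector) pairs. The function name `group_embeddings_by_page` and the sample rows are hypothetical and only illustrate the shape of the data; in SQL the analogous step would presumably be an `array_agg` with a `GROUP BY` on the page, which is an assumption about the wiring rather than something the linked example does.

```python
from collections import defaultdict


def group_embeddings_by_page(rows):
    """Collect (page_id, vector) rows into {page_id: [vector, ...]}.

    Hypothetical helper: mirrors gathering per-row embeddings back into
    one list per page, preserving the order in which rows are seen.
    """
    grouped = defaultdict(list)
    for page_id, vector in rows:
        grouped[page_id].append(vector)
    return dict(grouped)


# Made-up sample rows: two embeddings for page 1, one for page 2.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (1, [0.5, 0.5])]
pages = group_embeddings_by_page(rows)
```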
  

I am wondering if you can give us a little guidance and confirm that our implementation is correct?


from django.db import migrations


class Migration(migrations.Migration):

    operations = [
        migrations.RunSQL(
            """
            CREATE OR REPLACE FUNCTION max_sim(document halfvec[], query halfvec[]) RETURNS double precision AS $$
                WITH queries AS (
                    SELECT row_number() OVER () AS query_number, * FROM (SELECT unnest(query) AS query) AS q
                ),
                documents AS (
                    SELECT unnest(document) AS document
                ),
                similarities AS (
                    SELECT query_number, 1 - (document <=> query) AS similarity FROM queries CROSS JOIN documents
                ),
                max_similarities AS (
                    SELECT MAX(similarity) AS max_similarity FROM similarities GROUP BY query_number
                )
                SELECT SUM(max_similarity) FROM max_similarities
            $$ LANGUAGE SQL
            """
        )
    ]
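
For cross-checking, here is a minimal pure-Python sketch of what the SQL function above appears to compute: for each query vector, take the highest cosine similarity against any document vector, then sum those maxima. It assumes `<=>` is pgvector's cosine distance, so `1 - (document <=> query)` is cosine similarity. The helper name `cosine_similarity` and the toy vectors are illustrative and not part of the migration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError for zero vectors; this sketch does not handle them.
    return dot / (norm_a * norm_b)


def max_sim(document, query):
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(
        max(cosine_similarity(d, q) for d in document)
        for q in query
    )


# Toy example: two document vectors and two query vectors.
document = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 0.0], [1.0, 1.0]]
score = max_sim(document, query)
```

Comparing this reference against the database result on a few small inputs would be one way to confirm the two agree, keeping in mind that half-precision storage may introduce small rounding differences.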

Really appreciate it!
