@Anush008
Contributor

@Anush008 Anush008 commented May 6, 2024

Description

  • This refactor allows users to query Qdrant with any implementation of BaseSentenceVectorizer, the default being the new FastEmbedVectorizer.

  • The field containing the document content in the Qdrant payload can be specified. It doesn't necessarily have to be "document" anymore.

  • Qdrant supports multiple named vectors, and this update allows specifying which one to use for retrieval. It defaults to the first vector found.

Currently (prior to this change), the implementation relies on qdrant_client's query_batch() abstraction, which uses FastEmbed internally.
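To make the three new options concrete, here is a minimal, dependency-free sketch of the idea. It is not the QdrantRM code from this PR: `ToyRetriever`, `toy_vectorizer`, and the in-memory `points` list are hypothetical stand-ins, assumed only for illustration. It shows a pluggable vectorizer with a default, a configurable payload field, and a named vector that falls back to the first one found.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for a vectorizer: maps texts to dense vectors.
Vectorizer = Callable[[Sequence[str]], List[List[float]]]


def toy_vectorizer(texts: Sequence[str]) -> List[List[float]]:
    """Toy embedding: [text length, vowel count]. Not a real model."""
    return [
        [float(len(t)), float(sum(c in "aeiou" for c in t.lower()))]
        for t in texts
    ]


class ToyRetriever:
    """In-memory illustration only; it does not talk to Qdrant."""

    def __init__(
        self,
        points: List[Dict],
        vectorizer: Optional[Vectorizer] = None,
        document_field: str = "document",
        vector_name: Optional[str] = None,
    ):
        self.points = points
        # Fall back to a default vectorizer when none is supplied.
        self.vectorizer = vectorizer or toy_vectorizer
        self.document_field = document_field
        # Fall back to the first named vector found on the first point.
        self.vector_name = vector_name or next(iter(points[0]["vectors"]))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = self.vectorizer([query])[0]

        def score(point: Dict) -> float:
            v = point["vectors"][self.vector_name]
            return sum(a * b for a, b in zip(q, v))  # dot product

        ranked = sorted(self.points, key=score, reverse=True)
        return [p["payload"][self.document_field] for p in ranked[:k]]


points = [
    {"vectors": {"a": [1.0, 0.0], "b": [0.0, 1.0]},
     "payload": {"document": "doc-A", "body": "body-A"}},
    {"vectors": {"a": [0.0, 1.0], "b": [1.0, 0.0]},
     "payload": {"document": "doc-B", "body": "body-B"}},
]


def unit_x(texts: Sequence[str]) -> List[List[float]]:
    """Custom vectorizer that always returns [1.0, 0.0]."""
    return [[1.0, 0.0] for _ in texts]


default_hit = ToyRetriever(points, vectorizer=unit_x).search("q")
custom_hit = ToyRetriever(
    points, vectorizer=unit_x, document_field="body", vector_name="b"
).search("q")
```

With all defaults left in place, the retriever reads the "document" field and uses the first named vector, which is the backward-compatible behavior; overriding `document_field` and `vector_name` changes which payload field and vector are used.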

Breaking?

The default values of the new parameters ensure backward compatibility.

@Anush008 Anush008 force-pushed the refactor-qdrant branch from ac5fad4 to f8f7203 Compare May 6, 2024 07:30
@Ankush-Chander

Hi @Anush008
Will this refactor also support SparseVectors?

@Anush008
Contributor Author

Anush008 commented May 6, 2024

Not this one.

If DSPy had a sparse vector provider interface, perhaps in BaseSentenceVectorizer, we could support it in QdrantRM.

@Ankush-Chander

Ankush-Chander commented May 6, 2024

I was thinking along these lines. Please correct me if I am missing something.

qdrant_client.search expects the same parameter, query_vector, for both dense and sparse search. It can be

query_vector=models.NamedSparseVector(
    name="text",
    vector=models.SparseVector(
        indices=[1, 7],
        values=[2.0, 1.0],
    ),
),

for sparse embedding.

and

query_vector=[0.2, 0.1, 0.9, 0.7],

for dense embedding.

So if we generalize the vectorizer function and let the user return a vector consistent with the collection, it should work for dense, sparse, and sparse-embed vectors alike without DSPy intervention.
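A rough sketch of that generalization, under stated assumptions: the two dataclasses below are local stand-ins that only mirror the field names of `models.SparseVector` and `models.NamedSparseVector`, and `dense_vectorizer`, `sparse_vectorizer`, and `build_search_kwargs` are hypothetical helpers, not DSPy or qdrant_client APIs. The keys `collection_name`, `query_vector`, and `limit` follow the parameter names of `qdrant_client.search`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class SparseVector:
    """Local stand-in mirroring the fields of models.SparseVector."""
    indices: List[int]
    values: List[float]


@dataclass
class NamedSparseVector:
    """Local stand-in mirroring the fields of models.NamedSparseVector."""
    name: str
    vector: SparseVector


# A generalized vectorizer may return either form.
QueryVector = Union[List[float], NamedSparseVector]


def dense_vectorizer(text: str) -> QueryVector:
    """Toy dense embedding: [character count, word count]."""
    return [float(len(text)), float(len(text.split()))]


def sparse_vectorizer(text: str) -> QueryVector:
    """Toy sparse embedding: index = word length, value = occurrences."""
    counts: Dict[int, float] = {}
    for word in text.lower().split():
        counts[len(word)] = counts.get(len(word), 0.0) + 1.0
    indices = sorted(counts)
    return NamedSparseVector(
        name="text",
        vector=SparseVector(indices=indices, values=[counts[i] for i in indices]),
    )


def build_search_kwargs(
    collection: str,
    text: str,
    vectorizer: Callable[[str], QueryVector],
    limit: int = 3,
) -> dict:
    """Pass whatever the vectorizer returns straight through as query_vector."""
    return {
        "collection_name": collection,
        "query_vector": vectorizer(text),
        "limit": limit,
    }


sparse_kwargs = build_search_kwargs("docs", "a bb a", sparse_vectorizer)
dense_kwargs = build_search_kwargs("docs", "a bb a", dense_vectorizer)
```

The retriever never inspects the vector type here; the user-supplied vectorizer is responsible for returning something that matches the collection's configuration.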

@Anush008
Contributor Author

Anush008 commented May 6, 2024

vectorizer is of type BaseSentenceVectorizer, which has abstract methods to generate dense vectors.

If it can support sparse vectors, we can have them here.

@Anush008
Contributor Author

Hey @arnavsinghvi11. Just bumping this PR. Please take a look when possible.

@arnavsinghvi11 arnavsinghvi11 merged commit 0e595a7 into stanfordnlp:main May 15, 2024
@arnavsinghvi11
Collaborator

LGTM. Thanks, @Anush008!

@Anush008 Anush008 deleted the refactor-qdrant branch May 16, 2024 15:41
3 participants