Support named vectors in Qdrant #6871

Merged

kacperlukawski
Contributor

Description

This PR makes it possible to use Qdrant named vectors in LangChain. This has been requested several times, because people want to reuse externally created collections in LangChain. Existing applications are unaffected. The changes are covered by integration tests and described in the docs.

Example

Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents",
    vector_name="custom_vector",
)

Issue: #2594

Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.

@mahmoudajawad

Compared with mine, your approach allows the following:

  • Adding new documents to a Qdrant collection that uses a single vector key, vector_name.
  • Querying a Qdrant collection that uses any number of vector keys, using one pre-specified vector_name.

My approach in my second PR, #5975, allows both scenarios, and extends the first to allow adding documents to collections that define multiple vector keys.

My use case is already fulfilled by your work, because I use LangChain only to query documents, not to add new ones. However, I would like you to understand the difference between the two approaches before you finalise yours.

with_payload=True,
with_vectors=True,
limit=fetch_k,
)
embeddings = [result.vector for result in results]
mmr_selected = maximal_marginal_relevance(
-    np.array(embedding), embeddings, k=k, lambda_mult=lambda_mult
+    np.array(query_vector), embeddings, k=k, lambda_mult=lambda_mult
Contributor

we probably don't want to pass in vector_name to this, right?

Contributor Author

@dev2049 Thanks for pointing this out! This was not covered well in the tests, so I extended them and fixed the issues. I would appreciate another look!
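
For readers following the review thread above, here is a minimal sketch of the concern as I read it: the `(vector_name, embedding)` pair is only meaningful as a search argument, while the MMR step needs the bare embedding. The `mmr` helper below is an illustrative NumPy implementation, not LangChain's `maximal_marginal_relevance`, and the sample vectors are made up. It also assumes that results from a named-vector collection carry their vectors as a dict keyed by vector name.

```python
import numpy as np


def cosine_similarity(query, candidates):
    """Cosine similarity between one vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query


def mmr(query, candidates, k=4, lambda_mult=0.5):
    """Illustrative MMR: pick up to k indices, trading relevance against redundancy."""
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    if len(candidates) == 0 or k <= 0:
        return []
    relevance = cosine_similarity(query, candidates)
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, len(candidates)):
        best_idx, best_score = -1, -np.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            # Similarity to the closest already-selected candidate
            redundancy = cosine_similarity(candidates[i], candidates[selected]).max()
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


vector_name = "custom_vector"  # same name as in the PR example
embedding = [1.0, 0.0]         # made-up query embedding

# The pair form is what a named-vector search request takes ...
query_vector = (vector_name, embedding) if vector_name is not None else embedding

# ... while MMR works on bare vectors. With named vectors, each result's
# vector is assumed to be a dict keyed by vector name (made-up values here).
result_vectors = [
    {"custom_vector": [1.0, 0.0]},
    {"custom_vector": [0.9, 0.1]},
    {"custom_vector": [0.0, 1.0]},
]
candidates = [vectors[vector_name] for vectors in result_vectors]

# Pass the raw embedding, not query_vector, to the MMR step.
picked = mmr(embedding, candidates, k=2, lambda_mult=0.3)
print(picked)
```

With `lambda_mult=0.3` the selection favours diversity, so the near-duplicate second candidate is skipped in favour of the orthogonal third one.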

@kacperlukawski
Contributor Author

> Compared with mine, your approach allows the following:
>
> * Adding new documents to a Qdrant collection that uses a single vector key, `vector_name`.
>
> * Querying a Qdrant collection that uses any number of vector keys, using one pre-specified `vector_name`.
>
> My approach in my second PR, #5975, allows both scenarios, and extends the first to allow adding documents to collections that define multiple vector keys.
>
> My use case is already fulfilled by your work, because I use LangChain only to query documents, not to add new ones. However, I would like you to understand the difference between the two approaches before you finalise yours.

We should not expose the vector configuration to LangChain users. Anyone who wants a custom configuration for their collection should create the collection directly with QdrantClient and then pass that client to Qdrant when instantiating it.

@rlancemartin rlancemartin self-assigned this Jun 29, 2023
@rlancemartin
Collaborator

lgtm

@rlancemartin rlancemartin merged commit 140ba68 into langchain-ai:master Jun 29, 2023
14 checks passed
vowelparrot pushed a commit that referenced this pull request Jul 4, 2023
aerrober pushed a commit to aerrober/langchain-fork that referenced this pull request Jul 24, 2023