Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error querying PineconeVectorStore using sparse query mode #12967

Merged
merged 3 commits into from Apr 22, 2024

Conversation

Javtor
Copy link
Contributor

@Javtor Javtor commented Apr 19, 2024

Description

Pinecone always expects a vector (or technically also a record id) when querying. When using sparse query mode, we were setting the vector to None and making the request with that, resulting in an error. Changed to make the request with a vector filled with zeroes instead.

Version Bump?

  • Yes

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 19, 2024
@@ -433,7 +433,7 @@ def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResul
"values": [v * (1 - query.alpha) for v in sparse_vector["values"]],
}

query_embedding = None
query_embedding = [0.0] * len(query.query_embedding)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically to fix the case where query_mode is sparse right?

Won't providing a vector of zeros still kind of influence the search result? If we are in sparse mode, should we just query without the vector kwarg? Or should we set the alpha to completely ignore the Dense zero vector?

Copy link
Contributor Author

@Javtor Javtor Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup

The issue is that if we query without the vector kwarg, we get an error when making the request if we don't pass a record id instead. In their hybrid search tutorial, pinecone says we should scale the dense vector by alpha (in this case zero) before querying, so using a vector of zeros in this case should be fine (it's overwritten later for other query modes anyways) https://www.pinecone.io/learn/hybrid-search-intro/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great, that works for me!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok wait, one more worry haha -- we should ensure that the query embedding is not none before doing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, added a check

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 20, 2024
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 22, 2024
@Javtor Javtor merged commit 521178f into main Apr 22, 2024
8 checks passed
@Javtor Javtor deleted the javier/pinecone-sparse-search branch April 22, 2024 16:36
chrisalexiuk-nvidia pushed a commit to chrisalexiuk-nvidia/llama_index that referenced this pull request Apr 25, 2024
mattf pushed a commit to mattf/llama_index that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants