Added filter and delete all option to delete function in Pinecone integration, updated base VectorStore's delete function #6876
PR Analysis
- 🎯 Main theme: The PR enhances the delete function in the Pinecone integration by adding the ability to delete vectors by specifying a filter condition or delete all vectors in a namespace.
- 🔍 Description and title: Yes
- 📌 Type of PR: Enhancement
- 🧪 Relevant tests added: No
- ⚠️ Unrelated changes: No
- ✨ Minimal and focused: Yes, this PR is small and focuses only on enhancing the delete function in the Pinecone integration.
PR Feedback
- 💡 Suggestions: The PR is well-structured and the changes are clear. However, it would be beneficial to include tests that demonstrate the new functionality. This would help ensure that the changes work as expected and prevent potential regressions in the future.
- 🌱 Minor suggestions: Consider adding more detailed docstrings for the delete function. This would help users understand the different ways they can use the function and what they should expect as a result.
- 🤖 Code suggestions:
  - In the delete function, consider handling the case where both ids and filter are provided. Currently, the function only deletes by ids if they are provided, even if a filter is also provided. This could lead to unexpected behavior for users. [important]
  - In the delete function, consider returning a meaningful message or result when no vectors are deleted because none match the provided ids or filter. This would provide users with more information about the result of their delete operation. [medium]
  - In the
Comment 'CodiumAI please review' to ask for another review after you update the PR.
langchain/vectorstores/pinecone.py
Outdated
chunk_size = 1000
for i in range(0, len(ids), chunk_size):
    chunk = ids[i : i + chunk_size]
    return self._index.delete(ids=chunk, namespace=namespace, **kwargs)
Doesn't this mean that the loop won't actually run over all chunks and break on the first one?
that's true, great catch. Thanks!
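For reference, a minimal sketch of the fix this exchange points at: the `return` inside the loop exits after the first chunk, so dropping it lets every chunk be deleted. `FakeIndex` and `delete_in_chunks` are hypothetical names used only for illustration; `FakeIndex` stands in for the real Pinecone index object and just records the calls it receives.

```python
from typing import Any, List, Optional


class FakeIndex:
    """Hypothetical stand-in for a Pinecone index; records each delete call."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def delete(
        self, ids: List[str], namespace: Optional[str] = None, **kwargs: Any
    ) -> None:
        self.calls.append(list(ids))


def delete_in_chunks(
    index: FakeIndex,
    ids: List[str],
    namespace: Optional[str] = None,
    chunk_size: int = 1000,
    **kwargs: Any,
) -> None:
    """Delete ids chunk by chunk; nothing is returned from inside the loop."""
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i : i + chunk_size]
        # No `return` here, so the loop continues to the next chunk.
        index.delete(ids=chunk, namespace=namespace, **kwargs)


index = FakeIndex()
delete_in_chunks(index, [str(n) for n in range(2500)])
print([len(chunk) for chunk in index.calls])  # [1000, 1000, 500]
```

With the original `return` in place, the same call would have issued a single delete for the first 1000 ids and silently skipped the rest.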
I actually ran CodiumAI-Agent a few times on your code (sorry, the review was mistakenly published here twice, apologies),
and in one of these runs it flagged this error 🤖
I love the idea and execution, it's very helpful. I'd love to contribute if anything's open source!
@coditamar this is great. how can I run it on PRs regularly?
@rlancemartin , we will try to release an option to run regularly this coming week!
@rlancemartin , here you go:
https://github.com/Codium-ai/pr-agent
would love to discuss how to use 'LangChain' in 'pr-agent'
langchain/vectorstores/pinecone.py
Outdated
def delete(self, ids: List[str], namespace: Optional[str] = None) -> None:
    """Delete by vector IDs.

def delete(self,
seems like delete is implemented on base VectorStore class as delete(self, ids: List[str]), so we shouldn't change signature (which I recognize Pinecone implementation already did, but should fix that). i suggest we either:
1. update VectorStore.delete to have signature more like delete(self, ids: Optional[List[str]]=None, **kwargs: Any) or rename it to delete_by_ids(self, ids: List[str]) and then we can keep this implementation as is
2. rename this method to something like delete_with_kwargs (maybe there's a better name)
think i prefer (1) but curious to hear what others think, cc @rlancemartin @eyurtsev
agreed. I think 1 is best also.
good point. @0xcha05 how do you recommend modifying the base VectorStore class delete method? IMO delete(self, ids: Optional[List[str]]=None, **kwargs: Any) is reasonable.
the central point is that there are various uses of delete that do not require IDs, so base VectorStore should be modified. do you want to take that up in this PR?
yes, i'll change the VectorStore class delete method to delete(self, ids: Optional[List[str]]=None, **kwargs: Any), will do this in this PR.
Nice addition! As mentioned by @dev2049, we should update the base VectorStore delete method here, and the above suggestion is reasonable to me.
A number of vectorstores have delete methods w/ ids passed, and this would of course be backwards compatible. But making IDs optional is more flexible for future additions, like the nice one you propose here.
…optional IDs and additional keyword arguments
I've refactored the delete method in the VectorStore and Pinecone classes to accept optional IDs and additional keyword arguments.
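A rough sketch of what the agreed signature looks like in practice, assuming a toy in-memory store rather than the real LangChain classes. `VectorStoreSketch` and `InMemoryStore` are hypothetical names; only the `delete(self, ids: Optional[List[str]] = None, **kwargs: Any)` signature comes from the discussion above.

```python
from abc import ABC
from typing import Any, Dict, List, Optional


class VectorStoreSketch(ABC):
    """Hypothetical base class; only the delete signature mirrors the thread."""

    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
        raise NotImplementedError("delete must be implemented by a subclass.")


class InMemoryStore(VectorStoreSketch):
    """Hypothetical subclass that adds a store-specific `filter` keyword."""

    def __init__(self, docs: Dict[str, Dict[str, Any]]) -> None:
        self.docs = dict(docs)

    def delete(
        self,
        ids: Optional[List[str]] = None,
        filter: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        if ids is not None:
            # ID-wise delete; unknown ids are ignored.
            for doc_id in ids:
                self.docs.pop(doc_id, None)
        elif filter is not None:
            # Delete every document whose metadata matches all filter fields.
            matching = [
                key
                for key, meta in self.docs.items()
                if all(meta.get(field) == value for field, value in filter.items())
            ]
            for key in matching:
                del self.docs[key]


store = InMemoryStore({"a": {"lang": "en"}, "b": {"lang": "fr"}, "c": {"lang": "en"}})
store.delete(["a"])                  # existing call style with positional ids
store.delete(filter={"lang": "en"})  # new call style without ids
print(sorted(store.docs))  # ['b']
```

Callers that already pass `ids` positionally keep working, while store-specific options travel as keyword arguments.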
Discussing w/ @nfcampos and @jacoblee93 who are adding this same functionality to js - there is possible confusion if a user passes both filter params and IDs. In this case, ID-wise delete is done first. Thoughts on having independent methods delete_by_id and delete_by_filter? Pinecone simply has delete as done here, so perhaps separate methods also has drawbacks. https://docs.pinecone.io/reference/delete_post
There could also be additional one-off options for other vector stores in the future - if we do separate methods we can always refactor to a single combined one if there's confusion.
the pinecone doc says that "If specified, the metadata filter here will be used to select the vectors to delete. This is mutually exclusive
I think it sounds fine for Pinecone and what devs are used to there, just not sure it's going to be so clean and clear as an interface extended to other vector stores since they might have different behavior/expected precedence for args like this and we're adding this to the base class.
Absolutely, let's clarify things a bit. The base delete function is only being tweaked to make ids optional and to allow for additional keyword arguments, like so: delete(self, ids: Optional[List[str]] = None, **kwargs: Any). So, this change shouldn't disrupt any existing code that uses the delete method. The filter and delete_all parameters are specific to the Pinecone subclass and won't affect the base class or any other subclasses. Please correct me if I'm missing something!
+1, imo this is reasonable.
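One possible way to remove the ids-versus-filter ambiguity raised in this thread is to reject calls that combine the options, in line with the "mutually exclusive" wording quoted from the Pinecone docs. This is only a sketch of that alternative, not what the PR implements (per the thread, the ID-wise delete is done first); `check_delete_args` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def check_delete_args(
    ids: Optional[List[str]] = None,
    filter: Optional[Dict[str, Any]] = None,
    delete_all: bool = False,
) -> str:
    """Return the requested delete mode, or raise if the call is ambiguous."""
    modes = {
        "ids": ids is not None,
        "filter": filter is not None,
        "delete_all": bool(delete_all),
    }
    chosen = [name for name, given in modes.items() if given]
    if len(chosen) != 1:
        # Zero options or several options at once: refuse rather than guess.
        raise ValueError(
            "Specify exactly one of ids, filter, or delete_all; got: "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]


print(check_delete_args(ids=["a", "b"]))         # ids
print(check_delete_args(filter={"lang": "en"}))  # filter
```

Failing loudly trades a little convenience for predictability, which matters more once the same keyword names are reused by stores with different precedence rules.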
I'll do this in this PR, @rlancemartin.
…nd Weaviate classes to make ids optional and accept additional kwargs
The reason i did
in deeplake is because I didn't want to change the current behaviour. but passing kwargs directly into self.vectorstore.delete would be good.
I've fixed the lint issues, but not sure what the Cassandra lint issue is about. There isn't a delete function implemented. @rlancemartin
Will have a look, just kicked off tests.
@0xcha05 Cassandra also has delete -- can you please change signature as done for the other DBs. That should resolve Lint and we can get this in. (See delete() in cassandra.py.)
Thank you, the version I had didn't have a delete method in Cassandra. but this should work now, please let me know if this looks good @rlancemartin.
I've reformatted with black. This should fix all lint errors.
Description:
Updated the delete function in the Pinecone integration to allow for deletion of vectors by specifying a filter condition, and to delete all vectors in a namespace.
Made the ids parameter optional in the delete function in the base VectorStore class and allowed for additional keyword arguments.
Updated the delete function in several classes (Redis, Chroma, Supabase, Deeplake, Elastic, Weaviate, and Cassandra) to match the changes made in the base VectorStore class. This involved making the ids parameter optional and allowing for additional keyword arguments.