Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added filter and delete all option to delete function in Pinecone integration, updated base VectorStore's delete function #6876

Merged
merged 9 commits into from
Jul 2, 2023

Conversation

0xcha05
Copy link
Contributor

@0xcha05 0xcha05 commented Jun 28, 2023

Description:

Updated the delete function in the Pinecone integration to allow for deletion of vectors by specifying a filter condition, and to delete all vectors in a namespace.

Made the ids parameter optional in the delete function in the base VectorStore class and allowed for additional keyword arguments.

Updated the delete function in several classes (Redis, Chroma, Supabase, Deeplake, Elastic, Weaviate, and Cassandra) to match the changes made in the base VectorStore class. This involved making the ids parameter optional and allowing for additional keyword arguments.

@vercel
Copy link

vercel bot commented Jun 28, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 2, 2023 4:29pm

@dosubot dosubot bot added the 03 enhancement Enhancement of existing functionality label Jun 28, 2023
@vercel vercel bot temporarily deployed to Preview June 28, 2023 15:47 Inactive
@0xcha05 0xcha05 changed the title Added filter option to delete function in Pinecone integration Added filter and delete all option to delete function in Pinecone integration Jun 28, 2023
@vercel vercel bot temporarily deployed to Preview June 28, 2023 16:08 Inactive
Copy link

@CodiumAI-Agent CodiumAI-Agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Analysis

  • 🎯 Main theme: The PR enhances the delete function in the Pinecone integration by adding the ability to delete vectors by specifying a filter condition or delete all vectors in a namespace.
  • 🔍 Description and title: Yes
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ⚠️ Unrelated changes: No
  • Minimal and focused: Yes, this PR is small and focuses only on enhancing the delete function in the Pinecone integration.

PR Feedback

  • 💡 Suggestions: The PR is well-structured and the changes are clear. However, it would be beneficial to include tests that demonstrate the new functionality. This would help ensure that the changes work as expected and prevent potential regressions in the future.

  • 🌱 Minor suggestions: Consider adding more detailed docstrings for the delete function. This would help users understand the different ways they can use the function and what they should expect as a result.

  • 🤖 Code suggestions:

    • In the delete function, consider handling the case where both ids and filter are provided. Currently, the function only deletes by ids if they are provided, even if a filter is also provided. This could lead to unexpected behavior for users. [important]
    • In the delete function, consider returning a meaningful message or result when no vectors are deleted because none match the provided ids or filter. This would provide users with more information about the result of their delete operation. [medium]

Comment 'CodiumAI please review' to ask for another review after you update the PR.

CodiumAI-Agent

This comment was marked as duplicate.

chunk_size = 1000
for i in range(0, len(ids), chunk_size):
chunk = ids[i : i + chunk_size]
return self._index.delete(ids=chunk, namespace=namespace, **kwargs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that the loop won't actually run over all chunks and break on the first one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true, great catch. Thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually ran CodiumAI-Agent a few times on your code (sorry, mistakenly, two times was published here, apologize),
and in one of these cases, it suggested this error 🤖

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the idea and execution, it's very helpful. I'd love to contribute if anything's open source!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coditamar this is great. how can I run on PR's regularly?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlancemartin , we will try to release an option to run regularly this coming week!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlancemartin , here you go:
https://github.com/Codium-ai/pr-agent
would love to discuss how to use 'LangChain' in 'pr-agent'

@vercel vercel bot temporarily deployed to Preview June 28, 2023 21:25 Inactive
@dev2049 dev2049 removed the 03 enhancement Enhancement of existing functionality label Jun 29, 2023
def delete(self, ids: List[str], namespace: Optional[str] = None) -> None:
"""Delete by vector IDs.

def delete(self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like delete is implemented on base VectorStore class as delete(self, ids: List[str]), so we shouldn't change signature (which I recognize Pinecone implementation already did, but should fix that). i suggest we either:

  1. update VectorStore.delete to have signature more like delete(self, ids: Optional[List[str]]=None, **kwargs: Any) or rename it to delete_by_ids(self, ids: List[str]) and then we can keep this implementation as is
  2. rename this method to something like delete_with_kwargs (maybe there's a better name)

think i prefer (1) but curious to hear what others think, cc @rlancemartin @eyurtsev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. I think 1 is best also.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. @0xcha05 how do you recommend modifying the base VectorStore class delete method? IMO delete(self, ids: Optional[List[str]]=None, **kwargs: Any) is reasonable.

the central point is that there are various uses of delete so do not require IDs, so base VectorStore should be modified. do you want to take that up in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i'll change the vecstorestore class delete method to delete(self, ids: Optional[List[str]]=None, **kwargs: Any) , will do this in this PR.

@rlancemartin
Copy link
Collaborator

rlancemartin commented Jun 29, 2023

Nice addition! As mentioned by @dev2049 we should updated the base VectorStore delete class here and above suggestion is reasonable to me.

delete(self, ids: Optional[List[str]]=None, **kwargs: Any) 

A number of vectorstores have delete methods w/ ids passed, and this would of course be backwards compatible. But making IDs optional is more flexible for future additions, like the nice one you propose here.

…optional IDs and additional keyword arguments
@0xcha05
Copy link
Contributor Author

0xcha05 commented Jun 30, 2023

I've refactored delete method in VectorStore and Pinecone classes to accept optional IDs and additional keyword arguments.
CC: @rlancemartin @dev2049 @eyurtsev

@vercel vercel bot temporarily deployed to Preview June 30, 2023 04:31 Inactive
@0xcha05 0xcha05 requested a review from rlancemartin July 1, 2023 03:04
Copy link
Collaborator

@rlancemartin rlancemartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussing w/ @nfcampos and @jacoblee93 who are adding this same functionality to js - there is possible confusion if a user passes both filter params and IDs. In this case, ID-wise delete is done first. Thoughts on having independent methods delete_by_id and delete_by_filter? Pinecone simply has delete as done here, so perhaps separate methods also has drawbacks. https://docs.pinecone.io/reference/delete_post

@jacoblee93
Copy link
Contributor

jacoblee93 commented Jul 1, 2023

Discussing w/ @nfcampos and @jacoblee93 who are adding this same functionality to js - there is possible confusion if a user passes both filter params and IDs. In this case, ID-wise delete is done first. Thoughts on having independent methods delete_by_id and delete_by_filter? Pinecone simply has delete as done here, so perhaps separate methods also has drawbacks. https://docs.pinecone.io/reference/delete_post

There could also be additional one-off options for other vector stores in the future - if we do separate methods we can always refactor to a single combined one if there's confusion.

@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 1, 2023

the pinecone doc says that "If specified, the metadata filter here will be used to select the vectors to delete. This is mutually exclusive
with specifying IDs to delete in the ids param or using delete_all=True."
so we can just have the pinecone delete function with ids optional and kwargs, and have some validation in delete function of the pinecone class. what do you say? @rlancemartin

@jacoblee93
Copy link
Contributor

jacoblee93 commented Jul 1, 2023

the pinecone doc says that "If specified, the metadata filter here will be used to select the vectors to delete. This is mutually exclusive
with specifying IDs to delete in the ids param or using delete_all=True."
so we can just have the pinecone delete function with ids optional and kwargs, and have some validation in delete function of the pinecone class. what do you say?

I think it sounds fine for Pinecone and what devs are used to there, just not sure it's going to be so clean and clear as an interface extended to other vector stores since they might have different behavior/expected precedence for args like this and we're adding this to the base class.

@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 1, 2023

Absolutely, let's clarify things a bit. The base delete function is only being tweaked to make ids optional and to allow for additional keyword arguments, like so:
def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]:
Here's the commit for reference.

So, this change shouldn't disrupt any existing code that uses the delete method. The filter and delete_all parameters are specific to the Pinecone subclass and won't affect the base class or any other subclasses. Please correct me if I'm missing something!

@rlancemartin
Copy link
Collaborator

Absolutely, let's clarify things a bit. The base delete function is only being tweaked to make ids optional and to allow for additional keyword arguments, like so: def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]: Here's the commit for reference.

So, this change shouldn't disrupt any existing code that uses the delete method. The filter and delete_all parameters are specific to the Pinecone subclass and won't affect the base class or any other subclasses. Please correct me if I'm missing something!

+1, imo this is reasonable.

Copy link
Collaborator

@rlancemartin rlancemartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With update to base vectorstore, we need to make ids: Optional [List[str]] in the delete method for:

1/ redis
2/ chroma
3/ supabase
4/ deeplake
5/ elastic
6/ weaviate

This will resolve Lint errors.

@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 1, 2023

I'll do this in this PR, @rlancemartin .

…nd Weaviate classes to make ids optional and accept additional kwargs
@0xcha05 0xcha05 requested a review from rlancemartin July 1, 2023 18:28
@vercel vercel bot temporarily deployed to Preview July 1, 2023 18:34 Inactive
@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 1, 2023

The reason i did

filter = kwargs.get('filter')
delete_all = kwargs.get('delete_all')

in deeplake is because I didn't want to change the current behaviour. but passing kwargs directly into self.vectorstore.delete would be good.

@vercel vercel bot temporarily deployed to Preview July 1, 2023 19:47 Inactive
@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 1, 2023

I've fixed the lint issues, but not sure what the Cassandra lint issue is about. There isn't a delete function implemented. @rlancemartin

@rlancemartin
Copy link
Collaborator

I've fixed the lint issues, but not sure what the Cassandra lint issue is about. There isn't a delete function implemented. @rlancemartin

Will have a look, just kicked off tests.

@rlancemartin
Copy link
Collaborator

I've fixed the lint issues, but not sure what the Cassandra lint issue is about. There isn't a delete function implemented. @rlancemartin

Will have a look, just kicked off tests.

@0xcha05 Cassandra also has delete -- can you please change signature as done for the other DBs. That should resolve Lint and we can get this in. (See delete() in cassandra.py.)

@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 2, 2023

Thank you, the version I had didn't have a delete method in Cassandra. but this should work now, please let me know if this looks good @rlancemartin .

@vercel vercel bot temporarily deployed to Preview July 2, 2023 15:00 Inactive
@0xcha05 0xcha05 changed the title Added filter and delete all option to delete function in Pinecone integration Added filter and delete all option to delete function in Pinecone integration, change the base vectorstore class's delete function Jul 2, 2023
@0xcha05 0xcha05 changed the title Added filter and delete all option to delete function in Pinecone integration, change the base vectorstore class's delete function Added filter and delete all option to delete function in Pinecone integration, updated base VectorStore's delete function Jul 2, 2023
@0xcha05
Copy link
Contributor Author

0xcha05 commented Jul 2, 2023

I've reformatted with black. This should fix all lint errors.

@vercel vercel bot temporarily deployed to Preview July 2, 2023 16:29 Inactive
@rlancemartin rlancemartin merged commit e41b382 into langchain-ai:master Jul 2, 2023
13 checks passed
bdonkey added a commit to bdonkey/langchain that referenced this pull request Jul 3, 2023
* master: (212 commits)
  Add SpacyEmbeddings class (langchain-ai#6967)
  docs: commented out `editUrl` option (langchain-ai#6440)
  Remove duplicate mongodb integration doc (langchain-ai#7006)
  Update get_started.mdx (langchain-ai#7005)
  openapi chain nit (langchain-ai#7012)
  Fix sample in FAISS section (langchain-ai#7050)
  Fix typo in google_places_api.py (langchain-ai#7055)
  move base prompt to schema (langchain-ai#6995)
  added `Brave Search` document_loader (langchain-ai#6989)
  Add JSON Lines support to JSONLoader (langchain-ai#6913)
  Vectara upd2 (langchain-ai#6506)
  docstrings `document_loaders` 2 (langchain-ai#6890)
  docstrings `document_loaders` 1 (langchain-ai#6847)
  Added filter and delete all option to delete function in Pinecone integration, updated base VectorStore's delete function (langchain-ai#6876)
  bump 221 (langchain-ai#7047)
  Rm retriever kwargs (langchain-ai#7013)
  Polish reference docs (langchain-ai#7045)
  Support params on GoogleSearchApiWrapper (langchain-ai#6810) (langchain-ai#7014)
  Fix typo (langchain-ai#7023)
  Fix openai multi functions agent docs (langchain-ai#7028)
  ...
vowelparrot pushed a commit that referenced this pull request Jul 4, 2023
…egration, updated base VectorStore's delete function (#6876)

### Description:
Updated the delete function in the Pinecone integration to allow for
deletion of vectors by specifying a filter condition, and to delete all
vectors in a namespace.

Made the ids parameter optional in the delete function in the base
VectorStore class and allowed for additional keyword arguments.

Updated the delete function in several classes (Redis, Chroma, Supabase,
Deeplake, Elastic, Weaviate, and Cassandra) to match the changes made in
the base VectorStore class. This involved making the ids parameter
optional and allowing for additional keyword arguments.
aerrober pushed a commit to aerrober/langchain-fork that referenced this pull request Jul 24, 2023
…egration, updated base VectorStore's delete function (langchain-ai#6876)

### Description:
Updated the delete function in the Pinecone integration to allow for
deletion of vectors by specifying a filter condition, and to delete all
vectors in a namespace.

Made the ids parameter optional in the delete function in the base
VectorStore class and allowed for additional keyword arguments.

Updated the delete function in several classes (Redis, Chroma, Supabase,
Deeplake, Elastic, Weaviate, and Cassandra) to match the changes made in
the base VectorStore class. This involved making the ids parameter
optional and allowing for additional keyword arguments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants