[Feat] Pinecone Vector DB support #723

rupeshbansal · 2023-09-28T17:09:49Z

Description

This PR adds support for Pinecone as a vector database.

How to use it

Imports

from embedchain import CustomApp
from embedchain.embedder.openai import OpenAiEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.pineconedb import PineconeDb

Creating a custom app

pinecone_app = CustomApp(llm=OpenAILlm(), embedder=OpenAiEmbedder(), db=PineconeDb())

Your app is ready. Now play around with adding and querying your data.

Fixes #39

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
I have checked my code and corrected any misspellings

Maintainer Checklist

closes #xxxx (Replace xxxx with the GitHub issue number)
Made sure Checks passed

rupeshbansal · 2023-09-28T17:13:48Z

embedchain/config/vectordbs/PineconeDbConfig.py

+
+
+@register_deserializable
+class PineconeDbConfig(BaseVectorDbConfig):


The config is quite raw for the first release. More Pinecone-specific settings, such as the number of replicas, can be added as the need arises.

Can we allow users to pass **kwargs during init and forward those args to the Pinecone client?

It's definitely viable, but IMO it might make things a little tedious and put Pinecone out of sync with the other vector stores. The Pinecone client is configured with explicit key-value pairs: https://docs.pinecone.io/docs/python-client. With kwargs, we would have to explicitly fetch and parse all the keys and values when setting up the Pinecone client.
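For reference, a minimal sketch of what the kwargs pass-through discussed above could look like. It is not the code in this PR: `PineconeDbConfigSketch`, `build_init_kwargs`, and the `replicas` key are hypothetical names used only for illustration, and the sketch does not call the Pinecone client at all.

```python
from typing import Any, Dict, Optional


class PineconeDbConfigSketch:
    """Hypothetical config: explicit fields plus free-form pass-through settings."""

    def __init__(self, collection_name: Optional[str] = None, **extra_params: Any):
        self.collection_name = collection_name
        # Anything not modelled explicitly is stored untouched for the client.
        self.extra_params: Dict[str, Any] = dict(extra_params)


def build_init_kwargs(config: PineconeDbConfigSketch, api_key: str, environment: str) -> Dict[str, Any]:
    """Merge pass-through settings with the required ones; required values win on conflict."""
    return {**config.extra_params, "api_key": api_key, "environment": environment}
```

With this shape, `build_init_kwargs(PineconeDbConfigSketch(replicas=2), "key", "env")` yields a single dict that could be unpacked into a client constructor. The trade-off raised in the reply still applies: nothing validates the extra keys.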

rupeshbansal · 2023-09-28T17:14:47Z

embedchain/embedchain.py

@@ -399,7 +399,6 @@ def load_and_embed(

        self.db.add(documents=documents, metadatas=metadatas, ids=ids)
        count_new_chunks = self.count() - chunks_before_addition
-        print((f"Successfully saved {src} ({chunker.data_type}). New chunks count: {count_new_chunks}"))


Because Pinecone is remote, it takes time for the index count to be updated. As such, this log line becomes redundant for Pinecone. To avoid confusion, I removed it altogether.

Can we instead do this:

Print the current statement if db != pinecone

Print saying that "Successfully saved {src} ({chunker.data_type})" but don't mention the chunks count.

I'm wondering what the upside of printing the src is. It's supplied by the user, so they already know which src is being stored. Or are we trying to give assurance that what they supplied is indeed what was saved?
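A small sketch of the alternative suggested in this thread, assuming the caller passes `None` when the backend cannot report an up-to-date count (as with a remote index). `format_save_message` is a hypothetical helper, not a function in embedchain.

```python
from typing import Optional


def format_save_message(src: str, data_type: str, new_chunks: Optional[int]) -> str:
    """Build the log line, omitting the chunk count when it is not reliable."""
    message = f"Successfully saved {src} ({data_type})."
    if new_chunks is not None:
        message += f" New chunks count: {new_chunks}"
    return message
```

Keeping the decision in one helper would avoid sprinkling `db != pinecone` checks through `load_and_embed`; the caller only decides whether it trusts the count.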

deshraj

Thanks for adding this @rupeshbansal.

Generally looks good. Can you please resolve the comments?

deshraj · 2023-09-28T17:28:13Z

embedchain/config/vectordbs/PineconeDbConfig.py

+
+
+@register_deserializable
+class PineconeDbConfig(BaseVectorDbConfig):


Can we allow users to pass **kwargs during init and forward those args to the Pinecone client?

deshraj · 2023-09-28T17:29:32Z

embedchain/embedchain.py

@@ -399,7 +399,6 @@ def load_and_embed(

        self.db.add(documents=documents, metadatas=metadatas, ids=ids)
        count_new_chunks = self.count() - chunks_before_addition
-        print((f"Successfully saved {src} ({chunker.data_type}). New chunks count: {count_new_chunks}"))


Can we instead do this:

Print the current statement if db != pinecone

Print saying that "Successfully saved {src} ({chunker.data_type})" but don't mention the chunks count.

embedchain/vectordb/pineconedb.py

kencanak · 2023-10-04T09:25:44Z

@rupeshbansal I think we need to batch the upsert task, since Pinecone has a 2 MB limit? https://docs.pinecone.io/docs/limits#:~:text=Max%20size%20for%20an%20upsert,to%20queries%20immediately%20after%20upserting.

Context: I tried to embed https://en.wikipedia.org/wiki/Elon_Musk and https://www.forbes.com/profile/elon-musk, and I am getting the limit exceeded error. I am using OpenAI as my embedder.
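A rough sketch of the batching idea, assuming a fixed number of vectors per request is enough to stay under the size limit. `batched` is a hypothetical helper and the batch size of 100 in the usage note is an assumption, not a value taken from this PR.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Each yielded batch would then go to the index in its own upsert request, for example `for batch in batched(vectors, 100): ...`. Batching by count is only a proxy for the byte limit, so very large metadata payloads may call for a smaller batch size.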

embedchain/vectordb/pineconedb.py

rupeshbansal · 2023-10-07T17:03:09Z

Thanks for noting this. Fixed!

codecov · 2023-10-07T17:11:15Z

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Files	Coverage Δ
embedchain/config/vectordb/pinecone.py	`100.00% <100.00%> (ø)`
embedchain/embedchain.py	`72.64% <ø> (-0.12%)`	⬇️
embedchain/vectordb/elasticsearch.py	`67.53% <100.00%> (ø)`
embedchain/vectordb/pineconedb.py	`72.46% <72.46%> (ø)`

deshraj · 2023-10-09T18:44:47Z

embedchain/vectordb/pineconedb.py

+        pinecone.init(
+            api_key=os.environ.get("PINECONE_API_KEY"),
+            environment=os.environ.get("PINECONE_ENV"),
+        )


Can we please error out with a proper error message if the env variables are missing? Currently, it errors out without a proper error message.

Pinecone throws: API key is missing or invalid for the environment "us-west1-gcp". Check that the correct environment is specified. This error is also bubbled up in the app. Is that descriptive enough for users to understand what went wrong?
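For comparison, a minimal sketch of an explicit check that fails before the client is initialised. `read_required_env` is a hypothetical helper and the error wording is an assumption; only the variable names `PINECONE_API_KEY` and `PINECONE_ENV` come from the diff above.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_ENV")


def read_required_env(
    names: Sequence[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the requested variables, or raise one error naming every missing one."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("Missing required environment variable(s): " + ", ".join(missing))
    return {name: env[name] for name in names}
```

The `environ` parameter exists only so the check can be exercised without touching the real process environment.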

deshraj · 2023-10-15T08:53:48Z

Thanks for the PR, Rupesh. This is great. ❤️ There are some changes we would still have to make to be able to configure Pinecone and make it work with the YAML configuration, but I will incorporate them in a follow-up PR.

Rupesh Bansal added 2 commits September 28, 2023 22:29

Added pinecone

058812b

Polished

77637f3

rupeshbansal commented Sep 28, 2023

deshraj reviewed Sep 28, 2023

Rupesh Bansal and others added 2 commits September 30, 2023 18:27

COmments

3aa8f52

Merge branch 'main' into feat/pinecone_support

26069f7

rupeshbansal requested a review from deshraj September 30, 2023 13:25

Changed module name

62356ac

kencanak added a commit to kencanak/embedchain that referenced this pull request Oct 3, 2023

add(pinecone): add support - ref mem0ai#723

25df4a6

Resolved conflicts

4e17e44

kencanak reviewed Oct 5, 2023

embedchain/vectordb/pineconedb.py

kencanak reviewed Oct 5, 2023

embedchain/vectordb/pineconedb.py (outdated)

Rupesh Bansal added 5 commits October 6, 2023 22:32

resolved conflicts

7fb6e35

Added tests

1c7fd00

Resolved conflicts

40dcb92

Fix lint

6d9c51b

Fixed mock import

2502c29

deshraj reviewed Oct 9, 2023

embedchain/vectordb/pineconedb.py

Rupesh Bansal added 4 commits October 12, 2023 09:36

Merge branch 'main' of github.com:embedchain/embedchain into feat/pinecone_support

c75d110

Merge branch 'main' of github.com:embedchain/embedchain into feat/pinecone_support

2141014

Fixed bug

1a7ccf1

Formatted

1b69401

rupeshbansal requested a review from deshraj October 13, 2023 14:02

deshraj merged commit a7a61fa into mem0ai:main Oct 15, 2023
5 checks passed

deshraj mentioned this pull request Oct 15, 2023

[feature]: Improve pinecone db integration #806

Merged

13 tasks
