Feat/Add support for weaviate vector db #782

Merged
merged 22 commits into from
Oct 18, 2023


rupeshbansal
Contributor

Description

Adding support for weaviate vector database

Fixes #436

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

from embedchain import CustomApp
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.weaviate import WeaviateDb

app = CustomApp(llm=OpenAILlm(), embedder=OpenAIEmbedder(), db=WeaviateDb())

app.add(...)
app.query(...)
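
The snippet above does not show how the Weaviate connection is configured. The review session later in this thread sets `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY` before constructing `WeaviateDb()`, so a small pre-flight check along these lines may help when reproducing the test. This is only a sketch: the helper name `missing_weaviate_env` is hypothetical and not part of embedchain, and it assumes those two variables are the ones the integration reads.

```python
import os
from typing import List, Mapping

# Variable names taken from the review session in this thread; whether
# the integration reads any others is not confirmed here.
REQUIRED_VARS = ("WEAVIATE_ENDPOINT", "WEAVIATE_API_KEY")


def missing_weaviate_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_weaviate_env()
    if missing:
        print("Set these before creating WeaviateDb():", ", ".join(missing))
```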

Please delete options that are not relevant.

  • Unit Test

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

@rupeshbansal
Contributor Author

@Dev-Khant for your reference

@codecov

codecov bot commented Oct 8, 2023

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Files Coverage Δ
embedchain/config/vectordb/weaviate.py 100.00% <100.00%> (ø)
embedchain/embedchain.py 72.99% <ø> (+0.22%) ⬆️
embedchain/factory.py 93.02% <ø> (ø)
embedchain/llm/base.py 85.24% <ø> (ø)
embedchain/vectordb/weaviate.py 67.70% <67.70%> (ø)

... and 8 files with indirect coverage changes


@Dev-Khant
Collaborator

> @Dev-Khant for your reference

Thanks, I will go through it

@deshraj
Collaborator

deshraj commented Oct 9, 2023

@rupeshbansal thanks for adding the support. However, I found some issues with the integration while testing with the weaviate sandbox.

Issues:

  1. It looks like we are trying to add documents again even when they already exist and haven't changed. We should only try to add a document if it doesn't exist. If it exists and has changed, then we should only update the ones that have changed. See the integration code for chroma and opensearch db.
  2. Similar to the issue with the pinecone db integration, it looks like the chunks are not added properly to the weaviate db, resulting in bad answers. See the code below to reproduce the issue:
In [1]: from embedchain import CustomApp
   ...: from embedchain.embedder.openai import OpenAIEmbedder
   ...: from embedchain.llm.openai import OpenAILlm
   ...: from embedchain.vectordb.weaviate import WeaviateDb

In [3]: import os
   ...: 
   ...: os.environ['WEAVIATE_ENDPOINT'] = 'https://xxx'
   ...: os.environ['WEAVIATE_API_KEY'] = 'xxx'


In [4]: app = CustomApp(llm=OpenAILlm(), embedder=OpenAIEmbedder(), db=WeaviateDb())
WARNING:root:DEPRECATION WARNING: Please use `App` instead of `CustomApp`. `CustomApp` will be removed in a future release. Please refer to https://docs.embedchain.ai/advanced/app_types#opensourceapp for instructions.

In [5]: app.query("What is the net worth of Elon musk?")
Out[5]: "As of September 2021, Elon Musk's net worth is estimated to be around $250 billion, making him one of the wealthiest individuals in the world. However, please note that net worth can fluctuate over time due to various factors such as stock market fluctuations and business ventures."

In [6]: app.add("https://www.forbes.com/profile/elon-musk")
Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 13
Out[6]: '8cf46026cabf9b05394a2658bd1fe890'

In [7]: app.query("What is the net worth of Elon musk?")
Out[7]: "As of September 2021, Elon Musk's net worth is estimated to be around $250 billion, making him one of the wealthiest individuals in the world. However, please note that net worth can fluctuate over time due to various factors such as stock market fluctuations and business ventures."

In [8]: app.add("https://en.wikipedia.org/wiki/Elon_Musk")
   ...: app.add("https://www.forbes.com/profile/elon-musk")

Successfully saved https://en.wikipedia.org/wiki/Elon_Musk (DataType.WEB_PAGE). New chunks count: 367
Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 13
Out[8]: '8cf46026cabf9b05394a2658bd1fe890'
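
Issue 1 above asks for an add-if-missing, update-if-changed, skip-otherwise policy. A minimal sketch of that decision, independent of any Weaviate client calls, might look like the following. The function names (`chunk_id`, `content_hash`, `plan_writes`) and the id scheme are hypothetical illustrations, not the code in this PR or in the chroma and opensearch integrations.

```python
import hashlib
from typing import Dict, List, Tuple


def chunk_id(source: str, index: int) -> str:
    """Stable id for a chunk, derived from its source and position (assumed scheme)."""
    return hashlib.sha256(f"{source}:{index}".encode("utf-8")).hexdigest()


def content_hash(text: str) -> str:
    """Hash of the chunk text, used to detect changed content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_writes(
    existing: Dict[str, str], source: str, chunks: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Decide what to do with each chunk.

    `existing` maps chunk id -> content hash already stored in the database.
    Returns (to_add, to_update, unchanged) as lists of chunk ids.
    """
    to_add, to_update, unchanged = [], [], []
    for index, text in enumerate(chunks):
        cid = chunk_id(source, index)
        if cid not in existing:
            to_add.append(cid)        # not stored yet: add it
        elif existing[cid] != content_hash(text):
            to_update.append(cid)     # stored but content differs: update it
        else:
            unchanged.append(cid)     # stored and identical: skip it
    return to_add, to_update, unchanged
```

With a policy like this, re-running `app.add(...)` on an unchanged page would be expected to produce no new writes, whereas the session above reports new chunks each time the same URL is added.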

@rupeshbansal
Contributor Author

rupeshbansal commented Oct 13, 2023

Thanks for the thorough review. Have addressed the comments. Requesting review please!

Collaborator

@deshraj deshraj left a comment


Works great. Left some minor comments which I can take care of.

embedchain/vectordb/weaviate.py (outdated, resolved)
embedchain/vectordb/weaviate.py (outdated, resolved)
pyproject.toml (outdated, resolved)
configs/weaviate.yaml (outdated, resolved)
configs/weaviate.yaml (outdated, resolved)
docs/components/vector-databases.mdx (outdated, resolved)
docs/components/vector-databases.mdx (outdated, resolved)
@deshraj deshraj merged commit cdfd651 into mem0ai:main Oct 18, 2023
4 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for weaviate database
3 participants