
Add vector store support (Weaviate, Pinecone, Faiss) #108

Merged
jerryjliu merged 19 commits into main from jerry/add_weaviate on Dec 19, 2022

Conversation

jerryjliu
Collaborator

GPT Index now offers multiple integration points with vector stores / vector databases:

  1. GPT Index can load data from vector stores, similar to any other data connector. The loaded data can then be used within GPT Index data structures (via PineconeReader, WeaviateReader, FaissReader).
  2. GPT Index can use a vector store itself (Faiss) as an index. Like any other index, it can store documents and be used to answer queries (via GPTFaissIndex); see the sketch below.
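
To make the second integration point concrete, here is a minimal sketch of building a GPTFaissIndex (assumptions: documents in a local `data/` directory and OpenAI ada-002 embeddings of dimension 1536; argument names may differ slightly from the final API in this PR):

```python
# Minimal sketch -- assumptions noted above; not the exact PR code.
import faiss
from gpt_index import GPTFaissIndex, SimpleDirectoryReader

# Load documents from any source (here, a local directory).
documents = SimpleDirectoryReader("data").load_data()

# Create an empty Faiss index; GPT Index fills it with chunk embeddings.
faiss_index = faiss.IndexFlatL2(1536)

# Build the index: GPT Index handles chunking, embedding, and storage.
index = GPTFaissIndex(documents, faiss_index=faiss_index)

# Query it like any other GPT Index data structure.
response = index.query("What did the author do growing up?")
print(response)
```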

@jerryjliu jerryjliu requested a review from teoh December 18, 2022 23:18
@@ -0,0 +1,46 @@
# Using Vector Stores
Collaborator

for a later PR: i think this page would benefit from some diagrams to show the differences between how gpt index interacts with the vector stores.

made an issue for later: #109

Collaborator Author

Yeah totally!

[Example notebooks can be found here](https://github.com/jerryjliu/gpt_index/tree/main/examples/data_connectors).


## Using a Vector Store as an Index
Collaborator

(possibly noob question, i'm still getting familiar with faiss and vector dbs)

is the difference between this vs. using faiss directly to store embeddings of the paul graham essay mainly that gpt index also generates a coherent answer? or are there other things going on?

Collaborator Author

Oh, so using Faiss as a data loader (the first section) means that you load documents from an existing Faiss index (say, one the user already has), and can then use a GPT Index structure on top of the retrieved documents - say you build a tree over them.

In this section it's saying that once you have documents, you can also build a GPT Index data structure, with Faiss under the hood, over those documents. So the documents could come from anywhere (e.g. Slack, Notion), and we'll create an index data structure over them, taking care of tokenization/chunking/querying.
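
For contrast, a rough sketch of the reader path described above (assumptions: a populated Faiss index and an id-to-text mapping already exist; import paths and argument names are approximate, not the exact PR code):

```python
# Rough sketch of the reader path -- assumptions noted above.
import faiss
import numpy as np
from gpt_index import GPTListIndex
from gpt_index.readers import FaissReader

# Stand-in for a Faiss index the user already populated, plus a mapping
# from vector ids back to the original text chunks.
d = 3  # toy embedding dimension for illustration
existing_index = faiss.IndexFlatL2(d)
existing_index.add(np.random.rand(10, d).astype("float32"))
id_to_text_map = {i: f"text chunk {i}" for i in range(10)}

# Retrieve the top-k nearest chunks for one or more query vectors.
reader = FaissReader(existing_index)
query_vectors = np.random.rand(1, d).astype("float32")
documents = reader.load_data(query=query_vectors, id_to_text_map=id_to_text_map, k=4)

# Build any GPT Index structure (e.g. list or tree) over the retrieved docs.
index = GPTListIndex(documents)
response = index.query("Summarize the retrieved passages.")
```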

Collaborator Author

this is something where a diagram absolutely would help!

from gpt_index.schema import Document


class WeaviateReader(BaseReader):
Collaborator

would it simplify things if we made one VectorDbReader base class that the faiss, pinecone, and weaviate readers all inherited from? i'm wondering if there's enough shared logic here to do that.

since people have been asking for vector db support, this might save us time in the future if we have to add more vector db readers

Collaborator Author

Good q. Tbh I thought about it, and the interfaces between these three are actually quite different; each one has different required args at load and query time. But definitely something to think about as I add more abstractions!
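
For concreteness, one hypothetical shape such a shared base could take (the class and method below are invented for illustration, not from this PR); note that the per-store required arguments still end up as free-form kwargs, which is the point about the interfaces differing:

```python
# Hypothetical sketch -- VectorStoreReader is invented for illustration
# and is not part of this PR.
from abc import ABC, abstractmethod
from typing import Any, List

from gpt_index.schema import Document


class VectorStoreReader(ABC):
    """Hypothetical shared base class for vector store readers."""

    @abstractmethod
    def load_data(self, **load_kwargs: Any) -> List[Document]:
        """Load documents from the vector store.

        Required kwargs differ per store, e.g. a Weaviate class name and
        properties vs. Pinecone/Faiss query vectors plus an id-to-text map.
        """
```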

@jerryjliu jerryjliu merged commit d421afa into main Dec 19, 2022
@jerryjliu jerryjliu deleted the jerry/add_weaviate branch December 19, 2022 04:16
viveksilimkhan1 pushed a commit to viveksilimkhan1/llama_index that referenced this pull request Oct 30, 2023