Add vector store support (Weaviate, Pinecone, Faiss) #108
Conversation
@@ -0,0 +1,46 @@
# Using Vector Stores
for a later PR: i think this page would benefit from some diagrams to show the differences in how gpt index interacts with each of the vector stores.
made an issue for later: #109
Yeah totally!
[Example notebooks can be found here](https://github.com/jerryjliu/gpt_index/tree/main/examples/data_connectors).
## Using a Vector Store as an Index
(possibly a noob question, i'm still getting familiar with faiss and vector dbs)
is the difference between this and using faiss directly to store embeddings of the paul graham essay mainly that gpt index also generates a coherent answer? or are there other things going on?
Oh, so using Faiss as a data loader (the first section) means that you load documents from an existing Faiss index (say, one the user already has), and can then use a GPT Index structure on top of the retrieved documents, e.g. build a tree over them.
In this section, it's saying that once you have documents, you can also build a GPT Index data struct, with Faiss under the hood, over those documents. The documents could come from anywhere (e.g. Slack, Notion), and we'll create an index data structure over them, taking care of tokenization/chunking/querying.
this is something where a diagram absolutely would help!
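Until that diagram exists, the two integration modes discussed above can be sketched with a toy, self-contained example. This is plain Python with a brute-force nearest-neighbor search standing in for a real Faiss/Pinecone/Weaviate index; all names here are illustrative, not the actual gpt_index API:

```python
from math import sqrt

# Toy 2-D "vector store": id -> (embedding, text). A stand-in for a real
# vector db that the user already has populated.
store = {
    0: ((1.0, 0.0), "Paul Graham grew up writing short stories."),
    1: ((0.9, 0.1), "He later worked on Lisp and founded Viaweb."),
    2: ((0.0, 1.0), "An unrelated note about gardening."),
}

def nearest(query, k=2):
    """Brute-force nearest-neighbor search: the job a vector db does for real."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ids = sorted(store, key=lambda i: dist(store[i][0], query))
    return ids[:k]

# Mode 1 (reader / data loader): pull documents out of the existing store...
retrieved = [store[i][1] for i in nearest((1.0, 0.05))]
# ...then a GPT Index structure (e.g. a tree) would be built over `retrieved`
# and queried to synthesize a coherent answer.

# Mode 2 (index): instead of retrieving yourself, hand gpt_index the raw
# documents and let it handle chunking, embedding, and storage, with a
# vector store like Faiss under the hood.
```

So in mode 1 the user drives retrieval and gpt_index structures the results; in mode 2 gpt_index owns the whole pipeline and the vector store is an internal detail.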
from gpt_index.schema import Document
class WeaviateReader(BaseReader):
would it simplify things if we made one base class, a VectorDbReader, that the faiss, pinecone, and weaviate readers all inherited from? i'm wondering if there's enough shared logic here to do that.
since people have been asking for vector db support, this might save us time in the future if we have to add more vector db readers.
Good q. Tbh I thought about it, and the interfaces between these three are actually quite different; each one has different required args at load time and query time. But definitely something to think about as I add more abstractions!
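For the record, a minimal sketch of what the shared base class floated above could look like. This is only a thought experiment, not code from this PR (the reply above notes the three readers' interfaces differ too much for it today); the `VectorDbReader` name and `load_data` signature are assumptions, and the in-memory subclass is a toy:

```python
from abc import ABC, abstractmethod
from typing import Any, List

class VectorDbReader(ABC):
    """Hypothetical common surface a Faiss/Pinecone/Weaviate reader might share."""

    @abstractmethod
    def load_data(self, **load_kwargs: Any) -> List[str]:
        """Return documents retrieved from the underlying vector db."""

class InMemoryReader(VectorDbReader):
    """Toy subclass backed by a plain dict, for illustration only."""

    def __init__(self, id_to_text: dict):
        self._id_to_text = id_to_text

    def load_data(self, ids=None, **_: Any) -> List[str]:
        # Default to returning everything; a real reader would query the db.
        ids = ids if ids is not None else list(self._id_to_text)
        return [self._id_to_text[i] for i in ids]

reader = InMemoryReader({0: "doc a", 1: "doc b"})
docs = reader.load_data(ids=[1])
```

The tension the reply points at is visible even here: the `**load_kwargs` escape hatch is doing all the work, because each backend needs different required arguments, which limits how much a shared base class actually buys.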
GPT Index now offers multiple integration points with vector stores / vector databases:
- Data loaders: `PineconeReader`, `WeaviateReader`, `FaissReader`
- Index structures: `GPTFaissIndex`