
Add vector store support (Weaviate, Pinecone, Faiss) #108

Merged · 19 commits · Dec 19, 2022
Binary file added docs/_static/vector_stores/faiss_index_0.png
Binary file added docs/_static/vector_stores/faiss_index_1.png
Binary file added docs/_static/vector_stores/pinecone_reader.png
Binary file added docs/_static/vector_stores/weaviate_reader_0.png
Binary file added docs/_static/vector_stores/weaviate_reader_1.png
15 changes: 14 additions & 1 deletion docs/how_to/data_connectors.md
@@ -3,11 +3,24 @@
We currently offer connectors into the following data sources. External data sources are retrieved through their APIs, together with the corresponding authentication tokens.
The API reference documentation can be found [here](/reference/readers.rst).

#### External APIs
- [Notion](https://developers.notion.com/) (`NotionPageReader`)
- [Google Docs](https://developers.google.com/docs/api) (`GoogleDocsReader`)
- [Slack](https://api.slack.com/) (`SlackReader`)
- MongoDB (`SimpleMongoReader`)
- Wikipedia (`WikipediaReader`)

#### Databases
- MongoDB (`SimpleMongoReader`)

#### Vector Stores

See [How to use Vector Stores with GPT Index](vector_stores.md) for a more thorough guide on integrating vector stores with GPT Index.

- Weaviate (`WeaviateReader`)
- Pinecone (`PineconeReader`)
- Faiss (`FaissReader`)

#### File
- local file directory (`SimpleDirectoryReader`)

We offer [example notebooks of connecting to different data sources](https://github.com/jerryjliu/gpt_index/tree/main/examples/data_connectors). Please check them out!
46 changes: 46 additions & 0 deletions docs/how_to/vector_stores.md
@@ -0,0 +1,46 @@
# Using Vector Stores
Collaborator: For a later PR, I think this page would benefit from some diagrams to show the differences between how GPT Index interacts with the vector stores. Made an issue for later: #109

Collaborator (Author): Yeah, totally!


GPT Index offers multiple integration points with vector stores / vector databases:

1) GPT Index can load data from vector stores, similar to any other data connector. This data can then be used within GPT Index data structures.
2) GPT Index can use a vector store itself (Faiss) as an index. Like any other index, this index can store documents and be used to answer queries.


## Loading Data from Vector Stores using a Data Connector
GPT Index supports loading data from the following sources. See [Data Connectors](data_connectors.md) for more details and API documentation.

- Weaviate (`WeaviateReader`). [Installation](https://weaviate.io/developers/weaviate/current/getting-started/installation.html). [Python Client](https://weaviate.io/developers/weaviate/current/client-libraries/python.html).
- Pinecone (`PineconeReader`). [Installation/Quickstart](https://docs.pinecone.io/docs/quickstart).
- Faiss (`FaissReader`). [Installation](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md).

NOTE: Both the Pinecone and Faiss data loaders assume that the respective data sources store only vectors; the text content is stored elsewhere. Both loaders therefore require the user to specify an `id_to_text_map` in the `load_data` call.
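Concretely, an `id_to_text_map` is just a plain dict from vector-store IDs to the original text. The sketch below (IDs and text blobs are made up, mirroring this PR's demo notebooks) shows the kind of lookup the loaders perform after a nearest-neighbor search returns IDs:

```python
# id_to_text_map pairs each vector ID in the store with the text that
# produced that vector (IDs and blobs here are illustrative only).
id_to_text_map = {
    "id1": "text blob 1",
    "id2": "text blob 2",
}

# After a nearest-neighbor search returns IDs, the reader recovers the text:
retrieved_ids = ["id2", "id1"]
texts = [id_to_text_map[i] for i in retrieved_ids]
print(texts)  # → ['text blob 2', 'text blob 1']
```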

For instance, this is an example usage of the Pinecone data loader `PineconeReader`:

![](/_static/vector_stores/pinecone_reader.png)
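Rendered as text, the screenshot above corresponds to roughly the following sketch; the API key, environment, index name, and vectors are placeholders taken from this PR's Pinecone demo notebook. The call is wrapped in a function and not executed here, since it requires a live Pinecone index:

```python
def load_pinecone_documents(api_key, id_to_text_map, query_vector, top_k=3):
    """Hedged sketch of PineconeReader usage; requires a live Pinecone index."""
    # Deferred import: needs gpt_index (and pinecone-client) installed.
    from gpt_index.readers.pinecone import PineconeReader

    # environment and index_name below are placeholder values
    reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")
    return reader.load_data(
        index_name="quickstart",
        id_to_text_map=id_to_text_map,
        top_k=top_k,
        vector=query_vector,
        separate_documents=True,  # one Document per retrieved vector
    )
```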


NOTE: Since Weaviate can store a hybrid of document and vector objects, the user may either explicitly specify `class_name` and `properties` in order to query documents, or specify a raw GraphQL query. See below for usage.

![](/_static/vector_stores/weaviate_reader_0.png)
![](/_static/vector_stores/weaviate_reader_1.png)
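As a hedged sketch of the two query styles in the screenshots: the module path, constructor argument, and keyword names below are assumptions inferred from the prose above and this PR's reader names, not a confirmed API. A live Weaviate instance is required, so the function is defined but never called here:

```python
def load_weaviate_documents(host="http://localhost:8080"):
    """Sketch of the two WeaviateReader query styles described above (assumed API)."""
    # Deferred import: needs gpt_index (and the weaviate Python client) installed.
    from gpt_index.readers.weaviate import WeaviateReader  # module path is an assumption

    reader = WeaviateReader(host)  # constructor argument is an assumption

    # Style 1: explicitly name the class and properties to pull documents.
    docs_by_class = reader.load_data(class_name="Document", properties=["text"])

    # Style 2: hand the reader a raw GraphQL query instead.
    query = """
    {
      Get {
        Document {
          text
        }
      }
    }
    """
    docs_by_graphql = reader.load_data(graphql_query=query)
    return docs_by_class, docs_by_graphql
```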

[Example notebooks can be found here](https://github.com/jerryjliu/gpt_index/tree/main/examples/data_connectors).


## Using a Vector Store as an Index
Collaborator: (Possibly a noob question; I'm still getting familiar with Faiss and vector DBs.) Is the difference between this, versus using Faiss directly to store embeddings of the Paul Graham essay, mainly that GPT Index also generates a coherent answer? Or are there other things going on?

Collaborator (Author): Using Faiss as a data loader (the first section) means that you load documents from an existing Faiss index (say, one the user already has), and you can then use a GPT Index structure on top of the retrieved documents, e.g. build a tree over them.

This section says that once you have documents, you can also build a GPT Index data struct, with Faiss under the hood, over those documents. The documents could come from anywhere (e.g. Slack, Notion), and we'll create an index data structure over them, taking care of tokenization/chunking/querying.

Collaborator (Author): This is something where a diagram absolutely would help!


GPT Index also supports using a vector store itself (specifically, Faiss) as an index. Like any
other index within GPT Index (tree, keyword table, list), this index can be constructed over any collection
of documents. We use the vector store within the index to store embeddings for the input text chunks.

Once constructed, the index can be used for querying.

**Index Construction**
![](/_static/vector_stores/faiss_index_0.png)

**Index Querying**
![](/_static/vector_stores/faiss_index_1.png)
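To make the two screenshots concrete without requiring faiss itself, here is a minimal pure-Python sketch of what a flat L2 vector index does at construction and query time. The class and names are illustrative only, not the GPT Index API; the real index uses `faiss.IndexFlatL2` and real embeddings rather than toy 2-D vectors:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyFlatIndex:
    """Toy stand-in for a flat L2 vector store: brute-force nearest neighbors."""

    def __init__(self):
        self.embeddings = []  # embedding i belongs to chunk i
        self.chunks = []

    def add(self, embedding, chunk):
        # Index construction: store one embedding per input text chunk.
        self.embeddings.append(embedding)
        self.chunks.append(chunk)

    def query(self, query_embedding, k=2):
        # Index querying: rank chunks by distance to the query embedding.
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: l2_distance(self.embeddings[i], query_embedding),
        )
        return [self.chunks[i] for i in ranked[:k]]

index = TinyFlatIndex()
index.add([0.1, 0.1], "chunk about cats")
index.add([0.9, 0.9], "chunk about dogs")
index.add([0.15, 0.05], "another cat chunk")
print(index.query([0.1, 0.1], k=2))  # → ['chunk about cats', 'another cat chunk']
```

On top of this retrieval step, GPT Index also handles chunking the input documents and synthesizing a final answer from the retrieved chunks.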


[Example notebooks can be found here](https://github.com/jerryjliu/gpt_index/tree/main/examples/vector_indices).
1 change: 1 addition & 0 deletions docs/index.rst
@@ -59,6 +59,7 @@ At the core of GPT Index is a **data structure**. Instead of relying on world kn
how_to/embeddings.md
how_to/custom_prompts.md
how_to/custom_llms.md
how_to/vector_stores.md


.. toctree::
1 change: 1 addition & 0 deletions docs/reference/indices.rst
@@ -13,3 +13,4 @@ classes allow for index creation, insertion, and also querying.
indices/list.rst
indices/table.rst
indices/tree.rst
indices/vector_store.rst
1 change: 1 addition & 0 deletions docs/reference/query.rst
@@ -12,3 +12,4 @@ This doc specifically shows the classes that are used to query indices.
indices/list_query.rst
indices/table_query.rst
indices/tree_query.rst
indices/vector_store_query.rst
150 changes: 150 additions & 0 deletions examples/data_connectors/FaissDemo.ipynb
@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5d974136",
"metadata": {},
"source": [
"# Faiss Demo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b541d8ec",
"metadata": {},
"outputs": [],
"source": [
"from gpt_index.readers.faiss import FaissReader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90d37078",
"metadata": {},
"outputs": [],
"source": [
"# Build the Faiss index. \n",
"# A guide for how to get started with Faiss is here: https://github.com/facebookresearch/faiss/wiki/Getting-started\n",
"# We provide some example code below.\n",
"\n",
"import faiss\n",
"import numpy as np\n",
"\n",
"# # Example Code\n",
"# d = 8\n",
"# docs = np.array([\n",
"# [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],\n",
"# [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],\n",
"# [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],\n",
"# [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],\n",
"# [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]\n",
"# ])\n",
"# # id_to_text_map is used for query retrieval\n",
"# id_to_text_map = {\n",
"# 0: \"aaaaaaaaa bbbbbbb cccccc\",\n",
"# 1: \"foooooo barrrrrr\",\n",
"# 2: \"tmp tmptmp tmp\",\n",
"# 3: \"hello world hello world\",\n",
"# 4: \"cat dog cat dog\"\n",
"# }\n",
"# # build the index\n",
"# index = faiss.IndexFlatL2(d)\n",
"# index.add(docs)\n",
"\n",
"id_to_text_map = {\n",
" \"id1\": \"text blob 1\",\n",
" \"id2\": \"text blob 2\",\n",
"}\n",
"index = ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd470a09",
"metadata": {},
"outputs": [],
"source": [
"reader = FaissReader(index)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c33084c5",
"metadata": {},
"outputs": [],
"source": [
"# To load data from the Faiss index, you must specify: \n",
"# k: top nearest neighbors\n",
"# query: a 2D embedding representation of your queries (rows are queries)\n",
"k = 4\n",
"query1 = np.array([...])\n",
"query2 = np.array([...])\n",
"query = np.array([query1, query2])\n",
"\n",
"documents = reader.load_data(query=query, id_to_text_map=id_to_text_map, k=k)"
]
},
{
"cell_type": "markdown",
"id": "0b74697a",
"metadata": {},
"source": [
"### Create index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e85d7e5b",
"metadata": {},
"outputs": [],
"source": [
"from gpt_index import GPTListIndex\n",
"\n",
"index = GPTListIndex(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31c3b68f",
"metadata": {},
"outputs": [],
"source": [
"response = index.query(\"<query_text>\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56fce3fb",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, Markdown\n",
"\n",
"display(Markdown(f\"<b>{response}</b>\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:conda_gpt_env]",
"language": "python",
"name": "conda-env-conda_gpt_env-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
136 changes: 136 additions & 0 deletions examples/data_connectors/PineconeDemo.ipynb
@@ -0,0 +1,136 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f3ca56f0-6ef1-426f-bac5-fd7c374d0f51",
"metadata": {},
"source": [
"# Pinecone Demo"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e2f49003-b952-4b9b-b935-2941f9303773",
"metadata": {},
"outputs": [],
"source": [
"api_key = \"<api_key>\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "262f990a-79c8-413a-9f3c-cd9a3c191307",
"metadata": {},
"outputs": [],
"source": [
"from gpt_index.readers.pinecone import PineconeReader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "252f8163-7297-44b6-a838-709e9662f3d6",
"metadata": {},
"outputs": [],
"source": [
"reader = PineconeReader(api_key=api_key, environment=\"us-west1-gcp\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "53b49187-8477-436c-9718-5d2f8cc6fad0",
"metadata": {},
"outputs": [],
"source": [
"# the id_to_text_map specifies a mapping from the ID specified in Pinecone to your text. \n",
"id_to_text_map = {\n",
" \"id1\": \"text blob 1\",\n",
" \"id2\": \"text blob 2\",\n",
"}\n",
"\n",
"# the query_vector is an embedding representation of your query\n",
"# Example query vector:\n",
"# query_vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]\n",
"\n",
"query_vector=[n1, n2, n3, ...]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a88be1c4-603f-48b9-ac64-10a219af4951",
"metadata": {},
"outputs": [],
"source": [
"# NOTE: Required args are index_name, id_to_text_map, vector.\n",
"# In addition, we pass through all kwargs that can be passed into the `Query` operation in Pinecone.\n",
"# See the API reference: https://docs.pinecone.io/reference/query\n",
"# and also the Python client: https://github.com/pinecone-io/pinecone-python-client\n",
"# for more details. \n",
"documents = reader.load_data(index_name='quickstart', id_to_text_map=id_to_text_map, top_k=3, vector=query_vector, separate_documents=True)"
]
},
{
"cell_type": "markdown",
"id": "a4baf59e-fc97-4a1e-947f-354a6438ffa6",
"metadata": {},
"source": [
"### Create index "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "109d083e-f3b4-420b-886b-087c8cf3f98b",
"metadata": {},
"outputs": [],
"source": [
"from gpt_index import GPTListIndex\n",
"\n",
"index = GPTListIndex(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e15b9177-9e94-4e4e-9a2e-cd3a288a7faf",
"metadata": {},
"outputs": [],
"source": [
"response = index.query(\"<query_text>\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67b50613-a589-4acf-ba16-10571b415268",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, Markdown\n",
"\n",
"display(Markdown(f\"<b>{response}</b>\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "gpt_retrieve_venv",
"language": "python",
"name": "gpt_retrieve_venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}