Add DocArray vector stores #4483

Merged
merged 25 commits into from May 10, 2023
41433e6
feat: add in-memory and hnswlib vectorstore
anna-charlotte Apr 24, 2023
b687fd4
refactor: use abstract VecStoreFromDocIndex for in memory and hnswlib …
anna-charlotte Apr 27, 2023
de262f9
fix: clean up and add dependencies
anna-charlotte Apr 27, 2023
30456bc
Add more configurations for hnswlib
anna-charlotte Apr 27, 2023
5d2324a
refactor: rename InMemory to InMemoryExactSearch
anna-charlotte Apr 27, 2023
ecc73b4
fix: change space default for hnswlib to l2
anna-charlotte Apr 28, 2023
cf08b7c
chore: fetch docarray fork and fix conflicts
jupyterjazz May 8, 2023
3eb3fdc
feat: add example notebooks
jupyterjazz May 8, 2023
d38cb10
refactor: rename classes
jupyterjazz May 9, 2023
8c0a611
refactor: modify notebooks
jupyterjazz May 9, 2023
a216ab4
Merge pull request #1 from jupyterjazz/docarray-vectorstore
jupyterjazz May 9, 2023
4af685c
refactor: merge upstream
jupyterjazz May 9, 2023
b920c15
refactor: naming adjustments
jupyterjazz May 9, 2023
d8df4bb
refactor: requested changes
jupyterjazz May 10, 2023
31b1275
refactor: merge conflicts
jupyterjazz May 10, 2023
4694bb4
style: resolve lint errors
jupyterjazz May 10, 2023
45b8c09
style: run black
jupyterjazz May 10, 2023
15c5911
style: ruff ruff
jupyterjazz May 10, 2023
b0ae2ab
Merge remote-tracking branch 'jupyterjazz/master' into dev2049/docarray
dev2049 May 10, 2023
cf9b918
Merge branch 'master' into dev2049/docarray
dev2049 May 10, 2023
8b1638b
cr
dev2049 May 10, 2023
cc6e86e
wip
dev2049 May 10, 2023
23e9ad6
cr
dev2049 May 10, 2023
672baf3
docs
dev2049 May 10, 2023
e97c894
nit
dev2049 May 10, 2023
227 changes: 227 additions & 0 deletions docs/modules/indexes/vectorstores/examples/docarray_hnsw.ipynb
@@ -0,0 +1,227 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2ce41f46-5711-4311-b04d-2fe233ac5b1b",
"metadata": {},
"source": [
"# DocArrayHnswSearch\n",
"\n",
">[DocArrayHnswSearch](https://docs.docarray.org/user_guide/storing/index_hnswlib/) is a lightweight Document Index implementation provided by [DocArray](https://docs.docarray.org/) that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib) and all other data in [SQLite](https://www.sqlite.org/index.html).\n",
"\n",
"This notebook shows how to use `DocArrayHnswSearch` as a vector store: building an index from documents and running similarity searches against it."
]
},
{
"cell_type": "markdown",
"id": "7ee37d28",
"metadata": {},
"source": [
"# Setup\n",
"\n",
"Uncomment the cells below to install docarray and set your OpenAI API key, if you haven't already done so."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ce1b8cb-dbf0-40c3-99ee-04f28143331b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip install \"docarray[hnswlib]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "878f17df-100f-4854-9e87-472cf36d51f3",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# Get an OpenAI token: https://platform.openai.com/account/api-keys\n",
"\n",
"# import os\n",
"# from getpass import getpass\n",
"\n",
"# OPENAI_API_KEY = getpass()\n",
"\n",
"# os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"id": "8dbb6de2",
"metadata": {
"tags": []
},
"source": [
"# Using DocArrayHnswSearch"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b757afef-ef0a-465d-8e8a-9aadb9c32b88",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DocArrayHnswSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "605e200e-e711-486b-b36e-cbe5dd2512d7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"documents = TextLoader('../../../state_of_the_union.txt').load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"# n_dim must match the embedding dimensionality (1536 for OpenAI's text-embedding-ada-002)\n",
"db = DocArrayHnswSearch.from_documents(docs, embeddings, work_dir='hnswlib_store/', n_dim=1536)"
]
},
{
"cell_type": "markdown",
"id": "ed6f905b-4853-4a44-9730-614aa8e22b78",
"metadata": {},
"source": [
"## Similarity search"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4d7e742f-2002-449d-a10e-16046890906c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0da9e26f-1fc2-48e6-95a7-f692c853bbd3",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "3febb987-e903-416f-af26-6897d84c8d61",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "40764fdd-357d-475a-8152-5f1979d61a45",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a479fc46-b299-4330-89b9-e9b5a218ea03",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={}),\n",
" 0.36962226)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4d3d4e97-5d2b-4571-8ff9-e3f6b6778714",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"# Clean up: remove the on-disk index directory created by this notebook\n",
"shutil.rmtree('hnswlib_store')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
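
For readers who want to see what the `(document, score)` pairs in the "Similarity search with score" cells represent, here is a minimal sketch that needs no API key or extra packages. It is not the code added in this PR: it swaps HNSW's approximate search for a brute-force exact search, and it assumes cosine distance as the metric (the metric the store actually uses depends on its configuration). The names `cosine_distance`, `exact_search_with_score`, `toy_corpus`, and `toy_query`, and the 3-dimensional vectors, are hypothetical and exist only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; a smaller value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_search_with_score(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector, return the k closest."""
    scored = [(text, cosine_distance(query_vec, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1])  # ascending distance, closest first
    return scored[:k]


# Hypothetical toy data: (text, embedding) pairs with made-up 3-d vectors.
toy_corpus = [
    ("supreme court nomination", [0.9, 0.1, 0.0]),
    ("voting rights", [0.2, 0.9, 0.1]),
    ("infrastructure", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]

results = exact_search_with_score(toy_query, toy_corpus, k=2)
for text, score in results:
    print(f"{score:.4f}  {text}")
```

An exact search like this compares the query against every stored vector, which is fine for small collections. An HNSW index trades a little recall for much faster lookups on larger ones.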