community[minor]: Qdrant sparse vector retriever (#14814)
## Description

This PR adds support for Qdrant's new [sparse vector retrieval](https://qdrant.tech/articles/sparse-vectors/) by introducing a new retriever class, `QdrantSparseVectorRetriever`. Usage docs and integration tests for the retriever are included.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Showing 5 changed files with 636 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,255 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ce0f17b9", | ||
"metadata": {}, | ||
"source": [ | ||
"# Qdrant Sparse Vector Retriever\n", | ||
"\n", | ||
">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n", | ||
"\n", | ||
"\n", | ||
">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "c307b082", | ||
"metadata": {}, | ||
"source": [ | ||
"Install the 'qdrant_client' package:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "bba863a2-977c-4add-b5f4-bfc33a80eae5", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install qdrant_client" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "c10dd962", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"True" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from qdrant_client import QdrantClient, models\n", | ||
"\n", | ||
"client = QdrantClient(location=\":memory:\")\n", | ||
"collection_name = \"sparse_collection\"\n", | ||
"vector_name = \"sparse_vector\"\n", | ||
"\n", | ||
"client.create_collection(\n", | ||
"    collection_name,\n", | ||
"    vectors_config={},\n", | ||
"    sparse_vectors_config={\n", | ||
"        vector_name: models.SparseVectorParams(\n", | ||
"            index=models.SparseIndexParams(\n", | ||
"                on_disk=False,\n", | ||
"            )\n", | ||
"        )\n", | ||
"    },\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "f47a2bfe", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_community.retrievers import QdrantSparseVectorRetriever\n", | ||
"from langchain_core.documents import Document" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "41baa0d1", | ||
"metadata": {}, | ||
"source": [ | ||
"Create a demo encoder function:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "f2eff08e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import random\n", | ||
"\n", | ||
"\n", | ||
"# Placeholder encoder: ignores its input and returns a random (indices, values) sparse vector.\n", | ||
"def demo_encoder(_: str) -> tuple[list[int], list[float]]:\n", | ||
"    return (\n", | ||
"        sorted(random.sample(range(100), 100)),\n", | ||
"        [random.uniform(0.1, 1.0) for _ in range(100)],\n", | ||
"    )\n", | ||
"\n", | ||
"\n", | ||
"# Create a retriever with a demo encoder\n", | ||
"retriever = QdrantSparseVectorRetriever(\n", | ||
" client=client,\n", | ||
" collection_name=collection_name,\n", | ||
" sparse_vector_name=vector_name,\n", | ||
" sparse_encoder=demo_encoder,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "b68debff", | ||
"metadata": {}, | ||
"source": [ | ||
"Define some documents:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"id": "cd8a7b17", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"docs = [\n", | ||
"    Document(\n", | ||
"        metadata={\n", | ||
"            \"title\": \"Beyond Horizons: AI Chronicles\",\n", | ||
"            \"author\": \"Dr. Cassandra Mitchell\",\n", | ||
"        },\n", | ||
"        page_content=\"An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.\",\n", | ||
"    ),\n", | ||
"    Document(\n", | ||
"        metadata={\n", | ||
"            \"title\": \"Synergy Nexus: Merging Minds with Machines\",\n", | ||
"            \"author\": \"Prof. Benjamin S. Anderson\",\n", | ||
"        },\n", | ||
"        page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\",\n", | ||
"    ),\n", | ||
"    Document(\n", | ||
"        metadata={\n", | ||
"            \"title\": \"AI Dilemmas: Navigating the Unknown\",\n", | ||
"            \"author\": \"Dr. Elena Rodriguez\",\n", | ||
"        },\n", | ||
"        page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\",\n", | ||
"    ),\n", | ||
"    Document(\n", | ||
"        metadata={\n", | ||
"            \"title\": \"Sentient Threads: Weaving AI Consciousness\",\n", | ||
"            \"author\": \"Prof. Alexander J. Bennett\",\n", | ||
"        },\n", | ||
"        page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\",\n", | ||
"    ),\n", | ||
"    Document(\n", | ||
"        metadata={\n", | ||
"            \"title\": \"Silent Alchemy: Unseen AI Alleviations\",\n", | ||
"            \"author\": \"Dr. Emily Foster\",\n", | ||
"        },\n", | ||
"        page_content=\"Building upon her previous work, Dr. Foster unveils 'Silent Alchemy,' a profound examination of the covert presence of AI in our daily lives. This illuminating piece reveals the subtle yet impactful ways in which AI invisibly shapes our routines, emphasizing the need for heightened awareness in our technology-driven world.\",\n", | ||
"    ),\n", | ||
"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a5e673fa", | ||
"metadata": {}, | ||
"source": [ | ||
"Add the documents to the collection, then perform a retrieval:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"id": "3c5970db", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['1a3e0d292e6444d39451d0588ce746dc',\n", | ||
" '19b180dd31e749359d49967e5d5dcab7',\n", | ||
" '8de69e56086f47748e32c9e379e6865b',\n", | ||
" 'f528fac385954e46b89cf8607bf0ee5a',\n", | ||
" 'c1a6249d005d4abd9192b1d0b829cebe']" | ||
] | ||
}, | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"retriever.add_documents(docs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"id": "4fffd0af", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\", metadata={'title': 'Sentient Threads: Weaving AI Consciousness', 'author': 'Prof. Alexander J. Bennett'}),\n", | ||
" Document(page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\", metadata={'title': 'AI Dilemmas: Navigating the Unknown', 'author': 'Dr. Elena Rodriguez'}),\n", | ||
" Document(page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\", metadata={'title': 'Synergy Nexus: Merging Minds with Machines', 'author': 'Prof. Benjamin S. Anderson'}),\n", | ||
" Document(page_content='An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.', metadata={'title': 'Beyond Horizons: AI Chronicles', 'author': 'Dr. Cassandra Mitchell'})]" | ||
] | ||
}, | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"retriever.get_relevant_documents(\n", | ||
"    \"Life and ethical dilemmas of AI\",\n", | ||
")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
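
Note on the encoder: `demo_encoder` in the notebook ignores its input and returns random weights, so the ranking in the sample output is arbitrary. As a rough illustration of what a text-dependent `sparse_encoder` could look like, the sketch below hashes tokens into a fixed number of buckets and uses term counts as weights. The names `hashed_tf_encoder` and `sparse_dot` and the `dim` parameter are hypothetical and not part of this PR. The scoring helper assumes a plain dot product over shared indices and is only there to make the example checkable without a Qdrant instance.

```python
import hashlib
import re
from collections import Counter


def hashed_tf_encoder(text: str, dim: int = 1000) -> tuple[list[int], list[float]]:
    """Hypothetical encoder: hash each token into [0, dim), weight by term count."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    buckets = Counter()
    for token in tokens:
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        buckets[int(digest, 16) % dim] += 1
    indices = sorted(buckets)  # unique, ascending bucket ids
    values = [float(buckets[i]) for i in indices]
    return indices, values


def sparse_dot(a: tuple[list[int], list[float]], b: tuple[list[int], list[float]]) -> float:
    """Assumed scoring: dot product over the indices the two vectors share."""
    b_lookup = dict(zip(*b))
    return sum((v * b_lookup.get(i, 0.0) for i, v in zip(*a)), 0.0)


query = hashed_tf_encoder("ethical dilemmas of AI")
doc = hashed_tf_encoder("AI Dilemmas: Navigating the Unknown")
# The score is nonzero because "ai" and "dilemmas" occur in both texts.
print(sparse_dot(query, doc))
```

Assuming the retriever accepts any callable with the `(str) -> (indices, values)` shape shown in the notebook, this function could be passed as `sparse_encoder=hashed_tf_encoder` in place of `demo_encoder`. It is a toy: raw term counts are a weak relevance signal, and distinct tokens can collide in the same bucket.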