Skip to content

Commit

Permalink
community[minor]: Qdrant sparse vector retriever (#14814)
Browse files Browse the repository at this point in the history
## Description

This PR intends to add support for Qdrant's new [sparse vector
retrieval](https://qdrant.tech/articles/sparse-vectors/) by introducing
a new retriever class, `QdrantSparseVectorRetriever`.

Necessary usage docs and integration tests have been added for the
retriever.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
  • Loading branch information
Anush008 and baskaryan committed Dec 20, 2023
1 parent c53fab6 commit 60c70ef
Show file tree
Hide file tree
Showing 5 changed files with 636 additions and 0 deletions.
255 changes: 255 additions & 0 deletions docs/docs/integrations/retrievers/qdrant-sparse.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ce0f17b9",
"metadata": {},
"source": [
"# Qdrant Sparse Vector Retriever\n",
"\n",
">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n",
"\n",
"\n",
">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c307b082",
"metadata": {},
"source": [
"Install the 'qdrant_client' package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bba863a2-977c-4add-b5f4-bfc33a80eae5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install qdrant_client"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c10dd962",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from qdrant_client import QdrantClient, models\n",
"\n",
"client = QdrantClient(location=\":memory:\")\n",
"collection_name = \"sparse_collection\"\n",
"vector_name = \"sparse_vector\"\n",
"\n",
"client.create_collection(\n",
" collection_name,\n",
" vectors_config={},\n",
" sparse_vectors_config={\n",
" vector_name: models.SparseVectorParams(\n",
" index=models.SparseIndexParams(\n",
" on_disk=False,\n",
" )\n",
" )\n",
" },\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f47a2bfe",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import QdrantSparseVectorRetriever\n",
"from langchain_core.documents import Document"
]
},
{
"cell_type": "markdown",
"id": "41baa0d1",
"metadata": {},
"source": [
"Create a demo encoder function:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f2eff08e",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"\n",
"def demo_encoder(_: str) -> tuple[list[int], list[float]]:\n",
" return (\n",
" sorted(random.sample(range(100), 100)),\n",
" [random.uniform(0.1, 1.0) for _ in range(100)],\n",
" )\n",
"\n",
"\n",
"# Create a retriever with a demo encoder\n",
"retriever = QdrantSparseVectorRetriever(\n",
" client=client,\n",
" collection_name=collection_name,\n",
" sparse_vector_name=vector_name,\n",
" sparse_encoder=demo_encoder,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b68debff",
"metadata": {},
"source": [
"Add some documents:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cd8a7b17",
"metadata": {},
"outputs": [],
"source": [
"docs = [\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Beyond Horizons: AI Chronicles\",\n",
" \"author\": \"Dr. Cassandra Mitchell\",\n",
" },\n",
" page_content=\"An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Synergy Nexus: Merging Minds with Machines\",\n",
" \"author\": \"Prof. Benjamin S. Anderson\",\n",
" },\n",
" page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"AI Dilemmas: Navigating the Unknown\",\n",
" \"author\": \"Dr. Elena Rodriguez\",\n",
" },\n",
" page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Sentient Threads: Weaving AI Consciousness\",\n",
" \"author\": \"Prof. Alexander J. Bennett\",\n",
" },\n",
" page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\",\n",
" ),\n",
" Document(\n",
" metadata={\n",
" \"title\": \"Silent Alchemy: Unseen AI Alleviations\",\n",
" \"author\": \"Dr. Emily Foster\",\n",
" },\n",
" page_content=\"Building upon her previous work, Dr. Foster unveils 'Silent Alchemy,' a profound examination of the covert presence of AI in our daily lives. This illuminating piece reveals the subtle yet impactful ways in which AI invisibly shapes our routines, emphasizing the need for heightened awareness in our technology-driven world.\",\n",
" ),\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "a5e673fa",
"metadata": {},
"source": [
"Perform a retrieval:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3c5970db",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['1a3e0d292e6444d39451d0588ce746dc',\n",
" '19b180dd31e749359d49967e5d5dcab7',\n",
" '8de69e56086f47748e32c9e379e6865b',\n",
" 'f528fac385954e46b89cf8607bf0ee5a',\n",
" 'c1a6249d005d4abd9192b1d0b829cebe']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4fffd0af",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"In 'Sentient Threads,' Professor Bennett unravels the enigma of AI consciousness, presenting a tapestry of arguments that scrutinize the very essence of machine sentience. The book ignites contemplation on the ethical and philosophical dimensions surrounding the quest for true AI awareness.\", metadata={'title': 'Sentient Threads: Weaving AI Consciousness', 'author': 'Prof. Alexander J. Bennett'}),\n",
" Document(page_content=\"Dr. Rodriguez pens an intriguing narrative in 'AI Dilemmas,' probing the uncharted territories of ethical quandaries arising from AI advancements. The book serves as a compass, guiding readers through the complex terrain of moral decisions confronting developers, policymakers, and society as AI evolves.\", metadata={'title': 'AI Dilemmas: Navigating the Unknown', 'author': 'Dr. Elena Rodriguez'}),\n",
" Document(page_content=\"Professor Anderson delves into the synergistic possibilities of human-machine collaboration in 'Synergy Nexus.' The book articulates a vision where humans and AI seamlessly coalesce, creating new dimensions of productivity, creativity, and shared intelligence.\", metadata={'title': 'Synergy Nexus: Merging Minds with Machines', 'author': 'Prof. Benjamin S. Anderson'}),\n",
" Document(page_content='An in-depth exploration of the fascinating journey of artificial intelligence, narrated by Dr. Mitchell. This captivating account spans the historical roots, current advancements, and speculative futures of AI, offering a gripping narrative that intertwines technology, ethics, and societal implications.', metadata={'title': 'Beyond Horizons: AI Chronicles', 'author': 'Dr. Cassandra Mitchell'})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\n",
" \"Life and ethical dilemmas of AI\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
4 changes: 4 additions & 0 deletions libs/community/langchain_community/retrievers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@
PineconeHybridSearchRetriever,
)
from langchain_community.retrievers.pubmed import PubMedRetriever
from langchain_community.retrievers.qdrant_sparse_vector_retriever import (
QdrantSparseVectorRetriever,
)
from langchain_community.retrievers.remote_retriever import RemoteLangChainRetriever
from langchain_community.retrievers.svm import SVMRetriever
from langchain_community.retrievers.tavily_search_api import TavilySearchAPIRetriever
Expand Down Expand Up @@ -93,6 +96,7 @@
"OutlineRetriever",
"PineconeHybridSearchRetriever",
"PubMedRetriever",
"QdrantSparseVectorRetriever",
"RemoteLangChainRetriever",
"SVMRetriever",
"TavilySearchAPIRetriever",
Expand Down

0 comments on commit 60c70ef

Please sign in to comment.