# Qdrant

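>[Qdrant](https://qdrant.tech/documentation/) (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points, that is, vectors with an additional payload. `Qdrant` is tailored to extended filtering support, which makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications.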


This notebook shows how to use functionality related to the `Qdrant` vector database.

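There are several ways to run `Qdrant`, and the chosen mode introduces some subtle differences. The options include:
- Local mode, no server required
- On-premise server deployment
- Qdrant Cloud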

See the [installation instructions](https://qdrant.tech/documentation/install/).

In [None]:
!pip install qdrant-client

We want to use `OpenAIEmbeddings`, so we have to get an OpenAI API key.

In [2]:
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

OpenAI API Key: ········


In [3]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Qdrant
from langchain.document_loaders import TextLoader

In [4]:
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

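## Connecting to Qdrant from LangChain

### Local mode

The Python client lets you run the same code in local mode without running a Qdrant server. This is useful for testing and debugging, or when you plan to store only a small number of vectors. The embeddings can be kept entirely in memory or persisted on disk.

#### In-memory

For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it is lost when the client is destroyed, usually at the end of your script or notebook.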

In [5]:
qdrant = Qdrant.from_documents(
    docs, embeddings, 
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="my_documents",
)

#### On-disk storage

Local mode, without a Qdrant server, can also store your vectors on disk so that they persist between runs.

In [6]:
qdrant = Qdrant.from_documents(
    docs, embeddings, 
    path="/tmp/local_qdrant",
    collection_name="my_documents",
)

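### On-premise server deployment

Whether you launch Qdrant locally with [a Docker container](https://qdrant.tech/documentation/install/) or choose a Kubernetes deployment with [the official Helm chart](https://github.com/qdrant/qdrant-helm), you connect to the instance in the same way. You need to provide a URL pointing to the service.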

In [5]:
url = "<---qdrant url here --->"
qdrant = Qdrant.from_documents(
    docs, embeddings, 
    url=url, prefer_grpc=True,  # connection options are passed as keyword arguments
    collection_name="my_documents",
)

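### Qdrant Cloud

If you prefer not to manage the infrastructure yourself, you can set up a fully managed Qdrant cluster on [Qdrant Cloud](https://cloud.qdrant.io/). A free-forever 1GB cluster is included for trying it out. The main difference when using the managed version of Qdrant is that you need to provide an API key, which protects your deployment from being accessed publicly.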

In [6]:
url = "<---qdrant cloud cluster url here --->"
api_key = "<---api key here--->"
qdrant = Qdrant.from_documents(
    docs, embeddings, 
    url=url, prefer_grpc=True, api_key=api_key,  # connection options are passed as keyword arguments
    collection_name="my_documents",
)

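## Reusing the same collection

Both the `Qdrant.from_texts` and `Qdrant.from_documents` methods are convenient for getting started with Qdrant in LangChain, but they destroy the collection and recreate it from scratch! If you want to reuse an existing collection, you can always create a `Qdrant` instance yourself and pass it a `QdrantClient` instance that holds the connection details.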

In [7]:
del qdrant

In [8]:
import qdrant_client

client = qdrant_client.QdrantClient(
    path="/tmp/local_qdrant", prefer_grpc=True
)
qdrant = Qdrant(
    client=client, collection_name="my_documents", 
    embeddings=embeddings
)

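## Similarity search

The simplest scenario for using the Qdrant vector store is to perform a similarity search. Under the hood, the query is encoded with the `embedding_function` and used to find similar documents in the Qdrant collection.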

In [7]:
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search(query)

In [8]:
print(found_docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


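## Similarity search with score

Sometimes we want to perform the search and also obtain a relevance score that tells us how good a particular result is. The returned score reflects the distance metric of the collection. With the cosine metric, which LangChain uses for Qdrant by default, Qdrant reports cosine similarity, so a higher score means a closer match.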

In [11]:
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search_with_score(query)

In [12]:
document, score = found_docs[0]
print(document.page_content)
print(f"\nScore: {score}")

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Score: 0.8153784913324512
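
To make the score above more concrete, here is a minimal pure-Python sketch of cosine similarity and its complement, cosine distance (`1 - similarity`). The toy vectors and the function names are assumptions made for this illustration; real embeddings have many more dimensions, and Qdrant computes the score internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Complement of cosine similarity: 0 for identical directions."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical two-dimensional "embeddings" for illustration only.
query_vec = [1.0, 0.0]
close_vec = [2.0, 0.0]  # same direction as the query, different length
far_vec = [0.0, 3.0]    # orthogonal to the query

print(cosine_similarity(query_vec, close_vec), cosine_distance(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec), cosine_distance(query_vec, far_vec))
```

Vector length does not affect the result, only direction does: a similarity near 1 (distance near 0) indicates a close match, while a similarity near 0 indicates unrelated vectors.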


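### Metadata filtering

Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. Filters can also be used in LangChain by passing an additional parameter to both the `similarity_search_with_score` and `similarity_search` methods.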

```python
from qdrant_client.http import models as rest

query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))
```
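
As a conceptual illustration of what such a filter expresses, the sketch below evaluates a single `must` condition with an exact match against payloads shaped like LangChain's default payload. The dictionary mirrors the JSON form of a Qdrant filter as we understand it. The helper names, the sample payloads, and the `metadata.source` key are assumptions for this example, and the code is not how Qdrant evaluates filters internally.

```python
def get_nested(payload, dotted_key):
    """Follow a dot-separated key such as 'metadata.source' through nested dicts."""
    value = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(payload, payload_filter):
    """Return True when every `must` condition matches the payload exactly."""
    return all(
        get_nested(payload, cond["key"]) == cond["match"]["value"]
        for cond in payload_filter.get("must", [])
    )


# Hypothetical filter: keep only points whose metadata.source equals "a.txt".
payload_filter = {
    "must": [{"key": "metadata.source", "match": {"value": "a.txt"}}]
}

# Hypothetical payloads in the default page_content/metadata layout.
payloads = [
    {"page_content": "first chunk", "metadata": {"source": "a.txt"}},
    {"page_content": "second chunk", "metadata": {"source": "b.txt"}},
]

kept = [p["page_content"] for p in payloads if matches(p, payload_filter)]
print(kept)
```

With `qdrant_client`, the same kind of condition is typically built from `rest.Filter`, `rest.FieldCondition`, and `rest.MatchValue`, and the search is then restricted to points whose payload satisfies it.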

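## Maximum marginal relevance search (MMR)

If you want to look up similar documents but also receive diverse results, MMR is the method to consider. Maximal marginal relevance optimizes for similarity to the query and for diversity among the selected documents.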

In [13]:
query = "What did the president say about Ketanji Brown Jackson"
found_docs = qdrant.max_marginal_relevance_search(query, k=2, fetch_k=10)

In [14]:
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")

1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 

2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. 

I recently visited the New York City Police Department days after the fu
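
As a rough illustration of the trade-off MMR makes, the following pure-Python sketch greedily picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected. The function name `mmr`, the toy vectors, and the default `lambda_mult=0.5` are assumptions for this example; it is not LangChain's or Qdrant's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return the indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidate_vecs)))

    def score(i):
        relevance = cosine_similarity(query_vec, candidate_vecs[i])
        # Highest similarity to an already selected candidate (0 if none yet).
        redundancy = max(
            (cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy vectors: candidates 0 and 1 are near-duplicates.
query_vec = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.8, -0.6]]

print(mmr(query_vec, candidates, k=2))                   # balances relevance and diversity
print(mmr(query_vec, candidates, k=2, lambda_mult=1.0))  # relevance only
```

In this toy setup the balanced run skips the near-duplicate candidate in favour of a less similar but more distinct one, while `lambda_mult=1.0` reduces to a plain top-k similarity ranking.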

## Qdrant as a Retriever

Qdrant, like all the other vector stores, is a LangChain Retriever that uses cosine similarity.

In [15]:
retriever = qdrant.as_retriever()
retriever

VectorStoreRetriever(vectorstore=<langchain.vectorstores.qdrant.Qdrant object at 0x7fc4e5720a00>, search_type='similarity', search_kwargs={})

You can also specify MMR as the search strategy instead of similarity.

In [16]:
retriever = qdrant.as_retriever(search_type="mmr")
retriever

VectorStoreRetriever(vectorstore=<langchain.vectorstores.qdrant.Qdrant object at 0x7fc4e5720a00>, search_type='mmr', search_kwargs={})

In [17]:
query = "What did the president say about Ketanji Brown Jackson"
retriever.get_relevant_documents(query)[0]

Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})

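## Customizing Qdrant

Qdrant stores your vector embeddings along with an optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from documents, we keep the context data so that you can extract the original texts as well.

By default, your document is stored in the following payload structure: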

```json
{
    "page_content": "Lorem ipsum dolor sit amet",
    "metadata": {
        "foo": "bar"
    }
}
```

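However, you can choose different keys for storing the page content and the metadata, which is useful if you already have a collection that you want to reuse.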

In [19]:
Qdrant.from_documents(
    docs, embeddings, 
    location=":memory:",
    collection_name="my_documents_2",
    content_payload_key="my_page_content_key",
    metadata_payload_key="my_meta",
)

<langchain.vectorstores.qdrant.Qdrant at 0x7fc4e2baa230>
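
To make the effect of `content_payload_key` and `metadata_payload_key` concrete, here is a minimal sketch of the payload a document would end up with under the default and the custom keys. `build_payload` is a hypothetical helper written for this illustration, not a LangChain function, and the sample text and metadata are made up.

```python
def build_payload(
    page_content,
    metadata,
    content_payload_key="page_content",
    metadata_payload_key="metadata",
):
    """Arrange a document's text and metadata under the chosen payload keys."""
    return {
        content_payload_key: page_content,
        metadata_payload_key: metadata,
    }


# Default layout, matching the JSON structure shown earlier.
default_payload = build_payload("Lorem ipsum dolor sit amet", {"foo": "bar"})

# Custom layout, using the key names from the cell above.
custom_payload = build_payload(
    "Lorem ipsum dolor sit amet",
    {"foo": "bar"},
    content_payload_key="my_page_content_key",
    metadata_payload_key="my_meta",
)

print(default_payload)
print(custom_payload)
```

When reading from an existing collection, the keys you configure need to match the ones that were used when the points were written, otherwise the text and metadata cannot be located in the payload.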