# Top K
* Decide how many embeddings to retrieve when building the answer to your question.

## Create your .env file
* The GitHub repository includes a file named .env.example.
* Rename that file to .env and add your secret API keys to it. Remember to include:
* OPENAI_API_KEY=your_openai_api_key
* LANGCHAIN_TRACING_V2=true
* LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
* LANGCHAIN_API_KEY=your_langchain_api_key
* LANGCHAIN_PROJECT=your_project_name

We will name our LangSmith project **006-top-k**.

## Connect to the .env file located in the same directory as this notebook

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
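Because `os.environ["OPENAI_API_KEY"]` raises a `KeyError` when the variable is missing, it can help to check the .env contents before going further. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and the list of required names is simply copied from the .env list above.

```python
import os

# Variable names copied from the .env list above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required names that are absent or empty in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from the environment:", ", ".join(missing))
else:
    print("All expected variables are set.")
```

Run it after `load_dotenv(find_dotenv())` so that the values from the .env file are already in the environment.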

## Vector databases (also known as vector stores): store and search embeddings
* See the [documentation page](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/).
* See the [list of vector stores](https://python.langchain.com/v0.1/docs/integrations/vectorstores/).

In [6]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
loaded_document = TextLoader('./data/state_of_the_union.txt').load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)

vector_db = Chroma.from_documents(chunks_of_text, OpenAIEmbeddings())

In [7]:
question = "What did the president say about the John Lewis Voting Rights Act?"

response = vector_db.similarity_search(question)

print(response[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


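## Retrievers: return a response given a question
* A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store.
* A retriever does not need to be able to store documents, only to return (or retrieve) them.
* Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
* See the [documentation page](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/).
* See the [list of third-party retrievers](https://python.langchain.com/v0.1/docs/integrations/retrievers/).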
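To make the "query in, documents out" idea concrete, here is a toy sketch of a retriever that is not backed by a vector store at all. It is not LangChain code: `Doc` and `KeywordRetriever` are hypothetical names invented for this illustration, and the ranking is a naive word-overlap count rather than embedding similarity.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs, k=4):
        self.docs = docs
        self.k = k  # maximum number of documents to return

    def invoke(self, query):
        words = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; list.sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


docs = [
    Doc("The Senate should pass the voting rights act"),
    Doc("Justice Breyer is retiring from the Supreme Court"),
    Doc("Inflation and the economy"),
]
toy_retriever = KeywordRetriever(docs, k=1)
results = toy_retriever.invoke("voting rights")
print(results[0].page_content)
```

The class only needs to hold a way of finding documents and expose an `invoke`-style method; how the documents are stored or scored is an implementation detail.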

#### Vector store as a retriever

In [8]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./data/state_of_the_union.txt")

In [10]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loaded_document = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)

embeddings = OpenAIEmbeddings()

vector_db = FAISS.from_documents(chunks_of_text, embeddings)

In [11]:
retriever = vector_db.as_retriever()

#### Simple use without LCEL

In [12]:
response = retriever.invoke("what did he say about ketanji brown jackson?")

In [13]:
len(response)

4

In [14]:
response[0]

Document(metadata={'source': './data/state_of_the_union.txt'}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')

### Specify top k

In [15]:
retriever = vector_db.as_retriever(search_kwargs={"k": 1})

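Here is a brief explanation of what is happening:

- **`search_kwargs={"k": 1}`**: This argument is set when the retriever is created from the vector database. Here, `k` is the number of results (documents) most similar to the query that should be retrieved. Setting `k` to 1 tells the retriever to return only the single document or text chunk most relevant to the query.

So, when you use this parameter:
- **Purpose**: It limits the number of retrieved documents to the top 1 most relevant result.
- **Usage**: It is particularly useful when you are interested only in the best match for the query, not in other closely related matches.

In this example, when the retriever is invoked with the query about Ketanji Brown Jackson, it returns only the single most relevant document from the database, rather than the four documents returned by the earlier call that did not set `k`.

This approach is useful for applications where clarity and precision matter more than breadth, for example when you need a concise answer to a specific question without extra context that may be only loosely relevant.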
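The effect of `k` can be sketched without any vector store: score every chunk against the query, sort by score, and keep the first `k`. The example below is only an illustration under stated assumptions. The 3-dimensional vectors are made up, `top_k` is a hypothetical helper, and cosine similarity is chosen for readability; the metric a real vector store uses depends on how it is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, vec) for vec in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Made-up "embeddings" for four chunks (real embeddings have far more dimensions).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.3, 0.1]

print(top_k(query_vec, chunk_vecs, k=1))  # only the best match
print(top_k(query_vec, chunk_vecs, k=4))  # all four chunks, ranked
```

A larger `k` never changes which chunk comes first; it only appends lower-ranked chunks, which is why a small `k` favors precision and a large `k` favors breadth.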

In [16]:
response = retriever.invoke("what did he say about ketanji brown jackson?")

In [17]:
len(response)

1

In [18]:
response

[Document(metadata={'source': './data/state_of_the_union.txt'}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')]