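LangChain focuses primarily on building indexes with the goal of using them as a retriever. To understand what that means, it helps to look at the base retriever interface. LangChain's BaseRetriever class is as follows: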

In [1]:
from abc import ABC, abstractmethod
from typing import List
from langchain.schema import Document

class BaseRetriever(ABC):
    @abstractmethod
    def get_relevant_documents(self, query: str) -> List[Document]:
        """Get texts relevant for a query.

        Args:
            query: string to find relevant texts for

        Returns:
            List of relevant documents
        """
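To make the interface concrete, here is a minimal sketch of a class that satisfies it. It assumes a stand-in `Document` dataclass so the example runs without LangChain installed, and `KeywordRetriever` with its word-overlap scoring is a hypothetical illustration, not something the library provides.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Stand-in for langchain.schema.Document (assumption: only these fields are needed)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseRetriever(ABC):
    @abstractmethod
    def get_relevant_documents(self, query: str) -> List[Document]:
        """Get texts relevant for a query."""


class KeywordRetriever(BaseRetriever):
    """Hypothetical retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs: List[Document], k: int = 2):
        self.docs = docs
        self.k = k

    def get_relevant_documents(self, query: str) -> List[Document]:
        query_words = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


docs = [
    Document("The president nominated a new judge"),
    Document("The economy added jobs last year"),
    Document("The judge is a former public defender"),
]
retriever = KeywordRetriever(docs, k=2)
results = retriever.get_relevant_documents("judge nominated")
for doc in results:
    print(doc.page_content)
```

Any object with a `get_relevant_documents` method of this shape can play the retriever role; a vector store is only one way to implement it.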

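Answering questions over documents involves four steps:
    Create an index
    Create a retriever from that index
    Create a question answering chain
    Ask questions!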

In [2]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt', encoding='utf8')

Create the index in one line with VectorstoreIndexCreator:

In [4]:
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])

Now that the index is created, we can use it to ask questions of the data.

In [5]:
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

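After the documents are loaded, VectorstoreIndexCreator performs three main steps:
Splitting the documents into chunks
Creating embeddings for each document
Storing the documents and embeddings in a vector store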

First, we split the documents into chunks.

In [7]:
documents = loader.load()


from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
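As a rough illustration of what `chunk_size` and `chunk_overlap` control, the sketch below cuts a string into fixed-size character windows. It is a simplification under stated assumptions: `split_by_characters` is a hypothetical helper, and the library's splitter also takes a separator into account, which this version ignores.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
no_overlap = split_by_characters(sample, chunk_size=4)
with_overlap = split_by_characters(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks. A positive overlap repeats the tail of each chunk at the head of the next, which can help when a relevant sentence straddles a boundary.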


Next, we select which embeddings to use.

In [10]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

Now we create the vector store to use as the index.

In [11]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(texts, embeddings)
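The idea behind a vector store can be sketched in a few lines: keep each text next to its embedding, then rank the stored texts by similarity to the embedded query. Everything below is an assumption made for illustration: `ToyVectorStore` is hypothetical, and the word-count `embed` function stands in for a real embedding model such as the one behind OpenAIEmbeddings.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: word counts as a sparse vector (not a real embedding model)."""
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store holding (text, embedding) pairs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Dict[str, float]]] = []

    @classmethod
    def from_texts(cls, texts: List[str]) -> "ToyVectorStore":
        store = cls()
        for text in texts:
            store._entries.append((text, embed(text)))
        return store

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        query_vector = embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore.from_texts([
    "she is a former federal public defender",
    "the economy added jobs last year",
    "inflation is a concern for families",
])
top = store.similarity_search("public defender", k=1)
print(top)
```

A production vector store swaps the linear scan for an approximate nearest-neighbor index and the word counts for dense model embeddings, but the store-then-rank shape stays the same.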

That completes the index. We then expose it through a retriever interface.
Then, as before, we create a chain and use it to answer questions!

In [12]:
retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)
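The "stuff" chain type places every retrieved chunk into a single prompt and makes one call to the language model. The sketch below shows only that assembly step. `build_stuff_prompt` and its prompt wording are hypothetical and are not the template the library uses.

```python
from typing import List


def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate all retrieved chunks into one prompt (illustrative wording only)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "She is a former federal public defender.",
    "She has received a broad range of support.",
]
prompt = build_stuff_prompt("What did the president say about the nominee?", retrieved)
print(prompt)
```

Because all chunks share one prompt, this approach only works while the combined text fits in the model's context window.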

In [13]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence. He also said that she is a former top litigator, a former federal public defender, and from a family of public school educators and police officers. He said she is a consensus builder and has received a broad range of support, from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."