# Retrievers
docs: https://python.langchain.com/docs/modules/data_connection/retrievers/

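> A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.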
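To make the point that a retriever only has to return documents for a query, here is a minimal sketch in plain Python with no vector store involved. `Doc` and `KeywordRetriever` are hypothetical names invented for this illustration; only the method name `get_relevant_documents` mirrors LangChain's retriever interface.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k

    def get_relevant_documents(self, query):
        query_words = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


docs = [
    Doc("the president nominated a judge", {"source": "a.txt"}),
    Doc("the economy added jobs", {"source": "b.txt"}),
    Doc("weather was sunny", {"source": "c.txt"}),
]
retriever = KeywordRetriever(docs, k=1)
top = retriever.get_relevant_documents("who did the president nominate")
print(top[0].metadata["source"])
```

The class never stores embeddings; it only answers queries, which is all the interface asks for.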

More precisely, these notes cover the vectorstore retriever.

**By default, LangChain uses Chroma as the vectorstore to index and search embeddings.**

Q&A over documents takes four steps:
1. Create an index.
2. Create a Retriever from that index.
3. Create a question answering chain.
4. Ask questions.

**The `VectorstoreIndexCreator` wrapper is the quickest and simplest way to build Q&A over documents.**


In [1]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader

loader = TextLoader('../../example/docs/state_of_the_union.txt', encoding='utf8')

In [3]:
# Create the index
from langchain.indexes import VectorstoreIndexCreator

index = VectorstoreIndexCreator().from_loaders([loader])

# Query the index
index.query('What did the president say about Ketanji Brown Jackson')

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

In [4]:
index.query_with_sources('What did the president say about Ketanji Brown Jackson')

{'question': 'What did the president say about Ketanji Brown Jackson',
 'answer': " The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, one of the nation's top legal minds, to continue Justice Breyer's legacy of excellence. He also said that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n",
 'sources': '../../example/docs/state_of_the_union.txt'}
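The shape of the result above (question, answer, sources) can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `retrieved` and `llm` are hypothetical inputs, and the real chain may derive the `sources` string differently.

```python
def query_with_sources(question, retrieved, llm):
    """Answer a question and report which sources the context came from.

    `retrieved` is a list of (text, source) pairs and `llm` is any callable
    mapping a prompt string to an answer string. Both are hypothetical.
    """
    context = "\n\n".join(text for text, _ in retrieved)
    answer = llm(f"{context}\n\nQuestion: {question}\nAnswer:")
    # Preserve first-seen order while dropping duplicate sources.
    sources = list(dict.fromkeys(source for _, source in retrieved))
    return {"question": question, "answer": answer, "sources": ", ".join(sources)}


def fake_llm(prompt):
    """Stand-in model so the sketch runs without an API key."""
    return "stub answer"


retrieved = [
    ("First chunk.", "speech.txt"),
    ("Second chunk.", "speech.txt"),
    ("Third chunk.", "notes.txt"),
]
result = query_with_sources("What was said?", retrieved, fake_llm)
print(result)
```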

In [5]:
index.vectorstore

<langchain.vectorstores.chroma.Chroma at 0x10993fd90>

In [6]:
# To access the underlying VectorStoreRetriever, call as_retriever() on the vectorstore:
index.vectorstore.as_retriever()

VectorStoreRetriever(vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x10993fd90>, search_type='similarity', search_kwargs={})
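The `search_type='similarity'` in the repr above means the retriever ranks stored vectors by closeness to the query vector. The following is a minimal sketch of that idea using cosine similarity over an in-memory list. The function names, the `stored` structure, and the default `k` are assumptions made for illustration, not Chroma's or LangChain's internals. In LangChain, a value such as `k` can typically be passed through `search_kwargs`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vec, stored, k=4):
    """Return the k stored texts whose vectors are closest to query_vec.

    `stored` is a list of (text, vector) pairs (a hypothetical in-memory store).
    """
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


stored = [
    ("about courts", [1.0, 0.0, 0.0]),
    ("about jobs", [0.0, 1.0, 0.0]),
    ("courts and jobs", [1.0, 1.0, 0.0]),
]
results = similarity_search([0.9, 0.1, 0.0], stored, k=2)
print(results)
```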

## The magic behind VectorstoreIndexCreator
A lot of the magic is hidden in this `VectorstoreIndexCreator`. What is it doing?

1. Splitting documents into chunks
2. Creating embeddings for each document
3. Storing documents and embeddings in a vectorstore
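The three steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: the splitter cuts on blank lines, the "embedding" is just a word-count vector rather than a real model, and `InMemoryVectorStore` is a hypothetical class, not Chroma.

```python
from collections import Counter


def split_into_chunks(text):
    """Step 1: split on blank lines (a deliberately simple splitter)."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def embed(text):
    """Step 2: toy 'embedding' as a word-count vector (not a real model)."""
    return Counter(text.lower().split())


class InMemoryVectorStore:
    """Step 3: keep each chunk next to its embedding."""

    def __init__(self):
        self.entries = []  # list of (chunk, embedding) pairs

    def add_texts(self, chunks):
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        query_vec = embed(query)

        def score(entry):
            # Count of shared word occurrences between the query and the chunk.
            return sum((query_vec & entry[1]).values())

        ranked = sorted(self.entries, key=score, reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


text = "The court has nine justices.\n\nInflation fell last year."
store = InMemoryVectorStore()
store.add_texts(split_into_chunks(text))
hits = store.search("how many justices sit on the court")
print(hits)
```

A real pipeline replaces `embed` with an embedding model and the list with a vectorstore, but the order of operations stays the same.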


These steps create the vector store. On top of it, what `index = VectorstoreIndexCreator().from_loaders([loader])` gives you for querying amounts to:

```python
# vector_store is the vectorstore produced by the three steps above
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type='stuff', retriever=vector_store.as_retriever())
```

`index.query('xx')` is then roughly the same as `qa.run('xx')`.
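The `chain_type='stuff'` setting means all retrieved documents are concatenated ("stuffed") into a single prompt before the model is called. Here is a minimal sketch of that flow. The prompt wording, `build_stuff_prompt`, `answer`, and the two stand-in functions are assumptions for illustration and are not LangChain's actual template or API.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into one prompt ("stuffing")."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, retrieve, llm):
    """Retrieve documents, stuff them into a prompt, and call the model.

    `retrieve` and `llm` are hypothetical callables supplied by the caller.
    """
    documents = retrieve(question)
    return llm(build_stuff_prompt(question, documents))


# Stand-ins so the sketch runs without a model or a vectorstore.
def fake_retrieve(question):
    return ["Chunk one.", "Chunk two."]


def fake_llm(prompt):
    return f"(model saw {len(prompt)} characters)"


prompt = build_stuff_prompt("What was said?", fake_retrieve("What was said?"))
print(prompt)
print(answer("What was said?", fake_retrieve, fake_llm))
```

Stuffing is the simplest strategy, but the combined prompt must fit in the model's context window, which is one reason documents are split into chunks first.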


> `VectorstoreIndexCreator` is just a wrapper around all this logic. It is configurable in the text splitter it uses, the embeddings it uses, and the vectorstore it uses. For example, you can configure it as below:

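```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Chroma,
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
)
```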
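To see what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window splitter. It is only a sketch of the two parameters: `split_text` is a hypothetical function that slices by raw character position, whereas `CharacterTextSplitter` splits on a separator and then merges pieces, so its real output will differ.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Slice text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


no_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap=0`, as in the configuration above, chunks do not share any text; a positive overlap repeats the tail of one chunk at the head of the next so that sentences cut at a boundary still appear whole in at least one chunk.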

### In summary, the VectorstoreIndexCreator wrapper is the quickest and simplest way to build Q&A over documents

In [7]:
# End to end: load the file, build the index, and ask a question
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator

loader = TextLoader('../../example/docs/state_of_the_union.txt', encoding='utf8')
index = VectorstoreIndexCreator().from_loaders([loader])
index.query('What did the president say about Ketanji Brown Jackson?')

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."