# 4 Ways to Do Question Answering in LangChain

## Load documents

In [1]:
# load document
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("data/example.pdf") # a 55-page document
documents = loader.load()

## Method 1: load_qa_chain

In [3]:
from langchain.chat_models import ChatOpenAI # using gpt-3.5 turbo
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm=ChatOpenAI(), chain_type="map_reduce")
query = "How many AI publications?"
chain.run(input_documents=documents, question=query)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 77515e6e1a1fcadaf0d0460f5fea1b5a in your message.).


'The total number of AI publications is not explicitly provided in the given portion of the document. However, it is mentioned that the number of AI publications has more than doubled since 2010 and ranges from thousands to almost 500,000 in 2021, depending on the type of publication and collaboration. Some charts and figures provide specific numbers for certain types of publications, but there is no one definitive answer to the question of how many AI publications.'

### If there is a set of documents

In [None]:
### For multiple documents 
loaders = [....]
documents = []
for loader in loaders:
    documents.extend(loader.load())

### What if the document is so long that it exceeds the token limit?

Solution 1: Chain Type

The default `chain_type="stuff"` puts ALL of the text from the documents into a single prompt. It does not work with our example because the prompt exceeds the token limit and causes rate-limiting errors. That is why we used a different chain type, `"map_reduce"`, in this example. What are the other chain types?

- `map_reduce`: It separates the text into batches (for example, you can define the batch size with `llm=OpenAI(batch_size=5)`), feeds each batch with the question to the LLM separately, and produces the final answer from the answers to each batch. Because the batches are sent concurrently, this may trigger a `RateLimitError`.
- `refine`: It separates the text into batches, feeds the first batch to the LLM, and then feeds the answer together with the second batch to the LLM. It refines the answer by going through all the batches.
- `map_rerank`: It separates the text into batches, feeds each batch to the LLM, which returns an answer together with a score of how fully it answers the question, and produces the final answer from the highest-scoring answers.
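The three strategies above can be sketched in plain Python. This is a minimal illustration of the control flow only, not LangChain's implementation: the prompt wording, the `make_counting_llm` helper, and `toy_scoring_llm` are all assumptions made up for this example, and the "LLM" is a stand-in function that just records the prompts it receives.

```python
from typing import Callable, List, Tuple


def map_reduce(llm: Callable[[str], str], batches: List[str], question: str) -> str:
    """Ask the question of every batch separately, then combine the partial answers."""
    partial_answers = [
        llm(f"Context: {batch}\nQuestion: {question}") for batch in batches
    ]
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\nQuestion: {question}")


def refine(llm: Callable[[str], str], batches: List[str], question: str) -> str:
    """Answer from the first batch, then revise that answer with each later batch."""
    answer = llm(f"Context: {batches[0]}\nQuestion: {question}")
    for batch in batches[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {batch}\nQuestion: {question}"
        )
    return answer


def map_rerank(
    scoring_llm: Callable[[str], Tuple[str, int]], batches: List[str], question: str
) -> str:
    """Get an (answer, score) pair per batch and keep the highest-scoring answer."""
    scored = [
        scoring_llm(f"Context: {batch}\nQuestion: {question}") for batch in batches
    ]
    best_answer, _ = max(scored, key=lambda pair: pair[1])
    return best_answer


def make_counting_llm():
    """Hypothetical stand-in for an LLM: records prompts, returns numbered answers."""
    prompts: List[str] = []

    def llm(prompt: str) -> str:
        prompts.append(prompt)
        return f"answer {len(prompts)}"

    return llm, prompts


def toy_scoring_llm(prompt: str) -> Tuple[str, int]:
    """Hypothetical scorer: a context with more digits scores higher."""
    score = sum(ch.isdigit() for ch in prompt)
    return prompt.splitlines()[0], score


batches = [
    "The report has several chapters.",
    "AI publications more than doubled.",
    "Almost 500,000 AI publications in 2021.",
]
question = "How many AI publications?"

llm, map_reduce_prompts = make_counting_llm()
map_reduce_answer = map_reduce(llm, batches, question)  # 3 map calls + 1 reduce call

llm, refine_prompts = make_counting_llm()
refine_answer = refine(llm, batches, question)  # 3 sequential calls

rerank_answer = map_rerank(toy_scoring_llm, batches, question)
```

Note the difference in call patterns: the `map_reduce` calls over the batches are independent of each other (so they can run concurrently), while each `refine` call depends on the previous answer and must run in sequence.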

Solution 2: RetrievalQA

One issue with using ALL of the text is that it can be very costly: you feed all of the text to the OpenAI API, and the API charges by the number of tokens. A better solution is to retrieve the relevant text chunks first and pass only those chunks to the language model. I’m going to go through the details of RetrievalQA next.
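To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not measurements: roughly 4 characters per token is a common rule of thumb for English text, and 3,000 characters per page is a guess for this PDF. Only the 55 pages, `k=2`, and `chunk_size=1000` come from the examples in this notebook.

```python
def approx_tokens(num_chars: int, chars_per_token: int = 4) -> int:
    """Very rough token estimate (assumed ~4 characters per token)."""
    return num_chars // chars_per_token


pages = 55             # size of the example PDF
chars_per_page = 3000  # assumption: average characters per page

# Sending everything: all pages go into the prompt(s).
all_text_tokens = approx_tokens(pages * chars_per_page)

# Retrieval: only k=2 chunks of about chunk_size=1000 characters each.
retrieved_tokens = approx_tokens(2 * 1000)

ratio = all_text_tokens / retrieved_tokens
print(all_text_tokens, retrieved_tokens, ratio)
```

Under these assumptions, the retrieval approach sends a small fraction of the tokens per question. The exact ratio depends on the real document and tokenizer.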

## Method 2: RetrievalQA

The `RetrievalQA` chain actually uses `load_qa_chain` under the hood. We retrieve the most relevant chunks of text and feed those to the language model.
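The retrieve-then-answer idea can be sketched without any external service. This is a toy illustration under stated assumptions: the word-count `embed` function stands in for a real embedding model, the chunks are made-up sentences, and `build_stuff_prompt` uses an invented prompt format. It only shows that the prompt is built from the top `k` chunks rather than from the whole document.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: List[str], question: str, k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


def build_stuff_prompt(chunks: List[str], question: str) -> str:
    """'Stuff' only the retrieved chunks into one prompt."""
    context = "\n\n".join(chunks)
    return f"Use the context to answer.\n\n{context}\n\nQuestion: {question}"


chunks = [
    "ai conference publications peaked in 2019",
    "the number of ai publications reached almost 500,000 in 2021",
    "industry produced more machine learning models than academia",
    "chapter 5 covers education and teaching",
]
question = "how many ai publications in 2021"

top_chunks = retrieve(chunks, question, k=2)
prompt = build_stuff_prompt(top_chunks, question)
print(prompt)
```

Because only two short chunks reach the prompt, the `"stuff"` chain type becomes usable even when the full document would exceed the token limit.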

In [None]:
from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# select which embeddings we want to use
embeddings = OpenAIEmbeddings()
# create the vectorstore to use as the index
db = Chroma.from_documents(texts, embeddings)
# expose this index in a retriever interface
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":2}) # k=2: return only the two most relevant text chunks
# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(), chain_type="stuff", retriever=retriever, return_source_documents=True)
query = "How many AI publications in 2021?"
result = qa({"query": query})

In [8]:
result

{'query': 'How many AI publications in 2021?',
 'result': 'According to the context, the total number of AI conference publications in 2021 was 85,094, and the total number of AI publications (including journal articles, conference papers, repositories, and patents) was almost 500,000.',
 'source_documents': [Document(page_content='Chapter 1 Preview 17\nArtificial Intelligence\nIndex Report 2023\nAI Conference Publications\nOverview\nThe number of AI conference publications peaked in 2019, and fell 20.4% below the peak in 2021 (Figure 1.1.13). \nThe total number of 2021 AI conference publications, 85,094, was marginally greater than the 2010 total of 75,592.1.1 PublicationsChapter 1: Research and Development\n85.09\n2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021020406080100Number of AI Conference Publications (in Thousands)Number of AI Conference Publications, 2010–21\nSource: Center for Security and Emerging Technology, 2022 | Chart: 2023 AI Index Report\nFigure 1.1.13', 

**Options**:
There are various options for you to choose from in this process:

- [embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html): In the example, we used OpenAI Embeddings. But there are many other embedding options such as Cohere Embeddings, and HuggingFaceEmbeddings from specific models.
- [TextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html): We used the Character Text Splitter in the example, where the text is split by a single character. You can also use the different text splitters and tokenizers described in the linked doc.
- [VectorStore](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html): We used Chroma as our vector database, where we store our embedded text vectors. Other popular options are FAISS, Milvus, and Pinecone.
- [Retrievers](https://python.langchain.com/en/latest/modules/indexes/retrievers.html): We used a `VectorStoreRetriever`, which is backed by a VectorStore. To retrieve text, you can choose between two search types: `search_type="similarity"` or `search_type="mmr"`. `search_type="similarity"` uses similarity search in the retriever object, selecting the text chunk vectors that are most similar to the question vector. `search_type="mmr"` uses maximum marginal relevance search, which optimizes for similarity to the query AND diversity among the selected documents.
- [Chain Type](https://python.langchain.com/en/latest/modules/chains/index_examples/question_answering.html): Same as method 1. You can define the chain type as one of the four options: `"stuff"`, `"map_reduce"`, `"refine"`, `"map_rerank"`.
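The difference between the two search types is easier to see on a tiny example. The sketch below is a simplified, self-contained version of the idea, not the vector store's actual code: the 2-D vectors are invented, and the trade-off weight `lambda_mult=0.5` is an assumption chosen for illustration. Documents 0 and 1 are near-duplicates that are both close to the query, while document 2 is less similar but points in a different direction.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query, docs, k: int) -> List[int]:
    """Indices of the k documents most similar to the query."""
    order = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return order[:k]


def mmr_search(query, docs, k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance: reward relevance, penalize redundancy."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = (1.0, 0.0)
docs = [
    (0.94, 0.34),   # doc 0: very relevant
    (0.91, 0.42),   # doc 1: near-duplicate of doc 0
    (0.77, -0.64),  # doc 2: less relevant, but different from doc 0
]

similarity_top2 = similarity_search(query, docs, k=2)
mmr_top2 = mmr_search(query, docs, k=2)
print(similarity_top2, mmr_top2)
```

Plain similarity search returns the two near-duplicates, while the MMR variant keeps the most relevant document and then prefers the one that adds new information.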

## Method 3: VectorstoreIndexCreator

`VectorstoreIndexCreator` is a wrapper around the above functionality. It is exactly the same under the hood, but just exposes a higher-level interface to let you get started in three lines of code:

In [10]:
index = VectorstoreIndexCreator().from_loaders([loader])
query = "What's the total number in AI publications?"
index.query(llm=ChatOpenAI(), question=query, chain_type="stuff")

'The total number of AI publications has more than doubled from 200,000 in 2010 to almost 500,000 in 2021, according to Figure 1.1.1 in the Artificial Intelligence Index Report 2023.'

![figure 1.1.1](img/figure.png)

You can also specify the relevant parameters of `VectorstoreIndexCreator`:

In [19]:
index = VectorstoreIndexCreator(
    text_splitter=CharacterTextSplitter(chunk_size=10000, chunk_overlap=0),
    embedding=OpenAIEmbeddings(),
    vectorstore_cls=Chroma,
    vectorstore_kwargs={"k":2} # defined k as 2 meaning that we are only interested in getting two relevant text chunks.
).from_loaders([loader])
query = "What's the total number of AI publications?"
index.query(llm=ChatOpenAI(), question=query, chain_type="stuff")

'The total number of AI publications globally from 2010 to 2021 was almost 500,000, according to Figure 1.1.1 in Chapter 1 Preview 5.'

## Method 4: ConversationalRetrievalChain

`ConversationalRetrievalChain` is very similar to method 2, `RetrievalQA`. It adds a `chat_history` parameter for passing in the chat history, which can be used for follow-up questions.

ConversationalRetrievalChain = conversation memory + RetrievalQAChain
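The way chat history helps with follow-up questions can be sketched as follows. This is a simplified illustration and not the chain's real code: the idea is that when there is history, the follow-up question is first rewritten into a standalone question, and that standalone question is what gets sent to the retriever. The `toy_condense`, `toy_retrieve`, and `toy_answer` functions are hypothetical stand-ins for the LLM and retriever calls.

```python
from typing import Callable, Dict, List, Tuple

ChatHistory = List[Tuple[str, str]]  # (question, answer) pairs

retriever_queries: List[str] = []  # records what the retriever was asked


def toy_condense(history: ChatHistory, question: str) -> str:
    """Stand-in for the LLM call that rewrites a follow-up as a standalone question."""
    _, last_answer = history[-1]
    return f"{question} [context: {last_answer}]"


def toy_retrieve(query: str) -> List[str]:
    """Stand-in retriever: logs the query and returns one fixed chunk."""
    retriever_queries.append(query)
    return ["AI publications grew to almost 500,000 in 2021."]


def toy_answer(docs: List[str], question: str) -> str:
    """Stand-in for the answering LLM call."""
    return f"Answered '{question}' using {len(docs)} chunk(s)."


def conversational_qa(
    question: str,
    chat_history: ChatHistory,
    condense: Callable[[ChatHistory, str], str] = toy_condense,
    retrieve: Callable[[str], List[str]] = toy_retrieve,
    answer: Callable[[List[str], str], str] = toy_answer,
) -> Dict[str, object]:
    # With no history there is nothing to resolve, so use the question as-is.
    standalone = condense(chat_history, question) if chat_history else question
    docs = retrieve(standalone)
    return {
        "question": question,
        "chat_history": chat_history,
        "answer": answer(docs, standalone),
    }


history: ChatHistory = []
first = conversational_qa("What's the total number of AI publications?", history)

history = [(str(first["question"]), str(first["answer"]))]
second = conversational_qa("What's this number divided by 2?", history)
print(retriever_queries)
```

Without the rewriting step, the retriever would receive "What's this number divided by 2?" on its own, which gives it nothing useful to search for.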

In [21]:
from langchain.chains import ConversationalRetrievalChain

# load documents
loader = PyPDFLoader("data/example.pdf")
documents = loader.load()

# split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# select which embeddings we want to use
embeddings = OpenAIEmbeddings()

# create the vectorstore to use as index
db = Chroma.from_documents(texts, embeddings)

# expose this index in a Retriever interface
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":2})

# create a chain to answer the question
qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(), retriever)

In [23]:
chat_history = []
query = "What's the total number of AI publications?"
result = qa({"question":query, "chat_history":chat_history})

In [24]:
result

{'question': "What's the total number of AI publications?",
 'chat_history': [],
 'answer': 'The total number of AI publications increased from 200,000 in 2010 to almost 500,000 in 2021.'}

In [25]:
result['answer']

'The total number of AI publications increased from 200,000 in 2010 to almost 500,000 in 2021.'

In [27]:
chat_history=[(query, result['answer'])]
query = "What's this number divided by 2?"
result = qa({"question":query, "chat_history":chat_history})

In [28]:
chat_history

[("What's the total number of AI publications?",
  'The total number of AI publications increased from 200,000 in 2010 to almost 500,000 in 2021.')]

In [29]:
result

{'question': "What's this number divided by 2?",
 'chat_history': [("What's the total number of AI publications?",
   'The total number of AI publications increased from 200,000 in 2010 to almost 500,000 in 2021.')],
 'answer': 'Half of the total number of AI publications is approximately 250,000.'}

## Conclusion

Now you know four ways to do question answering with LLMs in LangChain. In summary:

- load_qa_chain uses all texts and accepts multiple documents;
- RetrievalQA uses load_qa_chain under the hood but retrieves relevant text chunks first;
- VectorstoreIndexCreator is the same as RetrievalQA with a higher-level interface;
- ConversationalRetrievalChain is useful when you want to pass in your chat history to the model.