# Question Answering

This notebook walks through how to use LangChain for question answering over a list of documents. It covers three different types of chains: `stuff`, `map_reduce`, and `refine`. For a more in-depth explanation of these chain types, see [here](../../explanation/combine_docs.md).

### Prepare Data
First we prepare the data. For this example we do a similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).
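To illustrate that the retrieval step is interchangeable, here is a minimal sketch of a toy retriever that ranks texts by bag-of-words cosine similarity instead of embeddings. The helper names (`tokenize`, `cosine`, `toy_similarity_search`) and the sample strings are hypothetical and are not part of LangChain; the sketch only stands in for whatever fetch step you prefer.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    # Lowercase word tokens; a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_similarity_search(query, texts, k=2):
    # Rank texts by similarity to the query and return the top k.
    q = Counter(tokenize(query))
    ranked = sorted(texts, key=lambda t: cosine(q, Counter(tokenize(t))), reverse=True)
    return ranked[:k]


# Short sample strings (illustrative only).
sample_texts = [
    "Officer Mora was 27 years old.",
    "The American Rescue Plan provided funding to cities.",
    "Officer Rivera was 22.",
]
top = toy_similarity_search("what does he say about officer mora", sample_texts, k=1)
print(top)
```

Whatever the retriever, the chains below only need the resulting list of documents.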

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.docstore.document import Document

In [2]:
with open('../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()

In [3]:
docsearch = FAISS.from_texts(texts, embeddings)

In [58]:
query = "what does he say about officer mora"
docs = docsearch.similarity_search(query)

In [59]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.chains import LLMChain

### The `stuff` Chain

This section shows results of using the `stuff` Chain to do question answering.

In [64]:
from langchain.prompts import PromptTemplate
template = """Use the following document to answer the given question. In addition to providing an answer, please also give your answer a score from 0-100 in terms of how good it is (higher is better). 

What decides the score? A good score is factually accurate, and FULLY answers the question in a way the user would find helpful. If the document does not contain the answer, the score should be 0. You should only give a score of 100 if you are absolutely positive this is the best answer. Keep in mind that you will also be answering this question with other documents, so one of them could have a better answer.

Use the following format:

Document:
---------------
Document text here
---------------
Question: Question here
Answer: Answer here
Score: Score (between 0 and 100) here

Begin!

Document:
---------------
{context}
---------------
Question: {question}
Answer:"""
from langchain.prompts.base import BaseOutputParser
import re

class ScoreOutputParser(BaseOutputParser):
    """Split a completion into the answer text and its integer score."""

    def parse(self, text: str):
        # Group 1 is the answer text; group 2 is the value after "Score: ".
        regex = r"(.*?)\nScore: (.*)"
        match = re.search(regex, text)
        if match:
            answer = match.group(1)
            score = match.group(2)
            return {"answer": answer, "score": int(score)}
        else:
            raise ValueError(f"Could not parse output: {text}")

prompt = PromptTemplate(template=template, input_variables=['context', 'question'], output_parser=ScoreOutputParser())

llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
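The parsing step can be tried without calling a model. The following is a minimal standalone sketch, not the notebook's parser: the function name `parse_scored_answer` is hypothetical, and it differs by using `re.DOTALL` (so multi-line answers are kept whole), by requiring digits for the score, and by stripping surrounding whitespace.

```python
import re


def parse_scored_answer(text):
    # Everything before the last "Score:" line is the answer; the digits after it are the score.
    match = re.search(r"(.*)\nScore:\s*(\d+)", text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"Could not parse output: {text}")
    return {"answer": match.group(1).strip(), "score": int(match.group(2))}


# Sample completions (illustrative strings, shaped like the prompt's output format).
single_line = parse_scored_answer(" He says that Officer Mora was 27 years old.\nScore: 100")
multi_line = parse_scored_answer(" First line.\nSecond line.\nScore: 40")
print(single_line)
print(multi_line)
```

Without `re.DOTALL`, `.` does not match newlines, so a pattern like the one in the cell above would capture only the last line before `Score:` when the answer spans several lines.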

In [65]:
results = llm_chain.apply_and_parse(
    # FYI - this is parallelized and so it is fast.
    [{"context": d.page_content, "question": query} for d in docs]
)

In [70]:
sorted(zip(results, docs), key=lambda x: -x[0]['score'])

[({'answer': ' He says that Officer Mora was 27 years old.', 'score': 100},
  Document(page_content='We have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers
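Once each document has a parsed answer and score, choosing a final answer is a small post-processing step. Here is a minimal sketch, assuming results shaped like the `(result, document)` pairs above; the `best_answer` helper, the `min_score` threshold, and the placeholder document labels are hypothetical, and the second and third sample entries are made up for illustration.

```python
# Sample (result, document) pairs; documents are replaced by short labels here.
scored = [
    ({"answer": "He says that Officer Mora was 27 years old.", "score": 100}, "doc-a"),
    ({"answer": "This document does not answer the question.", "score": 0}, "doc-b"),
    ({"answer": "He mentions the funerals of two officers.", "score": 60}, "doc-c"),
]


def best_answer(pairs, min_score=1):
    # Drop results below the threshold, then return the highest-scoring pair (or None).
    kept = [p for p in pairs if p[0]["score"] >= min_score]
    return max(kept, key=lambda p: p[0]["score"], default=None)


result, source = best_answer(scored)
print(result["answer"], "| from", source)
```

Filtering on a minimum score keeps documents that scored 0 (no answer found) from being returned when nothing relevant was retrieved.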

In [56]:
docs

[Document(page_content='I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n\nI’ve worked on these issues a long time. \n\nI know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n\nSo let’s not abandon our streets. Or choose between safety and equal justice. \n\nLet’s come together to protect our communities, restore trust, and hold law enforcement accountable. \n\nThat’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. \n\nThat’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and

In [6]:
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

In [7]:
docs = [Document(page_content=t) for t in texts[:3]]

In [8]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president did not mention Justice Breyer.'}
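Conceptually, the `stuff` approach places every document into a single prompt and calls the model once. The sketch below illustrates that idea under stated assumptions: `fake_llm` is a hypothetical stand-in that only records prompts, and the prompt wording is invented for illustration rather than taken from LangChain.

```python
calls = []  # records every prompt sent to the stand-in model


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; it only logs the prompt.
    calls.append(prompt)
    return f"answer from call {len(calls)}"


def stuff_answer(docs, question, llm=fake_llm):
    # Join all documents into one context block and ask the question once.
    context = "\n\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)


sample_docs = ["first chunk", "second chunk", "third chunk"]
answer = stuff_answer(sample_docs, "What did the president say about Justice Breyer")
print(len(calls), "model call;", answer)
```

The trade-off is that the combined prompt grows with the number of documents, so this approach only fits when all of them fit in the model's context window.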

### The `map_reduce` Chain

This section shows results of using the `map_reduce` Chain to do question answering.

In [9]:
chain = load_qa_chain(OpenAI(temperature=0), chain_type="map_reduce")

In [10]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president did not mention Justice Breyer.'}
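The `map_reduce` idea can be sketched as one model call per document (the map step) followed by one call that combines the partial answers (the reduce step). As before, `fake_llm` is a hypothetical stand-in and the prompt wording is an assumption, not LangChain's actual prompts.

```python
calls = []  # records every prompt sent to the stand-in model


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; it only logs the prompt.
    calls.append(prompt)
    return f"answer from call {len(calls)}"


def map_reduce_answer(docs, question, llm=fake_llm):
    # Map step: ask the question against each document independently.
    partial = [llm(f"Context:\n{d}\n\nQuestion: {question}\nAnswer:") for d in docs]
    # Reduce step: combine the per-document answers in one final call.
    summaries = "\n".join(partial)
    return llm(f"Partial answers:\n{summaries}\n\nQuestion: {question}\nFinal answer:")


sample_docs = ["first chunk", "second chunk", "third chunk"]
answer = map_reduce_answer(sample_docs, "What did the president say about Justice Breyer")
print(len(calls), "model calls;", answer)
```

For three documents this makes four calls in total. The map calls do not depend on each other, so they could be run in parallel.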

### The `refine` Chain

This section shows results of using the `refine` Chain to do question answering.

In [11]:
chain = load_qa_chain(OpenAI(temperature=0), chain_type="refine")

In [12]:
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': "\n\nThe president did not mention Justice Breyer in his speech to the European Parliament about building a coalition of freedom-loving nations to confront Putin, unifying European allies, countering Russia's lies with truth, and enforcing powerful economic sanctions."}
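The `refine` idea can be sketched as a sequential loop: answer from the first document, then revise that answer once for each remaining document. This is a minimal illustration that assumes at least one document is supplied; `fake_llm` is a hypothetical stand-in and the prompt wording is invented, not LangChain's.

```python
calls = []  # records every prompt sent to the stand-in model


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; it only logs the prompt.
    calls.append(prompt)
    return f"answer from call {len(calls)}"


def refine_answer(docs, question, llm=fake_llm):
    # Start from the first document, then revise the answer once per remaining document.
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\n\nNew context:\n{doc}\n\n"
            f"Question: {question}\nRefined answer:"
        )
    return answer


sample_docs = ["first chunk", "second chunk", "third chunk"]
answer = refine_answer(sample_docs, "What did the president say about Justice Breyer")
print(len(calls), "model calls;", answer)
```

Each call sees the previous answer plus one new document, so the calls must run in order, and the final answer can accumulate detail from several documents.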