# Data Augmented Question Answering

This notebook uses generic prompts and language models to evaluate a question answering system that draws on data sources beyond what is in the model itself. For example, it can be used to evaluate a question answering system over your proprietary data.

## Setup
Let's set up an example using our favorite document: the State of the Union address.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA

In [2]:
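# Load the raw text of the address
with open('../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
# Split it into chunks of about 1000 characters, with no overlap between chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

# Embed each chunk and index the embeddings in a FAISS vector store
embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_texts(texts, embeddings)
# Build a question answering chain that looks up relevant chunks in the vector store
qa = VectorDBQA.from_llm(llm=OpenAI(), vectorstore=docsearch)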

## Examples
Now we need some examples to evaluate. We can do this in two ways:

1. Hard code some examples ourselves
2. Generate examples automatically, using a language model

In [3]:
# Hard-coded examples
examples = [
    {
        "query": "What did the president say about Ketanji Brown Jackson",
        "answer": "He praised her legal ability and said he nominated her for the supreme court."
    },
    {
        "query": "What did the president say about Michael Jackson",
        "answer": "Nothing"
    }
]

In [8]:
# Generated examples
from langchain.evaluation.qa import generate_questions
new_examples = generate_questions(OpenAI(), texts[:5])

In [9]:
# Combine examples
examples += new_examples
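
Since the combined list mixes hand-written and generated examples, it can be worth checking that every entry has the same shape before running the chain. The sketch below assumes that generated examples use the same `"query"` and `"answer"` keys as the hard-coded ones, which this notebook does not confirm; `validate_examples` is a hypothetical helper, not part of langchain.

```python
def validate_examples(examples):
    """Return the examples unchanged if each one is a dict with non-empty
    "query" and "answer" strings; raise ValueError otherwise.

    Assumption: generated examples share the shape of the hard-coded ones.
    """
    required = ("query", "answer")
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            raise ValueError(f"example {i} is not a dict: {example!r}")
        for key in required:
            value = example.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"example {i} has a missing or empty {key!r}")
    return examples


# Stand-in data in the same shape as the hard-coded examples above.
sample = [
    {
        "query": "What did the president say about Michael Jackson",
        "answer": "Nothing",
    },
]
validate_examples(sample)
```

In the notebook itself this would be called as `validate_examples(examples)` right after combining the two lists.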

## Evaluate
Now that we have examples, we can use the question answering evaluator to evaluate our question answering chain.

In [10]:
from langchain.evaluation.qa import evaluate_question_answering

In [14]:
predictions = qa.apply(examples)

ServiceUnavailableError: The server is overloaded or not ready yet.
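
The error above indicates a server-side condition (overloaded or not ready) rather than a problem with the examples, so simply retrying the call after a pause may be enough. A minimal sketch of retrying with exponential backoff follows; `call_with_retries` and its parameters are hypothetical names introduced here, not part of langchain or the OpenAI client.

```python
import time


def call_with_retries(fn, max_attempts=4, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and return its result, retrying on the given exceptions.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last error is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage with the chain defined earlier in this notebook:
# predictions = call_with_retries(lambda: qa.apply(examples))
```

Catching every `Exception` is a deliberately broad default for the sketch. In practice you would likely narrow `retry_on` to the specific error class your client raises, so that genuine bugs are not retried.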

In [None]:
llm = OpenAI(temperature=0)
graded_outputs = evaluate_question_answering(llm, examples, predictions)

In [None]:
graded_outputs