## Evaluating RAG Systems

LangChain has the [data connection](https://python.langchain.com/docs/modules/data_connection/) module, which helps you connect your own data to LLMs and build Retrieval Augmented Generation (RAG) pipelines. We will test three evaluators on such a pipeline:

1. QA
2. COT_QA
3. CONTEXT_QA
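
As a rough guide to how the three differ, the sketch below lists the fields each grader is assumed to read, using the key names that appear in this notebook (`query`, `answer`, `context`, `result`). The `REQUIRED_FIELDS` table and the `missing_fields` helper are hypothetical and not part of LangChain; check the mapping against the version you have installed.

```python
# Assumption: QA grades against a reference answer, while COT_QA and
# CONTEXT_QA grade against the retrieved context instead.
REQUIRED_FIELDS = {
    "QA": {"example": {"query", "answer"}, "prediction": {"result"}},
    "COT_QA": {"example": {"query", "context"}, "prediction": {"result"}},
    "CONTEXT_QA": {"example": {"query", "context"}, "prediction": {"result"}},
}


def missing_fields(evaluator, example, prediction):
    """Return the sorted field names the evaluator needs but that are absent."""
    spec = REQUIRED_FIELDS[evaluator]
    missing = (spec["example"] - set(example)) | (spec["prediction"] - set(prediction))
    return sorted(missing)
```

A check like this can catch an example that lacks `context` before any grading call is made.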


Let's see how to evaluate with them, but first let's build a QA system to test.

In [40]:
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator

loader = TextLoader("nyc_text.txt")
index = VectorstoreIndexCreator().from_loaders([loader])

In [57]:
question = "How did New York City get its name?"
index.query(question)

' New York City was named after King Charles II of England, who granted the lands to his brother, the Duke of York.'

In [60]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=index.vectorstore.as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question})
print(len(result['source_documents']))
result['source_documents'][0]

4


Document(page_content="== Etymology ==\n\nIn 1664, New York was named in honor of the Duke of York, who would become King James II of England. James's elder brother, King Charles II, appointed the Duke as proprietor of the former territory of New Netherland, including the city of New Amsterdam, when England seized it from Dutch control.\n\n\n== History ==", metadata={'source': 'nyc_text.txt'})

In [66]:
examples = [{
    "query": "How did New York City get its name?",
    "context": ["\n".join([c.page_content for c in result["source_documents"]])],
    "answer": "New York City got its name after the Duke of York (the future King James II and VII), who was given part of the colony by proprietors George Carteret and John Berkeley. The settlement was promptly renamed 'New York' after the Duke of York."
}]

predictions = []

In [67]:
for e in examples:
    predictions.append(
        {"result": index.query(e["query"])}
    )

In [72]:
from langchain.evaluation import EvaluatorType, load_evaluator

# One grader per evaluation style
qa_eval = load_evaluator(EvaluatorType.QA)
cot_qa_eval = load_evaluator(EvaluatorType.COT_QA)
context_qa_eval = load_evaluator(EvaluatorType.CONTEXT_QA)

In [69]:
qa_eval.evaluate(examples, predictions)

[{'results': 'CORRECT'}]

In [71]:
cot_qa_eval.evaluate(examples, predictions)

[{'text': "The context states that New York was named in honor of the Duke of York, who would become King James II of England. King Charles II, the Duke's elder brother, appointed the Duke as proprietor of the former territory of New Netherland, including the city of New Amsterdam, when England seized it from Dutch control. The student's answer states that New York City was named after King Charles II of England, who granted the lands to his brother, the Duke of York. This is incorrect because the city was named after the Duke of York, not King Charles II. \nGRADE: INCORRECT"}]

In [73]:
context_qa_eval.evaluate(examples, predictions)

[{'text': 'CORRECT'}]
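
The three graders above disagree on the same prediction: only COT_QA flags it as incorrect. A small helper can make that kind of disagreement easier to spot. This is a minimal sketch that assumes each output is a dict with a single string value, either a bare verdict or reasoning that ends in a `GRADE: ...` line, as in the cells above. `extract_grade` and `summarize` are hypothetical names, not LangChain functions.

```python
import re


def extract_grade(output):
    """Pull CORRECT/INCORRECT out of one evaluator output dict, or None."""
    # The output key differs between cells ('results' vs 'text'),
    # so take the first string value rather than a fixed key.
    text = next((v for v in output.values() if isinstance(v, str)), "")
    match = re.search(r"GRADE:\s*(CORRECT|INCORRECT)", text)
    if match:
        return match.group(1)
    verdict = text.strip().upper()
    return verdict if verdict in {"CORRECT", "INCORRECT"} else None


def summarize(outputs_by_evaluator):
    """Map each evaluator name to its list of extracted grades."""
    return {
        name: [extract_grade(o) for o in outputs]
        for name, outputs in outputs_by_evaluator.items()
    }


# Outputs shaped like the ones shown above (COT reasoning abbreviated).
outputs = {
    "QA": [{"results": "CORRECT"}],
    "COT_QA": [{"text": "The context states ... \nGRADE: INCORRECT"}],
    "CONTEXT_QA": [{"text": "CORRECT"}],
}
grades = summarize(outputs)
# True when the graders do not all agree on the first example
disagree = len({g[0] for g in grades.values()}) > 1
```

Here `grades` maps each evaluator to a one-element list and `disagree` is `True`, a signal that the prediction deserves a manual look.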