## AI Makerspace Midterm Assessment: RAG Pipeline with RAGAS Evaluation

#### Install langchain dependencies

In [1]:
!pip install -U -q langchain langchain-openai langchain_core langchain-community langchainhub openai

#### Install ragas

In [2]:
!pip install -qU ragas

#### Install FAISS, PyMuPDF, and pandas

In [3]:
!pip install -qU faiss_cpu pymupdf pandas

#### Set environment variables

In [22]:
import os
import openai
from getpass import getpass

openai.api_key = getpass("Please provide your OpenAI Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key

#### Load in NVIDIA data

In [23]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader(
    "data/nvidia.pdf",
)

documents = loader.load()

#### Transform the data

In [24]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 50
)

documents = text_splitter.split_documents(documents)

In [25]:
len(documents)

624
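To build intuition for what `chunk_size` and `chunk_overlap` control, here is a minimal sketch of a fixed-window character splitter. It is a simplification: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks before merging pieces, so its chunks will not line up with these windows. The helper name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Hypothetical 2000-character sample, split with the same settings as above.
sample = "".join(str(i % 10) for i in range(2000))
chunks = sliding_window_chunks(sample, chunk_size=700, chunk_overlap=50)

print(len(chunks))
# Consecutive chunks share their last/first 50 characters.
print(chunks[0][-50:] == chunks[1][:50])
```

A larger overlap lowers the chance that a fact is cut in half at a chunk boundary, at the cost of more chunks to embed.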

#### Load our OpenAI Embeddings Model

In [26]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

#### Create our FAISS VectorStore

In [27]:
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(documents, embeddings)

#### Create a Retriever

In [28]:
retriever = vector_store.as_retriever()
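Conceptually, the retriever embeds the question and returns the chunks whose vectors are closest to it. The sketch below shows that idea with a brute-force nearest-neighbour search over toy 3-dimensional vectors. It assumes Euclidean (L2) distance and a small `k`; as far as we know the LangChain FAISS wrapper defaults to a flat L2 index and the retriever to `k=4`, but treat both as assumptions. The function name `top_k_by_l2` and the vectors are hypothetical.

```python
import math


def top_k_by_l2(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Toy stand-ins for chunk embeddings (real embeddings have far more dimensions).
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
    [0.7, 0.3, 0.0],
]
query = [1.0, 0.0, 0.0]

nearest = top_k_by_l2(query, toy_vectors, k=3)
print(nearest)
```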

#### Create Prompt template

In [29]:
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':

Context:
{context}

Question:
{question}
"""

prompt = ChatPromptTemplate.from_template(template)

#### Set up and Instantiate our QA Chain

In [45]:
from operator import itemgetter

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

retrieval_augmented_qa_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": prompt | qa_llm, "context": itemgetter("context")}
)
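The chain above has three stages: retrieve context for the question, pass the context through, then call the prompt and model while keeping the context in the output. The following plain-Python sketch mirrors that data flow with stubs so it runs without an API key. `fake_retriever`, `fake_llm`, `run_chain`, and the shortened `TEMPLATE` are all hypothetical stand-ins, not LangChain objects; in the real chain `result["response"]` is a message object, which is why later cells read `.content`.

```python
# Shortened stand-in for the prompt template used in this notebook.
TEMPLATE = "Context:\n{context}\n\nQuestion:\n{question}\n"


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the FAISS retriever: returns fixed chunks."""
    return ["chunk about executive officers", "chunk about intangible assets"]


def fake_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for the chat model: echoes the prompt it received."""
    return "STUB ANSWER based on prompt:\n" + prompt_text


def run_chain(inputs: dict) -> dict:
    # Stage 1: send the question to the retriever and keep it for the prompt.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Stage 2: the assign(context=...) step re-attaches the same context,
    # so the state is unchanged here.
    state = {**state, "context": state["context"]}
    # Stage 3: fill the prompt, call the model, and return the context alongside.
    prompt_text = TEMPLATE.format(context=state["context"], question=state["question"])
    return {"response": fake_llm(prompt_text), "context": state["context"]}


question = "Who is the E-VP, Operations - and how old are they?"
result = run_chain({"question": question})
print(result["response"])
print(len(result["context"]))
```

Returning the retrieved context next to the response is what later lets the evaluation step score retrieval and generation separately.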

#### Task 1: Prompt for "Who is the E-VP, Operations - and how old are they?"

In [46]:
question = "Who is the E-VP, Operations - and how old are they?"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result["response"].content)

Debora Shoquist is the Executive Vice President, Operations, and she is 69 years old.


#### Task 2: Prompt for "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023?"

In [32]:
question = "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023?"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result["response"].content)

$3,539


### RAGAS Evaluation

#### Synthetic test generation

In [33]:
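# Note: `documents` was overwritten above with the 700-character chunks,
# so this coarser 1500-character split likely leaves them unchanged.
eval_documents = documents

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 400
)

eval_documents = text_splitter.split_documents(eval_documents)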

#### Generate test questions

In [36]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.with_openai()

testset = generator.generate_with_langchain_docs(
    eval_documents,  # use the evaluation chunks prepared above
    test_size=8,
    distributions={simple: 0.25, reasoning: 0.25, multi_context: 0.5},
)

embedding nodes:   0%|          | 0/1248 [00:00<?, ?it/s]

Filename and doc_id are the same for all nodes.


Generating:   0%|          | 0/8 [00:00<?, ?it/s]

In [37]:
test_df = testset.to_pandas()
test_df

| | question | contexts | ground_truth | evolution_type | episode_done |
|---|---|---|---|---|---|
| 0 | How has the NVIDIA accelerated computing platf... | [Table of Contents\nAt the foundation of the N... | The NVIDIA accelerated computing platform has ... | simple | True |
| 1 | What types of facilities are included in the l... | [Termination of the Arm Share Purchase Agreeme... | The lease obligations of the company primarily... | simple | True |
| 2 | How can government actions impact a company's ... | [business and results of operations.\nGovernme... | Government actions, including trade protection... | reasoning | True |
| 3 | What factors led to the determination that inv... | [conditions. As of January 28, 2024, the Compa... | The significant judgment by management when de... | reasoning | True |
| 4 | How does the company depend on external partie... | [Table of Contents\nGPUs, which could negative... | The company depends on developers, customers, ... | multi_context | True |
| 5 | What are the potential consequences of export ... | [Additional unilateral or multilateral control... | Additional export restrictions and foreign gov... | multi_context | True |
| 6 | What are the risks of estimating customer dema... | [to some of ours and can use or develop their ... | Failure to estimate customer demand accurately... | multi_context | True |
| 7 | Where are gains and losses on equity securitie... | [equity securities totaled $1.3 billion and $2... | Gains and losses on equity securities investme... | multi_context | True |
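The `evolution_type` column lines up with the `distributions` argument: with `test_size=8`, shares of 0.25, 0.25, and 0.5 work out to 2, 2, and 4 questions. The arithmetic is sketched below. It assumes the generator hits the requested proportions exactly, as it did in this run; it makes no claim about how ragas allocates questions internally, and the string keys are stand-ins for the imported evolution objects.

```python
# String keys stand in for the `simple`, `reasoning`, and `multi_context` objects.
distributions = {"simple": 0.25, "reasoning": 0.25, "multi_context": 0.5}
test_size = 8

# The shares should cover the whole test set.
total_share = sum(distributions.values())

expected_counts = {
    name: round(share * test_size) for name, share in distributions.items()
}
print(total_share)
print(expected_counts)
```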


In [38]:
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

#### Map over questions and build a response dataset

In [39]:
answers = []
contexts = []

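for question in test_questions:
    # Run each synthetic question through the RAG chain.
    response = retrieval_augmented_qa_chain.invoke({"question" : question})
    # Keep the generated answer text and the raw text of every retrieved chunk.
    answers.append(response["response"].content)
    contexts.append([context.page_content for context in response["context"]])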

In [40]:
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

#### Build a map of metrics, and evaluate pipeline with RAGAS

In [41]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]

In [42]:
results = evaluate(response_dataset, metrics)

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

#### Inspect our results

In [43]:
results

{'faithfulness': 0.8571, 'answer_relevancy': 0.9305, 'context_recall': 0.8750, 'context_precision': 0.9306, 'answer_correctness': 0.6153}

#### Tabulate our data for comparison

In [44]:
results_df = results.to_pandas()
results_df

| | question | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | How has the NVIDIA accelerated computing platf... | The NVIDIA accelerated computing platform has ... | [Table of Contents\nAt the foundation of the N... | The NVIDIA accelerated computing platform has ... | 0.5 | 0.917183 | 1.0 | 1.0 | 0.618457 |
| 1 | What types of facilities are included in the l... | Data centers | [lease periods expiring between fiscal years 2... | The lease obligations of the company primarily... | NaN | 0.843804 | 0.0 | 0.805556 | 0.5043 |
| 2 | How can government actions impact a company's ... | Government actions such as trade protection po... | [business and results of operations.\nGovernme... | Government actions, including trade protection... | 1.0 | 0.946352 | 1.0 | 1.0 | 0.439345 |
| 3 | What factors led to the determination that inv... | The significant judgment by management when de... | [critical audit matter or on the accounts or d... | The significant judgment by management when de... | 1.0 | 0.991135 | 1.0 | 1.0 | 0.739978 |
| 4 | How does the company depend on external partie... | The company depends on external parties for ac... | [computing processor products, and providers o... | The company depends on developers, customers, ... | 0.5 | 0.888291 | 1.0 | 0.638889 | 0.473638 |
| 5 | What are the potential consequences of export ... | The potential consequences include negatively ... | [Additional unilateral or multilateral control... | Additional export restrictions and foreign gov... | 1.0 | 0.879061 | 1.0 | 1.0 | 0.613397 |
| 6 | What are the risks of estimating customer dema... | The risks of estimating customer demand accura... | [Many additional factors have caused and/or co... | Failure to estimate customer demand accurately... | 1.0 | 0.993877 | 1.0 | 1.0 | 0.533261 |
| 7 | Where are gains and losses on equity securitie... | Gains and losses on equity securities investme... | [equity securities totaled $1.3 billion and $2... | Gains and losses on equity securities investme... | 1.0 | 0.983971 | 1.0 | 1.0 | 1.0 |
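The aggregate scores reported earlier can be reproduced from the per-question rows. Question 1 has no faithfulness score (a missing value), and the reported 0.8571 matches a mean over the remaining seven rows. The sketch below recomputes every aggregate as a mean that skips missing values. That averaging rule is inferred from the numbers in this run, not taken from the ragas documentation, and `nan_mean` is a hypothetical helper.

```python
import math

# Per-question scores copied from the results table; NaN marks the missing value.
per_question = {
    "faithfulness": [0.5, math.nan, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0],
    "answer_relevancy": [0.917183, 0.843804, 0.946352, 0.991135,
                         0.888291, 0.879061, 0.993877, 0.983971],
    "context_recall": [1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "context_precision": [1.0, 0.805556, 1.0, 1.0, 0.638889, 1.0, 1.0, 1.0],
    "answer_correctness": [0.618457, 0.5043, 0.439345, 0.739978,
                           0.473638, 0.613397, 0.533261, 1.0],
}


def nan_mean(values):
    """Average the values, skipping any NaN entries."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


summary = {name: round(nan_mean(vals), 4) for name, vals in per_question.items()}
print(summary)
```

Read this way, the low `answer_correctness` average comes from most answers scoring between 0.4 and 0.75, while the single `context_recall` miss (question 1, answered with just "Data centers") accounts for the whole gap from 1.0 in that metric.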
