<a href="https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/4-Evaluating%20LLMs/evaluating_rag_giskard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<h1><a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course">Learn by Doing LLM Projects</a></h1>
    <h3>Understand And Apply Large Language Models</h3>
    <h2>Evaluating a RAG solution with Giskard</h2>
    <p>by <b>Pere Martra</b></p>
</div>

<div align="center">
    &nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/pere-martra/"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>
    
</div>



In the final part of this notebook, you will find the necessary code to evaluate the suitability of the responses provided by the Agent using Giskard.

#Installing libraries & Loading Dataset

In [85]:
!pip install -q langchain==0.1.14
!pip install -q langchain-openai==0.1.1
!pip install langchainhub==0.1.15
#!pip install --upgrade -q tiktoken==0.5.2
!pip install -q datasets==2.16.1
!pip install -q chromadb==0.4.22

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/812.8 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m317.4/812.8 kB[0m [31m9.3 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m812.8/812.8 kB[0m [31m14.6 MB/s[0m eta [36m0:00:00[0m


In [86]:
!pip install -q openai --upgrade

We will download the dataset from the Hugging Face datasets library. It's a dataset with information about diseases.

In [87]:
from datasets import load_dataset

data = load_dataset("keivalya/MedQuad-MedicalQnADataset", split='train')


In [88]:
data = data.to_pandas()
data.head(10)

Unnamed: 0,qtype,Question,Answer
0,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...
1,symptoms,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...
2,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...
3,exams and tests,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos..."
4,treatment,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen..."
5,prevention,How to prevent Lymphocytic Choriomeningitis (L...,LCMV infection can be prevented by avoiding co...
6,information,What is (are) Parasites - Cysticercosis ?,Cysticercosis is an infection caused by the la...
7,susceptibility,Who is at risk for Parasites - Cysticercosis? ?,Cysticercosis is an infection caused by the la...
8,exams and tests,How to diagnose Parasites - Cysticercosis ?,"If you think that you may have cysticercosis, ..."
9,treatment,What are the treatments for Parasites - Cystic...,Some people with cysticercosis do not need to ...


In [89]:
data = data[0:100]

As you can see, the medical information in the dataset is well-organized, and to someone like me, who is not an expert in the field, it appears to be quite valuable. This information could be a useful addition to any general medicine book to support primary care doctors.

Load the langchain libraries to load the document.

In [90]:
from langchain.document_loaders import DataFrameLoader
from langchain.vectorstores import Chroma

The Document is in the Answer column, and the others columns are Metadata.

In [91]:
df_loader = DataFrameLoader(data, page_content_column="Answer")


In [92]:
df_document = df_loader.load()
display(df_document[:2])

[Document(page_content='LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.', metadata={'qtype': 'susceptibility', 'Question': 'Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?'}),
 Document(page_content='LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. \n                \nFor infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial 

We can chunk the documents. The size to which we want to split the document is a design decision. The larger it is, the larger the prompt will be, and the slower the Model's response process.

We also need to consider the maximum prompt size and ensure that the document does not exceed it.

In [93]:
from langchain.text_splitter import CharacterTextSplitter

In [94]:
text_splitter = CharacterTextSplitter(chunk_size=1250, chunk_overlap=100)
texts = text_splitter.split_documents(df_document)


These warnings we see are because it can't perform the partition of the required size. This is because it waits for a page break to divide the text and does so when possible.

In [95]:
first_doc = texts[1]
print(first_doc.page_content)

LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial phase, which may last as long as a week, typically begins with any or all of the following symptoms: fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting. Other symptoms appearing less frequently include sore throat, cough, joint pain, chest pain, testicular pain, and parotid (salivary gland) pain. 
                
Following a few days of recovery, a second phase of illness may occur. Symptoms may consist of meningitis (fever, headache, stiff neck, etc.), encephalitis (drowsiness, confusion, sensory disturbances, and/or motor abnormalities, such as paralysis), or meningoencephalitis (inflammation 

### Initialize the Embedding Model and Vector DB

We load the text-embedding-ada-002 model from OpenAI.

In [96]:
from getpass import getpass
import os
if not 'OPENAI_API_KEY' in os.environ:
  os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")

In [97]:
from langchain_openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'
#model_name = 'text-embedding-3-small'

embed = OpenAIEmbeddings(
    model=model_name,
    #openai_api_key=OPENAI_API_KEY
)

The execution of this cell may take 3 to 5 minutes. If you want it to be faster, you can reduce the number of records in the dataset.

In [98]:
directory_cdb = '/content/drive/MyDrive/chromadb'
chroma_db = Chroma.from_documents(
    df_document, embed, persist_directory=directory_cdb
)

We are going to create three objects.

* The language model, which can be any of those from OpenAI, the most common being gpt-3.5.
* The memory, responsible for keeping the prompt with all the necessary history.
* The retrieval, used to obtain information stored in ChromaDB.

In [126]:
from langchain.chat_models import ChatOpenAI
from langchain_openai import OpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

llm=OpenAI(#openai_api_key=OPENAI_API_KEY,
           temperature=0.0)

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=4, #Number of messages stored in memory
    return_messages=True #Must return the messages in the response.
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=chroma_db.as_retriever()
)

We can try the isolated Retrieval to see if the information it returns is relevant.




In [127]:
qa.run("What is the main symptom of LCM?")

' The main symptom of LCM is neurological disease, which can manifest as meningitis, encephalitis, or meningoencephalitis.'

Perfect! The information returned is exactly what we desired.

## Creating the Agent.

In [101]:
from langchain.agents import Tool, AgentExecutor

#Defining the list of tool objects to be used by LangChain.
tools = [
    Tool(
        name='Medical KB',
        func=qa.run,
        description=(
            'use this tool when answering medical knowledge queries to get '
            'more information about the topic'
        )
    )
]

In [102]:
from langchain.agents import initialize_agent, create_react_agent
from langchain import hub

prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(
    #agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    prompt=prompt
    #verbose=True,
    #max_iterations=3,
    #early_stopping_method='generate',
    #memory=conversational_memory
)

In [128]:
# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(agent=agent,
                               tools=tools,
                               verbose=True,
                               memory=conversational_memory,
                               max_iterations=30,
                               max_execution_time=600,
                               #early_stopping_method='generate',
                               handle_parsing_errors=True
                               )

### Using the Conversational Agent

To make queries we simply call the `agent` directly.

First i will try a order not related to the Medical field.

In [104]:
agent_executor.invoke({"input": "Give me the area of square of 2x2"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Area of square[0m[36;1m[1;3m I don't know.[0m[32;1m[1;3m Do I need to use a tool? No
Final Answer: The area of a square with sides of 2 units is 4 square units.[0m

[1m> Finished chain.[0m


{'input': 'Give me the area of square of 2x2',
 'chat_history': [],
 'output': 'The area of a square with sides of 2 units is 4 square units.'}

Perfect, the model has responded without accessing the configured knowledge database.

Now I will try with a question that is also not related to health.

In [105]:
agent_executor.invoke({"input": "Do you know who is Clark Kent?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? No
Final Answer: Clark Kent is the secret identity of the superhero Superman.[0m

[1m> Finished chain.[0m


{'input': 'Do you know who is Clark Kent?',
 'chat_history': [HumanMessage(content='Give me the area of square of 2x2'),
  AIMessage(content='The area of a square with sides of 2 units is 4 square units.')],
 'output': 'Clark Kent is the secret identity of the superhero Superman.'}

It has not accessed either, as the model has been able to identify that it is not a question related to the database that LangChain provides.

Now it's time to try with a question related to Medicine. Let's see if the model can understand that it should first look for information in the vector database at its disposal.

In [106]:
 agent_executor.memory.clear()

In [107]:
agent_executor.invoke({"input": """I have a patient that can have Botulism,
how can I confirm the diagnose?"""})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Botulism[0m[36;1m[1;3m Botulism is a rare but serious paralytic illness caused by a nerve toxin produced by certain bacteria. There are five main types of botulism, including foodborne, wound, infant, adult intestinal toxemia, and iatrogenic. All forms of botulism can be fatal and are considered medical emergencies. Foodborne botulism is a public health emergency because it can affect many people if they consume contaminated food.[0m[32;1m[1;3mDo I need to use a tool? No
Final Answer: To confirm a diagnosis of botulism, a doctor may perform a physical exam, review symptoms and medical history, and order laboratory tests. These tests may include a stool or blood test to detect the presence of the botulinum toxin or the bacteria that produce it. Imaging tests, such as an MRI or CT scan, may also be used to look for signs of nerve damage. It is importa

{'input': 'I have a patient that can have Botulism,\nhow can I confirm the diagnose?',
 'chat_history': [],
 'output': 'To confirm a diagnosis of botulism, a doctor may perform a physical exam, review symptoms and medical history, and order laboratory tests. These tests may include a stool or blood test to detect the presence of the botulinum toxin or the bacteria that produce it. Imaging tests, such as an MRI or CT scan, may also be used to look for signs of nerve damage. It is important to seek medical attention immediately if botulism is suspected, as early treatment can improve outcomes.'}

Perfect, the most important thing for us is that it has been able to identify that it should go to the medical database to search for information about the symptoms.

In [108]:
agent_executor.invoke({"input": "Is this an important illness?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? No
Final Answer: Yes, botulism is a serious illness that can lead to paralysis and even death if left untreated. It is important to seek medical attention immediately if you suspect you or someone else may have botulism.[0m

[1m> Finished chain.[0m


{'input': 'Is this an important illness?',
 'chat_history': [HumanMessage(content='I have a patient that can have Botulism,\nhow can I confirm the diagnose?'),
  AIMessage(content='To confirm a diagnosis of botulism, a doctor may perform a physical exam, review symptoms and medical history, and order laboratory tests. These tests may include a stool or blood test to detect the presence of the botulinum toxin or the bacteria that produce it. Imaging tests, such as an MRI or CT scan, may also be used to look for signs of nerve damage. It is important to seek medical attention immediately if botulism is suspected, as early treatment can improve outcomes.')],
 'output': 'Yes, botulism is a serious illness that can lead to paralysis and even death if left untreated. It is important to seek medical attention immediately if you suspect you or someone else may have botulism.'}

And the memory works perfectly. We can maintain a conversation, taking into account that the model knows the previous questions and answers.

## Evaluating the solution with Giskard

Install and Load the libraries.

In [109]:
!pip install -q giskard[llm]
from giskard.rag import KnowledgeBase, generate_testset, evaluate

Is necesary to create a Dataframe with just the column containing the information used to create the RAG system.

In [112]:
import pandas as pd
df_giskard = pd.DataFrame([d.page_content for d in df_document], columns=["text"])
df_giskard.head(10)

Unnamed: 0,text
0,LCMV infections can occur after exposure to fr...
1,LCMV is most commonly recognized as causing ne...
2,Individuals of all ages who come into contact ...
3,"During the first phase of the disease, the mos..."
4,"Aseptic meningitis, encephalitis, or meningoen..."
5,LCMV infection can be prevented by avoiding co...
6,Cysticercosis is an infection caused by the la...
7,Cysticercosis is an infection caused by the la...
8,"If you think that you may have cysticercosis, ..."
9,Some people with cysticercosis do not need to ...


Using the information from the dataset, we ask Giskard to create a Knowledge Base, which is nothing more than a set of questions along with their respective answers. Both the questions and answers are generated by OpenAI's most advanced model, which is why it requires our OpenAI key to be provided.

In [113]:
kb_giskard = KnowledgeBase(df_giskard)

In [131]:
# The more questions you generate, the more you will be charged.
test_questions = generate_testset(
    kb_giskard,
    num_questions=30,
    agent_description="Medical assistant for diagnosis and treatment support.",
)

Generating questions:   0%|          | 0/30 [00:00<?, ?it/s]

In [132]:
df_test_questions = test_questions.to_pandas()

In [133]:
df_test_questions.head()

Unnamed: 0_level_0,question,reference_answer,reference_context,conversation_history,metadata
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
d4bb2ef9-c632-42a0-9335-666c0615cc8d,How can humans become infected with Baylisasca...,Humans become infected by ingesting embryonate...,Document 70: Baylisascaris infection can be pr...,[],"{'question_type': 'simple', 'seed_document_id'..."
9985cfc1-6c89-47bd-9bc1-5dae2e3fc6d4,What are the steps to prevent and control the ...,The steps to prevent and control the spread of...,Document 34: Body lice are parasitic insects t...,[],"{'question_type': 'simple', 'seed_document_id'..."
e4a9279e-71ba-4beb-8bf2-23f345ee0d2a,What is the recommended prevention method for ...,Scabies is prevented by avoiding direct skin-t...,Document 51: Transmission\n \nHuman scabies...,[],"{'question_type': 'simple', 'seed_document_id'..."
79cb5115-abfc-4794-87e1-9ff9de886b2e,What is the treatment for botulism?,Botulism can be treated with an antitoxin whic...,Document 81: The classic symptoms of botulism ...,[],"{'question_type': 'simple', 'seed_document_id'..."
93d7f209-4f14-4bd1-bc54-9011e5229ada,What are the first symptoms of rabies and how ...,The first symptoms of rabies may be very simil...,Document 76: The first symptoms of rabies may ...,[],"{'question_type': 'simple', 'seed_document_id'..."


A function is created that will be called from Giskard's evaluate function. This function receives the question and returns the agent's response.

In [168]:
def use_agent(question, history=None):
    return agent_executor.invoke({"input": question})

In [169]:
report = evaluate(use_agent, testset=test_questions, knowledge_base=kb_giskard)

Asking questions to the agent:   0%|          | 0/30 [00:00<?, ?it/s]



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Baylisascaris[0m[36;1m[1;3m Baylisascaris is a genus of intestinal parasites that can be found in various animals, with different species being associated with different animal hosts. It can cause severe infections in humans, with Baylisascaris procyonis, found in raccoons, posing the greatest risk due to their close association with human dwellings.[0m[32;1m[1;3m Do I need to use a tool? No
Final Answer: Humans can become infected with Baylisascaris through ingestion of contaminated soil, water, or food, or through direct contact with infected animals. It is important to practice good hygiene and avoid contact with potentially contaminated sources to prevent infection.[0m

[1m> Finished chain.[0m


[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Preventing and con

Correctness evaluation:   0%|          | 0/30 [00:00<?, ?it/s]



In [137]:
# Summary with the results.
report.correctness_by_question_type()

Unnamed: 0_level_0,correctness
question_type,Unnamed: 1_level_1
complex,0.8
conversational,0.4
distracting element,1.0
double,1.0
simple,1.0
situational,0.6


In [170]:
# Obtaining the incorrect answers
failures = report.get_failures()[:2]
failures

Unnamed: 0_level_0,question,reference_answer,reference_context,conversation_history,metadata,agent_answer,correctness,correctness_reason
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
96df0b44-9b38-46d4-ade0-b92a7013163a,Could you describe the distinct attributes of ...,The seafood contaminated with marine toxins fr...,Document 16: Marine toxins are naturally occur...,[],"{'question_type': 'complex', 'seed_document_id...",{'input': 'Could you describe the distinct att...,False,The agent incorrectly stated that seafood tain...
d562b7a1-e349-40f5-ba27-615f6569455f,What are some of the preventive measures to co...,The preventive measures to control the spread ...,Document 34: Body lice are parasitic insects t...,[],"{'question_type': 'distracting element', 'seed...",{'input': 'What are some of the preventive mea...,False,The agent's answer does not consider the situa...


In [162]:
# Giskard explains the reasons why it considers the answers to be incorrect.
failures['correctness_reason'].iloc[1]

"The agent's answer is not specific enough. It does not mention the specific consequences of LCMV infection during pregnancy such as fetal death, pregnancy termination, birth defects, vision problems, mental retardation, and hydrocephaly."


# Conclusions.
The experiment has been a small success. The Vectorial database has been configured and filled with information from the dataset. A LangChain agent has been created, and it has been able to retrieve information from the database only when necessary. Don't forget that our ChatBot has memory.

All of this in just a few lines of code!


---