### The goal of this notebook is to evaluate a retrieval-augmented generation (RAG) system.

### Follow the instructions provided in the notebook carefully.

##### Import the libraries used to evaluate the retriever

In [1]:
from deepeval.metrics import (
    ContextualPrecisionMetric,
    ContextualRecallMetric,
    ContextualRelevancyMetric)
from deepeval.test_case import LLMTestCase

import os

###### Load your OpenAI API key

In [2]:

from dotenv import load_dotenv
load_dotenv()
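# The OpenAI client used by deepeval's default metrics reads OPENAI_API_KEY from the environment.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")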
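
Before running any metric, it can be worth confirming that the key was actually loaded. The sketch below assumes the key is stored under the name `OPENAI_API_KEY`, the variable the OpenAI client conventionally reads; `key_status` is a hypothetical helper, not part of deepeval or python-dotenv.

```python
import os


def key_status(name: str = "OPENAI_API_KEY") -> str:
    """Report whether an environment variable is set, without printing its value."""
    value = os.getenv(name)
    if not value:
        return f"{name} is missing"
    return f"{name} is set ({len(value)} characters)"


# Hypothetical check; run it after load_dotenv().
print(key_status())
```

If the message says the key is missing, check the variable name in your `.env` file before going further.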

In [3]:
contextual_precision = ContextualPrecisionMetric()
contextual_recall = ContextualRecallMetric()
contextual_relevancy = ContextualRelevancyMetric()

###### First test case

To use deepeval, you need four elements:

- the user question (`input`)
- the expected output (`expected_output`): the real answer in your document
- the actual output (`actual_output`): the response from your system
- the retrieval context (`retrieval_context`): the text your retriever returned for the question

To get the retrieval context for a question, uncomment lines 154-161 of my_code.py and ask your chatbot the question. It will return a block of text followed by the answer; copy that block of text and use it as your retrieval context.

In [None]:

Test_case = LLMTestCase(
    input = "what is the duration of the treatment of OLANZAPINE oral?",
    expected_output="""Acute psychosis: at least 3 months
    Chronic psychosis: at least one year
    Manic episode: 8 weeks after remission of symptoms""",
    actual_output= """Additionally, the following durations are specified:

                        Acute psychosis: at least 3 months
                        Chronic psychosis: at least one year
                        Manic episode: 8 weeks after remission of symptoms
                        But these durations refer to specific indications or situations, not necessarily the general treatment duration.""",

    retrieval_context=["""Page 238 / 656OLANZAPINE oral Last updated: February 2024   Prescription under medical supervision   Due to the numerous and potentially severe adverse effects of olanzapine, patients should be kept under close surveillance.   Therapeutic action Indications Forms and strengths Dosage Duration   Discontinue treatment gradually (over 4 weeks). If signs of relapse occur, increase the dose then decrease it more gradually.  Contra-indications, adverse effects, precautions    Atypical antipsychotic Acute and chronic psychosis and acute manic episode, in the event of intolerance or treatment failure with other antipsychotics (preferably use haloperidol for these indications)  2.5 mg, 5 mg and 10 mg tablets Adult: 10 mg once daily. Increase up to 15 mg daily if necessary (max. 20 mg daily). Reduce the dose by half in older patients (max. 10 mg daily). Acute psychosis: at least 3 months Chronic psychosis: at least one year Manic episode: 8 weeks after remission of symptoms

                            Child and adult: 100 000 IU 4 times daily (1 ml of the oral suspension 4 times daily) for 7 days Take between meals (e.g. at least 30 minutes before eating). Shake oral suspension well before using. Pregnancy: no contra-indication Breast-feeding: no contra-indication Nystatin also comes in: 100 000 IU lozenge for the treatment of oropharyngeal candidiasis; 100 000 IU and 500 000 IU ﬁlm coated tablets for the treatment of oesophageal candidiasis.  For the treatment of moderate to severe oropharyngeal candidiasis and oesophageal candidiasis, oral ﬂuconazole is the ﬁrst-line treatment.Page 238 / 656OLANZAPINE oral Last updated: February 2024   Prescription under medical supervision   Due to the numerous and potentially severe adverse effects of olanzapine, patients should be kept under close surveillance.   Therapeutic action Indications Forms and strengths Dosage Duration   Discontinue treatment gradually (over 4 weeks). If signs of relapse occur, increase the dose then

                            Therapeutic action Indications Forms and strengths Dosage Duration   Discontinue treatment gradually (over 4 weeks). If signs of relapse occur, increase the dose then decrease it more gradually.  Contra-indications, adverse effects, precautions    Atypical antipsychotic Acute and chronic psychosis and acute manic episode, in the event of intolerance or treatment failure with other antipsychotics (preferably use haloperidol for these indications)  2.5 mg, 5 mg and 10 mg tablets Adult: 10 mg once daily. Increase up to 15 mg daily if necessary (max. 20 mg daily). Reduce the dose by half in older patients (max. 10 mg daily). Acute psychosis: at least 3 months Chronic psychosis: at least one year Manic episode: 8 weeks after remission of symptoms Do not administer to patients with cardiac disorders (heart failure, recent myocardial infarction, conduction disorders, bradycardia, etc.), dementia (e.g. Alzheimer's disease), Parkinson's disease,

                            Page 162 / 656Duration   Discontinue treatment gradually (over 4 weeks). If signs of relapse occur, increase the dose then decrease it more gradually. Contra-indications, adverse effects, precautions Storage  Below 25 °C Delirium and acute alcohol intoxication: as short as possible (max. 7 days) Acute psychosis: at least 3 months Chronic psychosis: at least one year Manic episode: 8 weeks after remission of symptoms Do not administer to patients with cardiac disorders (heart failure, recent myocardial infarction, conduction disorders, bradycardia, etc.), dementia (e.g. Alzheimer's disease), Parkinson's disease and history of neuroleptic malignant syndrome. Administer with caution and carefully monitor use in older patients and patients with hypokalaemia, hypotension, hyperthyroidism, renal or hepatic impairment, history of seizures. May cause: drowsiness (caution when driving/operating machinery), extrapyramidal symptoms, early"""]

                            )
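
The test case above passes the pasted text as a single string inside the `retrieval_context` list. Since `retrieval_context` is a list of strings, one option is to give each retrieved chunk its own element, which may let ranking-based metrics such as contextual precision distinguish between chunks. The sketch below assumes the pasted chunks are separated by blank lines; `split_retrieved_text` is a hypothetical helper, not a deepeval function.

```python
import re


def split_retrieved_text(raw: str) -> list[str]:
    """Split pasted retriever output into one cleaned string per chunk.

    Assumption: chunks are separated by at least one blank line.
    """
    chunks = re.split(r"\n\s*\n", raw)
    # Collapse runs of whitespace inside each chunk and drop empty pieces.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


# Short illustrative input taken from the durations quoted in the test case.
pasted = """Acute psychosis: at least 3 months

            Chronic psychosis:   at least one year
"""
print(split_retrieved_text(pasted))
```

The resulting list could then be passed as `retrieval_context=split_retrieved_text(pasted)`. Scores may differ from the single-string version, so compare both before adopting it.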

##### Now run the code below to compute the metrics used to evaluate your retriever.

##### You will get the score for each metric as well as the reason for that score.

In [6]:
contextual_precision.measure(test_case=Test_case)
print("Contextual Precision: ", contextual_precision.score)
print("Reason: ", contextual_precision.reason)


contextual_recall.measure(test_case=Test_case)
print("Contextual Recall: ", contextual_recall.score)
print("Reason: ", contextual_recall.reason)

contextual_relevancy.measure(test_case=Test_case)
print("Contextual Relevancy: ", contextual_relevancy.score)
print("Reason: ", contextual_relevancy.reason)

Contextual Precision:  1.0
Reason:  The score is 1.00 because the relevant node ranks first, providing detailed information that directly answers the query regarding treatment duration for OLANZAPINE oral.


Contextual Recall:  1.0
Reason:  The score is 1.00 because each element of the expected output has perfect alignment with the 2nd node in retrieval context, reflecting complete context retrieval.


Contextual Relevancy:  0.5714285714285714
Reason:  The score is 0.57 because, while there are some relevant statements like 'OLANZAPINE oral treatment should be discontinued gradually' and specified durations for acute and chronic psychosis, they are overshadowed by content focusing on adverse effects and conditions unrelated to treatment duration.
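
To read the contextual relevancy score, it helps to know that deepeval describes this metric as the number of relevant statements divided by the total number of statements extracted from the retrieval context. The output does not show the counts, so the sketch below only recovers a simple fraction consistent with the printed score; the 4-out-of-7 reading is an assumption, and `relevancy_ratio` is a hypothetical helper.

```python
from fractions import Fraction


def relevancy_ratio(relevant: int, total: int) -> float:
    """Ratio of relevant statements to all statements in the retrieval context."""
    if total <= 0:
        raise ValueError("total must be positive")
    return relevant / total


score = 0.5714285714285714  # value printed by the cell above

# Simplest fraction (denominator <= 20) that matches the printed score.
print(Fraction(score).limit_denominator(20))

# Assumed counts: 4 relevant statements out of 7.
print(relevancy_ratio(4, 7))
```

Under that reading, roughly three of every seven retrieved statements do not help answer the question, which matches the reason given: the chunks carry contra-indications and adverse effects alongside the durations.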


##### Do the same to evaluate your generator.

##### You will get the score for each metric as well as the reason for that score.

In [7]:
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric

answer_relevancy = AnswerRelevancyMetric()
faithfulness = FaithfulnessMetric()



answer_relevancy.measure(test_case=Test_case)
print("Answer Relevancy: ", answer_relevancy.score)
print("Reason: ", answer_relevancy.reason)

faithfulness.measure(test_case=Test_case)
print("Faithfulness: ", faithfulness.score)
print("Reason: ", faithfulness.reason)

Answer Relevancy:  1.0
Reason:  The score is 1.00 because the response was fully relevant to the input, focusing solely on the duration of Olanzapine oral treatment.


Faithfulness:  1.0
Reason:  The score is 1.00 because the actual output perfectly aligns with the retrieval context, demonstrating exemplary consistency and accuracy. Keep up the excellent work!


##### Perfect! You did it. Now try this with many other questions and see how your RAG system behaves.
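
When you try more questions, it can help to collect the scores in one place and flag the metrics that need attention. The sketch below uses the five scores printed in this notebook; the 0.7 cutoff is an arbitrary assumption chosen for illustration, and `below_threshold` is a hypothetical helper.

```python
# Scores copied from the outputs printed in this notebook.
scores = {
    "Contextual Precision": 1.0,
    "Contextual Recall": 1.0,
    "Contextual Relevancy": 0.5714285714285714,
    "Answer Relevancy": 1.0,
    "Faithfulness": 1.0,
}


def below_threshold(metric_scores: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Return the names of metrics scoring under the threshold, sorted by name."""
    return sorted(name for name, value in metric_scores.items() if value < threshold)


print(below_threshold(scores))
```

For this question only contextual relevancy falls under the assumed cutoff, which points at the retriever returning more text than the question needs rather than at the generator.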