# Run Evaluations on our RAG chatbot! 

<div align="center">
    <p style="text-align:left">
        <img alt="phoenix logo" src="https://repository-images.githubusercontent.com/564072810/f3666cdf-cb3e-4056-8a25-27cb3e6b5848" width="800"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</div>

## Let's get started! 

```
%pip install -qqq "arize-phoenix==11.21.0" "openai>=1" nest_asyncio
```

Note: you may need to restart the kernel to use the updated packages.


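```python
import os
from getpass import getpass

import nest_asyncio
import phoenix as px

# Reuse OPENAI_API_KEY from the environment if it is set; otherwise prompt for it.
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

# Allow nested event loops so async evaluation calls can run inside Jupyter's own loop.
nest_asyncio.apply()
```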
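Why call `nest_asyncio.apply()`? Jupyter already runs an asyncio event loop, and the standard `asyncio.run()` refuses to start a loop from inside a running one. As we understand it, this is what would otherwise keep the evaluation calls from being submitted asynchronously inside the notebook. The stdlib-only sketch below reproduces the error without `nest_asyncio`; `judge_call` and `notebook_cell` are hypothetical names used for illustration.

```python
import asyncio


async def judge_call() -> str:
    # Hypothetical stand-in for an async LLM judge request (no network call is made).
    await asyncio.sleep(0)
    return "relevant"


async def notebook_cell() -> str:
    # Simulates cell code that already runs on an active event loop, as in Jupyter.
    coro = judge_call()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


message = asyncio.run(notebook_cell())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed instead of raising.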

<img alt="Document Retrieval Evaluation Image" src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/phoenix-docs-images/documentRelevanceDiagram.png" width="1000"/>

```python
from phoenix.session.evaluation import get_retrieved_documents

# One row per retrieved document, indexed by (span ID, position in the retrieval results).
retrieved_documents_df = get_retrieved_documents(px.Client(), project_name="our-rag-project", timeout=None)
retrieved_documents_df
```

| context.span_id | document_position | context.trace_id | input | reference |
|---|---|---|---|---|
| 5fe3e8fd6f82bfe9 | 0 | 6dc8afa6bcdc69fa2e3efa818fd69bf5 | course name | The purpose of this document is to capture fre... |
| 5fe3e8fd6f82bfe9 | 1 | 6dc8afa6bcdc69fa2e3efa818fd69bf5 | course name | Yes, we will keep all the materials after the ... |
| 5fe3e8fd6f82bfe9 | 2 | 6dc8afa6bcdc69fa2e3efa818fd69bf5 | course name | GitHub - DataTalksClub data-engineering-zoomca... |
| 5fe3e8fd6f82bfe9 | 3 | 6dc8afa6bcdc69fa2e3efa818fd69bf5 | course name | You can start by installing and setting up all... |
| 5fe3e8fd6f82bfe9 | 4 | 6dc8afa6bcdc69fa2e3efa818fd69bf5 | course name | Yes, even if you don't register, you're still ... |
| a001cfe3483647d5 | 0 | 05b736da6a425648d3d31dd825698bec | course name | The purpose of this document is to capture fre... |
| a001cfe3483647d5 | 1 | 05b736da6a425648d3d31dd825698bec | course name | Yes, we will keep all the materials after the ... |
| a001cfe3483647d5 | 2 | 05b736da6a425648d3d31dd825698bec | course name | GitHub - DataTalksClub data-engineering-zoomca... |
| a001cfe3483647d5 | 3 | 05b736da6a425648d3d31dd825698bec | course name | You can start by installing and setting up all... |
| a001cfe3483647d5 | 4 | 05b736da6a425648d3d31dd825698bec | course name | Yes, even if you don't register, you're still ... |
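The retrieval table has one row per retrieved document rather than one per query: each retrieval span's ranked list of documents is flattened and keyed by `(context.span_id, document_position)`. The sketch below shows that flattening on plain Python data. The `retrieval_spans` structure and `explode_documents` helper are hypothetical and are not Phoenix's internal representation; the document texts are truncated exactly as in the table.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: span ID -> (query, ranked list of retrieved documents).
retrieval_spans: Dict[str, Tuple[str, List[str]]] = {
    "5fe3e8fd6f82bfe9": (
        "course name",
        [
            "The purpose of this document is to capture fre...",
            "Yes, we will keep all the materials after the ...",
            "GitHub - DataTalksClub data-engineering-zoomca...",
        ],
    ),
}


def explode_documents(
    spans: Dict[str, Tuple[str, List[str]]],
) -> Dict[Tuple[str, int], Dict[str, str]]:
    """Return one row per retrieved document, keyed by (span ID, rank position)."""
    rows: Dict[Tuple[str, int], Dict[str, str]] = {}
    for span_id, (query, documents) in spans.items():
        for position, document in enumerate(documents):
            rows[(span_id, position)] = {"input": query, "reference": document}
    return rows


rows = explode_documents(retrieval_spans)
for key, row in rows.items():
    print(key, row["reference"])
```

Keying on the position as well as the span ID is what lets each document receive its own relevance label later.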


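```python
from phoenix.session.evaluation import get_qa_with_reference

# One row per query span: the user input, the chatbot output, and the retrieved reference text.
queries_df = get_qa_with_reference(px.Client(), project_name="our-rag-project", timeout=None)
queries_df
```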

| context.span_id | input | output | reference |
|---|---|---|---|
| ce2272c6d7fed335 | What is the name of this course? | The name of this course is **Data Engineering ... | The purpose of this document is to capture fre... |
| a5b6a984a3db453f | What is the name of this course? | The name of the course is "Data Engineering Zo... | The purpose of this document is to capture fre... |
| e78d158a6aa21457 | What topics will be covered? | It appears that the specific topics covered in... | The purpose of this document is to capture fre... |


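```python
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)

# GPT-4 acts as the judge model for all three evaluators.
eval_model = OpenAIModel(model="gpt-4")
relevance_evaluator = RelevanceEvaluator(eval_model)          # is each retrieved document relevant to the query?
hallucination_evaluator = HallucinationEvaluator(eval_model)  # is the answer grounded in the reference text?
qa_evaluator = QAEvaluator(eval_model)                        # does the answer correctly address the question?
```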
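Each evaluator fills its prompt template from dataframe columns, which is why the two dataframes above expose `input`, `output`, and `reference`. The mapping below reflects our reading of Phoenix's default templates (relevance needs the query and one document; hallucination and QA correctness also need the answer). Treat it as an assumption and check the evaluator's template if in doubt; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers, not part of Phoenix.

```python
from typing import Dict, Iterable, List, Set

# Assumed template variables for Phoenix's default evaluators.
REQUIRED_COLUMNS: Dict[str, Set[str]] = {
    "RelevanceEvaluator": {"input", "reference"},
    "HallucinationEvaluator": {"input", "reference", "output"},
    "QAEvaluator": {"input", "reference", "output"},
}


def missing_columns(evaluator_name: str, columns: Iterable[str]) -> List[str]:
    """Return (sorted) template variables that the given columns do not provide."""
    return sorted(REQUIRED_COLUMNS[evaluator_name] - set(columns))


# Non-index columns of the two dataframes above.
document_columns = ["context.trace_id", "input", "reference"]
query_columns = ["input", "output", "reference"]

print(missing_columns("RelevanceEvaluator", document_columns))      # []
print(missing_columns("HallucinationEvaluator", document_columns))  # ['output']
print(missing_columns("QAEvaluator", query_columns))                # []
```

Under this assumption, the relevance evaluator fits the per-document frame, while the hallucination and QA evaluators need the per-query frame that carries the chatbot's `output`.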

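```python
# run_evals returns one dataframe per evaluator; [0] takes the only one here.
retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,  # ask the judge model to justify each label
    concurrency=20,            # maximum number of concurrent judge requests
)[0]
retrieved_documents_relevance_df
```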


| context.span_id | document_position | label | score | explanation |
|---|---|---|---|---|
| 5fe3e8fd6f82bfe9 | 0 | unrelated | 0 | The question is asking for the name of the cou... |
| 5fe3e8fd6f82bfe9 | 1 | unrelated | 0 | The question is asking for the name of the cou... |
| 5fe3e8fd6f82bfe9 | 2 | unrelated | 0 | The reference text mentions a GitHub repositor... |
| 5fe3e8fd6f82bfe9 | 3 | unrelated | 0 | The question is asking for the name of a cours... |
| 5fe3e8fd6f82bfe9 | 4 | unrelated | 0 | The question is asking for the name of a cours... |
| a001cfe3483647d5 | 0 | unrelated | 0 | The question asks for the 'course name'. Howev... |
| a001cfe3483647d5 | 1 | unrelated | 0 | The question is asking for the name of the cou... |
| a001cfe3483647d5 | 2 | unrelated | 0 | The reference text mentions a GitHub repositor... |
| a001cfe3483647d5 | 3 | unrelated | 0 | The question is asking for the name of a cours... |
| a001cfe3483647d5 | 4 | unrelated | 0 | The question is asking for the name of a cours... |
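With one relevance score per `(span, document_position)`, retrieval quality for each query can be summarized with standard ranking metrics such as precision@k (the share of the top-k documents judged relevant) and hit rate (whether any of the top-k is relevant). The plain-Python sketch below computes both from the scores in the relevance table; every document there scored 0, so both metrics come out as 0 / `False` for both spans. This aggregation is our own addition, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (span ID, document position) -> relevance score from the table (1 = relevant, 0 = unrelated).
scores: Dict[Tuple[str, int], int] = {
    (span_id, position): 0
    for span_id in ("5fe3e8fd6f82bfe9", "a001cfe3483647d5")
    for position in range(5)
}


def ranked_scores(scores: Dict[Tuple[str, int], int]) -> Dict[str, List[int]]:
    """Group scores by span, ordered by document position."""
    by_span: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for (span_id, position), score in scores.items():
        by_span[span_id].append((position, score))
    return {span: [s for _, s in sorted(pairs)] for span, pairs in by_span.items()}


def precision_at_k(ranked: List[int], k: int) -> float:
    """Fraction of the top-k documents judged relevant (fewer than k if the list is short)."""
    top = ranked[:k]
    return sum(top) / len(top) if top else 0.0


def hit_at_k(ranked: List[int], k: int) -> bool:
    """True if at least one of the top-k documents is judged relevant."""
    return any(ranked[:k])


for span_id, ranked in ranked_scores(scores).items():
    print(span_id, precision_at_k(ranked, 2), hit_at_k(ranked, 2))
```

Averaging these per-span values across all queries gives a single retrieval-quality number to track between runs.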


```python
# One result dataframe per evaluator, returned in the same order as `evaluators`.
hallucination_eval_df, qa_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_evaluator],
    provide_explanation=True,
    concurrency=20,
)
hallucination_eval_df
```


| context.span_id | label | score | explanation |
|---|---|---|---|
| ce2272c6d7fed335 | factual | 0 | The query asks for the name of the course. The... |
| a5b6a984a3db453f | hallucinated | 1 | The answer states that the name of the course ... |
| e78d158a6aa21457 | hallucinated | 1 | The answer is hallucinated because it provides... |
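Both `run_evals` calls pass `concurrency=20`, which caps how many judge requests are in flight at once. A common way to get that behavior with asyncio is a semaphore; the sketch below illustrates the idea with a fake judge call and a limit of 3. It is a generic pattern under our own assumptions, not a description of Phoenix's executor, and `fake_judge` and `run_bounded` are hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_judge(row_id: int, limiter: asyncio.Semaphore, state: Dict[str, int]) -> str:
    # Hypothetical stand-in for one LLM judge request.
    async with limiter:  # wait here if `concurrency` requests are already running
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # simulate network latency
        state["in_flight"] -= 1
        return f"row {row_id}: evaluated"


async def run_bounded(n_rows: int, concurrency: int) -> Tuple[List[str], int]:
    limiter = asyncio.Semaphore(concurrency)
    state = {"in_flight": 0, "peak": 0}
    # gather keeps results in input order, so each result lines up with its row.
    results = await asyncio.gather(
        *(fake_judge(i, limiter, state) for i in range(n_rows))
    )
    return list(results), state["peak"]


results, peak = asyncio.run(run_bounded(n_rows=10, concurrency=3))
print(f"{len(results)} rows evaluated, peak concurrency {peak}")
```

Raising the limit shortens wall-clock time until the model provider's rate limits become the bottleneck.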


```python
from phoenix.trace import DocumentEvaluations, SpanEvaluations

# Attach the results to the traces in Phoenix: span-level evals are keyed by span ID,
# document-level evals by (span ID, document position).
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_eval_df),
    DocumentEvaluations(
        eval_name="Retrieval Relevance", dataframe=retrieved_documents_relevance_df
    ),
)
```
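Alongside the per-span labels, a one-number summary is often handy. Under the score convention visible in the hallucination table (`hallucinated` = 1, `factual` = 0), the mean score is simply the hallucination rate. The sketch below computes it in plain Python from the three rows shown there; `hallucination_results` is a hypothetical copy of that table, not an object the notebook creates.

```python
from collections import Counter
from typing import Dict, Tuple

# Labels and scores copied from the hallucination table (hallucinated = 1, factual = 0).
hallucination_results: Dict[str, Tuple[str, int]] = {
    "ce2272c6d7fed335": ("factual", 0),
    "a5b6a984a3db453f": ("hallucinated", 1),
    "e78d158a6aa21457": ("hallucinated", 1),
}

label_counts = Counter(label for label, _ in hallucination_results.values())
# With this convention, the mean score equals the fraction of hallucinated answers.
hallucination_rate = sum(score for _, score in hallucination_results.values()) / len(
    hallucination_results
)

print(dict(label_counts))
print(f"hallucination rate: {hallucination_rate:.2f}")
```

Here two of the three answers are labeled hallucinated, so the rate is about 0.67.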