**Installation of Dependencies**

In [None]:
%pip install "arize-phoenix[evals,llama-index]" llama-index-llms-openai openai llama-index-callbacks-arize-phoenix

**Importing Dependencies**

In [None]:
import os
import openai
import dotenv
import pandas as pd
import phoenix as px
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    set_global_handler,
)
from llama_index.llms.openai import OpenAI
from tqdm import tqdm

**Running Phoenix in the background to collect the trace data generated by our LlamaIndex app**

In [5]:
session = px.launch_app()

INFO:phoenix.config:📋 Ensuring phoenix working directory: /Users/sachintripathi/.phoenix


🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


**Loading the OpenAI API key from the .env file**

In [6]:
dotenv.load_dotenv()

True
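
The `True` above reports that `load_dotenv()` loaded something, but it does not by itself confirm that the key the OpenAI client reads (`OPENAI_API_KEY`) is defined. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of `dotenv` or Phoenix.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `dotenv.load_dotenv()` would fail early with a readable message instead of surfacing later as an authentication error during indexing.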

**Setting 'arize_phoenix' as the global handler. This mounts Phoenix's OpenInferenceTraceCallbackHandler as the global handler, which enables tracing for our LLM app**


In [7]:
set_global_handler("arize_phoenix")

**Reading our data directory, building a searchable vector store and initializing query engine for LlamaIndex**

In [8]:
# Load the data from the directory with SimpleDirectoryReader and build a VectorStoreIndex.
# The directory is named "data" and contains a text document about A Game of Thrones.
documents = SimpleDirectoryReader("./data/").load_data()
vector_index = VectorStoreIndex.from_documents(documents)

# Initialising a query engine
query_engine = vector_index.as_query_engine()
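
To make the "searchable vector store" idea concrete, here is a toy sketch of top-k retrieval by similarity. It assumes cosine similarity as the scoring function and uses hand-written 2-dimensional vectors in place of real embeddings; `cosine_similarity` and `top_k` are illustrative names, and this is not LlamaIndex's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (document_index, score) pairs for the k most similar documents."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy "embeddings": three documents and one query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([1.0, 0.0], doc_vecs, k=2)
```

Here `hits` holds the two best-matching document indices with their scores, which mirrors the way the query engine hands the top-ranked chunks to the LLM as context.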

In [9]:
# Define the queries (prompts) to ask about the data we have loaded
queries = [
    "What is Game of Thrones?",
    "Who is the main character?",
    "What is the significance of Game of Thrones?",
    "How did Game of Thrones get its name?",
]
queries

['What is Game of Thrones?',
 'Who is the main character?',
 'What is the significance of Game of Thrones?',
 'How did Game of Thrones get its name?']

**Generating responses for our query list using a loop**



In [10]:
# Run each query in the list through the query engine; Phoenix records a trace for each call
for query in tqdm(queries):
    query_engine.query(query)

100%|██████████| 4/4 [00:11<00:00,  2.95s/it]
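
The loop above discards each return value, which is fine for tracing because Phoenix captures the calls regardless. If you also want the answers available in the notebook, a sketch along these lines could collect them. `collect_responses` and `EchoEngine` are hypothetical names; `EchoEngine` is only a stand-in so the example runs without an API key.

```python
def collect_responses(engine, queries):
    """Query `engine` once per item and return (query, response_text) pairs."""
    results = []
    for query in queries:
        response = engine.query(query)
        # str() yields the answer text for LlamaIndex response objects
        # and is a no-op for plain strings.
        results.append((query, str(response)))
    return results


class EchoEngine:
    """Stand-in for a real query engine (illustration only)."""

    def query(self, text):
        return f"echo: {text}"


pairs = collect_responses(EchoEngine(), ["What is Game of Thrones?"])
```

With the real setup you would pass `query_engine` and `queries` instead of the stand-in.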


**Importing Phoenix's Evaluation components**

In [11]:
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)

In [12]:
queries_df = get_qa_with_reference(session)
queries_df

| context.span_id | input | output | reference |
| --- | --- | --- | --- |
| d4427fd3c0918450 | What is Game of Thrones? | A Game of Thrones is a story that takes place ... | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` |
| 8d6278de57300967 | Who is the main character? | Ned Stark | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` |
| b340c73c728bfa85 | What is the significance of Game of Thrones? | The significance of "Game of Thrones" lies in ... | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` |
| 7dc0bb339d93d672 | How did Game of Thrones get its name? | Game of Thrones got its name from the first bo... | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` |


In [13]:
retrieved_documents_df = get_retrieved_documents(session)
retrieved_documents_df

| context.span_id | document_position | context.trace_id | input | reference | document_score |
| --- | --- | --- | --- | --- | --- |
| fea854b5fca62492 | 0 | 5f92e58dc8ef84dd73c8616a44a6b4bf | What is Game of Thrones? | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` | 0.842734 |
| fea854b5fca62492 | 1 | 5f92e58dc8ef84dd73c8616a44a6b4bf | What is Game of Thrones? | As pets, all of the Stark children have direwo... | 0.814789 |
| bcb282b55af30206 | 0 | edc4c5e4cd113eb823a529b695824014 | Who is the main character? | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` | 0.753602 |
| bcb282b55af30206 | 1 | edc4c5e4cd113eb823a529b695824014 | Who is the main character? | As pets, all of the Stark children have direwo... | 0.740006 |
| f3b61759c80872ce | 0 | 5618f7496fe5e4ffa6ed70a2b665b0d8 | What is the significance of Game of Thrones? | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` | 0.834673 |
| f3b61759c80872ce | 1 | 5618f7496fe5e4ffa6ed70a2b665b0d8 | What is the significance of Game of Thrones? | As pets, all of the Stark children have direwo... | 0.812663 |
| 407052e541421bda | 0 | 38db22458929af36d78c4722d11d054c | How did Game of Thrones get its name? | `{\rtf1\ansi\ansicpg1252\cocoartf2761\n\cocoate...` | 0.85234 |
| 407052e541421bda | 1 | 38db22458929af36d78c4722d11d054c | How did Game of Thrones get its name? | Robert is mortally wounded in a hunt, and befo... | 0.839644 |
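
A quick way to compare retrieval quality across queries is to aggregate `document_score` per query. The sketch below uses a small hand-copied sample of the scores shown above rather than the live `retrieved_documents_df`; `summarize_scores` is a hypothetical helper name.

```python
import pandas as pd

# Hand-copied subset of the retrieved-documents output above (illustration only).
sample = pd.DataFrame(
    {
        "input": [
            "What is Game of Thrones?",
            "What is Game of Thrones?",
            "Who is the main character?",
            "Who is the main character?",
        ],
        "document_position": [0, 1, 0, 1],
        "document_score": [0.842734, 0.814789, 0.753602, 0.740006],
    }
)


def summarize_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, min and max retrieval score per query, best-scoring query first."""
    return (
        df.groupby("input")["document_score"]
        .agg(["mean", "min", "max"])
        .sort_values("mean", ascending=False)
    )


summary = summarize_scores(sample)
```

Since `input` and `document_score` are also columns of the real dataframe, `summarize_scores(retrieved_documents_df)` should work unchanged, assuming those column names stay as shown.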


**Implementing the Hallucination, QA Correctness and Relevance Evaluators**

In [14]:
# `model` replaces the deprecated `model_name` argument
eval_model = OpenAIModel(model="gpt-4o-mini")
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

# Span-level evals: one result dataframe is returned per evaluator, in order
hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
# Document-level eval: relevance of each retrieved chunk to its query
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

# `px.Client().log_evaluations` replaces the deprecated `px.log_evaluations`
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
)
px.Client().log_evaluations(DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df))

WARNI [phoenix.evals.executors] 🐌!! If running inside a notebook, patching the event loop with nest_asyncio will allow asynchronous eval submission, and is significantly faster. To patch the event loop, run `nest_asyncio.apply()`.


run_evals |          | 0/8 (0.0%) | ⏳ 00:00<? | ?it/s

WARNI [phoenix.evals.executors] 🐌!! If running inside a notebook, patching the event loop with nest_asyncio will allow asynchronous eval submission, and is significantly faster. To patch the event loop, run `nest_asyncio.apply()`.


run_evals |          | 0/8 (0.0%) | ⏳ 00:00<? | ?it/s
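
Besides viewing the evaluations in the Phoenix UI, the result dataframes can be summarized directly in the notebook. The sketch below assumes each dataframe returned by `run_evals` has a `label` column; the sample values are made up for illustration and are not results from this run. `label_rates` is a hypothetical helper name.

```python
import pandas as pd


def label_rates(eval_df: pd.DataFrame, label_column: str = "label") -> dict:
    """Fraction of rows carrying each label, as a plain dict."""
    return eval_df[label_column].value_counts(normalize=True).to_dict()


# Made-up labels for illustration only; not output from the run above.
example = pd.DataFrame({"label": ["factual", "factual", "hallucinated", "factual"]})
rates = label_rates(example)
```

Under that assumption, `label_rates(hallucination_eval_df)` would give the share of answers judged factual versus hallucinated.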
