## RAG metric

One of the most important steps in building any ML or LLM system is measuring the quality of its predictions with a metric. For our use case (harvesting data from documents with RAG), we will use the RELEVANCY metric. Relevancy measures whether the response and the retrieved source nodes match the query.

If you prefer to measure hallucination, you can use the FAITHFULNESS metric instead, which checks whether the response is supported by the retrieved source nodes.
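
The two metrics look at different parts of a RAG interaction. The sketch below is a hypothetical illustration, not LlamaIndex's code or prompt wording: it only shows which inputs each judge would see. Relevancy compares the query with the response and the retrieved context, while faithfulness checks the response against the retrieved context and does not need the query. `RagSample`, `build_relevancy_prompt` and `build_faithfulness_prompt` are made-up names, and the sample strings are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class RagSample:
    """One RAG interaction: user query, generated answer, retrieved chunks (hypothetical)."""
    query: str
    response: str
    contexts: list[str] = field(default_factory=list)


def build_relevancy_prompt(sample: RagSample) -> str:
    # Relevancy: are the response AND the retrieved context in line with the query?
    context_block = "\n".join(sample.contexts)
    return (
        f"Query: {sample.query}\n"
        f"Response: {sample.response}\n"
        f"Context:\n{context_block}\n"
        "Are the response and context in line with the query? Answer YES or NO."
    )


def build_faithfulness_prompt(sample: RagSample) -> str:
    # Faithfulness (hallucination check): is the response supported by the context?
    # The query is deliberately left out.
    context_block = "\n".join(sample.contexts)
    return (
        f"Response: {sample.response}\n"
        f"Context:\n{context_block}\n"
        "Is the response supported by the context? Answer YES or NO."
    )


sample = RagSample(
    query="What is the capital of France?",
    response="Paris.",
    contexts=["Paris is the capital of France."],
)
print(build_relevancy_prompt(sample))
print(build_faithfulness_prompt(sample))
```

In both cases an LLM would read the prompt and answer YES or NO; only the evidence it is asked to compare differs.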

In [1]:
import os, json
from dotenv import load_dotenv

import chromadb

In [2]:
load_dotenv()

True

In [3]:
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

from llama_index.core import (
    SimpleDirectoryReader, 
    VectorStoreIndex, 
    Settings,
)

from llama_index.core.evaluation import RelevancyEvaluator, BatchEvalRunner

In [4]:
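# load_dotenv() has already copied the .env values into os.environ;
# fail fast with a clear message if any required variable is missing
for var in ("OPENAI_API_VERSION", "OPENAI_API_BASE", "OPENAI_API_KEY"):
    if not os.getenv(var):
        raise EnvironmentError(f"Missing required environment variable: {var}")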

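### Define the LLM and embedding models

Both are required for RAG: the embedding model turns document chunks and queries into vectors for retrieval, and the LLM generates the answer. In this notebook the same LLM also acts as the judge for the relevancy metric.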

In [5]:
llm = AzureOpenAI(
    engine="<engine name>",  # name of your Azure OpenAI chat deployment
    model="gpt-4o",
    temperature=0.0,  # minimize randomness so evaluation runs are repeatable
    azure_endpoint=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
    api_version=os.environ["OPENAI_API_VERSION"],
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="<deployment name>",  # name of your Azure OpenAI embedding deployment
    azure_endpoint=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
    api_version=os.environ["OPENAI_API_VERSION"],
)

In [6]:
# set LLM and embeddings at the global level
Settings.llm = llm
Settings.embed_model = embed_model

## Load external business data and index it

In [7]:
data = SimpleDirectoryReader(input_files=["gs_MA_report.pdf"]).load_data()
index = VectorStoreIndex.from_documents(data)
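
`VectorStoreIndex` splits the PDF into chunks (nodes), embeds each chunk with `embed_model`, and at query time returns the chunks whose embeddings are closest to the query embedding. These are the source nodes that the relevancy metric later inspects. A minimal sketch of that retrieval step, assuming cosine similarity and toy 3-dimensional vectors in place of real `text-embedding-ada-002` embeddings (which have 1,536 dimensions); `cosine_similarity` and `top_k_nodes` are hypothetical helpers, not LlamaIndex functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k_nodes(query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 2):
    """Return (node_id, score) pairs for the k nodes most similar to the query."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real chunk embeddings
nodes = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.1, 0.9, 0.0],
    "chunk_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
print([node_id for node_id, _ in top_k_nodes(query, nodes, k=2)])  # ['chunk_a', 'chunk_c']
```

If the retrieved chunks do not cover the question, the relevancy judge has little to work with, which is one reason a low relevancy score can point to a retrieval problem rather than a generation problem.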

In [8]:
# Define the queries to evaluate
query_impact = """What was the impact of GenAI on global M&A activity in 2025? 
Show statements in bullet form and show page references after each statement."""

query_quarter = "Which quarter typically saw the most deal activity, according to the document?"

## RAG metric - Relevancy

In [9]:
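# The same Azure OpenAI LLM acts as the judge for relevancy
relevancy_evaluator = RelevancyEvaluator(llm=llm)

# The runner applies each named evaluator to every query; results are keyed by these names
runner = BatchEvalRunner(
    {"relevancy": relevancy_evaluator},
)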
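
`BatchEvalRunner` takes a dictionary of named evaluators, runs every evaluator over every query, and returns the results grouped under the same names, which is why they are read back with `eval_results['relevancy']`. The asyncio sketch below reproduces only that shape under simplifying assumptions and is not LlamaIndex's implementation: `run_batch` and `toy_relevancy` are hypothetical, the toy evaluator checks for word overlap instead of calling an LLM, and a semaphore caps concurrency in the spirit of the runner's worker limit.

```python
import asyncio
from typing import Awaitable, Callable

# An evaluator takes (query, response) and returns a score between 0.0 and 1.0
Evaluator = Callable[[str, str], Awaitable[float]]


async def toy_relevancy(query: str, response: str) -> float:
    """Hypothetical stand-in for an LLM judge: 1.0 if the response shares a word with the query."""
    await asyncio.sleep(0)  # pretend to wait on an API call
    shared = set(query.lower().split()) & set(response.lower().split())
    return 1.0 if shared else 0.0


async def run_batch(
    evaluators: dict[str, Evaluator],
    samples: list[tuple[str, str]],
    workers: int = 2,
) -> dict[str, list[float]]:
    """Run every evaluator on every (query, response) pair, at most `workers` at a time."""
    semaphore = asyncio.Semaphore(workers)

    async def limited(evaluator: Evaluator, query: str, response: str) -> float:
        async with semaphore:
            return await evaluator(query, response)

    results: dict[str, list[float]] = {}
    for name, evaluator in evaluators.items():
        # gather preserves input order, so results[name][i] belongs to samples[i]
        results[name] = list(
            await asyncio.gather(*(limited(evaluator, q, r) for q, r in samples))
        )
    return results


samples = [
    ("what is the capital of france", "paris is the capital of france"),
    ("what is the capital of france", "i like turtles"),
]
print(asyncio.run(run_batch({"relevancy": toy_relevancy}, samples)))
# {'relevancy': [1.0, 0.0]}
```

Inside Jupyter, call `await run_batch(...)` instead of `asyncio.run(...)`: the notebook already runs an event loop, which is also what allows the top-level `await` in the next cell.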

In [10]:
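# Top-level `await` works inside Jupyter; in a plain script, put this call in an
# async function and run it with asyncio.run()
eval_results = await runner.aevaluate_queries(
    index.as_query_engine(llm=llm),
    queries=[query_impact],
    # queries=["does the document talk about M&A activities?"]
)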

In [11]:
for result in eval_results["relevancy"]:
    print(f"Query: {result.query}")
    print(f"Relevancy Score: {result.score}\n")

Query: What was the impact of GenAI on global M&A activity in 2025? 
Show statements in bullet form and show page references after each statement.
Relevancy Score: 1.0



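A score of 1.0 means the evaluator LLM judged both the response and the retrieved source nodes to be in line with the query; in other words, the answer is relevant to the question asked. `RelevancyEvaluator` produces a binary verdict (1.0 for pass, 0.0 for fail), so the score does not grade partial relevance. It also does not check whether every statement in the answer is supported by the retrieved text, which is what the FAITHFULNESS metric is for.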
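
Because each relevancy result is a YES/NO verdict mapped to 1.0 or 0.0, a single score says little on its own; it becomes more informative once you evaluate several queries (for example, by adding `query_quarter` to the `queries` list) and report the share that passed. A minimal sketch under that assumption: `verdict_to_score` and `pass_rate` are hypothetical helpers that imitate, in simplified form, how a judge's YES/NO reply becomes a score. They are not LlamaIndex APIs; in the notebook the scores would come from `result.score` for each item in `eval_results["relevancy"]`.

```python
def verdict_to_score(judge_reply: str) -> float:
    """Map a free-text YES/NO judge reply to a binary score (simplified assumption)."""
    return 1.0 if judge_reply.strip().lower().startswith("yes") else 0.0


def pass_rate(scores: list[float]) -> float:
    """Fraction of evaluated queries whose relevancy score is 1.0."""
    if not scores:
        raise ValueError("no scores to aggregate")
    return sum(1 for s in scores if s == 1.0) / len(scores)


replies = ["YES", "NO", "Yes, the response matches the context."]
scores = [verdict_to_score(r) for r in replies]
print(scores)             # [1.0, 0.0, 1.0]
print(pass_rate(scores))  # 0.6666666666666666
```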

### END