https://www.llamaindex.ai/blog/building-and-evaluating-a-qa-system-with-llamaindex-3f02e9d87ce1

1. Question Generation

In [5]:
import os

import nest_asyncio
nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

from llama_index.core.evaluation import DatasetGenerator

In [2]:
LLAMAPARSE_API_KEY = os.environ.get('LLAMAPARSE_API_KEY')
if LLAMAPARSE_API_KEY is not None:
    print('API key found')
else:
    print('Check for API key in environment variable')

API key found


In [3]:
# instantiate parser
parser = LlamaParse(
    api_key=LLAMAPARSE_API_KEY,
    result_type="markdown",  # or "text"
    # num_workers=4,  # for multiple files
    verbose=True,
    language="en",  # default is English
)

In [4]:
# load document and parse it 
documents = parser.load_data('../data/axis-press-release-q3fy24.pdf')

Started parsing the file under job_id caff5e8d-4ba4-4bb8-a53d-27f4504c1f1c


#### Generate Questions

In [5]:
data_generator = DatasetGenerator.from_documents(documents)
questions = data_generator.generate_questions_from_nodes()



In [6]:
questions

["What was Axis Bank's operating profit for the quarter ended 31st December 2023?",
 "How much did Axis Bank's PAT increase by quarter-on-quarter for Q3FY24?",
 'What was the consolidated ROE for Axis Bank for the nine months ended 31st December 2023?',
 'How much did the fee income grow year-on-year for Axis Bank?',
 'What was the growth percentage of retail loans for Axis Bank on a year-on-year basis?',
 'What was the GNPA% for Axis Bank in Q3FY24 and how did it change year-on-year?',
 'How many credit cards were issued by Axis Bank in Q3FY24?',
 'What is the market share of Axis Bank in the CIF market for credit cards?',
 'What new digital banking solution did Axis Bank launch in the quarter?',
 'What awards did Axis Bank win during the quarter, according to the press release?',
 "What was the focus of Axis Bank's 'Sparsh Week' initiative mentioned in the context information?",
 "How did Axis Bank celebrate 'Sparsh Week' and what was the reach of this initiative?",
 'What was the Ne

#### Generate Answers from source nodes (Context)

In [6]:
from llama_index.core import VectorStoreIndex, load_index_from_storage, StorageContext

In [9]:
# build index
index = VectorStoreIndex.from_documents(documents)

# save to disk
index.set_index_id("axis_pr_vector_index")
index.storage_context.persist('../data/storage')

In [7]:
# rebuild the storage context from the persisted directory
storage_context = StorageContext.from_defaults(persist_dir='../data/storage')
# load index
index = load_index_from_storage(storage_context, index_id="axis_pr_vector_index")
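
The build and load cells above are typically run in different sessions: build and persist once, then reload on later runs. A minimal sketch of that pattern follows. `load_or_build`, `build_fn`, and `load_fn` are hypothetical names, not part of LlamaIndex; the two callables are assumed to wrap the build cell and the load cell respectively, and treating a non-empty directory as "already persisted" is also an assumption.

```python
import os
from typing import Any, Callable


def load_or_build(
    persist_dir: str,
    build_fn: Callable[[str], Any],
    load_fn: Callable[[str], Any],
) -> Any:
    """Return a persisted index if one exists, otherwise build it.

    build_fn and load_fn are hypothetical wrappers supplied by the caller:
    build_fn(persist_dir) should build the index and persist it there;
    load_fn(persist_dir) should rebuild the storage context and load the index.
    """
    # Assumption: any non-empty directory means the index was persisted earlier.
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load_fn(persist_dir)
    return build_fn(persist_dir)
```

With the notebook's objects, `build_fn` would wrap `VectorStoreIndex.from_documents(...)` plus `persist(...)`, and `load_fn` would wrap `StorageContext.from_defaults(...)` plus `load_index_from_storage(...)`.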

In [8]:
# create a query engine that retrieves the 3 most similar nodes per query
query_engine = index.as_query_engine(similarity_top_k=3)

#### Evaluate answers

Evaluation answers three questions:
* Does the response match the source nodes? Inputs: response + source nodes (context). This checks for hallucination.
* Do the response and source nodes (context) match the query? Inputs: query + response + source nodes (context).
* Which of the retrieved source nodes were used to generate the response? Inputs: query + response + individual source nodes (context).
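
The three checks differ only in which inputs reach the evaluator. The sketch below makes that explicit. It assumes evaluator objects that expose `evaluate_response(query=..., response=...)` and `evaluate(query=..., response=..., contexts=...)`, as the evaluators in `llama_index.core.evaluation` do, and a response object with `source_nodes` whose items provide `get_content()`. The three function names are hypothetical, and which evaluator class to pass for checks 2 and 3 (for example `RelevancyEvaluator`) is left to the caller.

```python
from typing import Any, List, Optional


def response_matches_sources(evaluator: Any, response: Any) -> Optional[bool]:
    """Check 1: response + source nodes, no query (hallucination check)."""
    return evaluator.evaluate_response(response=response).passing


def response_answers_query(
    evaluator: Any, query: str, response: Any
) -> Optional[bool]:
    """Check 2: query + response + all source nodes together."""
    return evaluator.evaluate_response(query=query, response=response).passing


def sources_supporting_answer(
    evaluator: Any, query: str, response: Any
) -> List[Optional[bool]]:
    """Check 3: query + response + one source node at a time."""
    return [
        evaluator.evaluate(
            query=query,
            response=str(response),
            contexts=[node.get_content()],  # a single node as the only context
        ).passing
        for node in response.source_nodes
    ]
```

The third function returns one verdict per retrieved node, in retrieval order, so it can be lined up against `response.source_nodes` to see which nodes the evaluator considers relevant.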

##### 1. Does the response match the source nodes? (response + source nodes; hallucination check)

* The response object returned for a query carries both the response text and its source nodes.
* The evaluation here does not take the query into account.
* It checks for model hallucination: whether the response is supported by the retrieved context.

In [9]:
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import FaithfulnessEvaluator

In [10]:
# set llm and load evaluator
llm = OpenAI(model="gpt-4", temperature=0.0)
evaluator = FaithfulnessEvaluator(llm=llm)

In [12]:
response = query_engine.query("What was Axis Bank's operating profit for the quarter ended 31st December 2023?")
eval_result = evaluator.evaluate_response(response=response)
# print(eval_result)
print(str(eval_result.passing))

True
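
The cell above evaluates a single question. To run the same check over every generated question, a loop like the following could be used. This is a sketch, not part of the original notebook: `faithfulness_report` is a hypothetical helper, and it assumes only the two calls already shown, `query_engine.query(...)` and `evaluator.evaluate_response(response=...)`.

```python
from typing import Any, Dict, Iterable


def faithfulness_report(
    query_engine: Any, evaluator: Any, questions: Iterable[str]
) -> Dict[str, Any]:
    """Run the faithfulness check per question and summarise the results."""
    results: Dict[str, bool] = {}
    for question in questions:
        response = query_engine.query(question)
        eval_result = evaluator.evaluate_response(response=response)
        # A missing verdict (None) is counted as a failure here (an assumption).
        results[question] = bool(eval_result.passing)

    total = len(results)
    pass_rate = sum(results.values()) / total if total else 0.0
    failed = [q for q, ok in results.items() if not ok]
    return {"pass_rate": pass_rate, "failed": failed}
```

Calling `faithfulness_report(query_engine, evaluator, questions)` with the objects defined earlier would issue one query and at least one evaluator LLM call per question, so cost grows with the length of `questions`. Duplicate questions collapse into a single entry because results are keyed by question text.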


2. Retrieval Evaluation