https://medium.com/llamaindex-blog/building-and-evaluating-a-qa-system-with-llamaindex-3f02e9d87ce1

In [None]:
!pip install llama-index


In [None]:
# Download Paul Graham's essay, which is used throughout the notebook
import requests

response = requests.get("https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1")
response.raise_for_status()  # fail early if the download did not succeed
essay_txt = response.text
with open("paul_graham_essay.txt", "w", encoding="utf-8") as f:
    f.write(essay_txt)

In [None]:
from llama_index.evaluation import DatasetGenerator, ResponseEvaluator, QueryResponseEvaluator
from llama_index import (SimpleDirectoryReader,
                         ServiceContext,
                         LLMPredictor,
                         GPTVectorStoreIndex,
                         load_index_from_storage,
                         StorageContext)
from langchain.chat_models import ChatOpenAI

In [None]:
import os
from google.colab import userdata

# 'secretName' is a placeholder: use the name of the Colab secret that stores your OpenAI API key
os.environ["OPENAI_API_KEY"] = userdata.get('secretName')

#### Question Generation

In [None]:
# Load documents
reader = SimpleDirectoryReader(input_files=['paul_graham_essay.txt'])
documents = reader.load_data()

# Set up the gpt-4 model used for question generation
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-4")
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, chunk_size_limit=3000
)

# Generate questions from the document nodes
data_generator = DatasetGenerator.from_documents(documents, service_context=service_context)
questions = data_generator.generate_questions_from_nodes()

In [None]:
questions

['What were the two main things the author worked on outside of school before college?',
 "Describe the IBM 1401 and its significance in the author's early programming experience.",
 'What was the first programming language the author used, and how did they input their programs into the computer?',
 "How did the introduction of microcomputers change the author's programming experience?",
 "What was the author's initial plan for their college studies, and why did they change their focus to AI?",
 'What novel and documentary inspired the author to pursue AI?',
 "Explain the author's realization about AI and natural language processing during their first year of grad school.",
 'What programming language did the author focus on after realizing the limitations of AI at the time, and what book did they write about it?',
 'What realization did the author have while visiting the Carnegie Institute, and how did it change their plans for the future?',
 "Describe the author's situation during th

#### Generate Answers/Source Nodes

In [None]:
# Create Index
index = GPTVectorStoreIndex.from_documents(documents)

# save index to disk
index.set_index_id("vector_index")
index.storage_context.persist('storage')

# Set up the gpt-4o-mini model used for answering queries
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor
)

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir='storage')
# load index
index = load_index_from_storage(storage_context, index_id="vector_index")

# Query the index
query_engine = index.as_query_engine(similarity_top_k=3, service_context=service_context)
response = query_engine.query(questions[0])


In [None]:
response

Response(response='The two main things the author worked on outside of school before college were writing and programming.', source_nodes=[NodeWithScore(node=Node(text='\t\t\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card rea

### Evaluation

#### Response + Source Nodes (Context)

In [None]:
# build service context
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4o-mini"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# define evaluator
evaluator = ResponseEvaluator(service_context=service_context)

# evaluate using the response object
eval_result = evaluator.evaluate(response)

In [None]:
eval_result

'YES'

#### Query + Response + Source Nodes (Context)

In [None]:
# build service context
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4o-mini"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)

# evaluate using the query and the response object
eval_result = evaluator.evaluate(questions[0], response)


In [None]:
eval_result

'YES'
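
A single 'YES' covers only one question. The sketch below shows one way the same check could be repeated over every generated question to get an overall pass rate. It assumes the evaluator returns the strings 'YES' or 'NO', as in the output above. The names `pass_rate`, `query_fn`, and `eval_fn` are hypothetical and not part of llama_index; in this notebook they would stand in for `query_engine.query` and `evaluator.evaluate`.

```python
from typing import Callable, Sequence


def pass_rate(
    questions: Sequence[str],
    query_fn: Callable[[str], object],
    eval_fn: Callable[[str, object], str],
) -> float:
    """Return the fraction of questions whose evaluation verdict is 'YES'.

    query_fn and eval_fn are hypothetical stand-ins for
    query_engine.query and evaluator.evaluate.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    passed = 0
    for question in questions:
        response = query_fn(question)           # answer the question
        verdict = eval_fn(question, response)   # judge the answer
        if verdict.strip().upper() == "YES":
            passed += 1
    return passed / len(questions)
```

In the notebook this could be called as `pass_rate(questions, query_engine.query, evaluator.evaluate)`. Each question triggers at least two LLM calls, so it may be worth trying it on a small slice such as `questions[:5]` first.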

#### Query + Response + Individual Source Nodes (Context)

In [None]:
# build service context
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4o-mini"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)

# evaluate each source node individually against the query and response
eval_result = evaluator.evaluate_source_nodes(questions[0], response)


In [None]:
eval_result

['YES', 'NO', 'NO']
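
The per-node result above suggests that only some of the retrieved source nodes support the answer. A minimal sketch for turning that list into node positions, assuming the verdicts come back in the same order as `response.source_nodes` (the notebook does not confirm this). `supporting_node_indices` is a hypothetical helper, not a llama_index function.

```python
from typing import List, Sequence


def supporting_node_indices(verdicts: Sequence[str]) -> List[int]:
    """Return the positions of the source nodes judged 'YES'."""
    return [i for i, v in enumerate(verdicts) if v.strip().upper() == "YES"]


verdicts = ["YES", "NO", "NO"]  # the eval_result shown above
supporting = supporting_node_indices(verdicts)
print(supporting)  # [0]
# Share of retrieved nodes that actually support the answer
print(len(supporting) / len(verdicts))
```

A low share, as here, can hint that `similarity_top_k` retrieves more context than the answer needs.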