# Step 2: Create evaluation dataset

Next, we're going to create an evaluation dataset for our app.

## Run the earlier notebook

First, we run the code from the previous notebook. The `%run` command below executes it in place, so we don't need to go back to that notebook.

In [1]:
%run 01-llm-app-setup.ipynb

Documents before chunking: 5
Documents after chunking: 35




 = Plain maskray = 
 
 The plain maskray or brown stingray ( Neotrygon annotata ) is a species of stingray in the family Dasyatidae . It is found in shallow , soft-bottomed habitats off northern Australia . Reaching 24 cm ( 9.4 in ) in width , this species has a diamond-shaped , grayish green pectoral fin disc . Its short , whip-like tail has alternating black and white bands and fin folds above and below . There are short rows of thorns on the back and the base of the tail , but otherwise the s


The plain maskray is found in the continental shelf of northern Australia, from the Wellesley Islands in Queensland to the Bonaparte Archipelago in Western Australia, including the Gulf of Carpentaria and the Timor and Arafura Seas. There are also unsubstantiated reports that its range extends to southern Papua New Guinea.

**Source:** Plain maskray  
**Relevant Snippet:** "The plain maskray inhabits the continental shelf of northern Australia from the Wellesley Islands in Queensland to the Bonaparte Archipelago in Western Australia, including the Gulf of Carpentaria and the Timor and Arafura Seas. There are unsubstantiated reports that its range extends to southern Papua New Guinea."


## 1. Take the document chunks created earlier

These are the chunks of documents in our index.

In [2]:
nodes[:5]

[TextNode(id_='94eaffc3-04f5-4c5b-b864-40f25cc7474e', embedding=None, metadata={'title': 'Valkyria Chronicles III'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='65f91edb-6707-4511-baaa-dd935be47e6e', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'title': 'Valkyria Chronicles III'}, hash='f7aadfb478d20e04be770cd882b5e6a44c185eb28a53810838586313c39ccc7c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='4316ab8e-f7f1-4deb-9f28-50c0574fea38', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='32c32274359d6ec7e58a31e940b4b433c53354c2fb60611c7cc6bd2c324d075c')}, text='= Valkyria Chronicles III = \n \n Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role-playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in Ja

In [3]:
print(len(nodes))

35


## 2. Set up a chain to ask the LLM to create question and answer pairs

We'll use GPT-4 here for higher-quality Q&A generation. Each question and answer pair is generated from a single chunk of text.

In [4]:
from pydantic import BaseModel, Field, field_validator
# from llama_index.core.output_parsers import PydanticOutputParser

# Define your desired data structure.
class QAExample(BaseModel):
    """
    A data structure that holds a question and its answer.
    """
    question: str = Field(description="Question relevant to the given input")
    answer: str = Field(description="Answer to the question")

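    # You can add custom validation logic easily with Pydantic.
    @field_validator("question")
    @classmethod
    def question_ends_with_question_mark(cls, value: str) -> str:
        # endswith also handles an empty string, where value[-1] would raise IndexError.
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value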

# parser = PydanticOutputParser(QAExample)
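
To see what the validator does in practice, here is a minimal sketch that builds one well-formed and one malformed example. It assumes Pydantic v2 is installed and redefines `QAExample` so that it runs on its own; the sample question and answer strings are illustrative only.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class QAExample(BaseModel):
    """A question and its answer (same shape as the model defined above)."""
    question: str = Field(description="Question relevant to the given input")
    answer: str = Field(description="Answer to the question")

    @field_validator("question")
    @classmethod
    def question_ends_with_question_mark(cls, value: str) -> str:
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value


# A well-formed question passes validation.
ok = QAExample(question="Who is the commanding officer?", answer="Ramsey Crowe")

# A question without a trailing "?" is rejected with a ValidationError.
try:
    QAExample(question="Tell me about the commanding officer", answer="Ramsey Crowe")
    rejected = False
except ValidationError as err:
    rejected = True
    n_errors = len(err.errors())
```

Because the structured-output program builds a `QAExample` from the LLM response, a malformed question should surface as an exception rather than as a silently bad row in the dataset.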

In [5]:
from llama_index.program.openai import OpenAIPydanticProgram

program = OpenAIPydanticProgram.from_defaults(
    output_cls=QAExample,
    prompt_template_str="Given the following text, generate a question and answer pair about a piece of information contained in the text.\nText:\n```\n{query_str}\n```\n",
    verbose=False,
    llm=openai_llm,
)

In [6]:
sample_index = 32
print(nodes[sample_index].text[:500])

In 2008 , Last and William White elevated the kuhlii group to the rank of full genus as Neotrygon , on the basis of morphological and molecular phylogenetic evidence . 
 In a 2012 phylogenetic analysis based on mitochondrial and nuclear DNA , the plain maskray and the Ningaloo maskray ( N. ningalooensis ) were found to be the most basal members of Neotrygon . The divergence of the N. annotata lineage was estimated to have occurred ~ 54 Ma . Furthermore , the individuals sequenced in the study so


In [7]:
out = program(query_str=nodes[sample_index].text)

In [8]:
out

QAExample(question='What did Last and William White do in 2008 regarding the kuhlii group?', answer='In 2008, Last and William White elevated the kuhlii group to the rank of full genus as Neotrygon, based on morphological and molecular phylogenetic evidence.')

## 3. Generate the dataset

The output above looks good, so let's run the chain over our entire dataset.

In [9]:
gen_qa = []

for node in nodes:
    gen_qa.append(program(query_str=node.text))
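
The loop above assumes every call succeeds. Section 5 later pairs each generated example with its chunk by position, so a failed or skipped call would shift that alignment. A minimal sketch of one way to guard against this is to store the chunk index with each result and record failures separately. `generate_with_index` and `fake_generate` are hypothetical names for illustration and are not part of llama_index.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def generate_with_index(
    texts: List[str],
    generate: Callable[[str], T],
    max_attempts: int = 2,
) -> Tuple[List[Tuple[int, T]], List[int]]:
    """Run `generate` over every text, keeping the source index with each result.

    Returns (results, failed_indices).
    """
    results: List[Tuple[int, T]] = []
    failed: List[int] = []
    for i, text in enumerate(texts):
        for attempt in range(max_attempts):
            try:
                results.append((i, generate(text)))
                break
            except Exception:
                # API or validation error: retry, and record the index
                # once the last attempt has also failed.
                if attempt == max_attempts - 1:
                    failed.append(i)
    return results, failed


# Stand-in for `program(query_str=...)` so the sketch runs without an API key.
def fake_generate(text: str) -> str:
    if not text:
        raise ValueError("empty chunk")
    return f"Q about: {text[:10]}"


results, failed = generate_with_index(["alpha chunk", "", "gamma chunk"], fake_generate)
```

In this notebook the call would look something like `generate_with_index([n.text for n in nodes], lambda t: program(query_str=t))`, after which each result can be joined back to `nodes[i]` by its stored index.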

In [10]:
gen_qa[:5]

[QAExample(question='What is the full Japanese title of Valkyria Chronicles III?', answer='Senjō no Valkyria 3: Unrecorded Chronicles (Japanese: 戦場のヴァルキュリア3)'),
 QAExample(question='What is the main progression system in the game?', answer='The player progresses through a series of linear missions, gradually unlocked as maps that can be freely scanned through and replayed as they are unlocked.'),
 QAExample(question='What are the five classes of troops mentioned in the text?', answer='The five classes of troops are Scouts, Shocktroopers, Engineers, Lancers, and Armored Soldier.'),
 QAExample(question='Who is the commanding officer of the Nameless?', answer='Ramsey Crowe'),
 QAExample(question='Why was Valkyria Chronicles III developed for PlayStation Portable?', answer="The team wanted to refine the mechanics created for Valkyria Chronicles II and had not come up with a 'revolutionary' idea that would warrant a new entry for the PlayStation 3.")]

## 4. Generate negative samples

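Let's also generate some questions whose answers are not in the text the model is shown. Note that the prompt sees only one chunk at a time, so a generated question may still be answerable from a different chunk in the index. The sample question below about the plain maskray's distribution is one such case: the answer printed by the earlier notebook comes from another chunk.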

In [21]:
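from llama_index.llms.openai import OpenAI

# A high temperature encourages varied questions across calls.
negative_sample_program = OpenAIPydanticProgram.from_defaults(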
    output_cls=QAExample,
    prompt_template_str="Given the following text, generate a question about information not contained in the text, with the answer confirming that the information is not included.\nText:\n```\n{query_str}\n```\n",
    verbose=False,
    llm=OpenAI(model="gpt-4o", temperature=0.9, max_tokens=2048),
)

In [12]:
print(nodes[sample_index].text[:500])

In 2008 , Last and William White elevated the kuhlii group to the rank of full genus as Neotrygon , on the basis of morphological and molecular phylogenetic evidence . 
 In a 2012 phylogenetic analysis based on mitochondrial and nuclear DNA , the plain maskray and the Ningaloo maskray ( N. ningalooensis ) were found to be the most basal members of Neotrygon . The divergence of the N. annotata lineage was estimated to have occurred ~ 54 Ma . Furthermore , the individuals sequenced in the study so


In [13]:
out = negative_sample_program(query_str=nodes[sample_index].text)

In [14]:
out

QAExample(question='What is the geographical distribution of the plain maskray?', answer='The geographical distribution of the plain maskray is not included in the text.')

Next, we check that the generated questions are varied:

In [22]:
for _ in range(2):
    print(negative_sample_program(query_str=nodes[sample_index].text))

question='What is the size range of the Neotrygon ningalooensis?' answer='The text does not provide information about the size range of the Neotrygon ningalooensis.'
question='What is the habitat or geographic distribution of the plain maskray?' answer='The text does not provide information about the habitat or geographic distribution of the plain maskray.'


This looks good, so let's run it over the first 10 chunks.

In [16]:
gen_qa_no_answer = []

for i in range(10):
    gen_qa_no_answer.append(negative_sample_program(query_str=nodes[i].text))

In [17]:
gen_qa_no_answer

[QAExample(question='What is the budget for the development of Valkyria Chronicles III?', answer='The budget for the development of Valkyria Chronicles III is not mentioned in the text.'),
 QAExample(question='What is the setting or world in which the game takes place?', answer='The information about the setting or world in which the game takes place is not included in the text.'),
 QAExample(question='What specific battles or missions did the Nameless undertake during the Second Europan War?', answer='The specific battles or missions that the Nameless undertook during the Second Europan War are not mentioned in the text.'),
 QAExample(question='What was the budget for the development of Valkyria Chronicles III?', answer='The text does not provide information about the budget for the development of Valkyria Chronicles III.'),
 QAExample(question='What are the specific sales figures for Valkyria Chronicles III?', answer='The specific sales figures for Valkyria Chronicles III are not men
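
The first and fourth questions in the output above are nearly identical (both ask about the development budget). A minimal sketch of how such near-duplicates could be flagged, using only the standard library: `find_near_duplicates` and the 0.9 threshold are assumptions chosen for illustration, not part of the original workflow.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivial differences are ignored."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()


def find_near_duplicates(
    questions: List[str], threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair at or above `threshold`."""
    normalized = [normalize(q) for q in questions]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j, round(ratio, 3)))
    return pairs


questions = [
    "What is the budget for the development of Valkyria Chronicles III?",
    "What is the setting or world in which the game takes place?",
    "What was the budget for the development of Valkyria Chronicles III?",
]
duplicates = find_near_duplicates(questions)
```

Character-level similarity only catches rewordings that stay close to the original string; paraphrases with different vocabulary would need an embedding-based comparison instead.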

## 5. Save the datasets

Let's put everything in a dataframe and save it to disk, so we don't need to rerun the generation code. Negative samples get an empty `ground_truth_context`.

In [18]:
import pandas as pd

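gen_qa_lst = []

# Positive examples: pair each Q&A with the chunk it was generated from.
for qa, node in zip(gen_qa, nodes):
    qa_dict = qa.model_dump()  # .dict() is deprecated in Pydantic v2
    qa_dict["ground_truth_context"] = node.text
    gen_qa_lst.append(qa_dict)

# Negative examples: the answer is not in the source chunk, so the context is left empty.
for qa in gen_qa_no_answer:
    qa_dict = qa.model_dump()
    qa_dict["ground_truth_context"] = ""
    gen_qa_lst.append(qa_dict)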

gen_dataset = pd.DataFrame(gen_qa_lst)
gen_dataset.rename(columns={"answer": "ground_truth"}, inplace=True)
gen_dataset

| | question | ground_truth | ground_truth_context |
|---|---|---|---|
| 0 | What is the full Japanese title of Valkyria Ch... | Senjō no Valkyria 3: Unrecorded Chronicles (Ja... | = Valkyria Chronicles III = \n \n Senjō no Val... |
| 1 | What is the main progression system in the game? | The player progresses through a series of line... | The player progresses through a series of line... |
| 2 | What are the five classes of troops mentioned ... | The five classes of troops are Scouts, Shocktr... | Troops are divided into five classes : Scouts ... |
| 3 | Who is the commanding officer of the Nameless? | Ramsey Crowe | Hounded by both allies and enemies , and combi... |
| 4 | Why was Valkyria Chronicles III developed for ... | The team wanted to refine the mechanics create... | Like its predecessor , Valkyria Chronicles III... |
| 5 | Who produced the anime opening? | Production I.G. | The anime opening was produced by Production I... |
| 6 | When was the game Valkyria Chronicles III rele... | The game was released on January 27, 2011. | During the publicity , story details were kept... |
| 7 | Who wrote the 'Play Test' article for 4Gamer.n... | Naohiko Misuosame | 4Gamer.net writer Naohiko Misuosame , in a " P... |
| 8 | What is the full title of the anime 'Senjō no ... | Senjō no Valkyria 3: Taga Tame no Jūsō (戦場のヴァル... | Titled Senjō no Valkyria 3 : Taga Tame no Jūsō... |
| 9 | Who sang the ending theme 'Someday the Flowers... | Minami Kuribayashi | The ending theme , " Someday the Flowers of Li... |


In [19]:
gen_dataset.to_csv("generated_qa.csv", index=False)
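
One thing to watch when loading this file later: by default `pd.read_csv` parses empty fields as `NaN`, so the empty `ground_truth_context` of the negative samples will not come back as `""`. The sketch below shows the difference on a small stand-in dataframe written to an in-memory buffer; the row contents are placeholders, not rows from the generated dataset.

```python
import io

import pandas as pd

# Stand-in for the generated dataset: one positive row, one negative row.
df = pd.DataFrame(
    {
        "question": ["Example answerable question?", "Example unanswerable question?"],
        "ground_truth": ["Example answer.", "The text does not mention this."],
        "ground_truth_context": ["Example chunk text.", ""],
    }
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default parsing turns the empty context into NaN.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False keeps empty fields as empty strings.
buffer.seek(0)
string_read = pd.read_csv(buffer, keep_default_na=False)
```

Either `keep_default_na=False` or a `.fillna("")` on the context column after loading should restore the empty strings, which matters if later code treats an empty context as the marker for a negative sample.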