# AI Agent Evaluation by Using Dria

This notebook shows how to generate an evaluation set for your AI agents using Dria's [QA pipeline](https://docs.dria.co/factory/qa/). You can then evaluate these agents with [promptfoo](https://www.promptfoo.dev/) and inspect the evaluation and assessment results.


## Step 1: Initialization


### Install Dependencies

First, install the dependencies for this notebook by running the following code block. We ***recommend*** using your local machine instead of Google Colab, because some of the packages we use are incompatible with those preinstalled in Colab. After you create a Python virtual environment, run the following commands:

In [1]:
%pip install --upgrade pip

%pip install requests openai pandas nltk matplotlib firecrawl upstash_vector cohere python-dotenv

%pip install dria==0.0.108

Note: you may need to restart the kernel to use updated packages.
Collecting dria==0.0.108
  Downloading dria-0.0.108-py3-none-any.whl.metadata (1.8 kB)
Downloading dria-0.0.108-py3-none-any.whl (10.1 MB)
   ---------------------------------------- 0.0/10.1 MB ? eta -:--:--
   - -------------------------------------- 0.3/10.1 MB ? eta -:--:--
   ------ --------------------------------- 1.6/10.1 MB 5.2 MB/s eta 0:00:02
   ------------- -------------------------- 3.4/10.1 MB 6.7 MB/s eta 0:00:01
   -------------------- ------------------- 5.2/10.1 MB 7.2 MB/s eta 0:00:01
   --------------------------- ------------ 6.8/10.1 MB 7.4 MB/s eta 0:00:01
   ---------------------------------- ----- 8.7/10.1 MB 7.6 MB/s eta 0:00:01
   ---------------------------------------- 10.1/10.1 MB 7.5 MB/s eta 0:00:00
Installing collected packages: dria
  Attempting uninstall: dria
    Found existing installation: dria 0.0.107
    Uninstallin

### Set Environment Variables

To run and use external applications in this notebook, you need to have API keys. You can obtain API keys from the following providers' websites:

- **Firecrawl**: https://www.firecrawl.dev/

- **Jina Reader**: https://jina.ai/reader/

- **Upstash**: https://upstash.com/docs/introduction

- **Cohere**: https://cohere.com/

- **OpenAI**: https://openai.com/

- **Open Router**: https://openrouter.ai/docs/api-keys

After obtaining these keys, create a *.env* file with the following content. `python-dotenv` reads `KEY=value` lines, so use `=` rather than a YAML-style colon:

```bash
FIRECRAWL_KEY="<YOUR_FIRECRAWL_API_KEY>"
OPEN_ROUTER_KEY="<YOUR_OPEN_ROUTER_API_KEY>"
JINA_KEY="<YOUR_JINA_READER_API_KEY>"
OPENAI_KEY="<YOUR_OPENAI_API_KEY>"
COHERE_KEY="<YOUR_COHERE_API_KEY>"
UPSTASH_KEY="<YOUR_UPSTASH_KEY>"
```

You **do not** need any API keys to use Dria itself. If you intend to run this notebook solely to generate an evaluation set with Dria, you can skip obtaining the API keys. To load the keys from the *.env* file into variables, run the following block:

In [4]:
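import os
from dotenv import load_dotenv

# Load the variables defined in the .env file into the environment
load_dotenv()

firecrawl_api_key = os.getenv("FIRECRAWL_KEY")
upstash_key = os.getenv("UPSTASH_KEY")
jina_api_key = os.getenv("JINA_KEY")
openai_api_key = os.getenv("OPENAI_KEY")
cohere_api_key = os.getenv("COHERE_KEY")
open_router_key = os.getenv("OPEN_ROUTER_KEY")

# os.getenv returns None for a missing variable instead of raising,
# so check the values explicitly rather than relying on try/except
key_names = ["FIRECRAWL_KEY", "UPSTASH_KEY", "JINA_KEY", "OPENAI_KEY", "COHERE_KEY", "OPEN_ROUTER_KEY"]
missing = [name for name in key_names if not os.getenv(name)]

if missing:
    print(f"Missing keys: {', '.join(missing)}")
else:
    print("All keys loaded successfully")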


All keys loaded successfully


## Step 2: Get Proprietary Data for QA Generation

To generate the evaluation set, we first need proprietary data. This step provides the inputs that the QA pipeline requires: contexts and personas. We therefore extract contexts from documents and generate a personas dataset. Both datasets must be structured appropriately for use with Dria.

### Generate Context Data for QA

One effective way to get this data is to scrape documents from specific web domains. In this notebook, Firecrawl maps the URLs under a domain, and Jina Reader fetches the content of each mapped URL (or of a single URL that you provide). We use Dria's documentation as the proprietary data. The result is saved in the following format:

```json
{
    "url": "url",
    "content": "response.text"
}
```

where *url* represents the document's URL and *content* is the fetched content of the document in Markdown format.

In [5]:
import requests
from firecrawl import FirecrawlApp

# Initialize Firecrawl
firecrawl = FirecrawlApp(api_key=firecrawl_api_key)


def fetch_content_with_jina(urls):
    """
    Fetch content from a list of URLs using the Jina Reader.

    Args:
        urls (list): List of URLs to fetch content from.

    Returns:
        list: List of dictionaries containing URL and content.
    """
    content_data = []
    for url in urls:
        try:
            headers = {
                'Authorization': f'Bearer {jina_api_key}',
                'X-Retain-Images': 'none'
            }
            response = requests.get(f'https://r.jina.ai/{url}', headers=headers, timeout=60)  # Avoid hanging on a stalled request
            if response.status_code == 200:
                content_data.append({'url': url, 'content': response.text})
            else:
                print(f"Failed to fetch content for {url}: {response.status_code}")
        except Exception as e:
            print(f"Error fetching content for {url}: {e}")
    return content_data


def scrape_single_url(url):
    """
    Scrape content from a single URL using the Jina Reader.

    Args:
        url (str): URL to scrape.

    Returns:
        dict: Dictionary containing the URL and content.
    """
    try:
        headers = {
            'Authorization': f'Bearer {jina_api_key}',
            'X-Retain-Images': 'none'
        }
        response = requests.get(f'https://r.jina.ai/{url}', headers=headers, timeout=60)  # Avoid hanging on a stalled request
        if response.status_code == 200:
            print(f"Successfully fetched content for {url}")
            return {'url': url, 'content': response.text}
        else:
            print(f"Failed to fetch content for {url}: {response.status_code}")
            return None
    except Exception as e:
        print(f"Error scraping {url}: {e}")
        return None


def map_and_scrape_domain(domain):
    """
    Map all URLs under a domain using Firecrawl and fetch content.

    Args:
        domain (str): The domain to map and scrape.

    Returns:
        list: List of dictionaries containing URLs and content.
    """
    try:
        # Map the domain to gather all URLs
        response = firecrawl.map_url(domain)
        print("Firecrawl Response:", response)  # Log the full response for debugging

        # Check if the 'links' key is present
        if 'links' in response:
            urls = response['links']
            print(f"Mapped {len(urls)} URLs from {domain}")

            # Fetch content for all URLs
            return fetch_content_with_jina(urls)
        else:
            print(f"Unexpected response structure: {response}")
            return []  # Return empty list if mapping fails
    except Exception as e:
        print(f"Error mapping domain {domain}: {e}")
        return []
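
A site map can contain duplicate links or links that point outside the target domain. The sketch below shows one way to filter the mapped URLs before fetching them. `filter_urls` is a hypothetical helper, not part of Firecrawl or Jina Reader, and treating URLs that differ only by a trailing slash as duplicates is an assumption.

```python
from urllib.parse import urlparse


def filter_urls(urls, domain):
    """Keep unique URLs whose host matches the host of `domain` (hypothetical helper)."""
    host = urlparse(domain).netloc.lower()
    seen = set()
    kept = []
    for url in urls:
        normalized = url.rstrip("/")
        if urlparse(normalized).netloc.lower() != host:
            continue  # Skip links that point outside the target domain
        if normalized in seen:
            continue  # Skip duplicates that differ only by a trailing slash
        seen.add(normalized)
        kept.append(normalized)
    return kept


links = [
    "https://docs.dria.co",
    "https://docs.dria.co/",
    "https://docs.dria.co/factory/qa",
    "https://example.com/page",
]
filtered = filter_urls(links, "https://docs.dria.co")
print(filtered)
```

If you use it, you could call it on `response['links']` inside `map_and_scrape_domain` before passing the list to `fetch_content_with_jina`.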

To simplify scraping content from web domains, you can use the command-line interface below. It lets you scrape either an entire domain or a single URL. After making your selection, enter the domain or URL accordingly. In our example, we chose to scrape the entire [Dria Docs](https://docs.dria.co/) domain.

In [5]:
import json

# A command-line interface for scraping URLs and domains
print("Choose an option:")
print("1. Scrape all URLs under a domain")
print("2. Scrape only the given URL")

choice = input("Enter your choice (1 or 2): ").strip()

if choice == "1":
    domain = input("Enter the domain (e.g., https://example.com): ").strip()
    domain_content = map_and_scrape_domain(domain)
    if domain_content:
        print("Scraping complete. Here's the content:")
        print(domain_content)

        # Save in structured JSON format
        with open("scraped_domain_content.json", "w") as f:
            json.dump(domain_content, f, indent=2)

        print("Data saved to 'scraped_domain_content.json'")
    else:
        print("No content was scraped.")
elif choice == "2":
    url = input("Enter the URL to scrape: ").strip()
    if url:
        result = scrape_single_url(url)
        if result:
            print("Scraping complete. Here's the content:")
            print(result)

            # Save in structured JSON format
            with open("scraped_single_url_content.json", "w") as f:
                json.dump([result], f, indent=2)
                
            print("Data saved to 'scraped_single_url_content.json'")
        else:
            print("Failed to scrape the URL.")
    else:
        print("Invalid URL.")
else:
    print("Invalid choice. Exiting.")

Choose an option:
1. Scrape all URLs under a domain
2. Scrape only the given URL
Firecrawl Response: {'success': True, 'links': ['https://docs.dria.co', 'https://docs.dria.co/installation', 'https://docs.dria.co/node', 'https://docs.dria.co/quickstart', 'https://docs.dria.co/factory/search', 'https://docs.dria.co/cookbook/eval', 'https://docs.dria.co/factory/text_retrieval', 'https://docs.dria.co/cookbook/nemotron_qa', 'https://docs.dria.co/factory/quality_evolution', 'https://docs.dria.co/factory/multihopqa', 'https://docs.dria.co/factory/csv_extender', 'https://docs.dria.co/factory/instruction_backtranslation', 'https://docs.dria.co/factory/list_extender', 'https://docs.dria.co/factory/qa', 'https://docs.dria.co/factory/subtopic', 'https://docs.dria.co/factory/iterate_code', 'https://docs.dria.co/factory/text_matching', 'https://docs.dria.co/factory/web_multi_choice', 'https://docs.dria.co/factory/persona', 'https://docs.dria.co/factory/evolve_complexity', 'https://docs.dria.co/facto

The fetched contents are stored in the *scraped_domain_content.json* file. For example, the first item in the JSON file is:

In [6]:
import json

# Load the scraped content
with open("scraped_domain_content.json", "r") as f:
    content = json.load(f)

# Display the content
display(content[0])


{'url': 'https://docs.dria.co',
 'content': 'Title: What is Dria? - Dria Docs\n\nURL Source: https://docs.dria.co/\n\nMarkdown Content:\nDria is the only synthetic data infrastructure that you can balance data quality, diversity, and complexity all together in a single interface.\n\n*   A framework for creating, managing, and orchestrating synthetic data pipelines.\n*   A multi-agent network that can synthesize data from web and siloed sources.\n\n### Why use Dria?[¶](https://docs.dria.co/#why-use-dria "Permanent link")\n\nDria provides the scalable and versatile tools you need to accelerate your AI development with high-quality, diverse synthetic datasets.\n\n**No GPUs needed**:\n\nAs a network, Dria allows you to directly offload your compute needs to the network, leveraging massive parallelization. This means faster processing and more efficient resource utilization without the need for personal GPU infrastructure.\n\n**Model Rich**:\n\nDria provides flexible tools to write custom p
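
Some fetched pages may come back empty or nearly empty, which gives the QA pipeline little to work with. The following is a minimal sketch of a sanity check on the scraped records, assuming the `url`/`content` format shown above. `clean_records` and the `min_chars` threshold are illustrative assumptions, not part of Dria or Jina Reader.

```python
def clean_records(records, min_chars=200):
    """Keep records that have a URL and at least `min_chars` characters of content."""
    cleaned = []
    for record in records:
        url = record.get("url")
        content = (record.get("content") or "").strip()
        if url and len(content) >= min_chars:
            cleaned.append({"url": url, "content": content})
    return cleaned


# Toy records for illustration only
sample = [
    {"url": "https://docs.dria.co", "content": "Dria is a synthetic data infrastructure. " * 10},
    {"url": "https://docs.dria.co/empty", "content": "   "},
    {"content": "x" * 500},  # Missing URL
]
usable = clean_records(sample)
print(f"{len(usable)} of {len(sample)} records kept")
```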

### Generate Personas Data with Dria

The next step for generating proprietary data is obtaining a personas dataset to feed into the QA pipeline. Dria provides a [Persona Pipeline](https://docs.dria.co/factory/persona/) made of four singletons that generate backstories or bios for characters based on their traits and simulation descriptions. In the pipeline, there are two schemas: PersonaBio for short bios and PersonaBackstory for longer backstories. In this notebook, we generate short bios that describe each character's traits and background knowledge.

Additionally, you can use one or more language models to generate these bios, and you can replace the simulation description with your own. To keep the output structured, we save the generated dataset in JSON format.

In [7]:
from dria import DriaDataset, DatasetGenerator, Model
from dria.factory.persona import PersonaBio

# Create Dria Dataset with any name and description you want
my_dataset = DriaDataset(
    name="personas-dataset-v8", 
    description="A persona dataset for agent evaluation by using QA pipeline",
    schema=PersonaBio[-1].OutputSchema,
)

# Create generator
generator = DatasetGenerator(dataset=my_dataset)

# Define your simulation description
simulation_desc = """AI engineers and researchers trying to generate high-quality synthetic data with Dria. Dria is the only synthetic data infrastructure that you can balance data quality, diversity, and complexity all together in a single interface."""

# Define instructions with simulation description and number of samples
instructions = [
    {
        "simulation_description": simulation_desc,
        "num_of_samples": 10,
    }
]

# Generate personas data using the generator
# You can use a single model or a list of models
await generator.generate(
    instructions=instructions,
    singletons=PersonaBio,
    models=[
        Model.ANTHROPIC_SONNET_3_5_OR,
        Model.QWEN2_5_72B_OR,
        Model.GPT4O,
    ],
)

# Export results using to_json() method
with open("personas.json", "w") as f:
    my_dataset.to_json(f)

# Print first item as example
df = my_dataset.to_pandas()
print("\nExample Generated Persona:")
print(df.iloc[0]['bio'])

  from .autonotebook import tqdm as notebook_tqdm
Fetching results...: 100%|██████████| 1/1 [02:08<00:00, 128.98s/it]
Adding entries to DB: 100%|██████████| 10/10 [00:00<00:00, 195.75it/s]
Fetching results...:  90%|█████████ | 9/10 [01:10<00:07,  7.84s/it]
Adding entries to DB: 100%|██████████| 10/10 [00:00<00:00, 290.96it/s]
Adding entries to DB: 100%|██████████| 10/10 [00:00<00:00, 293.51it/s]


Example Generated Persona:
At 42, Kai Ngata, a retired Native Hawaiian who once juggled college studies with a rewarding career, now faces frequent healthcare visits for unresolved health concerns, often frustrated by long wait times at the clinic 14 miles away, yet navigates the system with a discerning eye and moderate tech usage, reflecting a deep-rooted resilience and tenacity earned through life's twists and turns, supported by Medicaid and with the unwavering partnership of a beloved spouse by their side.





2024-12-22 20:51:46,646 - ERROR - Failed to get content topic results: 429, message='Attempt to decode JSON with unexpected mimetype: text/plain; charset=utf-8', url='https://community.rpc.dria.co/rpc/results'
2024-12-22 20:51:46,648 - ERROR - Failed to get content topic pong: 429, message='Attempt to decode JSON with unexpected mimetype: text/plain; charset=utf-8', url='https://community.rpc.dria.co/rpc/pong'
2024-12-22 20:51:46,649 - ERROR - Error during heartbeat process: 429, message='Attempt to decode JSON with unexpected mimetype: text/plain; charset=utf-8', url='https://community.rpc.dria.co/rpc/pong' (Topic: pong)
Traceback (most recent call last):
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\dria\request\rest.py", line 74, in get_content_topic
    res_json = await response.json()
               ^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\aiohttp\client_reqrep.py", line 1281, in json
    raise ContentT

### Combine Personas with Context Randomly

The last step for generating proprietary data is to combine the personas data with the context data. This gives us comprehensive inputs for the QA pipeline. We select personas randomly to reduce bias in the data. The resulting JSON file has the following format:

```json
  {
    "persona_bio": "<PERSONA BIO>",
    "context": "<FETCHED CONTENT>",
    "url": "<URL OF THE CONTENT>"
  }

```

In [8]:
import json
import random

# Load scraped content data from a JSON file as contexts
scraped_content_path = "scraped_domain_content.json"  # Replace with your actual path to the scraped content

with open(scraped_content_path, "r") as f:
    scraped_content = json.load(f)

# Load personas from a JSON file
personas_path = "personas.json"  # Replace with your actual path to the personas dataset

with open(personas_path, "r") as f:
    personas = json.load(f)

# Combine personas and scraped contexts
combined_data = []
for item in scraped_content:
    persona = random.choice(personas)  # Randomly select a persona
    combined_data.append({
        "persona_bio": persona["bio"], # Include the persona bio
        "context": item["content"], # Use the scraped content as the context
        "url": item["url"],  # Include the URL for reference
    })

# Save combined data to a JSON file
combined_json_path = "combined_data.json"
with open(combined_json_path, "w") as f:
    json.dump(combined_data, f, indent=2)

As an example, the first item of the combined data is:

In [9]:
import json

# Load the combined data
with open("combined_data.json", "r") as f:
    combined_data = json.load(f)

# Display the combined data
display(combined_data[0])

{'persona_bio': "Dr. Smith's patient, a single 78-year-old Native Hawaiian woman pursuing a Master’s degree while balancing a complicated pregnancy, adeptly navigates the challenges of modern healthcare with private insurance, frequent five-mile trips to her healthcare provider, and a keen eye on the cost of services despite her dissatisfaction with facility cleanliness, all while marveling at the integration of synthetic data in shaping her personalized healthcare experiences.",
 'context': 'Title: What is Dria? - Dria Docs\n\nURL Source: https://docs.dria.co/\n\nMarkdown Content:\nDria is the only synthetic data infrastructure that you can balance data quality, diversity, and complexity all together in a single interface.\n\n*   A framework for creating, managing, and orchestrating synthetic data pipelines.\n*   A multi-agent network that can synthesize data from web and siloed sources.\n\n### Why use Dria?[¶](https://docs.dria.co/#why-use-dria "Permanent link")\n\nDria provides the 
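
Because `random.choice` draws from the global random state, each run of the cell above pairs personas with contexts differently. If you want a repeatable evaluation set, one option is a seeded local generator. This is a minimal sketch: `assign_personas` and the seed value are assumptions for illustration, and the toy inputs stand in for the loaded JSON files.

```python
import random


def assign_personas(contexts, personas, seed=42):
    """Pair each context with a randomly chosen persona, reproducibly for a given seed."""
    rng = random.Random(seed)  # Local generator; does not touch the global random state
    return [
        {
            "persona_bio": rng.choice(personas)["bio"],
            "context": item["content"],
            "url": item["url"],
        }
        for item in contexts
    ]


# Toy inputs for illustration only
toy_contexts = [{"url": f"https://example.com/{i}", "content": f"page {i}"} for i in range(4)]
toy_personas = [{"bio": "persona A"}, {"bio": "persona B"}, {"bio": "persona C"}]

first_run = assign_personas(toy_contexts, toy_personas)
second_run = assign_personas(toy_contexts, toy_personas)
print(first_run == second_run)
```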

## Step 3: Generate an Evaluation Dataset by Using Dria




In this step, we enrich the dataset by generating question-answer (QA) pairs based on the provided persona and context. To accomplish this, we use the following singletons:

- QuestionGeneration: Generates questions derived from the persona's bio and the given context.
- AnswerGeneration: Produces answers using the persona's bio, the context, and the generated questions.

This approach ensures that each QA pair is tailored to the specific persona and context, creating a well-defined evaluation set for the AI agent. The output data has the following format:

```json
  {
    "persona_bio": "<PERSONA BIO>",
    "context": "<FETCHED CONTENT>",
    "url": "<URL OF THE CONTENT>",
    "question": "<GENERATED QUESTION>",
    "answer": "<GENERATED ANSWER>"
  }

```

In [None]:
from dria import DriaDataset, DatasetGenerator, Model
from qa_pipeline.question import QuestionGeneration
from qa_pipeline.answer import AnswerGeneration
import json
import os

# If the JSON file is in the same directory as the notebook
json_file_path = 'combined_data.json'

# Check if the file exists to prevent errors
if not os.path.exists(json_file_path):
    raise FileNotFoundError(f"The file {json_file_path} does not exist.")

# Open and load the JSON data
with open(json_file_path, 'r', encoding='utf-8') as file:
    instructions = json.load(file)

# Optional: Verify the loaded data
print(f"Loaded {len(instructions)} instruction(s) from {json_file_path}.")

# Initialize the dataset
my_dataset = DriaDataset(
    name="QA_dataset-v4",
    description=" ",
    schema=AnswerGeneration.OutputSchema
)


# Initialize the generator
generator = DatasetGenerator(dataset=my_dataset)

# Run the asynchronous generate function using await
await generator.generate(
    instructions=instructions,
    singletons=[QuestionGeneration, AnswerGeneration],
    models=[Model.GPT4O,Model.GEMINI_15_FLASH,Model.GPT4O_MINI]
)

# Export the dataset to JSON
my_dataset.to_json("QA_dataset.json")

  from .autonotebook import tqdm as notebook_tqdm


Loaded 48 instruction(s) from combined_data.json.


Fetching results...:  98%|█████████▊| 47/48 [05:55<00:07,  7.57s/it]
Adding entries to DB: 100%|██████████| 47/47 [00:00<00:00, 257.04it/s]
Fetching results...: 100%|██████████| 47/47 [03:05<00:00,  3.95s/it]
Adding entries to DB: 100%|██████████| 47/47 [00:00<00:00, 259.19it/s]
Adding entries to DB: 100%|██████████| 47/47 [00:00<00:00, 259.17it/s]


2024-12-22 23:14:44,295 - ERROR - Failed to get content topic pong: 
2024-12-22 23:14:44,297 - ERROR - Error during heartbeat process: 
Traceback (most recent call last):
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\aiohttp\streams.py", line 347, in _wait
    await waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\dria\client\monitor.py", line 56, in run
    await self._check_heartbeat()
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\dria\client\monitor.py", line 75, in _check_heartbeat
    topic_responses = await self.rpc.get_content_topic(HEARTBEAT_OUTPUT_TOPIC)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Sertac B. Afsari\dria-usecases\.venv\Lib\site-packages\dria\request\rest.py", line 74, in get_content_topic
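
The progress bars above suggest that fewer results came back than instructions were submitted, so it can be worth checking which inputs produced no QA pair. The following is a minimal sketch, assuming every generated row keeps the `url` field shown in the output format. `find_missing_urls` is a hypothetical helper, not part of Dria.

```python
def find_missing_urls(instructions, qa_rows):
    """Return the URLs of instructions that have no generated QA row."""
    generated = {row["url"] for row in qa_rows}
    return [item["url"] for item in instructions if item["url"] not in generated]


# Toy data for illustration only
toy_instructions = [
    {"url": "https://example.com/a"},
    {"url": "https://example.com/b"},
    {"url": "https://example.com/c"},
]
toy_qa_rows = [
    {"url": "https://example.com/a", "question": "q1", "answer": "a1"},
    {"url": "https://example.com/c", "question": "q3", "answer": "a3"},
]
missing_urls = find_missing_urls(toy_instructions, toy_qa_rows)
print(missing_urls)
```

With the real files, you would pass the contents of *combined_data.json* and *QA_dataset.json*, then rerun the generation for only the missing items.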
    

### Convert QA Pairs to Embeddings

Before the evaluation, we convert the question of each QA pair into an embedding vector. To do this, we use OpenAI's embeddings API and save the resulting Pandas DataFrame in CSV format. This allows quick and efficient comparison when we evaluate our AI agent.

In [6]:
import openai
import pandas as pd

# Load the generated dataset
df = pd.read_json("QA_dataset.json") # Replace with the actual path to the generated dataset

print("Dataset loaded successfully!")

# Save as CSV
df.to_csv("QA_dataset.csv", index=False) # You can change the filename if needed

print("Dataset in JSON saved as CSV!")

# Initialize OpenAI client
client = openai.OpenAI(api_key=openai_api_key)

# Return the embedding vector for a text, or None if the request fails
def get_embeddings(text):
    try:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
            encoding_format="float"
        )
        return response.data[0].embedding
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Load the evaluation dataset (to_csv above writes UTF-8 by default, so read it back as UTF-8)
data_frame = pd.read_csv("QA_dataset.csv", encoding='utf-8')

print("Creating embeddings for the evaluation dataset...")
data_frame['question_embedding'] = data_frame['question'].apply(get_embeddings)

# Save the embeddings
data_frame.to_csv('db_evaluation_embeddings.csv', index=False)

print("Embeddings created and saved successfully!")

Dataset loaded successfully!
Dataset in JSON saved as CSV!
Creating embeddings for the evaluation dataset...
Embeddings created and saved successfully!
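
Embedding vectors are typically compared with cosine similarity, which is 1.0 for vectors that point in the same direction and 0.0 for orthogonal ones. The sketch below illustrates the computation with toy vectors in plain Python. Whether your Upstash index uses this exact metric depends on how the index was created, so treat that as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors for illustration; real embeddings here have 1536 dimensions
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```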


## Step 4: Create a Vector Index for Embeddings

In this step, we create a vector index in Upstash and upload the evaluation embeddings generated in the previous step. The index enables vector-based search, which allows efficient retrieval by embedding similarity. Specifically, we upsert each question's embedding along with its metadata, which contains the question itself, the golden answer (the answer generated in the previous step), and the relevant context.

In [7]:
import ast

import pandas as pd
from upstash_vector import Index

# Connect to the Upstash index (replace the URL with your own index endpoint)
index = Index(
    url="https://champion-tetra-58691-eu1-vector.upstash.io",
    token=upstash_key
)

# Load the CSV file
csv_path = "db_evaluation_embeddings.csv"  # Replace with your actual file path
data = pd.read_csv(csv_path)

# Prepare the data for upsert
vectors = []
for index_, row in data.iterrows():
    vector_id = f"vector_{index_}"
    # The CSV stores each embedding as a string; parse it without executing arbitrary code
    embedding = ast.literal_eval(row["question_embedding"])
    print(f"Uploading: ID={vector_id}, Length={len(embedding)}")  # Log the vector ID and length for debugging
    metadata = {
        "Question": row["question"],
        "Golden Answer": row["answer"],
        "Context": row["context"]
    }
    vectors.append((vector_id, embedding, metadata))

# Upsert the vectors to Upstash
print("Uploading data to Upstash...")
index.upsert(vectors=vectors)
print("Data uploaded successfully!")

Uploading: ID=vector_0, Length=1536
Uploading: ID=vector_1, Length=1536
Uploading: ID=vector_2, Length=1536
Uploading: ID=vector_3, Length=1536
Uploading: ID=vector_4, Length=1536
Uploading: ID=vector_5, Length=1536
Uploading: ID=vector_6, Length=1536
Uploading: ID=vector_7, Length=1536
Uploading: ID=vector_8, Length=1536
Uploading: ID=vector_9, Length=1536
Uploading: ID=vector_10, Length=1536
Uploading: ID=vector_11, Length=1536
Uploading: ID=vector_12, Length=1536
Uploading: ID=vector_13, Length=1536
Uploading: ID=vector_14, Length=1536
Uploading: ID=vector_15, Length=1536
Uploading: ID=vector_16, Length=1536
Uploading: ID=vector_17, Length=1536
Uploading: ID=vector_18, Length=1536
Uploading: ID=vector_19, Length=1536
Uploading: ID=vector_20, Length=1536
Uploading: ID=vector_21, Length=1536
Uploading: ID=vector_22, Length=1536
Uploading: ID=vector_23, Length=1536
Uploading: ID=vector_24, Length=1536
Uploading: ID=vector_25, Length=1536
Uploading: ID=vector_26, Length=1536
Uploading: 
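
The cell above sends every vector in a single `upsert` call. For larger datasets, you may prefer to upload in smaller batches. The following is a minimal sketch of the batching logic only: `batched` is a hypothetical helper, and the batch size is an assumption that you should adjust to the limits of your Upstash plan.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy tuples in the same (id, embedding, metadata) shape as `vectors` above
toy_vectors = [(f"vector_{i}", [0.0, 1.0], {"Question": f"q{i}"}) for i in range(5)]
batches = list(batched(toy_vectors, batch_size=2))
print([len(batch) for batch in batches])

# With the real data, the upload loop would look like:
# for batch in batched(vectors):
#     index.upsert(vectors=batch)
```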

After uploading the data, you can check the result with a sample request:

In [8]:
query_vector = vectors[0][1]  # Using the first vector for querying
results = index.query(vector=query_vector, top_k=1, include_metadata=True)
print("Query Result:", results)

Query Result: [QueryResult(id='vector_0', score=1.0, vector=None, metadata={'Question': "Given my complex health situation and the five-mile round trip to my healthcare provider, how could Databricks' use of synthetic data, similar to Dria's capabilities, potentially improve the efficiency and cost-effectiveness of my personalized care while ensuring data privacy and security?", 'Golden Answer': "Databricks' use of synthetic data, leveraging capabilities similar to those of Dria, could enhance the efficiency and cost-effectiveness of your personalized healthcare in several ways, while maintaining data privacy and security:\n\n1. **Synthetic Data for Personalization**: By creating synthetic datasets that represent a wide range of health scenarios and outcomes, Databricks could enable healthcare providers to better understand your specific conditions. This would allow for highly personalized treatment plans tailored to your unique health situation, as synthetic data can be designed to re

## Step 5: Running an Evaluation with Promptfoo

In the last step, we evaluate the AI agent using promptfoo. For this process, we run three different evaluations for

- Vanilla RAG
- RAG + Jina Reranker
- RAG + Cohere Reranker

across multiple models and see which model performs best with each methodology in our use case.
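
To make the difference between these setups concrete, the sketch below contrasts a vanilla context (all retrieved documents) with a reranked context (only the best-scoring document). The word-overlap scorer is a toy stand-in for a reranker, not the Jina or Cohere model, and all function names are hypothetical.

```python
def overlap_score(question, document):
    """Toy relevance score: fraction of question words that also appear in the document."""
    q_tokens = set(question.lower().split())
    d_tokens = set(document.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & d_tokens) / len(q_tokens)


def vanilla_context(documents):
    """Vanilla RAG: pass every retrieved document to the model."""
    return "\n\n".join(documents)


def reranked_context(question, documents, top_n=1, score_fn=overlap_score):
    """RAG + reranker: keep only the `top_n` documents with the highest score."""
    ranked = sorted(documents, key=lambda doc: score_fn(question, doc), reverse=True)
    return "\n\n".join(ranked[:top_n])


docs = [
    "Dria nodes can be run on consumer hardware.",
    "The QA pipeline generates question and answer pairs from a context.",
    "Personas describe the traits of simulated users.",
]
question = "How does the QA pipeline create question and answer pairs?"
best = reranked_context(question, docs)
print(best)
```

A real reranker replaces `overlap_score` with a learned relevance model, but the surrounding flow (retrieve several candidates, score them against the question, keep the top one) stays the same.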

### Preparing Datasets and Configurations

First, we prepare our datasets:

In [9]:
import ast

import pandas as pd
import requests
from upstash_vector import Index
import cohere

# Jina Reranker endpoint and headers
jina_url = "https://api.jina.ai/v1/rerank"
jina_headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {jina_api_key}"
}

# Cohere Reranker client setup
cohere_client = cohere.Client(api_key=cohere_api_key)

# Upstash Index (replace the URL with your own index endpoint)
index = Index(
    url="https://champion-tetra-58691-eu1-vector.upstash.io",
    token=upstash_key
)

# File path
file_path = "db_evaluation_embeddings.csv" # Replace with the actual path to the Embeddings CSV

# Load the CSV
df = pd.read_csv(file_path)

# Ensure columns exist and are of type `object`
for col in ["Simple_context", "Jina_context", "Cohere_context"]:
    if col not in df.columns:
        df[col] = ""
    df[col] = df[col].astype("object")

# Function to check if embedding is valid
def is_valid_embedding(embedding):
    return isinstance(embedding, str) and not pd.isna(embedding)

# Function to fetch documents from Upstash
def fetch_documents_from_upstash(embedding):
    try:
        # Query the Upstash index
        response = index.query(
            vector=ast.literal_eval(embedding),  # Parse the stored string into a list of floats
            top_k=5,  # Number of top documents to retrieve
            include_metadata=True
        )
        print(f"Upstash Response: {response}")  # Debug: Inspect response structure

        # Extract documents from the response
        documents = []
        for item in response:  # Iterate over the list of QueryResult objects
            # The metadata upserted in Step 4 stores the document text under "Context"
            metadata = getattr(item, "metadata", None) or {}
            if "Context" in metadata:
                documents.append(metadata["Context"])

        # Combine documents into a single string
        return "|||".join(documents)
    except Exception as e:
        print(f"Error fetching documents from Upstash: {e}")
        return ""

# Function to get reranked context from Jina
def get_jina_reranked_context(question, documents):
    try:
        payload = {
            "model": "jina-reranker-v2-base-multilingual",
            "query": question,
            "top_n": 1,
            "documents": documents.split("|||")
        }
        response = requests.post(jina_url, headers=jina_headers, json=payload)
        response.raise_for_status()
        return response.json()["results"][0]["document"]["text"]
    except Exception as e:
        print(f"Jina reranker failed for question: {question}, error: {e}")
        return ""

# Function to get reranked context from Cohere
def get_cohere_reranked_context(question, documents):
    try:
        doc_list = documents.split("|||")  # Split the documents into a list
        response = cohere_client.rerank(
            model="rerank-v3.5",
            query=question,
            documents=doc_list,
            top_n=1
        )
        top_result = response.results[0]  # Get the top result (the response is an object, not a dict)
        top_index = top_result.index  # Get the index of the top-ranked document
        return doc_list[top_index]  # Return the document corresponding to the index
    except Exception as e:
        print(f"Cohere reranker failed for question: {question}, error: {e}")
        return ""

# Process each row in the CSV
for idx, row in df.iterrows():
    try:
        print(f"Processing Question: {row['question']}")

        # Check if question_embedding is valid
        if not is_valid_embedding(row["question_embedding"]):
            print(f"Skipping row {idx}: Invalid embedding.")
            continue

        # Fetch Simple Context from Upstash
        simple_context = fetch_documents_from_upstash(row["question_embedding"])
        print(f"Simple_context for row {idx}: {simple_context}")
        df.at[idx, "Simple_context"] = simple_context

        # Apply Jina Reranker
        jina_context = get_jina_reranked_context(row["question"], simple_context)
        print(f"Jina Reranker Result for row {idx}: {jina_context}")
        df.at[idx, "Jina_context"] = jina_context

        # Apply Cohere Reranker
        cohere_context = get_cohere_reranked_context(row["question"], simple_context)
        print(f"Cohere Reranker Result for row {idx}: {cohere_context}")
        df.at[idx, "Cohere_context"] = cohere_context

    except Exception as e:
        print(f"Error processing row {idx}: {e}")

# Save the updated DataFrame to CSV
df.to_csv(file_path, index=False, encoding="utf-8")
print(f"Updated CSV saved to {file_path}")

Processing Question: Given my complex health situation and the five-mile round trip to my healthcare provider, how could Databricks' use of synthetic data, similar to Dria's capabilities, potentially improve the efficiency and cost-effectiveness of my personalized care while ensuring data privacy and security?
Upstash Response: [QueryResult(id='vector_0', score=1.0, vector=None, metadata={'Question': "Given my complex health situation and the five-mile round trip to my healthcare provider, how could Databricks' use of synthetic data, similar to Dria's capabilities, potentially improve the efficiency and cost-effectiveness of my personalized care while ensuring data privacy and security?", 'Golden Answer': "Databricks' use of synthetic data, leveraging capabilities similar to those of Dria, could enhance the efficiency and cost-effectiveness of your personalized healthcare in several ways, while maintaining data privacy and security:\n\n1. **Synthetic Data for Personalization**: By crea
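
The Cohere helper above reduces to one lookup: take the first entry of the response's `results` list and use its `index` to pick the matching item from the original document list. Below is a minimal sketch of that step, assuming the response is a plain dict shaped the way the code indexes it. The `pick_top_document` helper and the sample data are hypothetical, and no API is called.

```python
def pick_top_document(response, doc_list):
    """Return the document at the top-ranked index, or "" if nothing usable came back."""
    results = response.get("results") or []
    if not results or not doc_list:
        return ""
    top_index = results[0]["index"]
    # Guard against an index that does not exist in doc_list
    if not 0 <= top_index < len(doc_list):
        return ""
    return doc_list[top_index]


# Made-up data for illustration only
docs = ["doc about pricing", "doc about synthetic data", "doc about onboarding"]
fake_response = {"results": [{"index": 1, "relevance_score": 0.92}]}

print(pick_top_document(fake_response, docs))  # doc about synthetic data
print(repr(pick_top_document({"results": []}, docs)))  # ''
```

Returning an empty string for an empty or out-of-range result mirrors the fallback in the `except` branch, so downstream rows always receive a string.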

After the updated *db_evaluation_embeddings.csv* file is saved, we prepare the YAML configuration files for Promptfoo.

In [10]:
import yaml
import os

def generate_yaml(config_name, description, context_field, csv_path, output_path, providers):
    """
    Generates a YAML configuration file for promptfoo evaluation.

    Args:
        config_name (str): Base name of the YAML file, without the extension; also used as the prompt id prefix.
        description (str): Description of the evaluation.
        context_field (str): Context field to use in the prompt (e.g., Simple_context, Jina_context, Cohere_context).
        csv_path (str): Path to the input CSV file.
        output_path (str): Path for the output evaluation results.
        providers (list): List of provider configurations with their API keys.

    Returns:
        None
    """
    yaml_data = {
        "description": description,
        "providers": providers,
        "prompts": [
            {
                "id": f"{config_name}_prompt",
                "label": f"{description} Prompt",
                "raw": f"""
Context:
{{{{{context_field}}}}}

Question:
{{{{question}}}}

Provide a detailed, accurate answer.
"""
            }
        ],
        "tests": csv_path,
        "defaultTest": {
            "assert": [
                {
                    "type": "llm-rubric",
                    "value": """
Evaluate the responses based on the following criteria against the golden answer.
Golden Answer: {{golden_answer}}
- Relevance: How well does the response answer the question?
- Completeness: Does the response fully address the question?
- Clarity: Is the response clear and coherent?

If you don't receive any reference answer, fail all models.
"""
                }
            ]
        },
        "outputPath": output_path
    }

    # Write to a YAML file
    yaml_path = f"./{config_name}.yaml"
    with open(yaml_path, 'w') as yaml_file:
        yaml.dump(yaml_data, yaml_file, sort_keys=False)
    print(f"YAML configuration saved: {yaml_path}")

# Generate YAMLs
csv_path = "db_evaluation_embeddings.csv"
output_dir = "/dria-cookbook/dria-rag-eval/"

# Provider configurations with API keys
providers = [
    {"id": "openrouter:openai/gpt-4o", "config": {"apiKey": open_router_key}},
    {"id": "openrouter:anthropic/claude-3.5-sonnet:beta", "config": {"apiKey": open_router_key}},
    {"id": "openrouter:x-ai/grok-2-1212", "config": {"apiKey": open_router_key}},
    {"id": "openrouter:meta-llama/llama-3.2-3b-instruct:free", "config": {"apiKey": open_router_key}},
    {"id": "openrouter:openai/o1", "config": {"apiKey": open_router_key}},
    {"id": "openrouter:meta-llama/llama-3.3-70b-instruct", "config": {"apiKey": open_router_key}},
]

generate_yaml(
    "simple_rag_config",
    "Simple RAG",
    "Simple_context",
    csv_path,
    os.path.join(output_dir, "simple_rag_results.csv"),
    providers
)

generate_yaml(
    "jina_reranker_config",
    "Jina Reranker RAG",
    "Jina_context",
    csv_path,
    os.path.join(output_dir, "jina_reranker_results.csv"),
    providers
)

generate_yaml(
    "cohere_reranker_config",
    "Cohere Reranker RAG",
    "Cohere_context",
    csv_path,
    os.path.join(output_dir, "cohere_reranker_results.csv"),
    providers
)

YAML configuration saved: ./simple_rag_config.yaml
YAML configuration saved: ./jina_reranker_config.yaml
YAML configuration saved: ./cohere_reranker_config.yaml
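
The stacked braces in the prompt template are easy to misread. Inside an f-string, `{{` and `}}` each produce one literal brace, so five braces around `context_field` render as a double-braced variable such as `{{Jina_context}}`, which promptfoo then fills from the CSV column of the same name. The sketch below reproduces that rendering and checks whether a CSV header contains every variable the prompt references. The `build_prompt`, `template_variables`, and `missing_columns` helpers and the sample header are hypothetical, and the real file's columns may differ.

```python
import csv
import io
import re


def build_prompt(context_field):
    # In an f-string, '{{' and '}}' each emit one literal brace, so
    # '{{{{' + '{context_field}' + '}}}}' renders as '{{<context_field>}}'
    return f"""
Context:
{{{{{context_field}}}}}

Question:
{{{{question}}}}

Provide a detailed, accurate answer.
"""


def template_variables(template):
    # Collect names that appear as {{name}} in the rendered template
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


def missing_columns(template, csv_file):
    # Compare the template variables with the CSV header row
    header = next(csv.reader(csv_file), [])
    return sorted(template_variables(template) - set(header))


prompt = build_prompt("Jina_context")
print(prompt)

# Hypothetical header for illustration only
sample_csv = io.StringIO("question,golden_answer,Simple_context\n")
print(missing_columns(prompt, sample_csv))  # ['Jina_context']
```

Running a check like this before the evaluation catches a misspelled or missing context column early, before any provider calls are made.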


### Install Promptfoo

Now, we can install Promptfoo and use it for evaluation.

In [11]:
# Install promptfoo
%env npm_config_yes=true
!npx promptfoo@latest

env: npm_config_yes=true


Usage: promptfoo [options] [command]

Options:
  -V, --version               output the version number
  -h, --help                  display help for command

Commands:
  eval [options]              Evaluate prompts
  init [options] [directory]  Initialize project with dummy files or download
                              an example
  view [options] [directory]  Start browser ui
  redteam                     Red team LLM applications
  share [options] [evalId]    Create a shareable URL of an eval (defaults to
                              most recent)
  
  auth                        Manage authentication
  cache                       Manage cache
  config                      Edit configuration settings
  debug [options]             Display debug information for troubleshooting
  delete [options] <id>       Delete various resources
  export [options] <evalId>   Export an eval record to a JSON file
  feedback [message]          Send feedback to the promptfoo developers
  generate      

We need to add the OpenAI API key to the environment to use GPT-4 as the judge model in our evaluations.

In [13]:
import os

# Add your API key to the environment
os.environ["OPENAI_API_KEY"] = openai_api_key

# Verify the API key is set and non-empty
if os.environ.get("OPENAI_API_KEY"):
    print("OpenAI API Key is set.")

OpenAI API Key is set.


Finally, we run all three evaluations and see the results:

In [14]:
import os

# Define paths to YAML configurations
yaml_paths = [
    "/dria-cookbook/dria-rag-eval/simple_rag_config.yaml",
    "/dria-cookbook/dria-rag-eval/jina_reranker_config.yaml",
    "/dria-cookbook/dria-rag-eval/cohere_reranker_config.yaml"
]

# Run evaluations for each YAML configuration
for yaml_path in yaml_paths:
    print(f"Running promptfoo eval for {yaml_path}...")
    try:
        # os.system passes the string to the shell, so the notebook-only '!' prefix is omitted
        command = f"npx promptfoo@latest eval -c {yaml_path} --no-progress-bar"
        exit_code = os.system(command)
        
        if exit_code == 0:
            print(f"Evaluation completed successfully for {yaml_path}")
        else:
            print(f"Error occurred while evaluating {yaml_path}. Exit code: {exit_code}")
    except Exception as e:
        print(f"Unexpected error during evaluation of {yaml_path}: {e}")

Running promptfoo eval for /dria-usecases/agent-evaluation/simple_rag_config.yaml...
Error occurred while evaluating /dria-usecases/agent-evaluation/simple_rag_config.yaml. Exit code: 1
Running promptfoo eval for /dria-usecases/agent-evaluation/jina_reranker_config.yaml...
Error occurred while evaluating /dria-usecases/agent-evaluation/jina_reranker_config.yaml. Exit code: 1
Running promptfoo eval for /dria-usecases/agent-evaluation/cohere_reranker_config.yaml...
Error occurred while evaluating /dria-usecases/agent-evaluation/cohere_reranker_config.yaml. Exit code: 1
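
An exit code of 1 on its own does not say what went wrong. The sketch below shows one way to get more detail: check that the configuration file exists, then run the same CLI arguments through `subprocess.run` so that stdout and stderr can be inspected. The `build_eval_command`, `run_command`, and `run_eval` helpers are hypothetical names, and resolving `npx` through `shutil.which` assumes that Node.js is on the `PATH`.

```python
import os
import shutil
import subprocess
import sys


def build_eval_command(yaml_path):
    # Same CLI arguments as the cell above, as a list (no shell, no '!' prefix)
    npx = shutil.which("npx") or "npx"
    return [npx, "promptfoo@latest", "eval", "-c", yaml_path, "--no-progress-bar"]


def run_command(args):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


def run_eval(yaml_path):
    # Fail early with a clear message if the config file is not where we expect it
    if not os.path.isfile(yaml_path):
        return 1, "", f"Config not found: {yaml_path}"
    return run_command(build_eval_command(yaml_path))


# Demonstration with the Python interpreter instead of npx, so it runs anywhere
code, out, err = run_command([sys.executable, "-c", "print('ok')"])
print(code, out.strip())  # 0 ok

# A missing config is reported without launching npx
print(run_eval("does_not_exist.yaml"))
```

Printing the captured stderr when the exit code is non-zero usually shows whether the cause is a missing file, a missing API key, or a malformed configuration.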
