
<div style="background: linear-gradient(90deg, #00a4ef, #7fba00, #ffb900, #f25022); padding: 20px; border-radius: 10px; text-align: left; color: black;">
    <h1> üîç | Step 0: Simulate Datasets For Evaluation </h1>
</div>

<p>
You want to evaluate the quality, safety, and agentic efficiency of your application. To do this, you need three things:
1. A dataset of prompts - to serve as inputs
2. A related set of context items - to serve as ground truth
3. The actual response from the model/agent being evaluated 

In this section, we look at how the Simulator capability of the evaluation SDK can be used with a valid source (e.g., the Zava Products index) to identify a valid set of questions and grounding context. These can then be fed to the model or agent being tested, to retrieve responses that can be **judged** against various evaluation metrics.

The simulator can _also_ collect responses from a targeted model - generating a dataset that has {question, response, context} triples that can be used for other purposes like fine-tuning.
</p>
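
As a rough illustration of the target format, one {question, response, context} record can be written as a single JSON line. The field names and sample values below are assumptions made up for this sketch; the simulator's actual output schema may differ.

```python
import json

# Hypothetical record: field names and values are illustrative only.
record = {
    "query": "Which primer works on bare metal?",
    "response": "The context lists a metal primer suited to bare steel.",
    "context": "Example grounding text retrieved from a product index.",
}

# JSONL stores one JSON object per line, so each record is serialized
# without embedded newlines.
json_line = json.dumps(record)
print(json_line)
```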

---

## Overview

This notebook demonstrates how to generate a synthetic dataset of queries and responses from your Azure AI Search index using the Simulator tool. The generated dataset can be useful for:

- Testing and evaluating RAG workflows
- Fine-tuning prompts
- Benchmarking search capabilities
- Creating synthetic training data


## Pre-Requisites

1. An Azure OpenAI model deployment (chat completion)
1. An Azure AI Search index ("contoso-products")

---

## 1. Setup Environment

This section loads and validates all required Azure service credentials from environment variables. The code will:
- Load environment variables from the `.env` file using `dotenv`
- Check that all required Azure credentials are available
- Initialize connection parameters for Azure AI Search
- Configure the Azure OpenAI model for the simulator

In [None]:
# 1. Load environment variables and verify Azure services configuration
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Verify all required Azure service credentials are available
required_vars = {
    "AZURE_OPENAI_API_KEY": "Azure OpenAI API key for chat completions",
    "AZURE_OPENAI_ENDPOINT": "Azure OpenAI service endpoint URL", 
    "AZURE_OPENAI_API_VERSION": "Azure OpenAI API version",
    "AZURE_OPENAI_DEPLOYMENT": "Azure OpenAI chat model deployment name",
    "AZURE_SEARCH_ENDPOINT": "Azure AI Search service endpoint URL",
    "AZURE_SEARCH_API_KEY": "Azure AI Search service API key",
    "AZURE_SEARCH_INDEX_NAME": "Azure AI Search index name containing product data"
}

print("🔍 Checking environment configuration...")
missing_vars = []
for var, description in required_vars.items():
    if not os.environ.get(var):
        missing_vars.append(f"❌ {var}: {description}")
    else:
        print(f"✅ {var}: Configured")

if missing_vars:
    print("\n⚠️ Missing required environment variables:")
    for var in missing_vars:
        print(var)
    raise EnvironmentError("Please set all required environment variables in your .env file")
else:
    print("\n🎉 All environment variables are properly configured!")

In [None]:
# 2. Initialize Azure AI Search connection parameters
search_endpoint = os.environ.get("AZURE_SEARCH_ENDPOINT")
search_api_key = os.environ.get("AZURE_SEARCH_API_KEY") 
search_index_name = os.environ.get("AZURE_SEARCH_INDEX_NAME")

print(f"🔎 Azure AI Search Configuration:")
print(f"   Endpoint: Configured ({search_endpoint.split('//')[1].split('.')[0] if search_endpoint else 'Not found'})")
print(f"   Index: {search_index_name if search_index_name else 'Not configured'}")
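
The cell above extracts the service name by splitting the endpoint on `//`, which raises an `IndexError` if the value has no scheme. A slightly more defensive variant could look like the sketch below. The helper name `service_name_from_endpoint` and the example hostname are hypothetical and not part of the notebook.

```python
from typing import Optional
from urllib.parse import urlparse

def service_name_from_endpoint(endpoint: Optional[str]) -> str:
    """Return the first hostname label of an endpoint URL, or 'Not found'."""
    if not endpoint:
        return "Not found"
    # urlparse only populates hostname when the "//" separator is present,
    # so fall back to treating the raw value as a bare hostname.
    host = urlparse(endpoint).hostname or urlparse(f"//{endpoint}").hostname
    return host.split(".")[0] if host else "Not found"

# Placeholder endpoint used purely for illustration.
print(service_name_from_endpoint("https://my-search.search.windows.net"))
```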

---

## 2. Initialize the Simulator

This section creates the Azure AI Evaluation Simulator that will generate synthetic datasets.

### 2.1 Create a Model Configuration

The code below creates an Azure OpenAI model configuration using environment variables. This configuration will be used by the simulator to generate synthetic queries and responses.

In [None]:
# 3. Configure Azure OpenAI model for the simulator
from azure.ai.evaluation import AzureOpenAIModelConfiguration

# Create model configuration using environment variables
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    azure_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT"), 
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
)

print(f"🤖 Azure OpenAI Model Configuration:")
endpoint = model_config['azure_endpoint']
print(f"   Endpoint: Configured ({endpoint.split('//')[1].split('.')[0] if endpoint else 'Not found'})")
print(f"   Deployment: {model_config['azure_deployment'] if model_config['azure_deployment'] else 'Not configured'}")
print(f"   API Version: {model_config['api_version'] if model_config['api_version'] else 'Not configured'}")

### 2.2 Instantiate Simulator with the model

This code creates the Azure AI Evaluation Simulator instance using the model configuration from above. The simulator will use this configuration to generate synthetic queries and responses for evaluation purposes.

In [None]:
# 4. Initialize the Azure AI Evaluation Simulator
from azure.ai.evaluation.simulator import Simulator

# Create simulator instance with the configured model
simulator = Simulator(model_config=model_config)
print("📊 Simulator initialized successfully!")

---

## 3. Connect to the Search Index

This section establishes a connection to Azure AI Search and defines a function to retrieve relevant content.

### 3.1 Define function to retrieve search results for query

The function below performs the following operations:
- Constructs Azure AI Search API requests with proper authentication
- Searches the index for relevant content based on user queries
- Handles API responses and error conditions
- Returns combined content from search results for use in RAG workflows

In [None]:
# 5. Define function to search Azure AI Search index and retrieve content
import requests
import json

def search_index_for_content(query: str, top_k: int = 5) -> str:
    """
    Search the Azure AI Search index for relevant content based on a query.
    
    Args:
        query (str): Search query to find relevant content
        top_k (int): Number of top results to retrieve (default: 5)
    
    Returns:
        str: Combined content from search results, truncated to 1000 characters
    """
    # Construct the search API endpoint
    search_url = f"{search_endpoint}/indexes/{search_index_name}/docs/search?api-version=2023-11-01"
    
    # Set up request headers with API key authentication
    headers = {
        "Content-Type": "application/json",
        "api-key": search_api_key
    }
    
    # Define the search query payload
    search_payload = {
        "search": query,
        "top": top_k,
        "select": "content,title"  # Fields to return; these must exist in the index schema
    }
    
    try:
        # Execute the search request
        response = requests.post(url=search_url, headers=headers, json=search_payload)
        response.raise_for_status()  # Raise an exception for HTTP error codes
        
        # Parse the search results
        search_results = response.json()
        combined_content = ""
        
        # Extract and combine content from search results
        for result in search_results.get("value", []):
            # Prioritize 'content' field, fall back to 'title'
            content = result.get("content") or result.get("title", "")
            if content:
                combined_content += content + " "
        
        # Limit content length to stay within token limits and improve performance
        return combined_content[:1000].strip()
    
    except requests.exceptions.RequestException as e:
        print(f"❌ Error searching index: {e}")
        return f"Error retrieving content for query: {query}"
    except json.JSONDecodeError as e:
        print(f"❌ Error parsing search response: {e}")
        return f"Error processing search results for query: {query}"
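
To see what the combine-and-truncate step of the function above does without calling the service, the logic can be isolated and run against a hand-made payload. This is a minimal sketch: `combine_results` and `sample_hits` are hypothetical names, and the sample documents are invented.

```python
from typing import Any, Dict, List

def combine_results(results: List[Dict[str, Any]], max_chars: int = 1000) -> str:
    """Join the 'content' (or fallback 'title') of each hit, then truncate."""
    parts = []
    for result in results:
        # Prefer 'content'; fall back to 'title' when content is missing or empty.
        text = result.get("content") or result.get("title", "")
        if text:
            parts.append(text)
    return " ".join(parts)[:max_chars].strip()

# Hand-made payload shaped like the "value" array read by the search function.
sample_hits = [
    {"title": "Item A", "content": "First document body."},
    {"title": "Item B", "content": None},
    {"title": "", "content": ""},
]
combined = combine_results(sample_hits)
print(combined)
```

Note that truncation is by character count, so the last document can be cut mid-sentence.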

### 3.2 Test the function works with a query

This cell tests the search function with a sample query to verify that:
- The Azure AI Search connection is working properly
- The search index contains retrievable content
- The function returns relevant results for the query

In [None]:
# 6. Test the search function with a sample query
test_query = "spray paint"  # Sample product query; change it to match content in your index
retrieved_content = search_index_for_content(test_query)

print(f"🔍 Test search for: '{test_query}'")
print(f"📄 Retrieved content length: {len(retrieved_content)} characters")
print(f"📝 Sample content preview:")
print("-" * 50)
print(retrieved_content[:300] + "..." if len(retrieved_content) > 300 else retrieved_content)

---

## 4. Create Application Callback

This section defines the RAG (Retrieval-Augmented Generation) application callback that the simulator will use to generate responses.

The callback function:
- Extracts user queries from conversation messages
- Searches the Azure AI Search index for relevant context
- Uses Azure OpenAI to generate responses based on retrieved content
- Returns properly formatted responses for the simulator

In [None]:
# 7. Define the application callback function for the simulator
from typing import Dict, Any, Optional
from openai import AzureOpenAI

async def rag_application_callback(
    messages: Dict,
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict:
    """
    Callback function that simulates a RAG (Retrieval-Augmented Generation) application.
    
    This function:
    1. Extracts the user query from the message
    2. Searches the Azure AI Search index for relevant content
    3. Uses Azure OpenAI to generate a response based on the retrieved content
    4. Returns the response in the expected format
    
    Args:
        messages (Dict): Message history containing user queries
        stream (bool): Whether to stream the response (not used in this implementation)
        session_state (Any): Session state information
        context (Optional[Dict[str, Any]]): Additional context information
    
    Returns:
        Dict: Response containing the generated message and metadata
    """
    # Extract the user's query from the latest message
    messages_list = messages["messages"]
    user_query = messages_list[-1]["content"]
    
    # Initialize Azure OpenAI client
    openai_client = AzureOpenAI(
        azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
        api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    )
    
    # Retrieve relevant content from the search index
    retrieved_context = search_index_for_content(user_query)
    
    # Create a system prompt that instructs the model to use the retrieved context
    system_prompt = """You are a polite and helpful assistant that answers questions based on the provided context. 
Use the context information to provide accurate and relevant responses. If the context doesn't contain 
enough information to answer the question, say so politely. If the context mentions a product by name, reference it in the response."""
    
    # Generate response using Azure OpenAI
    try:
        completion = openai_client.chat.completions.create(
            model=os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Context: {retrieved_context}"},
                {"role": "user", "content": f"Question: {user_query}"}
            ],
            max_tokens=500,
            temperature=0.7,
        )
        
        # Extract the generated response
        ai_response = completion.choices[0].message.content
        
    except Exception as e:
        ai_response = f"Sorry, I encountered an error while generating a response: {str(e)}"
    
    # Format the response according to the expected structure
    response_message = {
        "content": ai_response,
        "role": "assistant",
        "context": retrieved_context,
    }
    
    # Add the response to the message history
    messages["messages"].append(response_message)
    
    # Return the complete response structure
    return {
        "messages": messages["messages"], 
        "stream": stream, 
        "session_state": session_state, 
        "context": retrieved_context
    }
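
The input and output shapes of the callback can be checked offline with a stub that makes no network calls. The sketch below assumes the same `{"messages": [...]}` structure the callback above reads; `stub_callback`, `run_once`, and the sample query are hypothetical.

```python
import asyncio
from typing import Any, Dict, Optional

async def stub_callback(
    messages: Dict,
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict:
    """Offline stand-in for the RAG callback: echoes the query, no network calls."""
    user_query = messages["messages"][-1]["content"]
    fake_context = "placeholder context"  # stands in for retrieved search results
    messages["messages"].append(
        {"role": "assistant", "content": f"Echo: {user_query}", "context": fake_context}
    )
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": fake_context,
    }

def run_once(payload: Dict) -> Dict:
    """Run the stub from a plain script (asyncio.run does not work inside a notebook)."""
    return asyncio.run(stub_callback(payload))

# Input shaped the way the callback reads it.
example_input = {"messages": [{"role": "user", "content": "Do you sell primer?"}]}
```

In a notebook cell, where an event loop is already running, call it with `result = await stub_callback(example_input)` instead of `run_once`.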

---

## 5. Generate & Save Dataset

This section runs the simulator to create synthetic query-response pairs and saves them for evaluation.

### 5.1 Define tasks and run the simulator

The code below:
- Uses content retrieved from the search index as seed material
- Generates realistic queries based on the content
- Creates responses using the RAG application callback
- Produces a dataset of query-response pairs with context for evaluation

In [None]:
# 8. Generate synthetic dataset using the simulator
from pathlib import Path

print("🎯 Starting dataset generation...")
print("This process will:")
print("1. Use content from your search index to generate realistic queries")
print("2. Generate responses using your RAG application callback")
print("3. Create query-response pairs for evaluation purposes")
print()

# Run the simulator to generate synthetic data
# Note: Using the retrieved_content from our previous test as seed content
synthetic_outputs = await simulator(
    target=rag_application_callback,  # Our RAG application function
    text=retrieved_content,           # Seed content from the search index
    num_queries=5,                    # Number of query-response pairs to generate
    max_conversation_turns=1,         # Keep conversations simple (single turn)
)

print(f"✅ Generated {len(synthetic_outputs)} synthetic query-response pairs!")
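
The run above seeds the simulator with the result of a single test query. If broader coverage is wanted, seed text could be assembled from several queries, as in the sketch below. `build_seed_text`, the `max_chars` default, and the fake index are assumptions for illustration; in the notebook, the search function defined earlier would be passed as `search_fn`.

```python
from typing import Callable, Iterable

def build_seed_text(
    queries: Iterable[str],
    search_fn: Callable[[str], str],
    max_chars: int = 3000,
) -> str:
    """Concatenate search results for several queries into one seed string."""
    chunks = []
    for query in queries:
        chunk = search_fn(query)
        # Skip empty results and exact duplicates from overlapping queries.
        if chunk and chunk not in chunks:
            chunks.append(chunk)
    return "\n\n".join(chunks)[:max_chars]

# Fake search function so the sketch runs offline.
fake_index = {"paint": "Text about paint.", "brush": "Text about brushes.", "roller": ""}
seed_text = build_seed_text(fake_index.keys(), lambda q: fake_index[q])
print(seed_text)
```

Because the notebook's search function returns an error message string on failure, those strings would need to be filtered out before the result is used as seed text.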

### 5.2 Save the generated dataset

This code saves the generated synthetic dataset to a JSONL file for use in evaluation workflows. Each line contains a query-response pair with context information.

In [None]:
# 9. Save the generated dataset to file
output_file = Path("lab-simulate-datasets.jsonl")

print(f"💾 Saving dataset to: {output_file.absolute()}")

# Write each output as a JSON line to the file
with output_file.open("w") as f:
    for output in synthetic_outputs:
        f.write(output.to_eval_qr_json_lines())

print(f"✅ Dataset successfully saved!")
print(f"📁 File location: {output_file.absolute()}")
print(f"📊 Total records: {len(synthetic_outputs)}")

### 5.3 Review the generated dataset

This code loads and displays a preview of the generated dataset using pandas to help verify:
- The dataset structure and format
- Sample query-response pairs
- Data quality and relevance

In [None]:
# 10. Preview the generated dataset
import pandas as pd

print("📋 Dataset Preview:")
print("=" * 50)

# Load and display the first few records
try:
    dataset_df = pd.read_json(output_file, lines=True)
    
    # Display basic dataset information
    print(f"Dataset shape: {dataset_df.shape}")
    print(f"Columns: {list(dataset_df.columns)}")
    print()
    
    # Show first few records with limited content for readability
    display_df = dataset_df.head(3).copy()
    
    # Truncate long content for better display
    for col in display_df.columns:
        if display_df[col].dtype == 'object':
            display_df[col] = display_df[col].apply(
                lambda x: str(x)[:100] + "..." if len(str(x)) > 100 else str(x)
            )
    
    # A bare expression inside try/except is not auto-rendered by the notebook, so print it
    print(display_df)
    
except Exception as e:
    print(f"❌ Error reading dataset: {e}")
    print("The dataset file may not have been created successfully.")

### 5.4 Review the saved dataset file

Manual inspection step:
- Open the `lab-simulate-datasets.jsonl` file in your Visual Studio Code editor
- Examine the structure of generated {query-response-context} lines 
- Verify that the synthetic data is relevant and useful for evaluation purposes
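
The manual check can be complemented by a small script that confirms every line of the file parses as JSON. This is a minimal sketch using only the standard library; `load_jsonl` is a hypothetical helper, and the demo records are made up, so point it at your own dataset file in practice.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

def load_jsonl(path: Path) -> List[Dict]:
    """Parse every non-blank line as JSON; raise ValueError naming the bad line."""
    records = []
    with path.open("r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as e:
                raise ValueError(f"Line {line_number} is not valid JSON: {e}") from e
    return records

# Demo on a throwaway file with invented records.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.jsonl"
    demo_path.write_text(
        '{"query": "q1", "response": "r1"}\n\n{"query": "q2", "response": "r2"}\n',
        encoding="utf-8",
    )
    demo_records = load_jsonl(demo_path)

print(len(demo_records), sorted(demo_records[0].keys()))
```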

---

## 6. Next Steps

Now that you have generated a synthetic evaluation dataset, you can:

1. **Evaluate retrieval quality** - Use the dataset to test how well your search index retrieves relevant information
2. **Fine-tune prompts** - Analyze common query patterns to improve your system prompts
3. **Create test cases** - Use generated queries as test cases for your RAG application
4. **Identify improvements** - Analyze the dataset to find areas where your application could be enhanced
5. **Benchmark performance** - Establish baseline metrics for your RAG system using this synthetic data