# [Generate synthetic and simulated data for evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/simulator-interaction-data)
**Azure AI Evaluation SDK's** `Simulator` provides an end-to-end synthetic data generation capability to help developers test their application's response to typical user queries in the absence of production data. AI developers can use an index or text-based query generator and fully customizable simulator to create robust test datasets around non-adversarial tasks specific to their application. The `Simulator` class is a powerful tool designed to generate synthetic conversations and simulate task-based interactions. This capability is useful for:
- **Testing Conversational Applications**: Ensure your chatbots and virtual assistants respond accurately under various scenarios.
- **Training AI Models**: Generate diverse datasets to train and fine-tune machine learning models.
- **Generating Datasets**: Create extensive conversation logs for analysis and development purposes.
By automating the creation of synthetic data, the `Simulator` class helps streamline the development and testing processes, ensuring your applications are robust and reliable.

In [None]:
# !az login

In [None]:
# Constants and Libraries
import json
import os
import sys  # needed for sys.exit() below
from pprint import pprint
from typing import Any, Dict, List, Optional

from azure.identity import DefaultAzureCredential, get_bearer_token_provider  # requires azure-identity
from dotenv import load_dotenv  # requires python-dotenv
from promptflow.client import load_flow

if not load_dotenv("./../../config/credentials_my.env"):
    print("Environment variables not loaded, cell execution stopped")
    sys.exit()
os.environ["AZURE_OPENAI_API_VERSION"] = os.environ["OPENAI_API_VERSION"]

credential = DefaultAzureCredential()

In [None]:
# Initialize Azure OpenAI connection

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("MODEL_DEPLOYMENT_NAME"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
    "type": "AzureOpenAI" # NEEDED FOR \Lib\site-packages\promptflow\core\_prompty_utils.py
}
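
Because `os.environ.get` returns `None` for unset variables, a missing setting in `model_config` tends to surface only later, when the first model call fails. The following is a minimal sketch of an early check, assuming the same keys as the configuration above. `missing_config_keys` is a hypothetical helper, not part of any SDK, and the values in `example_config` are illustrative only.

```python
from typing import Any, Dict, List


def missing_config_keys(config: Dict[str, Any]) -> List[str]:
    """Return the keys whose values are None or empty strings."""
    return [key for key, value in config.items() if value in (None, "")]


# Illustrative values only; in the notebook you would pass model_config itself.
example_config = {
    "azure_endpoint": "https://example.openai.azure.com",
    "api_key": None,  # simulates an unset AZURE_OPENAI_API_KEY
    "azure_deployment": "",  # simulates an empty MODEL_DEPLOYMENT_NAME
    "api_version": "2024-06-01",
    "type": "AzureOpenAI",
}

print(missing_config_keys(example_config))
```

Running the same check on `model_config` before loading the flow would point at the exact environment variable to fix.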

## Specify application Prompty
The following `application.prompty` file specifies how a chat application behaves.

In [None]:
%%writefile ./eval_assets/application.prompty
---
name: ApplicationPrompty
description: Chat RAG application
model:
    api: chat
    parameters:
        temperature: 0.0
        top_p: 1.0
        presence_penalty: 0
        frequency_penalty: 0
        response_format:
            type: text
 
inputs:
    context:
        type: string
    query:
        type: string
    conversation_history:
        type: dict
---
system:
You are a helpful assistant and you're helping with the user's query. 
Keep the conversation engaging and interesting.

Keep your conversation grounded in the provided context: 
{{ context }}

Output a string that continues the conversation, responding to 
the latest message from the user query in the same language:
{{ query }}

given the conversation history:
{{ conversation_history }}

In [None]:
prompty_path = "eval_assets/application.prompty"
flow = load_flow(source=prompty_path, model={"configuration": model_config})

pprint(flow(context="", query="Come preparare una pizza fatta in casa", conversation_history=[]))

In [None]:
context = "Conversazione amichevole fra due persone"

conversation_history = [
    {
        "content": "Cosa fai di bello questa sera?",
        "role": "user",
        "context": "Domanda amichevole",
    },
    {
        "content": "Vado al cinema",
        "role": "assistant",
        "context": "Esposizione di un'attività ricreativa divertente",
    },
    {
        "content": "Ah fantastico! Che cosa vai a vedere?",
        "role": "user",
        "context": "Ulteriore domanda amichevole a dimostrare interesse verso l'interlocutore",
    },
    {
        "content": "Lo SQUALO di Steven Spielberg",
        "role": "assistant",
        "context": "Informazione specifica di un'attività divertente che si intende compiere",
    },
]

flow(context=conversation_history[-1]["context"], query=conversation_history[-1]["content"], conversation_history=conversation_history)

In [None]:
i = 0
for ch in conversation_history:
    print(f'Message {i} - {ch["role"]}: {ch["content"]}\n')
    i += 1
    
response = flow(context=conversation_history[-1]["context"], query=conversation_history[-1]["content"], conversation_history=conversation_history)

conversation_history.append({"content": response, "role": "user" if conversation_history[-1]["role"]=="assistant" else "assistant", "context": ""})
print(f'Message {i} - {conversation_history[-1]["role"]}: {conversation_history[-1]["content"]}\n')

## Specify target callback to simulate against
You can simulate against any application endpoint by specifying a target callback function. The following callback targets an application that is an LLM driven by a Prompty file such as `application.prompty`.

In [None]:
async def callback(
    messages: Dict,
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
    subfolder: str = "eval_assets",
) -> dict:
    messages_list = messages["messages"]
    latest_message = messages_list[-1] # Get the last message
    query = latest_message["content"]
    context = latest_message.get("context", None) # looks for context, default None
    # Call your endpoint or AI application here
    prompty_path = os.path.join(os.getcwd(), subfolder, "application.prompty") 
    _flow = load_flow(source=prompty_path, model={"configuration": model_config})
    response = _flow(query=query, context=context, conversation_history=messages_list)
    # Format the response to follow the OpenAI chat protocol.
    # The callback wraps the application, so its reply is always an assistant turn
    # (the original relied on the global conversation_history from an earlier cell).
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": context,
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context
    }
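
To make the callback contract easier to see in isolation, here is a minimal sketch that mirrors the signature and return shape of the callback above but replaces the Prompty call with a stub. `echo_callback` and its echo behavior are hypothetical stand-ins for a real application, so the sketch runs without any Azure resources.

```python
import asyncio
from typing import Any, Dict, Optional


async def echo_callback(
    messages: Dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Hypothetical stand-in for a real application: it just echoes the query.
    latest_message = messages["messages"][-1]
    reply = {
        "content": f"You said: {latest_message['content']}",
        "role": "assistant",
        "context": latest_message.get("context"),
    }
    # Append the reply so the caller receives the full, updated history.
    messages["messages"].append(reply)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }


# Drive the callback once with a single simulated user turn.
turn = {"messages": [{"role": "user", "content": "Who painted the Mona Lisa?"}]}
result = asyncio.run(echo_callback(turn))
print(result["messages"][-1])
```

Inside a notebook, where an event loop is already running, `await echo_callback(turn)` would replace the `asyncio.run` call.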

## Generate text or index-based synthetic data as input
First, we prepare the text that the simulator uses to generate its input queries.

In [None]:
import wikipedia  # requires the wikipedia package
# Prepare the text to send to the simulator
wiki_search_term = "Leonardo da Vinci"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
wiki_text = wiki_page.summary[:5000]
print(f"{wiki_text[:100]}...")

In [None]:
from azure.ai.evaluation.simulator import Simulator
simulator = Simulator(model_config=model_config)

In [None]:
outputs = await simulator(
    num_queries=1,  # Number of queries to generate from the text
    text=wiki_text,
    target=callback,
)

In [None]:
outputs

In [None]:
for i, ch in enumerate(outputs[0]["messages"]):
    print(f'Message {i} - {ch["role"]}: {ch["content"]}\n')
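
Simulated conversations are often flattened into query/response pairs before being passed to an evaluator. The following is a minimal sketch of that step, assuming each element of `outputs` exposes a `"messages"` list of `role`/`content` entries, as the loop above does. `to_query_response_rows` is a hypothetical helper and `sample_outputs` is illustrative data, not real simulator output.

```python
import json
from typing import Any, Dict, List


def to_query_response_rows(conversation: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pair each user message with the assistant message that follows it."""
    rows = []
    messages = conversation["messages"]
    for current, following in zip(messages, messages[1:]):
        if current["role"] == "user" and following["role"] == "assistant":
            rows.append({"query": current["content"], "response": following["content"]})
    return rows


# Illustrative data shaped like one element of `outputs`; not real simulator output.
sample_outputs = [
    {
        "messages": [
            {"role": "user", "content": "Who was Leonardo da Vinci?"},
            {"role": "assistant", "content": "An Italian polymath of the Renaissance."},
        ]
    }
]

# One JSON object per line, ready to be written to a .jsonl file.
jsonl = "\n".join(
    json.dumps(row, ensure_ascii=False)
    for conversation in sample_outputs
    for row in to_query_response_rows(conversation)
)
print(jsonl)
```

Setting `ensure_ascii=False` keeps non-English text, such as the Italian examples earlier in this notebook, readable in the output file.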