In [None]:
# import os
# os.environ["OPENAI_API_KEY"] = "..."

### 🧪 Tutorial: Synthetic Data Generation with Evidently

This tutorial walks through the new `evidently.llm.datagen` API for generating synthetic datasets that you can use to test, evaluate, and experiment with LLMs. It covers four ways to generate data:

1. Few-shot generation
2. RAG (Retrieval-Augmented Generation) approaches
3. Domain-specific generation like code reviews
4. Fully custom templated data pipelines

---

### 🐦 Example 1: Few-Shot Generation for Twitter Posts

In this section, we will demonstrate how to use the `FewShotDatasetGenerator` to create synthetic Twitter-style posts. We'll provide a few example tweets and a user profile, and the generator will produce similar posts.

---

### ⚙️ Construct the Few-Shot Generator

We define the user profile and example messages, then initialize the generator. To guide the generation style, adjust `count` and `complexity` on the generator, or `role`, `intent`, and `tone` on the `UserProfile`.
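
To make the idea of few-shot generation concrete, here is a minimal sketch of how a profile and a handful of examples could be combined into a single prompt. It is an illustration only: `Profile` and `build_few_shot_prompt` are hypothetical names, and the prompt layout is an assumption, not the template Evidently actually uses.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical stand-in for a user profile; not Evidently's UserProfile."""
    role: str
    intent: str
    tone: str


def build_few_shot_prompt(kind: str, profile: Profile, examples: list[str], count: int) -> str:
    """Assemble a plain-text few-shot prompt from a profile and example texts."""
    lines = [
        f"Generate {count} {kind}.",
        f"Author role: {profile.role}",
        f"Author intent: {profile.intent}",
        f"Tone: {profile.tone}",
        "Examples:",
    ]
    # Each example becomes one bullet the model can imitate.
    lines += [f"- {example}" for example in examples]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    kind="twitter posts",
    profile=Profile(role="ML engineer", intent="promote an open-source library", tone="confident"),
    examples=["Monitoring drift isn't a nice-to-have anymore. It's operational hygiene."],
    count=2,
)
print(prompt)
```

The real prompt can be inspected through `prepared_sample_template`, as shown later in this example.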




In [None]:
from evidently.llm.datagen import UserProfile
from evidently.llm.datagen import FewShotDatasetGenerator


twitter_generator = FewShotDatasetGenerator(
    kind='twitter posts',
    count=2,
    user=UserProfile(
        role="ML engineer",
        intent="user is trying to promote Evidently AI opensource library for llm chatbot testing",
        tone="confident"),
    complexity="medium",
    examples=[
        "CI/CD is as crucial in AI systems as in traditional software. #mlops #cicd",
        "Without test coverage for your data pipelines, you're flying blind.",
        "Monitoring drift isn't a nice-to-have anymore. It's operational hygiene."
    ]
)



### 📄 Preview the Tweet Template

We can inspect the automatically prepared prompt template that will be used to guide generation of each tweet.


In [None]:
twitter_generator.prepared_sample_template

### ✨ Generate the Tweets

Now we trigger the generation of new Twitter posts based on our few-shot prompt and user profile.

In [None]:
twitter_generator.generate()
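
Generated posts are worth a quick sanity check before they go into a test set. The sketch below works on a plain list of strings; how to extract such a list from the generator's output is not covered here. `filter_posts` and the 280-character limit are illustrative assumptions, not part of Evidently.

```python
def filter_posts(posts: list[str], max_length: int = 280) -> list[str]:
    """Drop empty, over-long, and duplicate posts, preserving the original order."""
    seen = set()
    kept = []
    for post in posts:
        text = post.strip()
        key = text.lower()  # compare case-insensitively when detecting duplicates
        if not text or len(text) > max_length or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


sample_posts = [
    "Test your LLM app before your users do. #mlops",
    "test your llm app before your users do. #mlops",  # case-insensitive duplicate
    "x" * 300,  # longer than the assumed limit
    "   ",  # blank
]
print(filter_posts(sample_posts))
```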

---

### 🧠 Example 2: RAG-Based Generation for Test User Queries

In this section, we’ll demonstrate how to use a small knowledge base file and the `RagDatasetGenerator` to generate questions that a user might ask an LLM-powered assistant, simulating interaction with a booking website.

---

### 📚 Prepare a Sample Knowledge Base

We create a tiny knowledge base that will serve as our source of information for the RAG-based generation. In real scenarios, this could be a product FAQ, policy document, or knowledge article.
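
RAG pipelines usually split a knowledge base into smaller chunks before retrieval. As a rough illustration, the sketch below splits text into paragraphs on blank lines. The helper `split_into_paragraphs` is hypothetical, and Evidently's own chunking strategy may differ.

```python
def split_into_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and drop empty chunks."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


sample_kb = """Booking policies

Refundable rates allow free cancellation before check-in.

Check-out is expected by 11:00 AM or 12:00 PM."""

chunks = split_into_paragraphs(sample_kb)
for index, chunk in enumerate(chunks):
    print(index, chunk)
```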





In [None]:
# We use this single text as the knowledge base; you can substitute your own files.
example_knowledge_base = """Knowledge Base Entry: Hotel Booking Policies and Procedures

Hotels generally offer two types of booking rates: refundable and non-refundable. Refundable rates allow cancellations or modifications up to 24–48 hours before check-in with no charge, making them ideal for travelers with uncertain plans. Non-refundable rates are typically lower in price but carry a cancellation fee or no refund at all if changes are made.

Check-in time usually begins around 2:00 PM to 3:00 PM, and check-out is expected by 11:00 AM or 12:00 PM. Guests requesting early check-in or late check-out should contact the hotel in advance, as these options may involve additional fees and depend on room availability.

Upon arrival, guests are required to present a valid government-issued photo ID and a credit or debit card. Some hotels may also request a security deposit, which is refundable upon check-out if no damage or extra charges are incurred.

Payment policies vary: for prepaid bookings, the total amount may be charged at the time of reservation, especially for discounted or promotional rates. In other cases, payment is collected at the property during check-in or check-out.

Special requests, such as extra beds, cribs, connecting rooms, allergy-friendly accommodations, or pet-friendly rooms, should be submitted at the time of booking. These are not guaranteed and must be confirmed by the hotel directly.

Some hotels provide complimentary services like Wi-Fi, breakfast, or parking, while others charge extra. Guests should carefully review amenities, location details, and cancellation terms before finalizing the reservation."""

with open("booking_kb.txt", "w") as f:
    f.write(example_knowledge_base)


### 🔍 Initialize the RAG Generator

We load the knowledge base and initialize the `RagDatasetGenerator` with user intent and context. This will generate realistic user queries that reference the provided information.





In [None]:
from evidently.llm.datagen import RagDatasetGenerator
from evidently.llm.rag.index import FileDataCollectionProvider

data = FileDataCollectionProvider(path="booking_kb.txt")
booking_rag = RagDatasetGenerator(
    data,
    count=2,
    include_context=False,
    user=UserProfile(intent="get to know system", role="new user"),
    service="booking website",
)



### 📄 View the Prepared Templates

The first cell below shows the prompt template the generator uses to produce questions from the knowledge base. The second shows the template used to produce LLM responses to those questions.


In [None]:
booking_rag.prepared_query_template

In [None]:
booking_rag.prepared_response_template

### 💾 Export the Generation Configuration

We export the generation setup to a YAML file so that it can be reused, shared, or version-controlled.


In [None]:
booking_rag.dump("booking_rag.yaml")

### 🧾 Review the YAML File

We check the contents of the generated YAML spec for transparency and reproducibility.
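
The `! cat` shell command depends on a Unix-like environment. If that is not available, a portable alternative is to read the file from Python. The sketch below demonstrates the idea on a throwaway file with placeholder content; `read_text_file` is a hypothetical helper, and in the notebook you would pass `"booking_rag.yaml"` instead.

```python
import tempfile
from pathlib import Path


def read_text_file(path) -> str:
    """Return the full contents of a text file, decoded as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


# Demo on a temporary file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.yaml"
    demo_path.write_text("key: value\n", encoding="utf-8")  # placeholder content
    content = read_text_file(demo_path)

print(content)
```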

In [None]:
! cat booking_rag.yaml


### 📦 Load From YAML and Generate Data

We load the generation spec from file and run the actual query/response generation pipeline.

In [None]:
booking_rag = RagDatasetGenerator.load("booking_rag.yaml")
booking_rag.generate()

---

### 🧬 Example 3: Custom RAG-Based Generation for Code Reviews

In this example, we’ll generate synthetic code diffs from the Evidently codebase and then simulate realistic code review comments for those diffs.

---

### 📁 Load Python Files as Knowledge Base

We use `FileDataCollectionProvider` to treat Evidently's source `.py` files as a corpus from which we can extract diffs.
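
Before indexing a whole package, it can help to check how many files a recursive `*.py` pattern would pick up. The sketch below uses `pathlib` for that, with the standard-library `json` package standing in for `evidently` so it runs without extra dependencies. `list_python_files` is a hypothetical helper, and the provider's own file matching may behave differently.

```python
import json
import os
from pathlib import Path


def list_python_files(root: str) -> list[Path]:
    """Recursively collect all .py files under the given directory."""
    return sorted(Path(root).rglob("*.py"))


# The json package is only a stand-in; replace it with evidently in the notebook.
package_dir = os.path.dirname(json.__file__)
python_files = list_python_files(package_dir)
print(f"{len(python_files)} Python files under {package_dir}")
```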

### 🧾 Generate Git Diffs

We use a RAG query generator configured to simulate `git diff` entries from the codebase.

In [None]:
import os

import evidently
from evidently.llm.datagen import GenerationSpec, RagQueryDatasetGenerator
from evidently.llm.rag.index import FileDataCollectionProvider

data = FileDataCollectionProvider(path=os.path.dirname(evidently.__file__), recursive=True, pattern="*.py")

diff_generator = RagQueryDatasetGenerator(
    data,
    count=2,
    chunks_per_query=1,
    query_spec=GenerationSpec(kind="git diff"),
)
diff_generator.prepared_query_template


In [None]:
git_diffs = diff_generator.generate()
git_diffs


### 🧪 Inspect a Generated Git Diff

Let’s preview one of the generated synthetic diffs that the model will review.


In [None]:
print(git_diffs["queries"][0])
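
LLM output is not guaranteed to follow the requested format, so a light structural check on each generated diff can catch obvious failures. The sketch below tests whether a string resembles a unified diff. `looks_like_unified_diff` is a hypothetical helper and only a heuristic; it does not verify that the diff would apply.

```python
import re

# Matches hunk headers such as "@@ -1,2 +1,2 @@" at the start of a line.
HUNK_HEADER = re.compile(r"^@@ -\d+(,\d+)? \+\d+(,\d+)? @@", re.MULTILINE)


def looks_like_unified_diff(text: str) -> bool:
    """Return True if the text has file headers and at least one hunk header."""
    has_file_headers = "--- " in text and "+++ " in text
    return has_file_headers and HUNK_HEADER.search(text) is not None


sample_diff = """--- a/example.py
+++ b/example.py
@@ -1,2 +1,2 @@
-print("old")
+print("new")
 x = 1
"""

print(looks_like_unified_diff(sample_diff))
print(looks_like_unified_diff("just some prose"))
```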

### 🧠 Generate Code Reviews for Diffs

We now pass the synthetic diffs into a new response generator, which produces simulated code review comments using a custom `code review` response spec.

In [None]:

from evidently.llm.datagen import RagResponseDatasetGenerator

code_review_generator = RagResponseDatasetGenerator(
    data,
    query_spec=diff_generator.query_spec,
    response_spec=GenerationSpec(kind="code review"),
    queries=list(git_diffs["queries"]),
)
code_review_generator.prepared_response_template

In [None]:
code_review_generator.generate()

---

### 🤖 Example 4: Custom Personal Assistant Data with Template Blocks

In this final example, we explore full customization, including custom prompt templates, prompt blocks, and user-defined generation specs.

---

### 🧱 Define Custom Prompt Block

We create a playful prompt block that tries to motivate the model by invoking its mother's approval or disapproval. This demonstrates how to inject specific motivations, tones, or structural elements into generated prompts.



In [None]:
from evidently.llm.utils.blocks import PromptBlock

class MotherIncentiveBlock(PromptBlock):
    """If you perform {performance}, your mother will be {emotion} with you and give you {reward}."""
    performance: str
    emotion: str
    reward: str
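
The block above uses its docstring as a text template and its fields as the values to fill in. To show that pattern in isolation, here is a minimal sketch built on plain dataclasses and `str.format`. `DocstringTemplate` and `Incentive` are hypothetical classes written for this illustration; this is not how `PromptBlock` is implemented.

```python
from dataclasses import asdict, dataclass


@dataclass
class DocstringTemplate:
    """Hypothetical base class: renders the subclass docstring with its field values."""

    def render(self) -> str:
        # The class docstring doubles as the template; fields supply the values.
        return (type(self).__doc__ or "").format(**asdict(self))


@dataclass
class Incentive(DocstringTemplate):
    """If you perform {performance}, you will receive {reward}."""
    performance: str
    reward: str


rendered = Incentive(performance="well", reward="a cookie").render()
print(rendered)
```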

### 🧠 Build a Fully Custom RAG Generator

We define a personal assistant service with user queries and AI responses, using custom templates and additional prompt blocks.


In [None]:
from evidently.llm.datagen import ServiceSpec
from evidently.llm.rag.index import ChunksDataCollectionProvider

data = ChunksDataCollectionProvider(chunks=[
        "this AI personal assistant can help book things, set up reminders, answer stupid emails"
    ])

my_template = """
    Please answer in style of Darth Vader

    {% super() %}
"""

pa_generator = RagDatasetGenerator(
    data,
    count=2,
    query_spec=GenerationSpec(kind="user requests"),
    response_spec=GenerationSpec(kind="AI Personal Assistant responses"),
    query_template=my_template,
    response_template=my_template,
    additional_prompt_blocks=[
      MotherIncentiveBlock(performance="good", emotion="pleased", reward="10$"),
      MotherIncentiveBlock(performance="bad", emotion="displeased", reward="condescending look"),
    ],
    service=ServiceSpec(kind="AI Personal Assistant", purpose="help user solve simple tasks"),
)


### 📄 View Custom Templates

Let’s check how the query prompt looks with the Darth Vader theme and our mother-incentive block added.

Similarly, we inspect how the assistant’s response prompt is structured using the custom blocks.



In [None]:
pa_generator.prepared_query_template

In [None]:
pa_generator.prepared_response_template

### ✨ Generate PA Queries and Responses

Finally, we run the generator to produce synthetic queries and AI responses using our fully customized setup.



In [None]:
pa_queries = pa_generator.generate()


### 📦 View the Output

We preview the generated examples from our personal assistant scenario.

In [None]:
pa_queries
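
To reuse a generated dataset in later evaluation runs, one option is to store it as JSON Lines. The sketch below assumes the output has already been converted to a list of dictionaries. The `"queries"` key mirrors the column used earlier in this tutorial, while the `"responses"` key, the sample record, and the helpers `write_jsonl` and `read_jsonl` are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path) -> list:
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hand-written stand-in record; real records would come from the generator.
records = [{"queries": "Remind me to call the hotel.", "responses": "Reminder set."}]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "pa_dataset.jsonl"
    write_jsonl(records, out_path)
    restored = read_jsonl(out_path)

print(restored == records)
```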

---

### ✅ Summary: What We Learned

In this tutorial, we explored the capabilities of the new `evidently.llm.datagen` API for generating high-quality synthetic datasets for testing and evaluation of LLM systems. Here's a recap of the key concepts and tools demonstrated:

#### 🧪 Dataset Generators

* **FewShotDatasetGenerator**: Allows generation based on a few manual examples and a user profile. Ideal for generating social media content, slogans, or short texts.
* **RagDatasetGenerator**: Enables generation grounded in a knowledge base, supporting realistic question/answer generation from documents or FAQs.
* **RagQueryDatasetGenerator / RagResponseDatasetGenerator**: Allow fine-grained control over multi-stage generation workflows, such as producing diffs and corresponding code reviews.

#### 🧩 Core Building Blocks

* **UserProfile and ServiceSpec**: Provide structured user and service descriptions to simulate realistic scenarios.
* **GenerationSpec**: Lets you define the kind of content to generate (e.g., `"git diff"`, `"code review"`, `"AI assistant responses"`).
* **PromptBlock**: Enables reusable components to structure prompts, inject motivations, or define response tone and format.
* **Templates**: Custom Jinja-style templates can be used to control prompt layout and stylistic constraints.

#### 🔧 Use Cases Covered

* Generating Twitter-style posts from a few examples, with control over the author's role, intent, and tone.
* Simulating RAG-style user queries and chatbot responses from a knowledge base.
* Generating synthetic developer workflows like diffs and reviews using real code.
* Building end-to-end assistant datasets with templated queries, responses, and structured prompt blocks.

This API gives you composable building blocks for generating test data tailored to your product, domain, and evaluation needs. Whether you’re testing chatbot robustness, fine-tuning with synthetic data, or building eval suites for new LLM apps, `evidently.llm.datagen` provides a flexible, inspectable foundation to start from.

---
