<a href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator/navigator-data-designer-sdk-structured-outputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎨 Data Designer SDK: Structured Outputs

Let's explore how to use Data Designer's structured outputs feature to generate complex, nested data structures, with support for both Pydantic and JSON Schema definitions.

> **Note:** The [Data Designer](https://docs.gretel.ai/create-synthetic-data/gretel-data-designer-beta) functionality demonstrated in this notebook is currently in **Early Preview**. To access these features and run this notebook, please [join the waitlist](https://gretel.ai/navigator/data-designer#waitlist).

# 📘 Getting Started

First, let's install and import the required packages:

In [None]:
%%capture
%pip install -U gretel_client 

In [None]:
from gretel_client.navigator import DataDesigner

# 🥗 Building a Fruit Salad Generator

To demonstrate structured outputs, we'll create a fruit salad recipe generator. This example showcases how to:
- Handle nested data structures (recipes containing multiple fruits)
- Generate variable-length lists (different numbers of fruits per salad)
- Maintain relationships between components (total cost based on individual fruits)
- Create derivative content (HTML presentations of our recipes)

## 1. Setting Up Data Designer

First, we'll create a Data Designer instance and define our seed data:

In [None]:
## Create our DD Instance
data_designer = DataDesigner(
    api_key="prompt",
    model_suite="apache-2.0",
)

## Generate some regions for our fruit salad recipes
data_designer.add_categorical_seed_column(
    name="region",
    description="Regions of the world with an exciting culinary tradition.",
    values=["Thailand", "France", "South Africa"],
    num_new_values_to_generate=5
)

## 2. Defining Our Data Model

The power of structured outputs comes from defining exact schemas for our generated data. We'll use Pydantic to create our data models:

In [None]:
## Now, we're making a recipe, which is pretty structured.
## So let's give data designer a recipe to follow!

from pydantic import BaseModel, Field

class Fruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")
    weight: float = Field(..., description="Weight in lbs.")
    flavor: str = Field(..., description="Primary flavor profile of the fruit.")
    preparation: str = Field(..., description="How to prepare the fruit for a fruit salad.")


class FruitSalad(BaseModel):
    total_cost: float = Field(..., description="Total cost of all fruits.")
    name: str = Field(..., description="Name of this unique fruit salad.")
    haiku: str = Field(..., description="A beautiful haiku about this fruit salad.")
    ascii_art: str = Field(..., description="A small ASCII art depiction of the fruit salad.")
    fruits: list[Fruit]

Now we can pass this Pydantic data model to Data Designer and have a contract that any data we get back will be in the format we specified above (or we get none at all!).
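
To make the "format or nothing" contract concrete, here is a minimal sketch of what schema validation looks like on the client side. `MiniFruit`, `MiniSalad`, and `parse_salad` are hypothetical, trimmed-down names used only for this illustration (so they don't shadow the models above), and this is not a description of how Data Designer validates records internally.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical, trimmed-down copies of the models above (illustration only).
class MiniFruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")


class MiniSalad(BaseModel):
    name: str = Field(..., description="Name of this unique fruit salad.")
    total_cost: float = Field(..., description="Total cost of all fruits.")
    fruits: list[MiniFruit]


def parse_salad(record: dict) -> Optional[MiniSalad]:
    """Return a validated MiniSalad, or None if the record breaks the schema."""
    try:
        return MiniSalad(**record)
    except ValidationError:
        return None


# A record that matches the schema is parsed into nested model instances.
good = parse_salad({
    "name": "Tropical Sunrise",
    "total_cost": 3.5,
    "fruits": [{"name": "Mango", "cost": 2.0}, {"name": "Papaya", "cost": 1.5}],
})

# A record missing required fields (total_cost, and cost on the fruit) is rejected.
bad = parse_salad({"name": "Incomplete", "fruits": [{"name": "Mango"}]})
```

Here `good` is a `MiniSalad` whose `fruits` are `MiniFruit` objects, while `bad` is `None`.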

Our implementation also lets you supply a [JSON Schema](https://json-schema.org/) directly, regardless of where that schema comes from.
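
For illustration, here is a hand-written JSON Schema, built as a plain Python dict, that covers a subset of the fruit salad fields. The names `fruit_schema` and `fruit_salad_schema` are hypothetical; the `json_schema` parameter shown in the trailing comment is taken from the commented-out line in the structured-column cell of this notebook, not from separate API documentation.

```python
import json

# Schema for a single fruit (subset of the Fruit model's fields).
fruit_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of the fruit."},
        "cost": {"type": "number", "description": "Dollar value of the fruit."},
    },
    "required": ["name", "cost"],
}

# Schema for the salad: scalar fields plus a variable-length array of fruits.
fruit_salad_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of this unique fruit salad."},
        "total_cost": {"type": "number", "description": "Total cost of all fruits."},
        "fruits": {"type": "array", "items": fruit_schema},
    },
    "required": ["name", "total_cost", "fruits"],
}

# Assumed usage, mirroring the commented-out alternative in this notebook:
# data_config={"type": "structured", "params": {"json_schema": fruit_salad_schema}}
print(json.dumps(fruit_salad_schema, indent=2))
```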


## 3. Generating Structured Data

With our models defined, we can now tell Data Designer exactly what kind of data to generate:

In [None]:
## Tell DD to generate some fruit salads
data_designer.add_generated_data_column(
    name="fruit_salad",
    generation_prompt=(
        "Create a description of fruits to go in a regional fruit salad from {region}!"
    ),
    data_config={"type": "structured", "params": {"model": FruitSalad}},

    ## We also could have initialized from a JSON Schema alone
    ## (Pydantic v2 exposes it as model_json_schema(); on v1, use FruitSalad.schema())
    # data_config={"type": "structured", "params": {"json_schema": FruitSalad.model_json_schema()}},
)

## 4. Creating a Presentation Layer

Finally, we'll generate HTML presentations of our fruit salads, demonstrating how to use structured data in subsequent generation steps:

In [None]:
data_designer.add_generated_data_column(
    name="fruit_salad_html",
    generation_prompt=(
        "<data>\n{fruit_salad}\n</data>\n\n"
        "Given the provided <data>, write a self-contained HTML webpage "
        "which provides all of the provided information. Embed your own CSS into the document.\n"
        "The page and its text should be in a color palette and style matching the national flag of {region}.\n"
        "The page background and the page text should be in contrasting colors.\n"
        "Make sure to structure your fruit information so that the information is displayed clearly, like a table format.\n"
        "Place the haiku and ASCII art side by side and above the info table.\n"
        "ASCII art should be displayed in a <code> block.\n"
        "Use fancy HTML transforms and animations on different elements of the webpage."
    ),
    llm_type="code",
    data_config={"type": "code", "params": {"syntax": "html"}}
)
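
To see what the `{fruit_salad}` and `{region}` placeholders amount to, here is a rough sketch that fills a shortened version of the prompt by hand. It assumes the structured value is rendered as JSON text before substitution; how Data Designer actually serializes and substitutes columns is not shown in this notebook, and `sample_salad`, `template`, and `rendered_prompt` are hypothetical names.

```python
import json

# Hypothetical record standing in for one generated `fruit_salad` value.
sample_salad = {
    "name": "Tropical Sunrise",
    "total_cost": 3.5,
    "fruits": [{"name": "Mango", "cost": 2.0}, {"name": "Papaya", "cost": 1.5}],
}

# Shortened version of the prompt above, with the same two placeholders.
template = (
    "<data>\n{fruit_salad}\n</data>\n\n"
    "Given the provided <data>, write a self-contained HTML webpage "
    "in a style matching the national flag of {region}."
)

# Assumption: the structured column is substituted as pretty-printed JSON.
rendered_prompt = template.format(
    fruit_salad=json.dumps(sample_salad, indent=2),
    region="Thailand",
)
print(rendered_prompt)
```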



## 5. Previewing Results

Let's take a look at what we've created:

In [None]:
preview = data_designer.generate_dataset_preview()

In [None]:
from IPython.display import HTML
from itertools import cycle
websites = cycle(preview.dataset["fruit_salad_html"].values)  # cycle() already returns an iterator

In [None]:
HTML(next(websites))
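
Re-running the cell above steps through the generated pages and wraps around once the last one has been shown. A tiny stand-alone sketch of that behavior, using placeholder strings (`demo_pages` is a hypothetical name) instead of real generated HTML:

```python
from itertools import cycle

# Three placeholder pages standing in for the generated HTML documents.
demo_pages = cycle(["<h1>A</h1>", "<h1>B</h1>", "<h1>C</h1>"])

# Each next() call returns the following page; the fourth call wraps
# back around to the first page.
shown = [next(demo_pages) for _ in range(4)]
print(shown)
```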

In [None]:
preview.display_sample_record()

In [None]:
preview.dataset
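
A schema fixes the shape of each record, but it cannot express arithmetic relationships such as "total cost equals the sum of the fruit costs". If that relationship matters, a small post-hoc check is one option. The sketch below assumes each `fruit_salad` record is available as a plain dict (for example after `json.loads`); the function name `cost_is_consistent` and the one-cent tolerance are illustrative choices, not part of Data Designer.

```python
import math


def cost_is_consistent(salad: dict, tolerance: float = 0.01) -> bool:
    """Check that total_cost matches the summed fruit costs within a tolerance."""
    fruit_total = sum(fruit["cost"] for fruit in salad["fruits"])
    return math.isclose(salad["total_cost"], fruit_total, abs_tol=tolerance)


# Hypothetical records: one consistent, one where the total is off.
consistent = {
    "total_cost": 3.5,
    "fruits": [{"name": "Mango", "cost": 2.0}, {"name": "Papaya", "cost": 1.5}],
}
inconsistent = {
    "total_cost": 5.0,
    "fruits": [{"name": "Mango", "cost": 2.0}, {"name": "Papaya", "cost": 1.5}],
}

print(cost_is_consistent(consistent), cost_is_consistent(inconsistent))
```

Records that fail the check can then be filtered out or inspected by hand.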