# Getting Started with Data Designer


Welcome! In this guide, we’ll walk through how to use the SDK to generate rich, diverse datasets — from designing your columns to injecting variability and logic into your data.

### 🧙 Magic vs 🛠 Manual Usage

This guide supports two usage styles:

- **Magic Mode** 🪄  
  Enlist the help of an LLM to automatically generate your dataset configuration. Perfect for quick starts and exploring schema ideas.

- **Manual Mode** 🧩  
  Take full control and define your configuration by hand. Ideal when you want precision and complete customization.

---

### 🔁 Generation Methods

Our SDK supports **three flexible generation methods**:

- **Sampling** 🎲  
  Draw values from built-in statistical distributions and reference datasets (such as realistic person records) to keep the data realistic and balanced.

- **LLM-generated** 🤖  
  Leverage large language models to create contextual data — such as natural language, code snippets, or structured text.

- **Seeded** 🌱  
  Provide your own seed dataset and sample from it to produce similar or derivative outputs.

---

### 🧠 Expression-Based Columns with Jinja

For more advanced control, you can define **expression-based columns** using Jinja templating. This unlocks:

- 🔀 Conditional logic (`if`, `else`, `for`)
- 🔗 Cross-column references (`{{ column_name }}`)
- ➕ Basic arithmetic and transformations

This lets you dynamically shape values based on other columns and inject logic directly into your data schema.

---

Let’s dive in and start building your dataset! 💡


## Installation and Setup

In [None]:
%%capture
!pip install -U git+https://github.com/gretelai/gretel-python-client networkx datasets

In [None]:
# Import
import time
import pandas as pd

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P


## Create Gretel Client
gretel = Gretel(
    api_key="prompt",
    endpoint="https://api.dev.gretel.ai"
)


model_suite = "apache-2.0"
dd = gretel.data_designer.new(model_suite=model_suite)  # we will build on top of this object


Gretel API Key: ··········
Logged in as eric.phamhung@gretel.ai ✅


INFO:gretel_client.navigator_client:Gretel client configured to use project: proj_2umUWllmbkP1S7Y98Z2k7E07t98


# 🌱 Beginner: Kickstart with Magic + Seeded Data

In this section, we’re keeping things simple and powerful. You’ll learn how to get up and running quickly by combining:

- **Magic Mode** 🪄 — Let an LLM generate your dataset configuration for you, based on a few high-level inputs.
- **Seeded Generation** 🌾 — Provide a sample dataset, and we’ll draw from it to create new rows with similar structure and variety.

This is a great way to start experimenting with the SDK while getting meaningful output fast. You don’t need to worry about crafting configs by hand just yet — we’ll guide you through using your own data and a touch of AI magic to get impressive results with minimal effort.

📦 By the end of this section, you’ll have a generated dataset built from your seed data and enriched through a config designed by the LLM.

Let’s dive in! 🔍


In [None]:
# Describe each column in plain English and let the LLM choose a sampler for it
dd.magic.add_sampling_column(
    name="patient",
    description="An American male living in San Diego",
    interactive=False,
    preview=True,
)

# For a numeric column, the LLM picks a distribution (here it chose a Gaussian)
dd.magic.add_sampling_column(
    name="bmi",
    description="body mass index of an average person",
    interactive=False,
    preview=True,
)

[20:24:15] [INFO] 🚀 Generating preview


[20:24:16] [INFO] 🦜 Step 1: Generate columns using samplers
[20:24:20] [INFO] 🎉 Your dataset preview is ready!


[20:24:22] [INFO] 🚀 Generating preview


[20:24:23] [INFO] 🦜 Step 1: Generate columns using samplers
[20:24:27] [INFO] 🎉 Your dataset preview is ready!


In [None]:
# Load the seed dataset from the Hugging Face Hub and give its columns descriptive names
from datasets import load_dataset

df_seed = load_dataset("gretelai/symptom_to_diagnosis")["train"].to_pandas()
df_seed = df_seed.rename(columns={"output_text": "diagnosis", "input_text": "patient_summary"})

print(f"Number of records: {len(df_seed)}")



df_seed.head(n=3)

Number of records: 853


|   | diagnosis | patient_summary |
|---|-----------|-----------------|
| 0 | cervical spondylosis | I've been having a lot of pain in my neck and ... |
| 1 | impetigo | I have a rash on my face that is getting worse... |
| 2 | urinary tract infection | I have been urinating blood. I sometimes feel ... |


In [None]:
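# Attach the seed dataset; its columns (diagnosis, patient_summary) are added to every record
dd.with_seed_dataset(
    df_seed,
    sampling_strategy="shuffle",  # draw seed rows in random order
    with_replacement=False,  # do not draw the same seed row twice
)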

[20:24:30] [INFO] 🌱 Using seed dataset with file ID: file_3b58b60fe8c744f7b74f4be36f6239a2
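Conceptually, `with_replacement=False` means no seed row is drawn twice, and `"shuffle"` suggests rows are visited in random order. The stdlib sketch below illustrates that idea on plain dicts; `draw_seed_rows` is a hypothetical helper, not the SDK's implementation, and its handling of "more records than seed rows" is an assumption of the sketch.

```python
import random


def draw_seed_rows(seed_rows, num_records, with_replacement=False, seed=None):
    """Hypothetical helper: draw num_records rows from seed_rows in random order."""
    rng = random.Random(seed)
    if with_replacement:
        # Independent draws: the same seed row may appear more than once.
        return [rng.choice(seed_rows) for _ in range(num_records)]
    if num_records > len(seed_rows):
        # This sketch simply refuses; how the SDK handles this case is not shown here.
        raise ValueError("not enough seed rows to sample without replacement")
    # random.sample returns distinct rows in shuffled order.
    return rng.sample(seed_rows, num_records)


seed_rows = [
    {"diagnosis": "cervical spondylosis"},
    {"diagnosis": "impetigo"},
    {"diagnosis": "urinary tract infection"},
]
for row in draw_seed_rows(seed_rows, 3, seed=0):
    print(row["diagnosis"])
```

With 853 seed rows, a preview of 10 records without replacement never repeats a row, although rows can of course repeat across separate preview calls.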


In [None]:
dd.preview().dataset.df

[20:18:25] [INFO] 🚀 Generating preview


[20:18:27] [INFO] 🎲 Step 1: Sample from dataset
[20:18:29] [INFO] 🦜 Step 2: Generate columns using samplers
[20:18:33] [INFO] 🔗 Step 3: Concat datasets
[20:18:33] [INFO] 🎉 Your dataset preview is ready!


|   | diagnosis | patient_summary | patient | bmi |
|---|-----------|-----------------|---------|-----|
| 0 | drug reaction | I have a fever and I feel really lightheaded a... | {'first_name': 'James', 'middle_name': 'Lou', ... | 29.821963 |
| 1 | diabetes | I've been feeling very thirsty lately, and I'v... | {'first_name': 'Jose', 'middle_name': 'Alexand... | 15.861001 |
| 2 | malaria | I have a high fever, chills, nausea, and a hea... | {'first_name': 'Winston', 'middle_name': 'Hern... | 25.304261 |
| 3 | gastroesophageal reflux disease | I have a burning sensation in my throat, espec... | {'first_name': 'Phong', 'middle_name': 'T', 'l... | 25.276754 |
| 4 | common cold | I have a really bad cold. I'm coughing up a st... | {'first_name': 'Tarran', 'middle_name': 'Alan'... | 18.163104 |
| 5 | pneumonia | I've been having a fever for a few days, and i... | {'first_name': 'Marcos', 'middle_name': '', 'l... | 30.2641 |
| 6 | bronchial asthma | I've been feeling really tired and weak, and I... | {'first_name': 'Dwyatt', 'middle_name': 'O', '... | 35.002956 |
| 7 | peptic ulcer disease | I've lost a lot of blood and iron because of m... | {'first_name': 'Paul', 'middle_name': 'Lamar',... | 34.591844 |
| 8 | pneumonia | I've been feeling really sick lately. I've bee... | {'first_name': 'Martin', 'middle_name': 'L', '... | 34.647329 |
| 9 | cervical spondylosis | I have been having a lot of pain in my back an... | {'first_name': 'Michael', 'middle_name': '', '... | 25.513413 |


In [None]:
# Add an LLM-generated column. This requires at least one non-LLM column (from the seed dataset or a sampler).
dd.magic.add_column(
    name="api_response",
    description="An API response that updates a database with a generated UUID and the diagnosis column",
    must_depend_on=["diagnosis", "patient"],
    interactive=True,  # review each preview, then reply with feedback, "retry", or "accept"
    preview=True,
)

[20:24:37] [INFO] 🚀 Generating preview


[20:24:38] [INFO] 🎲 Step 1: Sample from dataset
[20:24:40] [INFO] 🦜 Step 2: Generate columns using samplers
[20:24:42] [INFO] 🔗 Step 3: Concat datasets
[20:24:42] [INFO] 🦜 Step 4: Generate column from template
[20:24:51] [INFO] 🎉 Your dataset preview is ready!


retry


[20:25:25] [INFO] 🚀 Generating preview


[20:25:26] [INFO] 🎲 Step 1: Sample from dataset
[20:25:28] [INFO] 🦜 Step 2: Generate columns using samplers
[20:25:31] [INFO] 🔗 Step 3: Concat datasets
[20:25:31] [INFO] 🦜 Step 4: Generate column from template


no, provide only a structured output of the uuid and the diagnosis


[20:26:27] [INFO] 🚀 Generating preview


[20:26:28] [INFO] 🎲 Step 1: Sample from dataset
[20:26:29] [INFO] 🦜 Step 2: Generate columns using samplers
[20:26:32] [INFO] 🔗 Step 3: Concat datasets
[20:26:32] [INFO] 🦜 Step 4: Generate column from template
[20:26:36] [INFO] 🎉 Your dataset preview is ready!


accept


In [None]:
# # display sample record
# dd.preview().display_sample_record()

# display sample df
dd.preview().dataset.df

[20:28:08] [INFO] 🚀 Generating preview
[20:28:10] [INFO] 🎲 Step 1: Sample from dataset
[20:28:11] [INFO] 🦜 Step 2: Generate columns using samplers
[20:28:14] [INFO] 🔗 Step 3: Concat datasets
[20:28:14] [INFO] 🦜 Step 4: Generate column from template
[20:28:18] [INFO] 🎉 Your dataset preview is ready!


|   | diagnosis | patient_summary | patient | bmi | api_response |
|---|-----------|-----------------|---------|-----|--------------|
| 0 | diabetes | I have to pee a lot and I'm always hungry. I g... | {'first_name': 'Devonte', 'middle_name': 'G', ... | 26.55983 | {"uuid": "d294698a-1ba5-453b-a749-58e3b52ec6ef... |
| 1 | gastroesophageal reflux disease | I belch and burp a lot. I get chest pain that ... | {'first_name': 'Luis', 'middle_name': '', 'las... | 25.697135 | {"uuid": "d5b4bbd6-6211-4ef9-bb49-0eb5ea013438... |
| 2 | migraine | I've been feeling really grumpy and gloomy lat... | {'first_name': 'Roberto', 'middle_name': 'Anto... | 28.926699 | {"uuid": "baca48d6-8bb0-4d78-bd36-ba141a47daf6... |
| 3 | common cold | I've been feeling really sick and exhausted la... | {'first_name': 'Xavier', 'middle_name': 'L', '... | 25.331917 | {"uuid": "bdae1d93-7cef-443c-af4a-defb05802e7e... |
| 4 | urinary tract infection | I have been experiencing a burning sensation w... | {'first_name': 'Oscar', 'middle_name': 'Ivan',... | 17.230608 | {"uuid": "cfdc3c32-1fa5-41c9-892c-cc15d8f1561e... |
| 5 | typhoid | I have been having diarrhea for a few days now... | {'first_name': 'Jayson', 'middle_name': 'R', '... | 22.138409 | {"uuid": "5764db4c-a3ba-406b-a19a-438ef7213b04... |
| 6 | impetigo | I have a rash around my nose and lips that is ... | {'first_name': 'David', 'middle_name': 'Leroy'... | 31.562063 | {"uuid": "bf327a4a-4045-40aa-bf7b-66c75dd8115c... |
| 7 | diabetes | I have a dry cough that just won't go away. I ... | {'first_name': 'Hugo', 'middle_name': 'T', 'la... | 16.66077 | {"uuid": "da88e9ee-78d0-4f00-b240-d1df09e99e5c... |
| 8 | fungal infection | I have a rash on my arms and legs that is itch... | {'first_name': 'Rafael', 'middle_name': '', 'l... | 28.284006 | {"uuid": "1bc60ef3-7c53-4317-a3a1-0a5fe9845f2a... |
| 9 | common cold | I've been feeling really tired and sick. My th... | {'first_name': 'Omar', 'middle_name': '', 'las... | 27.159849 | {"uuid": "f918a475-9c8b-4f1f-ba9f-f3abb424e2c0... |
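Because the feedback in the interactive loop asked for a structured output containing only a UUID and the diagnosis, it is worth checking that every `api_response` value actually parses. A minimal stdlib sketch, assuming each value is a JSON object (or an already-parsed dict) with `uuid` and `diagnosis` keys; the preview truncates each value after the `uuid` key, so the `diagnosis` key name is an assumption, and `check_api_response` is a hypothetical helper.

```python
import json
import uuid


def check_api_response(raw, expected_diagnosis):
    """Return a list of problems found in one api_response value (empty if OK)."""
    try:
        # Accept either a JSON string or a value that is already a dict.
        payload = raw if isinstance(raw, dict) else json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(payload, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    try:
        uuid.UUID(str(payload.get("uuid")))
    except ValueError:
        problems.append("missing or malformed uuid")
    if payload.get("diagnosis") != expected_diagnosis:
        problems.append("diagnosis does not match the diagnosis column")
    return problems


# Hypothetical value shaped like the preview output above.
sample = '{"uuid": "d294698a-1ba5-453b-a749-58e3b52ec6ef", "diagnosis": "diabetes"}'
print(check_api_response(sample, "diabetes"))  # []
```

On a real preview, something like `df.apply(lambda r: check_api_response(r["api_response"], r["diagnosis"]), axis=1)` with `df = dd.preview().dataset.df` would list the problems for each row.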


# 🧪 Intermediate: Manual Configuration, Sampling, Jinja & LLM Generation

Now that you’ve gotten a feel for the basics, it’s time to roll up our sleeves and explore more of what the SDK can do.

In this section, we’ll move beyond Magic Mode and **manually create a configuration** to define our dataset structure. Here’s what you’ll be working with:

- **📊 Sampling Columns**  
  Learn how to:
  - Sample values from built-in or custom datasets
  - Use statistical distributions (like normal, uniform, categorical) to shape your data

- **🧠 Expression Columns with Jinja**  
  Add powerful logic and dynamic behavior to your data by:
  - Referencing values from other columns (`{{ other_column }}`)
  - Using conditional logic (`if`, `else`, `for`)
  - Performing basic arithmetic (`+`, `-`, `*`, `/`)

- **🤖 LLM-Based Columns**  
  Inject creativity and contextual intelligence by leveraging an LLM to generate column values like text, summaries, or even simple code.

This section will give you the tools to build smart, expressive datasets that go beyond static values — all by hand. It’s a perfect step toward mastering the full flexibility of the SDK.

Ready to get your hands dirty? 🛠 Let’s go!


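#### 👩‍🚀 Person Attributes

Each value produced by the person sampler (`P.SamplingSourceType.PERSON`) is a nested record with the following fields: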

| Field Name | Type | Default | Alias | Description |
|------------|------|---------|-------|-------------|
| first_name | str | Required | | Person's first name |
| middle_name | str \| None | Required | | Person's middle name (may be `None`) |
| last_name | str | Required | | Person's last name |
| sex | SexT | Required | | Person's sex (enum type) |
| age | int | Required | | Person's age |
| postcode | str | Required | zipcode | Postal/ZIP code |
| street_number | int \| str | Required | | Street number (can be numeric or alphanumeric) |
| street_name | str | Required | | Name of the street |
| unit | str | Required | | Unit/apartment number |
| city | str | Required | | City name |
| region | str \| None | Required | state | Region/state (may be `None`) |
| district | str \| None | Required | county | District/county (may be `None`) |
| country | str | Required | | Country name |
| ethnic_background | str \| None | Required | | Ethnic background (may be `None`) |
| marital_status | str \| None | Required | | Marital status (may be `None`) |
| education_level | str \| None | Required | | Education level (may be `None`) |
| bachelors_field | str \| None | Required | | Field of bachelor's degree (may be `None`) |
| occupation | str \| None | Required | | Occupation (may be `None`) |
| uuid | UUID | Required | | Unique identifier |
| locale | str | "en_US" | | Locale setting |
| phone_number | PhoneNumber \| None | Computed | | Generated phone number based on location (`None` for age < 18) |
| email_address | EmailStr \| None | Computed | | Generated email address (`None` for age < 18) |
| birth_date | date | Computed | | Calculated birth date based on age |
| national_id | str \| None | Computed | | National ID (SSN for US locale) |
| ssn | str \| None | Computed | | Alias for `national_id` |
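In the previews, person-sampled columns such as `patient` and `emergency_contact` hold one nested record per row. One way to pull individual fields into their own columns is `pandas.json_normalize`. The sketch assumes each cell is a plain Python dict; `flatten_person_column` is a hypothetical helper, and the two records are trimmed, illustrative stand-ins rather than real sampler output.

```python
import pandas as pd


def flatten_person_column(df, column):
    """Expand a column of person dicts into prefixed columns, e.g. patient.first_name."""
    flat = pd.json_normalize(df[column].tolist()).add_prefix(f"{column}.")
    flat.index = df.index  # keep rows aligned with the original frame
    return pd.concat([df.drop(columns=column), flat], axis=1)


# Illustrative, trimmed person records (real records carry many more fields).
df = pd.DataFrame(
    {
        "diagnosis": ["impetigo", "fungal infection"],
        "patient": [
            {"first_name": "Scottie", "last_name": "Cabrera", "city": "San Diego"},
            {"first_name": "Paul", "last_name": "Wood", "city": "San Diego"},
        ],
    }
)
print(flatten_person_column(df, "patient"))
```

For nested fields inside expressions or prompts you do not need this step; Jinja can reach them directly, as in `{{ patient.first_name }}`.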

In [None]:
# Person sampler: an emergency contact who is a woman living in San Diego
dd.add_column(
    C.SamplerColumn(
        name="emergency_contact",
        type=P.SamplingSourceType.PERSON,
        params=P.PersonSamplerParams(locale="en_US", sex="Female", city="San Diego"),
    )
)

# Category sampler: weighted choice between "dog" (70%) and "cat" (30%)
dd.add_column(
    C.SamplerColumn(
        name="pet_type",
        type=P.SamplingSourceType.CATEGORY,
        params=P.CategorySamplerParams(values=["dog", "cat"], weights=[0.7, 0.3]),
    )
)

# Subcategory sampler: the candidate names depend on the sampled pet_type
dd.add_column(
    C.SamplerColumn(
        name="first_pet_name",
        type=P.SamplingSourceType.SUBCATEGORY,
        params=P.SubcategoryParams(
            category="pet_type",
            values={
                "dog": ["Buddy", "Max", "Charlie", "Cooper", "Daisy", "Lucy"],
                "cat": ["Oliver", "Leo", "Milo", "Charlie", "Simba", "Luna"],
            },
        ),
    )
)

# Sample from a statistical distribution (e.g., Bernoulli, binomial, Poisson).
# A Poisson's variance equals its mean, so these values stay close to 100,000.
dd.add_column(
    C.SamplerColumn(
        name="household_income",
        type=P.SamplingSourceType.POISSON,
        params=P.PoissonSamplerParams(mean=100000),
    )
)
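The category and subcategory samplers can be pictured with Python's `random.choices`: first draw `pet_type` using the weights, then draw a name from the list keyed by that type. This is a stdlib sketch of the idea, not the SDK's code. In `random.choices` the weights are relative, so they need not sum to 1; whether `CategorySamplerParams` normalizes its weights is not covered in this guide, so keeping them summing to 1 is the safe choice. In this sketch, a category value with no matching key in the subcategory mapping would raise a `KeyError`.

```python
import random
from collections import Counter

PET_TYPES = ["dog", "cat"]
PET_WEIGHTS = [0.7, 0.3]
PET_NAMES = {
    "dog": ["Buddy", "Max", "Charlie", "Cooper", "Daisy", "Lucy"],
    "cat": ["Oliver", "Leo", "Milo", "Charlie", "Simba", "Luna"],
}


def sample_pet(rng):
    """Draw a (pet_type, first_pet_name) pair: category first, then subcategory."""
    pet_type = rng.choices(PET_TYPES, weights=PET_WEIGHTS, k=1)[0]
    # The name list depends on the category that was just drawn.
    first_pet_name = rng.choice(PET_NAMES[pet_type])
    return pet_type, first_pet_name


rng = random.Random(42)
pets = [sample_pet(rng) for _ in range(10_000)]
counts = Counter(pet_type for pet_type, _ in pets)
print({k: round(v / len(pets), 2) for k, v in counts.items()})
```

Over 10,000 draws the share of dogs should land close to 0.7, while a 10-row preview can easily deviate from it.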

In [None]:
# Expression columns use Jinja syntax
# (reference: https://documentation.bloomreach.com/engagement/docs/jinja-syntax)

# Reference fields of an existing column
dd.add_column(
    C.ExpressionColumn(
        name="patient_full_name",
        expr="{{ patient.first_name }} {{ patient.last_name }}",
    )
)

# Compute a value with arithmetic on another column
dd.add_column(
    C.ExpressionColumn(
        name="net_worth",
        expr="{{ household_income - 50000 }}",
    )
)

# Choose a value with Jinja if/else logic
dd.add_column(
    C.ExpressionColumn(
        name="number_of_children",
        expr="{% if household_income > 100000 %}2{% else %}1{% endif %}",
    )
)
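To see what these expressions produce for a single row, you can render equivalent templates locally with the `jinja2` package (installed separately, e.g. `pip install jinja2`). This assumes expression columns follow standard Jinja semantics; the row values below are taken from one row of this section's preview output, with the person record trimmed to two fields.

```python
from jinja2 import Template

# One illustrative row; the values match a row from this section's preview output.
row = {
    "patient": {"first_name": "Scottie", "last_name": "Cabrera"},
    "household_income": 99731,
}

expressions = {
    "patient_full_name": "{{ patient.first_name }} {{ patient.last_name }}",
    "net_worth": "{{ household_income - 50000 }}",
    "number_of_children": "{% if household_income > 100000 %}2{% else %}1{% endif %}",
}

# Plain Jinja returns text, so every rendered value here is a string.
rendered = {name: Template(expr).render(**row) for name, expr in expressions.items()}
print(rendered)
# {'patient_full_name': 'Scottie Cabrera', 'net_worth': '49731', 'number_of_children': '1'}
```

Jinja's dot syntax falls back to dictionary lookup, which is why `patient.first_name` works on a dict-valued column.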


In [None]:
dd.preview().dataset.df

[20:38:57] [INFO] 🚀 Generating preview


[20:39:00] [INFO] 🎲 Step 1: Sample from dataset
[20:39:01] [INFO] 🦜 Step 2: Generate columns using samplers
[20:39:06] [INFO] 🔗 Step 3: Concat datasets
[20:39:06] [INFO] 🦜 Step 4: Generate column from template
[20:39:11] [INFO] 🦜 Step 5: Generate column from expression
[20:39:11] [INFO] 🦜 Step 6: Generate column from expression 1
[20:39:12] [INFO] 🦜 Step 7: Generate column from expression 2
[20:39:12] [INFO] 🎉 Your dataset preview is ready!


|   | diagnosis | patient_summary | patient | bmi | emergency_contact | pet_type | first_pet_name | household_income | api_response | patient_full_name | net_worth | number_of_children |
|---|-----------|-----------------|---------|-----|-------------------|----------|----------------|------------------|--------------|-------------------|-----------|--------------------|
| 0 | impetigo | I have a sore on my nose that has crusted over... | {'first_name': 'Scottie', 'middle_name': 'G', ... | 23.548156 | {'first_name': 'Kyung', 'middle_name': 'Lin', ... | dog | Cooper | 99731 | {"uuid": "bf5b5520-20d6-4a41-b759-c74fa80862ee... | Scottie Cabrera | 49731 | 1 |
| 1 | fungal infection | I've been having a lot of itching and rashes o... | {'first_name': 'Paul', 'middle_name': 'Joseph'... | 26.000753 | {'first_name': 'Lilian', 'middle_name': '', 'l... | dog | Cooper | 100620 | {"uuid": "c7148fcd-d195-47eb-b7ce-ba3ae4d9a153... | Paul Wood | 50620 | 2 |
| 2 | urinary tract infection | I have been getting blood in my pee. Sometimes... | {'first_name': 'Gregory', 'middle_name': 'Scot... | 24.913054 | {'first_name': 'Nancy', 'middle_name': 'Elizab... | dog | Cooper | 99921 | {"uuid": "601f5b8f-3f4d-4bc8-8a6c-0d796e5c9b10... | Gregory Thompson | 49921 | 1 |
| 3 | diabetes | I have a wound that doesn't seem to heal. My h... | {'first_name': 'John', 'middle_name': 'M', 'la... | 21.416606 | {'first_name': 'Maria', 'middle_name': '', 'la... | cat | Milo | 100222 | {"uuid": "ee73f89a-8c12-4c4f-ad7f-05afb6b5486d... | John Batchelor | 50222 | 2 |
| 4 | fungal infection | I've been having these red rashes that keep po... | {'first_name': 'Michael', 'middle_name': 'Trav... | 21.240915 | {'first_name': 'Veronica', 'middle_name': 'Gab... | dog | Buddy | 100184 | {"uuid": "aaeca475-df97-4084-846a-95ba4308644f... | Michael Gosnell | 50184 | 2 |
| 5 | allergy | I've been feeling sick for a few days. I have ... | {'first_name': 'Robert', 'middle_name': 'Lowel... | 28.676062 | {'first_name': 'Charlotte', 'middle_name': 'Re... | cat | Simba | 100193 | {"uuid": "4e02ab70-9438-4308-8484-631b52e2831a... | Robert Stlouis | 50193 | 2 |
| 6 | jaundice | I have been feeling really sick lately. I have... | {'first_name': 'Luis', 'middle_name': 'Pablo',... | 21.344636 | {'first_name': 'Ana', 'middle_name': '', 'last... | dog | Lucy | 99887 | {"uuid": "a35667d0-e827-4823-8c26-88f8317a21c0... | Luis Bianco | 49887 | 1 |
| 7 | common cold | I have a really bad cold. My nose is running a... | {'first_name': 'Cameron', 'middle_name': 'Fran... | 27.293993 | {'first_name': 'Gena', 'middle_name': '', 'las... | cat | Leo | 100006 | {"uuid": "ce843d7b-a5e4-4e6f-b8a6-8b65b2084c09... | Cameron Miranda | 50006 | 2 |
| 8 | jaundice | I have been feeling really sick lately. I have... | {'first_name': 'Kang', 'middle_name': 'Xu', 'l... | 19.630156 | {'first_name': 'Trang', 'middle_name': 'Itty',... | dog | Max | 99675 | {"uuid": "51ed00b9-8e89-44d7-8ef3-3dad814f7604... | Kang Yang | 49675 | 1 |
| 9 | bronchial asthma | I've been coughing a lot, and I'm really tired... | {'first_name': 'Marquis', 'middle_name': 'Doug... | 25.748421 | {'first_name': 'Cres', 'middle_name': 'V', 'la... | dog | Max | 100312 | {"uuid": "55961330-3acd-4b4c-b8b7-8b5e0f832137... | Marquis Jackson | 50312 | 2 |
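Notice that `household_income` in this preview stays within a few hundred of 100,000 (between 99,675 and 100,620). That is expected for a Poisson distribution: its variance equals its mean, so with a mean of 100,000 the standard deviation is only about √100000 ≈ 316. A quick simulation with NumPy (already installed as a pandas dependency) makes this concrete; it is independent of the SDK's own sampler.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
incomes = rng.poisson(lam=100_000, size=10_000)

print(f"mean = {incomes.mean():.1f}")
print(f"std  = {incomes.std():.1f}  (theory: {np.sqrt(100_000):.1f})")
# About half the draws exceed the mean, so the number_of_children
# expression (household_income > 100000) splits the rows roughly in half.
print(f"share above 100000 = {(incomes > 100_000).mean():.2f}")
```

The same narrow spread explains why `net_worth` always lands near 50,000. If you need incomes with a wider, independently controlled spread, a distribution with a separate scale parameter is a better fit than a Poisson.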


In [None]:
# Add an LLM-generated column; Jinja placeholders pull in values from other columns
dd.add_column(
    C.LLMGenColumn(
        name="potential_cause",
        prompt=(
            "Write a brief backstory for how {{ patient }} got {{ diagnosis }}. "
            "Ensure it is consistent with {{ patient_summary }}. "
            "Make it no more than 2 sentences."
        ),
    )
)
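A note on building prompts from several string literals: Python concatenates adjacent literals with no separator, so unless each fragment ends with a space, sentences run together in the prompt the LLM receives. This plain-Python example shows the difference.

```python
# Adjacent string literals in parentheses are joined at compile time, with nothing in between.
run_together = (
    "Write a brief backstory."
    "Make it no more than 2 sentences."
)
spaced = (
    "Write a brief backstory. "
    "Make it no more than 2 sentences."
)
print(run_together)  # Write a brief backstory.Make it no more than 2 sentences.
print(spaced)        # Write a brief backstory. Make it no more than 2 sentences.
```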

In [None]:
dd.preview().dataset.df

[20:42:29] [INFO] 🚀 Generating preview


[20:42:31] [INFO] 🎲 Step 1: Sample from dataset
[20:42:33] [INFO] 🦜 Step 2: Generate columns using samplers
[20:42:38] [INFO] 🔗 Step 3: Concat datasets
[20:42:39] [INFO] 🦜 Step 4: Generate column from template
[20:42:42] [INFO] 🦜 Step 5: Generate column from template 1
[20:42:46] [INFO] 🦜 Step 6: Generate column from expression
[20:42:47] [INFO] 🦜 Step 7: Generate column from expression 1
[20:42:47] [INFO] 🦜 Step 8: Generate column from expression 2
[20:42:48] [INFO] 🎉 Your dataset preview is ready!


|   | diagnosis | patient_summary | patient | bmi | emergency_contact | pet_type | first_pet_name | household_income | api_response | potential_cause | patient_full_name | net_worth | number_of_children |
|---|-----------|-----------------|---------|-----|-------------------|----------|----------------|------------------|--------------|-----------------|-------------------|-----------|--------------------|
| 0 | cervical spondylosis | I've been having trouble with my neck. It's be... | {'first_name': 'Pedro', 'middle_name': '', 'la... | 28.229593 | {'first_name': 'Maria', 'middle_name': '', 'la... | dog | Charlie | 100080 | {"uuid": "7abe5a42-f041-4aca-87db-013e77ad2bca... | Pedro Jaquez, a 9-year-old boy living in San D... | Pedro Jaquez | 50080 | 2 |
| 1 | fungal infection | I have a rash on my skin that is itchy and has... | {'first_name': 'Dan', 'middle_name': 'L', 'las... | 22.177708 | {'first_name': 'Lydia', 'middle_name': '', 'la... | dog | Lucy | 99691 | {"uuid": "38f78671-43e0-44b0-bbce-6f01701f5aac... | Dan Griffin, a dedicated registered nurse livi... | Dan Griffin | 49691 | 1 |
| 2 | cervical spondylosis | I have been having a lot of pain in my back an... | {'first_name': 'Justin', 'middle_name': '', 'l... | 20.837955 | {'first_name': 'Anacristina', 'middle_name': '... | dog | Lucy | 100073 | {"uuid": "72f6ca37-436d-431c-b0e0-ffd3eb62a485... | Justin Diaz, an 88-year-old Mexican-American c... | Justin Diaz | 50073 | 2 |
| 3 | malaria | I've been feeling really unwell and have been ... | {'first_name': 'William', 'middle_name': 'Cree... | 25.98385 | {'first_name': 'Alice', 'middle_name': 'Grenel... | dog | Buddy | 100802 | {"uuid": "69755ee3-d01a-465c-9963-921c296efdc4... | William Holland, an 18-year-old automotive ser... | William Holland | 50802 | 2 |
| 4 | jaundice | I've been feeling really sick and tired. I've ... | {'first_name': 'Juan', 'middle_name': 'Mendoza... | 29.49322 | {'first_name': 'Adriana', 'middle_name': '', '... | cat | Luna | 100307 | {"uuid": "03693edc-d267-4017-a1e2-2997ce2305c3... | Juan Mendez, a 48-year-old janitor living in S... | Juan Mendez | 50307 | 2 |
| 5 | hypertension | I've been having a headache and chest pain for... | {'first_name': 'Luis', 'middle_name': 'Pablo',... | 25.886988 | {'first_name': 'Nargis', 'middle_name': 'Cathe... | cat | Oliver | 99918 | {"uuid": "a35667d0-e827-4823-8c26-88f8317a21c0... | Luis Bianco, an 80-year-old Mexican-American t... | Luis Bianco | 49918 | 1 |
| 6 | urinary tract infection | I've had a low grade fever for the past four d... | {'first_name': 'Kevin', 'middle_name': '', 'la... | 27.277227 | {'first_name': 'Gloria', 'middle_name': '', 'l... | cat | Leo | 100368 | {"uuid": "ca08a047-3ac7-4883-91df-e71e39c2e158... | Kevin Hernandez, an 81-year-old Mexican-Americ... | Kevin Hernandez | 50368 | 2 |
| 7 | cervical spondylosis | I've been having a lot of pain in my neck and ... | {'first_name': 'Rafael', 'middle_name': 'Carlo... | 30.019385 | {'first_name': 'Hue', 'middle_name': '', 'last... | dog | Daisy | 100446 | {"uuid": "fdd5174e-0baf-45b0-9f3b-53587beef104... | Rafael Carlos Sanchez, a 35-year-old civil eng... | Rafael Sanchez | 50446 | 2 |
| 8 | typhoid | I've been having terrible stomach pains and I'... | {'first_name': 'David', 'middle_name': 'Roger'... | 23.492612 | {'first_name': 'Maria', 'middle_name': 'Beatri... | dog | Charlie | 99889 | {"uuid": "64af935e-a98f-47b5-8cd2-a985bd750568... | David Mallon, a 20-year-old manager living in ... | David Mallon | 49889 | 1 |
| 9 | jaundice | I've been feeling really sick. I've had a feve... | {'first_name': 'Joel', 'middle_name': 'M', 'la... | 27.464196 | {'first_name': 'Kayleen', 'middle_name': '', '... | dog | Lucy | 99625 | {"uuid": "6363dfcb-07da-4170-91be-1b4486973369... | Joel Janssen, a 17-year-old living in San Dieg... | Joel Janssen | 49625 | 1 |


# 🧠 Advanced: Bringing It All Together with Complex LLM Prompts

Welcome to the final section — time to go full wizard mode. 🧙‍♂️

Here, we’ll combine everything you've learned so far into a **powerful, LLM-driven workflow**. Instead of giving each column a fixed sampler or expression, we’ll focus on crafting **rich, detailed prompts** that embed Jinja logic, so the instructions the LLM receives adapt to each row’s values.

In this section, you’ll:

- 🧩 Bring together **Jinja expressions** and **LLM columns** in a single, cohesive config
- ✨ Design and refine complex LLM prompts to guide dataset creation with high fidelity and variability
- 🧠 Leverage the LLM’s contextual understanding to generate multi-column data with realistic relationships and patterns

This approach is ideal when you want to prototype ideas, simulate user behavior, generate synthetic logs, or create diverse, semi-structured content at scale.

By the end of this section, you’ll be able to use the SDK as a creative tool — part data engine, part storytelling assistant.

Let’s take it to the next level. 🚀


In [None]:
# Add an LLM-generated column whose prompt changes with each row's values
dd.add_column(
    C.LLMGenColumn(
        name="outcome",
        prompt=(
            "Write a brief, one-sentence outcome for {{ patient }}. "
            "{% if bmi > 25 %}"
            "They will have a negative outcome unless they change their lifestyle. "
            "They will need the support of {{ emergency_contact.first_name }}."
            "{% else %}"
            "They must start spending {{ net_worth * 0.5 }} dollars on a treatment plan."
            "{% endif %}"
        ),
    )
)
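Because the prompt is itself a Jinja template, each row's LLM request depends on that row's values. The sketch below renders just the conditional part with the `jinja2` package for two rows, assuming LLM prompts follow standard Jinja semantics. The `bmi`, `emergency_contact`, and `net_worth` values come from two rows of this section's preview output (with `emergency_contact` trimmed to one field); this is a way to check prompts locally, not part of the SDK.

```python
from jinja2 import Template

branch = Template(
    "{% if bmi > 25 %}"
    "They will need the support of {{ emergency_contact.first_name }}."
    "{% else %}"
    "They must start spending {{ net_worth * 0.5 }} dollars on a treatment plan."
    "{% endif %}"
)

# Two illustrative rows: one below and one above the BMI threshold.
rows = [
    {"bmi": 22.661263, "emergency_contact": {"first_name": "Rebecca"}, "net_worth": 49760},
    {"bmi": 27.589675, "emergency_contact": {"first_name": "Kimberly"}, "net_worth": 49984},
]
for row in rows:
    print(branch.render(**row))
# They must start spending 24880.0 dollars on a treatment plan.
# They will need the support of Kimberly.
```

Note that `net_worth * 0.5` produces a float, so the amount renders as `24880.0`; Jinja's `round` or `int` filters can tidy it up if that matters for your prompt.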

In [None]:
dd.preview().dataset.df

[20:45:22] [INFO] 🚀 Generating preview


[20:45:24] [INFO] 🎲 Step 1: Sample from dataset
[20:45:25] [INFO] 🦜 Step 2: Generate columns using samplers
[20:45:30] [INFO] 🔗 Step 3: Concat datasets
[20:45:31] [INFO] 🦜 Step 4: Generate column from template
[20:45:34] [INFO] 🦜 Step 5: Generate column from template 1
[20:45:38] [INFO] 🦜 Step 6: Generate column from expression
[20:45:39] [INFO] 🦜 Step 7: Generate column from expression 1
[20:45:39] [INFO] 🦜 Step 8: Generate column from expression 2
[20:45:40] [INFO] 🦜 Step 9: Generate column from template 2
[20:45:43] [INFO] 🎉 Your dataset preview is ready!


|   | diagnosis | patient_summary | patient | bmi | emergency_contact | pet_type | first_pet_name | household_income | api_response | potential_cause | net_worth | patient_full_name | number_of_children | outcome |
|---|-----------|-----------------|---------|-----|-------------------|----------|----------------|------------------|--------------|-----------------|-----------|-------------------|--------------------|---------|
| 0 | urinary tract infection | I have to go to the bathroom a lot, but only a... | {'first_name': 'Ronnie', 'middle_name': 'J', '... | 22.661263 | {'first_name': 'Rebecca', 'middle_name': '', '... | dog | Buddy | 99760 | {"uuid": "261d3122-b936-4c76-9e6c-223d39a39139... | Ronnie Haynes, a 29-year-old manager living in... | 49760 | Ronnie Haynes | 1 | Ronnie J. Haynes will begin a treatment plan c... |
| 1 | hypertension | I've been having chest pain, dizziness, and a ... | {'first_name': 'Denno', 'middle_name': '', 'la... | 27.589675 | {'first_name': 'Kimberly', 'middle_name': 'C',... | cat | Luna | 99984 | {"uuid": "666592f9-0f00-4fdf-9d98-23f251c31509... | Denno Fong, a 27-year-old sales representative... | 49984 | Denno Fong | 1 | Denno Fong, a 27-year-old sales representative... |
| 2 | dengue | I have a high fever and a severe headache. My ... | {'first_name': 'Steven', 'middle_name': 'Clift... | 26.829031 | {'first_name': 'Estrella', 'middle_name': 'Phu... | cat | Milo | 100292 | {"uuid": "cadbf75b-5ae4-4eb6-9476-d002be49a49c... | Steven Householder, a 1-year-old infant living... | 50292 | Steven Householder | 2 | Based on the provided information, Steven Clif... |
| 3 | peptic ulcer disease | I have a gnawing hunger and apetite. Sometimes... | {'first_name': 'David', 'middle_name': 'Joseph... | 23.834832 | {'first_name': 'Yovani', 'middle_name': '', 'l... | cat | Leo | 100010 | {"uuid": "71b0ed2f-a411-4412-9cb4-f904ca4e2677... | David Balsley, a dedicated food service manage... | 50010 | David Balsley | 2 | David Joseph Balsley, a 55-year-old married fo... |
| 4 | drug reaction | I feel really sick when I have a fever. I get ... | {'first_name': 'David', 'middle_name': 'Mckeit... | 20.916935 | {'first_name': 'Tori', 'middle_name': 'Ann', '... | dog | Max | 100214 | {"uuid": "985b4c3a-6a31-4a2f-8df6-c389fb05b989... | David White, a dedicated elementary school tea... | 50214 | David White | 2 | David White, a 42-year-old married elementary ... |
| 5 | chicken pox | I have rashes on my arms and neck that itch wh... | {'first_name': 'Rolando', 'middle_name': 'Dean... | 18.84514 | {'first_name': 'Tabethia', 'middle_name': 'A',... | dog | Buddy | 99735 | {"uuid": "4f151371-57bd-4450-981b-7bfea875674a... | Rolando Dean Hipolito, a 54-year-old Micronesi... | 49735 | Rolando Hipolito | 1 | Based on the provided information, Rolando Dea... |
| 6 | varicose veins | I have some veins in my legs that are bulging ... | {'first_name': 'Hermenegildo', 'middle_name': ... | 25.121627 | {'first_name': 'Elizabeth', 'middle_name': 'Is... | dog | Daisy | 100264 | {"uuid": "dea89e87-6d00-4499-987d-5f5e763b7fe3... | Hermenegildo M. Mendoza, a 39-year-old cost es... | 50264 | Hermenegildo Mendoza | 2 | Based on the provided information, Hermenegild... |
| 7 | drug reaction | I have rashes that sometimes cause my skin to ... | {'first_name': 'Sengchanh', 'middle_name': 'G'... | 27.736508 | {'first_name': 'Myong', 'middle_name': '', 'la... | dog | Cooper | 99824 | {"uuid": "0956f872-84db-4b40-953c-59f1ce6e74c1... | Sengchanh Kotha, a 22-year-old courier from Sa... | 49824 | Sengchanh Kotha | 1 | Based on the provided information, Sengchanh G... |
| 8 | common cold | I've been feeling really sick and exhausted la... | {'first_name': 'Andrew', 'middle_name': 'David... | 24.374416 | {'first_name': 'Beryl', 'middle_name': 'Free',... | dog | Buddy | 99868 | {"uuid": "1c7020a6-d95c-4a7e-ab56-a6c6a5631fff... | Andrew David Albert, a 26-year-old chief execu... | 49868 | Andrew Albert | 1 | Based on the provided information, Andrew Davi... |
| 9 | migraine | I've been having problems with my vision. I've... | {'first_name': 'Randy', 'middle_name': 'Patric... | 26.393032 | {'first_name': 'Wen', 'middle_name': '', 'last... | dog | Cooper | 99807 | {"uuid": "abd6bb6d-56d5-4228-ab7d-0b5df09b9e88... | Randy Tran, a 28-year-old assembler from San D... | 49807 | Randy Tran | 1 | Based on the provided information, Randy Patri... |
