## Install Dependencies

Install the required dependencies to execute this notebook.

In [1]:
%pip install --upgrade "nemo-microservices[data-designer]" python-dotenv pandas -qqq


Note: you may need to restart the kernel to use updated packages.
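
After restarting the kernel, it can help to confirm that the packages actually resolved. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper introduced here for illustration, and the distribution names are the ones passed to `%pip install` above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report what the current environment sees for each package installed above.
for name in ("nemo-microservices", "python-dotenv", "pandas"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package shows as not installed, the kernel is probably running in a different environment from the one `%pip` wrote to.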


## Configure Data Designer

Load the NVIDIA API key (available from `https://build.nvidia.com/`), import the required libraries, and set the base URL of the hosted Data Designer API.

> We assume the key is stored as `NVIDIA_API_KEY` in a `.env` file in the same directory as this notebook.

In [3]:
import os
from dotenv import load_dotenv
import pandas as pd

from nemo_microservices.data_designer.essentials import (
    CategorySamplerParams,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    NeMoDataDesignerClient,
    PersonSamplerParams,
    SamplerColumnConfig,
    SamplerType,
    SubcategorySamplerParams,
    UniformSamplerParams,
)

# Load .env and get NVIDIA_API_KEY
load_dotenv()
api_key = os.getenv("NVIDIA_API_KEY")

# Initialize hosted NeMo Data Designer client
data_designer_client = NeMoDataDesignerClient(
    base_url="https://ai.api.nvidia.com/v1/nemo/dd",
    default_headers={"Authorization": f"Bearer {api_key}"}
)

model_alias = "nemotron-nano-v2"
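
`os.getenv` returns `None` when the variable is missing, in which case the client above would be built with the header `Bearer None` and fail only on the first request. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of `python-dotenv` or the Data Designer client, and `EXAMPLE_API_KEY` is a stand-in variable so the sketch runs on its own.

```python
import os


def require_env(name):
    """Hypothetical helper: return a non-empty environment variable or raise."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it before running the notebook."
        )
    return value


# Stand-in variable so this sketch is self-contained.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

In this notebook the call would be `api_key = require_env("NVIDIA_API_KEY")`, placed after `load_dotenv()`.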


## Define Data Schema

Our data schema defines a single Purchases table with the following columns:
- Purchase ID
- Purchase Date
- Purchase Type
- Amount
- Balance
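
To make the columns concrete, here is a plain-Python sketch of what one row of this schema might look like before the balance is derived. It uses only the standard library and is an analogue for illustration, not how Data Designer samples internally. `sample_purchase` is a hypothetical helper; the parameter values mirror the configuration cell below, and the ID shape (prefix plus 8 hex characters) is an assumption based on the preview output.

```python
import random
import uuid
from datetime import date, timedelta


def sample_purchase(rng):
    """Hypothetical helper: draw one illustrative purchase row."""
    # Assumed ID shape: "P-" followed by the first 8 hex characters of a UUID.
    purchase_id = "P-" + uuid.UUID(int=rng.getrandbits(128)).hex[:8]
    # Uniform day within 2024 (a leap year, so 366 possible days).
    purchase_date = date(2024, 1, 1) + timedelta(days=rng.randrange(366))
    # Weighted category draw.
    purchase_type = rng.choices(
        ["DEBIT", "CREDIT", "REFUND"], weights=[0.7, 0.25, 0.05]
    )[0]
    # Gaussian amount rounded to two decimal places.
    amount = round(rng.gauss(100.0, 40.0), 2)
    return {
        "purchase_id": purchase_id,
        "purchase_date": purchase_date.isoformat(),
        "purchase_type": purchase_type,
        "amount": amount,
    }


rng = random.Random(0)  # fixed seed so repeated runs give the same rows
rows = [sample_purchase(rng) for _ in range(5)]
for row in rows:
    print(row)
```

One thing this sketch makes visible: a Gaussian with mean 100 and standard deviation 40 puts zero at 2.5 standard deviations below the mean, so roughly 0.6% of draws are negative. Whether that matters depends on how the amounts are used downstream.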


In [7]:
from nemo_microservices.data_designer.essentials import (
    DataDesignerConfigBuilder,
    SamplerColumnConfig,
    SamplerType,
    ExpressionColumnConfig,
)
from nemo_microservices.data_designer.config.sampler_params import (
    UUIDSamplerParams,
    CategorySamplerParams,
    DatetimeSamplerParams,
    GaussianSamplerParams,
)

config_builder = DataDesignerConfigBuilder()

config_builder.add_column(
    SamplerColumnConfig(
        name="purchase_id",
        sampler_type=SamplerType.UUID,
        params=UUIDSamplerParams(
            prefix="P-",
            short_form=True,
            uppercase=False,
        ),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="purchase_date",
        sampler_type=SamplerType.DATETIME,
        params=DatetimeSamplerParams(
            start="2024-01-01T00:00:00",
            end="2024-12-31T23:59:59",
            unit="D",
        ),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="purchase_type",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["DEBIT", "CREDIT", "REFUND"],
            weights=[0.7, 0.25, 0.05],
        ),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="amount",
        sampler_type=SamplerType.GAUSSIAN,
        params=GaussianSamplerParams(
            mean=100.0,
            stddev=40.0,
            decimal_places=2,
        ),
    )
)

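# Opening balance for the account. The per-row expression below does not reference it.
STARTING_BALANCE = 500.0

# Jinja-style template rendered for each row. Only the text inside {{ ... }} is evaluated;
# the "-1 * " prefix on DEBIT rows sits outside the braces, so it is emitted as literal text.
balance_expression = """
{%- if purchase_type == 'DEBIT' -%}
-1 * {{ amount }}
{%- else -%}
{{ amount }}
{%- endif -%}
"""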

config_builder.add_column(
    ExpressionColumnConfig(
        name="balance",
        expr=balance_expression,
    )
)
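
The preview output later in this notebook shows DEBIT balances as text such as `-1 * 88.35` rather than a number. The sketch below reproduces that behaviour with the `jinja2` package directly and shows one possible fix: moving the arithmetic inside the `{{ ... }}` braces. It assumes Data Designer renders `expr` with Jinja2-compatible semantics, which the template syntax suggests but which we have not verified here; the variable names are our own.

```python
from jinja2 import Template

# The expression as written in the cell above: "-1 * " is plain text.
literal_form = """
{%- if purchase_type == 'DEBIT' -%}
-1 * {{ amount }}
{%- else -%}
{{ amount }}
{%- endif -%}
"""

# Candidate fix: the multiplication happens inside the braces.
computed_form = """
{%- if purchase_type == 'DEBIT' -%}
{{ -1 * amount }}
{%- else -%}
{{ amount }}
{%- endif -%}
"""

row = {"purchase_type": "DEBIT", "amount": 88.35}

literal_result = Template(literal_form).render(**row)
computed_result = Template(computed_form).render(**row)

print(repr(literal_result))
print(repr(computed_result))
```

Both results are still strings, because that is what Jinja rendering returns. Under the same assumption, the computed form parses cleanly with `float(...)` while the literal form does not.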


## Generate Preview Data

The client's `preview` method asks Data Designer to generate a small number of sample records so we can inspect them before saving. Here we request 100 records.

In [8]:
NUM_PREVIEW_ROWS = 100

preview = data_designer_client.preview(config_builder, num_records=NUM_PREVIEW_ROWS)
preview_df = preview.dataset

preview_df.head()


[09:41:26] [INFO] ✅ Validation passed
[09:41:26] [INFO] 🚀 Starting preview generation
[09:41:27] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[09:41:27] [INFO] 🩺 Running health checks for models...
[09:41:30] [INFO]   |-- 👀 Checking 'nvidia/nvidia-nemotron-nano-9b-v2'...
[09:41:30] [INFO]   |-- ✅ Passed!
[09:41:32] [INFO]   |-- 👀 Checking 'nvidia/llama-3.3-nemotron-super-49b-v1.5'...
[09:41:32] [INFO]   |-- ✅ Passed!
[09:41:33] [INFO]   |-- 👀 Checking 'mistralai/mistral-small-24b-instruct'...
[09:41:33] [INFO]   |-- ✅ Passed!
[09:41:38] [INFO]   |-- 👀 Checking 'openai/gpt-oss-20b'...
[09:41:50] [INFO]   |-- ✅ Passed!
[09:41:50] [INFO]   |-- 👀 Checking 'openai/gpt-oss-120b'...
[09:41:50] [INFO]   |-- ✅ Passed!
[09:41:52] [INFO]   |-- 👀 Checking 'meta/llama-4-scout-17b-16e-instruct'...
[09:41:52] [INFO]   |-- ✅ Passed!
[09:41:52] [INFO] ⏳ Processing batch 1 of 1
[09:41:52] [INFO] 🎲 Preparing samplers to generate 100 records

|   | purchase_id | purchase_date | purchase_type | amount | balance |
|---|---|---|---|---|---|
| 0 | P-4108209a | 2024-05-05 | CREDIT | 104.67 | 104.67 |
| 1 | P-1ce831f0 | 2024-02-21 | DEBIT | 88.35 | -1 * 88.35 |
| 2 | P-ba72b88b | 2024-02-04 | DEBIT | 34.38 | -1 * 34.38 |
| 3 | P-cb9a0c6c | 2024-02-10 | DEBIT | 137.84 | -1 * 137.84 |
| 4 | P-de124a38 | 2024-09-17 | DEBIT | 81.57 | -1 * 81.57 |
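
The `balance` column above is a per-row value, so it cannot express a running account balance, and `STARTING_BALANCE` is never applied. One way to get a running balance is to compute it in pandas after generation. The sketch below is an assumption about what a balance column could mean here, not part of the Data Designer configuration; `add_running_balance` is a hypothetical helper, and the rows are the five preview records re-typed by hand.

```python
import pandas as pd

STARTING_BALANCE = 500.0

# The five preview rows shown above, re-typed for illustration.
df = pd.DataFrame(
    {
        "purchase_date": ["2024-05-05", "2024-02-21", "2024-02-04", "2024-02-10", "2024-09-17"],
        "purchase_type": ["CREDIT", "DEBIT", "DEBIT", "DEBIT", "DEBIT"],
        "amount": [104.67, 88.35, 34.38, 137.84, 81.57],
    }
)


def add_running_balance(frame, starting_balance):
    """Hypothetical helper: sort by date and accumulate signed amounts."""
    # ISO-formatted date strings sort chronologically.
    out = frame.sort_values("purchase_date", kind="stable").reset_index(drop=True)
    is_debit = out["purchase_type"].eq("DEBIT")
    # Same sign rule as the notebook's expression: DEBIT is negative, everything else positive.
    out["signed_amount"] = out["amount"].where(~is_debit, -out["amount"])
    out["running_balance"] = (starting_balance + out["signed_amount"].cumsum()).round(2)
    return out


with_balance = add_running_balance(df, STARTING_BALANCE)
print(with_balance)
```

For these five rows the balance moves from 465.62 after the first purchase to 262.53 after the last. Applying the helper to `preview_df` would first require a numeric `amount` column, which the preview already provides.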


## Save Data

Write the preview table to `data/purchases.csv`, creating the `data` directory if it does not exist.

In [11]:
os.makedirs("data", exist_ok=True)
output_path = "data/purchases.csv"
preview_df.to_csv(output_path, index=False)
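
The `index=False` argument matters when the file is read back. As a small self-contained sketch (using a temporary directory and a two-row stand-in frame rather than `preview_df`), writing the index and then calling `pd.read_csv` produces an extra `Unnamed: 0` column, while `index=False` round-trips the original columns only.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the preview table.
df = pd.DataFrame(
    {"purchase_id": ["P-4108209a", "P-1ce831f0"], "amount": [104.67, 88.35]}
)

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = os.path.join(tmp, "with_index.csv")
    without_index_path = os.path.join(tmp, "without_index.csv")

    df.to_csv(with_index_path)                    # default: index is written as an unnamed column
    df.to_csv(without_index_path, index=False)    # what the notebook does

    cols_with_index = list(pd.read_csv(with_index_path).columns)
    cols_without_index = list(pd.read_csv(without_index_path).columns)

print(cols_with_index)
print(cols_without_index)
```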