<a target="_parent" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator/getting-started/custom-model-configs.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 🎨 Navigator Data Designer SDK: Using Custom Model Configurations

This notebook demonstrates how to use custom model configurations with Data Designer. We'll show how to:

1. Set up custom model configurations with different parameters
2. Create a data designer with these custom models
3. Define various column types (samplers, expressions, LLM-generated)
4. Preview and generate synthetic data

In [None]:
%%capture
# Install the latest version of the Gretel client and its dependencies
%pip install -U gretel_client

## Setup and Initialization

Import the necessary libraries and initialize the Gretel client. We're using:
- `ModelConfig` and `GenerationParameters` for configuring custom models
- Column types (`C`) for defining data structure
- Parameter types (`P`) for configuring column behavior

In [11]:
from gretel_client.navigator_client import Gretel
from gretel_client.workflows.configs.workflows import ModelConfig, GenerationParameters

# Column types (C) and parameter types (P) let us build columns from concrete types.
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

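# Initialize the Gretel client.
# api_key="prompt" asks for your API key interactively; cached credentials are reused if found.
# Note: the endpoint below is a dev environment URL; use the endpoint for your own account.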
gretel = Gretel(api_key="prompt", endpoint="https://api.dev.gretel.ai")

Found cached Gretel credentials
Logged in as kirit.thadaka@gretel.ai ✅
Using project: default-sdk-project-1b613ec72030408
Project link: https://console-eng.gretel.ai/proj_2uY0cfM0kjiegpyEZvCHNKZYxGf


## Custom Model Configurations

In this section, we define two custom model configurations:

1. `mistral-small-static-higher-temp` - Uses a fixed temperature (0.75) for more diverse outputs
2. `mistral-small-variable-higher-temp` - Uses a variable temperature range (0.50 to 0.90) for each generation

These configurations allow us to control the creativity and variability of the LLM outputs.

In [12]:
# Define custom model configurations
model_configs = [
    # Configuration with static temperature
    ModelConfig(alias="mistral-small-static-higher-temp",
                model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
                generation_parameters=GenerationParameters(temperature=0.75, top_p=0.9)),
    
    # Configuration with variable temperature (uniform distribution)
    ModelConfig(alias="mistral-small-variable-higher-temp",
                model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
                generation_parameters=GenerationParameters(
                    temperature={"type": "uniform", "params": {"low": 0.50, "high": 0.90}},
                    top_p=0.9
                ))
]
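
To make the difference between the two settings concrete, here is a minimal sketch in plain Python. It assumes that a `uniform` temperature spec means "draw a fresh value between `low` and `high` for each generation", as described above. The `resolve_temperature` helper is hypothetical and is not part of `gretel_client`.

```python
import random

# Hypothetical helper, not part of gretel_client: resolves a temperature
# setting that is either a fixed number or a {"type": "uniform", ...} spec.
def resolve_temperature(setting, rng):
    if isinstance(setting, (int, float)):
        return float(setting)
    if setting.get("type") == "uniform":
        params = setting["params"]
        return rng.uniform(params["low"], params["high"])
    raise ValueError(f"unsupported temperature setting: {setting!r}")

rng = random.Random(0)  # seeded so the sketch is repeatable
static = 0.75
variable = {"type": "uniform", "params": {"low": 0.50, "high": 0.90}}

static_draws = [resolve_temperature(static, rng) for _ in range(5)]
variable_draws = [resolve_temperature(variable, rng) for _ in range(5)]

print(static_draws)    # the same value every time
print(variable_draws)  # values spread between 0.50 and 0.90
```

With the static setting every record is generated at the same temperature; with the uniform setting, records vary in how adventurous the sampling is.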

## Initialize Data Designer with Custom Models

Create a new data designer instance with our custom model configurations. We're using the "apache-2.0" model suite, which provides models that can be used under the Apache 2.0 license.

In [13]:
# Initialize the data designer with our custom model configurations
aidd = gretel.data_designer.new(model_suite="apache-2.0",
                                model_configs=model_configs)

## Configure Person Samplers

Person samplers are pre-configured generators for realistic person data. Here we define two samplers:
1. `person1` - A male person with the `en_US` locale
2. `person2` - A female person with the `en_US` locale

Each sampler produces a nested object of personal attributes, such as the `first_name` and `last_name` fields used later in this notebook.

In [14]:
aidd.add_column(
    C.SamplerColumn(
        name="person1",  # This creates a nested object with all person attributes
        type=P.SamplerType.PERSON,
        params=P.PersonSamplerParams(
            locale="en_US",  # Set the locale for appropriate formatting
            sex="Male"
        )
    )
)

aidd.add_column(
    C.SamplerColumn(
        name="person2",  # This creates a nested object with all person attributes
        type=P.SamplerType.PERSON,
        params=P.PersonSamplerParams(
            locale="en_US",  # Set the locale for appropriate formatting
            sex="Female"
        )
    )
)

## Adding Category Samplers

Next, we'll add a category sampler for pet types with weighted probabilities:
- dog: 50% probability
- cat: 30% probability
- fish: 20% probability

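We also define a conditional parameter that returns "none" when the number of pets is 0. The condition refers to the `number_of_pets` column, which is added in the "Adding Statistical Samplers" section below.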

In [15]:
# Add pet_type column with conditional logic
aidd.add_column(
    C.SamplerColumn(
        name="pet_type",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=["dog", "cat", "fish"], weights=[0.5, 0.3, 0.2]),
        conditional_params={
            "number_of_pets == 0": P.CategorySamplerParams(values=["none"])
        }
    )
)
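
The sampling behavior described above can be approximated in plain Python. This is only a sketch of the idea (weighted choice plus a conditional override), not the sampler's actual implementation; `sample_pet_type` is a hypothetical helper.

```python
import random
from collections import Counter

rng = random.Random(42)  # seeded so the sketch is repeatable

def sample_pet_type(number_of_pets):
    # Conditional override: records with no pets always get "none".
    if number_of_pets == 0:
        return "none"
    # Otherwise draw from the weighted categories.
    return rng.choices(["dog", "cat", "fish"], weights=[0.5, 0.3, 0.2], k=1)[0]

counts = Counter(sample_pet_type(number_of_pets=1) for _ in range(10_000))
shares = {pet: counts[pet] / 10_000 for pet in ("dog", "cat", "fish")}

print(shares)                             # roughly 0.5 / 0.3 / 0.2
print(sample_pet_type(number_of_pets=0))  # none
```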

## Adding Subcategory Samplers

Subcategory samplers allow us to select values based on another column's value. Here, we'll create a pet name sampler that depends on the pet type:
- Different name options for each pet type (dog, cat, fish)
- "n/a" for those with no pets

In [16]:
# Add first_pet_name column that depends on pet_type
aidd.add_column(
    C.SamplerColumn(
        name="first_pet_name",
        type=P.SamplerType.SUBCATEGORY,
        params=P.SubcategorySamplerParams(
            category="pet_type",
            values={
                "dog": ["Buddy", "Max", "Charlie", "Cooper", "Daisy", "Lucy"],
                "cat": ["Oliver", "Leo", "Milo", "Charlie", "Simba", "Luna"],
                "fish": ["Bubbles", "Nemo", "Goldie", "Dory", "Finley", "Splash"],
                "none": ["n/a"]
            }
        )
    )
)
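
Conceptually, a subcategory sampler is a lookup keyed by the parent column's value, followed by a random pick. The sketch below illustrates that idea in plain Python; `sample_first_pet_name` is a hypothetical helper, and picking uniformly among the names is an assumption.

```python
import random

# Same mapping as in the cell above: parent category -> candidate values.
PET_NAMES = {
    "dog": ["Buddy", "Max", "Charlie", "Cooper", "Daisy", "Lucy"],
    "cat": ["Oliver", "Leo", "Milo", "Charlie", "Simba", "Luna"],
    "fish": ["Bubbles", "Nemo", "Goldie", "Dory", "Finley", "Splash"],
    "none": ["n/a"],
}

def sample_first_pet_name(pet_type, rng):
    # Look up the candidates for this pet type, then pick one of them.
    return rng.choice(PET_NAMES[pet_type])

rng = random.Random(7)
dog_name = sample_first_pet_name("dog", rng)
no_pet_name = sample_first_pet_name("none", rng)

print(dog_name)     # one of the six dog names
print(no_pet_name)  # n/a
```

Because the "none" category has a single value, every record without pets gets "n/a" as its pet name.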

## Adding Statistical Samplers

Here we add a Poisson sampler for the number of pets. A Poisson distribution models non-negative integer counts around a given mean (in this case, 2 pets on average), so some records will have 0 pets and trigger the "none" branch of `pet_type`.

In [17]:
# Add number_of_pets column using Poisson distribution
aidd.add_column(
    C.SamplerColumn(
        name="number_of_pets",
        type=P.SamplerType.POISSON,
        params=P.PoissonSamplerParams(mean=2)
    )
)
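
To get a feel for what a mean of 2 implies, the Poisson probability of each count can be computed directly from the standard formula P(k) = e^(-mean) * mean^k / k!. The `poisson_pmf` function below is written for this illustration and is independent of the sampler itself.

```python
import math

def poisson_pmf(k, mean):
    # Probability of observing exactly k events for a Poisson distribution.
    return math.exp(-mean) * mean**k / math.factorial(k)

probabilities = {k: poisson_pmf(k, mean=2) for k in range(5)}

for k, p in probabilities.items():
    print(f"{k} pets: {p:.3f}")
```

With a mean of 2, roughly 13.5% of records are expected to have no pets, and counts of 1 and 2 are equally likely at about 27% each.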

## Adding Expression Columns

Expression columns create new columns from Jinja expressions over other columns. Here we:
1. Derive the number of children from the number of pets (`2 * number_of_pets - 1` when there is at least one pet, otherwise 0)
2. Create full name columns from the `first_name` and `last_name` fields of both person samplers

In [18]:
# Add number_of_children column based on number_of_pets
aidd.add_column(
    C.ExpressionColumn(
        name="number_of_children",
        expr="{% if number_of_pets > 0 %}{{ 2 * number_of_pets - 1}}{% else %}0{% endif %}"
    )
)

# Add full name columns for both person samplers
aidd.add_column(
    C.ExpressionColumn(
        name="person1_full_name",
        expr="{{ person1.first_name }} {{ person1.last_name }}"
    )
)

aidd.add_column(
    C.ExpressionColumn(
        name="person2_full_name",
        expr="{{ person2.first_name }} {{ person2.last_name }}"
    )
)
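
For readers less familiar with Jinja, the logic of these expressions can be written as plain Python. This is an equivalent sketch for illustration only; the function names and the sample person are made up.

```python
def number_of_children(number_of_pets):
    # Mirrors the Jinja expression:
    # {% if number_of_pets > 0 %}{{ 2 * number_of_pets - 1 }}{% else %}0{% endif %}
    return 2 * number_of_pets - 1 if number_of_pets > 0 else 0

def full_name(person):
    # Mirrors "{{ person.first_name }} {{ person.last_name }}".
    return f"{person['first_name']} {person['last_name']}"

children_by_pets = {pets: number_of_children(pets) for pets in range(4)}
print(children_by_pets)  # {0: 0, 1: 1, 2: 3, 3: 5}

# Made-up person record, standing in for a sampled `person1` object.
print(full_name({"first_name": "Ada", "last_name": "Lovelace"}))  # Ada Lovelace
```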

## Adding LLM-Generated Columns

Finally, we'll add columns that use our custom model configurations to generate text. We're creating:

1. `first_pet_backstory` - A backstory for the couple's first pet using the static temperature model
2. `couple_backstory` - A narrative of how the couple met using the variable temperature model

Notice how we use Jinja templating to conditionally format the prompts and incorporate values from other columns.

In [19]:
# Add first_pet_backstory column using static temperature model
aidd.add_column(
    C.LLMTextColumn(
        name="first_pet_backstory",
        prompt=(
            "{% if number_of_pets > 0 %}"
            "Write a sweet backstory for {{ person1.first_name }} and "
            "{{ person2.first_name }}'s first pet {{ pet_type }} named {{ first_pet_name }}. "
            "Keep it concise, no more than 8 sentences."
            "{% else %}"
            "Repeat exactly these words: 'They had no pets.'"
            "{% endif %}"
        ),
        model_alias="mistral-small-static-higher-temp",  # Using our custom model with static temperature
    )
)

# Add couple_backstory column using variable temperature model
aidd.add_column(
    C.LLMTextColumn(
        name="couple_backstory",
        prompt=(
            "Write a thoughtful, funny backstory for how {{ person1_full_name }} and {{ person2_full_name }} met. "
            "{% if number_of_pets > 0 %}"
            "Make sure to include how they decided to get a pet together, ultimately leading to {{ number_of_pets }} pets. "
            "Note their first pet was named {{ first_pet_name }}, with the following backstory:\n\n{{ first_pet_backstory }}"
            "{% else %}"
            "Make sure to include how they decided to not get a pet together."
            "{% endif %}"
        ),
        model_alias="mistral-small-variable-higher-temp",  # Using our custom model with variable temperature
    )
)
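
To see what the conditional prompt looks like once it is filled in, the template can be rendered locally with the `jinja2` package. This is a sketch under assumptions: the record values are made up, and Data Designer may render templates with different settings than a default `jinja2.Template`.

```python
from jinja2 import Template

# Same template text as the `first_pet_backstory` prompt above.
prompt = Template(
    "{% if number_of_pets > 0 %}"
    "Write a sweet backstory for {{ person1.first_name }} and "
    "{{ person2.first_name }}'s first pet {{ pet_type }} named {{ first_pet_name }}. "
    "Keep it concise, no more than 8 sentences."
    "{% else %}"
    "Repeat exactly these words: 'They had no pets.'"
    "{% endif %}"
)

# Made-up record values for illustration only.
with_pet = prompt.render(
    number_of_pets=2,
    person1={"first_name": "Sam"},
    person2={"first_name": "Alex"},
    pet_type="dog",
    first_pet_name="Buddy",
)
no_pet = prompt.render(number_of_pets=0)

print(with_pet)
print(no_pet)
```

Only one branch of the `{% if %}` block reaches the model for a given record, so records without pets get the short fixed instruction instead of the backstory request.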

## Preview Generated Data

Now that all columns are configured, let's generate a preview to check how the data will look. Setting `verbose_logging=True` prints detailed information about the generation process, including the dependencies between columns.

In [20]:
# Generate a preview with verbose logging
preview = aidd.preview(verbose_logging=True)

[09:41:11] [INFO] 🚀 Generating preview
[09:41:12] [INFO] ⛓️ Representing generation steps as a Directed Acyclic Graph
[09:41:12] [INFO]   |-- 🔗 `couple_backstory` depends on `first_pet_backstory`
[09:41:12] [INFO]   |-- 🔗 `couple_backstory` depends on `person1_full_name`
[09:41:12] [INFO]   |-- 🔗 `couple_backstory` depends on `person2_full_name`
[09:41:14] [INFO] 🎲 Step 1: Using samplers to generate 5 columns
[09:41:14] [INFO]   |-- 🎲 👩‍🎤 Creating person generator
[09:41:14] [INFO]   |-- 🎲 Using numerical samplers to generate 10 records across 5 columns
[09:41:24] [INFO] 🦜 Step 2: Generating text column `first_pet_backstory`
[09:41:24] [INFO]   |-- 📝 Preparing template to generate data column `first_pet_backstory`
[09:41:24] [INFO]   |   |-- model_alias: mistral-small-static-higher-temp
[09:41:24] [INFO]   |-- Model config being used for model alias 'mistral-small-static-higher-temp': {"alias": "mistral-small-static-higher-temp", "model_name": "gretel/mistralai/Mistral-Small-24B-Instru

In [21]:
# Display a sample record
preview.display_sample_record()
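
The log above reports that generation steps are represented as a directed acyclic graph, and lists the columns that `couple_backstory` depends on. The sketch below shows how such dependencies determine a valid generation order, using Python's standard `graphlib` module (Python 3.9+) as a stand-in. The dependency map is read off the templates and sampler parameters in this notebook; how Data Designer builds and orders its graph internally is not shown here.

```python
from graphlib import TopologicalSorter

# Column -> columns it references, taken from the cells in this notebook.
dependencies = {
    "pet_type": {"number_of_pets"},
    "first_pet_name": {"pet_type"},
    "number_of_children": {"number_of_pets"},
    "person1_full_name": {"person1"},
    "person2_full_name": {"person2"},
    "first_pet_backstory": {
        "number_of_pets", "person1", "person2", "pet_type", "first_pet_name",
    },
    "couple_backstory": {
        "person1_full_name", "person2_full_name", "number_of_pets",
        "first_pet_name", "first_pet_backstory",
    },
}

# static_order() yields every column after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order produced this way places `number_of_pets` before `pet_type` and `first_pet_backstory` before `couple_backstory`, which matches the step sequence in the log.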

## Create a Full Dataset

Finally, we can generate a full dataset with our configured columns. Here we'll create 100 records and assign a workflow run name to help identify this run later.

In [None]:
# Generate a full dataset of 100 records
aidd.create(num_records=100, name="custom-model-config-demo", wait_until_done=True)