<a target="_parent" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator/data-designer-101/4-custom-model-configs.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 🎨 Data Designer 101: Using Custom Model Configurations

In this notebook, we will see how to create and use custom model configurations in `DataDesigner`.

If this is your first time using `DataDesigner`, we recommend starting with the [first notebook](https://github.com/gretelai/gretel-blueprints/blob/main/docs/notebooks/demo/navigator/data-designer-101/1-the-basics.ipynb) in this 101 series.

<br>

### 💾 Install `gretel-client` and its dependencies

In [None]:
%%capture
%pip install git+https://github.com/gretelai/gretel-python-client

In [None]:
from gretel_client.navigator_client import Gretel
from gretel_client.workflows.configs.workflows import ModelConfig, GenerationParameters

# We import AIDD column and parameter types using this shorthand for convenience.
import gretel_client.data_designer.params as P
import gretel_client.data_designer.columns as C

# The Gretel object is the SDK's main entry point for interacting with Gretel's API.
gretel = Gretel(api_key="prompt", endpoint="https://api.dev.gretel.ai")

## ⚙️ Custom Model Configurations

- `DataDesigner` comes with sensible defaults for LLMs and their generation settings, but sometimes you need more control.

- This is where custom model configurations come in.

- Below, we create two new "model aliases" that we can set as the LLM for any task that has `model_alias` as an argument. 

- Note that the model chosen for each alias must be one of the models allowed within the selected Model Suite.

In [2]:
model_configs = [
    # Configuration with static temperature
    ModelConfig(
        alias="mistral-small-static-higher-temp",
        model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
        generation_parameters=GenerationParameters(temperature=0.75, top_p=0.9)
    ),
    # Configuration with variable temperature (uniform distribution), which 
    # is sampled for every LLM call.
    ModelConfig(
        alias="mistral-small-variable-higher-temp",
        model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
        generation_parameters=GenerationParameters(
            temperature={"type": "uniform", "params": {"low": 0.50, "high": 0.90}},
            top_p=0.9
        )
    )
]
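
The difference between the two configurations above can be sketched in plain Python. This is only an illustration of the idea that a static temperature stays fixed while a uniform specification is re-sampled for every LLM call. The `resolve_temperature` helper is hypothetical and is not part of `gretel-client`; how the service resolves these settings internally is an assumption here.

```python
import random


def resolve_temperature(spec, rng):
    """Return a concrete temperature for a single LLM call.

    `spec` is either a fixed number or a dict shaped like the uniform
    specification in the config above. Hypothetical helper, for illustration.
    """
    if isinstance(spec, (int, float)):
        return float(spec)
    if spec["type"] == "uniform":
        params = spec["params"]
        return rng.uniform(params["low"], params["high"])
    raise ValueError(f"unsupported distribution type: {spec['type']}")


rng = random.Random(0)  # seeded so the sketch is repeatable
static_spec = 0.75
variable_spec = {"type": "uniform", "params": {"low": 0.50, "high": 0.90}}

# Simulate five LLM calls with each configuration.
static_draws = [resolve_temperature(static_spec, rng) for _ in range(5)]
variable_draws = [resolve_temperature(variable_spec, rng) for _ in range(5)]

print(static_draws)    # the same value on every call
print(variable_draws)  # a fresh value between 0.5 and 0.9 on every call
```

A variable temperature of this kind is one way to add diversity across records without committing to a single setting.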

In [3]:
# Initialize the data designer with our custom model configurations
aidd = gretel.data_designer.new(model_suite="apache-2.0", model_configs=model_configs)

## 👩‍⚕️ Designing our synthetic dataset

New features demonstrated below:

- Using custom model aliases 

- Conditional params for samplers

- If/else logic in Jinja expressions
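
To make the idea of conditional sampler params concrete, here is a minimal plain-Python sketch of the behaviour we expect from the `pet_type` column defined in the next cell: the default category values are used unless the `number_of_pets == 0` condition holds, in which case the conditional values take over. The helper functions are hypothetical and do not reflect how `DataDesigner` implements sampling. With a Poisson mean of 2, the probability of zero pets is $e^{-2} \approx 0.135$, so roughly 13.5% of records should take the conditional branch.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_pet_type(number_of_pets, rng):
    """Use the conditional values when there are no pets, else the defaults."""
    if number_of_pets == 0:  # mirrors the "number_of_pets == 0" condition
        return "none"
    return rng.choices(["dog", "cat", "fish"], weights=[0.5, 0.3, 0.2], k=1)[0]


rng = random.Random(7)  # seeded so the sketch is repeatable
rows = []
for _ in range(1000):
    n = sample_poisson(2, rng)
    rows.append({"number_of_pets": n, "pet_type": sample_pet_type(n, rng)})

share_without_pets = sum(r["pet_type"] == "none" for r in rows) / len(rows)
print(f"expected share without pets: {math.exp(-2):.3f}")
print(f"observed share without pets: {share_without_pets:.3f}")
```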

In [None]:
aidd.add_column(
    C.SamplerColumn(
        name="person1",  
        type=P.SamplerType.PERSON,
        params=P.PersonSamplerParams(sex="Male")
    )
)

aidd.add_column(
    C.SamplerColumn(
        name="person2",  
        type=P.SamplerType.PERSON,
        params=P.PersonSamplerParams(sex="Female")
    )
)

# Add pet_type column with conditional parameters.
aidd.add_column(
    C.SamplerColumn(
        name="pet_type",
        type=P.SamplerType.CATEGORY,
        # These will be the default values for the sampler.
        params=P.CategorySamplerParams(values=["dog", "cat", "fish"], weights=[0.5, 0.3, 0.2]),
        # These will be the values for the sampler if the condition is met.
        conditional_params={
            "number_of_pets == 0": P.CategorySamplerParams(values=["none"])
        }
    )
)

aidd.add_column(
    C.SamplerColumn(
        name="first_pet_name",
        type=P.SamplerType.SUBCATEGORY,
        params=P.SubcategorySamplerParams(
            category="pet_type",
            values={
                "dog": ["Buddy", "Max", "Charlie", "Cooper", "Daisy", "Lucy"],
                "cat": ["Oliver", "Leo", "Milo", "Charlie", "Simba", "Luna"],
                "fish": ["Bubbles", "Nemo", "Goldie", "Dory", "Finley", "Splash"],
                "none": ["n/a"]
            }
        )
    )
)

aidd.add_column(
    C.SamplerColumn(
        name="number_of_pets",
        type=P.SamplerType.POISSON,
        params=P.PoissonSamplerParams(mean=2)
    )
)


# Use Jinja if/else logic to set the number of children.
aidd.add_column(
    C.ExpressionColumn(
        name="number_of_children",
        expr="{% if number_of_pets > 0 %}{{ 2 * number_of_pets - 1 }}{% else %}0{% endif %}"
    )
)

aidd.add_column(
    C.ExpressionColumn(
        name="person1_full_name",
        expr="{{ person1.first_name }} {{ person1.last_name }}"
    )
)

aidd.add_column(
    C.ExpressionColumn(
        name="person2_full_name",
        expr="{{ person2.first_name }} {{ person2.last_name }}"
    )
)


aidd.add_column(
    C.LLMTextColumn(
        name="first_pet_backstory",
        prompt=(
            "{% if number_of_pets > 0 %}"
            "Write a sweet backstory for {{ person1.first_name }} and "
            "{{ person2.first_name }}'s first pet {{ pet_type }} named {{ first_pet_name }}. "
            "Keep it concise, no more than 8 sentences."
            "{% else %}"
            "Repeat exactly these words: 'They had no pets.'"
            "{% endif %}"
        ),
        # We're using our custom model with static temperature.
        model_alias="mistral-small-static-higher-temp",  
    )
)

aidd.add_column(
    C.LLMTextColumn(
        name="couple_backstory",
        prompt=(
            "Write a thoughtful, funny backstory for how {{ person1_full_name }} and {{ person2_full_name }} met. "
            "{% if number_of_pets > 0 %}"
            "Make sure to include how they decided to get a pet together, ultimately leading to {{ number_of_pets }} pets. "
            "Note their first pet was named {{ first_pet_name }}, with the following backstory:\n\n{{ first_pet_backstory }}"
            "{% else %}"
            "Make sure to include how they decided to not get a pet together."
            "{% endif %}"
        ),
        # We're using our custom model with variable temperature.
        model_alias="mistral-small-variable-higher-temp",  
    )
)


aidd.with_evaluation_report().validate()
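
The if/else logic in the `number_of_children` expression above can be checked by hand with a plain-Python equivalent. This sketch only mirrors the arithmetic of the Jinja template; it does not call Jinja or `DataDesigner`, and the function name is chosen here for illustration.

```python
def number_of_children(number_of_pets):
    """Plain-Python equivalent of the `number_of_children` Jinja expression."""
    if number_of_pets > 0:
        return 2 * number_of_pets - 1
    return 0


# Tabulate the result for a few pet counts.
for pets in range(5):
    print(f"number_of_pets={pets} -> number_of_children={number_of_children(pets)}")
```

The same branch condition (`number_of_pets > 0`) decides which prompt text the two LLM columns receive, so records without pets never ask the model to invent a pet backstory.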

## 👀 Preview the dataset

- Iteration is key to generating high-quality synthetic data.

- Use the `preview` method to generate 10 records for inspection.

- We set `verbose_logging` to `True` so that the additional log output lets us verify that our custom model aliases are being used.

In [None]:
preview = aidd.preview(verbose_logging=True)

In [None]:
# The preview dataset is available as a pandas DataFrame.
preview.dataset.df.head()

In [None]:
# Run this cell multiple times to cycle through the 10 preview records.
preview.display_sample_record()
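
While iterating on a preview, it can help to check that the conditional logic behaved as intended. The sketch below is one possible sanity check, written against plain dictionaries so it runs on its own. The sample records are hand-written stand-ins, not real output, and `find_inconsistent_records` is a hypothetical helper. With a real preview, records in this shape could be obtained with pandas' `preview.dataset.df.to_dict("records")`.

```python
def find_inconsistent_records(records):
    """Return records where `pet_type` and `number_of_pets` disagree."""
    return [
        record
        for record in records
        if (record["number_of_pets"] == 0) != (record["pet_type"] == "none")
    ]


# Hand-written stand-ins for preview records (hypothetical data).
sample_records = [
    {"number_of_pets": 0, "pet_type": "none", "first_pet_name": "n/a"},
    {"number_of_pets": 2, "pet_type": "dog", "first_pet_name": "Buddy"},
    # Deliberately inconsistent, to show what the check reports.
    {"number_of_pets": 3, "pet_type": "none", "first_pet_name": "n/a"},
]

bad = find_inconsistent_records(sample_records)
print(f"{len(bad)} inconsistent record(s) found")
```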

## 🆙 Scale up!

- Once you are happy with the preview, scale up to a larger dataset by submitting a batch workflow.

- You can view the evaluation report by following the workflow link in the output of `create` below.

- Click the link to follow along with the generation process.

In [None]:
workflow_run = aidd.create(num_records=100, name="aidd-101-notebook-4-custom-model-configs")