
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_Navigator_IAPI_Parallel.ipynb)

<br>

<center><img src="https://gretel-public-website.s3.us-west-2.amazonaws.com/assets/brand/gretel_brand_wordmark.svg" alt="Gretel" width="350"/></center>

<br>

## 👋 Welcome to the Navigator real-time inference API Parallel Blueprint!

In this Blueprint, we speed up data generation by sending requests to Navigator in parallel.


<br>

## ✅ Set up your Gretel account

To get started, you will need a [free Gretel account](https://console.gretel.ai/).

<br>

#### Ready? Let's go 🚀

## 💾 Install `gretel-client` and its dependencies

In [None]:
%%capture
!pip install gretel-client

## 🛜 Configure your Gretel session

- [The Gretel object](https://docs.gretel.ai/create-synthetic-data/gretel-sdk/the-gretel-object) provides a high-level interface for streamlining interactions with Gretel's APIs.

- Retrieve your Gretel API key [here](https://console.gretel.ai/users/me/key).

In [None]:
from gretel_client import Gretel

gretel = Gretel(api_key="prompt", validate=True)

## 🚀 Real-time inference API

- The Navigator real-time inference API makes it possible to programmatically run Navigator outside the [Gretel Console](https://console.gretel.ai/navigator).

- Our [Python SDK](https://github.com/gretelai/gretel-python-client) provides an intuitive high-level interface for the Navigator API.

- Navigator currently supports two data generation modes: `"tabular"` and `"natural_language"`. In both modes, you can choose the backend model that powers the generation.

In [None]:
# list "tabular" backend models
gretel.factories.get_navigator_model_list("tabular")

In [None]:
# list "natural_language" backend models
gretel.factories.get_navigator_model_list("natural_language")

**Notes:**

- `gretelai/auto` automatically selects the current default model, which may change over time as models continue to evolve.

- The `factories` attribute of the `Gretel` object provides methods for creating new objects that interact with Gretel's APIs.

## 📊 Parallel tabular data generation

- We use the `initialize_navigator_api` method of the `factories` attribute to create a separate Navigator API object for each thread.

- With `model_type = "tabular"` (which is the default), we initialize Navigator's tabular API.

- To select a different backend model, use the optional `backend_model` argument, which we've set to `gretelai/auto`.

In [None]:
import random
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

import pandas as pd
from tqdm import tqdm


def generate_random_params():
    """
    Generate random values for LLM parameters to ensure moderate creativity.

    Returns:
        dict: A dictionary containing random values for temperature, top_p, and top_k.
    """
    params = {
        # Random float between 0.5 and 0.75
        "temperature": round(random.uniform(0.5, 0.75), 2),
        # Random float between 0.8 and 0.95
        "top_p": round(random.uniform(0.8, 0.95), 2),
        # Random integer between 30 and 45 (inclusive)
        "top_k": random.randint(30, 45),
    }
    return params


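def generate_records_parallel(prompt: str, num_records=25, num_threads=5):
    """
    Generate records in parallel, one Navigator request stream per thread.

    Each thread requests `num_records` records, so the returned DataFrame
    holds up to `num_records * num_threads` rows.
    """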
    shared_df = pd.DataFrame()

    mutex = Lock()

    def generate_data(progress: tqdm):
        tabular = gretel.factories.initialize_navigator_api(
            "tabular", backend_model="gretelai/auto"
        )
        # Only shared_df is rebound inside this function; mutex is just read.
        nonlocal shared_df
        GENERATE_PARAMS = generate_random_params()
        try:
            for item in tabular.generate(
                prompt,
                num_records=num_records,
                stream=True,
                disable_progress_bar=True,
                **GENERATE_PARAMS
            ):
                with mutex:
                    shared_df = pd.concat(
                        [shared_df, pd.DataFrame(item, index=[0])], ignore_index=True
                    )
                    progress.update(1)
        except Exception as e:
            # Report the failure but let the other threads keep running.
            print(f"Error in worker thread: {e}")

    with tqdm(total=num_records * num_threads) as progress, ThreadPoolExecutor(
        num_threads
    ) as executor:
        # Submit one generation task per thread.
        for _ in range(num_threads):
            executor.submit(generate_data, progress)
    return shared_df
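
The thread-pool-plus-lock pattern used above can be tried without a Gretel account. The following is a minimal sketch, not the Blueprint's implementation: `fake_stream` and `collect_parallel` are hypothetical names, and `fake_stream` merely stands in for a streaming generate call by yielding one dictionary per record. Unlike the function above, it appends records to a plain list under the lock and waits on each future so that worker exceptions are re-raised rather than printed.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def fake_stream(worker_id: int, num_records: int):
    """Hypothetical stand-in for a streaming generate call."""
    for i in range(num_records):
        yield {"worker": worker_id, "row": i}


def collect_parallel(num_records: int = 5, num_threads: int = 3) -> list:
    """Collect records from several worker threads into one shared list."""
    records = []
    mutex = Lock()

    def worker(worker_id: int) -> None:
        for item in fake_stream(worker_id, num_records):
            # Guard the shared list so appends from different threads
            # never interleave with other updates.
            with mutex:
                records.append(item)

    with ThreadPoolExecutor(num_threads) as executor:
        futures = [executor.submit(worker, i) for i in range(num_threads)]
        for future in futures:
            future.result()  # re-raises any exception from the worker
    return records


rows = collect_parallel(num_records=5, num_threads=3)
print(len(rows))  # 15
```

If the records are dictionaries with the same keys, a list like `rows` can be turned into a DataFrame once at the end with `pd.DataFrame(rows)`, which avoids calling `pd.concat` for every record.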

In [None]:
prompt = """
Generate customer bank transaction data. Include the following columns:
- customer_name
- customer_id
- transaction_date
- transaction_amount
- transaction_type
- transaction_category
- account_balance
"""
num_records = 25
num_threads = 5

df = generate_records_parallel(prompt, num_records=num_records, num_threads=num_threads)
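
Because each thread requests `num_records` records, the call above asks for up to 25 × 5 = 125 rows in total (fewer if a thread fails). As a small sketch for planning a run, the hypothetical helper `threads_needed` below (not part of the Gretel SDK) computes how many threads are required to reach a target row count at a given number of records per thread.

```python
import math


def threads_needed(total_records: int, records_per_thread: int) -> int:
    """Smallest number of threads whose combined requests cover total_records."""
    return math.ceil(total_records / records_per_thread)


print(threads_needed(125, 25))  # 5
print(threads_needed(130, 25))  # 6, which requests up to 150 rows
```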