[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_101_Blueprint.ipynb)

<br>

<center><a href=https://gretel.ai/><img src="https://gretel-public-website.s3.us-west-2.amazonaws.com/assets/brand/gretel_brand_wordmark.svg" alt="Gretel" width="350"/></a></center>

<br>

## Welcome to the Gretel 101 Blueprint!

In this Blueprint, we will use Gretel to train a deep generative model and then use it to generate high-quality synthetic tabular data. We will do this by submitting training and generation jobs to the [Gretel Cloud](https://gretel.ai/faqs/gretel-cloud) via [Gretel's Python SDK](https://docs.gretel.ai/guides/environment-setup/cli-and-sdk).

Behind the scenes, Gretel will spin up workers with the necessary compute resources, set up the model with your desired configuration, and perform the submitted task.

## Create your Gretel account

To get started, you will need to [sign up for a free Gretel account](https://console.gretel.ai/).

<br>

#### Ready? Let's go 🚀

## 💾 Install `gretel-client` and its dependencies

In [None]:
%%capture
!pip install gretel-client

## 🛜 Configure your Gretel session

- The `Gretel` object provides a high-level interface for streamlining interactions with Gretel's APIs.

- Each `Gretel` instance is bound to a single [Gretel project](https://docs.gretel.ai/guides/gretel-fundamentals/projects).

- Running the cell below will prompt you for your Gretel API key, which you can retrieve from the [Gretel Console](https://console.gretel.ai/users/me/key).

- With `validate=True`, your login credentials will be validated immediately at instantiation.

In [None]:
from gretel_client import Gretel

gretel = Gretel(api_key="prompt", validate=True)

In [None]:
# @title 🗂️ Pick a tabular dataset 👇 { display-mode: "form" }
dataset_path_dict = {
    "adult income in the USA (14000 records, 15 fields)": "https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/us-adult-income.csv",
    "hospital length of stay (9999 records, 18 fields)": "https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/sample-synthetic-healthcare.csv",
    "customer churn (7032 records, 21 fields)": "https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/sample_data/monthly-customer-payments.csv"
}

dataset = "adult income in the USA (14000 records, 15 fields)" # @param ["adult income in the USA (14000 records, 15 fields)", "hospital length of stay (9999 records, 18 fields)", "customer churn (7032 records, 21 fields)"]
# Replace the selected label with the URL of the corresponding CSV file
dataset = dataset_path_dict[dataset]


In [None]:
import pandas as pd

# explore the data using pandas
df = pd.read_csv(dataset)
df.head()
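
Before training, it can help to check column types and missing values, since both affect what a tabular model learns. The sketch below is one possible way to do that with pandas. The helper `profile_columns` and the toy DataFrame `toy_df` are hypothetical and exist only so the example runs on its own; in this notebook you would pass `df` instead.

```python
import pandas as pd


def profile_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing-value count, and distinct-value count per column."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "n_missing": frame.isna().sum(),
            "n_unique": frame.nunique(dropna=True),
        }
    )


# Hypothetical toy data standing in for the real training set
toy_df = pd.DataFrame(
    {
        "age": [39, 50, None, 28],
        "workclass": ["Private", "Self-emp", "Private", None],
    }
)

profile = profile_columns(toy_df)
print(profile)
```

Calling `profile_columns(df)` on the dataset selected above would give the same three-column summary, with one row per field.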

## 🏋️‍♂️ Train a generative model

- The [tabular-actgan](https://github.com/gretelai/gretel-blueprints/blob/main/config_templates/gretel/synthetics/tabular-actgan.yml) base config tells Gretel which model to train and how to configure it.

- You can replace `tabular-actgan` with the path to a custom config file, or you can select any of the tabular configs [listed here](https://github.com/gretelai/gretel-blueprints/tree/main/config_templates/gretel/synthetics).

- The training data is passed in using the `data_source` argument, which accepts either a file path (here, the URL of a CSV file) or a pandas `DataFrame`.

- **Tip:** Click the printed Console URL to monitor your job's progress in the Gretel Console.

In [None]:
trained = gretel.submit_train("tabular-actgan", data_source=dataset)

## 🧐 Evaluate the synthetic data quality

- Gretel automatically creates a [synthetic data quality report](https://docs.gretel.ai/reference/evaluate/synthetic-data-quality-report) for each model you train.

- The training results object returned by `submit_train` has a `report` attribute (a `GretelReport` object) for viewing the quality report.


In [None]:
# view the quality scores
print(trained.report)

In [None]:
# display the full report within this notebook
trained.report.display_in_notebook()

In [None]:
# inspect the synthetic data used to create the report
df_synth_report = trained.fetch_report_synthetic_data()
df_synth_report.head()
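
As a complement to the quality report, you can run simple manual comparisons between the training data and the synthetic data. The sketch below compares how often each category appears in a single column. It is not how Gretel computes its quality scores; the helper `category_share_gap` and the toy Series are hypothetical and only illustrate the idea.

```python
import pandas as pd


def category_share_gap(real: pd.Series, synth: pd.Series) -> pd.DataFrame:
    """Compare the relative frequency of each category in two columns."""
    shares = pd.DataFrame(
        {
            "real": real.value_counts(normalize=True),
            "synthetic": synth.value_counts(normalize=True),
        }
    ).fillna(0.0)  # a category absent from one side gets a share of 0
    shares["abs_gap"] = (shares["real"] - shares["synthetic"]).abs()
    # Largest discrepancies first
    return shares.sort_values("abs_gap", ascending=False)


# Hypothetical toy columns standing in for real and synthetic data
real_col = pd.Series(["A", "A", "B", "B"])
synth_col = pd.Series(["A", "A", "A", "C"])

gaps = category_share_gap(real_col, synth_col)
print(gaps)
```

In this notebook, the two inputs would be the same categorical column taken from `df` and from `df_synth_report`.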

## 🤖 Generate synthetic data

- The `model_id` argument can be the ID of any trained model within the current project. Here we use the model trained above (`trained.model_id`) and request 1,000 records.


In [None]:
generated = gretel.submit_generate(trained.model_id, num_records=1000)

In [None]:
# inspect the generated synthetic data
generated.synthetic_data.head()
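
Once generation finishes, a quick sanity check on the output can catch surprises before the data is used downstream. The sketch below compares column names and row count against expectations. The helper `check_synthetic_output` and the toy DataFrames are hypothetical, and whether a job returns exactly the requested number of records is an assumption worth verifying rather than a guarantee.

```python
import pandas as pd


def check_synthetic_output(train: pd.DataFrame, synth: pd.DataFrame, expected_rows: int) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Columns present in the training data but absent from the synthetic data
    missing = [col for col in train.columns if col not in synth.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Columns that appear only in the synthetic data
    extra = [col for col in synth.columns if col not in train.columns]
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if len(synth) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(synth)}")
    return problems


# Hypothetical toy data standing in for the training and generated DataFrames
toy_train = pd.DataFrame({"age": [39, 50], "income": ["<=50K", ">50K"]})
toy_synth_ok = pd.DataFrame({"age": [41, 47, 33], "income": [">50K", "<=50K", "<=50K"]})
toy_synth_bad = pd.DataFrame({"age": [41]})

ok_problems = check_synthetic_output(toy_train, toy_synth_ok, expected_rows=3)
bad_problems = check_synthetic_output(toy_train, toy_synth_bad, expected_rows=3)
print(ok_problems)
print(bad_problems)
```

In this notebook, the equivalent call would be `check_synthetic_output(df, generated.synthetic_data, expected_rows=1000)`.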