## Instructions 

Install Gretel Trainer to use Benchmark. Depending on how large your datasets are and how many models and datasets you add, your Benchmark job may take 15 minutes or longer to run.

For best results, try the sample datasets in this notebook first to see Benchmark in action. The publicly available "iris" and "heart_disease" datasets used in this demo take between 10 and 15 minutes to finish training on the GretelLSTM and GretelAmplify models.

Note: you can always monitor model progress (and view model training logs) by opening the Benchmark projects in the Gretel console dashboard.

Interested in seeing the results of all Benchmark datasets on all Gretel models? You can check out the Benchmark report here: https://docs.gretel.ai/reference/benchmark/benchmark-report  

In [None]:
!pip install -U gretel-trainer

If you are running this notebook in Colab, the `pip install` command above also updates the `matplotlib` library, but Colab has already imported the previously installed version. As `pip`'s log output suggests, you need to restart the Colab runtime to use the new version (and, by extension, to import and use Benchmark).

You can optionally configure your Gretel credentials now, or you will be prompted for them when you run the Benchmark comparison later in this notebook.
Learn more about signing up for a [free Gretel account](https://docs.gretel.ai/quickstart).

In [None]:
import gretel_trainer.benchmark as b

## Datasets

### From your own data


When using your own data, specify its datatype as one of "tabular_mixed", "tabular_numeric", "natural_language", or "time_series". Learn more in the [Benchmark docs](https://docs.gretel.ai/reference/benchmark#docs-internal-guid-31c7e29f-7fff-7936-54f8-737618a7e7f3).

Running in Google Colab? Upload your files to the Colab file system and reference them by path, for example "/content/my_files/data.csv".

In [None]:
# my_data = b.make_dataset(["/PATH/TO/MY_DATASET.csv"], datatype="INDICATE_DATATYPE")
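If you are unsure which tabular datatype fits your file, a quick check of the column types can help. The helper below is a hypothetical heuristic written for this notebook, not part of Benchmark: it only distinguishes "tabular_numeric" (every column is numeric) from "tabular_mixed" (at least one non-numeric column). "natural_language" and "time_series" depend on what the data represents, so choose those by hand.

```python
import pandas as pd


def guess_datatype(df: pd.DataFrame) -> str:
    """Hypothetical heuristic (not a Benchmark API) for picking a tabular datatype."""
    # If every column holds numbers (ints, floats, or bools), call it numeric.
    if all(pd.api.types.is_numeric_dtype(df[col]) for col in df.columns):
        return "tabular_numeric"
    # Otherwise at least one column is text or categorical, so treat it as mixed.
    return "tabular_mixed"


numeric_df = pd.DataFrame({"sepal_length": [5.1, 4.9], "petal_width": [0.2, 0.2]})
mixed_df = pd.DataFrame({"age": [63, 37], "sex": ["male", "female"]})

print(guess_datatype(numeric_df))  # tabular_numeric
print(guess_datatype(mixed_df))    # tabular_mixed
```

The returned string can then be passed as the `datatype` argument to `b.make_dataset`.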

### From Gretel

Want to use public data? Gretel makes it easy to use commonly used, publicly available datasets instead of your own.

In [None]:
datasets = b.list_gretel_datasets()  # selects all Benchmark datasets

# Other sample commands
# datasets = b.list_gretel_datasets(datatype="time_series") # select all time-series datasets
# datasets = b.list_gretel_datasets(datatype="tabular_mixed", tags=["small", "marketing"]) # select all tabular_mixed, size small, and marketing-related datasets

# This will show you all the datasets in the Benchmark dataset bucket
[dataset.name for dataset in datasets]

In [None]:
# Benchmark datasets are annotated with tags, so you can select based on your use case
b.list_gretel_dataset_tags() 

In [None]:
# For this demo, we will select two datasets by name:
# "iris.csv" - a publicly available dataset for predicting the class of the iris plant based on attributes
# "processed_cleveland_heart_disease_uci.csv" - a publicly available dataset for predicting presence of heart disease
iris = b.get_gretel_dataset("iris")
heart_disease = b.get_gretel_dataset("processed_cleveland_heart_disease_uci")

## Models

### Gretel defaults

These models are preconfigured from Gretel's [public blueprints](https://github.com/gretelai/gretel-blueprints/tree/main/config_templates/gretel/synthetics). This is the easiest way to use Gretel's models out of the box with their default configurations.

In [None]:
from gretel_trainer.benchmark import (
    GretelAmplify,
    GretelAuto,
    GretelACTGAN,
    GretelGPTX,
    GretelLSTM,
)

### Customized Gretel models

You can also customize Gretel models by setting `config` to the path of your own configuration YAML file, or to a configuration dictionary.

In [None]:
"""
from gretel_trainer.benchmark import GretelModel

# If trying in Colab: remember to add your config file to Colab's local file storage, then indicate the path like: "/content/my_files/my_config.yml"
class MyCustomLSTM(GretelModel):
    config = "/PATH/TO/MY_CONFIG.yml"


class MyCustomACTGAN(GretelModel):
    config = {...}

"""

### Completely custom, non-Gretel models

Benchmark lets you compare any model, not just Gretel models. You can implement any custom model in Python by defining a class with `train` and `generate` methods. Learn more in the [Benchmark documentation](https://docs.gretel.ai/reference/benchmark#docs-internal-guid-31c7e29f-7fff-7936-54f8-737618a7e7f3).

In [None]:
"""
class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # your training code here
    def generate(self, **kwargs) -> pd.DataFrame:
        # your generation code here

"""

## Launch a Benchmark Comparison!

Putting it all together! 

When you run a Benchmark comparison, all selected models will run on all indicated datasets. Tip: make sure the models you select are applicable to the datatype of the datasets. 

Learn more in the Benchmark docs: https://docs.gretel.ai/reference/benchmark. 
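To make the "every model on every dataset" rule concrete, the sketch below enumerates the jobs for this demo's comparison (GretelLSTM and GretelAmplify on heart_disease and iris). It uses plain strings as stand-ins for the model and dataset objects and does not call Benchmark; the point is that the job count is the number of models times the number of datasets, which is why runtime grows quickly as you add either.

```python
from itertools import product

# Plain-string stand-ins for the model classes and datasets (not Benchmark objects).
model_names = ["GretelLSTM", "GretelAmplify"]
dataset_names = ["heart_disease", "iris"]

# Every model is paired with every dataset: len(model_names) * len(dataset_names) jobs.
jobs = list(product(model_names, dataset_names))
for model_name, dataset_name in jobs:
    print(f"{model_name} x {dataset_name}")

print(len(jobs))  # 4
```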

In [None]:
comparison = b.compare(datasets=[heart_disease, iris], models=[GretelLSTM, GretelAmplify])

In [None]:
# Run this while the comparison is running to see a snapshot of the results so far
comparison.results

In [None]:
# Wait for the comparison to finish, then export the results to a CSV file
comparison.wait()
comparison.export_results("./results.csv")