<img src="https://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />
<br>
<img src="https://www.gitbook.com/cdn-cgi/image/height=40,fit=contain,dpr=1,format=auto/https%3A%2F%2F2196202216-files.gitbook.io%2F~%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252F-MW662bNvw1TgbuEBiwQ%252Flogo%252FLzl7Qs5X5sYkFBgjygeZ%252Fgretel_brand_wordmark_padded%25403x.png%3Falt%3Dmedia%26token%3D3f02fe4f-8684-443e-8aea-83a0e512cd96" width="200" alt="Gretel.ai" />


<a href="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Organizing_Hyperparameter_Sweeps_in_PyTorch_with_W%26B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In [None]:
%%capture
!pip install -U wandb gretel_client

In [None]:
import wandb

wandb.login()


In [None]:
from gretel_client import configure_session

configure_session(api_key="prompt", cache="yes", validate=True)


### 👈 Pick a `method`


The first thing we need to define is the `method`
for choosing new parameter values.

We provide the following search `methods`:

- **`grid` Search** – Iterate over every combination of hyperparameter values.
  Very effective, but can be computationally costly.
- **`random` Search** – Select each new combination at random according to provided `distribution`s. Surprisingly effective!
- **`bayes`ian Search** – Create a probabilistic model of metric score as a function of the hyperparameters, and choose parameters with high probability of improving the metric. Works well for small numbers of continuous parameters but scales poorly.
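
To make the difference between `grid` and `random` concrete, here is a minimal sketch in plain Python. It is not how W&B implements its search; the toy `space` and the helper names `grid_candidates` and `random_candidate` are made up for illustration.

```python
import itertools
import random

# Toy search space (hypothetical values, for illustration only)
space = {
    "learning_rate": [0.001, 0.005, 0.01],
    "batch_size": [64, 256],
}


def grid_candidates(space):
    """Yield every combination of values, in a fixed order."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_candidate(space, rng):
    """Draw one value per hyperparameter, independently and uniformly."""
    return {name: rng.choice(values) for name, values in space.items()}


grid = list(grid_candidates(space))
rng = random.Random(0)  # seeded so the draws are repeatable
samples = [random_candidate(space, rng) for _ in range(4)]

print(len(grid))  # 6 combinations: 3 learning rates x 2 batch sizes
print(samples)
```

Grid search visits all six combinations exactly once, while random search may repeat some and miss others, but its cost is fixed by the number of draws rather than by the size of the space.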

We'll use `bayes`.


In [None]:
sweep_config = {"method": "bayes"}


For `bayes`ian Sweeps,
you also need to tell us a bit about your `metric`.
We need to know its `name`, so we can find it in the model outputs
and we need to know whether your `goal` is to `minimize` it
(e.g. if it's the squared error)
or to `maximize` it
(e.g. if it's the accuracy).


In [None]:
metric = {"name": "sqs", "goal": "maximize"}

sweep_config["metric"] = metric


If you're not running a `bayes`ian Sweep, you don't have to,
but it's not a bad idea to include this in your `sweep_config` anyway,
in case you change your mind later.
It's also good reproducibility practice to keep note of things like this,
in case you, or someone else,
come back to your Sweep in 6 months or 6 years
and don't know whether `sqs` is supposed to be high or low.
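
One way to catch a missing or incomplete `metric` before launching a Sweep is a small local check. The helper below, `check_metric`, is hypothetical and not part of `wandb`; it is deliberately stricter than W&B may require (for example, it insists on an explicit `goal`).

```python
def check_metric(sweep_config):
    """Return a list of problems with the sweep's metric entry (empty if none)."""
    problems = []
    metric = sweep_config.get("metric")
    if metric is None:
        # Only bayes sweeps strictly need a metric in this sketch
        if sweep_config.get("method") == "bayes":
            problems.append("bayes sweeps need a 'metric' entry")
        return problems
    if "name" not in metric:
        problems.append("metric is missing 'name'")
    if metric.get("goal") not in ("minimize", "maximize"):
        problems.append("metric 'goal' should be 'minimize' or 'maximize'")
    return problems


good = {"method": "bayes", "metric": {"name": "sqs", "goal": "maximize"}}
bad = {"method": "bayes"}

print(check_metric(good))  # []
print(check_metric(bad))   # ["bayes sweeps need a 'metric' entry"]
```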


### 📃 Name the hyper`parameters`


In [None]:
parameters_dict = {
    "epochs": {"values": [25, 50, 100, 150]},
    "learning_rate": {"values": [0.001, 0.005, 0.01]},
    "vocab_size": {"values": [0, 500, 1000, 10000, 20000]},
    "rnn_units": {"values": [64, 256, 1024]},
    "batch_size": {"values": [64, 256]},
}

sweep_config["parameters"] = parameters_dict
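
The dictionary above lists discrete `values` for every hyperparameter. As an alternative, W&B sweep configurations also accept a `distribution` with `min` and `max` bounds. The sketch below is illustrative only: the bounds are simply the extremes of the value lists above, and the distribution names `int_uniform` and `log_uniform_values` assume a reasonably recent `wandb` release.

```python
# Alternative search space using distributions instead of fixed value lists.
# The bounds reuse the smallest and largest values from the lists above.
alt_parameters = {
    "epochs": {"distribution": "int_uniform", "min": 25, "max": 150},
    "learning_rate": {
        "distribution": "log_uniform_values",
        "min": 0.001,
        "max": 0.01,
    },
    # Discrete choices can still be mixed in
    "rnn_units": {"values": [64, 256, 1024]},
}

alt_sweep_config = {
    "method": "bayes",
    "metric": {"name": "sqs", "goal": "maximize"},
    "parameters": alt_parameters,
}
```

Continuous ranges give the `bayes` method more room to interpolate between tested values, at the cost of runs that are harder to compare side by side.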


In [None]:
import pprint

pprint.pprint(sweep_config)
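
It can help to know how large this search space is before choosing a run budget. A quick count, repeating the parameter dictionary so the snippet runs on its own:

```python
import math

parameters_dict = {
    "epochs": {"values": [25, 50, 100, 150]},
    "learning_rate": {"values": [0.001, 0.005, 0.01]},
    "vocab_size": {"values": [0, 500, 1000, 10000, 20000]},
    "rnn_units": {"values": [64, 256, 1024]},
    "batch_size": {"values": [64, 256]},
}

# Multiply the number of options per hyperparameter: 4 * 3 * 5 * 3 * 2
n_combinations = math.prod(len(spec["values"]) for spec in parameters_dict.values())
print(n_combinations)  # 360
```

A full `grid` search would therefore need 360 training jobs, whereas the agent at the end of this notebook runs only 20 of them.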


# Step 2️⃣. Initialize the Sweep


In [None]:
import pandas as pd

# Load the training dataset
dataset_path = "https://gretel-public-website.s3.amazonaws.com/datasets/credit-timeseries-dataset.csv"
df = pd.read_csv(dataset_path)
df.to_csv("training_data.csv", index=False)
df


In [None]:
sweep_id = wandb.sweep(sweep_config, project="gretel-timeseries-sweep")


# Step 3️⃣. Run the Sweep agent


In [None]:
from gretel_client.projects import create_or_get_unique_project
from gretel_client.projects.models import read_model_config
from gretel_client.projects.jobs import Status

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        model_config = read_model_config("synthetics/default")

        # Copy each swept hyperparameter into the Gretel model config
        params = model_config["models"][0]["synthetics"]["params"]
        for name in ("epochs", "learning_rate", "vocab_size", "rnn_units", "batch_size"):
            params[name] = config[name]

        project = create_or_get_unique_project(name="wandb-synthetic-data")
        model = project.create_model_obj(
            model_config=model_config, data_source="training_data.csv"
        )
        model.submit_cloud()

        # Log training accuracy to wandb
        for status_update in model.poll_logs_status():
            for update in status_update.logs:
                if "ctx" in update.keys():
                    acc = update["ctx"].get("accuracy")
                    loss = update["ctx"].get("loss")
                    epoch = update["ctx"].get("epoch")
                    ts = update["ctx"].get("ts")
                    # Compare against None so an accuracy of 0.0 is still logged
                    if acc is not None:
                        wandb.log(
                            {"accuracy": acc, "loss": loss, "time": ts, "epoch": epoch}
                        )

        # Log synthetic quality score and training time to wandb
        training_time = model.billing_details["total_time_seconds"]
        if model.status == Status.ERROR:
            wandb.log({"sqs": 0, "training_time": training_time})
        else:
            report = model.peek_report()
            sqs = report["synthetic_data_quality_score"]["score"]
            wandb.log({"sqs": sqs, "training_time": training_time})
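
The log-polling loop in `train` can be tried without a Gretel job by pulling the filtering logic into a pure function. This is a sketch only: `extract_metrics` is a hypothetical helper, and the records in `fake_logs` are invented to mimic the `ctx` layout that the loop above reads, not real Gretel output.

```python
def extract_metrics(update):
    """Return a dict of metrics to log, or None if the record has no accuracy."""
    ctx = update.get("ctx")
    if not ctx:
        return None
    acc = ctx.get("accuracy")
    if acc is None:
        return None
    return {
        "accuracy": acc,
        "loss": ctx.get("loss"),
        "time": ctx.get("ts"),
        "epoch": ctx.get("epoch"),
    }


# Invented log records, for illustration only
fake_logs = [
    {"msg": "Starting training"},
    {"msg": "epoch done", "ctx": {"accuracy": 0.0, "loss": 2.3, "epoch": 0, "ts": 1.5}},
    {"msg": "validating", "ctx": {"stage": "validate"}},
]

to_log = [m for m in map(extract_metrics, fake_logs) if m is not None]
print(to_log)
```

Only the second record is kept. Note that its accuracy of `0.0` survives because the helper checks for `None` rather than truthiness.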


In [None]:
wandb.login()
wandb.agent(sweep_id, train, count=20)
