# How to use Neptune Scale for tracking HPO runs

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/scale-examples/blob/main/how-to-guides/hpo/notebooks/Neptune_HPO.ipynb"> 
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/> 
</a>
<a target="_blank" href="https://github.com/neptune-ai/scale-examples/blob/main/how-to-guides/hpo/notebooks/Neptune_HPO.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a>
<a target="_blank" href="https://scale.neptune.ai/o/examples/org/hpo/runs/table?viewId=9d44261f-32a1-42e7-96ff-9b35edc4be66">
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a>
<a target="_blank" href="https://docs-beta.neptune.ai/tutorials/hpo/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

When running a hyperparameter optimization job, you can use Neptune Scale to track all the metadata from the study and each trial.


## Before you start

  1. Create a Neptune Scale account. [Register &rarr;](https://neptune.ai/early-access)
  2. Create a Neptune project that you will use for tracking metadata. For instructions, see [Projects](https://docs-beta.neptune.ai/projects/) in the Neptune Scale docs.
  3. Install and configure Neptune Scale for logging metadata. For instructions, see [Get started](https://docs-beta.neptune.ai/setup) in the Neptune Scale docs.

### Set the NEPTUNE_PROJECT environment variable
Replace `examples/hpo` with the name of your own project.

In [None]:
%env NEPTUNE_PROJECT=examples/hpo
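
The `%env` magic only works inside a notebook. A minimal sketch of the equivalent for a plain Python script, assuming the variable only needs to exist for the current process and is set before any run is created:

```python
import os

# Keep a value that is already set in the shell;
# otherwise fall back to the example project used in this guide.
os.environ.setdefault("NEPTUNE_PROJECT", "examples/hpo")

project = os.environ["NEPTUNE_PROJECT"]
print(f"Logging to project: {project}")
```

Using `setdefault` rather than plain assignment means a value exported in the shell takes precedence over the one hard-coded in the script.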

## Install Neptune Scale and dependencies

In [None]:
! pip install -qU neptune-scale torch torchvision tqdm "numpy<2.0"

## Import libraries

In [None]:
from neptune_scale import Run
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from tqdm.auto import trange, tqdm
import math

In [None]:
from datetime import datetime

ALLOWED_DATATYPES = [int, float, str, datetime, bool, list, set]

## Hyperparameters

In [None]:
parameters = {
    "batch_size": 128,
    "input_size": (1, 28, 28),
    "n_classes": 10,
    "epochs": 3,
    "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
}

input_size = math.prod(parameters["input_size"])

### Hyperparameter search space

In [None]:
learning_rates = [0.05, 0.1, 0.5]  # learning rate choices
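
This guide searches over a single hyperparameter. If the search space had several, one way to enumerate trials is a Cartesian product of the choices. The sketch below is illustrative only: `momentum` and the `expand_grid` helper are hypothetical and are not used elsewhere in this guide.

```python
from itertools import product

# Hypothetical search space: only `lr` appears in this guide; `momentum` is illustrative.
search_space = {
    "lr": [0.05, 0.1, 0.5],
    "momentum": [0.0, 0.9],
}


def expand_grid(space):
    """Return one dict of hyperparameters per trial (Cartesian product of all choices)."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in product(*(space[name] for name in names))
    ]


trials = expand_grid(search_space)
for trial, params in enumerate(trials):
    print(trial, params)
```

Each resulting dict could then be logged per trial, for example by prefixing its keys with `trials/{trial}/parameters/`.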

## Model

In [None]:
class BaseModel(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, num_classes)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x


criterion = nn.CrossEntropyLoss()

## Dataset

In [None]:
data_tfms = {
    "train": transforms.Compose(
        [
            transforms.ToTensor(),
        ]
    )
}

In [None]:
trainset = datasets.MNIST(
    root="mnist",
    train=True,
    download=True,
    transform=data_tfms["train"],
)

trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size=parameters["batch_size"],
    shuffle=True,
    num_workers=0,
)

## Log metadata across HPO trials into a single run

### Initialize model

In [None]:
model = BaseModel(
    input_size,
    parameters["n_classes"],
).to(parameters["device"])

### Create a global Neptune run

In [None]:
from random import random

run = Run(
    family="hpo",
    run_id=f"hpo-{random()}",
)

run.add_tags(["all-trials", "notebook"])

### Log configuration common across all trials

In [None]:
for key in parameters:
    if type(parameters[key]) not in ALLOWED_DATATYPES:
        run.log_configs({f"config/{key}": str(parameters[key])})
    else:
        run.log_configs({f"config/{key}": parameters[key]})
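
The loop above stringifies any value whose type is not in `ALLOWED_DATATYPES`, such as the `input_size` tuple. A sketch of the same conversion as a reusable function, so that the whole configuration can be built first and inspected before logging. The `to_loggable` helper is hypothetical and not part of Neptune Scale.

```python
from datetime import datetime

# Same types as the ALLOWED_DATATYPES list defined earlier in this guide.
ALLOWED_DATATYPES = (int, float, str, datetime, bool, list, set)


def to_loggable(params, prefix="config"):
    """Map each parameter to a namespaced key, stringifying unsupported types."""
    return {
        f"{prefix}/{key}": value if type(value) in ALLOWED_DATATYPES else str(value)
        for key, value in params.items()
    }


example = {"batch_size": 128, "input_size": (1, 28, 28), "epochs": 3}
configs = to_loggable(example)
print(configs)
```

The resulting dictionary has the same shape as the ones passed to `log_configs()` above, so it could be logged in a single call instead of one call per key.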

### Training loop

In [None]:
# Initialize fields for best values across all trials
best_acc = None

for trial, lr in tqdm(
    enumerate(learning_rates),
    total=len(learning_rates),
    desc="Trials",
):
    # Log trial hyperparameters
    run.log_configs({f"trials/{trial}/parameters/lr": lr})

    optimizer = optim.SGD(model.parameters(), lr=lr)

    # Restart the step counter for this trial's metric series
    step = 0

    for epoch in trange(parameters["epochs"], desc=f"Trial {trial} - lr: {lr}"):
        run.log_metrics(step=epoch, data={f"trials/{trial}/epochs": epoch})

        for x, y in trainloader:
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            x = x.view(x.size(0), -1)
            outputs = model(x)
            loss = criterion(outputs, y)

            _, preds = torch.max(outputs, 1)
            acc = (torch.sum(preds == y.data)) / len(x)

            # Log trial metrics
            run.log_metrics(
                step=step,
                data={
                    f"trials/{trial}/metrics/batch/loss": float(loss),
                    f"trials/{trial}/metrics/batch/acc": float(acc),
                },
            )

            # Log best values across all trials
            if best_acc is None or acc > best_acc:
                best_acc = acc
                run.log_configs(
                    {
                        "best/trial": trial,
                        "best/metrics/loss": float(loss),
                        "best/metrics/acc": float(acc),
                        "best/parameters/lr": lr,
                    }
                )

            loss.backward()
            optimizer.step()

            step += 1
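
The best-value bookkeeping in the loop above can be isolated into a small helper. The sketch below uses plain floats instead of tensors and a hypothetical `BestTracker` class. It assumes the tracker is created once, before the trial loop, so that the comparison spans all trials rather than a single one.

```python
class BestTracker:
    """Remember the highest accuracy seen so far across all trials."""

    def __init__(self):
        self.best_acc = None

    def update(self, trial, lr, loss, acc):
        """Return the `best/...` fields to log if `acc` is a new best, else None."""
        if self.best_acc is not None and acc <= self.best_acc:
            return None
        self.best_acc = acc
        return {
            "best/trial": trial,
            "best/metrics/loss": loss,
            "best/metrics/acc": acc,
            "best/parameters/lr": lr,
        }


tracker = BestTracker()  # created once, outside the trial loop
first = tracker.update(trial=0, lr=0.05, loss=0.9, acc=0.70)
worse = tracker.update(trial=1, lr=0.1, loss=1.1, acc=0.65)
better = tracker.update(trial=2, lr=0.5, loss=0.4, acc=0.88)
print(first, worse, better, sep="\n")
```

Whenever `update()` returns a dictionary, it has the same keys as the `best/` fields logged above and could be passed to `log_configs()` as is.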

### Stop logging

In [None]:
run.close()

### Explore the results in Neptune
Follow the link to the run and explore the logged metadata in the Neptune app:

- The best trial, with its metrics and parameters, is available in the *best* namespace
- Metadata across all trials is available in the *trials* namespace
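
The namespaces come from the `/`-separated attribute names used in the logging calls. The sketch below groups a few of those names into a nested structure to show how they are organized. It mirrors the naming in this guide's code only and is not a description of how Neptune stores data; `nest` is a hypothetical helper.

```python
def nest(paths):
    """Group flat attribute paths into a nested dict keyed by namespace."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


logged = [
    "best/trial",
    "best/metrics/acc",
    "trials/0/parameters/lr",
    "trials/0/metrics/batch/loss",
    "trials/1/parameters/lr",
]

tree = nest(logged)
print(sorted(tree))            # top-level namespaces
print(sorted(tree["trials"]))  # one sub-namespace per trial
```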

To organize all relevant metadata in one view, create a [custom dashboard](https://docs-beta.neptune.ai/custom_dashboard). [See an example](https://scale.neptune.ai/o/examples/org/hpo/runs/details?viewId=9d4424ec-5c27-4933-9003-d62e0784ac68&detailsTab=dashboard&dashboardId=HPO-overview-9d4421e6-dfe4-400b-9dfb-d9b9e8a416b6&runIdentificationKey=HPO-11&type=run).

To view best trials across different runs, you can also create [saved table views](https://docs-beta.neptune.ai/experiments_table#custom-views). [See an example](https://scale.neptune.ai/o/examples/org/hpo/runs/table?viewId=9d4424ec-5c27-4933-9003-d62e0784ac68&detailsTab=dashboard&dash=table&type=run).

## Log metadata from each HPO trial into separate runs

You can also log each trial into its own run, which keeps the metadata of every trial separate.  
Log aggregated values, such as the best result so far, to a parent sweep-level run, and use a sweep-level identifier to group all trials that belong to the same sweep.

### Initialize model

In [None]:
model = BaseModel(
    input_size,
    parameters["n_classes"],
).to(parameters["device"])

### Create a sweep-level identifier

In [None]:
import uuid

sweep_id = str(uuid.uuid4())
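
The cells that follow derive every run ID from this identifier: `sweep-{sweep_id}` for the parent run and `trial-{sweep_id}-{trial}` for each trial. A small sketch of that naming scheme, with hypothetical helper names, showing that the resulting IDs are distinct and all carry the sweep identifier:

```python
import uuid


def sweep_run_id(sweep_id):
    """ID of the parent, sweep-level run."""
    return f"sweep-{sweep_id}"


def trial_run_id(sweep_id, trial):
    """ID of the run for one trial in the sweep."""
    return f"trial-{sweep_id}-{trial}"


sweep_id = str(uuid.uuid4())
ids = [sweep_run_id(sweep_id)] + [trial_run_id(sweep_id, t) for t in range(3)]
for run_id in ids:
    print(run_id)
```

Because a new UUID is generated each time, re-running the sweep produces a fresh set of IDs instead of reusing the previous ones.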

### Initialize sweep-level run

In [None]:
sweep_run = Run(
    family=f"sweep-{sweep_id}",
    run_id=f"sweep-{sweep_id}",
)

sweep_run.add_tags(["sweep", "notebook"])

### Assign sweep_id to sweep-level run as a group tag


In [None]:
sweep_run.add_tags([sweep_id], group_tags=True)

### Log configuration common across all trials

In [None]:
for key in parameters:
    if type(parameters[key]) not in ALLOWED_DATATYPES:
        sweep_run.log_configs({f"config/{key}": str(parameters[key])})
    else:
        sweep_run.log_configs({f"config/{key}": parameters[key]})

### Training loop

In [None]:
# Initialize fields for best values across all trials
best_acc = None

for trial, lr in tqdm(
    enumerate(learning_rates),
    total=len(learning_rates),
    desc="Trials",
):
    # Create a trial-level run
    with Run(
        family=f"sweep-{sweep_id}",
        run_id=f"trial-{sweep_id}-{trial}",
    ) as trial_run:
        trial_run.add_tags(["trial", "notebook"])

        # Add sweep_id to the trial-level run
        trial_run.add_tags([sweep_id], group_tags=True)

        # Log trial number and hyperparams
        trial_run.log_configs({"trial_num": trial, "parameters/lr": lr})

        optimizer = optim.SGD(model.parameters(), lr=lr)

        step = 0

        for epoch in trange(parameters["epochs"], desc=f"Trial {trial} - lr: {lr}"):
            trial_run.log_metrics(step=epoch, data={"epochs": epoch})

            for x, y in trainloader:
                x, y = x.to(parameters["device"]), y.to(parameters["device"])
                optimizer.zero_grad()
                x = x.view(x.size(0), -1)
                outputs = model(x)
                loss = criterion(outputs, y)

                _, preds = torch.max(outputs, 1)
                acc = (torch.sum(preds == y.data)) / len(x)

                # Log trial metrics
                trial_run.log_metrics(
                    step=step,
                    data={
                        "metrics/batch/loss": float(loss),
                        "metrics/batch/acc": float(acc),
                    },
                )

                # Log best values across all trials to the sweep-level run
                if best_acc is None or acc > best_acc:
                    best_acc = acc
                    sweep_run.log_configs(
                        {
                            "best/trial": trial,
                            "best/metrics/loss": float(loss),
                            "best/metrics/acc": float(acc),
                            "best/parameters/lr": lr,
                        }
                    )

                loss.backward()
                optimizer.step()

                step += 1

### Stop the sweep-level run

In [None]:
sweep_run.close()

### Explore the results in Neptune
Follow the link to the runs and explore the logged metadata in the Neptune app:

- **Single run**
  - The best trial, with its metrics and parameters, is available in the *best* namespace of the sweep-level run
  - Metadata for each trial is available in the corresponding trial-level run

- **Multiple runs**
  - To group all trials under a sweep, use [run groups](https://docs-beta.neptune.ai/groups). [See an example](https://scale.neptune.ai/o/examples/org/hpo/runs/table?viewId=9d44261f-32a1-42e7-96ff-9b35edc4be66&detailsTab=dashboard&dash=table&type=run).
  - To compare trials within or across sweeps, create a [multi-run dashboard](https://docs-beta.neptune.ai/custom_dashboard#multi-run-dashboard). [See an example](https://scale.neptune.ai/o/examples/org/hpo/runs/compare?viewId=9d44261f-32a1-42e7-96ff-9b35edc4be66&detailsTab=dashboard&dash=dashboard&dashboardId=Compare-trials-9d44284a-40fe-4614-a66d-a5ca81b8b4cd&type=run&compare=uIWrlI2f5Tyn_lrTzrCY6RSrOVUYtMkY0ozkGXHFv6E8).
    - To compare the average of trials across different sweeps, turn on [*Average grouped runs*](https://docs-beta.neptune.ai/charts#comparing-grouped-runs) in the chart widget settings.
  - To see both sweep-level and trial-level comparisons together, export charts or dashboards to a [report](https://docs-beta.neptune.ai/reports). [See an example](https://scale.neptune.ai/o/examples/org/hpo/reports/9d442900-19b4-47dc-a2e9-0faedc1f4d2c).