# Neptune + PyTorch Ignite

## Introduction

This guide will show you how to:

* Create a `NeptuneLogger()`,
* Log training metrics to Neptune using `NeptuneLogger()`,
* Upload model checkpoints to Neptune using `NeptuneSaver()`.

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

* If you are running the notebook on your local machine, you need to have [Python](https://www.python.org/downloads/) and [pip](https://pypi.org/project/pip/) installed.
* If you want to see the example recorded to your own workspace instead:
    * Create a Neptune account → [Take me to registration](https://neptune.ai/register)
    * Create a Neptune project that you will use for tracking metadata → [Tell me more about projects](https://docs.neptune.ai/administration/projects)
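
Once the account and project exist, one way to point the notebook at them is through the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, which the Neptune client reads. The sketch below is an assumption about how you might wire this up, not part of the original example: `neptune_credentials` is a hypothetical helper that returns keyword arguments for `NeptuneLogger()` and falls back to the anonymous public project used in this guide.

```python
import os

# Public project used by this guide for anonymous logging.
ANONYMOUS_PROJECT = "common/pytorch-ignite-integration"


def neptune_credentials(env=None):
    """Hypothetical helper: build NeptuneLogger keyword arguments from environment variables."""
    env = os.environ if env is None else env
    token = env.get("NEPTUNE_API_TOKEN")
    project = env.get("NEPTUNE_PROJECT")
    if token and project:
        return {"api_token": token, "project": project}
    # Fall back to anonymous logging. neptune is imported lazily so the helper
    # has no hard dependency when your own credentials are set.
    import neptune

    return {"api_token": neptune.ANONYMOUS_API_TOKEN, "project": ANONYMOUS_PROJECT}


# Example with explicit placeholder values (not real credentials).
example = neptune_credentials(
    {"NEPTUNE_API_TOKEN": "my-token", "NEPTUNE_PROJECT": "my-workspace/my-project"}
)
print(example["project"])  # my-workspace/my-project
```

With such a helper, `NeptuneLogger(**neptune_credentials())` could replace the hard-coded arguments used later in this guide.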

## Install Neptune and dependencies

In [10]:
%pip install -U neptune pytorch-ignite scikit-plot torchvision

Note: you may need to restart the kernel to use updated packages.


**Note**: If running on Google Colab, restart the kernel and continue execution from the next cell to avoid a `ContextualVersionConflict` error.

The error occurs because Colab ships with `future==0.16.0` preinstalled, and installing `torchvision` upgrades it to a newer version.

## Import libraries

In [11]:
import torch
import torch.nn.functional as F
from torch import nn
from torch.optim import SGD
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, Normalize, ToTensor

from ignite.engine import create_supervised_evaluator, create_supervised_trainer, Events
from ignite.metrics import Accuracy, Loss
from ignite.utils import setup_logger

## Define hyper-parameters

In [12]:
params = {
    "train_batch_size": 64,
    "val_batch_size": 64,
    "epochs": 10,
    "lr": 0.1,
    "momentum": 0.5,
}

## Create model

In [13]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=-1)
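
The `320` in `fc1` and in `x.view(-1, 320)` follows from the input size. As a worked check, assuming the standard 28x28 MNIST images and the default stride and padding of the layers above, the spatial size can be traced by hand (`conv_out` is an illustrative helper, not a PyTorch function):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                  # MNIST images are 28x28
side = conv_out(side, kernel=5)            # conv1 -> 24
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(2) -> 12
side = conv_out(side, kernel=5)            # conv2 -> 8
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(2) -> 4

flattened = 20 * side * side               # conv2 produces 20 channels
print(flattened)  # 320
```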

In [14]:
model = Net()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)  # Move model before creating optimizer

Net(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)

## Define DataLoader()

In [15]:
def get_data_loaders(train_batch_size, val_batch_size):
    data_transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])

    train_loader = DataLoader(
        MNIST(download=True, root=".", transform=data_transform, train=True),
        batch_size=train_batch_size,
        shuffle=True,
    )

    val_loader = DataLoader(
        MNIST(download=False, root=".", transform=data_transform, train=False),
        batch_size=val_batch_size,
        shuffle=False,
    )
    return train_loader, val_loader
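
`Normalize((0.1307,), (0.3081,))` subtracts the mean and divides by the standard deviation for the single grayscale channel. A minimal pure-Python sketch of that formula, applied to the two extreme pixel values that `ToTensor()` can produce (`normalize_pixel` is an illustrative name, not a torchvision function):

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # values passed to Normalize() above


def normalize_pixel(pixel, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the per-channel formula used by Normalize: (x - mean) / std."""
    return (pixel - mean) / std


# ToTensor() scales pixels to [0, 1], so these are the two extremes.
print(round(normalize_pixel(0.0), 4))  # -0.4242
print(round(normalize_pixel(1.0), 4))  # 2.8215
```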

In [16]:
train_loader, val_loader = get_data_loaders(params["train_batch_size"], params["val_batch_size"])

## Create optimizer, criterion, and trainer

In [17]:
optimizer = SGD(model.parameters(), lr=params["lr"], momentum=params["momentum"])
criterion = nn.NLLLoss()  # the model already returns log-probabilities (log_softmax)

trainer = create_supervised_trainer(model, optimizer, criterion, device=device)

## (Neptune) Create NeptuneLogger()

In [18]:
import neptune
from ignite.contrib.handlers.neptune_logger import NeptuneLogger

neptune_logger = NeptuneLogger(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/pytorch-ignite-integration",
)



https://app.neptune.ai/common/pytorch-ignite-integration/e/PYTOR2-25


To open the run, click the Neptune link that appears in the console output. The run page updates live once training starts.

## (Neptune) Attach logger to the trainer

In [19]:
trainer.logger = setup_logger("Trainer")

neptune_logger.attach_output_handler(
    trainer,
    event_name=Events.ITERATION_COMPLETED(every=100),
    tag="training",
    output_transform=lambda loss: {"batchloss": loss},
)

<ignite.engine.events.RemovableEventHandle at 0x1bb52ffd160>

## Create evaluators

In [20]:
metrics = {"accuracy": Accuracy(), "loss": Loss(criterion)}

train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)

validation_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)

In [21]:
@trainer.on(Events.EPOCH_COMPLETED)
def compute_metrics(engine):
    train_evaluator.run(train_loader)
    validation_evaluator.run(val_loader)

### (Neptune) Attach logger to training and validation evaluators

In [22]:
from ignite.contrib.handlers.neptune_logger import global_step_from_engine
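
Each evaluator run lasts a single epoch, so the evaluator's own epoch counter is always 1 (the log output later in this notebook shows `Epoch[1] Complete` for every evaluator run). `global_step_from_engine(trainer)` makes the logger read the step from the trainer instead. The idea can be sketched without Ignite; `FakeEngine` and `step_from` below are hypothetical stand-ins, not Ignite APIs:

```python
class FakeEngine:
    """Hypothetical stand-in for an Ignite engine: it only tracks an epoch counter."""

    def __init__(self):
        self.epoch = 0

    def run(self, max_epochs=1):
        # A fresh run starts counting from zero, like an evaluator
        # that is called once per training epoch.
        self.epoch = 0
        for _ in range(max_epochs):
            self.epoch += 1


def step_from(engine):
    """Return a callable that reads the step from `engine`, whichever engine fires the event."""
    return lambda: engine.epoch


fake_trainer = FakeEngine()
fake_evaluator = FakeEngine()
read_step = step_from(fake_trainer)

logged = []
for _ in range(3):
    fake_trainer.epoch += 1           # the trainer finishes another epoch
    fake_evaluator.run(max_epochs=1)  # the evaluator runs once over the data
    logged.append((fake_evaluator.epoch, read_step()))

print(logged)  # [(1, 1), (1, 2), (1, 3)]
```

The first number in each pair is what the evaluator would report on its own; the second is the step taken from the trainer, which is the one you want on the x-axis.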

In [23]:
train_evaluator.logger = setup_logger("Train Evaluator")

neptune_logger.attach_output_handler(
    train_evaluator,
    event_name=Events.EPOCH_COMPLETED,  # logging at the end of each epoch
    tag="training",
    metric_names="all",
    global_step_transform=global_step_from_engine(
        trainer
    ),  # takes the epoch of the trainer instead of train_evaluator
)

<ignite.engine.events.RemovableEventHandle at 0x1bb52fe8f70>

In [24]:
validation_evaluator.logger = setup_logger("Validation Evaluator")

neptune_logger.attach_output_handler(
    validation_evaluator,
    event_name=Events.EPOCH_COMPLETED,
    tag="validation",
    metric_names="all",
    global_step_transform=global_step_from_engine(
        trainer
    ),  # takes the epoch of the trainer instead of validation_evaluator
)

<ignite.engine.events.RemovableEventHandle at 0x1bb52ffd4f0>

## (Neptune) Log optimizer parameters

In [25]:
neptune_logger.attach_opt_params_handler(
    trainer,
    event_name=Events.ITERATION_COMPLETED(every=100),
    optimizer=optimizer,
)

<ignite.engine.events.RemovableEventHandle at 0x1bb52fe8ac0>

## (Neptune) Log the norms of the model's weights and gradients every 100 iterations
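
Both handlers below reduce each parameter tensor to a single scalar with `torch.norm`, which by default returns the 2-norm over all elements of the tensor. A minimal pure-Python sketch of that reduction (`l2_norm` is an illustrative name, not a library function):

```python
import math


def l2_norm(values):
    """Square root of the sum of squares, the default reduction computed by torch.norm."""
    return math.sqrt(sum(v * v for v in values))


toy_weights = [3.0, -4.0]
print(l2_norm(toy_weights))  # 5.0
```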

In [26]:
from ignite.contrib.handlers.neptune_logger import WeightsScalarHandler

neptune_logger.attach(
    trainer,
    log_handler=WeightsScalarHandler(model, reduction=torch.norm),
    event_name=Events.ITERATION_COMPLETED(every=100),
)

<ignite.engine.events.RemovableEventHandle at 0x1bb1d49dfd0>

In [27]:
from ignite.contrib.handlers.neptune_logger import GradsScalarHandler

neptune_logger.attach(
    trainer,
    log_handler=GradsScalarHandler(model, reduction=torch.norm),
    event_name=Events.ITERATION_COMPLETED(every=100),
)

<ignite.engine.events.RemovableEventHandle at 0x1bb1d4b96a0>

## (Neptune) Save model checkpoints
__Note:__ `NeptuneSaver` currently does not work on Windows.

In [28]:
from ignite.handlers import Checkpoint
from ignite.contrib.handlers.neptune_logger import NeptuneSaver


def score_function(engine):
    return engine.state.metrics["accuracy"]


to_save = {"model": model}

handler = Checkpoint(
    to_save=to_save,
    save_handler=NeptuneSaver(neptune_logger),
    n_saved=2,
    filename_prefix="best",
    score_function=score_function,
    score_name="validation_accuracy",
    global_step_transform=global_step_from_engine(trainer),
)

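# The line below is left commented out in this example. Uncomment it to attach
# the handler, which saves a checkpoint each time a validation run completes.
# validation_evaluator.add_event_handler(Events.COMPLETED, handler)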
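
With `n_saved=2` and a `score_function`, `Checkpoint` retains only the two highest-scoring checkpoints. The retention rule can be sketched in plain Python. This is a simplified illustration under that assumption, not Ignite's implementation, and the accuracies are made up:

```python
def keep_best(history, n_saved=2):
    """Return the (epoch, score) pairs that survive when only the n_saved highest scores are kept."""
    kept = []
    for epoch, score in history:
        kept.append((epoch, score))
        # Drop the lowest-scoring entry once the limit is exceeded.
        if len(kept) > n_saved:
            kept.remove(min(kept, key=lambda item: item[1]))
    return sorted(kept)


# Made-up validation accuracies for four epochs (illustrative only).
toy_history = [(1, 0.95), (2, 0.97), (3, 0.96), (4, 0.98)]
print(keep_best(toy_history))  # [(2, 0.97), (4, 0.98)]
```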

## Run trainer

In [29]:
trainer.run(train_loader, max_epochs=params["epochs"])

2023-05-02 11:04:15,546 Trainer INFO: Engine run starting with max_epochs=10.
2023-05-02 11:04:31,931 Train Evaluator INFO: Engine run starting with max_epochs=1.
2023-05-02 11:04:42,541 Train Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:10.608
2023-05-02 11:04:42,542 Train Evaluator INFO: Engine run complete. Time taken: 00:00:10.609
2023-05-02 11:04:42,543 Validation Evaluator INFO: Engine run starting with max_epochs=1.
2023-05-02 11:04:44,418 Validation Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:01.876
2023-05-02 11:04:44,418 Validation Evaluator INFO: Engine run complete. Time taken: 00:00:01.876
2023-05-02 11:04:44,419 Trainer INFO: Epoch[1] Complete. Time taken: 00:00:28.867
2023-05-02 11:04:59,881 Train Evaluator INFO: Engine run starting with max_epochs=1.
2023-05-02 11:05:10,424 Train Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:10.543
2023-05-02 11:05:10,425 Train Evaluator INFO: Engine run complete. Time taken: 00:00:10.543
2023-05-02 11:05:10,426 

State:
	iteration: 9380
	epoch: 10
	epoch_length: 938
	max_epochs: 10
	max_iters: <class 'NoneType'>
	output: 0.04835706204175949
	batch: <class 'list'>
	metrics: <class 'dict'>
	dataloader: <class 'torch.utils.data.dataloader.DataLoader'>
	seed: <class 'NoneType'>
	times: <class 'dict'>
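
The `epoch_length` and `iteration` values in the state above follow directly from the dataset size and the hyper-parameters. As a quick check, assuming the standard 60,000-image MNIST training split:

```python
import math

train_samples = 60_000  # size of the MNIST training split
batch_size = 64         # params["train_batch_size"]
epochs = 10             # params["epochs"]
log_every = 100         # every=100 in the handlers above

epoch_length = math.ceil(train_samples / batch_size)  # the last batch is partial
total_iterations = epoch_length * epochs
logged_points = total_iterations // log_every

print(epoch_length, total_iterations, logged_points)  # 938 9380 93
```

Each handler attached with `Events.ITERATION_COMPLETED(every=100)` therefore fires 93 times over the whole run.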

Head back to the run on Neptune to watch it being updated live!

## (Neptune) Logging additional metadata after training
You can access the Neptune run through the `.experiment` attribute of the `NeptuneLogger` object.

### (Neptune) Log hyper-parameters

In [30]:
neptune_logger.experiment["params"] = params
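
Assigning a dictionary to a run field stores each entry as its own field under that namespace, for example `params/lr`. The sketch below only illustrates the resulting field paths; `field_paths` is a hypothetical helper, not a Neptune function.

```python
def field_paths(namespace, values):
    """Map a (possibly nested) dict to slash-separated field paths."""
    paths = {}
    for key, value in values.items():
        path = f"{namespace}/{key}"
        if isinstance(value, dict):
            paths.update(field_paths(path, value))
        else:
            paths[path] = value
    return paths


toy_params = {"lr": 0.1, "momentum": 0.5}
print(field_paths("params", toy_params))
# {'params/lr': 0.1, 'params/momentum': 0.5}
```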

### (Neptune) Upload trained model

In [31]:
torch.save(model.state_dict(), "model.pth")
neptune_logger.experiment["trained_model"].upload("model.pth")

## (Neptune) Stop logging

In [32]:
neptune_logger.close()

Shutting down background jobs, please wait a moment...
Done!
Waiting for the remaining 10 operations to synchronize with Neptune. Do not kill this process.
All 10 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/common/pytorch-ignite-integration/e/PYTOR2-25/metadata


## Analyze logged metadata in the Neptune app

Go to the run link and explore metadata (metrics, params, model checkpoints) that were logged to the run in Neptune.