![Neptune + PyTorch Lightning](https://neptune.ai/wp-content/uploads/2023/09/lightning.svg)

# Neptune + PyTorch Lightning

<!--<a target="_blank" href="https://lightning.ai/neptuneai/studios/neptune-lightning">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" height="20" alt="Open In Studio"/>
</a>--> <a target="_blank" href="https://colab.research.google.com/github/neptune-ai/scale-examples/blob/lb/pytorch-lightning/integrations-and-supported-tools/pytorch-lightning/notebooks/Neptune_PyTorch_Lightning.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/scale-examples/blob/lb/pytorch-lightning/integrations-and-supported-tools/pytorch-lightning/notebooks/Neptune_PyTorch_Lightning.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://scale.neptune.ai/o/examples/org/pytorch-lightning/runs/table?viewId=9ea6121c-42a7-4ece-83b2-c591044837e7&detailsTab=charts&dash=table&type=experiment"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><!--<a target="_blank" href="">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>-->

## Introduction

This guide will show you how to:

* Install the Neptune Scale-PyTorch Lightning integration
* Create a `NeptuneScaleLogger()` instance
* Use the logger to log metadata to Neptune

## Before you start

  1. Create a Neptune Scale account. [Register &rarr;](https://neptune.ai/early-access)
  2. Create a Neptune project that you will use for tracking metadata. For instructions, see [Projects](https://docs-beta.neptune.ai/projects/) in the Neptune Scale docs.
  3. Install and configure Neptune Scale for logging metadata. For instructions, see [Get started](https://docs-beta.neptune.ai/setup) in the Neptune Scale docs.

## Install Neptune and dependencies

In [None]:
! pip install -q -U neptune-scale torchvision
! pip install -q -U --user pydantic scikit-learn
! pip install -q git+https://github.com/SiddhantSadangi/pytorch-lightning.git

**Note**: If running on Google Colab, restart the kernel and continue execution from the next cell to avoid a `ContextualVersionConflict` error.

This error occurs because Colab comes with `future==0.16.0` preinstalled, and installing `torchvision` upgrades it to a newer version.
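
To confirm which versions ended up installed after the restart, a check along these lines can help. It is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example, and the package names are the ones mentioned in this guide.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this guide installs, plus `future`, which triggers the conflict
for name in ("neptune-scale", "torchvision", "lightning", "future"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```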

## Import libraries

In [None]:
import os

import numpy as np
import lightning.pytorch as pl  # use the same package that provides NeptuneScaleLogger
import torch
import torch.nn.functional as F
from sklearn.metrics import accuracy_score
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

from lightning.pytorch.loggers.neptune import NeptuneScaleLogger

## Define hyperparameters

In [None]:
params = {
    "batch_size": 64,
    "linear": 32,
    "lr": 0.001,
    "decay_factor": 0.8,
    "max_epochs": 3,
}

## Define Lightning model

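To log metrics to Neptune, define a Lightning model that uses both Lightning's built-in logging (`self.log()` or `self.log_dict()`) and custom logging through the Neptune run (`neptune_logger.run`). Inside the model, the same run is available as `self.logger.run`.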

`neptune_logger.run` is a reference to the Neptune run object created by the `NeptuneScaleLogger()` instance. You can use it to log outside the `prefix` passed to `NeptuneScaleLogger()`. It accepts Neptune's logging methods such as `log_configs()` and `log_metrics()`.
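
The effect of the prefix can be sketched in plain Python. This is an illustration only: it assumes that the logger's default prefix is `training` and that keys are joined with `/`, neither of which is stated in this guide, so check the `NeptuneScaleLogger()` signature in your installed version. The helper `prefixed_key` is hypothetical and not part of any library.

```python
def prefixed_key(key: str, prefix: str = "training", sep: str = "/") -> str:
    """Join a logger prefix and a metric key, skipping an empty prefix.

    The default prefix and separator are assumptions made for this sketch.
    """
    return f"{prefix}{sep}{key}" if prefix else key


# Logged with self.log(): the key lands under the prefix
print(prefixed_key("train/batch/loss"))  # training/train/batch/loss

# Logged with neptune_logger.run.log_metrics(): the key is used as-is
print(prefixed_key("train/epoch/loss", prefix=""))  # train/epoch/loss
```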

In [None]:
class LitModel(pl.LightningModule):
    def __init__(self, linear, learning_rate, decay_factor):
        super().__init__()
        self.training_step_outputs = []
        self.validation_step_outputs = []
        self.test_step_outputs = []
        self.linear = linear
        self.learning_rate = learning_rate
        self.decay_factor = decay_factor
        self.train_img_max = 10
        self.train_img = 0
        self.layer_1 = torch.nn.Linear(28 * 28, linear)
        self.layer_2 = torch.nn.Linear(linear, 20)
        self.layer_3 = torch.nn.Linear(20, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.layer_1(x)
        x = F.relu(x)
        x = self.layer_2(x)
        x = F.relu(x)
        x = self.layer_3(x)
        return x

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        scheduler = LambdaLR(optimizer, lambda epoch: self.decay_factor**epoch)
        return [optimizer], [scheduler]

    def training_step(self, batch):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("train/batch/loss", loss, prog_bar=False)

        y_true = y.cpu().detach().numpy()
        y_pred = y_hat.argmax(axis=1).cpu().detach().numpy()
        acc = accuracy_score(y_true, y_pred)
        self.log("train/batch/acc", acc)
        self.training_step_outputs.append({"loss": loss, "y_true": y_true, "y_pred": y_pred})

        return {"loss": loss, "y_true": y_true, "y_pred": y_pred}

    def on_train_epoch_end(self):
        loss = np.array([])
        y_true = np.array([])
        y_pred = np.array([])
        for results_dict in self.training_step_outputs:
            loss = np.append(loss, results_dict["loss"].cpu().detach().numpy())
            y_true = np.append(y_true, results_dict["y_true"])
            y_pred = np.append(y_pred, results_dict["y_pred"])
        acc = accuracy_score(y_true, y_pred)

        self.logger.run.log_metrics(
            data={"train/epoch/loss": loss.mean(), "train/epoch/acc": acc}, step=self.global_step
        )

        self.training_step_outputs.clear()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        y_true = y.cpu().detach().numpy()
        y_pred = y_hat.argmax(axis=1).cpu().detach().numpy()

        self.validation_step_outputs.append({"loss": loss, "y_true": y_true, "y_pred": y_pred})

        return {"loss": loss, "y_true": y_true, "y_pred": y_pred}

    def on_validation_epoch_end(self):
        loss = np.array([])
        y_true = np.array([])
        y_pred = np.array([])
        for results_dict in self.validation_step_outputs:
            loss = np.append(loss, results_dict["loss"].cpu().detach().numpy())
            y_true = np.append(y_true, results_dict["y_true"])
            y_pred = np.append(y_pred, results_dict["y_pred"])
        acc = accuracy_score(y_true, y_pred)

        # You can also use the log_dict() method from PTL
        self.log_dict(
            {
                "val/epoch/loss": loss.mean(),
                "val/epoch/acc": acc,
            }
        )

        self.validation_step_outputs.clear()

    def test_step(self, batch):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        y_true = y.cpu().detach().numpy()
        y_pred = y_hat.argmax(axis=1).cpu().detach().numpy()

        # Logging misclassified test images to Neptune is not shown here:
        # Neptune Scale does not currently support file uploads.
        # The example will be updated once file support is added.

        self.test_step_outputs.append({"loss": loss, "y_true": y_true, "y_pred": y_pred})

        return {"loss": loss, "y_true": y_true, "y_pred": y_pred}

    def on_test_epoch_end(self):
        loss = np.array([])
        y_true = np.array([])
        y_pred = np.array([])
        for results_dict in self.test_step_outputs:
            loss = np.append(loss, results_dict["loss"].cpu().detach().numpy())
            y_true = np.append(y_true, results_dict["y_true"])
            y_pred = np.append(y_pred, results_dict["y_pred"])
        acc = accuracy_score(y_true, y_pred)
        self.log("test/loss", loss.mean())
        self.log("test/acc", acc)
        self.test_step_outputs.clear()
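
The three `on_*_epoch_end()` hooks share one pattern: collect per-batch results, then reduce them to an epoch-level loss and accuracy. The sketch below reproduces that reduction with plain Python lists instead of tensors and scikit-learn, so it runs without any dependencies. The function name `aggregate_epoch` and the sample data are invented for illustration.

```python
from statistics import fmean


def aggregate_epoch(step_outputs):
    """Reduce per-batch outputs to (mean batch loss, accuracy over all samples)."""
    losses = [out["loss"] for out in step_outputs]
    y_true = [label for out in step_outputs for label in out["y_true"]]
    y_pred = [label for out in step_outputs for label in out["y_pred"]]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return fmean(losses), correct / len(y_true)


# Two hypothetical batches: 3 samples (one wrong) and 2 samples (all correct)
outputs = [
    {"loss": 0.5, "y_true": [1, 2, 3], "y_pred": [1, 2, 0]},
    {"loss": 0.3, "y_true": [4, 5], "y_pred": [4, 5]},
]
epoch_loss, epoch_acc = aggregate_epoch(outputs)
print(f"loss={epoch_loss:.2f} acc={epoch_acc:.2f}")  # loss=0.40 acc=0.80
```

As in the model above, the loss is averaged per batch rather than per sample, so a smaller final batch carries the same weight as a full one.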

### Initialize model

In [None]:
model = LitModel(
    linear=params["linear"],
    learning_rate=params["lr"],
    decay_factor=params["decay_factor"],
)
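
With these values, the `LambdaLR` scheduler from `configure_optimizers()` multiplies the base learning rate by `decay_factor**epoch`. The following sketch computes the resulting schedule in plain Python. It assumes that the scheduler steps once per epoch, which is Lightning's default; `lambda_lr_schedule` is a name made up for this example.

```python
def lambda_lr_schedule(base_lr: float, decay_factor: float, max_epochs: int) -> list:
    """Learning rate used in each epoch when LambdaLR applies decay_factor**epoch."""
    return [base_lr * decay_factor**epoch for epoch in range(max_epochs)]


schedule = lambda_lr_schedule(base_lr=0.001, decay_factor=0.8, max_epochs=3)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
# epoch 0: lr = 0.001000
# epoch 1: lr = 0.000800
# epoch 2: lr = 0.000640
```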

## Define DataModule

In [None]:
class MNISTDataModule(pl.LightningDataModule):
    def __init__(self, batch_size, normalization_vector):
        super().__init__()
        self.batch_size = batch_size
        self.normalization_vector = normalization_vector
        self.mnist_train = None
        self.mnist_val = None
        self.mnist_test = None

    def prepare_data(self):
        MNIST(os.getcwd(), train=True, download=True)
        MNIST(os.getcwd(), train=False, download=True)

    def setup(self, stage):
        transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize(self.normalization_vector[0], self.normalization_vector[1]),
            ]
        )
        if stage == "fit":
            mnist_train = MNIST(os.getcwd(), train=True, transform=transform)
            self.mnist_train, self.mnist_val = random_split(mnist_train, [55000, 5000])
        if stage == "test":
            self.mnist_test = MNIST(os.getcwd(), train=False, transform=transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=self.batch_size, num_workers=0)

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=self.batch_size, num_workers=0)

    def test_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=self.batch_size, num_workers=0)

### Initialize DataModule

In [None]:
dm = MNISTDataModule(
    normalization_vector=((0.1307,), (0.3081,)),
    batch_size=params["batch_size"],
)
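
The `normalization_vector` holds the mean and standard deviation that `transforms.Normalize()` applies to each pixel as `(pixel - mean) / std`, after `transforms.ToTensor()` has scaled pixel values to the range 0 to 1. A plain-Python sketch of that arithmetic, with `normalize` as an illustrative helper rather than a torchvision function:

```python
def normalize(pixel: float, mean: float = 0.1307, std: float = 0.3081) -> float:
    """Apply the same per-pixel formula as transforms.Normalize: (pixel - mean) / std."""
    return (pixel - mean) / std


print(f"black pixel (0.0) -> {normalize(0.0):.4f}")  # -0.4242
print(f"white pixel (1.0) -> {normalize(1.0):.4f}")  # 2.8215
```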

## Create a NeptuneScaleLogger() instance

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

### Log to your own project instead

To log to your own project, replace the `NeptuneScaleLogger()` cell below with the following:

```python
from getpass import getpass

neptune_logger = NeptuneScaleLogger(
    project="workspace-name/project-name",  # replace with your own (see instructions below)
    api_token=getpass("Enter your Neptune API token: "),
    experiment_name="lightning-experiment"  # Optional experiment name
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://scale.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. You can copy the project path from the project.

For more help, see [Get your API token](https://docs-beta.neptune.ai/setup#3-get-your-api-token) in the Neptune docs.

In [None]:
neptune_logger = NeptuneScaleLogger(
    # api_token="YOUR_API_TOKEN",
    # project="YOUR_WORKSPACE_NAME/YOUR_PROJECT_NAME",
    experiment_name="lightning-experiment",
)

# Print run URL
print(f"Neptune run URL: {neptune_logger.run.get_run_url()}")
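
The cell above passes neither `api_token` nor `project`, so the credentials have to come from somewhere else. Neptune Scale can read them from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables. The sketch below checks whether both are set before you create the logger; `missing_neptune_vars` is a hypothetical helper, not part of the Neptune API.

```python
import os

REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_neptune_vars()
if missing:
    print(f"Set these variables or pass the arguments explicitly: {missing}")
else:
    print("Neptune credentials found in the environment.")
```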

In [None]:
# Add tags using the Neptune methods
neptune_logger.run.add_tags(["notebook", "lightning"])

## Initialize a trainer and pass neptune_logger

In [None]:
trainer = pl.Trainer(
    logger=neptune_logger,
    max_epochs=params["max_epochs"],
)

## Log hyperparameters to the run

In [None]:
neptune_logger.log_hyperparams(params)
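
The `params` dictionary in this guide is flat. If you later group hyperparameters into nested dictionaries, flattening them into path-like keys first keeps each value as a separate field. The helper below is a hypothetical sketch of that idea. Whether `log_hyperparams()` already does this for you is not covered here, so treat the helper and the `/` separator as assumptions.

```python
def flatten_params(params: dict, parent_key: str = "", sep: str = "/") -> dict:
    """Flatten nested dictionaries into a single level with path-like keys."""
    flat = {}
    for key, value in params.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested groups, carrying the path so far
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"batch_size": 64, "optimizer": {"lr": 0.001, "decay_factor": 0.8}}
print(flatten_params(nested))
# {'batch_size': 64, 'optimizer/lr': 0.001, 'optimizer/decay_factor': 0.8}
```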

## Log model summary to the run

_Currently not supported._

In [None]:
# neptune_logger.log_model_summary(model=model, max_depth=-1)

## Train and test the model and log metadata to the run live

In [None]:
trainer.fit(model, datamodule=dm)
trainer.test(model, datamodule=dm)
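
To get a feel for the step axis you will see in the charts, you can estimate how many training steps this run produces. The sketch assumes the defaults used in this guide: the data loader keeps the last partial batch, and every batch counts as one optimizer step. `steps_per_epoch` is an illustrative helper.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


train_steps = steps_per_epoch(num_samples=55_000, batch_size=64)
total_steps = train_steps * 3  # max_epochs from params

print(train_steps)  # 860
print(total_steps)  # 2580
```

The epoch-level training metrics, which are logged with `step=self.global_step`, therefore appear at the end of each block of 860 steps.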

To see the metadata being logged live, open the run in the Neptune app.

You can ignore any "X-coordinates (step) must be strictly increasing" errors.
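
"Strictly increasing" means that, within one metric series, each logged step must be greater than the previous one. The sketch below shows what such a check looks like in plain Python. It illustrates the rule only and is not Neptune's implementation; `first_non_increasing` and the sample steps are made up.

```python
from typing import Optional, Sequence


def first_non_increasing(steps: Sequence[int]) -> Optional[int]:
    """Return the index of the first step that does not exceed its predecessor."""
    for i in range(1, len(steps)):
        if steps[i] <= steps[i - 1]:
            return i
    return None


print(first_non_increasing([0, 50, 100, 150]))       # None
print(first_non_increasing([0, 50, 100, 100, 150]))  # 3
```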

## Stop logging

Once you are done logging, stop the run by calling the `close()` method of the run object, which you can access as `neptune_logger.run`.
This is needed only when logging from a notebook environment. When logging from a script, Neptune automatically stops tracking once the script has finished executing.

In [None]:
neptune_logger.run.close()
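
If training can fail partway through, wrapping the work in `contextlib.closing()` ensures that `close()` is still called. The sketch below demonstrates the pattern with a stand-in class, `FakeRun`, which is invented for this example so that it runs without a Neptune account. With a real logger you would pass `neptune_logger.run` instead.

```python
from contextlib import closing


class FakeRun:
    """Stand-in for a run object: records metrics and remembers whether it was closed."""

    def __init__(self):
        self.closed = False
        self.metrics = []

    def log_metrics(self, data, step):
        self.metrics.append((step, data))

    def close(self):
        self.closed = True


run = FakeRun()
try:
    with closing(run):  # calls run.close() on exit, even if an exception is raised
        run.log_metrics(data={"loss": 0.1}, step=1)
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(run.closed)  # True
```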

## Analyze logged metadata in the Neptune app

To explore the metadata in Neptune, follow the link in the console output.

It looks something like [this](https://scale.neptune.ai/o/examples/org/pytorch-lightning/runs/details?viewId=standard-view&detailsTab=charts&runIdentificationKey=lightning-experiment&type=experiment).