# Monitor ML runs live 

## Introduction

This guide will show you how to:

* Monitor training and evaluation metrics and losses live
* Monitor hardware resources during training

By the end, you will be monitoring your metrics, losses, and hardware usage live in Neptune.

## Setup

Install the dependencies:

In [1]:
pip install neptune-client tensorflow

Note: you may need to restart the kernel to use updated packages.


## Step 1: Create a basic training script

As an example, I'll use a script that trains a Keras model on the MNIST dataset.

In [2]:
# Keras is imported from the TensorFlow package installed above
from tensorflow import keras

# parameters
PARAMS = {'epoch_nr': 100,
          'batch_size': 256,
          'lr': 0.005,
          'momentum': 0.4,
          'use_nesterov': True,
          'unit_nr': 256,
          'dropout': 0.05}

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(PARAMS['unit_nr'], activation=keras.activations.relu),
    keras.layers.Dropout(PARAMS['dropout']),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])

# `learning_rate` replaces the deprecated `lr` argument
optimizer = keras.optimizers.SGD(learning_rate=PARAMS['lr'],
                                 momentum=PARAMS['momentum'],
                                 nesterov=PARAMS['use_nesterov'])

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

## Step 2: Initialize Neptune

This connects your script to the Neptune application.

In [3]:
import neptune

neptune.init(
    api_token="ANONYMOUS",
    project_qualified_name="shared/onboarding"
)

Project(shared/onboarding)

This tells Neptune:

* **who you are**: your Neptune API token, passed as `api_token`
* **where you want to send your data**: your Neptune project, passed as `project_qualified_name`

---

**Note:** 


Instead of logging data to the public project `shared/onboarding` as the anonymous user `neptuner`, you can log it to your own project.

To do that:

1. Get your Neptune API token

![image](https://neptune.ai/wp-content/uploads/get_token.gif)

2. Pass the token to the ``api_token`` argument of the ``neptune.init()`` method: ``api_token=YOUR_API_TOKEN``
3. Pass your username to the ``project_qualified_name`` argument of the ``neptune.init()`` method: ``project_qualified_name='YOUR_USERNAME/sandbox'``. Keep `/sandbox` at the end; `sandbox` is the project that was automatically created for you.

For example:

```python
neptune.init(
    project_qualified_name='funky_steve/sandbox',
    api_token='eyJhcGlfYW908fsdf23f940jiri0bn3085gh03riv03irn',
)
```
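
To keep the token out of the notebook itself, one option is to read it from an environment variable. The sketch below assumes the variable names `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT`; the `resolve_neptune_settings` helper is hypothetical (it is not part of the Neptune client) and only illustrates falling back to the anonymous defaults used above.

```python
import os


def resolve_neptune_settings(env=None):
    """Return (api_token, project), falling back to the anonymous defaults.

    Hypothetical helper for illustration; not part of the Neptune client.
    """
    env = os.environ if env is None else env
    api_token = env.get("NEPTUNE_API_TOKEN", "ANONYMOUS")
    project = env.get("NEPTUNE_PROJECT", "shared/onboarding")
    return api_token, project


api_token, project = resolve_neptune_settings()

# Then, in the notebook:
# neptune.init(api_token=api_token, project_qualified_name=project)
```

With neither variable set, this resolves to the same public project and anonymous token used in Step 2.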

---

## Step 3: Create an experiment

In [4]:
neptune.create_experiment(name='great-idea')

NVMLError: NVML Shared Library Not Found - GPU usage metrics may not be reported.


https://ui.neptune.ai/shared/onboarding/e/ON-265


Experiment(ON-265)

This opens a new "experiment" namespace in Neptune to which you can log various objects.

Click on the link above to open this experiment in Neptune.

For now it is empty, but keep the experiment tab open to see what happens next.

## Step 4: Add logging for metrics and losses

Since we are using Keras, we'll create a callback that logs metrics and losses after every epoch.

In [5]:
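class NeptuneMonitor(keras.callbacks.Callback):
    """Log every metric and loss that Keras reports to Neptune after each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # `logs` defaults to None, so fall back to an empty dict
        for metric_name, metric_value in (logs or {}).items():
            neptune.log_metric(metric_name, metric_value)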
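
The callback's logic can be tried without a running experiment by separating it from the Neptune call. This is a minimal sketch, assuming a hypothetical `forward_logs` helper and a stand-in `log_fn`; in the notebook, `log_fn` would be `neptune.log_metric`. It also treats a missing `logs` dictionary as empty.

```python
def forward_logs(logs, log_fn):
    """Send every (name, value) pair in a Keras-style `logs` dict to `log_fn`."""
    for metric_name, metric_value in (logs or {}).items():
        log_fn(metric_name, metric_value)


# Stand-in for neptune.log_metric: records calls instead of sending them.
recorded = []


def record(name, value):
    recorded.append((name, value))


forward_logs({"loss": 0.31, "accuracy": 0.91}, record)
forward_logs(None, record)  # nothing to log, nothing recorded

print(recorded)
```

Inside `on_epoch_end`, the same helper could be called as `forward_logs(logs, neptune.log_metric)`.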

We need to pass it to the `callbacks` argument of `model.fit()`.

In [6]:
model.fit(x_train, y_train,
          epochs=PARAMS['epoch_nr'],
          batch_size=PARAMS['batch_size'],
          callbacks=[NeptuneMonitor()])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f2ccc10c050>

## Step 5: See results live in the UI

Go to the `Logs` and `Charts` sections of the Neptune UI to see the metrics and losses as they are logged.

![image](https://neptune.ai/wp-content/uploads/logs_and_charts.gif)

Neptune automatically logs the hardware consumption during the experiment. 

You can see it in the `Monitoring` section of the Neptune UI. 

![image](https://neptune.ai/wp-content/uploads/monitoring.gif)

## Step 6: Stop the experiment

When running experiments in notebooks, you need to stop them explicitly to tell Neptune when to stop logging.

In [7]:
neptune.stop()
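
If training raises an exception, the cell that calls `neptune.stop()` may never run. One way to guard against that is a `try`/`finally` wrapper. The sketch below uses a hypothetical `run_and_stop` helper with stand-in functions; in the notebook, `train_fn` would wrap the `model.fit(...)` call and `stop_fn` would be `neptune.stop`.

```python
def run_and_stop(train_fn, stop_fn):
    """Run `train_fn`, then always call `stop_fn`, even if training fails."""
    try:
        return train_fn()
    finally:
        stop_fn()


# Stand-ins for model.fit(...) and neptune.stop().
events = []


def failing_training():
    events.append("train")
    raise RuntimeError("training interrupted")


try:
    run_and_stop(failing_training, lambda: events.append("stop"))
except RuntimeError:
    # The original error still reaches the caller after the stop call.
    events.append("error propagated")

print(events)
```

The error is not swallowed: the stop function runs first, and the exception then surfaces as usual.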