<a href="https://colab.research.google.com/github/mlop-ai/mlop/blob/main/examples/intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<img src="https://github.com/mlop-ai/mlop/raw/refs/heads/main/docs/static/img/banner.svg" width="320" alt="Logo" />
</div>

<div class="markdown-google-sans">
  <h1><strong>Welcome</strong></h1>
</div>
This is an interactive notebook to help you get started with the basic capabilities of the logger. Buckle up!

## Prerequisites

Install [MLOP's open-source Python SDK](https://github.com/mlop-ai/mlop), then proceed with either **Option 1** or **Option 2** (only if you have been told to use public credentials).

In [None]:
%pip install "mlop[dev]"
# %pip install "mlop[dev] @ git+https://github.com/mlop-ai/mlop.git"
# import sys; import os; sys.path.insert(0, os.path.dirname(os.path.abspath(os.path.dirname("__file__"))))
import mlop

### Option 1: Using your own account

1. Register an account at [MLOP's website](https://demo.mlop.ai).

2. Create a token in [Settings -> Developers -> Create API Key](https://demo.mlop.ai/api-keys).

In [None]:
mlop.login()

# mlop.logout()  # you may log out at any time by calling this function directly

### Option 2: Using temporary public credentials

Note: you will not be able to view the runs unless you have been explicitly given login credentials.

In [None]:
# @title Only use this if you have been given credentials. { display-mode: "form" }

settings = mlop.Settings()
settings.auth = "mlpi_public_use_only_"  # @param {type:"string"}
mlop.login(settings=settings)

## Simulate and track an ML experiment with MLOP

Create, track, and visualize an ML experiment:

0. Set up the hyperparameters to be tracked

1. Initialize and name the new MLOP run

2. Log metrics such as the accuracy and loss within the training loop

3. Gracefully exit the run if everything completed successfully

In [None]:
import random

# 0. set up hyperparameters
config = {
    "learning_rate": 0.02,
    "architecture": "CNN",
    "dataset": "CIFAR-100",
    "epochs": 10,
}

# 1. initialize a run
op = mlop.init(
    project="example",
    name="simulation-standard", # will be auto-generated if left unspecified
    config=config,
)

# simulated training loop
epochs = config["epochs"]  # reuse the tracked hyperparameter instead of repeating the literal
offset = random.random() / 5
for epoch in range(1, epochs+1):
    acc = 1 - 2**-epoch - random.random() / epoch - offset
    loss = 2**-epoch + random.random() / epoch + offset

    # 2. record metrics from the script to MLOP
    op.log({"acc": acc, "loss": loss})
    print(f"Epoch {epoch}/{epochs}")

# 3. mark the run as finished
op.finish()
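
To preview the shape of the simulated curves before sending anything to MLOP, the same formulas can be run on their own. The sketch below is a standard-library-only illustration; the helper name `simulate_metrics` and the `seed` parameter are assumptions introduced here and are not part of the MLOP SDK.

```python
import random


def simulate_metrics(epochs, seed=0):
    """Return one {"acc", "loss"} dict per epoch, using the same formulas as the cell above."""
    rng = random.Random(seed)  # seeded so repeated calls give identical curves
    offset = rng.random() / 5  # constant per-run bias in [0, 0.2)
    history = []
    for epoch in range(1, epochs + 1):
        acc = 1 - 2**-epoch - rng.random() / epoch - offset
        loss = 2**-epoch + rng.random() / epoch + offset
        history.append({"acc": acc, "loss": loss})
    return history


history = simulate_metrics(epochs=10)
for epoch, metrics in enumerate(history, start=1):
    print(f"epoch {epoch:2d}  acc={metrics['acc']:.3f}  loss={metrics['loss']:.3f}")
```

Each dict in `history` has the same keys as the payload passed to `op.log` above, so the list could be replayed into a run one entry at a time.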

## Simulate an ultra-high throughput ML experiment with MLOP

Unlike traditional experiment loggers, MLOP uses a high-throughput mode by default.

This means the software imposes little limit on how many data points you can log at a given time.

Other experiment trackers often hit rate limits because of their architecture; MLOP does not.

In [None]:
import random
import time

config = {
    "epochs": 100_000,
    "metrics": 20,
    "wait": 0.01
}

settings = mlop.Settings()
settings.meta = []

op = mlop.init(
    project="example",
    name="simulation-fast",
    config=config,
    settings=settings
)

start = time.time()
for i in range(config['epochs']):
    # use a separate index `j` so the comprehension does not shadow the loop variable `i`
    dummy_data = {f"val/metric-{j}": random.random() for j in range(config['metrics'])}
    op.log(dummy_data)

    if i % 10_000 == 0:
        print(f"Epoch {i + 1}/{config['epochs']}, sleeping {config['wait']}s")
        time.sleep(config['wait'])

print(f"Logged {config['epochs'] * config['metrics']} points in {time.time() - start:.2f}s")
op.finish()
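
The elapsed time printed by the cell above includes the cost of building the random payloads, not only the cost of logging them. As a rough way to separate the two, the sketch below times payload generation alone. It uses only the standard library, and the helper name `time_dummy_generation` is an assumption made for this illustration.

```python
import random
import time


def time_dummy_generation(steps, metrics_per_step):
    """Build the same dummy payloads as the cell above, without logging them."""
    start = time.perf_counter()
    points = 0
    for _ in range(steps):
        payload = {f"val/metric-{j}": random.random() for j in range(metrics_per_step)}
        points += len(payload)
    return points, time.perf_counter() - start


points, elapsed = time_dummy_generation(steps=10_000, metrics_per_step=20)
print(f"Generated {points} points in {elapsed:.2f}s")
```

Subtracting this baseline (scaled to the same number of steps) from the logged run's time gives an estimate of the overhead added by `op.log` itself.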

Now that we know how to integrate MLOP into a simulated ML training loop, let's track an actual ML experiment using a basic PyTorch neural network.

## Track an ML experiment with PyTorch

The following code cell defines and trains a simple MNIST classifier. During training, MLOP prints out URLs. Click the project page link to see your results stream live into an MLOP project.

MLOP automatically logs metrics, console output (both `stdout` and `stderr`), system information (optional), system resource usage (`cpu`, `gpu`, `memory`, `disk`, `network`, `processes`), as well as hyperparameters (specified in `config`).

You will be able to see an interactive graph with model inputs and outputs.

### Set up PyTorch DataLoader
The following cell defines some useful functions that we will need to train our ML model (these are not unique to the experiment tracker itself). See the PyTorch documentation for more information on how to define [forward and backward training loops](https://pytorch.org/tutorials/beginner/nn_tutorial.html), how to use [PyTorch DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) to load data in for training, and how to [define PyTorch models](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) using `torch.nn.Sequential`.

In [None]:
import numpy
import torch, torchvision
import torch.nn as nn
from torchvision.datasets import MNIST
import torchvision.transforms as T

MNIST.mirrors = [
    mirror for mirror in MNIST.mirrors if "http://yann.lecun.com/" not in mirror
]

device = "cuda:0" if torch.cuda.is_available() else "cpu"


def get_dataloader(is_train, batch_size, slice=5):
    "Get a dataloader over every `slice`-th sample of the MNIST train or test split"
    full_dataset = MNIST(
        root=".", train=is_train, transform=T.ToTensor(), download=True
    )
    sub_dataset = torch.utils.data.Subset(
        full_dataset, indices=range(0, len(full_dataset), slice)
    )
    loader = torch.utils.data.DataLoader(
        dataset=sub_dataset,
        batch_size=batch_size,
        shuffle=is_train,  # shuffle only the training split
        pin_memory=True,
        num_workers=2,
    )
    return loader


def get_model(dropout):
    "A simple model"
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256),
        nn.BatchNorm1d(256),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(256, 10),
    ).to(device)
    return model


def validate_model(model, valid_dl, loss_func, log_images=False, batch_idx=0):
    "Compute the per-sample loss and accuracy of the model on the validation dataset"
    model.eval()
    val_loss = 0.0
    with torch.inference_mode():
        correct = 0
        for i, (images, labels) in enumerate(valid_dl):
            images, labels = images.to(device), labels.to(device)

            # forward pass; the loss is a batch mean, so weight it by the batch size
            outputs = model(images)
            val_loss += loss_func(outputs, labels).item() * labels.size(0)

            # compute accuracy and accumulate
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()

            # log one batch of images to the dashboard
            if i == batch_idx and log_images:
                j = 0
                for img, pred, target in zip(
                    images.to("cpu"), predicted.to("cpu"), labels.to("cpu")
                ):
                    pass # op.log({"image": mlop.Image(data = img[0].numpy() * 255, caption=f"{j}_{i}")})
                    j += 1

    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)
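
`validate_model` multiplies each batch loss by `labels.size(0)` because `nn.CrossEntropyLoss` returns the mean over a batch by default, and the final batch is usually smaller than the rest. The sketch below works through that arithmetic with made-up numbers in plain Python; the helper name `dataset_mean_loss` is hypothetical.

```python
def dataset_mean_loss(batch_mean_losses, batch_sizes):
    """Combine per-batch mean losses into a per-sample mean over the whole dataset."""
    total = sum(loss * size for loss, size in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


# Two full batches of 4 samples and a final partial batch of 2 samples (made-up numbers).
batch_mean_losses = [0.5, 0.25, 1.0]
batch_sizes = [4, 4, 2]

weighted = dataset_mean_loss(batch_mean_losses, batch_sizes)
naive = sum(batch_mean_losses) / len(batch_mean_losses)
print(f"weighted per-sample mean: {weighted:.4f}")
print(f"naive mean of batch means: {naive:.4f}")
```

The naive average over-weights the small final batch, which is why the function accumulates size-weighted sums and divides by `len(valid_dl.dataset)` at the end.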

### Train your model

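The following code trains the model and saves a checkpoint to `my_model.pt` after every epoch. The commented-out `op.log` call shows how the checkpoint could be attached to your project as an `mlop.File`. Use model checkpoints like you normally would to assess how the model performed during training.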

In [None]:
import math
import random

config = {
    "epochs": 120,  # 5
    "batch_size": 128,
    "lr": 1e-3,
    "dropout": random.uniform(0.01, 0.80),
}

op = mlop.init(project="example", name="pytorch-mnist-cnn", config=config)

train_dl = get_dataloader(is_train=True, batch_size=config["batch_size"])
valid_dl = get_dataloader(is_train=False, batch_size=2 * config["batch_size"])
n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config["batch_size"])

# simple MLP model
model = get_model(config["dropout"])

# loss and optimizer
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

# training loop
example_ct = 0
step_ct = 0
for epoch in range(config["epochs"]):
    model.train()
    for step, (images, labels) in enumerate(train_dl):
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        train_loss = loss_func(outputs, labels)
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        example_ct += len(images)
        metrics = {
            "train/train_loss": float(train_loss),
            "train/epoch": (step + 1 + (n_steps_per_epoch * epoch)) / n_steps_per_epoch,
            "train/example_ct": example_ct,
        }

        if step + 1 < n_steps_per_epoch:
            op.log(metrics)

        step_ct += 1

    val_loss, accuracy = validate_model(
        model, valid_dl, loss_func, log_images=(epoch == (config["epochs"] - 1))
    )

    val_metrics = {"val/val_loss": float(val_loss), "val/val_accuracy": accuracy}
    op.log({**metrics, **val_metrics})

    torch.save(model, "my_model.pt")
    # op.log({
    #     "model/mnist": mlop.File(path="./my_model.pt", name=f"epoch-{epoch+1}_dropout-{round(config['dropout'], 4)}")
    # })

    print(
        f"Epoch: {epoch + 1}, Train Loss: {train_loss:.3f}, Validation Loss: {val_loss:.3f}, Accuracy: {accuracy:.2f}"
    )

op.finish()
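
The step bookkeeping in the training cell can be checked without downloading MNIST. The sketch below assumes the standard 60,000-image MNIST training split and the default `slice=5` from `get_dataloader`; the constant names and the helper `fractional_epoch` are introduced here for illustration only.

```python
import math

TRAIN_SIZE = 60_000  # size of the standard MNIST training split
STRIDE = 5           # default `slice` argument of get_dataloader
BATCH_SIZE = 128     # config["batch_size"]

# Subset(range(0, len, slice)) keeps every STRIDE-th sample.
subset_size = len(range(0, TRAIN_SIZE, STRIDE))
n_steps_per_epoch = math.ceil(subset_size / BATCH_SIZE)


def fractional_epoch(step, epoch):
    """Same expression as the "train/epoch" metric in the training loop."""
    return (step + 1 + n_steps_per_epoch * epoch) / n_steps_per_epoch


print(f"samples per epoch: {subset_size}")
print(f"steps per epoch: {n_steps_per_epoch}")
print(f"train/epoch after the first step: {fractional_epoch(step=0, epoch=0):.4f}")
print(f"train/epoch after the last step of epoch 0: {fractional_epoch(step=n_steps_per_epoch - 1, epoch=0):.1f}")
```

Under these assumptions each epoch covers 12,000 samples in 94 steps, and `train/epoch` reaches a whole number exactly on the last step of each epoch, which is the step whose metrics are logged together with the validation metrics.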