# MLOps using WandB In PyTorch

<img src="https://opencv.org/wp-content/uploads/2023/06/m01_04_MLops_with_wandb_pytorch_cover.png" width="80%" align="center">

In this notebook, we will learn how to do MLOps using WandB. We will take a hands-on approach to experiment tracking, dataset versioning, and model checkpointing in WandB.

***What is MLOps?***

MLOps, or Machine Learning Operations, is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning (ML) lifecycle. It's essentially DevOps for machine learning and aims to unify ML system development and ML system operations.

MLOps seeks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. It aims to create a culture and environment where ML technologies can generate business benefits by improving deployment, testing, and maintenance processes of ML models.

---

The notebook is split into two parts:

1. First we will do a short tutorial to learn about the absolute basics of wandb.

2. Next, we will see how WandB can be used in a project. As a use case, we will take the code from the *Linear Regression* unit and enhance it with WandB integration.

**Our focus will be on the essential concepts necessary to kickstart your journey with WandB.**

***What is WandB?***

Wandb, short for Weights and Biases, is a platform that provides a suite of tools for visualizing, tracking, and analyzing machine learning experiments. It is commonly used by researchers and developers to monitor and manage their machine learning models and experiments.

<a href="https://wandb.ai/site" target="_blank">Wandb</a> offers a range of features and functionalities that aid in the experimentation and iteration process. Here are some key aspects of Wandb:

1. **Experiment tracking:** Wandb allows users to log and track various metrics, such as loss, accuracy, and custom metrics, during the training process. These metrics are visualized in real-time on a web-based dashboard, making it easy to monitor model performance and compare different runs.

2. **Visualization and analysis:** The platform offers interactive visualizations to analyze training progress, compare experiments, and gain insights from the collected data. It provides tools for visualizing model architectures, learning curves, hyperparameter sweeps, and more.

3. **Collaboration and sharing:** Wandb enables collaboration among team members by allowing them to share experiments, results, and insights. It supports collaboration features such as project organization, experiment commenting, and sharing visualizations with colleagues.

4. **Reproducibility and versioning:** It facilitates reproducibility of experiments by automatically tracking code versions, system configurations, and dependencies. This helps ensure that experiments can be accurately replicated and compared over time.

5. **Integration with popular frameworks:** Wandb seamlessly integrates with popular machine learning frameworks such as TensorFlow, PyTorch, and Keras. It provides easy-to-use APIs and libraries for instrumenting experiments and logging metrics.

6. **Hyperparameter optimization:** Wandb includes functionality for hyperparameter optimization. Users can perform automated sweeps over different hyperparameter configurations and track the performance of models across various settings.


Overall, Wandb serves as a comprehensive platform for experiment management, tracking, visualization, and collaboration, making it easier for machine learning practitioners to organize and analyze their work.

The <a href="https://docs.wandb.ai/guides" target="_blank">documentation of WandB</a> is quite extensive and provides detailed coverage of all the different functionalities.

Uncomment and run the following code cell to install the required libraries.

In [None]:
# !pip install -qqqU wandb torchinfo tqdm

In [None]:
# Import necessary support libraries.
import os
import math
import random
import warnings
from urllib.request import urlretrieve
from IPython.display import clear_output

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm import trange
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Necessary PyTorch imports.
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

from torchinfo import summary

warnings.filterwarnings(action='ignore', category=UserWarning)

# Text formatting
BOLD = "\033[1m"
END = "\033[0m"

We define the usual function for deterministic training.

In [None]:
def seed_everything(seed_value):
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
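The helper above seeds NumPy and PyTorch, but not Python's built-in `random` module, which the quick tutorial below uses to generate dummy metrics. As a minimal sketch (the name `seed_python_random` is hypothetical and not part of the original notebook), seeding `random` the same way makes those dummy values repeatable:

```python
import random


def seed_python_random(seed_value):
    """Seed Python's built-in RNG (hypothetical helper, not in the original notebook)."""
    random.seed(seed_value)


# Re-seeding with the same value reproduces the same sequence of draws.
seed_python_random(41)
first_draws = [random.random() for _ in range(3)]

seed_python_random(41)
second_draws = [random.random() for _ in range(3)]

print(first_draws == second_draws)
```

If you want the dummy `acc` and `loss` values to be identical between notebook restarts, one option is to call such a helper right before the toy training loop.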

## 1 Quick Hands-On tutorial

WandB is quite easy and straightforward to use.

In essence, there are only 4 small steps we need to do to use WandB.


1. Log in to your WandB account: `wandb.login()`
2. Create a project and initialize a run: `wandb.init(...)`
3. Start training and log run metrics: `wandb.log(...)`
4. Terminate the experiment run: `wandb.finish(...)`

**Step 1. Log in to your WandB account.**


The first time you log in from a machine, your account will be linked to the machine. To do so, you have to authorize it.

1. Click on the <a href="https://wandb.ai/authorize" target="_blank">https://wandb.ai/authorize</a> link below
2. Copy the API key displayed on the page.
3. Paste the copied key in the input box, and that's it.

Do ensure you have <a href="https://wandb.ai/site" target="_blank">created an account on WandB</a> beforehand.

In [None]:
import wandb

wandb.login(relogin=True)

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ········


wandb: Appending key for api.wandb.ai to your netrc file: C:\Users\vaibh/.netrc


True

**Step 2. Create a project and `initialize` an experiment `run`.**


In WandB, a **run** is the smallest unit of computation. A run is basically a tracker for a single experiment. We can create multiple runs in a project.


One way is to create a new project and simultaneously instantiate a WandB experiment ***run*** using the `wandb.init(...)` method.

You can also create a new project beforehand by using the UI. Go to your projects page: https://wandb.ai/{YOUR_USERNAME}/projects and click the **Create new project** button. On the next page, provide a project name, and that's it.


<img src="https://opencv.org/wp-content/uploads/2023/06/m01_04_wandb_create_project.png" width="80%">

In [None]:
# Start a new wandb project and initiate an experiment run.
run = wandb.init(

    # Set the wandb project where this run will be logged
    project="WandB_Test_First_Project",

    # Track hyperparameters and run metadata.
    config={
        "learning_rate": 0.02,
        "architecture": "CNN",
        "dataset": "CIFAR-100",
        "epochs": 10,
    },
)

RUN_ID = wandb.run.id
print(f"RUN ID: {BOLD}{RUN_ID}{END}")

wandb: Currently logged in as: opencv_courses. Use `wandb login --relogin` to force relogin


RUN ID: fdnrnah1


Click on the second-to-last link in the output of the above code cell. It should take you to your **project** page. This is where all the project runs are stored and accessible to you and your team. By default, a project is private. You can make it public from the *Overview* tab.

We passed a `config` dictionary inside the `.init(...)` method. The key-value pairs passed are basically the various hyperparameters of your experiment. It's a good idea to record all your hyperparameters to ensure an experiment is reproducible later.

We can pass the config with `.init(...)` or set and pass them later.

For example:

```python
run = wandb.init(project="WandB_Test_First_Project")

HPARAMS = run.config

HPARAMS["learning_rate"] = 0.02
HPARAMS["architecture"] = "CNN"
HPARAMS["dataset"] = "CIFAR-100"
HPARAMS["epochs"] = 10
```

**Q) What does wandb.init do to my training process?**<br>

> When `wandb.init()` is called from your training script an API call is made to create a run object on our servers. A new process is started to stream and collect metrics, thereby keeping all threads and logic out of your primary process. Your script runs normally and writes to local files, while the separate process streams them to our servers along with system metrics. You can always turn off streaming by running wandb off from your training directory, or setting the `WANDB_MODE` environment variable to offline.

You will find more information in the <a href="https://docs.wandb.ai/ref/python/init" target="_blank">wandb.init documentation page</a>.
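As a small sketch of the last point in the quote above, the environment variable can be set from Python before `wandb.init()` is called. Only `WANDB_MODE` and the value `offline` come from the quoted text; setting it through `os.environ` inside the notebook is an assumption about your setup (you could equally export it in your shell).

```python
import os

# Assumption: this runs before wandb.init(), so the setting applies to the new run.
os.environ["WANDB_MODE"] = "offline"

print(os.environ["WANDB_MODE"])
```

Remember to remove the variable again (for example with `os.environ.pop("WANDB_MODE", None)`) before running the rest of this notebook if you want the runs to be streamed to your project page.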

**Step 3. Start training and `log` metrics to WandB.**

The only changes required to your existing code are how you set your experiment hyperparameters.

Additionally, we have to use the `wandb.log(...)` method to log and track the intermediate outputs of the experiment. You can log numbers, images, tables, etc.

The full list of object types that can be logged as part of a run is available in the <a href="https://docs.wandb.ai/ref/python/log" target="_blank">documentation page</a>.

In [None]:
# Set variable according to the hyperparameters we logged in.

epochs = wandb.config["epochs"]  # run.config["epochs"]
print(epochs)

10


We have added the `%%wandb` cell magic at the top of the code cell below. We can use it to visualize the current run directly in the notebook.

You can find more details regarding <a href="https://docs.wandb.ai/guides/track/jupyter" target="_blank">tracking jupyter notebooks here</a>.

In [None]:
%%wandb

# No changes in the training code.

for epoch in range(2, epochs):
    acc  = random.random()
    loss = random.random()

    # val_acc  = random.random()
    # val_loss = random.random()

    # ================================================================================
    # ========================-Logging acc and loss to wandb-=========================
    # ================================================================================
    # At the end of each epoch we will log the "acc" and "loss" achieved to WandB.
    # We can log objects at any point in loop, it's not limited to only per-epoch level.

    wandb.log({"acc": acc, "loss": loss})

**Step 4. Terminate run.**

The final step is to terminate the current run using the `wandb.finish()` or `run.finish()` method.

Before terminating an experiment, we generally want to save some run outputs, such as model checkpoints, any YAML configuration file, or model outputs. We can easily save them by uploading files to WandB as part of the run output.

One way is to save the file in the current local WandB run directory, `wandb.run.dir`. If the files are at some other location, you can use the <a href="https://docs.wandb.ai/guides/track/save-restore" target="_blank">wandb.save(...)</a> method.

All the uploaded files will be accessible in the run's <a href="https://wandb.ai/opencv_courses/WandB_Test_First_Project/runs/fdnrnah1/files" target="_blank">*Files*</a> tab. If you scroll down in the code cell output, you will find the *Files* tab among the tabs on the left-hand side.

In [None]:
# wandb.run.dir

In [None]:
# Create a dummy text file and save it to the current wandb run folder on your local machine.
# The file will be uploaded when the .finish() method is executed.

with open(os.path.join(wandb.run.dir, "temp.txt"), "w") as file:
    file.write("This file will be uploaded.")

In [None]:
wandb.finish()

Run history:
acc,▃▁▅▆█▇██
loss,██▇▃▄▁▃▂

Run summary:
acc,0.86599
loss,0.14979
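If the training code raises an exception, `finish()` may never be reached. One way to guard against that, sketched here under assumptions, is a `try`/`finally` wrapper. `log_and_finish` and `RecordingRun` are hypothetical names; `RecordingRun` is only a stand-in that exposes the same `log(...)` and `finish()` methods used above, so the sketch runs without a WandB account. In the notebook, you would pass the object returned by `wandb.init(...)` instead.

```python
class RecordingRun:
    """Stand-in for a WandB run (hypothetical); it only records what was called."""

    def __init__(self):
        self.logged = []
        self.finished = False

    def log(self, metrics):
        self.logged.append(metrics)

    def finish(self):
        self.finished = True


def log_and_finish(run, metric_fn, num_steps):
    """Log metric_fn(step) for each step and always finish the run afterwards."""
    try:
        for step in range(num_steps):
            run.log(metric_fn(step))
    finally:
        # Executed even if metric_fn raises, so the run is never left open.
        run.finish()


def flaky_metrics(step):
    """Made-up metric source that fails on the third step."""
    if step == 2:
        raise RuntimeError("simulated training failure")
    return {"loss": 1.0 / (step + 1)}


demo_run = RecordingRun()
try:
    log_and_finish(demo_run, flaky_metrics, num_steps=5)
except RuntimeError:
    pass

print(len(demo_run.logged), demo_run.finished)
```

The exception still propagates to the caller; the wrapper only makes sure the run is closed first.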


In case you forgot to change the configuration, or to upload or delete a file, you can also reinitialize an existing run by passing in the unique ID of that run:

```python
PROJECT_NAME = ...  # Your project name.
RUN_ID = "fdnrnah1" # The above run.

old_run = wandb.init(project=PROJECT_NAME, id=RUN_ID, resume="allow")
```

## 2 Linear Regression Integration with WandB

**In this section, we will use WandB for dataset and model versioning using WandB Artifacts.**

We will also integrate the above steps into the project.

You can access the <a href="https://wandb.ai/opencv_courses/WandB_Linear_Regression_Project?workspace=user-opencv_courses" target="_blank">WandB_Linear_Regression_Project page over here</a>.

In brief, ***Artifacts*** *are WandB's flexible and lightweight building blocks for dataset and model versioning, much like Git is for code versioning.*

From the <a href="https://docs.wandb.ai/guides/artifacts" target="_blank">WandB Artifacts documentation</a>:

> By using *Artifacts* we can track datasets, models, dependencies, and results through each step of your machine learning pipeline. Artifacts make it easy to get a complete and auditable history of changes to your files. Artifacts can be thought of as a versioned directory. Artifacts are either an input of a run or an output of a run. Common artifacts include entire training sets and models. Store datasets directly into artifacts, or use artifact references to point to data in other systems like Amazon S3, GCP, or your own system.


Artifacts in WandB are a method to version and track datasets, models, and other files associated with machine learning experiments. They help in tracking the complete evolution of models, starting from the raw data, going through various stages of preprocessing, to the trained model, and even further to the model in production.

Artifacts are used to:

1. **Version Control**: This allows you to store versions of your datasets and models. For instance, if you make changes to a dataset or a model, you can save it as a new version, and later return to an older version if required.

2. **Pipeline Tracking**: You can track your entire ML pipeline, from data preparation to model training to deployment. For example, you can link a dataset artifact to a model artifact to know which data was used to train which model.

3. **Collaboration**: Team members can share and reuse artifacts. If someone has created a useful model or dataset, they can save it as an artifact, and others can download and use it.

4. **Reproducibility**: Because all the stages of the machine learning workflow are logged and versioned, experiments can be reproduced more easily.

In summary, WandB Artifacts are an essential feature for robust, reproducible, and collaborative machine learning workflows.



To keep things simple and quickly explain how we can use Artifacts in our projects, we've split the original Auto-MPG dataset into three CSV files:

1. Auto-MPG_test_dataset.csv
2. Auto-MPG_train_dataset_1.csv
3. Auto-MPG_train_dataset_2.csv

`Auto-MPG_train_dataset_1.csv` is simply a subset of `Auto-MPG_train_dataset_2.csv`.

---


Here's a graph of what we are planning to do in this section:

 <img src="https://opencv.org/wp-content/uploads/2023/06/m1_04_artifacts_lineage_graph.png" width="90%">


We will perform the following runs:

1. In **run 1**: Upload the current dataset as Artifacts.
    - Here we will create a `train_dataset` and a `test_dataset` Artifact using the "Auto-MPG_train_dataset_1.csv" and "Auto-MPG_test_dataset.csv" files. We will upload and version both of them.

2. In **run 2**:
    - We will download and use the latest versions of the `train_dataset` and `test_dataset` Artifacts available in the project (`train_dataset:latest` and `test_dataset:latest`).
    - We will run our experiments using this dataset.
    - We will also create a new `Checkpoint` Artifact to track and version the *model checkpoint file* between different experiment runs.
    
3. In **run 3**:
    - We will create a new version of the `train_dataset` Artifact using the "Auto-MPG_train_dataset_2.csv" file.
    - We can either create a new model and train from scratch or use the `Checkpoint` Artifact created in the previous run to train on the new `train_dataset`. We will do the former in this run and train a new model.
    - We will use the same `test_dataset` version for evaluating the model.
    - We will also create a new `Checkpoint` Artifact version and log the new model checkpoints.

### 2.1 Run 1 - Dataset as Artifacts

In [None]:
USER_NAME = "opencv_courses" # Name of the user creating/accessing a project or run.
PROJECT_NAME = "WandB_Linear_Regression_Project" # Give the project a name.

We are creating a new project.

In [None]:
import wandb

wandb.login()

run = wandb.init(
    project=PROJECT_NAME,
    entity=USER_NAME,     # Can be skipped as it will automatically pick it up using your login ID.
)

In [None]:
# Download test set
URL_1 = "https://www.dropbox.com/s/piolxl5z3996dyx/Auto-MPG_test_dataset.csv?dl=1"
SAVE_PATH_1 = os.path.join(os.getcwd(), "Auto-MPG_test_dataset.csv")

# Download train_1 set
URL_2 = "https://www.dropbox.com/s/zg94q6yy7v4hbh8/Auto-MPG_train_dataset_1.csv?dl=1"
SAVE_PATH_2 = os.path.join(os.getcwd(), "Auto-MPG_train_dataset.csv")

for url, save_path in [(URL_1, SAVE_PATH_1), (URL_2, SAVE_PATH_2)]:
    urlretrieve(url, save_path)

**Log test set CSV file as an artifact. This way, the file will be logged and versioned, and a record of all the changes will be kept.**

Re-running the next cell will not create or upload the dataset again, because WandB tracks and uploads only the changes made to the original file (or to the original directory, if, for example, a directory of images was uploaded).

In [None]:
# Provide a name and type for the Artifact.
artifact = wandb.Artifact("test_dataset", type="dataset")

artifact.add_file(local_path=SAVE_PATH_1)
run.log_artifact(artifact)

<wandb.sdk.artifacts.local_artifact.Artifact at 0x210cbd5da50>

**Log the initial training set CSV file as well.**

In [None]:
artifact = wandb.Artifact("train_dataset", type="dataset")

artifact.add_file(local_path=SAVE_PATH_2)
run.log_artifact(artifact, aliases=["latest", "set_1"]) # You can provide your own alias for later use.

<wandb.sdk.artifacts.local_artifact.Artifact at 0x210cbe2dea0>

Once uploaded, refresh the project page and check the *Artifacts* tab. It should look like this:

<img src="https://opencv.org/wp-content/uploads/2023/06/m01_04_initial_artifacts_logged.png" width="25%">

In [None]:
# Terminate run.
run.finish()

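As noted earlier in this section, logging an unchanged file again does not produce a new version. A conceptual way to picture this is content-based versioning: a new version is recorded only when the content's digest changes. The sketch below is a toy model of that idea, not WandB's implementation; `ToyArtifactStore` and its `log` method are hypothetical, and the CSV bytes are made up.

```python
import hashlib


class ToyArtifactStore:
    """Toy content-addressed version store (hypothetical; not the WandB API)."""

    def __init__(self):
        self.digests = {}  # artifact name -> list of digests, index = version number

    def log(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        history = self.digests.setdefault(name, [])
        if digest not in history:
            # Only content that has not been seen before creates a new version.
            history.append(digest)
        return f"{name}:v{history.index(digest)}"


store = ToyArtifactStore()
first = store.log("test_dataset", b"MPG,Horsepower\n18.0,130\n")
again = store.log("test_dataset", b"MPG,Horsepower\n18.0,130\n")
changed = store.log("test_dataset", b"MPG,Horsepower\n18.0,130\n15.0,165\n")

print(first, again, changed)
```

In this toy model the second call returns the existing version, and only the modified content is assigned a new version number.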

### 2.2 Training Helper Functions

The following code cells are borrowed from the *Linear Regression Notebook*. We have compressed the code down to its most significant parts.

We are defining some of the helper functions, such as:

1. `preprocess_dataset(...)`: Takes the train and test set CSV paths, performs the necessary preprocessing, and splits the train set into training and validation sets. It returns the scaled `Horsepower` and `Displacement` columns of the train, validation, and test sets as 2D tensors, together with the corresponding `MPG` targets.

2. `Regressor` class: Creates a small fully connected regression model with 2 hidden layers.

3. `train_one_epoch(...)`, and `evaluate(...)` functions are the same as in the `Linear Regression` notebook.

4. `main(...)`: The function performs the following tasks:
    - Watches the model using the <a href="https://docs.wandb.ai/ref/python/watch" target="_blank">wandb.watch(...)</a> method.
    - Runs the training loop.
    - Saves the model and optimizer state whenever the validation loss improves.
    - Logs per-epoch metrics to WandB.
    - Calculates the loss on the test set using the best saved model.
    - Logs the best validation loss and the test loss as summary metrics of the run.
        - In the UI, summary values appear in the run table to compare single values across runs. Summary values can be set directly with `wandb.run.summary["key"] = value`.

#### 2.2.1 Dataset Helper Function

We have composed all the different dataset preparation steps used in the Linear regression notebook in these two functions.

In [None]:
def convert_to_tensor(dataframe):
    shape = dataframe.shape
    return torch.from_numpy(dataframe.values).reshape(-1, 1 if len(shape)==1 else shape[1]).to(torch.float32)


def preprocess_dataset(train_csv_path, test_csv_path, val_split_pct=0.3):

    # Read CSV Files
    train_dataset = pd.read_csv(train_csv_path, header=0)
    test_dataset  = pd.read_csv(test_csv_path,  header=0)

    # Create data and target.
    X_train = train_dataset.copy()
    X_test  = test_dataset.copy()

    # Separate target values from features.
    y_train = X_train.pop("MPG")
    y_test  = X_test.pop("MPG")

    # Calculate mean and standard deviation for Horsepower.
    mean_hp = np.mean(X_train['Horsepower'])
    std_hp  = np.std(X_train['Horsepower'])

    # Scale Horsepower feature
    X_train["Horsepower_scaled"] = (X_train["Horsepower"] - mean_hp) / std_hp
    X_test["Horsepower_scaled"]  = (X_test["Horsepower"] - mean_hp)  / std_hp

    # Calculate mean and standard deviation for Displacement.
    mean_dis = np.mean(X_train["Displacement"])
    std_dis  = np.std(X_train["Displacement"])

    # Scale Displacement feature
    X_train["Displacement_scaled"] = (X_train["Displacement"] - mean_dis) / std_dis
    X_test["Displacement_scaled"]  = (X_test["Displacement"] - mean_dis)  / std_dis

    # Split train set into training and validation sets.
    X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(X_train, y_train,
                                                                              test_size=val_split_pct,
                                                                              random_state=0)

    X_train_hp_dp = convert_to_tensor(X_train_split[['Horsepower_scaled', 'Displacement_scaled']])
    y_train       = convert_to_tensor(y_train_split)

    X_val_hp_dp = convert_to_tensor(X_val_split[['Horsepower_scaled', 'Displacement_scaled']])
    y_val       = convert_to_tensor(y_val_split)

    # Prepare test set.
    X_test_hp_dp = convert_to_tensor(X_test[['Horsepower_scaled', 'Displacement_scaled']])
    y_test       = convert_to_tensor(y_test)

    return {
        "X_train": X_train_hp_dp, "y_train": y_train,
        "X_val":   X_val_hp_dp,   "y_val":   y_val,
        "X_test":  X_test_hp_dp,  "y_test":  y_test,
    }
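Note that `preprocess_dataset` computes the mean and standard deviation on the training CSV only and reuses them for the test set. A minimal sketch of that step using only the standard library, with made-up horsepower values (`statistics.pstdev` is the population standard deviation, which matches the default behaviour of `np.std`):

```python
import statistics

# Made-up values for illustration; not taken from the Auto-MPG files.
train_hp = [100.0, 150.0, 200.0]
test_hp = [150.0, 250.0]

mean_hp = statistics.fmean(train_hp)
std_hp = statistics.pstdev(train_hp)  # population std, like np.std's default (ddof=0)

# Both splits are scaled with the *training* statistics.
train_scaled = [(value - mean_hp) / std_hp for value in train_hp]
test_scaled = [(value - mean_hp) / std_hp for value in test_hp]

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics keeps the test set from influencing the preprocessing, so the scaled test values need not have zero mean or unit variance.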

#### 2.2.2  Model Creation Class

In [None]:
class Regressor(nn.Module):

    # Initialize the layers.
    def __init__(self, in_features=2, intermediate=10, out_features=1):
        super().__init__()

        self.linear_1 = nn.Linear(in_features=in_features,  out_features=intermediate)
        self.linear_2 = nn.Linear(in_features=intermediate, out_features=intermediate)
        self.linear_3 = nn.Linear(in_features=intermediate, out_features=out_features)

    # Forward pass
    def forward(self, x):
        # First Linear layer --> ReLU activation
        pred = F.relu(self.linear_1(x))

        # Second Linear layer --> ReLU activation
        pred = F.relu(self.linear_2(pred))

        # Third Linear layer.
        pred = self.linear_3(pred)
        return pred

#### 2.2.3  Training & Evaluation Helper Functions

In [None]:
def train_one_epoch(model: torch.nn.Module, optimizer: torch.optim.Optimizer, loss_fn: torch.nn.Module, dataset: tuple):
    data, target = dataset
    model.train() # Set model in training mode.

    outputs = model(data)           # Perform forward pass through the model.
    loss = loss_fn(outputs, target) # Calculate the loss on the model predictions.
    optimizer.zero_grad()           # Reset gradients.
    loss.backward()                 # Calculate gradients based on the loss.
    optimizer.step()                # Update parameters.

    return loss.detach().item()

def evaluate(model: torch.nn.Module, loss_fn: torch.nn.Module, dataset: tuple):
    data, target = dataset
    model.eval() # Set model in evaluation mode.

    with torch.no_grad():
        outputs = model(data) # Perform forward pass on the given dataset.

    loss = loss_fn(outputs, target) # Calculate the loss on the given dataset.
    return loss.item()

#### 2.2.4  Main Function For Training

In [None]:
def main(model, optimizer, loss_fn, train_set, val_set, test_set, total_epochs=500, ckpt_path="ckpt.tar", log_graph=False):

    # Watch the model: log gradients (the default) and, optionally, the model graph.
    wandb.watch(model, criterion=loss_fn, log_freq=10, log_graph=log_graph)

    train_loss_record = []
    val_loss_record   = []

    X_train, y_train = train_set
    X_val,   y_val   = val_set
    X_test,  y_test  = test_set

    # Track best validation loss.
    best_valid_loss = float("inf")

    for epoch in trange(total_epochs):

        # Perform one epoch of training and then evaluate on the validation set.
        train_loss = train_one_epoch(model, optimizer, loss_fn, train_set)
        val_loss   = evaluate(model, loss_fn, val_set)

        # Record training and validation loss
        train_loss_record.append(train_loss)
        val_loss_record.append(val_loss)

        # Save optimizer and model state_dict if validation loss improves.
        if best_valid_loss > val_loss:
            best_valid_loss = val_loss
            torch.save({"model": model.state_dict(), "opt": optimizer.state_dict()}, ckpt_path)

        # Log run metrics.
        # We also log the epoch so it can be used as the X-axis in the run charts.
        wandb.log({
            "epoch": epoch,
            "loss": train_loss,
            "val_loss": val_loss,
        })

    # Reload best model
    model.load_state_dict(torch.load(ckpt_path, map_location="cpu")["model"])

    # Calculate performance on the test set.
    test_loss = evaluate(model, loss_fn, test_set)

    # Log run summary metrics.
    wandb.run.summary["best_valid_loss"] = best_valid_loss
    wandb.run.summary["test_loss"] = test_loss

    print(f"\n\nBest Validation Loss: {best_valid_loss:0.4f}")
    print(f"Test Set Loss: {test_loss:0.7f}")

    return train_loss_record, val_loss_record
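The checkpointing rule inside `main(...)` is worth isolating: a checkpoint is written only when the validation loss strictly improves, which is why the last logged `val_loss` of a run can be higher than its `best_valid_loss`. A small sketch of just that rule, with made-up loss values and a hypothetical helper name:

```python
def epochs_that_save(val_losses):
    """Return the epochs at which main(...) would overwrite the checkpoint."""
    best_valid_loss = float("inf")
    saved_epochs = []
    for epoch, val_loss in enumerate(val_losses):
        if best_valid_loss > val_loss:  # same strict comparison as in main(...)
            best_valid_loss = val_loss
            saved_epochs.append(epoch)
    return saved_epochs, best_valid_loss


# Made-up validation losses for illustration.
saved, best = epochs_that_save([5.0, 4.0, 4.5, 3.0, 3.0, 3.2])
print(saved, best)
```

Because the comparison is strict, an epoch that merely ties the best loss does not overwrite the checkpoint.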

### 2.3 Run 2 -  First Experiment

In [None]:
# Create new run.
import wandb

run = wandb.init(project=PROJECT_NAME, entity=USER_NAME)

Log hyperparameters.

In [None]:
HPARAMS = run.config

HPARAMS["INTEMEDIATE_NODES"] = 32
HPARAMS["NUM_EPOCHS"]        = 500
HPARAMS["LEARNING_RATE"]     = 0.01
HPARAMS["LOSS_FN"]           = "l1_loss" # mse_loss
HPARAMS["VAL_PCT"]           = 0.3
HPARAMS["OUTPUT_NODES"]      = 1

**Downloading and using the latest train dataset and test dataset version.**

**Note 1:** Downloading a dataset that we already have locally may seem redundant, but it shows what to do when you or a teammate don't have the dataset available locally. Executing the following cells lets us download any version of the dataset artifact from WandB and use it for our experiments.

**Note 2:** If this throws an error, wait 20-30 seconds and try again. The artifact is probably still being created.

In [None]:
train_csv_artifact = run.use_artifact(f"{USER_NAME}/{PROJECT_NAME}/train_dataset:set_1") # "train_dataset:latest"
download_dir       = train_csv_artifact.download(root="dataset_dir")

wandb:   1 of 1 files downloaded.  


You can also do the following:

```python
artifact = run.use_artifact("train_dataset:latest")
path = artifact.get_path("Auto-MPG_train_dataset.csv")
path.download()
```

This will download the artifact to the following path: `'.\\artifacts\\train_dataset-v0\\Auto-MPG_train_dataset.csv'`

In [None]:
test_csv_artifact = run.use_artifact(f"{USER_NAME}/{PROJECT_NAME}/test_dataset:latest")
download_dir      = test_csv_artifact.download(root="dataset_dir")

wandb:   1 of 1 files downloaded.  


In [None]:
os.listdir(download_dir)

['Auto-MPG_test_dataset.csv', 'Auto-MPG_train_dataset.csv']

In [None]:
train_csv_path = os.path.join(download_dir, "Auto-MPG_train_dataset.csv")
test_csv_path  = os.path.join(download_dir, "Auto-MPG_test_dataset.csv")

Get train, validation and test dataset tensors.

In [None]:
dataset_dict = preprocess_dataset(train_csv_path, test_csv_path, val_split_pct=HPARAMS["VAL_PCT"])

X_train = dataset_dict["X_train"]
y_train = dataset_dict["y_train"]

X_val = dataset_dict["X_val"]
y_val = dataset_dict["y_val"]

X_test = dataset_dict["X_test"]
y_test = dataset_dict["y_test"]

print(f"X_train: {X_train.shape}, y_train: {y_train.shape}")

print(f"X_val: {X_val.shape}, y_val: {y_val.shape}")

print(f"X_test: {X_test.shape}, y_test: {y_test.shape}")

X_train: torch.Size([175, 2]), y_train: torch.Size([175, 1])
X_val:   torch.Size([76, 2]),  y_val:   torch.Size([76, 1])
X_test:  torch.Size([78, 2]),  y_test:  torch.Size([78, 1])
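The split sizes printed above can be reproduced by hand. This sketch assumes that the training CSV has 251 rows (175 + 76, inferred from the shapes above) and that `train_test_split` rounds a fractional `test_size` up, with the remaining rows going to the training split:

```python
import math

n_rows = 175 + 76     # rows in the training CSV, inferred from the printed shapes
val_split_pct = 0.3   # HPARAMS["VAL_PCT"]

# Assumption: a fractional test_size is rounded up to a whole number of rows.
n_val = math.ceil(val_split_pct * n_rows)
n_train = n_rows - n_val

print(n_train, n_val)
```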


Log dataset related configurations and the number of input features.

In [None]:
HPARAMS["TRAIN_SIZE"] = X_train.shape
HPARAMS["VAL_SIZE"]   = X_val.shape
HPARAMS["TEST_SIZE"]  = X_test.shape
HPARAMS["NUM_IN_FEATURES"] = X_train.shape[1]

In [None]:
# Initialize Model

seed_everything(41)

model = Regressor(
    in_features=HPARAMS["NUM_IN_FEATURES"],
    intermediate=HPARAMS["INTEMEDIATE_NODES"],
    out_features=HPARAMS["OUTPUT_NODES"],
)

# Print model summary.
batch_size = 1
summary(model, input_size=(batch_size, 2,), device="cpu", col_names=("input_size", "output_size", "num_params"))

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Regressor                                [1, 2]                    [1, 1]                    --
├─Linear: 1-1                            [1, 2]                    [1, 32]                   96
├─Linear: 1-2                            [1, 32]                   [1, 32]                   1,056
├─Linear: 1-3                            [1, 32]                   [1, 1]                    33
Total params: 1,185
Trainable params: 1,185
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.01
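The parameter counts in the summary above can be verified by hand: a fully connected layer with `in_features` inputs and `out_features` outputs has `in_features * out_features` weights plus `out_features` biases. A quick check (the helper name is hypothetical):

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# (in_features, out_features) of the three layers in Regressor with 32 intermediate nodes.
layer_shapes = [(2, 32), (32, 32), (32, 1)]
per_layer = [linear_param_count(i, o) for i, o in layer_shapes]

print(per_layer, sum(per_layer))
```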

In [None]:
# Initialize optimizer
optimizer = optim.Adam(model.parameters(), lr=HPARAMS["LEARNING_RATE"])

# Initialize loss function based on the configuration.
if HPARAMS["LOSS_FN"] == "l1_loss":
    criterion = nn.L1Loss()
else:
    criterion = nn.MSELoss()
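For reference, the two options differ only in how the per-sample error is penalized: `nn.L1Loss` averages the absolute errors, and `nn.MSELoss` averages the squared errors (both use mean reduction by default). A plain-Python sketch with made-up predictions:

```python
# Made-up predictions and targets (MPG) for illustration.
predictions = [20.0, 25.0, 30.0]
targets = [22.0, 25.0, 27.0]

errors = [p - t for p, t in zip(predictions, targets)]

l1_loss = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse_loss = sum(e * e for e in errors) / len(errors)   # mean squared error

print(l1_loss, mse_loss)
```

The squared penalty makes MSE more sensitive to large errors, while the L1 loss stays in the same units as the target (MPG here).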

**Start training.**

In [None]:
%%wandb

CKPT_DIR = os.getcwd()  # wandb.run.dir
CKPT_PATH = os.path.join(CKPT_DIR, "ckpt.tar")

train_loss_record_1, val_loss_record_1 = main(model,
                                             optimizer,
                                             criterion,
                                             train_set=(X_train, y_train),
                                             val_set=(X_val, y_val),
                                             test_set=(X_test, y_test),
                                             total_epochs=HPARAMS["NUM_EPOCHS"],
                                             ckpt_path=CKPT_PATH,
                                             log_graph=True,
                                            )

wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
100%|███████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 357.67it/s]



Best Validation Loss: 2.8049
Test Set Loss: 2.8156927





**Next, we will log the trained checkpoint file as a `Checkpoint` Artifact, which can be used in later runs.**

In [None]:
ckpt_artifact = wandb.Artifact("Checkpoint", type="Trained_Checkpoint")
ckpt_artifact.add_file(local_path=CKPT_PATH)

run.log_artifact(ckpt_artifact)

<wandb.sdk.artifacts.local_artifact.Artifact at 0x210cc1e82b0>

We can also save the checkpoint file as a run output by:

```python
wandb.save(CKPT_PATH)
```

In [None]:
# Terminate run.
run.finish()

Run history:
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
loss,█▅▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_loss,█▅▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
best_valid_loss,2.8049
epoch,499.0
loss,2.36688
test_loss,2.81569
val_loss,2.86674


### 2.4 Run 3 - Second Experiment

In [None]:
# Create a new run in the project.

import wandb

run = wandb.init(project=PROJECT_NAME, entity=USER_NAME)

Log hyperparameters. We have kept them the same as the previous run.

In [None]:
HPARAMS = run.config

HPARAMS["INTEMEDIATE_NODES"] = 32
HPARAMS["NUM_EPOCHS"]        = 500
HPARAMS["LEARNING_RATE"]     = 0.01
HPARAMS["LOSS_FN"]           = "l1_loss" # mse_loss
HPARAMS["VAL_PCT"]           = 0.3
HPARAMS["OUTPUT_NODES"]      = 1

In [None]:
# Download the updated version of the training set for the use case.

URL_3 = r"https://www.dropbox.com/s/3bzhz07g1gaf58n/Auto-MPG_train_dataset_2.csv?dl=1"
SAVE_PATH_3 = os.path.join(os.getcwd(), "Auto-MPG_train_dataset.csv")
urlretrieve(URL_3, SAVE_PATH_3);

**Suppose our training set has been updated and now we have some new instances.**

We will log this new version of the training set. The procedure remains the same.

In [None]:
# Log and version the new training set.

artifact = wandb.Artifact("train_dataset", type="dataset")

artifact.add_file(local_path=SAVE_PATH_3)
run.log_artifact(artifact) # You can provide your own alias for later use.

<wandb.sdk.artifacts.local_artifact.Artifact at 0x210cd50e3b0>

<img src="https://opencv.org/wp-content/uploads/2023/06/m01_04_artifact_second_update.png" width="25%">
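
Artifact versions are content-based: W&B checksums the files in an artifact and creates a new version only when the contents have changed, which is why logging with the same name and the same procedure is enough. The toy registry below is a rough sketch of that idea, not the wandb implementation. The class `ToyArtifactStore`, its methods, and the sample CSV bytes are all hypothetical.

```python
import hashlib

class ToyArtifactStore:
    """Hypothetical in-memory stand-in for an artifact registry (not the wandb API)."""

    def __init__(self):
        # artifact name -> list of content digests; the list index is the version number
        self.versions = {}

    def log(self, name, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(name, [])
        if not history or history[-1] != digest:
            history.append(digest)      # contents changed: record a new version
        return f"{name}:v{len(history) - 1}"

    def latest(self, name) -> str:
        """Resolve the 'latest' alias to a concrete version."""
        return f"{name}:v{len(self.versions[name]) - 1}"

store = ToyArtifactStore()
print(store.log("train_dataset", b"mpg,hp\n18,130\n"))          # first upload
print(store.log("train_dataset", b"mpg,hp\n18,130\n"))          # identical bytes, same version
print(store.log("train_dataset", b"mpg,hp\n18,130\n15,165\n"))  # new rows, new version
print(store.latest("train_dataset"))
```

The point of the sketch is that the version number is driven by the data itself, so re-running a notebook cell on an unchanged file does not clutter the version history.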

In [None]:
# Select the latest versions of the training set and the test set.

train_csv_artifact = run.use_artifact(f"{USER_NAME}/{PROJECT_NAME}/train_dataset:latest")
test_csv_artifact  = run.use_artifact(f"{USER_NAME}/{PROJECT_NAME}/test_dataset:latest")

You can either reference the artifact directly or download its files to a local folder. Here, we download the new dataset version into a new folder.

In [None]:
new_download_dir = train_csv_artifact.download(root="new_dataset_dir")
new_download_dir = test_csv_artifact.download(root="new_dataset_dir")

[34m[1mwandb[0m:   1 of 1 files downloaded.  
[34m[1mwandb[0m:   1 of 1 files downloaded.  


In [None]:
os.listdir(new_download_dir)

['Auto-MPG_test_dataset.csv', 'Auto-MPG_train_dataset.csv']

Prepare new train, val and test dataset tensors.

In [None]:
train_csv_path = os.path.join(new_download_dir, "Auto-MPG_train_dataset.csv")
test_csv_path  = os.path.join(new_download_dir, "Auto-MPG_test_dataset.csv")

dataset_dict = preprocess_dataset(train_csv_path, test_csv_path, val_split_pct=HPARAMS["VAL_PCT"])

X_train = dataset_dict["X_train"]
y_train = dataset_dict["y_train"]

X_val = dataset_dict["X_val"]
y_val = dataset_dict["y_val"]

X_test = dataset_dict["X_test"]
y_test = dataset_dict["y_test"]

print(f"X_train: {X_train.shape}, y_train: {y_train.shape}")
print(f"X_val:   {X_val.shape},  y_val:   {y_val.shape}")
print(f"X_test:  {X_test.shape},  y_test:  {y_test.shape}")

X_train: torch.Size([219, 2]), y_train: torch.Size([219, 1])
X_val:   torch.Size([95, 2]),  y_val:   torch.Size([95, 1])
X_test:  torch.Size([78, 2]),  y_test:  torch.Size([78, 1])
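
As a quick sanity check on these shapes, the training and validation tensors together hold 219 + 95 = 314 rows, and `VAL_PCT` is 0.3. The sketch below reproduces the split sizes under one rounding convention (validation count rounded up). The helper `split_sizes` is hypothetical and the rounding rule is an assumption; `preprocess_dataset` may compute the split differently.

```python
import math

def split_sizes(n_samples, val_pct):
    """Return (n_train, n_val), rounding the validation count up."""
    n_val = math.ceil(n_samples * val_pct)
    return n_samples - n_val, n_val

n_total = 219 + 95                      # rows available before the train/val split
n_train, n_val = split_sizes(n_total, val_pct=0.3)
print(n_train, n_val)                   # 219 95
```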


Set and log the required hyperparameters.

In [None]:
HPARAMS["TRAIN_SIZE"] = X_train.shape
HPARAMS["VAL_SIZE"]   = X_val.shape
HPARAMS["TEST_SIZE"]  = X_test.shape

HPARAMS["NUM_IN_FEATURES"] = X_train.shape[1]

In [None]:
# Initialize Model

seed_everything(41)

model = Regressor(
    in_features=HPARAMS["NUM_IN_FEATURES"],
    intermediate=HPARAMS["INTEMEDIATE_NODES"],
    out_features=HPARAMS["OUTPUT_NODES"],
)

# Print model summary.
batch_size = 1
summary(model, input_size=(batch_size, 2,), device="cpu", col_names=("input_size", "output_size", "num_params"))

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
Regressor                                [1, 2]                    [1, 1]                    --
├─Linear: 1-1                            [1, 2]                    [1, 32]                   96
├─Linear: 1-2                            [1, 32]                   [1, 32]                   1,056
├─Linear: 1-3                            [1, 32]                   [1, 1]                    33
Total params: 1,185
Trainable params: 1,185
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.01
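
The parameter counts in the summary can be verified by hand: a fully connected layer with a bias has `in_features * out_features` weights plus `out_features` biases. A small sketch of that arithmetic (the helper `linear_params` is introduced here only for illustration):

```python
def linear_params(in_features, out_features, bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# (in_features, out_features) for the three Linear layers listed in the summary.
layer_shapes = [(2, 32), (32, 32), (32, 1)]
per_layer = [linear_params(i, o) for i, o in layer_shapes]
print(per_layer, sum(per_layer))        # [96, 1056, 33] 1185
```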

In [None]:
# Initialize optimizer.
optimizer = optim.Adam(model.parameters(), lr=HPARAMS["LEARNING_RATE"])

# Initialize loss function.
if HPARAMS["LOSS_FN"] == "l1_loss":
    criterion = nn.L1Loss()
else:
    criterion = nn.MSELoss()
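
To make the difference between the two criteria concrete: with their default settings, `nn.L1Loss` averages the absolute errors and `nn.MSELoss` averages the squared errors. The pure-Python sketch below mirrors those definitions on made-up numbers; the functions `l1_loss` and `mse_loss` are illustrative stand-ins, not the PyTorch modules.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def mse_loss(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Made-up predictions; the last one is off by 10 to mimic an outlier.
pred   = [20.0, 25.0, 30.0]
target = [21.0, 25.0, 20.0]

print(l1_loss(pred, target))    # (1 + 0 + 10) / 3, about 3.67
print(mse_loss(pred, target))   # (1 + 0 + 100) / 3, about 33.67
```

The single large error dominates the squared loss far more than the absolute loss, so values logged under the two criteria are on different scales and should not be compared directly across runs.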

In [None]:
%%wandb

# As we are versioning the checkpoints, we can keep the model checkpoint file name same,
# and give custom aliases to each version.

CKPT_DIR = os.getcwd()  # wandb.run.dir
CKPT_PATH = os.path.join(CKPT_DIR, "ckpt.tar")

train_loss_record_2, val_loss_record_2 = main(model,
                                             optimizer,
                                             criterion,
                                             train_set=(X_train, y_train),
                                             val_set=(X_val, y_val),
                                             test_set=(X_test, y_test),
                                             total_epochs=HPARAMS["NUM_EPOCHS"],
                                             ckpt_path=CKPT_PATH,
                                             log_graph=True,
                                            )

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`
100%|███████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 369.71it/s]



Best Validation Loss: 3.1608
Test Set Loss: 2.7234669





In [None]:
ckpt_artifact = wandb.Artifact("Checkpoint", type="Trained_Checkpoint")
ckpt_artifact.add_file(local_path=CKPT_PATH)

run.log_artifact(ckpt_artifact, aliases=["latest", "train_dataset_v1"])

<wandb.sdk.artifacts.local_artifact.Artifact at 0x210cd5176a0>
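
Custom aliases make it easy to fetch a specific checkpoint in a later run without remembering its version number. The sketch below uses only `run.use_artifact` and `artifact.download`, as in the dataset cells above. The helpers `checkpoint_ref` and `fetch_checkpoint` are hypothetical, and the file name `ckpt.tar` is assumed from the `CKPT_PATH` used in this notebook.

```python
import os

def checkpoint_ref(entity, project, alias="latest", name="Checkpoint"):
    """Build a fully qualified artifact name such as 'user/project/Checkpoint:latest'."""
    return f"{entity}/{project}/{name}:{alias}"

def fetch_checkpoint(run, entity, project, alias="latest", filename="ckpt.tar"):
    """Mark the checkpoint artifact as an input of `run`, download it, and return the file path."""
    artifact = run.use_artifact(checkpoint_ref(entity, project, alias))
    ckpt_dir = artifact.download()      # returns the local directory holding the files
    return os.path.join(ckpt_dir, filename)

# Usage in a later run (commented out because it needs a live W&B session):
# later_run = wandb.init(project=PROJECT_NAME, entity=USER_NAME)
# ckpt_path = fetch_checkpoint(later_run, USER_NAME, PROJECT_NAME, alias="train_dataset_v1")
# checkpoint = torch.load(ckpt_path)
```

Passing the run object in, rather than calling `wandb.init` inside the helper, keeps the lineage explicit: the run that consumes the checkpoint is recorded as depending on that artifact version.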

In [None]:
run.finish()


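Run history:

| Metric | Trend |
|---|---|
| epoch | ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ |
| loss | █▅▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| val_loss | █▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Value |
|---|---|
| best_valid_loss | 3.16077 |
| epoch | 499.0 |
| loss | 2.35153 |
| test_loss | 2.72347 |
| val_loss | 3.1797 |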

At the end, you should have the following artifacts:

<img src="https://opencv.org/wp-content/uploads/2023/06/m01_04_artifact_third_update.png" width="25%">

You can view your runs on the project page. There you can rearrange the columns, add notes or tags, and customize the charts.

There are other tools similar to WandB as well. To name a few:

1. <a href="https://clear.ml/" target="_blank">ClearML</a>
2. <a href="https://www.comet.com/site/" target="_blank">Comet</a>
3. <a href="https://mlflow.org/" target="_blank">MLflow</a>

You may use any of them, as long as it helps you run and maintain your experiments easily.

## 3 Conclusion

Throughout this notebook, we have explored Weights & Biases (WandB), an important tool in the machine learning ecosystem, and we delved into how to use it effectively to streamline our machine learning projects.

We started by understanding the fundamentals of WandB, its purpose, and benefits. WandB provides an efficient and user-friendly interface for experiment tracking and collaboration in machine learning projects, making it a must-have in any deep learning engineer's toolkit.

We then looked at how WandB can be used for experiment tracking. We saw how it provides an organized dashboard to log and visualize experiments and their results. This makes runs easier to inspect, compare, and debug throughout the machine learning project lifecycle.

Further, we saw how WandB Artifacts help with version control and tracking. We learned how to create versions of datasets and models using Artifacts, which gives us a systematic way to handle, track, and manage different versions of our data and models. This helps prevent the confusion or errors that can stem from working with multiple or outdated versions.

Lastly, we applied Artifacts in a practical project scenario, logging new dataset versions and trained checkpoints and consuming them in later runs. This reinforced the collaboration, reproducibility, and transparency that WandB promotes in machine learning workflows.

In conclusion, WandB is a powerful tool for managing machine learning projects. Its capabilities such as experiment tracking and versioning through Artifacts significantly enhance the efficiency and productivity of machine learning tasks. By incorporating WandB into our workflow, we can more effectively develop, track, and maintain high-quality machine learning models, fostering a seamless transition from development to deployment.

Happy coding and experimenting!

## References

You can learn more about the various functionalities WandB offers from the following:

1. <a href="https://docs.wandb.ai/guides" target="_blank">WandB Guides</a>   
2. <a href="https://docs.wandb.ai/ref/python/" target="_blank">WandB Python Reference</a>
3. <a href="https://docs.wandb.ai/tutorials" target="_blank">WandB Tutorials</a>