# Evaluating and Debugging Generative AI

## Introduction

During the iterative workflow of building a machine learning model, we often find that a previous iteration was actually better, and we would like to return to its combination of hyperparameters and training set. Managing this in a clean and reliable way can be challenging even for a small team of developers, so it is important to be rigorous about how experiments are tracked.

This course revolves around the tools from **Weights & Biases** (W&B).

## Instrument W&B

First, something not mentioned in the course: *W&B* offers several service tiers. The most relevant one here, the personal development tier, is free. You can also host your own *W&B* server or run the code anonymously without an account.

In [None]:
# %pip install wandb
import wandb

# 1. Organize your hyperparameters
config = {'learning_rate': 0.001}

# 2. Start wandb run
wandb.init(project='gpt5', config=config)

#--- Model training ---#

# 3. Log metrics over time to visualize performance
wandb.log({'loss': loss})

# 4. When working in a notebook, finish wandb
wandb.finish()


The idea here is to call *W&B* directly inside the functions used to train the model, so that every training step is logged automatically.

### Example

The example classifies sprites (small pixel-art images) into the five classes listed in `CLASSES` using a neural network.

In [None]:
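from pathlib import Path

import torch
import torch.nn as nn

INPUT_SIZE = 3 * 16 * 16  # flattened 3 x 16 x 16 image
OUTPUT_SIZE = 5           # one output per entry in CLASSES
HIDDEN_SIZE = 256
NUM_WORKERS = 2
CLASSES = ["hero", "non-hero", "food", "spell", "side-facing"]
DATA_DIR = Path('./data/')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")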

def get_model(dropout):
    "Simple MLP with Dropout"
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
        nn.BatchNorm1d(HIDDEN_SIZE),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE)
    ).to(DEVICE)
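
To get a sense of how small this network is, the number of trainable parameters can be worked out by hand from the layer sizes above. The sketch below is plain Python arithmetic, not part of the course code; the helper names `linear_params` and `batchnorm_params` are made up for illustration, and it assumes the default PyTorch behaviour where `nn.Linear` has a bias term and `nn.BatchNorm1d` has a learnable scale and shift per feature.

```python
# Layer sizes copied from the constants above
INPUT_SIZE = 3 * 16 * 16
HIDDEN_SIZE = 256
OUTPUT_SIZE = 5


def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """One learnable scale and one learnable shift per feature."""
    return 2 * n_features


# Flatten, ReLU and Dropout have no trainable parameters
layer_params = {
    "linear_1": linear_params(INPUT_SIZE, HIDDEN_SIZE),
    "batchnorm": batchnorm_params(HIDDEN_SIZE),
    "linear_2": linear_params(HIDDEN_SIZE, OUTPUT_SIZE),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Almost all of the parameters sit in the first linear layer, which is typical for an MLP that takes a flattened image as input.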

Then we define a Python object (a `SimpleNamespace`) with the hyperparameters that we will use to train our model.

In [None]:
from types import SimpleNamespace

# Let's define a config object to store our hyperparameters
config = SimpleNamespace(
    epochs = 2,
    batch_size = 128,
    lr = 1e-5,
    dropout = 0.5,
    slice_size = 10_000,
    valid_pct = 0.2,
)
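
A `SimpleNamespace` gives attribute-style access (`config.lr`) while still being easy to turn into a plain dictionary. The following is a minimal sketch of that pattern using only the standard library; the variable names `config_dict` and `override` are illustrative and do not appear in the course code.

```python
from types import SimpleNamespace

config = SimpleNamespace(
    epochs=2,
    batch_size=128,
    lr=1e-5,
    dropout=0.5,
    slice_size=10_000,
    valid_pct=0.2,
)

# Attribute access, as used inside the training function
print(config.lr, config.batch_size)

# Copy the attributes into a plain dictionary
# (vars() returns the live __dict__, so copy it before editing)
config_dict = dict(vars(config))
print(config_dict)

# Build a variant for another experiment without touching the original
override = SimpleNamespace(**{**config_dict, "lr": 1e-3})
print(override.lr, config.lr)
```

Keeping every hyperparameter in one object like this means a single argument carries the full description of a run.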

Up to this point, everything is standard machine learning code.

A standard training function loads the dataset and splits it into training and validation sets, defines the number of steps per epoch (in practice we would probably also have early stopping), chooses the loss function and the optimizer, and finally runs the training loop epoch by epoch.

It is inside this function that we interleave the wandb code.

In [None]:
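import math

from torch.optim import Adam
from tqdm.auto import tqdm

# get_dataloaders and validate_model are helper functions assumed to be
# defined elsewhere in the course materials; they are not shown here.

def train_model(config):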
    "Train a model with a given config"
    
    wandb.init(
        project="dlai_intro",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR, 
                                         config.batch_size, 
                                         config.slice_size, 
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0

    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()

        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss.item(),  # log a plain float rather than a tensor
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct
            }
            wandb.log(metrics)
            
        # Compute validation metrics at the end of each epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        # Collect the validation metrics for logging
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy
        }
        wandb.log(val_metrics)
    
    wandb.finish()
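
To get a feel for how much the loop above logs, its bookkeeping can be replayed without a model or a W&B account. The sketch below rests on two assumptions that the notes do not confirm: that `get_dataloaders` keeps `slice_size` examples and holds out `valid_pct` of them for validation, and that the training dataloader keeps the final, smaller batch. The counter `log_calls` is made up for illustration.

```python
import math

# Values copied from the config object above
epochs, batch_size, slice_size, valid_pct = 2, 128, 10_000, 0.2

# Assumption: slice_size examples are kept, valid_pct of them held out
n_valid = round(slice_size * valid_pct)
n_train = slice_size - n_valid

# Same formula as in train_model
n_steps_per_epoch = math.ceil(n_train / batch_size)

example_ct = 0
log_calls = 0
for epoch in range(epochs):
    for step in range(n_steps_per_epoch):
        # Assumption: the last batch is kept and may be smaller
        batch_len = min(batch_size, n_train - step * batch_size)
        example_ct += batch_len
        log_calls += 1  # stands in for wandb.log(metrics)
    log_calls += 1      # stands in for wandb.log(val_metrics)

print(f"train examples: {n_train}, validation examples: {n_valid}")
print(f"steps per epoch: {n_steps_per_epoch}")
print(f"examples seen: {example_ct}, log calls: {log_calls}")
```

Under these assumptions the training metrics are logged once per batch and the validation metrics once per epoch, so the training charts have far more points than the validation charts.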

When running the model with *W&B*, we receive links to the run and to the project. These lead to a dashboard where we can explore the charts for the experiment we carried out.