# **Experiment Tracking**

| | |
|-|-|
| Author(s) | [Keeyana Jones](https://github.com/keeyanajones/) |

## **Overview**

Experiment tracking is a cornerstone of effective machine learning (ML) development, especially in research, development, and MLOps. It's the systematic process of recording, organizing, and analyzing all the relevant metadata, artifacts, and results of each machine learning experiment or run.

Think of an ML experiment as a single attempt to train a model with specific configurations, data, and code. Because ML development is highly iterative and involves a lot of trial and error (trying different algorithms, hyperparameters, data preprocessing techniques, etc.), keeping track of these experiments manually quickly becomes unmanageable.

For each experiment, a tracking system typically records three kinds of information:

1. Inputs/Configuration:
   - **Code Version:** The Git commit hash or version of the training script, preprocessing scripts, and model definition code.
   - **Data Version:** The specific version of the dataset used for training and validation (linking to a data versioning system like DVC or Git LFS).
   - **Hyperparameters:** All the tunable parameters of your model and training process (e.g., learning rate, batch size, number of layers, optimizer type, regularization strength).
   - **Environment:** Details about the computing environment (e.g., Python version, library versions, GPU type, CPU count).
   - **Random Seeds:** Any random seeds used to ensure reproducibility of stochastic processes.

2. Outputs/Results:
   - **Metrics:** Key performance indicators (KPIs) measured during training and evaluation (e.g., accuracy, loss, precision, recall, and F1 score for classification; MSE, RMSE, MAE, and R-squared for regression). These are often tracked over epochs.
   - **Model Artifacts:** The trained model weights/checkpoints, architecture definitions, and any preprocessing components (e.g., fitted scalers, tokenizers).
   - **Visualization:** Plots generated during training (e.g., loss curves, learning rate schedules, gradient norms), and evaluation visualizations (e.g., confusion matrices, ROC curves, example predictions).
   - **Logs:** Standard output and error logs from the training process.
   - **Hardware Usage:** CPU, GPU, and memory utilization during the run.

3. Metadata:
   - **Experiment ID/Name:** A unique identifier or descriptive name for each run.
   - **Timestamp:** When the experiment was started and finished.
   - **User/Author:** Who initiated the experiment.
   - **Status:** Whether the run completed successfully, failed, or was interrupted.
   - **Tags/Notes:** Custom tags or free-form notes to categorize experiments and add context.
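
As a rough illustration of the three groups above, one run's record can be modeled as a small data structure. The sketch below uses only the Python standard library. The `ExperimentRecord` class, its field names, and the `capture_environment` helper are hypothetical names chosen for this example and are not part of any tracking tool. All values are placeholders.

```python
import json
import platform
import sys
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def capture_environment() -> dict:
    """Collect basic details about the computing environment."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


@dataclass
class ExperimentRecord:
    # Inputs / configuration
    code_version: str        # e.g., a Git commit hash
    data_version: str        # e.g., a dataset tag
    hyperparameters: dict
    random_seed: int
    environment: dict = field(default_factory=capture_environment)
    # Outputs / results
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)
    # Metadata
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    status: str = "running"
    tags: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole record so it can be stored or compared later."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = ExperimentRecord(
    code_version="abc1234",  # placeholder commit hash
    data_version="v1",       # placeholder dataset version
    hyperparameters={"learning_rate": 0.001, "batch_size": 32},
    random_seed=42,
    tags=["baseline"],
)
record.metrics["accuracy"] = 0.91  # illustrative value, not a real result
record.status = "completed"
print(record.to_json())
```

Dedicated tracking tools store a much richer version of this record, but the shape is similar: configuration goes in before training, results are attached afterwards, and metadata ties the two together.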

### **Why is Experiment Tracking Essential?**

1. **Reproducibility:** This is the most critical reason. Without tracking, it's virtually impossible to reproduce a specific model's performance later. If you can't reproduce results, you can't debug, verify, or build upon past work effectively.

2. **Comparison and Analysis:**
   - Easily compare the performance of different model architectures, hyperparameter settings, or data preprocessing techniques side by side.
   - Identify which changes led to improvements or regressions in model performance.
   - Debug training processes by analyzing metric curves, resource usage, or specific outputs.

3. **Collaboration:**
   - Provides a centralized hub for teams to share, understand, and review each other's experiments.
   - Reduces redundant work, as team members can see what configurations have already been tried.
   - Facilitates knowledge transfer and onboarding for new team members. 

4. **Auditability and Compliance:** 
   - Creates a clear audit trail of model development, which is crucial for regulated industries that need to explain or justify how a model was built and what data it used. 

5. **Resource Optimization:**
   - By tracking resource consumption (CPU/GPU usage, memory), you can identify inefficient experiments and optimize your computing costs.

6. **Faster Iteration:** An organized overview of past experiments allows data scientists to quickly identify promising directions and avoid repeating failed approaches, accelerating the entire development cycle.

7. **Transition to MLOps:** Experiment tracking is a fundamental component of MLOps (Machine Learning Operations).  It bridges the gap between research/experimentation and production deployment by providing the necessary metadata for model registration, deployment, and monitoring.  
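
To make the comparison and analysis point concrete, here is a minimal sketch of what a tracking tool does when it ranks runs and highlights what changed between them. It assumes each run is stored as a plain dictionary with `params` and `metrics` keys. That layout and the helper names `best_run` and `config_diff` are assumptions made for this example, and the metric values are made up.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value for the given metric."""
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])


def config_diff(run_a, run_b):
    """Return hyperparameters whose values differ between two runs."""
    keys = set(run_a["params"]) | set(run_b["params"])
    return {
        key: (run_a["params"].get(key), run_b["params"].get(key))
        for key in sorted(keys)
        if run_a["params"].get(key) != run_b["params"].get(key)
    }


# Illustrative records only; the values are made up for this example.
runs = [
    {"name": "run-1", "params": {"learning_rate": 0.01, "batch_size": 32},
     "metrics": {"accuracy": 0.88}},
    {"name": "run-2", "params": {"learning_rate": 0.001, "batch_size": 32},
     "metrics": {"accuracy": 0.91}},
]

winner = best_run(runs, "accuracy")
print(winner["name"])                 # run-2
print(config_diff(runs[0], runs[1]))  # {'learning_rate': (0.01, 0.001)}
```

With the configuration stored next to the results, the question "what changed between these two runs?" becomes a lookup rather than guesswork.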

### **How to Implement Experiment Tracking?**

While you could use manual spreadsheets or ad-hoc file naming, dedicated tools are highly recommended for any serious ML project:

1. Manual Tracking (Not Recommended for Scale):
   - Spreadsheets (Google Sheets, Excel) or text files to log parameters and results.
   - Systematic folder structures to save model artifacts and plots.
   - **Pros:** Simple to start for very small, solo projects.
   - **Cons:** Tedious, error-prone, and difficult to compare, scale, or collaborate on.

2. Integrated with ML Frameworks (Basic Logging):
   - **TensorBoard:** Built for TensorFlow and also usable with PyTorch, it visualizes metrics and graphs during training. It's a good basic logger but not a full experiment management system.
   - **PyTorch Lightning Loggers:** PyTorch Lightning integrates with various loggers (TensorBoard, MLflow, Weights & Biases) to simplify logging.

3. Dedicated Experiment Tracking Platforms/Tools:
   - These are the most common and powerful solutions, providing APIs for logging and a web UI for visualization and comparison.
      - **MLflow Tracking:** An open source platform that's part of the broader MLflow ecosystem. It's widely adopted for tracking metrics, parameters, and artifacts, and includes a UI.
      - **Weights & Biases (W&B):** A popular commercial (with free tier) platform known for its excellent visualizations, comprehensive logging, hyperparameter sweeps, and collaboration features.
      - **Comet ML:** Another strong commercial (with free tier) contender offering robust experiment tracking, model production monitoring, and a rich UI.
      - **Neptune.ai:** A lightweight, flexible experiment management tool designed to fit into any workflow, with a focus on ease of use.
      - **ClearML:** An open source MLOps platform that includes comprehensive experiment tracking, automation, and pipeline capabilities.   
      - **DVC (Data Version Control) with DVC Studio:** While primarily for data and model versioning, DVC also has experiment tracking features that work with Git to track experiments, parameters, and metrics.
      - **Guild AI:** Open source experiment tracking, pipeline automation, and hyperparameter tuning.
      - **Polyaxon:** An open source platform for managing and orchestrating ML experiments.   
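
For contrast with the dedicated tools, the manual approach in option 1 can be sketched as a CSV log. This is a minimal illustration using only the Python standard library. The file name `experiments.csv`, the column list in `FIELDS`, and the helpers `append_run` and `read_runs` are assumptions made for this example, and the logged values are made up.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["run_name", "learning_rate", "batch_size", "accuracy"]


def append_run(log_path: Path, row: dict) -> None:
    """Append one run to a CSV log, writing the header on first use."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_runs(log_path: Path) -> list:
    """Load every logged run as a list of dictionaries (values are strings)."""
    with log_path.open(newline="") as handle:
        return list(csv.DictReader(handle))


# A temporary directory keeps the example from touching real project files.
log_path = Path(tempfile.mkdtemp()) / "experiments.csv"
append_run(log_path, {"run_name": "run-1", "learning_rate": 0.01,
                      "batch_size": 32, "accuracy": 0.88})
append_run(log_path, {"run_name": "run-2", "learning_rate": 0.001,
                      "batch_size": 32, "accuracy": 0.91})

logged = read_runs(log_path)
print(len(logged), logged[-1]["run_name"])
```

The drawbacks listed under option 1 show up quickly: the column list is fixed, so logging a new hyperparameter means changing `FIELDS` and migrating the old file, every value comes back as a string, and nothing links a row to its code version, artifacts, or plots.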

### **How they Work (Conceptual):**

You typically integrate a few lines of code from the chosen tool's client library into your training script. For example:

```python
import wandb  # mlflow, neptune, and comet_ml follow a similar pattern

# Initialize a new experiment run and record its hyperparameters
wandb.init(project="my_image_classifier", config={"learning_rate": 0.001, "epochs": 10})

# Get hyperparameters from the config
learning_rate = wandb.config.learning_rate
epochs = wandb.config.epochs

# Your training loop
for epoch in range(epochs):
    # ... train model ...
    loss = calculate_loss()          # placeholder: your own loss computation
    accuracy = calculate_accuracy()  # placeholder: your own evaluation

    # Log metrics to the experiment tracker once per epoch
    wandb.log({"loss": loss, "accuracy": accuracy, "epoch": epoch})

# After training, upload the final model file
# (assumes your training code has already written model.pth to disk)
wandb.save("model.pth")

# Mark the run as finished
wandb.finish()
```

After your script runs, the logged data is sent to a centralized server (cloud-hosted or self-hosted), where a dashboard lets you view, filter, sort, compare, and analyze all your experiment runs.

By adopting a robust experiment tracking strategy, data science teams can significantly enhance their productivity, ensure reproducibility, and ultimately build better, more reliable machine learning models.

----