# <center>ECON 725: Computer Programming and Data Management in Economics <a class="tocSkip"></center>    
# <center> MLOps: Experiment Tracking <a class="tocSkip"></center>

## Learning Objectives
<hr>

- Understand the importance of experiment tracking in machine learning projects.
- Learn how to set up and use experiment tracking tools like Weights and Biases.
- Gain hands-on experience in logging experiments, parameters, metrics, and artifacts.
- Learn how to compare different experiment runs and select the best model.
- Understand how to integrate experiment tracking with other ML tools and workflows.

# A quick introduction to Experiment Tracking with Weights & Biases

<center><img width="30%" src="img/wandb.png"/></center>

Experiment tracking is the practice of recording the details of every experiment you run: the hyperparameters, the metrics, and the artifacts (datasets, models, plots) produced during training. With that record you can reproduce past results, compare runs side by side, and select the best model based on its performance metrics.
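To make the idea concrete, here is a minimal plain-Python sketch of the record a tracker keeps. It does not use W&B. Everything in it is hypothetical and exists only for illustration: the `RunRecord` class, the helper functions, the `runs.jsonl` file and the metric values. W&B stores this information for you, along with much more.

```python
import json
import tempfile
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class RunRecord:
    """Hypothetical record of one run: its hyperparameters and resulting metrics."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)

    def log(self, **metrics):
        # Keep the latest value of each metric, like a run "summary"
        self.metrics.update(metrics)


def save_runs(runs, path):
    # One JSON object per line (JSON Lines format)
    with open(path, "w", encoding="utf-8") as f:
        for run in runs:
            f.write(json.dumps(asdict(run)) + "\n")


def load_runs(path):
    with open(path, encoding="utf-8") as f:
        return [RunRecord(**json.loads(line)) for line in f]


def best_run(runs, metric):
    # Select the run with the highest value of `metric`
    return max(runs, key=lambda r: r.metrics[metric])


runs = [RunRecord("run-a", {"C": 0.1}), RunRecord("run-b", {"C": 1.0})]
runs[0].log(accuracy=0.90)  # made-up numbers
runs[1].log(accuracy=0.95)

path = Path(tempfile.gettempdir()) / "runs.jsonl"
save_runs(runs, path)
best = best_run(load_runs(path), "accuracy")
print(best.name, best.params)  # run-b {'C': 1.0}
```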

In this session, we will use Weights & Biases (W&B) to log and track our machine learning experiments. W&B is a popular experiment tracking tool that allows you to log and visualize your experiments in a collaborative and reproducible way. It provides a unified interface to log various aspects of your experiments, including parameters, metrics, artifacts, and more.

# Setting up Weights & Biases

To get started with W&B, create an account on the W&B platform and install the W&B library in your Python environment. You can sign up for a free account on the W&B website (https://www.wandb.com/). Signing up with your GitHub account makes authentication easiest. Once you have an account, install the W&B library using `pip`:

```bash
pip install wandb
```

After installing the library, authenticate your account by running the following command in your terminal. In a notebook, you can call `wandb.login()` from Python instead.

```bash
wandb login
```

This will prompt you for your W&B API key and store it locally, so later sessions are authenticated automatically. Once you are authenticated, you are ready to start logging your experiments with W&B.


<center><img width="40%" src="img/wandblogin.png"/></center>

## Log a Run to a new project

W&B tracks system metrics and console logs right out of the box. Run the sample code below, which simulates a short training loop, to see the new run appear in W&B.

```python
import wandb
import random

# start a new wandb run to track this script
wandb.init(
    # set the wandb project where this run will be logged
    project="my-awesome-project",

    # track hyperparameters and run metadata
    config={
    "learning_rate": 0.02,
    "architecture": "CNN",
    "dataset": "CIFAR-100",
    "epochs": 10,
    }
)

# simulate training
epochs = 10
offset = random.random() / 5
for epoch in range(2, epochs):
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    loss = 2 ** -epoch + random.random() / epoch + offset

    # log metrics to wandb
    wandb.log({"acc": acc, "loss": loss})

# finish the wandb run: optional in scripts (the run ends when the process exits), necessary in notebooks
wandb.finish()
```

### Import libraries

```python
import wandb
import pandas as pd
import pickle
import os

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, mean_squared_error, f1_score, precision_score, recall_score
```

### Initialize a Weights & Biases Run

At the beginning of our script or notebook, calling `wandb.init()` starts a background process that syncs the data we log to W&B as a Run.

```python
wandb.init(project="econ725-wandb", name="experiment-1")
```

```text
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: marcelortizv. Use `wandb login --relogin` to force relogin
```


### Load the Iris dataset

We will use the Iris dataset for this example. It is a classic dataset in machine learning and statistics. It contains 150 iris flowers, each described by four features (sepal length, sepal width, petal length and petal width) and labeled with its species. To learn more about this dataset, see the [official docs for `sklearn`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html).

```python
X, y = load_iris(return_X_y=True)
label_names = ["Setosa", "Versicolour", "Virginica"]
```

### Training model and Experiment Tracking

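Record model configs and other hyperparameters in `wandb.config`. Values stored there are saved with the run and displayed in the W&B dashboard. You can add them in two ways: update the existing config object with `wandb.config.update(...)`, or pass `config=` to `wandb.init` as in the first example. Do not assign to `wandb.config` directly. `wandb.config = params` only rebinds the name to a plain dictionary, so the values are never recorded on the run.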

```python
# Log your model configs to Weights & Biases
params = {"C": 0.1, "random_state": 42}
wandb.config.update(params)
```

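Define and train a logistic regression model. For simplicity, the model is fit and evaluated on the same 150 samples. The metrics below therefore measure in-sample fit, not generalization.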

```python
model = LogisticRegression(**params).fit(X, y)
y_pred = model.predict(X)
y_probas = model.predict_proba(X)
```

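Log your metrics to Weights & Biases using `wandb.log`. The mean squared error here is computed on the integer class codes (0, 1, 2), so it only illustrates logging several metrics at once. Accuracy is the meaningful metric for this classification task.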

```python
wandb.log({
    "accuracy": accuracy_score(y, y_pred),
    "mean_squared_error": mean_squared_error(y, y_pred)
})
```
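Mean squared error treats the class codes as points on a number line, so its value depends on the arbitrary order in which the classes are encoded. The toy labels below are made up and are not the Iris data. They show two prediction vectors with the same accuracy but different MSE.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean squared difference between the integer class codes
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 1, 2, 2]
pred_a = [0, 1, 1, 2, 2]  # one error: class 0 predicted as class 1
pred_b = [0, 2, 1, 2, 2]  # one error: class 0 predicted as class 2

print(accuracy(y_true, pred_a), mse(y_true, pred_a))  # 0.8 0.2
print(accuracy(y_true, pred_b), mse(y_true, pred_b))  # 0.8 0.8
```

Both vectors make exactly one mistake. MSE penalizes the second one four times as much only because 2 is farther from 0 than 1 is.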

### Visualize and Compare Plots using Weights & Biases

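The [**ROC curve**](https://docs.wandb.ai/guides/integrations/scikit#roc) plots the true positive rate (y-axis) against the false positive rate (x-axis) as the classification threshold varies. The ideal classifier reaches `TPR = 1` at `FPR = 0`, the top-left corner. A common summary is the area under the ROC curve (AUC-ROC): the greater the AUC-ROC, the better. With the three Iris classes, one curve is drawn per class. Each curve treats that class as positive and the other two as negative.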

```python
wandb.sklearn.plot_roc(y, y_probas, labels=label_names)
```
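The AUC-ROC mentioned above has a useful interpretation. It equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as one half. The plain-Python sketch below computes it that way for a binary toy example with made-up scores. The pairwise loop is quadratic, which is fine for illustration but not for large datasets.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly. The only exception is the positive scored 0.35 against the negative scored 0.4.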

The [**precision-recall**](https://docs.wandb.ai/guides/integrations/scikit#precision-recall-curve) curve shows the tradeoff between precision and recall across decision thresholds. A high area under the curve means both high precision and high recall:

- high precision means few false positives among the examples predicted positive;
- high recall means few false negatives, i.e. few actual positives are missed.

High scores for both show that the classifier returns accurate results (high precision) and finds most of the positive examples (high recall). The PR curve is especially informative when the classes are highly imbalanced.

```python
wandb.sklearn.plot_precision_recall(y, y_probas, labels=label_names)
```
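A PR curve is traced by sweeping a decision threshold over the predicted scores and recomputing precision and recall at each step. Here is a minimal plain-Python sketch for a binary problem with made-up scores. The value of 1.0 returned when nothing is predicted positive is a convention chosen for this sketch.

```python
def precision_recall_at(y_true, scores, threshold):
    # Predict positive whenever the score reaches the threshold
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for threshold in sorted(set(scores), reverse=True):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.80  precision=1.00  recall=0.50
# threshold=0.40  precision=0.50  recall=0.50
# threshold=0.35  precision=0.67  recall=1.00
# threshold=0.10  precision=0.50  recall=1.00
```

Lowering the threshold can only raise recall, while precision moves up or down depending on which examples cross the threshold.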

The [**confusion matrix**](https://docs.wandb.ai/guides/integrations/scikit#confusion-matrix) tabulates actual labels against predicted labels. It is useful for assessing the quality of the model's predictions and for finding patterns in the mistakes it makes. The diagonal holds the correct predictions, where the actual label equals the predicted label. The off-diagonal cells show which classes get confused with each other.

```python
wandb.sklearn.plot_confusion_matrix(y, y_pred, labels=label_names)
```
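The confusion matrix described above is just a table of counts. This plain-Python sketch builds one for made-up three-class labels. Rows index the actual class and columns the predicted class, the same convention as `sklearn.metrics.confusion_matrix`.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    # matrix[actual][predicted] counts how often each pair occurs
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 1, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [0, 1, 2]

# Correct predictions sit on the diagonal, so accuracy = trace / total
acc = sum(cm[i][i] for i in range(3)) / len(y_true)
```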

To learn more about the functionality available in the Scikit-Learn integration with Weights & Biases, see the [official docs](https://docs.wandb.ai/guides/integrations/scikit).

### Artifacts

Artifacts let you track and version datasets, models and other large files: the inputs and outputs of your machine learning pipeline. You log them to W&B with the `wandb.Artifact` class. When the logged contents change, W&B records a new version, which gives you a complete and auditable history of your files.
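The core idea behind versioning is content addressing: a new version is recorded only when the files' contents change. The sketch below models that behavior with `hashlib` and a hypothetical in-memory registry that compares each upload with the latest version. It is not W&B's internal implementation, only an illustration of the idea. W&B does checksum artifact files and skips creating a version when nothing has changed.

```python
import hashlib


def digest(files):
    """Hash a {filename: bytes} mapping in a deterministic (sorted) order."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


class ToyArtifactRegistry:
    """Hypothetical registry: a list of content digests per artifact name."""

    def __init__(self):
        self.versions = {}

    def log(self, name, files):
        history = self.versions.setdefault(name, [])
        d = digest(files)
        if not history or history[-1] != d:
            history.append(d)  # contents changed: record a new version
        return f"{name}:v{len(history) - 1}"


registry = ToyArtifactRegistry()
print(registry.log("Titanic", {"train.csv": b"a,b\n1,2\n"}))               # Titanic:v0
print(registry.log("Titanic", {"train.csv": b"a,b\n1,2\n"}))               # Titanic:v0
print(registry.log("Titanic", {"train.csv": b"a,b,Split\n1,2,Train\n"}))  # Titanic:v1
```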

```python
# Save your model
with open("logistic_regression.pkl", "wb") as f:
    pickle.dump(model, f)

# Log your model as a versioned file to Weights & Biases Artifact
artifact = wandb.Artifact("iris-logistic-regression-model", type="model")
artifact.add_file("logistic_regression.pkl")
wandb.log_artifact(artifact)
```

```text
<Artifact iris-logistic-regression-model>
```

```python
# Finish the run
wandb.finish()
```

Run summary:

| Metric | Value |
|---|---|
| accuracy | 0.96 |
| mean_squared_error | 0.04 |


## Diving Deeper into Weights & Biases

Logging experiments and visualizing the results is only the tip of the iceberg. W&B offers many more features and integrations that streamline your machine learning workflow and help you collaborate with your team. In the rest of this session we will:

* Version datasets using [Artifacts](https://docs.wandb.ai/guides/artifacts).
* Explore and visualize our datasets with [Tables](https://docs.wandb.ai/guides/tables).
* Run a baseline experiment with a Random Forest classification model.

<center><img width="40%" src="https://docs.wandb.ai/assets/images/artifacts_landing_page2-a9d45cea4d1c8147231a384b36838619.png"/></center>

<center><img width="40%" src="https://docs.wandb.ai/assets/images/tables_sample_predictions-c07d0f6bdee3c0d70b36246af875b878.png"/></center>


### Logging Dataset to Artifacts

Download the `train.csv` and `test.csv` files from [Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic/data) and place them in the `data` directory.

```python
# Initialize a WandB Run
wandb.init(project="econ725-wandb", job_type="log_data")

# Log the `data` directory as an artifact
artifact = wandb.Artifact('Titanic', type='dataset', metadata={"Source": "https://www.kaggle.com/competitions/titanic/data"})
artifact.add_dir('data')
wandb.log_artifact(artifact)

# End the WandB Run
wandb.finish()
```

```text
wandb: Adding directory to artifact (./data)... Done. 0.0s
```

Let's open the W&B dashboard to see what logging the dataset as an Artifact produced.

### Versioning the Data

```python
# Initialize a WandB Run
wandb.init(project="econ725-wandb", job_type="log_data")

# Fetch version v0 of the dataset artifact
# (replace `marcelortizv` with your own W&B username)
artifact = wandb.use_artifact('marcelortizv/econ725-wandb/Titanic:v0', type='dataset')
artifact_dir = artifact.download()
```

```text
wandb:   2 of 2 files downloaded.
```


Read the dataset files from the downloaded artifact directory.

```python
train_df = pd.read_csv(os.path.join(artifact_dir, "train.csv"))
test_df = pd.read_csv(os.path.join(artifact_dir, "test.csv"))
```

```python
# 80/20 train/validation split sizes
num_train_examples = int(0.8 * len(train_df))
num_val_examples = len(train_df) - num_train_examples

print(num_train_examples, num_val_examples)
```

```text
712 179
```


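```python
# Label the first 80% of rows "Train" and the rest "Validation" (no shuffling)
train_df["Split"] = ["Train"] * num_train_examples + ["Validation"] * num_val_examples
# Overwrite the local copy so the next artifact version includes the new column
train_df.to_csv("data/train.csv", encoding='utf-8', index=False)
```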
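The cell above assigns the first 80% of rows to training and the rest to validation. That is fine only if row order carries no information. If you want a random assignment instead, below is a minimal sketch that uses a fixed seed, so the split is reproducible and can be versioned with the data. The helper `assign_splits` and the seed value are hypothetical. The row count 891 matches the 712 + 179 printed above.

```python
import random


def assign_splits(n_rows, train_frac=0.8, seed=42):
    """Return a list of 'Train'/'Validation' labels, shuffled reproducibly."""
    n_train = int(train_frac * n_rows)
    labels = ["Train"] * n_train + ["Validation"] * (n_rows - n_train)
    random.Random(seed).shuffle(labels)  # a local RNG leaves the global state untouched
    return labels


splits = assign_splits(891)
print(splits.count("Train"), splits.count("Validation"))  # 712 179

# The labels could then replace the sequential ones from the cell above, e.g.
# train_df["Split"] = assign_splits(len(train_df))
```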

```python
# Log the `data` directory as an artifact again; since train.csv changed,
# W&B records it as a new version of the `Titanic` artifact
artifact = wandb.Artifact('Titanic', type='dataset', metadata={"Source": "https://www.kaggle.com/competitions/titanic/data"})
artifact.add_dir('data')
wandb.log_artifact(artifact)

# End the WandB Run
wandb.finish()
```

```text
wandb: Adding directory to artifact (./data)... Done. 0.0s
```


### Explore the Dataset

```python
# Initialize a WandB Run
wandb.init(project="econ725-wandb", job_type="explore_data")

# Fetch the latest version of the dataset artifact
artifact = wandb.use_artifact('marcelortizv/econ725-wandb/Titanic:latest', type='dataset')
artifact_dir = artifact.download()

# Read the files
train_val_df = pd.read_csv(os.path.join(artifact_dir, "train.csv"))
test_df = pd.read_csv(os.path.join(artifact_dir, "test.csv"))
```

```text
wandb:   2 of 2 files downloaded.
```


```python
# Create tables corresponding to datasets
train_val_table = wandb.Table(dataframe=train_val_df)
test_table = wandb.Table(dataframe=test_df)

# Log the tables to Weights & Biases
wandb.log({
    "Train-Val-Table": train_val_table,
    "Test-Table": test_table
})

# End the WandB Run
wandb.finish()
```

Let's open the W&B dashboard again to see how the logged tables are displayed.

### Training a Random Forest Classifier

```python
# Initialize a WandB Run
wandb.init(project="econ725-wandb", name="baseline_experiment-2", job_type="train")

# Fetch the latest version of the dataset artifact
artifact = wandb.use_artifact('marcelortizv/econ725-wandb/Titanic:latest', type='dataset')
artifact_dir = artifact.download()

# Read the files
train_val_df = pd.read_csv(os.path.join(artifact_dir, "train.csv"))
test_df = pd.read_csv(os.path.join(artifact_dir, "test.csv"))
```

```text
wandb:   2 of 2 files downloaded.
```


```python
features = ["Pclass", "Sex", "SibSp", "Parch"]
is_train = train_val_df["Split"] == "Train"
is_val = train_val_df["Split"] == "Validation"

# One-hot encode the categorical `Sex` column; numeric columns pass through unchanged
X_train = pd.get_dummies(train_val_df.loc[is_train, features])
X_val = pd.get_dummies(train_val_df.loc[is_val, features])
y_train = train_val_df.loc[is_train, "Survived"]
y_val = train_val_df.loc[is_val, "Survived"]
```

```python
model_params = {"n_estimators": 100, "max_depth": 10, "random_state": 1}
# Record the hyperparameters on the current run
wandb.config.update(model_params)

model = RandomForestClassifier(**model_params)
model.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_probas_train = model.predict_proba(X_train)
y_pred_val = model.predict(X_val)
y_probas_val = model.predict_proba(X_val)
```

```python
wandb.log({
    "Train/Accuracy": accuracy_score(y_train, y_pred_train),
    "Validation/Accuracy": accuracy_score(y_val, y_pred_val),
    "Train/Precision": precision_score(y_train, y_pred_train),
    "Validation/Precision": precision_score(y_val, y_pred_val),
    "Train/Recall": recall_score(y_train, y_pred_train),
    "Validation/Recall": recall_score(y_val, y_pred_val),
    "Train/F1-Score": f1_score(y_train, y_pred_train),
    "Validation/F1-Score": f1_score(y_val, y_pred_val),
})
```

```python
label_names = ["Not-Survived", "Survived"]

# Built-in scikit-learn plots, logged to the current run
wandb.sklearn.plot_class_proportions(y_train, y_val, label_names)
wandb.sklearn.plot_summary_metrics(model, X_train, y_train, X_val, y_val)
wandb.sklearn.plot_roc(y_val, y_probas_val, labels=label_names)
wandb.sklearn.plot_precision_recall(y_val, y_probas_val, labels=label_names)
wandb.sklearn.plot_confusion_matrix(y_val, y_pred_val, labels=label_names)
```

```python
# Save your model
with open("random_forest_classifier.pkl", "wb") as f:
    pickle.dump(model, f)

# Log your model as a versioned file to Weights & Biases Artifact
artifact = wandb.Artifact("titanic-random-forest-model", type="model")
artifact.add_file("random_forest_classifier.pkl")
wandb.log_artifact(artifact)

# End the WandB Run
wandb.finish()
```

Run summary:

| Metric | Train | Validation |
|---|---|---|
| Accuracy | 0.8118 | 0.82123 |
| Precision | 0.82143 | 0.7963 |
| Recall | 0.66187 | 0.67188 |
| F1-Score | 0.73307 | 0.72881 |
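As a sanity check on this run summary, recall that the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the logged precision and recall reproduces the logged F1 values up to rounding:

```python
def f1_from(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the run summary
train_f1 = f1_from(0.82143, 0.66187)
val_f1 = f1_from(0.7963, 0.67188)
print(round(train_f1, 4), round(val_f1, 4))  # 0.7331 0.7288
```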


## Hyperparameter Optimization with Weights & Biases

The hyperparameter tuning code lives in a separate script, `04_1_hyperparameter_tuning.py`. Run it from a terminal:

```bash
python 04_1_hyperparameter_tuning.py
```

<center><img width="80%" src="img/sweep.png"/></center>
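The companion script is not reproduced here. As an illustration of what a sweep automates, the sketch below runs a plain-Python random search. Its search space uses the same dictionary shape as a W&B sweep configuration: `method`, `metric` and `parameters`, where each parameter has a `values` list or a `min`/`max` range. The objective `toy_score` is a hypothetical stand-in for training the random forest and returning its validation accuracy. In a real sweep, the dictionary is passed to `wandb.sweep(...)`, `wandb.agent(...)` runs the trials, and each trial logs its metric with `wandb.log`.

```python
import random

# Search space in the shape of a W&B sweep configuration (values are illustrative)
sweep_config = {
    "method": "random",
    "metric": {"name": "Validation/Accuracy", "goal": "maximize"},
    "parameters": {
        "n_estimators": {"values": [50, 100, 200]},
        "max_depth": {"min": 2, "max": 12},
    },
}


def sample_params(parameters, rng):
    """Draw one configuration: pick from `values`, or an integer in [min, max]."""
    params = {}
    for name, spec in parameters.items():
        if "values" in spec:
            params[name] = rng.choice(spec["values"])
        else:
            params[name] = rng.randint(spec["min"], spec["max"])
    return params


def toy_score(n_estimators, max_depth):
    # Hypothetical objective standing in for "train the model, return validation accuracy"
    return 0.8 - 0.002 * (max_depth - 6) ** 2 + 0.0001 * n_estimators


def random_search(config, objective, n_trials=20, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(config["parameters"], rng)
        trials.append((objective(**params), params))
    sign = 1 if config["metric"]["goal"] == "maximize" else -1
    return max(trials, key=lambda t: sign * t[0])


best_score, best_params = random_search(sweep_config, toy_score)
print(best_params, round(best_score, 4))
```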