## Tracking computational experiments with Weights & Biases
___
<img src="images/wandbdash.png" alt="drawing" width="1300"/>

Julius Polz
___
Karlsruhe Institute of Technology

### 1 - Experiment tracking

<img src="images/scheme1.png" alt="drawing" width="700"/>

What to track?

 * Scores
 * Parameters

 * Time
 * Package versions
 * Code version (e.g. Git commit)

 * Power consumption
 * Computational resources
 * Hardware
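
Several items on this list can be collected with the standard library alone. The sketch below shows one possible way to gather them into a dictionary that could be stored alongside a run's parameters. The function names (`git_commit`, `package_version`, `collect_run_metadata`) and the chosen keys are illustrative assumptions, not part of W&B.

```python
import platform
import subprocess
import sys
from datetime import datetime, timezone
from importlib import metadata


def git_commit():
    """Return the current Git commit hash, or None outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is not installed or this is not a repository
    return result.stdout.strip()


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_run_metadata(packages=("numpy", "scikit-learn", "wandb")):
    """Gather time, code version, package versions and hardware information."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "git_commit": git_commit(),
        "packages": {name: package_version(name) for name in packages},
    }


meta = collect_run_metadata()
print(meta)
```

Entries that cannot be determined (for example the commit hash outside a Git repository) are recorded as `None` rather than raising an error, so the tracking step never interrupts the experiment itself.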

<img src="images/nature2.png" alt="drawing" width="1100"/>

<img src="images/nature.png" alt="drawing" width="1100"/>

Conclusion: rounding errors can have significant effects on results.
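
A minimal illustration of where such rounding effects come from (this example is not taken from the articles shown above): floating-point addition is not associative, so anything that changes the order of operations, such as different hardware, library versions or parallelisation, can change the last digits of a result.

```python
# Floating-point addition is not associative: the grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)      # False
print(abs(left - right))  # a tiny but non-zero difference
```

Differences of this size are harmless in a single sum, but they can grow when a computation iterates many times on its own output.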

### 2 - Setup

```
git clone https://github.com/jpolz/wandb_example.git
cd wandb_example
```

This assumes a working conda or mamba installation:

```
conda env create -f environment.yml
conda activate wandb
jupyter notebook
```

In [None]:
import wandb
import numpy as np

In [None]:
! wandb login

Get your API key from https://wandb.ai/settings (section "API keys") and paste it when prompted.

### 3 - Logging parameters and scores with Weights & Biases

In [None]:
config = {'parameter_1':1,'parameter_2':1,'parameter_3':1,} # store parameters in a dictionary

In [None]:
run = wandb.init(project="my-test-project", config=config) # start a run; wandb.init has really good documentation
print('hello world')
print('parameter is '+str(run.config.parameter_1))
# run
# some
# model
run.log({'score':1})
run.finish()

In [None]:
run = wandb.init(project="my-test-project", config=config) 
print('parameter is '+str(run.config.parameter_1))
for i in np.linspace(0,1,10):
    # train/optimize some model
    run.log({'score':i})
run.finish()

### 4 - The experiment

In [None]:
from sklearn.datasets import make_moons # example data generator
from sklearn.ensemble import RandomForestClassifier # example model
from sklearn.model_selection import train_test_split # pre-processing

In [None]:
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42) # generate example data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42) # random train test split

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots(1,2, figsize=(14,7))
ax[0].scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='RdYlBu')
ax[1].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='RdYlBu')
ax[0].set_title('train')
ax[1].set_title('test');

In [None]:
classifier = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1) # classifier with parameters

In [None]:
classifier.fit(X_train, y_train) # the actual training --> further reading: https://en.wikipedia.org/wiki/Random_forest
train_accuracy = classifier.score(X_train, y_train) # training score
test_accuracy = classifier.score(X_test, y_test) # test score
print(train_accuracy, test_accuracy)
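
For classifiers, scikit-learn's `score` method returns the mean accuracy, i.e. the fraction of samples whose predicted label equals the true label. The sketch below recomputes that quantity by hand; the function `accuracy` and the label lists are made up for illustration and are not the moons data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]  # one of the five predictions is wrong
print(accuracy(y_true, y_pred))  # 0.8
```

A training accuracy that is much higher than the test accuracy is a hint that the model is overfitting.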

In [None]:
y_pred_train = classifier.predict(X_train) # predictions for training data
y_pred_test = classifier.predict(X_test) # predictions for test data

In [None]:
fig, ax = plt.subplots(1,2, figsize=(10,5))
ax[0].scatter(X_train[:, 0], X_train[:, 1], c=y_pred_train, cmap='RdYlBu')
ax[1].scatter(X_test[:, 0], X_test[:, 1], c=y_pred_test, cmap='RdYlBu')
ax[0].set_title('train')
ax[1].set_title('test');

In [None]:
fig, ax = plt.subplots(1,2, figsize=(10,5))
ax[0].scatter(X_train[:, 0], X_train[:, 1], c=y_pred_train==y_train, cmap='RdYlGn')
ax[1].scatter(X_test[:, 0], X_test[:, 1], c=y_pred_test==y_test, cmap='RdYlGn')
ax[0].set_title('train')
ax[1].set_title('test');

In [None]:
h = 0.02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = classifier.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
# Put the result into a color plot
Z = Z.reshape(xx.shape)

In [None]:
fig, ax = plt.subplots(1,2, figsize=(14,7))
ax[0].scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='RdBu') # true classes
ax[1].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='RdBu') # true classes
ax[0].set_title('train')
ax[1].set_title('test');
ax[0].contourf(xx, yy, Z, cmap='RdBu', alpha=0.5)
ax[1].contourf(xx, yy, Z, cmap='RdBu', alpha=0.5);

### 5 - Integrating W&B

In [None]:
classifier = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)
classifier.fit(X_train, y_train)
train_accuracy = classifier.score(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
print(train_accuracy, test_accuracy)

In [None]:
config = {'max_depth':5,'n_estimators':10,'max_features':1,}
classifier = RandomForestClassifier(
    max_depth=config['max_depth'], 
    n_estimators=config['n_estimators'], 
    max_features=config['max_features']
)
classifier.fit(X_train, y_train)
train_accuracy = classifier.score(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
print(train_accuracy, test_accuracy)

In [None]:
config = {'max_depth':6,'n_estimators':5,'max_features':1,}
run = wandb.init(project="my-test-rf", config=config)               # changes!
classifier = RandomForestClassifier(
    max_depth=run.config.max_depth,                                 # changes!
    n_estimators=run.config.n_estimators,                           # changes!
    max_features=run.config.max_features                            # changes!
)
classifier.fit(X_train, y_train)
train_accuracy = classifier.score(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
run.log({                                                           # changes!
    'train_accuracy':train_accuracy,
    'test_accuracy':test_accuracy,
})
run.finish()                                                        # changes!

### 6 - Parameter sweeps

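Sweeps automate parameter optimization: W&B chooses parameter combinations from a search space (grid, random, or Bayesian search) and starts one run per combination.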

In [None]:
sweep_config = {
    'method': 'random', #grid, random, bayes
    'metric': {
      'name': 'test_accuracy',
      'goal': 'maximize'   
    },
    'parameters': {
        'max_depth': {
            'values': [3,4,5,6,7,8,9,10]
        },
        'n_estimators': {
            'values': [1,2,3,4,5,6,7,8,9,10,15,20,25,50,100]
        },
        'max_features': {
            'values': [1,2]  # the moons data has only two features
        },
        'min_impurity_decrease': {
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.1
        },
    }
}
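
To make the `random` method concrete: for each run, one value is drawn per parameter, either from the listed `values` or from the given distribution. The sketch below mimics that behaviour in plain Python for a shortened search space in the same format. `sample_parameters` is a hypothetical helper written for this illustration, and W&B's own sampler may differ in detail.

```python
import random


def sample_parameters(parameters, rng):
    """Draw one value per parameter from a sweep-style search space."""
    sample = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # discrete parameter: pick one of the listed values
            sample[name] = rng.choice(spec["values"])
        elif spec.get("distribution") == "uniform":
            # continuous parameter: draw uniformly between min and max
            sample[name] = rng.uniform(spec["min"], spec["max"])
        else:
            raise ValueError(f"unsupported specification for {name!r}: {spec}")
    return sample


# A shortened search space in the same format as the sweep configuration.
parameters = {
    "max_depth": {"values": [3, 4, 5]},
    "n_estimators": {"values": [10, 50, 100]},
    "min_impurity_decrease": {"distribution": "uniform", "min": 0.0, "max": 0.1},
}

rng = random.Random(0)  # seeded so that the draws are repeatable
for _ in range(3):
    print(sample_parameters(parameters, rng))
```

A `grid` sweep would instead iterate over every combination of the listed values, while `bayes` uses the scores of earlier runs to decide which combination to try next.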

In [None]:
def run_fct():
    wandb.init()
    classifier = RandomForestClassifier(
        max_depth=wandb.config.max_depth,                               # using wandb.config instead of run.config
        n_estimators=wandb.config.n_estimators,
        max_features=wandb.config.max_features,
        min_impurity_decrease=wandb.config.min_impurity_decrease        # new parameter
    )
    classifier.fit(X_train, y_train)
    train_accuracy = classifier.score(X_train, y_train)
    test_accuracy = classifier.score(X_test, y_test)
    wandb.log({
        'train_accuracy':train_accuracy,
        'test_accuracy':test_accuracy,
    })
    return None

In [None]:
sweep_id = wandb.sweep(sweep_config, project="my-test-rf")
wandb.agent(sweep_id, run_fct) # a random sweep keeps starting runs until interrupted; pass count=N to stop after N runs

In [None]:
sweep_config = {
    'method': 'bayes', #grid, random, bayes
    'metric': {
      'name': 'test_accuracy',
      'goal': 'maximize'   
    },
    'parameters': {
        'max_depth': {
            'values': [3,4,5,6,7,8,9,10]
        },
        'n_estimators': {
            'values': [1,2,3,4,5,6,7,8,9,10,15,20,25,50,100]
        },
        'max_features': {
            'values': [1,2]  # the moons data has only two features
        },
        'min_impurity_decrease': {
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.1
        },
    }
}

In [None]:
sweep_id = wandb.sweep(sweep_config, project="my-test-rf")
wandb.agent(sweep_id, run_fct) # a bayes sweep keeps starting runs until interrupted; pass count=N to stop after N runs

### Conclusion: Why I use W&B

 * free for academia
 * easy to implement with few changes to code
 * tracks a lot automatically (e.g. system metrics, console output, Git commit)
 * great visualization
 * convenient parameter optimization
 * you can also save and recover trained models