<img src="https://fsdl.me/logo-720-dark-horizontal">

# Lab 04: Experiment Management

### What You Will Learn

- Why experiment management is so important for ML model development
- Which features of experiment management we use in developing the Text Recognizer
- How to use Weights & Biases for experiment management, including metric logging, artifact versioning, and hyperparameter optimization

# Setup

If you're running this notebook on Google Colab,
the cell below will run full environment setup.

It should take about three minutes to run.

In [None]:
lab_idx = 4

if "bootstrap" not in locals() or bootstrap.run:
    # path management for Python
    pythonpath, = !echo $PYTHONPATH
    if "." not in pythonpath.split(":"):
        pythonpath = ".:" + pythonpath
        %env PYTHONPATH={pythonpath}
        !echo $PYTHONPATH

    # get both Colab and local notebooks into the same state
    !wget --quiet https://fsdl.me/gist-bootstrap -O bootstrap.py
    import bootstrap

    # change into the lab directory
    bootstrap.change_to_lab_dir(lab_idx=lab_idx)

    # allow "hot-reloading" of modules
    %load_ext autoreload
    %autoreload 2
    # needed for inline plots in some contexts
    %matplotlib inline

    bootstrap.run = False  # change to True to re-run setup
    
!pwd
%ls

This lab contains a large number of embedded IFrames.

This cell makes the notebook wider if you set `full_width` to `True`.

Particularly useful in local Jupyter. Colab defaults to full width.

In [None]:
from IPython.display import display, HTML, IFrame

full_width = True
frame_height = 720  # adjust for your screen

if full_width:  # if we want the notebook to take up the whole width
    # add styling to the notebook's HTML directly
    display(HTML("<style>.container { width:100% !important; }</style>"))
    display(HTML("<style>.output_result { max-width:100% !important; }</style>"))

# Why experiment management?

Let's run an experiment.

We'll train a new model on a new dataset.

> <small> Simplified model since [Lab 03](https://fsdl.me/lab03-colab) --
> still a CNN encoder with a Transformer decoder,
> still on real text, but only one line at a time.
> Much easier, quicker to run, but still fairly realistic.
> Line-level recognition is [common](https://huggingface.co/docs/transformers/model_doc/trocr). </small>

It will take up to a few minutes.

As it's running, continue reading below.

In [None]:
%%time
import torch

gpus = int(torch.cuda.is_available()) 

%run training/run_experiment.py --model_class LineCNNTransformer --data_class IAMLines \
  --loss transformer --batch_size 32 --gpus {gpus} --max_epochs 2 \
  --limit_train_batches 0.1 --limit_val_batches 0.1 --limit_test_batches 0.1 --log_every_n_steps 10

We're calculating lots of metrics and reporting them to the terminal here.

This is achieved with the built-in `.log` method of the `LightningModule`, which is very straightforward to use.

What is lost?

- metric values except most recent
- timestamps
- stdout
- CLI args

- model weights outside the top 5
- system info
- git information / disk state

Why do we need this?

Training is like compilation:
in the "differentiable software" sense
and in that there are lots of arcane flags.

Even more so than normal compilation,
it generates a ton of data that is important for understanding future behavior.

We could save this information ourselves but
- bunch of boilerplate code
- conflict over resources
- how do we view the information once it is saved?
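
To make the boilerplate point concrete, here is a minimal sketch of recording just a few of the lost items by hand: CLI args, a timestamp, system info, and git state. The helper names (`git_commit_or_none`, `snapshot_run_metadata`) and the JSON layout are hypothetical and not part of the Text Recognizer codebase.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def git_commit_or_none():
    """Return the current git commit hash, or None if git or a repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def snapshot_run_metadata(out_dir, argv=None):
    """Write CLI args, a timestamp, and basic system info to a JSON file."""
    metadata = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "argv": list(sys.argv if argv is None else argv),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "git_commit": git_commit_or_none(),
    }
    out_path = Path(out_dir) / "run_metadata.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(metadata, indent=2))
    return out_path
```

Even this covers only part of the list: metric history, stdout, and model weights would each need more code, and the resulting files would still need a viewer.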

Learning to read information
[from streaming numbers in the command line](http://www.quickmeme.com/img/45/4502c7603faf94c0e431761368e9573df164fad15f1bbc27fc03ad493f010dea.jpg)
is something of a rite of passage for MLEs,
but there's a better way.

# Local Experiment Tracking with TensorBoard

How does this information end up in the output?

Review the `validation_step` method of our `LightningModule` class,
the `TransformerLitModel`:

In [None]:
from text_recognizer.lit_models.transformer import TransformerLitModel


TransformerLitModel.validation_step??

Focus on `validation/loss` and `validation/cer` for now.

The `self.log` method with `prog_bar=True` puts information about metrics in the progress bar.
Read more about `log`
[here](https://pytorch-lightning.readthedocs.io/en/1.6.1/common/lightning_module.html#train-epoch-level-metrics).

But the `self.log` method of the `LightningModule` isn't just for logging to the terminal --
it can also use a logger to push information elsewhere.

By default, we use
[TensorBoard](https://www.tensorflow.org/tensorboard)
via the Lightning `TensorBoardLogger`,
which saves results to the local disk.

Let's find them:

In [None]:
# we use a sequence of bash commands to get the latest experiment's log directory
#  by hand, you can just copy and paste it

list_all_log_files = "find training/logs/lightning_logs/"  # find avoids issues with \n in filenames
filter_to_folders = "grep '_[0-9]*$'"  # regex match on end of line
sort_version_descending = "sort -Vr"  # uses "version" sorting (-V) and reverses (-r)
take_first = "head -n 1"  # the first n elements, n=1

In [None]:
latest_log, = ! {list_all_log_files} | {filter_to_folders} | {sort_version_descending} | {take_first}
latest_log

In [None]:
!ls -lh {latest_log}
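
The same lookup can be written in plain Python. The sketch below assumes the run folders sit directly under the log root and end in an underscore followed by a number (e.g. `version_3`), which is the pattern the `grep` above targets. `latest_version_dir` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_version_dir(log_root):
    """Return the sub-directory of log_root with the highest numeric suffix, or None."""
    pattern = re.compile(r"_(\d+)$")
    candidates = []
    for path in Path(log_root).iterdir():
        match = pattern.search(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # compare numerically, so version_10 beats version_9
    return max(candidates, key=lambda pair: pair[0])[1]
```

Calling `latest_version_dir("training/logs/lightning_logs")` would play the role of `latest_log` above.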

To view results, we need to launch a TensorBoard server --
much like we need to launch a Jupyter server to use Jupyter notebooks.

The cell below loads an extension that lets you use TensorBoard inside of a notebook
the same way you'd use it from the command line.

In [None]:
%load_ext tensorboard

In [None]:
# same command works in terminal, with "{arguments}" replaced with values or "$VARIABLES"

port = 11717  # pick an open port on your machine
host = "0.0.0.0" # allow connections from the internet
                 #   watch out! make sure you turn TensorBoard off

%tensorboard --logdir {latest_log} --port {port} --host {host}

If you've run many experiments on this machine,
you can see all of their results by pointing TensorBoard
at the whole `lightning_logs` directory,
rather than just one experiment:

In [None]:
%tensorboard --logdir training/logs/lightning_logs --port {port + 1} --host "0.0.0.0"

For large numbers of experiments, the management experience is not great,
and it's especially difficult to move across groups of experiments or to collaborate,
which are important as applications mature and teams grow.

TensorBoard is an independent service, so we need to make sure we turn it off when we're done.

In [None]:
import tensorboard.notebook

# get the process IDs for all tensorboard instances
pids = [tb.pid for tb in tensorboard.notebook.manager.get_all()]

done_with_tensorboard = False

if done_with_tensorboard:
    for pid in pids:
        !kill -9 {pid} 2> /dev/null

# Experiment Management with Weights & Biases

### What tools are available for experiment management?

TensorBoard is powerful and flexible and very scalable,
but running it requires engineering effort and babysitting --
you're running a database, writing data to it,
and layering a web application over it.
This is a fairly common workflow for web developers,
but not so much for ML engineers.

We can use [tensorboard.dev](https://tensorboard.dev/),
and it's as simple as running the command `tensorboard dev upload`
pointed at your logging directory.

But there are strict limits to the free tier:
1GB of tensor data and 1GB of binary data.
A single Text Recognizer compiled model checkpoint is ~500MB,
and that's not particularly large for a useful model.
Furthermore, all data is public --
tensorboard.dev works very well for academic and open projects
but not for industrial ML.

Alternatively,
we could use [git LFS](https://git-lfs.github.com/)
to track binary data and tensor data,
which is more likely to be sensitive than metrics.
But this separation is unnatural:
model binaries, metrics,
and tracked inputs/outputs
are all needed to debug model development pipelines,
and fragmenting them across services makes debugging harder.
Additionally, git-style versioning is an awkward fit for logging --
is it really sensible to create a new commit for each logging event?

The Hugging Face ecosystem uses TensorBoard and git LFS.
The Hugging Face Hub, a git server much like GitHub,
[will host TensorBoard alongside models](https://huggingface.co/docs/hub/tensorboard)
and officially has
[no storage limit](https://discuss.huggingface.co/t/is-there-a-size-limit-for-dataset-hosting/14861/4),
avoiding the
[tight bandwidth and storage limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage)
that make using git LFS with GitHub infeasible.
Using the hub requires maintaining an additional git remote or switching from GitHub,
which is infeasible for many projects.

There are multiple alternatives to TensorBoard.
The primary [open governance](https://www.ibm.com/blogs/cloud-computing/2016/10/27/open-source-open-governance/)
tool is [MLflow](https://github.com/mlflow/mlflow/)
and there are a number of
[closed-governance and/or closed-source tools](https://www.reddit.com/r/MachineLearning/comments/q5g7m9/n_sagemaker_experiments_vs_comet_neptune_wandb_etc/).

These tools generally avoid any need to worry about hosting
(unless data governance rules require a self-hosted version).

Among them, the FSDL recommendation is
[Weights & Biases](https://wandb.ai),
which we believe offers the best user experience,
the best integrations with other tools,
including
[Lightning](https://docs.wandb.ai/guides/integrations/lightning),
[Keras](https://docs.wandb.ai/guides/integrations/keras),
[Jupyter](https://docs.wandb.ai/guides/track/jupyter),
and even
[TensorBoard](https://docs.wandb.ai/guides/integrations/tensorboard),
and the best tools for collaboration.
For a broad set of opinions on experiment management tools,
see these discussions:

- r/mlops: [1](https://www.reddit.com/r/mlops/comments/uxieq3/is_weights_and_biases_worth_the_money/), [2](https://www.reddit.com/r/mlops/comments/sbtkxz/best_mlops_platform_for_2022/)
- r/MachineLearning: [3](https://www.reddit.com/r/MachineLearning/comments/sqa36p/comment/hwls9px/?utm_source=share&utm_medium=web2x&context=3)

In [None]:
import wandb

print(wandb.__doc__)

The integration is simple:
we get most of it just by changing a single variable, `logger`, from
`TensorBoardLogger` to `WandbLogger`:

In [None]:
!grep "args.wandb" -A 5 training/run_experiment.py | head -n 6

In order to complete the rest of this notebook,
you'll need a Weights & Biases account.

As with GitHub, the free tier is very generous for work that is open to the public
but much more limited for work that is private.

The Text Recognizer project will fit comfortably within the free tier.

Run the cell below and follow the prompts to log in or create an account,
or go [here](https://wandb.ai/signup).

In [None]:
!wandb login

Run the cell below to launch an experiment tracked with Weights & Biases.

The experiment can take between 3 and 10 minutes to run.
In that time, continue reading.

In [None]:
%%time
%run training/run_experiment.py --model_class LineCNNTransformer --data_class IAMLines \
  --loss transformer --batch_size 32 --gpus {gpus} --max_epochs 10 \
  --log_every_n_steps 10 --wandb --limit_test_batches 0.1 \
  --limit_train_batches 0.1 --limit_val_batches 0.1
    
last_expt = wandb.run

wandb.finish()  # necessary in this style of notebook execution, not necessary in CLI

The output now includes additional information from `wandb`:
- data is saved locally
- data is also synced to the W&B servers

What's `wandb` doing?

It launches a separate process to "listen" for events and upload them.

At the end of the run, it prints:
- sparklines
- summary metrics

These also show up in the command line,
with a little less pizzazz.
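
The "separate process that listens for events" pattern can be illustrated with a toy version. This is only a sketch of the general idea, using a thread and an in-memory list in place of a real process and a network upload. It is not how `wandb` is implemented, and `BackgroundUploader` is a hypothetical name.

```python
import queue
import threading


class BackgroundUploader:
    """Toy event listener: training code enqueues events, a worker 'uploads' them."""

    _STOP = object()  # sentinel that tells the worker to shut down

    def __init__(self):
        self.events = queue.Queue()
        self.uploaded = []  # stands in for a remote server
        self.worker = threading.Thread(target=self._listen, daemon=True)
        self.worker.start()

    def log(self, event):
        # returns immediately, so the training loop is not blocked on I/O
        self.events.put(event)

    def _listen(self):
        while True:
            event = self.events.get()
            if event is self._STOP:
                break
            self.uploaded.append(event)  # a real client would send this over the network

    def finish(self):
        # everything queued before the sentinel is handled first, then the worker stops
        self.events.put(self._STOP)
        self.worker.join()


uploader = BackgroundUploader()
for step in range(3):
    uploader.log({"step": step, "train/loss": 1.0 / (step + 1)})
uploader.finish()
```

The explicit `finish` step in this toy is loosely analogous to the `wandb.finish()` call in the cell above: the background worker needs a signal that no more events are coming.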

## Runs

The main interface for W&B is a web application.

To view results, head to the link in the notebook output
labeled "Syncing run **{adjective}-{noun}-{number}**".

You can watch results stream in live at that link.

Once training is finished, you can run the cell below to print the URL:

In [None]:
print(last_expt.url)

For even more convenience, we can also see the results directly in the notebook by embedding:

In [None]:
IFrame(last_expt.url, width="100%", height=frame_height)

This is the [run page](https://docs.wandb.ai/ref/app/pages/run-page).

What do we have here?

A bunch of tabs with information about our experiment.

From top to bottom:

- [Overview](https://docs.wandb.ai/ref/app/pages/run-page#overview-tab)
  - (i) icon
  - high-level info
  - git repo and state
  - system hardware and hostname
- Charts
  - line plot icon
  - come back to this
- [System](https://docs.wandb.ai/ref/app/pages/run-page#system-tab)
  - computer chip icon
  - GPU metrics, CPU metrics, I/O and network metrics
- [Logs](https://docs.wandb.ai/ref/app/pages/run-page#logs-tab)
  - command prompt icon
  - stdout
- Model
  - undirected graph icon, not super helpful
- [Files](https://docs.wandb.ai/ref/app/pages/run-page#files-tab)
  - documents icon
  - conda-environment.yaml and requirements.txt
  - diff.patch
- [Artifacts](https://docs.wandb.ai/ref/app/pages/run-page#artifacts-tab)
  - drum storage icon, aka "stacked hockey pucks"
  - versioned binary files: `run_table`s of predictions and `model` checkpoints

- Charts redux
  - you can edit the charts
  - peep the gradient histos -- `wandb.watch`

Note that we have model inputs and outputs.
Logging them requires a bit more work than everything else on this page,
all of which comes from the W&B features built into Lightning.

This is achieved by a custom Lightning `Callback`.

In [None]:
from text_recognizer.callbacks.imtotext import ImageToTextTableLogger


ImageToTextTableLogger??

It logs structured tabular data to W&B,
where the columns can be rich media
and the tables support basic exploratory data analysis in the browser.
[Docs here](https://docs.wandb.ai/guides/data-vis/log-tables).

In [None]:
table_versions_url = last_expt.url.split("runs")[0] + f"artifacts/run_table/run-{last_expt.id}-trainpredictions/"
table_data_url = table_versions_url + "v0/files/train/predictions.table.json"

print(table_data_url)
IFrame(src=table_data_url, width="100%", height=frame_height)

## Projects

We can also view lots of runs at once on the [project page](https://docs.wandb.ai/ref/app/pages/project-page).

Let's see what this looks like for a longer-running project --
some of the debugging and feature addition work updating the course from 2021 to 2022.

In [None]:
project_url = "https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/workspace"

print(project_url)
IFrame(src=project_url, width="100%", height=frame_height)

## Artifacts

Store and version large binary files in the W&B cloud. [Docs](https://docs.wandb.ai/guides/artifacts/artifacts-core-concepts)

Click on one of the `model` checkpoints -- whichever version.

Overview: includes which run created this model checkpoint.

Metadata: includes hyperparameters and `validation/cer`.

Files: actual file contents of the artifact. Different between versions.

In [None]:
IFrame(src=last_expt.url + "/artifacts", width="100%", height=frame_height)

Storage limits,
as of August 2022:
  - 100GB of Artifacts
  - 100GB of experiment data

You can track your storage and compare it to limits at this URL:

In [None]:
storage_tracker_url = f"https://wandb.ai/usage/{last_expt.entity}"

print(storage_tracker_url)

## Reports

Raw data deluge:
if you're spun up on the project,
it's useful for exploration, discovery.

If not, not so useful -- just overwhelming.

We need to synthesize the raw logged data into information.
This helps us communicate with other stakeholders,
preserve knowledge and prevent repetition of work,
and surface insights faster.

These workflows are supported by the W&B Reports feature
([docs here](https://docs.wandb.ai/guides/reports)).

Below are some common use cases and an example for each.

### Dashboard

Structured subset of output from experiment,
designed for quickly surfacing issues or insights

Use cases:
- basic state of ongoing experiment
- comparing one experiment to another
- spinning yourself back up into context more quickly

In [None]:
dashboard_url = "https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/Training-Run-2022-06-02--VmlldzoyMTAyOTkw"

IFrame(src=dashboard_url, width="100%", height=frame_height)

### PR Documentation

One or a small number of charts making a clear point and connected to VCS state.

Use cases:
- intra-team communication
- record-keeping that points to raw info and makes it discoverable
- improving confidence in PR correctness (see troubleshooting & testing)

In [None]:
bugfix_doc_url = "https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/Overfit-Check-After-Refactor--VmlldzoyMDY5MjI1"

IFrame(src=bugfix_doc_url, width="100%", height=frame_height)

### "Blog Post"

Lots of prose, outlinks, context.

Use cases:
- external comms: branding, recruiting
- communication between teams

Example, from the Craiyon.ai / DALL·E Mini project, by FSDL alumnus
[Boris Dayma](https://twitter.com/borisdayma)
and others:

In [None]:
IFrame(src="https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-Mini-Explained-with-Demo--Vmlldzo4NjIxODA#training-dall-e-mini", width="100%", height=frame_height)

More examples from the FSDL Text Recognizer project
[here](https://wandb.ai/cfrye59/fsdl-text-recognizer-2021-training/reports/-Report-of-Reports---VmlldzoyMjEwNDM5)
-- all of them organized into a Report!

## Programmatic Access

We can also access data programmatically via an API:

In [None]:
wb_api = wandb.Api()

For example, we can access the data we just logged:

In [None]:
run = wb_api.run("/".join([last_expt.entity, last_expt.project, last_expt.id]))  # fetch a run given your username, the project, and the run's ID

hist = run.history()  # and pull down a sample of the data as a pandas DataFrame

hist.head(5)

In [None]:
hist.groupby("epoch")["train/loss"].mean()

including the artifacts:

In [None]:
# which artifacts were created and logged?
artifacts = run.logged_artifacts()

for artifact in artifacts:
    print(f"artifact of type {artifact.type}: {artifact.name}")

meaning we can easily recreate training or validation data that came out of our `DataLoader`s,
which is normally ephemeral:

In [None]:
from pathlib import Path

artifact = wb_api.artifact(f"{last_expt.entity}/{last_expt.project}/run-{last_expt.id}-trainpredictions:latest")
artifact_dir = Path(artifact.download(root="training/logs"))
image_dir = artifact_dir / "media" / "images"

images = [path for path in image_dir.iterdir()]

In [None]:
import random

from IPython.display import Image

Image(str(random.choice(images)))

### Advanced W&B API Usage: MLOps

One of the strengths of a well-instrumented experiment tracking system is that it allows
automatic relation of information:
what were the inputs when this model's gradient spiked?
which models have been trained on this dataset,
and what was their performance?

The cells below pull down the training data
for the model currently running the FSDL Text Recognizer app.

This is just intended as a demonstration of what's possible,
so don't worry about understanding every piece of this.

We start from the same project that we used to look at the project view.

In [None]:
text_recognizer_project = wb_api.project("fsdl-text-recognizer-2021-training", entity="cfrye59")

text_recognizer_project  

and then we search it for the text recognizer model currently being used in production:

In [None]:
# collect all versions of the text-recognizer ever put into production by...

for art_type in text_recognizer_project.artifacts_types(): # looking through all artifact types
    if art_type.name == "prod-ready":  # for the prod-ready type
        # and grabbing the text-recognizer
        production_text_recognizers = art_type.collection("paragraph-text-recognizer").versions()

# and then get the one that's currently being tested in CI by...
for text_recognizer in production_text_recognizers:
    if "ci-test" in text_recognizer.aliases:  # looking for the one that's labeled as CI-tested
        in_prod_text_recognizer = text_recognizer

# view its metadata at the url or in the notebook
in_prod_text_recognizer_url = text_recognizer_project.url[:-9] + f"artifacts/{in_prod_text_recognizer.type}/{in_prod_text_recognizer.name.replace(':', '/')}"

print(in_prod_text_recognizer_url)
IFrame(src=in_prod_text_recognizer_url, width="100%", height=frame_height)
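
The URL above is built by slicing a fixed number of characters off the project URL. A slightly more explicit sketch follows. The URL layout is inferred from the string manipulation in the cell above rather than from documented W&B behavior, `artifact_page_url` is a hypothetical helper, and the version `v3` in the example call is made up.

```python
def artifact_page_url(entity, project, artifact_type, artifact_name, base="https://wandb.ai"):
    """Build the web URL for an artifact version named like 'collection:v3'."""
    collection, _, version = artifact_name.partition(":")
    parts = [base, entity, project, "artifacts", artifact_type, collection]
    if version:  # the name may or may not carry a version or alias
        parts.append(version)
    return "/".join(parts)


example_url = artifact_page_url(
    "cfrye59",
    "fsdl-text-recognizer-2021-training",
    "prod-ready",
    "paragraph-text-recognizer:v3",  # made-up version, for illustration only
)
print(example_url)
```

Passing the pieces explicitly avoids depending on the exact length of the suffix being sliced off.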

From its metadata, we can get information about how it was "staged" to be put into production,
and in particular which model checkpoint was used:

In [None]:
staging_run = in_prod_text_recognizer.logged_by()

training_ckpt, = [at for at in staging_run.used_artifacts() if at.type == "model"]
training_ckpt.name

That checkpoint was logged by a training experiment, which is available as metadata.

We can look at the training run for that model, either here in the notebook or at its URL:

In [None]:
training_run = training_ckpt.logged_by()
print(training_run.url)
IFrame(src=training_run.url, width="100%", height=frame_height)

and from there, we can pull down the logged data and analyze it locally.

In [None]:
training_results = training_run.history(samples=10000)
training_results.head()

In [None]:
ax = training_results.groupby("epoch")["train/loss"].mean().plot();
training_results["validation/loss"].dropna().plot(logy=True); ax.legend();

# Hyperparameter Optimization

Many of our choices, like the depth of our network or the parameters of our optimizer,
cannot be (easily) chosen by descent of the gradient of a loss function.

These parameters that impact the values of the parameters we directly optimize with gradients,
or _hyperparameters_,
can also be optimized.

Simple and straightforward versions are best, and
Weights & Biases makes the most straightforward forms of hyperparameter optimization easy.

We can use the same training script and we don't need to run an optimization server.

We just need to write a configuration yaml file. [docs](https://docs.wandb.ai/guides/sweeps/configuration)

In [None]:
%%writefile training/simple-sweep.yaml
# first we specify what we're sweeping
# we specify a program to run
program: training/run_experiment.py
# we optionally specify how to run it, including setting default arguments
command:  
    - ${env}
    - ${interpreter}
    - ${program}
    - "--wandb"
    - "--overfit_batches"
    - "1"
    - "--log_every_n_steps"
    - "25"
    - "--max_epochs"
    - "100"
    - "--limit_test_batches"
    - "0"
    - ${args}  # these arguments come from the sweep parameters below

# and we specify which parameters to sweep over, what we're optimizing, and how we want to optimize it
method: random  # generally, random searches perform well, can also be "grid" or "bayes"
metric:
    name: train/loss
    goal: minimize
parameters:  
    # LineCNN hyperparameters
    window_width:
        values: [8, 16, 32, 64]
    window_stride:
        values: [4, 8, 16, 32]
    # Transformer hyperparameters
    tf_layers:
        values: [1, 2, 4, 8]
    # we can also fix some values, just like we set default arguments
    gpus:
        value: 1
    model_class:
        value: LineCNNTransformer
    data_class:
        value: IAMLines
    loss:
        value: transformer

From the config we launch a "controller":
a lightweight process that just decides what hyperparameters to try next
and coordinates the heavier-weight training.

This lives on the W&B servers, so no headaches about opening communication,
cleaning up when it's done, etc.
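
As a rough picture of what a `random` controller decides, the sketch below samples one configuration from the `values` lists in the YAML above and renders it as command-line arguments. It is a toy stand-in, not W&B's implementation. `sample_config` and `to_cli_args` are hypothetical names, and the `--key=value` format is an assumption about how `${args}` gets expanded.

```python
import math
import random

# the swept and fixed parameters from training/simple-sweep.yaml
parameters = {
    "window_width": {"values": [8, 16, 32, 64]},
    "window_stride": {"values": [4, 8, 16, 32]},
    "tf_layers": {"values": [1, 2, 4, 8]},
    "gpus": {"value": 1},
    "model_class": {"value": "LineCNNTransformer"},
    "data_class": {"value": "IAMLines"},
    "loss": {"value": "transformer"},
}


def sample_config(parameters, rng):
    """Pick one option uniformly at random per swept parameter; keep fixed values as-is."""
    config = {}
    for name, spec in parameters.items():
        config[name] = rng.choice(spec["values"]) if "values" in spec else spec["value"]
    return config


def to_cli_args(config):
    """Render a config as command-line arguments (assumed --key=value format)."""
    return [f"--{name}={value}" for name, value in config.items()]


# size of the full grid: the product of the number of options per swept parameter
n_combinations = math.prod(
    len(spec["values"]) for spec in parameters.values() if "values" in spec
)
print(n_combinations)  # 64

config = sample_config(parameters, random.Random(0))
print(to_cli_args(config))
```

With only 64 combinations, a `grid` search would also be feasible for this small sweep; random search becomes more attractive as the number of swept parameters grows.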

In [None]:
!wandb sweep training/simple-sweep.yaml --project fsdl-line-recognizer-2022
simple_sweep_id = wb_api.project("fsdl-line-recognizer-2022").sweeps()[0].id

and then we can launch an "agent" to follow the orders of the controller:

In [None]:
%%time

# interrupt twice to terminate this cell if it's running too long,
#   can be over 15 minutes with some hyperparameters

!wandb agent --project fsdl-line-recognizer-2022 --entity {wb_api.default_entity} --count=1 {simple_sweep_id}

We set the `--count` of runs to execute to just `1` to reduce the runtime.

If `--count` is not provided, the agent will keep going until the sweep is terminated,
which can be done from the W&B interface.
For random or Bayesian sweeps, that means it runs forever unless you stop it.

One fun trick: use environment variables to launch parallel experiments on multiple GPUs on the same machine.

```
CUDA_VISIBLE_DEVICES=0 wandb agent $SWEEP_ID
# open another terminal
CUDA_VISIBLE_DEVICES=1 wandb agent $SWEEP_ID
# and so on
```
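
The same trick can be scripted. The sketch below only builds the command and environment for each agent, one per GPU, and leaves the actual launch commented out. `agent_launch_specs` is a hypothetical helper and the sweep ID is a placeholder.

```python
import os


def agent_launch_specs(sweep_id, gpu_ids):
    """Return one (command, environment) pair per GPU, each pinned via CUDA_VISIBLE_DEVICES."""
    specs = []
    for gpu_id in gpu_ids:
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        specs.append((["wandb", "agent", sweep_id], env))
    return specs


specs = agent_launch_specs("SWEEP_ID_PLACEHOLDER", gpu_ids=[0, 1])

# to actually start the agents in parallel:
# import subprocess
# processes = [subprocess.Popen(cmd, env=env) for cmd, env in specs]
# for process in processes:
#     process.wait()
```

Copying `os.environ` rather than building a fresh dictionary keeps `PATH` and any login credentials available to each child process.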

# Exercises

### 🌟Contribute to a hyperparameter search.

We've kicked off a big hyperparameter search on the `LineCNNTransformer` that anyone can join!

There are ~10,000,000 potential hyperparameter combinations,
and each takes 30 minutes to test,
so checking each possibility will take over 500 years of compute time.
Best get cracking then!
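
The back-of-the-envelope arithmetic, using the round numbers quoted above:

```python
n_combinations = 10_000_000  # approximate count quoted above
minutes_per_run = 30

total_hours = n_combinations * minutes_per_run / 60
total_years = total_hours / (24 * 365)

print(f"{total_hours:,.0f} hours, or about {total_years:.0f} years on a single machine")
# 5,000,000 hours, or about 571 years on a single machine
```

Running agents on many machines in parallel divides the wall-clock time accordingly, which is the point of a shared sweep.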

Run the cell below to pull up a dashboard and print the URL where you can check on the current status.

In [None]:
sweep_entity = "fullstackdeeplearning"
sweep_project = "fsdl-line-recognizer-2022"
sweep_id = "e0eo43eu"
sweep_url = f"https://wandb.ai/{sweep_entity}/{sweep_project}/sweeps/{sweep_id}"

print(sweep_url)
IFrame(src=sweep_url, width="100%", height=frame_height)

We can also retrieve information about the sweep from the API,
including the hyperparameters being swept over.

In [None]:
sweep_info = wb_api.sweep("/".join([sweep_entity, sweep_project, sweep_id]))

In [None]:
hyperparams = sweep_info.config["parameters"]
hyperparams

If you'd like to contribute to this sweep,
run the cell below after changing the count to a number greater than 0.

Each iteration runs for 30 minutes if it does not crash,
e.g. due to out-of-memory errors.

In [None]:
count = 0  # off by default, increase it to join in!

if count:
    !wandb agent {sweep_id} --entity {sweep_entity} --project {sweep_project} --count {count}

### 🌟🌟 Find good hyperparameters for the `LineCNNTransformer`.

If you observe interesting phenomena during training,
from promising hyperparameter combos to software bugs to strange model behavior,
turn the charts into a report and share it with the FSDL community or
[open an issue on GitHub](https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022/issues)
with a link to them.

In [None]:
# check the `sweep_info.config` above for hyperparameters or see the --help output for potential arguments
%run training/run_experiment.py --model_class LineCNNTransformer --data_class IAMLines \
  --loss transformer --batch_size 32 --gpus {gpus} --max_epochs 5 \
  --log_every_n_steps 50 --wandb --limit_test_batches 0.1 \
  --limit_train_batches 0.1 --limit_val_batches 0.1 \
  --help
    
last_hyperparam_expt = wandb.run  # in case you want to pull URLs, look up in the API, etc., as in code above

wandb.finish()

### 🌟🌟🌟 Add logging of tensor statistics.

Use `torchmetrics`: `MinMetric`, `MaxMetric`, and `MeanMetric` are good candidates, but start with just one!

To use it with `training/run_experiment.py`, you'll need to
- define it in a Python file somewhere, e.g. `text_recognizer/metrics`
- add the metrics to `BaseImageToTextLitModel`'s `__init__` method, where `CharacterErrorRate` appears
  - decide whether to calculate separate train/val versions (whatever you do, start with just one of them!)
- in the appropriate methods of the `TransformerLitModel`, add metric calculation and logging for `Min`, `Max`, or `Mean`.
  - base on the calculation and logging of `val_cer`
  - `sync_dist=True` is only important in distributed settings, so you might not notice any issues regardless of that argument's value

Bonus Challenge: use `MeanSquaredError` to implement a `VarianceMetric`. Hint: one way is to use `torch.zeros_like` and `torch.mean`.