[feat] Model version control using W&B Artifacts (#1137)
Summary: 🚀 I have extended the `WandbLogger` with the ability to log the `current.pt` checkpoint as W&B Artifacts. Note that this PR is based on top of this [PR](#1129).

### What is W&B Artifacts?

> W&B Artifacts was designed to make it effortless to version your datasets and models, regardless of whether you want to store your files with us or whether you already have a bucket you want us to track. Once you've tracked your dataset or model files, W&B will automatically log each and every modification, giving you a complete and auditable history of changes to your files.

Through this PR, W&B Artifacts can help save and organize machine learning models throughout a project's lifecycle. More details in the documentation [here](https://docs.wandb.ai/guides/artifacts/model-versioning).

### Modification

This PR adds a `log_model_checkpoint` method to the `WandbLogger` class in the `utils/logger.py` file. This method is called in the `utils/checkpoint.py` file.

### Usage

To use this, set `training.wandb.enabled=true` and `training.wandb.log_checkpoint=true` in `config/defaults.yaml`.

### Result

The screenshot shows the `current.pt` checkpoints saved at intervals defined by `training.checkpoint_interval`. You can check out the logged artifacts page [here](https://wandb.ai/ayut/mmf/artifacts/model/run_ey9xextf_model/0dc64164acbdc300fd01/api).

![image](https://user-images.githubusercontent.com/31141479/139390462-d5c8445e-5c20-4fdd-85d0-51ef64846bf0.png)

### Superpowers

With this small addition, one can now easily track different versions of the model, download a checkpoint of interest by using the API in the API tab, easily share the checkpoints with teammates, etc.

### Requests

This is a draft PR as there are a few more things that can be improved here.

* Is there a better way to access the path to the `current.pt` checkpoint? Or rather, is the modification made to `utils/checkpoint.py` an acceptable way of approaching this?
* While logging a file as a W&B artifact we can also provide metadata associated with that file. In this case, we can add the current iteration, training metrics, etc. as the metadata. Would love to get suggestions about the different data points that I should log as metadata alongside the checkpoints.
* How to determine if a checkpoint is the best one? If a checkpoint is the best, I can add `best` as an alias for that checkpoint's artifact.

Pull Request resolved: #1137

Test Plan: Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: mmf**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/)|
|**Modified Pages**|
|[docs/notes/logger](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/docs/notes/logger/)|

Reviewed By: apsdehal

Differential Revision: D32402090

Pulled By: ebsmothers

fbshipit-source-id: 94b881ec55c4197301331d571bc926521e2feecc
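To make the idea in the summary concrete, here is a minimal sketch of logging a checkpoint file as a W&B artifact with metadata and an optional `best` alias. It is not the `log_model_checkpoint` method from this PR: the helper names `checkpoint_artifact_spec` and `log_checkpoint`, the `iter_<n>` alias, and the metadata keys are assumptions made for illustration. The `run_<id>_model` naming only mirrors the artifact name visible in the URL above.

```python
from typing import Any, Dict, List, Optional


def checkpoint_artifact_spec(
    run_id: str,
    iteration: int,
    metrics: Optional[Dict[str, float]] = None,
    is_best: bool = False,
) -> Dict[str, Any]:
    """Describe the artifact for one checkpoint (hypothetical helper)."""
    # One alias per iteration; "best" is only attached when requested.
    aliases: List[str] = [f"iter_{iteration}"]
    if is_best:
        aliases.append("best")
    return {
        "name": f"run_{run_id}_model",  # assumed naming scheme
        "type": "model",
        "aliases": aliases,
        "metadata": {"iteration": iteration, **(metrics or {})},
    }


def log_checkpoint(run: Any, checkpoint_path: str, spec: Dict[str, Any]) -> None:
    """Upload the checkpoint file to W&B as a versioned artifact."""
    # Imported lazily so the spec helper above works without wandb installed.
    import wandb

    artifact = wandb.Artifact(
        spec["name"], type=spec["type"], metadata=spec["metadata"]
    )
    artifact.add_file(checkpoint_path, name="current.pt")
    run.log_artifact(artifact, aliases=spec["aliases"])
```

With an active run (for example `run = wandb.init(project="mmf")`), calling `log_checkpoint(run, "save/current.pt", checkpoint_artifact_spec(run.id, 1000))` would create a new artifact version each time the file changes; the path `save/current.pt` is a placeholder.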
1 parent b6a5804, commit c9ab349

Showing 5 changed files with 115 additions and 33 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,42 +1,75 @@ | ||
--- | ||
id: concepts | ||
title: Terminology and Concepts | ||
sidebar_label: Terminology and Concepts | ||
id: logger | ||
title: Weights and Biases Logging | ||
sidebar_label: Weights and Biases Logging | ||
--- | ||
|
||
## Weights and Biases Logger | ||
|
||
MMF has a `WandbLogger` class which lets the user to log their model's progress using [Weights and Biases](https://gitbook-docs.wandb.ai/). | ||
MMF now has a `WandbLogger` class that lets users log their model's progress using [Weights and Biases](https://wandb.ai/site). Enable this logger to automatically log the training/validation metrics, system (GPU and CPU) metrics, and configuration parameters. | ||
|
||
## First time setup | ||
|
||
To set up wandb, run the following: | ||
``` | ||
pip install wandb | ||
``` | ||
To log anything to the W&B server, you need to authenticate the machine with your W&B **API key**. You can create a new account at https://wandb.ai/signup, which will generate an API key. If you are an existing user, you can retrieve your key from https://wandb.ai/authorize. You only need to supply your key once; it is then remembered on the same device. | ||
|
||
``` | ||
wandb login | ||
``` | ||
|
||
## W&B config parameters | ||
|
||
The following options are available in config to enable and customize the wandb logging: | ||
```yaml | ||
training: | ||
# Weights and Biases control, by default Weights and Biases (wandb) is disabled | ||
wandb: | ||
# Whether to use Weights and Biases Logger, (Default: false) | ||
enabled: false | ||
enabled: true | ||
# An entity is a username or team name where you're sending runs. | ||
# This is necessary if you want to log your metrics to a team account. By default | ||
# it will log the run to your user account. | ||
entity: null | ||
# Project name to be used while logging the experiment with wandb | ||
wandb_projectname: mmf_${oc.env:USER} | ||
project: mmf | ||
# Experiment/ run name to be used while logging the experiment | ||
# under the project with wandb | ||
wandb_runname: ${training.experiment_name} | ||
name: ${training.experiment_name} | ||
# Specify other argument values that you want to pass to wandb.init(). Check out the documentation | ||
# at https://docs.wandb.ai/ref/python/init to see what arguments are available. | ||
# job_type: 'train' | ||
# tags: ['tag1', 'tag2'] | ||
env: | ||
wandb_logdir: ${env:MMF_WANDB_LOGDIR,} | ||
``` | ||
To enable wandb logger the user needs to change the following option in the config. | ||
|
||
`training.wandb.enabled=True` | ||
* To enable wandb logger the user needs to change the following option in the config. | ||
|
||
`training.wandb.enabled=True` | ||
|
||
* To give the `entity` which is the name of the team or the username, the user needs to change the following option in the config. In case no `entity` is provided, the data will be logged to the `entity` set as default in the user's settings. | ||
|
||
`training.wandb.entity=<teamname/username>` | ||
|
||
* To give the current experiment a project and run name, the user should add these config options. The default project name is `mmf` and the default run name is `${training.experiment_name}`. | ||
|
||
`training.wandb.project=<ProjectName>` <br /> | ||
`training.wandb.name=<RunName>` | ||
|
||
* To change the path to the directory where wandb metadata would be stored (Default: `env.log_dir`): | ||
|
||
`env.wandb_logdir=<dir_name>` | ||
|
||
To give the current experiment a project and run name, user should add these config options. | ||
* To provide extra arguments to `wandb.init()`, the user just needs to define them in the config file. Check out the documentation at https://docs.wandb.ai/ref/python/init to see what arguments are available. An example is shown in the config parameters above. Make sure to use the same key names in the config file as defined in the documentation. | ||
|
||
`training.wandb.wandb_projectname=<ProjectName> training.wandb.wandb_runname=<RunName>` | ||
## Current features | ||
|
||
To change the path to the directory where wandb metadata would be stored (Default: `env.log_dir`): | ||
The following features are currently supported by the `WandbLogger`: | ||
|
||
`env.wandb_logdir=<dir_name>` | ||
* Training & Validation metrics | ||
* Learning Rate over time | ||
* GPU: Type, GPU Utilization, power, temperature, CUDA memory usage | ||
* Log configuration parameters |
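
The docs above say that extra keys under `training.wandb` are passed to `wandb.init()`. One way this could work is sketched below, under stated assumptions: the function `wandb_init_kwargs` and the `CONTROL_KEYS` set are hypothetical and are not taken from MMF's `utils/logger.py`. The sketch only separates MMF-side switches from keyword arguments meant for `wandb.init()`.

```python
from typing import Any, Dict

# Keys assumed to be consumed by MMF itself rather than by wandb.init().
CONTROL_KEYS = {"enabled", "log_checkpoint"}


def wandb_init_kwargs(wandb_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the config entries that would be forwarded to wandb.init()."""
    return {k: v for k, v in wandb_config.items() if k not in CONTROL_KEYS}


# Mirrors the `training.wandb` block shown in the YAML above.
example_config = {
    "enabled": True,
    "log_checkpoint": True,
    "entity": None,
    "project": "mmf",
    "name": "my_experiment",
    "job_type": "train",
    "tags": ["tag1", "tag2"],
}

init_kwargs = wandb_init_kwargs(example_config)
```

The resulting dictionary could then be unpacked as `wandb.init(**init_kwargs)`, which is why the config key names need to match the parameter names in the `wandb.init()` documentation.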