[feat] Model version control using W&B Artifacts #1137

Closed
wants to merge 19 commits into from

ayulockin
Contributor

🚀 I have extended the WandbLogger with the ability to log the current.pt checkpoint as a W&B Artifact. Note that this PR builds on top of PR #1129.

What is W&B Artifacts?

W&B Artifacts was designed to make it effortless to version your datasets and models, regardless of whether you want to store your files with us or whether you already have a bucket you want us to track. Once you've tracked your dataset or model files, W&B will automatically log each and every modification, giving you a complete and auditable history of changes to your files.

Through this PR, W&B Artifacts can help save and organize machine learning models throughout a project's lifecycle. More details in the documentation here.

Modification

This PR adds a log_model_checkpoint method to the WandbLogger class in the utils/logger.py file. This method is called in the utils/checkpoint.py file.
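For readers unfamiliar with the Artifacts API, such a method might look roughly like the sketch below. This is not the code in this PR: the standalone function signature, the `run_<id>_model` artifact name, and the `latest` alias are assumptions made for illustration. Only `wandb.Artifact`, `Artifact.add_file`, and `Run.log_artifact` are actual W&B calls.

```python
import os
from typing import Any, Dict, List, Optional


def log_model_checkpoint(
    run: Any,
    checkpoint_path: str,
    metadata: Optional[Dict[str, Any]] = None,
    aliases: Optional[List[str]] = None,
) -> Optional[Any]:
    """Upload one checkpoint file as a version of a W&B model artifact.

    Hypothetical standalone version of the method; `run` is assumed to be
    the object returned by `wandb.init()`.
    """
    # Imported lazily so that training without W&B installed still works.
    import wandb

    if not os.path.isfile(checkpoint_path):
        # Nothing has been saved yet, so there is nothing to upload.
        return None

    # The artifact name is an assumption. Reusing one name per run lets W&B
    # store each changed upload as a new version of the same artifact.
    artifact = wandb.Artifact(
        name=f"run_{run.id}_model", type="model", metadata=metadata or {}
    )
    artifact.add_file(checkpoint_path, name="current.pt")
    run.log_artifact(artifact, aliases=aliases or ["latest"])
    return artifact
```

Keeping the file name inside the artifact fixed (`current.pt`) means consumers can locate the checkpoint without knowing the local path it was saved under.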

Usage

To use this, set training.wandb.enabled=true and training.wandb.log_checkpoint=true in config/defaults.yaml.
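The two flags are meant to be used together: checkpoint logging only makes sense when the W&B logger itself is enabled. A minimal sketch of that gating logic, assuming the config is available as a nested mapping (the helper name `should_log_checkpoint` is hypothetical and not part of this PR):

```python
from typing import Any, Mapping


def should_log_checkpoint(config: Mapping[str, Any]) -> bool:
    """Return True only if W&B is enabled AND checkpoint logging is requested."""
    wandb_cfg = config.get("training", {}).get("wandb", {})
    return bool(wandb_cfg.get("enabled", False)) and bool(
        wandb_cfg.get("log_checkpoint", False)
    )


# The dotted keys from the usage note, expanded into a nested structure.
config = {"training": {"wandb": {"enabled": True, "log_checkpoint": True}}}
print(should_log_checkpoint(config))  # True
```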

Result

The screenshot shows the current.pt checkpoints saved at intervals defined by training.checkpoint_interval. You can check out the logged artifacts page here.

image

Superpowers

With this small addition, one can track different versions of the model, download a checkpoint of interest using the snippet shown in the artifact's API tab, and share checkpoints with teammates.
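Downloading a logged checkpoint typically takes only a few lines. The sketch below uses the public W&B API (`wandb.Api().artifact(...)` and `Artifact.download(...)`); the helper name, the artifact reference format shown in the usage note, and the assumption that the file inside the artifact is named `current.pt` are illustrative rather than taken from this PR.

```python
import os
from typing import Optional


def download_checkpoint(artifact_ref: str, root: Optional[str] = None) -> str:
    """Download a model artifact and return the local path to its checkpoint.

    `artifact_ref` has the form "<entity>/<project>/<artifact_name>:<alias>".
    """
    # Imported lazily so the helper can be defined without W&B installed.
    import wandb

    api = wandb.Api()
    artifact = api.artifact(artifact_ref, type="model")
    # download() returns the directory the artifact files were written to.
    local_dir = artifact.download(root=root)
    # Assumes the checkpoint was added to the artifact under this name.
    return os.path.join(local_dir, "current.pt")
```

A call might look like `download_checkpoint("my-entity/my-project/run_<id>_model:latest")`, where the entity, project, and run id are placeholders to be replaced with real values.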

Requests

This is a draft PR as there are a few more things that can be improved here.

  • Is there a better way to access the path to the current.pt checkpoint? Put differently, is the modification made to utils/checkpoint.py an acceptable way of approaching this?

  • When logging a file as a W&B artifact, we can also attach metadata to it. In this case, we could add the current iteration, training metrics, etc. as metadata. I would love suggestions about which data points to log as metadata alongside the checkpoints.

  • How can I determine whether a checkpoint is the best one? If a checkpoint is the best, I can add best as an alias for that checkpoint's artifact.
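One possible answer to the last two questions, offered only as a sketch: keep track of a single monitored validation metric and attach the `best` alias whenever it improves, while recording the iteration and metrics as metadata. The class and function names, the choice of a single scalar metric, and the `val/` key prefix are all assumptions; none of this is in the PR. As far as I understand, an alias points to one artifact version at a time, so re-using `best` should move it to the newest improved checkpoint.

```python
from typing import Any, Dict, List, Optional


class BestCheckpointTracker:
    """Decide which aliases a checkpoint should get, given one monitored metric."""

    def __init__(self, minimize: bool = True) -> None:
        self.minimize = minimize
        self.best_value: Optional[float] = None

    def aliases_for(self, value: float) -> List[str]:
        """Return ["latest", "best"] if `value` improves on the best seen so far."""
        if self.best_value is None:
            improved = True
        elif self.minimize:
            improved = value < self.best_value
        else:
            improved = value > self.best_value
        if improved:
            self.best_value = value
            return ["latest", "best"]
        return ["latest"]


def build_metadata(iteration: int, metrics: Dict[str, float]) -> Dict[str, Any]:
    """Flatten the current iteration and validation metrics into a metadata dict."""
    metadata: Dict[str, Any] = {"iteration": iteration}
    for key, value in metrics.items():
        metadata[f"val/{key}"] = float(value)
    return metadata


tracker = BestCheckpointTracker(minimize=True)
print(tracker.aliases_for(0.9))  # ['latest', 'best']
print(tracker.aliases_for(1.1))  # ['latest']
print(build_metadata(1000, {"loss": 0.9}))  # {'iteration': 1000, 'val/loss': 0.9}
```

The returned aliases and metadata could then be passed to whatever call logs the artifact.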

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Oct 29, 2021
@ayulockin ayulockin marked this pull request as ready for review November 8, 2021 08:29
@ayulockin
Contributor Author

Hey @ebsmothers, thought of tagging you here for visibility since you looked over my first PR.

@apsdehal
Contributor

apsdehal commented Nov 9, 2021

@ayulockin Thanks for the PR! Give us a few days to review this. We will get back to you soon.

Contributor

@ebsmothers ebsmothers left a comment

Thanks for the PR, and for your patience on the review. The changes look good. Can you rebase to factor out the changes from PR#1129? Alternatively we can just close the other PR and use this one instead, whichever you prefer.

@ayulockin
Contributor Author

Hey @ebsmothers, I rebased to factor in the changes. This PR now contains all the changes from PR#1129. Please take a look and let me know what you think. If you want, you can close PR#1129.

@facebook-github-bot
Contributor

@ebsmothers has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ayulockin has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@ebsmothers has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Nov 23, 2021
Summary:
🚀 I have extended the `WandbLogger` with the ability to log the `current.pt` checkpoint as W&B Artifacts. Note that this PR is based on top of this [PR](#1129).

### What is W&B Artifacts?

> W&B Artifacts was designed to make it effortless to version your datasets and models, regardless of whether you want to store your files with us or whether you already have a bucket you want us to track. Once you've tracked your dataset or model files, W&B will automatically log each and every modification, giving you a complete and auditable history of changes to your files.

Through this PR, W&B Artifacts can help save and organize machine learning models throughout a project's lifecycle. More details in the documentation [here](https://docs.wandb.ai/guides/artifacts/model-versioning).

### Modification

This PR adds a `log_model_checkpoint` method to the `WandbLogger` class in the `utils/logger.py` file. This method is called in the `utils/checkpoint.py` file.

### Usage

To use this, in the `config/defaults.yaml` do, `training.wandb.enabled=true` and `training.wandb.log_checkpoint=true`.

### Result

The screenshot shows the `current.pt` checkpoints saved at intervals defined by `training.checkpoint_interval`. You can check out the logged artifacts page [here](https://wandb.ai/ayut/mmf/artifacts/model/run_ey9xextf_model/0dc64164acbdc300fd01/api).

![image](https://user-images.githubusercontent.com/31141479/139390462-d5c8445e-5c20-4fdd-85d0-51ef64846bf0.png)

### Superpowers

With this small addition, now one can easily track different versions of the model, download a checkpoint of interest by using the API in the API tab, easily share the checkpoints with teammates, etc.

### Requests

This is a draft PR as there are a few more things that can be improved here.

* Is there a better way to access the path to the `current.pt` checkpoint? Rather is the modification made to `utils/checkpoint.py` an acceptable way of approaching this?

* While logging a file as W&B artifacts we can also provide metadata associated with that file. In this case, we can add current iteration, training metrics, etc. as the metadata. Would love to get suggestions about the different data points that I should log as metadata alongside the checkpoints.

* How to determine if a checkpoint is the best one? If a checkpoint is best I can add `best` as an alias for that checkpoint's artifact.

Pull Request resolved: #1137

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: mmf**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/)|

|**Modified Pages**|
|[docs/notes/logger](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/docs/notes/logger/)|

Reviewed By: apsdehal

Differential Revision: D32402090

Pulled By: ebsmothers

fbshipit-source-id: 94b881ec55c4197301331d571bc926521e2feecc