skorch doctor: a tool to understand the net (#912)
A helper class to assist in understanding the neural net training

The SkorchDoctor helper class lets users wrap their neural net before training;
it then automatically collects data that helps them understand what is going on
during training and how to possibly improve it.

The class automatically records the activations of each module, as well as the
gradients and updates of each learnable parameter, for every training step. Once
training is finished, the user can either look at the data directly, which is
stored as attributes on the helper class, or use one of the provided plotting
functions (requires matplotlib) to plot distributions of the data.
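
A minimal sketch of the intended usage (see the docs and notebook below for the
full example):

    from skorch.helper import SkorchDoctor

    doctor = SkorchDoctor(net)    # wrap an initialized skorch net
    doctor.fit(X[:100], y[:100])  # a small sample and few epochs suffice

    # inspect the recorded data directly ...
    doctor.activation_logs_
    doctor.gradient_logs_
    doctor.param_update_logs_

    # ... or plot distributions (requires matplotlib)
    doctor.plot_activations()
    doctor.plot_gradients()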

Examples of what conclusions could be drawn from the data:

- Net is not powerful enough
- Need for better weight initialization or normalization
- Need to adjust optimizer
- Need for gradient clipping

However, the helper class will not suggest any of those solutions itself; I
don't think that's possible. It is only intended to help surface potential
problems; it's up to the user to decide on a solution.

A notebook showing the usage of SkorchDoctor, once for a simple MLP and once
for fine-tuning a BERT model, is provided:

https://github.com/skorch-dev/skorch/blob/skorch-doctor/notebooks/Skorch_Doctor.ipynb

Implementation

Because of the additional data being collected, depending on the use case, a
significant memory overhead is expected. To keep this in check, a few measures
are taken:

- The collected data is immediately converted to numpy to avoid filling up GPU
  memory.
- It is documented, and shown in the examples, that you should use only a small
  amount of data and a low number of epochs, since that is enough to understand
  most problems. Most notably, this reduces the amount of activation data that
  needs to be stored.
- For parameter updates, only a single scalar per weight/bias is stored,
  indicating the relative magnitude of the update (one plausible formula is
  sketched after this list).
- The biggest overhead will most likely come from storing the gradients; it is
  unclear whether anything can be done here without losing too much useful
  data.
- An option is provided to filter by layer/parameter name.
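
To illustrate the parameter update item above, here is one plausible way to
compute such a relative magnitude scalar (a sketch for illustration, not
necessarily the exact formula used):

    import torch

    def relative_update(param_before, param_after, eps=1e-9):
        # ratio of typical update size to typical parameter size;
        # hypothetical formula, for illustration only
        step = (param_after - param_before).abs().mean()
        scale = param_before.abs().mean() + eps
        return (step / scale).item()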

For storing activations, some heuristics are in place to deal with the output of
the modules. The problem here is that modules can return arbitrary data from
their forward call. A few assumptions are made: the output can be passed to
to_numpy and has to be either a torch tensor, a list, a tuple, or a mapping of
torch tensors. If it is none of those, an error is raised.
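
A minimal sketch of that heuristic, reusing skorch.utils.to_numpy for single
tensors (an illustration of the stated assumptions, not the actual
implementation):

    from collections.abc import Mapping

    import torch
    from skorch.utils import to_numpy

    def output_to_numpy(output):
        # recursively convert a module's forward output to numpy
        if isinstance(output, torch.Tensor):
            return to_numpy(output)
        if isinstance(output, (list, tuple)):
            return [output_to_numpy(o) for o in output]
        if isinstance(output, Mapping):
            return {key: output_to_numpy(val) for key, val in output.items()}
        raise TypeError(f"Output of type {type(output)} is not supported")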

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
3 people committed May 4, 2023
1 parent 3c7fa60 commit 785b917
Showing 10 changed files with 3,625 additions and 5 deletions.
2 changes: 1 addition & 1 deletion CHANGES.md
@@ -8,9 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added

- Add support for compiled PyTorch modules using the `torch.compile` function, introduced in the [PyTorch 2.0 release](https://pytorch.org/get-started/pytorch-2.0/), which can greatly improve performance on new GPU architectures; to use it, initialize your net with the `compile=True` argument; further compilation arguments can be specified using the dunder notation, e.g. `compile__dynamic=True`
- Add a class [`DistributedHistory`](https://skorch.readthedocs.io/en/latest/history.html#skorch.history.DistributedHistory) which should be used when training in a multi-GPU setting (#955)
- `SkorchDoctor`: A helper class that assists in understanding and debugging the neural net training, see [this notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb) (#912)

### Changed

37 changes: 37 additions & 0 deletions docs/user/neuralnet.rst
@@ -523,3 +523,40 @@ Those arguments are used to initialize your ``module``, ``criterion``,
etc. They are not fixed because we cannot know them in advance; in
fact, you can define any parameter for your ``module`` or other
components.

Diagnosing problems during training
-----------------------------------

When you find that your net is not training very well but have no idea why, the
:class:`skorch.helper.SkorchDoctor` can come to your aid. This class wraps your
net and automatically records the activations of all layers, as well as the
gradients and updates of all parameters. On top of that, it provides a couple of
plotting functions to visualize those values (using them requires ``matplotlib``
to be installed):

.. code:: python

    net = NeuralNet(...)
    doctor = SkorchDoctor(net)
    X_sample, y_sample = X[:100], y[:100]  # a few samples are enough
    doctor.fit(X_sample, y_sample)

    # now use the attributes and plotting functions to better
    # understand the training process
    doctor.activation_logs_    # the recorded activations
    doctor.gradient_logs_      # the recorded gradients
    doctor.param_update_logs_  # the recorded parameter updates

    # the next steps require matplotlib to be installed
    doctor.plot_loss()
    doctor.plot_activations()
    doctor.plot_gradients()
    doctor.plot_param_updates()
    doctor.plot_activations_over_time(<layer-name>)
    doctor.plot_gradients_over_time(<param-name>)

These tools allow you to make more educated decisions about how to improve your
training process than just randomly trying out different hyper-parameters.

A more complete example can be found in the `SkorchDoctor notebook <https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb>`_.
2 changes: 2 additions & 0 deletions docs/user/tutorials.rst
@@ -28,3 +28,5 @@ The following are examples and notebooks on how to use skorch.
* `Hugging Face Finetuning <https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb>`_ - Fine-tune a BERT model for text classification with the Hugging Face transformers library and skorch. `Run in Google Colab 💻 <https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb>`_

* `Hugging Face Vision Transformer <https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_VisionTransformer.ipynb>`_ - Show how to fine-tune a vision transformer model for a classification task using the Hugging Face transformers library and skorch. `Run in Google Colab 💻 <https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_VisionTransformer.ipynb>`_

* `SkorchDoctor <https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb>`_ - Diagnosing problems in training your neural net. `Run in Google Colab 💻 <https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb>`_
1 change: 1 addition & 0 deletions notebooks/README.md
@@ -10,3 +10,4 @@
* [Hugging Face fine-tuning](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb)
* [Hugging Face Hub Checkpoints](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Model_Checkpoint.ipynb)
* [Hugging Face Vision Transformer](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_VisionTransformer.ipynb)
* [Skorch Doctor](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb)
