skorch doctor: a tool to understand the net (#912)
A helper class to assist in understanding the neural net training.

The SkorchDoctor helper class allows users to wrap their neural net before training and then automatically collects useful data that helps to better understand what is going on during training and how to possibly improve it. The class automatically records the activations of each module as well as the gradients and updates of each learnable parameter, all of those for each training step. Once training is finished, the user can either directly take a look at the data, which is stored as an attribute on the helper class, or use one of the provided plotting functions (requires matplotlib) to plot distributions of the data. A usage sketch is shown at the end of this description.

Examples of conclusions that could be drawn from the data:

- The net is not powerful enough.
- The weights need better initialization or normalization.
- The optimizer needs adjusting.
- Gradient clipping is needed.

However, the helper class will not suggest any of those solutions itself; I don't think that's possible. It is only intended to help surface potential problems; it's up to the user to decide on a solution.

A notebook showing the usage of SkorchDoctor, once for a simple MLP and once for fine-tuning a BERT model, is provided:

https://github.com/skorch-dev/skorch/blob/skorch-doctor/notebooks/Skorch_Doctor.ipynb

Implementation

Because of the additional data being collected, a significant memory overhead is expected, depending on the use case. To keep this in check, a few measures are taken:

- The collected data is immediately pulled to numpy to avoid clobbering GPU memory.
- It is documented, and shown in the examples, that only a small amount of data and a low number of epochs should be used, since that's enough to understand most problems. Most notably, this helps with storing less data about activations.
- For parameter updates, only a single scalar per weight/bias is stored, indicating the relative magnitude of the update.
- The biggest overhead will most likely come from storing the gradients; it's unclear whether something can be done here without losing too much useful data.
- An option is provided to filter by layer/parameter name.

For storing activations, some heuristics are in place to deal with the output of the modules. The problem here is that modules can return arbitrary data from their forward call. A few assumptions are made: The output is passed to to_numpy, so it has to be either a torch tensor, a list, a tuple, or a mapping of torch tensors. If it's none of those, an error is raised.

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
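To illustrate the intended workflow, below is a minimal sketch: wrap the net, fit on a small dataset for a few epochs, then inspect the recorded data or plot it. The import path, the attribute names (`activation_logs_`, `gradient_logs_`), and the plotting method names are assumptions based on the description above, not confirmed API; the linked notebook shows the actual usage. The dataset is synthetic, just to make the snippet self-contained.

```python
import numpy as np
from torch import nn

from skorch import NeuralNetClassifier
from skorch.helper import SkorchDoctor  # assumed import path

# small synthetic dataset: a small amount of data and few epochs
# are enough to surface most training problems
X = np.random.randn(128, 20).astype(np.float32)
y = np.random.randint(0, 2, size=128)

module = nn.Sequential(
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Linear(10, 2),
    nn.LogSoftmax(dim=-1),
)
net = NeuralNetClassifier(module, max_epochs=3)

doctor = SkorchDoctor(net)  # wrap the net before training
doctor.fit(X, y)            # trains the net while recording data each step

# the recorded data is stored as attributes on the helper class
# (attribute names here are assumptions, see the notebook for the real ones)
activations = doctor.activation_logs_
gradients = doctor.gradient_logs_

# plotting helpers require matplotlib (method names are assumptions too)
doctor.plot_activations()
doctor.plot_gradients()
```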