Change: refactor skorch for more consistency when adding custom modules etc. (#751)

* Don't reinitialize uninitialized net because of set_params

Previously, when a parameter on, say, the module was changed via
set_params (e.g. net.set_params(module__hidden_units=123)), set_params
would always trigger (re-)initialization of the module. However, when
the net was not initialized in the first place, this is unnecessary. It
is sufficient to set the new attribute and wait for the net to be
initialized later.
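
A minimal illustration of the new behavior (the module and the parameter values here are made up for the example):

```python
import torch
from torch import nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    def __init__(self, hidden_units=10):
        super().__init__()
        self.dense = nn.Linear(20, hidden_units)
        self.out = nn.Linear(hidden_units, 2)

    def forward(self, X):
        return torch.softmax(self.out(torch.relu(self.dense(X))), dim=-1)

net = NeuralNetClassifier(MyModule)

# the net is not initialized yet, so this call only updates the attribute;
# nothing is (re-)initialized here
net.set_params(module__hidden_units=123)

# the module is created only once, here (or on the first call to fit)
net.initialize()
```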

Fortunately, this change doesn't seem to have any further impact, i.e.
we didn't implicitly rely on this behavior anywhere. The only exceptions
are 2 tests in test_cli.py, but those can easily be adjusted and this
shouldn't have any user impact.

* Simplify initialize_* methods

These methods started to become complicated because they did the
following:

1. Check if there is anything to initialize at all
2. Print a message about the reason for a potential re-initialization
3. Move the component to the device

That made it quite difficult to override them without forgetting about
some aspect. With this change, there are now corresponding _initialize_*
methods that are called by net.initialize() and net.set_params. These
new methods now take care of the points above and call the initialize_*
methods inside.

Now, we can more easily make sure that the user can override
initialize_* without anything important being forgotten.
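
To make the split concrete, here is a heavily simplified sketch of the pattern; this is not the actual skorch code, the class and helper names are invented:

```python
from torch import nn

class SketchNet:
    def __init__(self, module, device='cpu', verbose=1, **module_kwargs):
        self.module = module
        self.device = device
        self.verbose = verbose
        self.module_kwargs = module_kwargs
        self.initialized_ = False

    def initialize_module(self):
        # user-overridable part: only the actual creation of the module
        self.module_ = self.module(**self.module_kwargs)
        return self

    def _initialize_module(self, reason=None):
        # bookkeeping: explain why we re-initialize and move to the device
        if self.initialized_ and self.verbose:
            print(reason or "Re-initializing module.")
        self.initialize_module()
        self.module_ = self.module_.to(self.device)
        return self

    def initialize(self):
        self._initialize_module()
        self.initialized_ = True
        return self

net = SketchNet(nn.Linear, in_features=20, out_features=2).initialize()
```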

* Further clean up of set_params re-initialization

Removed code for states that could not be reached because of virtual
params. This simplifies the logic considerably.

* Rework logic of creating custom modules/optimizers

So far, the logic for creating custom modules or optimizers was separate
from the logic that created the default module, criterion and optimizer.
E.g., the "prefixes_" attribute was prefilled with 'module',
'criterion', and 'optimizer'. This made dealing with custom
modules/optimizers (e.g. creating a second module called 'mymodule_')
more difficult, because the logic for treating those was completely
disjoint from the logic of how the default modules/optimizer were
treated.

This change actually removes most of the "special status" of
module/criterion/optimizer. Therefore, the logic to treat those is now
the same as for any custom module. So for instance, they are no longer
pre-registered but instead are only registered later during their
initialize_* methods.

This is implemented by moving the registration to the respective
initialize_* methods. The reason is that during __init__, we don't actually
know if we deal with a module or optimizer yet (passed argument for
'module' can, for instance, be a function, so we cannot type check). But
during 'initialize', when the actual instances are created, we can check
if we deal with a nn.Module or optim.Optimizer. If we do, we register
them.
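
A hedged sketch of that idea; the bookkeeping names are made up and the real skorch code differs in the details:

```python
import torch
from torch import nn

class Registry:
    def __init__(self):
        self.modules = []
        self.optimizers = []

    def register(self, name, instance):
        # only once an instance exists can we tell components apart by type
        if isinstance(instance, torch.optim.Optimizer):
            self.optimizers.append(name)
        elif isinstance(instance, nn.Module):
            self.modules.append(name)

registry = Registry()
module = nn.Linear(20, 2)
registry.register('module_', module)
registry.register('optimizer_', torch.optim.SGD(module.parameters(), lr=0.1))
print(registry.modules, registry.optimizers)  # ['module_'] ['optimizer_']
```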

So overall, the logic and role of 'initialize' have changed. Users will
be expected to set custom modules/optimizers during their respective
'initialize_*' methods from now on (stricter checks and doc updates will
be added). This allows us to no longer rely on the attribute name to infer
the role of a component (remember that previously, a custom module needed to
contain the substring 'module', which was an ugly restriction).

As more of a side effect to these changes, the '_check_kwargs' call was
moved to 'initialize' as well, since we cannot really check for faulty
kwargs as long as we don't know what modules and optimizers will be
registered.

* Add battery of tests for custom modules/optimizers

Right now, there is a big hole in the treatment of custom
modules/optimizers that distinguishes them from the assumed
ones ('module', 'criterion', 'optimizer'). This battery of unit tests
covers behaviors that currently fail but really shouldn't:

- custom module parameters should be passed to the optimizer
- set_params on a custom module should trigger re-initialization of
  criterion and optimizer
- set_params on a custom criterion should trigger re-initialization of
  optimizer
- custom modules and criteria are not automatically moved to cuda

* Remove _PYTORCH_COMPONENTS global

Since custom components are no longer matched by name, this became
obsolete.

* All optimizers perform updates automatically

Before this, only the default "optimizer_" was used and all others were
being ignored. With this change, "zero_grad" and "step" are called on
all optimizers automatically.
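
A self-contained sketch of the idea (this is not the actual skorch train step, just an illustration):

```python
import torch
from torch import nn

module = nn.Linear(20, 2)
module2 = nn.Linear(20, 2)
optimizer_ = torch.optim.SGD(module.parameters(), lr=0.1)
optimizer2_ = torch.optim.SGD(module2.parameters(), lr=0.01)
optimizers = [optimizer_, optimizer2_]

X, y = torch.randn(8, 20), torch.randn(8, 2)
loss = nn.functional.mse_loss(module(X) + module2(X), y)

for opt in optimizers:  # zero_grad is called on all optimizers
    opt.zero_grad()
loss.backward()
for opt in optimizers:  # step is called on all optimizers
    opt.step()
```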

* Fix corner case with pre-initialized modules

One case still had to be covered: When the module/criterion is already
initialized and none of its parameters changed,
initialize_module/criterion was not called. However, what if a custom
module/criterion does need to be initialized? In that case, not calling
initialize_module/criterion is bad.

With this fix, this bad behavior no longer occurs. Tests were added to
cover this.

In order to achieve this change, we unfortunately had to push the check
whether module/criterion is already initialized down from
_initialize_module/criterion to initialize_module/criterion. There was
no other way of checking this, since at first, we cannot know which
attributes are modules/criteria.

For the user, this means a little more work if they want to implement
initialize_module/criterion absolutely correctly. However, that's not so
bad because it is only important if the user wants to work with
pre-initialized modules/criteria and with custom modules/criteria, which
should happen very rarely in practice. And even if it does, the user can
just copy the default skorch code and will end up with a correct
implementation.
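
A hedged illustration of the scenario; the module classes and the attribute name 'mymodule_' are made up for this example:

```python
from torch import nn
from skorch import NeuralNetClassifier
from skorch.toy import make_classifier

class MyNet(NeuralNetClassifier):
    def initialize_module(self):
        super().initialize_module()
        params = self.get_params_for('mymodule')
        self.mymodule_ = nn.Linear(2, 2, **params)
        return self

MyModule = make_classifier()
net = MyNet(MyModule())  # default module is passed pre-initialized
net.initialize()         # initialize_module still runs, so mymodule_ is created
```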

* Custom modules are set to train/eval mode

Until now, only module_ and criterion_ were automatically set into
training/evaluation mode; now custom modules are also set automatically.
This was implemented through a new method, net._set_training. It is
private for now; we may consider adding a public one later. Also, the
name could be changed to "train" as in PyTorch, but that name could be
confusing.
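
Conceptually, the helper does something like this sketch (not the actual implementation):

```python
from torch import nn

def set_training(modules, training=True):
    # put every registered module, including custom ones, into train/eval mode
    for module in modules:
        module.train(training)

modules = [nn.Linear(2, 2), nn.Dropout(0.5)]
set_training(modules, training=False)        # all modules now in eval mode
print(all(not m.training for m in modules))  # True
```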

* Reviewer comment: Consider virtual params

I did not correctly handle virtual params with custom optimizers. This
has been fixed now. The ambiguous 'lr' parameter is only associated with
the default 'optimizer', not with any custom optimizer; custom optimizers
need to be addressed explicitly, e.g. via 'myoptimizer__lr'.

* For completeness, the text from the PR is copied below:

Motivation

The initial reason why I wanted to work on this is that I'm currently
implementing a gpytorch integration (see this branch). For this, a big part is
adding a new custom module called "likelihood". Doing this correctly was
actually not trivial and required a lot of more or less duplicated code. Putting
such a burden on a user with less experience with skorch would not be acceptable.

The main reason for this difficulty is that module, criterion, and optimizer
have been treated as "special" so far. We assume that they are already there
and build everything else around this. If a custom module is added, the user
needs to be aware of all the places where this is relevant, which is too
error-prone.

Previous work

Some changes to facilitate adding custom modules were already implemented in
#597. However, they don't go far enough.

Main changes

With this PR, we remove the special status of module, criterion and optimizer.
Instead, all the work that needs to be done when adding any of them to the net
is now implemented in a generic manner. This way, custom modules etc. can re-use
the same functionality and can therefore expect to be treated the same as these
"first class" components.

Here is a list of changes that were made to that effect:

* Until now, custom module parameters were not trained by the optimizer, now
  they are
* Until now, custom modules/criteria were not automatically moved to the
  indicated device, now they are
* Until now, custom modules/criteria were not automatically set into train/eval
  mode, now they are
* Simplified implementation of initialize_module et al. - they contained a lot
  of stuff that was irrelevant for the user, like messaging about why something
  was re-initialized; now this stuff is done inside the newly added methods
  _initialize_module etc., which are called by initialize() and shouldn't be a
  bother to the user
* Adding a custom module no longer requires the attribute name to contain the
  substring "module" (which was really not a nice solution), same for criterion
  and optimizer
* Re-initialization logic was changed: When any module is changed (via
  set_params), this triggers re-initialization of all modules, criteria, and
  optimizers; when any criterion is changed, this triggers re-initialization of
  all optimizers (but not modules); this is a bit defensive since it could
  trigger unnecessary inits, but it's better than missing any inits (see the
  sketch below)
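
A hedged usage sketch of that cascade, using skorch's toy module factory; the parameter names are only examples:

```python
from skorch import NeuralNetClassifier
from skorch.toy import make_classifier

net = NeuralNetClassifier(make_classifier()).initialize()

# changing a module re-initializes modules, criteria, and optimizers
net.set_params(module__hidden_units=20)
# changing a criterion re-initializes criteria and optimizers, but not modules
net.set_params(criterion__reduction='sum')
```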

Additions

* There is now a get_all_learnable_params method on the net to retrieve all
  learnable parameters (instead of just those of module_); it is meant to be
  overridable by the user (e.g. when they have two optimizers for two modules)
* Added attributes modules_, criteria_ and optimizers_ to the net to keep track
  of those; first started as OrderedDicts to mirror nn.Modules, but that was
  flaky, as the values in the dict would often get out of sync with the
  attributes on the net
* If the criterion/criteria have learnable params, those are now passed to the
  optimizer as well (think GANs; a sketch of this follows below)
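
A hedged sketch of what this enables; this exact class is not part of the PR, it only illustrates the mechanism:

```python
from skorch import NeuralNet

class GANStyleNet(NeuralNet):
    def initialize_optimizer(self):
        # collect the parameters of all modules *and* criteria and hand them
        # to a single optimizer
        named_params = self.get_all_learnable_params()
        args, kwargs = self.get_params_for_optimizer('optimizer', named_params)
        self.optimizer_ = self.optimizer(*args, **kwargs)
        return self
```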

Minor changes

* net.set_params(...) no longer initializes the net if it's not yet initialized
  - this was simply unnecessary and could lead to some unexpected behavior
* custom module instances now need to be set inside initialize_module (and the
  name must end with an underscore), else the user will get an appropriate error
  message; the same logic applies to criterion and optimizer
* added a bunch of unit tests for the custom modules etc. that cover the cases
  not covered so far
* checking of kwargs is now done during initialize, not during __init__ anymore,
  since at that point, we don't know yet what custom modules could exist
* run a _check_kwargs during set_params - previously, this was a loophole that
  allowed users to set params with typos etc. (illustrated right after this list)
* two unconditional print statements are now conditioned on verbosity level
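
A small illustration of the stricter kwargs checking; it uses skorch's toy module factory, and the misspelled call is commented out so the snippet runs through:

```python
from skorch import NeuralNetClassifier
from skorch.toy import make_classifier

net = NeuralNetClassifier(make_classifier())
net.set_params(optimizer__lr=0.05)   # fine
# a misspelled prefix is now rejected instead of being silently accepted:
# net.set_params(optimzer__lr=0.05)
```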

Notes

I took extra effort to write the code as clearly as possible and add lots of
comments, since this touches some complicated parts of the code base. But if
something is not obvious, please tell me so that I can improve the code since
now it's still fresh in my mind.

You will see that a few of the existing tests have been changed to now call
initialize on the net when previously they didn't. The reason is that some work
like checking kwargs is now moved to initialize.

Also, you will see that some tests now use mocked modules to check for device
calls. I found this preferable to actually moving to 'cuda' since this will also
work without a cuda device (e.g. during CI).
BenjaminBossan committed Jun 13, 2021
1 parent 37313f5 commit 812f54d
Showing 9 changed files with 1,491 additions and 414 deletions.
6 changes: 4 additions & 2 deletions CHANGES.md
@@ -9,12 +9,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Added `load_best` attribute to `Checkpoint` callback to automatically load state of the
best result at the end of training
- Added `load_best` attribute to `Checkpoint` callback to automatically load state of the best result at the end of training
- Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including of criteria if applicable

### Changed

- Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-9-to-0-10) if you encounter problems
- Checking of arguments to `NeuralNet` is now during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes
- Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving to the indicated device, and updating all learnable parameters during training (check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details)

### Fixed

199 changes: 181 additions & 18 deletions docs/user/customization.rst
@@ -5,26 +5,26 @@ Customization
Customizing NeuralNet
---------------------

:class:`.NeuralNet` and its subclasses like
:class:`.NeuralNetClassifier` are already very flexible as they are
and should cover many use cases by adjusting the provided
parameters. However, this may not always be sufficient for your use
cases. If you thus find yourself wanting to customize
:class:`.NeuralNet`, please follow these guidelines.
Apart from the :class:`.NeuralNet` base class, we provide
:class:`.NeuralNetClassifier`, :class:`.NeuralNetBinaryClassifier`,
and :class:`.NeuralNetRegressor` for typical classification, binary
classification, and regressions tasks. They should work as drop-in
replacements for sklearn classifiers and regressors.

Initialization
^^^^^^^^^^^^^^
The :class:`.NeuralNet` class is a little less opinionated about the
incoming data, e.g. it does not determine a loss function by default.
Therefore, if you want to write your own subclass for a special use
case, you would typically subclass from :class:`.NeuralNet`. The
:func:`~skorch.net.NeuralNet.predict` method returns the same output
as :func:`~skorch.net.NeuralNet.predict_proba` by default, which is
the module output (or the first module output, in case it returns
multiple values).

The method :func:`~skorch.net.NeuralNet.initialize` is responsible for
initializing all the components needed by the net, e.g. the module and
the optimizer. For this, it calls specific initialization methods,
such as :func:`~skorch.net.NeuralNet.initialize_module` and
:func:`~skorch.net.NeuralNet.initialize_optimizer`. If you'd like to
customize the initialization behavior, you should override the
corresponding methods. Following sklearn conventions, the created
components should be set as an attribute with a trailing underscore as
the name, e.g. ``module_`` for the initialized module. Finally, the
method should return ``self``.
:class:`.NeuralNet` and its subclasses are already very flexible as they are and
should cover many use cases by adjusting the provided parameters or by using
callbacks. However, this may not always be sufficient for your use cases. If you
thus find yourself wanting to customize :class:`.NeuralNet`, please follow the
guidelines in this section.

Methods starting with get_*
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -38,6 +38,31 @@ quite sure, consult their documentation. In general, these methods
are fairly safe to override as long as you make sure to conform to the
same signature as the original.

A short example should serve to illustrate this.
:func:`~skorch.net.NeuralNet.get_loss` is called when the loss is determined.
Below we show an example of overriding :func:`~skorch.net.NeuralNet.get_loss` to
add L1 regularization to our total loss:

.. code:: python

    class RegularizedNet(NeuralNet):
        def __init__(self, *args, lambda1=0.01, **kwargs):
            super().__init__(*args, **kwargs)
            self.lambda1 = lambda1

        def get_loss(self, y_pred, y_true, X=None, training=False):
            loss = super().get_loss(y_pred, y_true, X=X, training=training)
            loss += self.lambda1 * sum([w.abs().sum() for w in self.module_.parameters()])
            return loss

.. note:: This example also regularizes the biases, which you typically
   don't need to do.

It is often a good idea to call ``super`` of the method you override, to make
sure that everything that needs to happen inside that method does happen. If you
don't, you should make sure to take care of everything that needs to happen by
following the original implementation.

Training and validation
^^^^^^^^^^^^^^^^^^^^^^^

@@ -96,3 +121,141 @@ perform some book keeping, like making sure that callbacks are handled
or writing logs to the ``history``. If you do need to override these,
make sure that you perform the same book keeping as the original
methods.

Initialization and custom modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The method :func:`~skorch.net.NeuralNet.initialize` is responsible for
initializing all the components needed by the net, e.g. the module and
the optimizer. For this, it calls specific initialization methods,
such as :func:`~skorch.net.NeuralNet.initialize_module` and
:func:`~skorch.net.NeuralNet.initialize_optimizer`. If you'd like to
customize the initialization behavior, you should override the
corresponding methods. Following sklearn conventions, the created
components should be set as an attribute with a trailing underscore as
the name, e.g. ``module_`` for the initialized module.

A possible modification you may want to make is to add more modules, criteria,
and optimizers to your net. This is possible in skorch by following the
guidelines below. If you do this, your custom modules and optimizers will be
treated as "first class citizens" in skorch land. This means:

1. The parameters of your custom modules are automatically passed to the
   optimizer (but you can modify this behavior).
2. skorch takes care of moving your modules to the correct device.
3. skorch takes care of setting the training/eval mode correctly.
4. When a module needs to be re-initialized because ``set_params`` was called,
   all modules and optimizers that may depend on it are also re-initialized.
   This is for instance important for the optimizer, which must know about the
   parameters of the newly initialized module.
5. You can pass arguments to the custom modules and optimizers using the now
   familiar double-underscore notation. E.g., you can initialize your net like
   this:

.. code:: python

    net = MyNet(
        module=MyModule,
        module__num_units=100,
        othermodule=MyOtherModule,
        othermodule__num_units=200,
    )
    net.fit(X, y)

A word about the distinction between modules and criteria made by skorch:
Typically, criteria are also just subclasses of PyTorch
:class:`~torch.nn.Module`. As such, skorch moves them to CUDA if that is the
indicated device and will even pass parameters of criteria to the optimizers, if
there are any. This can be useful when e.g. training GANs, where you might
implement the discriminator as the criterion (and the generator as the module).

A difference between module and criterion is that the outputs of modules are used
for generating the predictions and are thus returned by
:func:`~skorch.net.NeuralNet.predict` etc. In contrast, the output of the
criterion is used for calculating the loss and should therefore be a scalar.

skorch assumes that criteria may depend on the modules. Therefore, if a module
is re-initialized, all criteria are also re-initialized, but not vice-versa. On
top of that, the optimizer is re-initialized when either modules or criteria
are changed.

So after all this talk, what are the aforementioned guidelines to add your own
modules, criteria, and optimizers? You have to follow these rules:

1. Initialize them during their respective ``initialize_`` methods, e.g. modules
   should be set inside :func:`~skorch.net.NeuralNet.initialize_module`.
2. If they have learnable parameters, they should be instances of
   :class:`~torch.nn.Module`. Optimizers should be instances of
   :class:`~torch.optim.Optimizer`.
3. Their names should end with an underscore. This is true for all attributes that
   are created during ``initialize`` and distinguishes them from arguments
   passed to ``__init__``. So a name for a custom module could be ``mymodule_``.
4. Inside the initialization method, use :meth:`.get_params_for` (or,
   if dealing with an optimizer, :meth:`.get_params_for_optimizer`) to
   retrieve the arguments for the constructor of the instance.

Here is an example of how this could look in practice:

.. code:: python

    class MyNet(NeuralNet):
        def initialize_module(self):
            super().initialize_module()

            # add an additional module called 'module2_'
            params = self.get_params_for('module2')
            self.module2_ = Module2(**params)
            return self

        def initialize_criterion(self):
            super().initialize_criterion()

            # add an additional criterion called 'other_criterion_'
            params = self.get_params_for('other_criterion')
            self.other_criterion_ = nn.BCELoss(**params)
            return self

        def initialize_optimizer(self):
            # first initialize the normal optimizer
            named_params = self.module_.named_parameters()
            args, kwargs = self.get_params_for_optimizer('optimizer', named_params)
            self.optimizer_ = self.optimizer(*args, **kwargs)

            # next add another optimizer called 'optimizer2_' that is
            # only responsible for training 'module2_'
            named_params = self.module2_.named_parameters()
            args, kwargs = self.get_params_for_optimizer('optimizer2', named_params)
            self.optimizer2_ = torch.optim.SGD(*args, **kwargs)
            return self

    ... # additional changes

    net = MyNet(
        ...,
        module2__num_units=123,
        other_criterion__reduction='sum',
        optimizer2__lr=0.1,
    )
    net.fit(X, y)

    # set_params works
    net.set_params(optimizer2__lr=0.05)
    net.partial_fit(X, y)

    # grid search et al. works
    search = GridSearchCV(net, {'module2__num_units': [10, 50, 100]}, ...)
    search.fit(X, y)

In this example, a new criterion, a new module, and a new optimizer
were added. Of course, additional changes should be made to the net so
that those new components are actually being used for something, but
this example should illustrate how to start. Since the rules outlined
above are being followed, we can use grid search on our custom
components.

.. note:: In the example above, the parameters of ``module_`` are trained by
   ``optimizer_`` and the parameters of ``module2_`` are trained by
   ``optimizer2_``. To conveniently obtain the parameters of all modules,
   call the method :func:`~skorch.net.NeuralNet.get_all_learnable_params`.
130 changes: 0 additions & 130 deletions docs/user/neuralnet.rst
@@ -514,133 +514,3 @@ Those arguments are used to initialize your ``module``, ``criterion``,
etc. They are not fixed because we cannot know them in advance; in
fact, you can define any parameter for your ``module`` or other
components.

All special prefixes are stored in the ``prefixes_`` class attribute
of :class:`.NeuralNet`. Currently, they are:

- ``module``
- ``iterator_train``
- ``iterator_valid``
- ``optimizer``
- ``criterion``
- ``callbacks``
- ``dataset``

Subclassing NeuralNet
---------------------

Apart from the :class:`.NeuralNet` base class, we provide
:class:`.NeuralNetClassifier`, :class:`.NeuralNetBinaryClassifier`,
and :class:`.NeuralNetRegressor` for typical classification, binary
classification, and regression tasks. They should work as drop-in
replacements for sklearn classifiers and regressors.

The :class:`.NeuralNet` class is a little less opinionated about the
incoming data, e.g. it does not determine a loss function by default.
Therefore, if you want to write your own subclass for a special use
case, you would typically subclass from :class:`.NeuralNet`.

skorch aims at making subclassing as easy as possible, so that it
doesn't stand in your way. For instance, all components (``module``,
``optimizer``, etc.) have their own initialization method
(:meth:`.initialize_module`, :meth:`.initialize_optimizer`,
etc.). That way, if you want to modify the initialization of a
component, you can easily do so.

Additionally, :class:`.NeuralNet` has a couple of ``get_*`` methods for
when a component is retrieved repeatedly. E.g.,
:func:`~skorch.net.NeuralNet.get_loss` is called when the loss is
determined. Below we show an example of overriding
:func:`~skorch.net.NeuralNet.get_loss` to add L1 regularization to our
total loss:

.. code:: python

    class RegularizedNet(NeuralNet):
        def __init__(self, *args, lambda1=0.01, **kwargs):
            super().__init__(*args, **kwargs)
            self.lambda1 = lambda1

        def get_loss(self, y_pred, y_true, X=None, training=False):
            loss = super().get_loss(y_pred, y_true, X=X, training=training)
            loss += self.lambda1 * sum([w.abs().sum() for w in self.module_.parameters()])
            return loss

.. note:: This example also regularizes the biases, which you typically
   don't need to do.

It is possible to add your own criterion, module, or optimizer to your
customized neural net class. You should follow a few rules when you do
so:

1. Set this attribute inside the corresponding method. E.g., when
   setting an optimizer, use :meth:`.initialize_optimizer` for that.
2. Inside the initialization method, use :meth:`.get_params_for` (or,
   if dealing with an optimizer, :meth:`.get_params_for_optimizer`) to
   retrieve the arguments for the constructor.
3. The attribute name should contain the substring ``"module"`` if
   it's a module, ``"criterion"`` if a criterion, and ``"optimizer"``
   if an optimizer. This way, skorch knows if a change in
   parameters (say, because :meth:`.set_params` was called) should
   trigger re-initialization.

When you follow these rules, you will make sure that your added
components are amenable to :meth:`.set_params` and hence to things
like grid search.

Here is an example of how this could look in practice:

.. code:: python

    class MyNet(NeuralNet):
        def initialize_criterion(self, *args, **kwargs):
            super().initialize_criterion(*args, **kwargs)

            # add an additional criterion
            params = self.get_params_for('other_criterion')
            self.other_criterion_ = nn.BCELoss(**params)
            return self

        def initialize_module(self, *args, **kwargs):
            super().initialize_module(*args, **kwargs)

            # add an additional module called 'mymodule'
            params = self.get_params_for('mymodule')
            self.mymodule_ = MyModule(**params)
            return self

        def initialize_optimizer(self, *args, **kwargs):
            super().initialize_optimizer(*args, **kwargs)

            # add an additional optimizer called 'optimizer2' that is
            # responsible for 'mymodule'
            named_params = self.mymodule_.named_parameters()
            pgroups, params = self.get_params_for_optimizer('optimizer2', named_params)
            self.optimizer2_ = torch.optim.SGD(*pgroups, **params)
            return self

    ... # additional changes

    net = MyNet(
        ...,
        other_criterion__reduction='sum',
        mymodule__num_units=123,
        optimizer2__lr=0.1,
    )
    net.fit(X, y)

    # set_params works
    net.set_params(optimizer2__lr=0.05)
    net.partial_fit(X, y)

    # grid search et al. works
    search = GridSearchCV(net, {'mymodule__num_units': [10, 50, 100]}, ...)
    search.fit(X, y)

In this example, a new criterion, a new module, and a new optimizer
were added. Of course, additional changes should be made to the net so
that those new components are actually being used for something, but
this example should illustrate how to start. Since the rules outlined
above are being followed, we can use grid search on our custom
components.
2 changes: 1 addition & 1 deletion skorch/callbacks/training.py
@@ -532,7 +532,7 @@ def initialize(self):
        return self

    def named_parameters(self, net):
        return net.module_.named_parameters()
        return net.get_all_learnable_params()

    def filter_parameters(self, patterns, params):
        pattern_fns = (
Expand Down
4 changes: 4 additions & 0 deletions skorch/exceptions.py
@@ -12,6 +12,10 @@ class NotInitializedError(SkorchException):
"""


class SkorchAttributeError(SkorchException):
    """An attribute was set incorrectly on a skorch net."""


class SkorchWarning(UserWarning):
    """Base skorch warning."""

Expand Down
