Add trainable theta and euler as discretizer #41

Merged

merged 3 commits into master from train_theta on Aug 6, 2021

Conversation

@gsmalik (Contributor) commented May 12, 2021

What does this PR add?

  • It enables the user to optionally train theta.
  • It allows the user to choose between two discretization methods, Zero Order Hold and Euler, whereas previously only Zero Order Hold was used.
  • It updates existing tests and adds new ones covering the added features.

How is a trainable theta implemented?

  • The first change is that we now always work with theta_inv = 1/theta, since that can yield better gradients when training theta. If theta training is disabled, we still work with theta_inv; it just never gets updated.
  • Note that the user still specifies an initial theta, which is then internally inverted to theta_inv.
  • If theta training is enabled, theta_inv is added as a weight of the layer. If not, it is added as a plain attribute of the layer. This distinction keeps the implementation compatible with models built with previous versions of keras-lmu (without trainable theta). A sketch of this logic is shown after this list.
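To make the weight-vs-attribute distinction concrete, here is a minimal sketch (not the actual keras-lmu code: the surrounding layer and the name MemoryCellSketch are made up for illustration, while trainable_theta, _init_theta, and theta_inv follow the PR):

```python
import tensorflow as tf


class MemoryCellSketch(tf.keras.layers.Layer):
    """Illustrative layer showing the two ways theta_inv can be stored."""

    def __init__(self, theta, trainable_theta=False, **kwargs):
        super().__init__(**kwargs)
        self._init_theta = theta  # user specifies theta; it is inverted internally
        self.trainable_theta = trainable_theta

    def build(self, input_shape):
        super().build(input_shape)
        if self.trainable_theta:
            # theta_inv is a weight, so it receives gradient updates during training
            self.theta_inv = self.add_weight(
                name="theta_inv",
                shape=(),
                initializer=tf.keras.initializers.Constant(1 / self._init_theta),
                constraint=tf.keras.constraints.NonNeg(),
            )
        else:
            # plain attribute: same math in call(), but never updated, and it does
            # not show up in the layer's weights (so older saved models still load)
            self.theta_inv = tf.constant(1 / self._init_theta, dtype=self.dtype)
```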

How does training with Euler work?

  • Since theta can be decoupled from the A and B matrices when using Euler, A and B (weights of the layer) are set to CONST_A and CONST_B and are never updated when training theta.
  • If not training theta, A and B are set to CONST_A*theta_inv and CONST_B*theta_inv respectively (and, naturally, are still never updated).
  • However, the call function implements the memory update as m = m + theta_inv*(A*m + B*u), thus capturing the gradient of theta_inv and ensuring that it is well defined (see the sketch after this list).
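A hedged sketch of that Euler step as a standalone function (the names m, u, A, B, and theta_inv mirror the text above; the shapes and the function itself are assumptions for illustration):

```python
import tensorflow as tf


def euler_memory_update(m, u, A, B, theta_inv):
    """One Euler step: m <- m + theta_inv * (A @ m + B * u).

    Assumed shapes: m is (batch, order), u is (batch, 1),
    A is (order, order), B is (1, order); A and B are constants.
    Because theta_inv multiplies the whole update, gradients flow back to
    theta_inv even though A and B themselves never change.
    """
    return m + theta_inv * (tf.matmul(m, A, transpose_b=True) + u * B)
```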

How does training with Zero Order Hold (zoh) work?

  • Note that theta cannot be decoupled from the A and B matrices when using zoh. Thus, when training, new A and B matrices are generated inside the call function itself, which is slower than discretizing with Euler.
  • A custom _cont2discrete function for zoh discretization has been implemented instead of the previously used scipy.signal implementation, because scipy.signal.cont2discrete only accepts NumPy inputs and not tf.Tensors, which would break the flow of gradients to theta_inv. A rough sketch of the idea follows this list.
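Here is a rough sketch of such a TensorFlow-native zoh discretization, in the spirit of what this PR describes rather than the exact _cont2discrete code (the function name and the assumption that theta_inv is already folded into A and B are mine):

```python
import tensorflow as tf


def cont2discrete_zoh_sketch(A, B):
    """Zero-order-hold discretization of a continuous system (A, B).

    Assumed shapes: A is (order, order), B is (order, 1), both already scaled
    by the time step (i.e. by theta_inv). Staying with tf ops (tf.linalg.expm)
    keeps gradients flowing to theta_inv, unlike scipy.signal.cont2discrete,
    which requires NumPy inputs.
    """
    order = A.shape[0]
    # exponentiate the block matrix [[A, B], [0, 0]]; the top row of blocks of
    # the result contains the discretized A and B
    em = tf.concat(
        [tf.concat([A, B], axis=1), tf.zeros((1, order + 1), dtype=A.dtype)],
        axis=0,
    )
    phi = tf.linalg.expm(em)
    return phi[:order, :order], phi[:order, order:]
```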

Where to start the review?

You can start with the "Add trainable theta and discretization options" commit and then go to "Update and add new tests"; these are the only two main commits. There is an additional commit, but that is just a bones update.

Any other remarks?

  • As far as I understand, the examples CI run confirms that the new layer is compatible with previous models, but I am leaving this question here since I am not 100% sure.
  • The get_config of LMUCell, LMU, and LMUFFT serializes the initial theta as the theta parameter, not the final (possibly trained) value. Noting it here to confirm this makes sense.
  • Finally, given that support for TF 2.1 has been dropped, does the [docs/compatibility list/prerequisites] section need to be updated somewhere?

@drasmuss (Member) left a comment

Finished up my fixups, and then one question below.

Summary of changes:

  • Simplified the A/B calculations (removed the _gen_constants method, and only call _gen_AB in build rather than in call)
  • Incorporated the identity matrix into the Euler A matrix in the trainable_theta=False case
  • Some improvements to the efficiency of the cont2discrete_zoh implementation
  • Renamed train_theta to trainable_theta (for consistency with Keras API, e.g. self.theta_inv.trainable attribute)
  • Removed A/B caching in zoh training (it had a non-trivial impact on the training speed, and I think in general training speed is more important than inference speed, since users are likely to spend the vast majority of their time in training).
  • Stored A/B as constants rather than non-trainable variables. This can offer slight speed improvements. It does mean that you won't be able to load weights from earlier versions, but I think that's fine.
  • Disabled FFT if trainable_theta=True (this won't work, since the A/B matrices aren't being used in call)
  • Added a test to make sure that zoh and euler produce approximately the same output (otherwise we're only testing euler implementations against other euler implementations, so it's possible for the euler implementation to be completely incorrect but still pass all the tests because it is internally consistent)
  • Made some simplifications to the other new tests to focus on the aspects being covered in those tests
  • Set the minimum TensorFlow version back to 2.1.0 (I'm not sure what wasn't working before, but after I made the other changes everything was OK on 2.1)
  • Added a theta property to layers for retrieving the (possibly trained) value of theta (see the usage sketch after this comment)

I also made some updates to the example notebook (had to update the weights for these changes, and then just noticed some other changes that could be made while I was doing that).
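As a usage note for the theta property mentioned in the summary above, something along these lines should work after this PR (a hypothetical snippet, not taken from the repo's docs; the argument values are arbitrary):

```python
import tensorflow as tf
import keras_lmu

lmu = keras_lmu.LMU(
    memory_d=1,
    order=8,
    theta=20,
    hidden_cell=tf.keras.layers.SimpleRNNCell(32),
    trainable_theta=True,
)
inputs = tf.keras.Input((None, 1))
model = tf.keras.Model(inputs, lmu(inputs))
# ... compile and fit the model ...
print(lmu.theta)  # the (possibly trained) theta, starting from the initial 20
```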

keras_lmu/layers.py (outdated diff)

                constraint=tf.keras.constraints.NonNeg(),
            )
        else:
            self.theta_inv = tf.constant(1 / self._init_theta, dtype=self.dtype)

@arvoelke (Contributor) commented Jun 28, 2021

It looks like this will also work when init_theta is an array instead of a scalar? That would be nice, as that's how I've done it elsewhere (at least for Euler; the zoh method might be too slow that way).

Reply (Member):

That might work with broadcasting, but it's definitely not tested/supported. We intentionally just left this with scalar thetas for now.

@drasmuss force-pushed the train_theta branch 2 times, most recently from e00c270 to 70d7b69 on June 29, 2021 at 20:37

gsmalik and others added 3 commits on June 30, 2021 at 09:48

@drasmuss (Member) left a comment

Added a commit to run the docs/examples builds on remote GPU. The FFT implementation on CPU is really slow (see tensorflow/tensorflow#6541), which was causing the notebook to time out.

I did implement a faster version of the CPU FFT (see 1257a40), but it didn't help enough (it scales with the number of available cores, and we only have two when running on TravisCI). Could be useful for some other reason in the future though, so I saved it in the train_theta2 branch.

Also, in the future we'll probably switch to the convolution-based implementation (see #42), which is much faster on CPU. We could probably switch back to running the build on CPU at that point.

With the builds all passing, this LGTM!

@drasmuss merged commit 928a692 into master on Aug 6, 2021
@drasmuss deleted the train_theta branch on August 6, 2021 at 21:35