Add trainable theta and euler as discretizer #41

Merged

merged 3 commits into master from train_theta on Aug 6, 2021

Conversation

@gsmalik (Contributor) commented May 12, 2021

What does this PR add?

  • It enables the user to optionally train theta.
  • It allows the user to choose between two discretization methods, Zero Order Hold and Euler, whereas previously only Zero Order Hold was used.
  • It updates existing tests and adds new ones covering the added features.

How is a trainable theta implemented?

  • The first change is that we now always work with theta_inv = 1/theta, since that can yield better gradients when training theta. If theta training is disabled, we still work with theta_inv; it just never gets updated.
  • Note that the user still specifies an initial theta, which is then internally inverted to theta_inv.
  • If theta training is enabled, theta_inv is added as a weight of the layer. If not, it is added as a plain attribute of the layer. This distinction keeps the implementation compatible with models built with previous versions of keras-lmu (without trainable theta). A sketch of this logic is shown after this list.
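To make the weight-vs-attribute distinction concrete, here is a minimal sketch (not the actual keras-lmu code: the surrounding layer and the name MemoryCellSketch are made up for illustration, while trainable_theta, _init_theta, and theta_inv follow the PR):

```python
import tensorflow as tf


class MemoryCellSketch(tf.keras.layers.Layer):
    """Illustrative layer showing the two ways theta_inv can be stored."""

    def __init__(self, theta, trainable_theta=False, **kwargs):
        super().__init__(**kwargs)
        self._init_theta = theta  # user specifies theta; it is inverted internally
        self.trainable_theta = trainable_theta

    def build(self, input_shape):
        super().build(input_shape)
        if self.trainable_theta:
            # theta_inv is a weight, so it receives gradient updates during training
            self.theta_inv = self.add_weight(
                name="theta_inv",
                shape=(),
                initializer=tf.keras.initializers.Constant(1 / self._init_theta),
                constraint=tf.keras.constraints.NonNeg(),
            )
        else:
            # plain attribute: same math in call(), but never updated, and it does
            # not show up in the layer's weights (so older saved models still load)
            self.theta_inv = tf.constant(1 / self._init_theta, dtype=self.dtype)
```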

How does training with Euler work?

  • Since theta can be decoupled from the A and B matrices when using Euler, A and B (weights of the layer) are set to CONST_A and CONST_B and are never updated when training theta.
  • If not training theta, A and B are set to CONST_A*theta_inv and CONST_B*theta_inv respectively (and, naturally, are still never updated).
  • However, the call function implements the memory update as m = m + theta_inv*(A*m + B*u), thus capturing the gradient of theta_inv and ensuring that it is well defined (see the sketch after this list).
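A hedged sketch of that Euler step as a standalone function (the names m, u, A, B, and theta_inv mirror the text above; the shapes and the function itself are assumptions for illustration):

```python
import tensorflow as tf


def euler_memory_update(m, u, A, B, theta_inv):
    """One Euler step: m <- m + theta_inv * (A @ m + B * u).

    Assumed shapes: m is (batch, order), u is (batch, 1),
    A is (order, order), B is (1, order); A and B are constants.
    Because theta_inv multiplies the whole update, gradients flow back to
    theta_inv even though A and B themselves never change.
    """
    return m + theta_inv * (tf.matmul(m, A, transpose_b=True) + u * B)
```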

How does training with Zero Order Hold (zoh) work?

  • Note that theta cannot be decoupled from the A and B matrices when using zoh. Thus, when training, new A and B matrices are generated inside the call function itself, which is slower than discretizing with Euler.
  • A custom _cont2discrete function for zoh discretization has been implemented instead of the previously used scipy.signal implementation, because scipy.signal.cont2discrete only accepts NumPy inputs and not tf.Tensors, which would break the flow of gradients to theta_inv. A rough sketch of the idea follows this list.
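Here is a rough sketch of such a TensorFlow-native zoh discretization, in the spirit of what this PR describes rather than the exact _cont2discrete code (the function name and the assumption that theta_inv is already folded into A and B are mine):

```python
import tensorflow as tf


def cont2discrete_zoh_sketch(A, B):
    """Zero-order-hold discretization of a continuous system (A, B).

    Assumed shapes: A is (order, order), B is (order, 1), both already scaled
    by the time step (i.e. by theta_inv). Staying with tf ops (tf.linalg.expm)
    keeps gradients flowing to theta_inv, unlike scipy.signal.cont2discrete,
    which requires NumPy inputs.
    """
    order = A.shape[0]
    # exponentiate the block matrix [[A, B], [0, 0]]; the top row of blocks of
    # the result contains the discretized A and B
    em = tf.concat(
        [tf.concat([A, B], axis=1), tf.zeros((1, order + 1), dtype=A.dtype)],
        axis=0,
    )
    phi = tf.linalg.expm(em)
    return phi[:order, :order], phi[:order, order:]
```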

Where to start the review?

You can start with the "Add trainable theta and discretization options" commit and then go to "Update and add new tests"; these are the only two main commits. There is an additional commit, but that is just a bones update.

Any other remarks?

  • As far as I understand, the examples CI run confirms that the new layer is compatible with previous models, but I am leaving this question here since I am not 100% sure.
  • The get_config of LMUCell, LMU, and LMUFFT serializes the initial theta as the theta parameter, not the final (possibly trained) value. Noting it here to confirm this makes sense.
  • Finally, given that support for TF 2.1 has been dropped, does the [docs/compatibility list/prerequisites] section need to be updated somewhere?

@drasmuss (Member) left a comment

Finished up my fixups, and then one question below.

Summary of changes:

  • Simplified the A/B calculations (removed the _gen_constants method, and only call _gen_AB in build rather than in call)
  • Incorporated the identity matrix into the Euler A matrix in the trainable_theta=False case
  • Some improvements to the efficiency of the cont2discrete_zoh implementation
  • Renamed train_theta to trainable_theta (for consistency with Keras API, e.g. self.theta_inv.trainable attribute)
  • Removed A/B caching in zoh training (it had a non-trivial impact on the training speed, and I think in general training speed is more important than inference speed, since users are likely to spend the vast majority of their time in training).
  • Stored A/B as constants rather than non-trainable variables. This can offer slight speed improvements. It does mean that you won't be able to load weights from earlier versions, but I think that's fine.
  • Disabled FFT if trainable_theta=True (this won't work, since the A/B matrices aren't being used in call)
  • Added a test to make sure that zoh and euler produce approximately the same output (otherwise we're only testing euler implementations against other euler implementations, so it's possible for the euler implementation to be completely incorrect but still pass all the tests because it is internally consistent)
  • Made some simplifications to the other new tests to focus on the aspects being covered in those tests
  • Set the minimum TensorFlow version back to 2.1.0 (I'm not sure what wasn't working before, but after I made the other changes everything was OK on 2.1)
  • Added a theta property to layers for retrieving the (possibly trained) value of theta (see the usage sketch after this comment)

I also made some updates to the example notebook (had to update the weights for these changes, and then just noticed some other changes that could be made while I was doing that).
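As a usage note for the theta property mentioned in the summary above, something along these lines should work after this PR (a hypothetical snippet, not taken from the repo's docs; the argument values are arbitrary):

```python
import tensorflow as tf
import keras_lmu

lmu = keras_lmu.LMU(
    memory_d=1,
    order=8,
    theta=20,
    hidden_cell=tf.keras.layers.SimpleRNNCell(32),
    trainable_theta=True,
)
inputs = tf.keras.Input((None, 1))
model = tf.keras.Model(inputs, lmu(inputs))
# ... compile and fit the model ...
print(lmu.theta)  # the (possibly trained) theta, starting from the initial 20
```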

keras_lmu/layers.py (outdated diff)

                constraint=tf.keras.constraints.NonNeg(),
            )
        else:
            self.theta_inv = tf.constant(1 / self._init_theta, dtype=self.dtype)

@arvoelke (Contributor) commented Jun 28, 2021

It looks like this will also work when init_theta is an array instead of a scalar? That would be nice, as that's how I've done it elsewhere (at least for Euler; the zoh method might be too slow that way).

Reply (Member):

That might work with broadcasting, but it's definitely not tested/supported. We intentionally just left this with scalar thetas for now.

@drasmuss force-pushed the train_theta branch 2 times, most recently from e00c270 to 70d7b69 on June 29, 2021 at 20:37

gsmalik and others added 3 commits on June 30, 2021 at 09:48

@drasmuss (Member) left a comment

Added a commit to run the docs/examples builds on remote GPU. The FFT implementation on CPU is really slow (see tensorflow/tensorflow#6541), which was causing the notebook to time out.

I did implement a faster version of the CPU FFT (see 1257a40), but it didn't help enough (it scales with the number of available cores, and we only have two when running on TravisCI). Could be useful for some other reason in the future though, so I saved it in the train_theta2 branch.

Also, in the future we'll probably switch to the convolution-based implementation (see #42), which is much faster on CPU. We could probably switch back to running the build on CPU at that point.

With the builds all passing, this LGTM!

@drasmuss merged commit 928a692 into master on Aug 6, 2021
@drasmuss deleted the train_theta branch on August 6, 2021 at 21:35