He initialization #6

Closed
joaogui1 opened this issue Feb 21, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@joaogui1
Contributor

The default initialization for linear and convolutional modules seems to be Glorot initialization, but for the commonly used ReLU activation function, He initialization is superior and only requires a small change to the stddev definition. Should we implement better defaults?
I know there are many initialization schemes; I only suggest this one because it wouldn't be computationally expensive and would only be a minor code change.
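For concreteness, the difference comes down to the standard deviation used when sampling the weights. A rough sketch with illustrative layer sizes (not Haiku code):

import numpy as np

fan_in, fan_out = 512, 256  # illustrative layer sizes

# Glorot (Xavier) normal: variance scaled by the average of fan_in and fan_out.
glorot_stddev = np.sqrt(2.0 / (fan_in + fan_out))

# He normal: variance scaled by fan_in only, with a factor of 2 to compensate
# for ReLU zeroing out roughly half of the activations.
he_stddev = np.sqrt(2.0 / fan_in)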

@trevorcai
Contributor

Hey, thanks for the issue. This is intentional, as we explicitly match Sonnet v2 initialization schemes.

@joaogui1
Contributor Author

joaogui1 commented Feb 21, 2020

How "limited" are we by sonnet? Will you accept PRs that implement things that are not in sonnet?
For example other initialization schemes, activations or layers

@tomhennigan tomhennigan added the enhancement New feature or request label Feb 21, 2020
@tomhennigan
Collaborator

We aim to make it easy to port code from Sonnet to Haiku, so for core modules that can also be found in Sonnet we should match their API and defaults (in the same way jax.numpy is aligned with numpy).

New features are welcome 😄. In general I think in Haiku itself we should aim to only include well known modules/networks and we should make it really easy for folks to build anything custom that they want using the components in Haiku (e.g. we should be open to exposing utilities from core if needed, but we should not aim to have everything in core).

Concretely I think it would be great if you added an He initializer to Haiku as one of the initializers we support, but that we should keep the default Linear initializer as it is today.

@inoryy
Contributor

inoryy commented Feb 21, 2020

Note that a variety of initializers are implicitly supported through the generic VarianceScaling initializer. For instance, here is how to initialize a Linear layer with the He scheme:

hk.Linear(num_units, w_init=hk.initializers.VarianceScaling(scale=2.0))
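For a fuller picture, here is a minimal sketch of using that initializer inside a transformed function (assuming current Haiku/JAX APIs; the layer sizes and shapes are illustrative):

import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # scale=2.0 combined with the VarianceScaling defaults (mode="fan_in",
    # distribution="truncated_normal") gives the He-normal scheme.
    w_init = hk.initializers.VarianceScaling(scale=2.0)
    return hk.Linear(128, w_init=w_init)(x)

net = hk.without_apply_rng(hk.transform(forward))
params = net.init(jax.random.PRNGKey(42), jnp.ones([8, 64]))
out = net.apply(params, jnp.ones([8, 64]))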

@tomhennigan
Collaborator

Ah thank you @inoryy (hi again btw!), I'm glad that we have support for this. Perhaps it would be worth including the table from Sonnet showing how to drive variance scaling in common ways:

  ==============  ==============================================================
  Name            Parameters
  ==============  ==============================================================
  glorot_uniform  scale=1.0, mode=``fan_avg``, distribution=``uniform``
  glorot_normal   scale=1.0, mode=``fan_avg``, distribution=``truncated_normal``
  lecun_uniform   scale=1.0, mode=``fan_in``,  distribution=``uniform``
  lecun_normal    scale=1.0, mode=``fan_in``,  distribution=``truncated_normal``
  he_uniform      scale=2.0, mode=``fan_in``,  distribution=``uniform``
  he_normal       scale=2.0, mode=``fan_in``,  distribution=``truncated_normal``
  ==============  ==============================================================

https://sonnet.readthedocs.io/en/latest/api.html#variancescaling
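As a sketch of how the rows above map onto constructor calls (the names on the left are just labels here, not Haiku symbols; arguments are scale, mode, distribution):

import haiku as hk

named_initializers = {
    "glorot_uniform": hk.initializers.VarianceScaling(1.0, "fan_avg", "uniform"),
    "glorot_normal":  hk.initializers.VarianceScaling(1.0, "fan_avg", "truncated_normal"),
    "lecun_uniform":  hk.initializers.VarianceScaling(1.0, "fan_in", "uniform"),
    "lecun_normal":   hk.initializers.VarianceScaling(1.0, "fan_in", "truncated_normal"),
    "he_uniform":     hk.initializers.VarianceScaling(2.0, "fan_in", "uniform"),
    "he_normal":      hk.initializers.VarianceScaling(2.0, "fan_in", "truncated_normal"),
}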

@inoryy
Contributor

inoryy commented Feb 21, 2020

@tomhennigan (hello! :)) yep, that table seems like a great idea! Maybe even include it both in code and as a separate note in the docs?

@joaogui1
Contributor Author

Including it in the docs seems like a great idea!

@fredguth

Just a suggestion regarding the Haiku documentation: add a topic under Initializers that lists the common names like "he" and "glorot", so that new developers can find them inside VarianceScaling.
