He initialization #6
Comments
Hey, thanks for the issue. This is intentional, as we explicitly match Sonnet v2 initialization schemes.
How "limited" are we by sonnet? Will you accept PRs that implement things that are not in sonnet? |
We aim to make it easy to port code from Sonnet to Haiku, so for core modules that can also be found in Sonnet we should match their API and defaults. New features are welcome 😄. In general I think in Haiku itself we should aim to only include well-known modules/networks, and we should make it really easy for folks to build anything custom they want using the components in Haiku (e.g. we should be open to exposing utilities from core if needed, but we should not aim to have everything in core). Concretely, I think it would be great if you added an He initializer to Haiku as one of the initializers we support, but that we should keep the default as it is.
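A hypothetical sketch of what such a contributed initializer could look like, written as a thin wrapper over the existing VarianceScaling; the name he_init and its signature are assumptions for illustration, not an accepted Haiku API:

```python
# Hypothetical sketch only: an "He" convenience initializer built on top of
# hk.initializers.VarianceScaling. The name and signature are illustrative
# and are not part of Haiku's actual API.
import haiku as hk


def he_init(distribution: str = "truncated_normal") -> hk.initializers.VarianceScaling:
    """He (Kaiming) initialization: scale=2.0 over fan_in."""
    return hk.initializers.VarianceScaling(
        scale=2.0, mode="fan_in", distribution=distribution)
```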
Note that a variety of initializers are implicitly supported through the generic VarianceScaling, e.g. He initialization: hk.Linear(num_units, w_init=hk.initializers.VarianceScaling(scale=2.0))
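For context, a minimal runnable sketch of that suggestion; the layer sizes, the hk.Sequential structure, and the input shape below are illustrative assumptions, not part of the thread:

```python
# Minimal sketch (assumed layer sizes and input shape): He-style initialization
# in Haiku via the generic VarianceScaling initializer, as suggested above.
import haiku as hk
import jax
import jax.numpy as jnp


def forward(x):
    # scale=2.0 with the defaults mode="fan_in" and distribution="truncated_normal"
    # corresponds to He (Kaiming) initialization, well suited to ReLU layers.
    he = hk.initializers.VarianceScaling(scale=2.0)
    net = hk.Sequential([
        hk.Linear(128, w_init=he), jax.nn.relu,
        hk.Linear(10, w_init=he),
    ])
    return net(x)


model = hk.transform(forward)
params = model.init(jax.random.PRNGKey(42), jnp.ones([1, 784]))
```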
Ah thank you @inoryy (hi again btw!), I'm glad that we have support for this. Perhaps it would be worth including the table from Sonnet showing how to drive variance scaling in common ways:
https://sonnet.readthedocs.io/en/latest/api.html#variancescaling
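As a rough sketch of the mapping that table describes, the (scale, mode, distribution) combinations below follow the usual conventions and should be double-checked against the Sonnet documentation linked above:

```python
# Common schemes expressed via VarianceScaling(scale, mode, distribution).
# These combinations follow the usual conventions (as in the Sonnet table
# linked above) and should be verified against that documentation.
import haiku as hk

VS = hk.initializers.VarianceScaling
glorot_uniform = VS(1.0, "fan_avg", "uniform")
glorot_normal = VS(1.0, "fan_avg", "truncated_normal")
lecun_uniform = VS(1.0, "fan_in", "uniform")
lecun_normal = VS(1.0, "fan_in", "truncated_normal")
he_uniform = VS(2.0, "fan_in", "uniform")
he_normal = VS(2.0, "fan_in", "truncated_normal")
```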
@tomhennigan (hello! :)) yep, that table seems like a great idea! Maybe even include it both in code and as a separate note in the docs?
Including it in the docs seems like a great idea! |
Just a suggestion regarding the Haiku documentation: create a topic under Initializers with "common" names like "he" and "glorot" so that new developers can find them inside "VarianceScaling".
The default initialization for linear and convolutional modules seems to be Glorot initialization, but for the commonly used ReLU activation function He initialization is superior, while only requiring a quick change to the stddev definition. Should we implement better defaults?
I know that there are many initialization schemes; I only suggest this one because it wouldn't be computationally expensive and would also be only a minor code change.
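To make the "quick change to the stddev definition" concrete, here is a small sketch of the two formulas for the normal variants; the fan_in/fan_out values are illustrative, not taken from Haiku's code:

```python
# Illustrative comparison of the stddev used by Glorot vs He initialization
# (normal variants); fan_in/fan_out are example values for a single layer.
import numpy as np

fan_in, fan_out = 784, 128
glorot_stddev = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot / Xavier
he_stddev = np.sqrt(2.0 / fan_in)                  # He / Kaiming, suited to ReLU
print(f"glorot={glorot_stddev:.4f}, he={he_stddev:.4f}")
```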