He initialization #6
Comments
Hey, thanks for the issue. This is intentional, as we explicitly match Sonnet v2 initialization schemes.
How "limited" are we by sonnet? Will you accept PRs that implement things that are not in sonnet? |
We aim to make it easy to port code from Sonnet to Haiku, so for core modules that can also be found in Sonnet we should match their API and defaults. New features are welcome 😄. In general I think in Haiku itself we should aim to only include well-known modules/networks, and we should make it really easy for folks to build anything custom they want using the components in Haiku (e.g. we should be open to exposing utilities from core if needed, but we should not aim to have everything in core). Concretely, I think it would be great if you added an He initializer to Haiku as one of the initializers we support, but that we should keep the default as it is.
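A hypothetical sketch of what such a contributed initializer could look like, written as a thin wrapper over the existing VarianceScaling; the name he_init and its signature are assumptions for illustration, not an accepted Haiku API:

```python
# Hypothetical sketch only: an "He" convenience initializer built on top of
# hk.initializers.VarianceScaling. The name and signature are illustrative
# and are not part of Haiku's actual API.
import haiku as hk


def he_init(distribution: str = "truncated_normal") -> hk.initializers.VarianceScaling:
    """He (Kaiming) initialization: scale=2.0 over fan_in."""
    return hk.initializers.VarianceScaling(
        scale=2.0, mode="fan_in", distribution=distribution)
```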
Note that a variety of initializers are implicitly supported through the generic VarianceScaling, e.g. He initialization: hk.Linear(num_units, w_init=hk.initializers.VarianceScaling(scale=2.0))
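For context, a minimal runnable sketch of that suggestion; the layer sizes, the hk.Sequential structure, and the input shape below are illustrative assumptions, not part of the thread:

```python
# Minimal sketch (assumed layer sizes and input shape): He-style initialization
# in Haiku via the generic VarianceScaling initializer, as suggested above.
import haiku as hk
import jax
import jax.numpy as jnp


def forward(x):
    # scale=2.0 with the defaults mode="fan_in" and distribution="truncated_normal"
    # corresponds to He (Kaiming) initialization, well suited to ReLU layers.
    he = hk.initializers.VarianceScaling(scale=2.0)
    net = hk.Sequential([
        hk.Linear(128, w_init=he), jax.nn.relu,
        hk.Linear(10, w_init=he),
    ])
    return net(x)


model = hk.transform(forward)
params = model.init(jax.random.PRNGKey(42), jnp.ones([1, 784]))
```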
Ah thank you @inoryy (hi again btw!), I'm glad that we have support for this. Perhaps it would be worth including the table from Sonnet showing how to drive variance scaling in common ways:
https://sonnet.readthedocs.io/en/latest/api.html#variancescaling
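As a rough sketch of the mapping that table describes, the (scale, mode, distribution) combinations below follow the usual conventions and should be double-checked against the Sonnet documentation linked above:

```python
# Common schemes expressed via VarianceScaling(scale, mode, distribution).
# These combinations follow the usual conventions (as in the Sonnet table
# linked above) and should be verified against that documentation.
import haiku as hk

VS = hk.initializers.VarianceScaling
glorot_uniform = VS(1.0, "fan_avg", "uniform")
glorot_normal = VS(1.0, "fan_avg", "truncated_normal")
lecun_uniform = VS(1.0, "fan_in", "uniform")
lecun_normal = VS(1.0, "fan_in", "truncated_normal")
he_uniform = VS(2.0, "fan_in", "uniform")
he_normal = VS(2.0, "fan_in", "truncated_normal")
```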
@tomhennigan (hello! :)) yep, that table seems like a great idea! Maybe even include it both in code and as a separate note in the docs?
Including it in the docs seems like a great idea! |
Just a suggestion regarding the Haiku documentation: create a topic under Initializers with "common" names like "he" and "glorot" so that new developers can find them inside "VarianceScaling".
The default initialization for linear and convolutional modules seems to be Glorot initialization, but for the commonly used ReLU activation function He initialization is superior, while only requiring a quick change to the stddev definition. Should we implement better defaults?
I know that there are many initialization schemes; I only suggest this one because it wouldn't be computationally expensive and would also be only a minor code change.
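To make the "quick change to the stddev definition" concrete, here is a small sketch of the two formulas for the normal variants; the fan_in/fan_out values are illustrative, not taken from Haiku's code:

```python
# Illustrative comparison of the stddev used by Glorot vs He initialization
# (normal variants); fan_in/fan_out are example values for a single layer.
import numpy as np

fan_in, fan_out = 784, 128
glorot_stddev = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot / Xavier
he_stddev = np.sqrt(2.0 / fan_in)                  # He / Kaiming, suited to ReLU
print(f"glorot={glorot_stddev:.4f}, he={he_stddev:.4f}")
```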