Kaiming init of conv and linear layers, why gain = sqrt(5) #15314
Looking into the initialisation of Linear and Convolution layers, we have the following:
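For reference, `nn.Linear.reset_parameters` looks roughly like this at the time of writing (the Conv modules are analogous):

```python
# torch/nn/modules/linear.py (excerpt)
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```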
Notice the sqrt(5) scaling factor.
Using the same principle as in the Glorot et al. paper, for a uniform distribution we should use bounds of ±√3·std, since U(−b, b) has standard deviation b/√3.
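A quick sanity check of that std-to-bound conversion (the numbers here are arbitrary illustrations):

```python
import math

std = 0.05                    # any target standard deviation
bound = math.sqrt(3.0) * std  # uniform bounds are ±sqrt(3)*std
# U(-b, b) has variance b**2 / 3, so its std is b / sqrt(3), i.e. std again
assert math.isclose(bound / math.sqrt(3.0), std)
```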
This is what is done here:
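The implementation in `torch/nn/init.py` is roughly:

```python
# torch/nn/init.py (excerpt, lightly abridged)
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # convert std to a uniform bound
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
```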
Diving deeper into the implementation, it seems the a = √5 is passed to `calculate_gain` as the negative slope of a leaky ReLU, giving gain = √(2 / (1 + a²)) = √(1/3).
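Plugging the numbers in shows what the default actually produces (`fan_in = 100` is an arbitrary illustration):

```python
import math

a = math.sqrt(5)
# calculate_gain('leaky_relu', a) returns sqrt(2 / (1 + a**2))
gain = math.sqrt(2.0 / (1 + a ** 2))               # = sqrt(1/3) ≈ 0.577
fan_in = 100                                       # any fan-in, for illustration
bound = math.sqrt(3.0) * gain / math.sqrt(fan_in)  # kaiming_uniform_'s bound
assert math.isclose(bound, 1 / math.sqrt(fan_in))  # U(-1/sqrt(fan_in), 1/sqrt(fan_in))
```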
Furthermore, this √5 factor conflicts with the recommended usage in the documentation, where `a` is described as the negative slope of the rectifier used after this layer; √5 is not a plausible negative slope.
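For comparison, the usage recommended in the docs passes the nonlinearity that actually follows the layer:

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
# documented usage: specify the nonlinearity that follows the layer
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
```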
Whether the √5 factor is intentional or not, the documentation is wrong for the weights.
For the bias, on the other hand, the documentation matches the implementation (bounds of ±1/√fan_in).
Plenty of tutorials use ReLU and not LeakyReLU, so having the default initialisation tuned for a leaky ReLU is surprising.
At the very least it should be noted in the documentation that Linear and Conv layer initialisation is done assuming the layer is followed by a leaky ReLU activation; anyone using plain ReLU currently has to re-initialise manually, as in the sketch below.
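A minimal workaround sketch (`init_for_relu` is a hypothetical helper; note that `_calculate_fan_in_and_fan_out` is a private function):

```python
import math
import torch.nn as nn

def init_for_relu(m):
    """Hypothetical helper: re-initialise weights assuming a ReLU follows."""
    if isinstance(m, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d)):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            # _calculate_fan_in_and_fan_out is private; used here to mirror
            # the default bias initialisation
            fan_in, _ = nn.init._calculate_fan_in_and_fan_out(m.weight)
            bound = 1 / math.sqrt(fan_in)
            nn.init.uniform_(m.bias, -bound, bound)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_for_relu)
```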
Finally, the √5 itself should be explained.
Closing via @eugeneware's comment.
The code refactor from @jramseyer changed the default PyTorch initialization from manually initializing the weights (by calling the random number generator directly with hand-computed bounds) to going through torch.nn.init, but in a way that keeps the resulting distribution exactly the same as the old formulation.
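For reference, the pre-refactor default looked roughly like this (reconstructed, so treat the exact code as an assumption); the computation earlier in the thread shows the new a=√5 default reproduces the same ±1/√fan_in bounds:

```python
import math

# pre-refactor nn.Linear.reset_parameters (sketch)
def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))  # 1/sqrt(fan_in)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
```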
The initialization itself comes from torch7 and torch5, and is a modified version of the initialization from LeCun '98, "Efficient BackProp". This post gives more context: https://plus.google.com/106447253626219410322/posts/RZfdrRQWL6u
The G+ link no longer works. Alternative Internet Archive link follows: https://web.archive.org/web/20170721060953/https://plus.google.com/+SoumithChintala/posts/RZfdrRQWL6u