Possible mismatch in runtime weight scaling implementation (equalized learning rate section) #32

akanimax · 2018-07-04T07:00:18Z

In your code here, `x = self.conv(x.mul(self.scale))`, the input `x` is multiplied by the scale, where `scale = sqrt(2 / fan_in)` comes from the He initializer. I am a bit confused by this multiplication. The paper states that `w_i_hat = w / scale`, which for a convolution can be achieved by computing `out = conv(x / scale)`.

My question is: why is `x` multiplied by the scale instead of being divided by it? Please help.
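
To make the question concrete, here is a minimal sketch in plain Python (not the repository's code). It assumes a bias-free 1D convolution; the helper `conv1d` and the sample values are hypothetical, chosen only for illustration. It checks two things: scaling the input is equivalent to scaling the weights by the same factor, and the multiply and divide variants differ by a factor of `scale ** 2`.

```python
import math


def conv1d(x, w):
    """Valid-mode 1D cross-correlation without bias (hypothetical helper)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


# Made-up input and kernel, for illustration only.
x = [1.0, -2.0, 3.0, 0.5, 4.0]
w = [0.3, -1.2, 0.7]

fan_in = len(w)                  # 1 input channel * kernel size 3
scale = math.sqrt(2.0 / fan_in)  # scale as defined in the issue text

# What the code does: multiply the input by the scale.
out_mul_input = conv1d([v * scale for v in x], w)
# Equivalent view: multiply the weights by the scale instead.
out_mul_weight = conv1d(x, [v * scale for v in w])
# What the issue proposes: divide the input by the scale.
out_div_input = conv1d([v / scale for v in x], w)


def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


# Convolution is linear, so scaling the input equals scaling the weights
# (this holds only because no bias is added inside conv1d).
input_vs_weight_match = all(close(a, b) for a, b in zip(out_mul_input, out_mul_weight))

# Multiplying and dividing are not interchangeable: they differ by scale ** 2.
mul_vs_div_ratio = all(close(a, b * scale ** 2) for a, b in zip(out_mul_input, out_div_input))

print(input_vs_weight_match, mul_vs_div_ratio)
```

So `conv(x * scale)` corresponds to using weights `w * scale`, while `conv(x / scale)` corresponds to weights `w / scale`. Which of the two matches the paper depends on how the paper's normalization constant is defined, and that is exactly what I am asking about.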
