Output of SPADE #4
Comments
Hi,

Would you mind explaining why in the implementation of SPADE you have:

out = normalized * (1 + gamma) + beta

This is different from what is described in the paper, where I understood it as:

out = normalized * gamma + beta

Thank you.

It's just because the last layer is initialized so that it outputs something close to zero. With the (1 + gamma) form, that means (1 + gamma) starts out close to one, so at the beginning of training SPADE preserves the normalized activations. Another way of getting the same effect would be to initialize the bias of the last layer to 1. As for your point that this differs from the paper: it doesn't really matter, because gamma is learned; the network can learn either x or 1 + x.

Got it. Thanks!
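A minimal PyTorch sketch of the behavior discussed above. The module below is a simplified stand-in, not the repo's actual SPADE class, and the explicit zero-initialization of the last convs is an assumption added here so the identity holds exactly at init; the real layer only relies on the last layer's output being close to zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADESketch(nn.Module):
    """Simplified SPADE-style layer: gamma and beta are predicted from a segmentation map."""

    def __init__(self, norm_channels, label_channels, hidden_channels=128):
        super().__init__()
        # Parameter-free normalization of the incoming activations.
        self.norm = nn.BatchNorm2d(norm_channels, affine=False)
        # Shared conv over the segmentation map (resized to match x below).
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gamma = nn.Conv2d(hidden_channels, norm_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden_channels, norm_channels, kernel_size=3, padding=1)
        # Zero-init the last layers so gamma == 0 and beta == 0 at the start:
        # then (1 + gamma) == 1 and the layer exactly preserves the normalized
        # activations. (Assumption made for this sketch; the repo's layer just
        # produces outputs *close* to zero under default init.)
        for conv in (self.gamma, self.beta):
            nn.init.zeros_(conv.weight)
            nn.init.zeros_(conv.bias)

    def forward(self, x, segmap):
        normalized = self.norm(x)
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        actv = self.shared(segmap)
        gamma = self.gamma(actv)
        beta = self.beta(actv)
        # The "(1 + gamma)" form discussed in this issue; the paper's plain
        # "gamma" form would behave the same way if the last layer were
        # instead initialized to output something close to one.
        return normalized * (1 + gamma) + beta
```

At initialization the layer then reduces to plain normalization:

```python
layer = SPADESketch(norm_channels=64, label_channels=10)
x = torch.randn(2, 64, 32, 32)
seg = torch.randn(2, 10, 32, 32)
print(torch.allclose(layer(x, seg), layer.norm(x)))  # True
```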