The code implementation doesn't correspond exactly to the equations in the layer normalization paper.
I also have doubts about normalizing all the gates: for example, the forget gate will never be equal to zero due to the shift we add.
Wouldn't it be more logical to keep the gates as they are and normalize only the cell state?
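To make the question concrete, here is a minimal NumPy sketch of the two variants I am contrasting. It is not the code from supercell.py: the function names, the gate ordering `i, j, f, o`, and the fixed gain of 1 and shift of 0 are assumptions made only for illustration.

```python
import numpy as np


def layer_norm(x, gain=1.0, shift=0.0, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mean) / np.sqrt(var + eps) + shift


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step_ln_gates(x, h, c, W_x, W_h, b):
    """Variant A (hypothetical): every gate pre-activation is layer-normalized."""
    i, j, f, o = np.split(x @ W_x + h @ W_h + b, 4, axis=-1)
    i, j, f, o = (layer_norm(g) for g in (i, j, f, o))
    new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
    new_h = sigmoid(o) * np.tanh(layer_norm(new_c))
    return new_h, new_c


def lstm_step_ln_cell_only(x, h, c, W_x, W_h, b):
    """Variant B (hypothetical): gates left as they are, only the cell state is normalized."""
    i, j, f, o = np.split(x @ W_x + h @ W_h + b, 4, axis=-1)
    new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
    new_h = sigmoid(o) * np.tanh(layer_norm(new_c))
    return new_h, new_c
```

In variant A the normalized forget pre-activations have zero mean across the hidden units (before any learned shift), so at least one unit always has a gate value above 0.5 and the layer can never close all of its forget gates at once. Variant B leaves the gates free to saturate.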
Thank you
supercell/supercell.py
Line 216 in 063b01e