Why add a bias to the forget gate? #7

AzizCode92 · 2018-10-15T07:21:08Z

Line 216 in 063b01e

new_c = c*tf.sigmoid(f+self.forget_bias) + tf.sigmoid(i)*g

The implementation does not correspond exactly to the equation in the layer normalization paper.
I also have doubts about normalizing all the gates: for example, the forget gate will never be equal to zero due to the shift we add.
Wouldn't it be more logical to keep the gates as they are and normalize only the cell state?
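To make the effect of the shift concrete, here is a minimal sketch of the quoted update using NumPy in place of TensorFlow. The helper names `sigmoid` and `cell_update` and the value `forget_bias=1.0` are assumptions for illustration only, not taken from the repository.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, standing in for tf.sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def cell_update(c, f, i, g, forget_bias=1.0):
    # Mirrors the quoted line:
    #   new_c = c*tf.sigmoid(f+self.forget_bias) + tf.sigmoid(i)*g
    # forget_bias=1.0 is an assumed value, chosen only for this sketch
    return c * sigmoid(f + forget_bias) + sigmoid(i) * g

# Forget-gate pre-activations: strongly negative, zero, strongly positive
f = np.array([-10.0, 0.0, 10.0])

gate_no_bias = sigmoid(f)          # gate without the shift
gate_with_bias = sigmoid(f + 1.0)  # gate with the assumed shift of 1.0

print("without bias:", gate_no_bias)
print("with bias:   ", gate_with_bias)
```

In this sketch the shift raises the gate at every pre-activation, so at `f = 0` it sits above 0.5, but a strongly negative `f` still drives it close to zero. A sigmoid gate is never exactly zero with or without the bias; the shift only moves the point at which the gate starts to close.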

Thank you

AzizCode92 · 2018-10-15T09:18:53Z

Found the reason why.

AzizCode92 closed this as completed Oct 15, 2018
