-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trainable Initial State for Recurrent Bricks #600
Conversation
@bartvm, @pbrakel, @dmitriy-serdyuk , somebody, please review this PR. |
@@ -32,6 +32,10 @@ class BaseRecurrent(Brick): | |||
def initial_state(self, state_name, batch_size, *args, **kwargs): | |||
r"""Return an initial state for an application call. | |||
|
|||
Default implementation returns a zero matrix. All the standard | |||
recurrent bricks override it with trainable initial states |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"standard recurrent bricks" is a bit vague, I guess you mean bricks derived from SimpleRecurrent
?
Looks good, but it would be nice to have some tests. |
Sure, I will think of something. |
Plus respective documentation updates
I got rid of "standard recurrent bricks" and wrote some tests (in fact I simply just check that trainable initial states are there in the computation graph). |
Waiting for a final LGTM from anybody of @bartvm , @pbrakel , @dmitriy-serdyuk. |
LGTM, but I'd rather @pbrakel has a quick look as well, because they use these bricks a lot and I'm not sure what the consequences are for them. |
As far as I can tell, this doesn't seem to break any of my code and learnable initial states seem like a good thing. Just a quick question: How do I define the initialization? Should I use |
They are initially zeros and this is not configurable, see https://github.com/rizar/blocks/blob/gru_initial_state/blocks/bricks/recurrent.py#L385 If one needs it, a special initialization scheme can be added for those later. The initial states are neither weight nor biases, so I do not think that applying |
I'm perfectly fine with the standard being zero. I just wanted to be sure it wouldn't be NaN :). |
Cool, I assume this was the final OK for merge :) |
Trainable Initial State for Recurrent Bricks
As discussed in #594