
Trainable Initial State for Recurrent Bricks #600

Merged (5 commits) on May 22, 2015

Conversation

@rizar (Contributor) commented Apr 26, 2015

As discussed in #594

@rizar rizar changed the title Gru initial state Trainable Initial State for Recurrent Bricks Apr 26, 2015
@rizar (Contributor, Author) commented May 14, 2015

The Theano bug was fixed and the tests pass, so the PR is ready for review. @bartvm, @pbrakel

@rizar (Contributor, Author) commented May 21, 2015

@bartvm, @pbrakel, @dmitriy-serdyuk: somebody, please review this PR.

@@ -32,6 +32,10 @@ class BaseRecurrent(Brick):
def initial_state(self, state_name, batch_size, *args, **kwargs):
r"""Return an initial state for an application call.

The default implementation returns a zero matrix. All the standard
recurrent bricks override it with trainable initial states.
Review comment on the diff (Member):

"standard recurrent bricks" is a bit vague, I guess you mean bricks derived from SimpleRecurrent?

@bartvm (Member) commented May 21, 2015

Looks good, but it would be nice to have some tests.

@rizar (Contributor, Author) commented May 21, 2015

Sure, I will think of something.

@rizar (Contributor, Author) commented May 21, 2015

I got rid of "standard recurrent bricks" and wrote some tests (in fact, I simply check that the trainable initial states appear in the computation graph).
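The kind of test described above can be illustrated without Theano: after building the layer, the trainable initial state should show up among the model's parameters, so an optimizer would see and update it. A hedged stand-in sketch (the class and property names are hypothetical, not the actual test in the PR):

```python
import numpy as np

class SimpleRecurrentSketch:
    """Stand-in for a recurrent brick, exposing its parameters
    (names hypothetical, not the Blocks API)."""

    def __init__(self, dim):
        self.W = np.zeros((dim, dim))
        self.initial_state_ = np.zeros(dim)  # trainable, as in this PR

    @property
    def parameters(self):
        # In Blocks terms, roughly "what the computation graph exposes
        # as trainable"; here just a dict for illustration.
        return {"W": self.W, "initial_state": self.initial_state_}

layer = SimpleRecurrentSketch(dim=5)
# The essence of the test: the initial state is a parameter.
assert "initial_state" in layer.parameters
```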

@rizar (Contributor, Author) commented May 22, 2015

Waiting for a final LGTM from any of @bartvm, @pbrakel, @dmitriy-serdyuk.

@bartvm (Member) commented May 22, 2015

LGTM, but I'd rather have @pbrakel take a quick look as well, because they use these bricks a lot and I'm not sure what the consequences are for them.

@pbrakel (Contributor) commented May 22, 2015

As far as I can tell, this doesn't seem to break any of my code and learnable initial states seem like a good thing. Just a quick question: How do I define the initialization? Should I use biases_init, weights_init or something else?

@rizar (Contributor, Author) commented May 22, 2015

They are initially zeros and this is not configurable; see https://github.com/rizar/blocks/blob/gru_initial_state/blocks/bricks/recurrent.py#L385

If one needs it, a special initialization scheme can be added for them later. The initial states are neither weights nor biases, so I do not think that applying weights_init or biases_init would be reasonable.
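If such a scheme were added later, it might look like a separate initialization hook, distinct from the weight and bias schemes. A purely hypothetical sketch of that idea (none of these names exist in Blocks):

```python
import numpy as np

def make_initial_state(dim, scheme="zeros", seed=0):
    """Hypothetical initialization hook for trainable initial states.

    "zeros" matches the PR's current, non-configurable behaviour;
    "gaussian" shows what a later extension could add.
    """
    if scheme == "zeros":
        return np.zeros(dim)
    if scheme == "gaussian":
        rng = np.random.default_rng(seed)
        return 0.01 * rng.standard_normal(dim)
    raise ValueError(f"unknown scheme: {scheme!r}")

h0 = make_initial_state(4)            # zeros, the PR's default
h0_alt = make_initial_state(4, "gaussian")  # a possible extension
```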

@pbrakel (Contributor) commented May 22, 2015

I'm perfectly fine with the standard being zero. I just wanted to be sure it wouldn't be NaN :).

@rizar (Contributor, Author) commented May 22, 2015

Cool, I assume this was the final OK for merge :)

rizar added a commit that referenced this pull request May 22, 2015
Trainable Initial State for Recurrent Bricks
@rizar rizar merged commit f80b044 into mila-iqia:master May 22, 2015
@rizar rizar mentioned this pull request May 22, 2015