Trainable Initial State for Recurrent Bricks #600

rizar · 2015-04-26T13:49:54Z

As discussed in #594

rizar · 2015-05-14T20:51:11Z

The Theano bug was fixed, the tests pass, and the PR is ready for review. @bartvm, @pbrakel

rizar · 2015-05-21T14:44:53Z

@bartvm, @pbrakel, @dmitriy-serdyuk: could somebody please review this PR?

bartvm · 2015-05-21T14:50:48Z

blocks/bricks/recurrent.py

@@ -32,6 +32,10 @@ class BaseRecurrent(Brick):
    def initial_state(self, state_name, batch_size, *args, **kwargs):
        r"""Return an initial state for an application call.

+        Default implementation returns a zero matrix. All the standard
+        recurrent bricks override it with trainable initial states


"standard recurrent bricks" is a bit vague. I guess you mean bricks derived from SimpleRecurrent?
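For readers following the hunk above, here is a minimal sketch of the distinction the docstring draws, written in plain Python rather than Theano. The class names `ZeroStateRecurrent` and `TrainableStateRecurrent`, the `dim` and `initial_state_param` attributes, and the list-of-lists state are all hypothetical stand-ins for illustration, not the code in `blocks/bricks/recurrent.py`.

```python
class ZeroStateRecurrent:
    """Hypothetical stand-in for the default behaviour: a zero matrix."""

    def __init__(self, dim):
        self.dim = dim

    def initial_state(self, state_name, batch_size, *args, **kwargs):
        # One row of zeros per sequence in the batch.
        return [[0.0] * self.dim for _ in range(batch_size)]


class TrainableStateRecurrent(ZeroStateRecurrent):
    """Hypothetical override: the state comes from a learned vector."""

    def __init__(self, dim):
        super().__init__(dim)
        # A single vector of length `dim`, starting at zero; an optimizer
        # would update it along with the other parameters.
        self.initial_state_param = [0.0] * dim

    def initial_state(self, state_name, batch_size, *args, **kwargs):
        # Repeat the same learned vector for every sequence in the batch.
        return [list(self.initial_state_param) for _ in range(batch_size)]


default = ZeroStateRecurrent(dim=3)
trainable = TrainableStateRecurrent(dim=3)

# Before any training the two bricks return the same state.
same_at_start = (default.initial_state("states", 2)
                 == trainable.initial_state("states", 2))

# Simulate a parameter update; only the trainable brick's state changes.
trainable.initial_state_param[0] = 0.5
after_update = trainable.initial_state("states", 2)
```

The point of the sketch is only that the override keeps the `initial_state` signature and swaps a constant for a parameter shared across the batch.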

bartvm · 2015-05-21T14:53:03Z

Looks good, but it would be nice to have some tests.

rizar · 2015-05-21T15:06:21Z

Sure, I will think of something.


rizar · 2015-05-21T15:46:17Z

I got rid of "standard recurrent bricks" and wrote some tests (in fact, I simply check that the trainable initial states are present in the computation graph).
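The kind of check described here can be pictured with a toy graph. This is a sketch only: `Node`, `variables_with_role`, and the string role tags are hypothetical and are not the Blocks or Theano API; the actual tests in this PR operate on the real computation graph.

```python
class Node:
    """Toy graph node; a hypothetical stand-in for a symbolic variable."""

    def __init__(self, name, inputs=(), role=None):
        self.name = name
        self.inputs = tuple(inputs)
        self.role = role


def variables_with_role(outputs, role):
    """Walk back from the outputs and collect nodes tagged with `role`."""
    found, seen, stack = [], set(), list(outputs)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.role == role:
            found.append(node)
        stack.extend(node.inputs)
    return found


# Build a tiny graph: states <- step(inputs, W, initial_state).
inputs = Node("inputs")
weights = Node("W", role="weight")
initial_state = Node("initial_state", role="initial_state")
output = Node("states", inputs=(inputs, weights, initial_state))

# The check: exactly one initial-state parameter is reachable from the output.
initial_states = variables_with_role([output], "initial_state")
```

A test in this style asserts that the collected list is non-empty, which is enough to show the parameter participates in the graph and can therefore receive gradients.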

rizar · 2015-05-22T10:24:46Z

Waiting for a final LGTM from any of @bartvm, @pbrakel, @dmitriy-serdyuk.

bartvm · 2015-05-22T13:37:54Z

LGTM, but I'd rather @pbrakel has a quick look as well, because they use these bricks a lot and I'm not sure what the consequences are for them.

pbrakel · 2015-05-22T14:43:33Z

As far as I can tell, this doesn't seem to break any of my code and learnable initial states seem like a good thing. Just a quick question: How do I define the initialization? Should I use biases_init, weights_init or something else?

rizar · 2015-05-22T14:49:41Z

They are initially zeros and this is not configurable, see https://github.com/rizar/blocks/blob/gru_initial_state/blocks/bricks/recurrent.py#L385

If one needs it, a special initialization scheme can be added for those later. The initial states are neither weights nor biases, so I do not think that applying weights_init or biases_init would be reasonable.

pbrakel · 2015-05-22T14:53:26Z

I'm perfectly fine with the standard being zero. I just wanted to be sure it wouldn't be NaN :).

rizar · 2015-05-22T14:56:26Z

Cool, I assume this was the final OK for merge :)


rizar changed the title ~~Gru initial state~~ Trainable Initial State for Recurrent Bricks Apr 26, 2015

rizar force-pushed the gru_initial_state branch from 4f076b9 to f222155 Compare May 14, 2015 20:23

bartvm reviewed May 21, 2015

rizar added 4 commits May 21, 2015 17:22

Configurable initial states

1755337

Trainable initial state to all recurrent bricks

972f417

Plus respective documentation updates

Disambiguate "standard recurrent brick"

26aa522

Tests for trainable initial states

6e9a5a8

rizar force-pushed the gru_initial_state branch from f222155 to 6e9a5a8 Compare May 21, 2015 15:36

Update related sequence generator docs

5cdd1cc

rizar added a commit that referenced this pull request May 22, 2015

Merge pull request #600 from rizar/gru_initial_state

f80b044

Trainable Initial State for Recurrent Bricks

rizar merged commit f80b044 into mila-iqia:master May 22, 2015

rizar mentioned this pull request May 22, 2015

WIP: Faster GatedRecurrent #655

Merged

dwf mentioned this pull request May 25, 2015

Trainable Initial State for Recurrent Bricks #594

Closed
