
Added Recurrent Batch Normalization #163

Open · iassael wants to merge 9 commits into master

Conversation

@iassael commented Apr 16, 2016

Following the implementation described in Recurrent Batch Normalization (http://arxiv.org/abs/1603.09025), this adds Batch-Normalized LSTMs (BN-LSTM).
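
For reference, here is a rough sketch of where the normalization goes in the LSTM cell (nngraph pseudo-code following the paper's formulation rather than the exact diff of this PR; input_size and rnn_size are assumed to be defined as in the stock model):

local x      = nn.Identity()()
local prev_h = nn.Identity()()
-- normalize the input-to-hidden and hidden-to-hidden pre-activations
-- separately, before they are summed (Section 3 of the paper)
local i2h = nn.BatchNormalization(4 * rnn_size)(nn.Linear(input_size, 4 * rnn_size)(x))
local h2h = nn.BatchNormalization(4 * rnn_size)(nn.Linear(rnn_size, 4 * rnn_size, false)(prev_h))
local all_sums = nn.CAddTable()({i2h, h2h})
-- the gates i, f, o, g are sliced out of all_sums exactly as in the plain
-- LSTM; the paper additionally normalizes the cell state before the final
-- tanh when computing h_t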

@karpathy (Owner) commented

Thanks! Curious - have you tested if this works better?

@iassael (Author) commented Apr 16, 2016

I had the same question, and I just deployed it to our servers. I'll come back with more results!
Thank you!

@iassael (Author) commented Apr 16, 2016

Here are the validation scores for LSTM and BN-LSTM using the default options.

BN-LSTM trains faster but without dropout it tends to overfit faster as well.

@windweller commented

Hey @iassael, did you use a different mean/variance for each timestep, or a single mean/variance shared over all timesteps of one batch? The paper says: "Consequently, we recommend using separate statistics for each timestep to preserve information of the initial transient phase in the activations."

@iassael (Author) commented Apr 16, 2016

UPDATE: Check my reply below.

Hi @windweller, you are right. In this case, following the current project structure, the statistics were computed over all timesteps.

@iassael (Author) commented Apr 17, 2016

@windweller, looking at the implementation of nn.BatchNormalization, the running_mean and running_var variables are not part of the parameters vector, as they are not trainable.

Therefore, even when proto.rnn is cloned, each nn.BatchNormalization layer of each clone keeps its own statistics (running_mean and running_var).

Hence, the implementation acts as recommended in the paper.

Thank you for pointing it out!
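
A quick way to check this (a sketch, assuming the stock nn.BatchNormalization API and the cloning utility used in this repo):

local nn = require 'nn'
local bn = nn.BatchNormalization(4 * 128)
local params, gradParams = bn:parameters()
print(#params)                -- 2: only the affine weight and bias
print(bn.running_mean ~= nil) -- true: the running statistics are plain module
                              -- fields, outside the parameter vector
-- model_utils.clone_many_times ties the parameters of the clones together,
-- so weight/bias are shared while each clone keeps its own
-- running_mean/running_var buffers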

@fmassa commented Apr 17, 2016

Quick note: there is no need to implement LinearNB, as the no-bias functionality was integrated in nn already torch/nn#583
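
For example (assuming a version of torch/nn that includes that change), the third constructor argument disables the bias directly:

local nn = require 'nn'
local h2h = nn.Linear(128, 4 * 128, false) -- no bias; replaces the custom LinearNB
print(h2h.bias)                            -- nil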

@karpathy (Owner) commented Apr 19, 2016

Can I ask what the motivation is for removing biases from that linear layer? (haven't read the BN LSTM papers yet). Is this just to avoid redundancy? Also, is it a big deal if this wasn't done? Also, is this code fully backwards compatible and identical in functionality? And how would the code behave if someone has an older version of torch that does not have the LinearNB patch?

EDIT: e.g. it seems to me that, due to the additional "false" argument in one of the nn.Linear calls, this code is not backwards compatible and does not behave identically. Although I think it should be fine because the xtoh pathway already has biases?

@iassael (Author) commented Apr 19, 2016

Hi @karpathy, the motivation is exactly to avoid redundancy. This saves 2*rnn_size parameters; with the default settings that is 256 of the model's 239,297 parameters (~0.1%), which is not significant and could safely be ignored.

In terms of backward compatibility, an extra argument passed to a Lua function is simply discarded. So with an older version of torch/nn the additional "false" is ignored and the layer is created with a bias: the behavior differs slightly, but the code runs in both cases.

A simple example is the following:

function test(a, b) print(a, b) end
test(1, 2, 3)   -- the extra argument 3 is silently discarded
> 1   2
