LSTM with Batch Normalization #2183

udibr · 2016-04-04T11:44:16Z

Simplified LSTM with Batch Normalization from the paper Recurrent Batch Normalization.
The main simplification is that the same gamma is used on all steps.

This PR is for Keras-0. I will merge to Keras-1 once its out of preview

EderSantana · 2016-04-04T12:28:32Z

is there a way to make a recurrent batch normalization module that can be used by any type of recurrent layer? Something like a model that gets an RNN layer as input? Or is this too layer specific?

Also, how much improvement did you observe in your tests?

udibr · 2016-04-04T12:33:09Z

I had to dig into LSTM re-code it a little bit so it is not possible to do it in a generic way without re-writing the existing Recurrent units (which is not a lot of work)

I had huge improvement in speed and in the total loss I reached.
doing more tests....

EderSantana · 2016-04-04T12:52:39Z

That is pretty good!!!

here is an aside: both dropout and batchnorm should be kept the same throughout the entire run. It is something that makes sense. Dropout is meant to drop weights and not activations.

Maybe in the future we should look into the way we write RNNs in Keras to support dropout and batchnorm callbacks in a generalized way. This will make it easy to support more complicated and custom RNN types (like neural-gpu and conv-rnns) without having to rewrite them.

Any ideas on how to implement RNN callbacks @fchollet, @farizrahman4u ? Can Keras-1 help with that?

udibr · 2016-04-04T13:00:47Z

actually the paper Recurrent Batch Normalization said they got better results when different BN parameters are used in each step of the LSTM iteration. This would have complicated the code a lot, add a lot of memory consumption and also the implementation of the paper's author that I found did not do it
https://github.com/cooijmanstim/Attentive_reader/blob/bn/codes/att_reader/layers.py#L33

udibr · 2016-04-04T13:03:23Z

For now I only tested on Theano... which takes ages to compile.

surprisingly tensorflow compilation is also slow...

fchollet · 2016-04-04T17:14:07Z

It's great to have this implementation available, but at the same time I don't think we should have more than one implementation of LSTM in Keras. Either we should figure out how to make BatchNorm part of the current implementation, or we should make the BN LSTM implementation part of a Keras extensions package (like @EderSantana's Seya).

udibr · 2016-04-04T17:34:53Z

Ok, I will add it as a parameter to the existing LSTM

udibr · 2016-04-04T23:23:58Z

I merged the BN code into LSTM class. I removed some of the usual BN parameters from the interface because I believe there is no need to touch them and they would have made the LSTM API even more complex.

This PR also fix a bug that was hidden in several places in the code in which nb_param in set_weights method was computed directly (and wrongly) instead of using len(self.get_weights()) or something similar. This bug showed up only when you start to merge models that have non_trainable_weights such as BatchNormalization and now LSTM and I had to fix it to complete my tests.

fchollet · 2016-04-04T23:57:24Z

In any case, we won't merge anything into Keras 0, the current version is considered final. All new PRs should be made to Keras-1, because that branch will replace master rather than be merged into master.

udibr · 2016-04-05T00:08:07Z

Ok, it's too much for me (and my GPU) to switch between two versions of Keras so once I personally switch to Keras-1 I will also make a new PR. We can keep this PR as-is for anyone who wants it.

fchollet · 2016-04-05T01:39:17Z

Cool, you can just adapt it to Keras 1 (Lstm hasn't changed much) and resubmit it after Keras 1 is officially released.

cooijmanstim · 2016-04-06T19:24:21Z

Primary author here, just wanted to confirm that we do use the same gamma/beta on all steps. Only the statistics can differ over time, not the parameters.

The code looks right to me, though I don't understand how/where Keras manages the updates. Thanks for making this available!

udibr · 2016-04-06T20:45:28Z

Hi Tim,
If you mean how gamma is being updated? this is eventually done through theano's mechanism of updating shared variables. http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables

cooijmanstim · 2016-04-06T21:45:23Z

The part that confuses me is that the updates added to self.updates in the bn method use variables from the inner graph of the scan op. Somewhere along the line these have to be translated into the corresponding outer variables, but I don't see where this is done.

udibr · 2016-04-06T22:17:50Z

looks like I already explained this in the code:
https://github.com/udibr/keras/blob/lstmbn/keras/layers/recurrent.py#L171-L175

henry0312 · 2016-04-24T09:22:37Z

any progress?

xingdi-eric-yuan · 2016-05-06T18:35:56Z

@udibr I'll try to port your code into keras 1, if you haven't yet.

udibr · 2016-05-06T18:55:19Z

@xingdi-eric-yuan I have a working version of LSTM BN on a branch called mymaster https://github.com/udibr/keras/blob/mymaster/keras/layers/recurrent.py#L583

The version in the original PR (for Keras 0.3) is incorrect. As @cooijmanstim hinted we are updating the mean/var at every step of the LSTM and if you want to have a generic solution you should somehow allow the step function to accept the mean/var as input to each step and give the updated values as output of each step. My new code on mymaster branch does this but it is very convoluted code which I am not too proud off and as it is now, I think it really should be outside of Keras (see what @fchollet said about Seya) or at least in separate class from LSTM. Please update me if you look into it and manage to clean it up or of course find bugs.

I played with it a little bit on real examples and it looks like the LSTM+BN mechanism slows run time and on the other side cause overfitting. So it is not just turning the batch_norm flag to True. You need to change the model size to get comparable results. Maybe you can end up with a smaller/faster model but I'm not sure.

xingdi-eric-yuan · 2016-05-06T18:59:18Z

@udibr Thanks for pointing it out, I'll look into it 👍

xingdi-eric-yuan · 2016-05-07T00:46:15Z

I'll make a new pull request, and paste some of the test results on it.
#2657

fchollet · 2016-05-10T22:59:32Z

So what are we going to do with this PR, and with BN for RNNs in general?

I think it really should be outside of Keras

Yes, maybe. I see quite a bit of interest for this feature, coming from advanced users. But do the results delivered justify the interest? For users, is the runtime performance cost justified by superior learning performance? And for us developers, is the added codebase complexity and maintenance cost justified by the usefulness of the feature?

udibr · 2016-05-10T23:11:48Z

I now think it should be outside.

Also I tried to merge my working version (on mymaster branch) to this pull request branch so that other users will not accidently start working from the wrong version but there were few changes in the master LSTM and so now some real merge work should be done. Also as @xingdi-eric-yuan found out some work is needed to make it work on tf.

braingineer · 2016-05-11T02:58:58Z

I see quite a bit of interest for this feature, coming from advanced users. But do the results delivered justify the interest? For users, is the runtime performance cost justified by superior learning performance? And for us developers, is the added codebase complexity and maintenance cost justified by the usefulness of the feature?

@fchollet -- what about a second repo, one where things are a little more 'wild west', but solutions like this are integrated. I see many comments around the slack/google group/github issues that indicate people are recreating many wheels with keras. for example, I have a decent soft attention mechanism and I've seen at least 3 separate people also implement their own.

and it wouldn't be without precedent. Lasagne has Recipes. Blocks has blocks-extras. caffe has a variety, such as nlpcaffe. I'd be willing at least assist with PRs provided that I didn't have to maintain API consistency the way you are (excellently) doing with keras. given wild-west status, I think it might be nice.

udibr · 2016-05-12T01:29:20Z

To avoid future confusion I moved my code back to my lstmbn branch of my code which was used in this PR.
I also put all my changes in a separate class LSTMBN so there will be no conflicts with keras master LSTM which is all the time changing. I also fixed a bug that @xingdi-eric-yuan found (thanks!)

HOWEVER, I did not run any tests on LSTMBN and I closed this PR and I dont think it should be merged with keras/master.

udibr · 2016-05-12T01:33:54Z

@braingineer the problem with a keras-extras repository is maintenance. Keras is changing all the time (in a good way) but every time it changes someone needs to go back and make sure that all the add-ons continue to work.

There were several attempts to create keras-extras such as Seya and seq2seq but as far as I know currently both repos do not work because of changes in Keras 1.

braingineer · 2016-05-12T02:09:45Z

@udibr that makes sense. I think if they were the combined personal libraries of some of the more prolific keras users, it would have a higher chance of being maintained.

Plus, the wild-west status means not having to guarantee everything works. It could come with the academic guarantee, so to speak ('it worked for me once, but I give no promises after that'). Even having it be in a central place without compatibility with some new version would be useful. someone could come along and decide to port it, as a way to contribute.

I think, though, minimally, there should at least be an index of personal keras libraries and it should be pointed at by keras master (by a link or, more generously, a page in the docs). right now, there is a loose set of repositories and gists that you have know about in order to find (for example, seya, seqtoseq, some vgg ports, a keras branch that converts caffe models to keras models, attention mechanisms, etc).

udibr · 2016-05-12T02:15:57Z

@braingineer How about this wiki page https://github.com/fchollet/keras/wiki/Built-with-Keras

braingineer · 2016-05-12T02:22:25Z

That's awesome. I would consider it the minimum level though. Also, did not realize there was a wiki, so it could use some more sign posting for it. It's not mentioned anywhere (I don't think).

I feel that something like the awesome series would be a better minimum (aka https://github.com/kjw0612/awesome-rnn) and it should/could have a blurb about it (a natural place would be near the mention of the examples page where it says In the examples folder of the repository, you will find more advanced models: question-answering with memory networks, text generation with stacked LSTMs, etc.)

pxlong · 2016-06-28T16:34:21Z

@barvinograd @udibr Hi, does this work now? Thanks!

matthiasplappert · 2016-06-30T15:05:57Z

Has this been updated for Keras 1.x? It'd be very useful for what I'm working on right now.

anirudh9119 · 2016-07-04T16:27:07Z

Has anyone tried this with sampling ? I was unable to make this work in sampling mode. Any pointers ?

dgboy2000 · 2017-07-29T08:03:10Z

@fchollet @braingineer @udibr @xingdi-eric-yuan What was the group decision on whether BN for RNN belongs in Keras?

visionscaper · 2017-08-18T22:22:51Z

HOWEVER, I did not run any tests on LSTMBN and I closed this PR and I dont think it should be merged with keras/master.

@udibr Why was your final conclusion that your LSTMBN class should not be merged in to Keras?

Further questions:

Your latest LSTMBN implementation is for Keras 1.x correct?
What should in your opinion be done to make it Keras 'worthy'?

Thanks,

-- Freddy Snijder

liangsun-ponyai · 2020-04-03T02:18:46Z

@udibr Have you tried this method? I have had some experiments, Using population statistics (training: False) at test time gives worse results than batch statistics. I don't know why?

udibr force-pushed the lstmbn branch from 12d69a2 to 2fc510c Compare April 4, 2016 13:23

udibr force-pushed the lstmbn branch from 2fc510c to aaccd03 Compare April 4, 2016 23:18

udibr force-pushed the lstmbn branch 2 times, most recently from 9855d8a to dbda22d Compare April 5, 2016 00:39

LSTM with Batch Normalization

c8d2015

udibr force-pushed the lstmbn branch from dbda22d to c8d2015 Compare April 5, 2016 01:03

udibr added 3 commits April 28, 2016 11:11

merge

f372e1b

merge

f266117

LSTM with BN update for Keras 1

0948907

xingdi-eric-yuan mentioned this pull request May 8, 2016

Recurrent Batch Normalization #2657

Closed

merge from mymaster

3f5f779

a

957c967

udibr added 2 commits May 11, 2016 21:07

restore recurrent with LSTMBN

8975ef8

move LSTM with Batch Normalization to separate class LSTMBN

9c67822

replace_row backend

9eef81e

gaoyuankidult mentioned this pull request May 16, 2016

A bug to solve gaoyuankidult/einstein#2

Open

fchollet closed this Nov 29, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LSTM with Batch Normalization #2183

LSTM with Batch Normalization #2183

udibr commented Apr 4, 2016

EderSantana commented Apr 4, 2016

udibr commented Apr 4, 2016

EderSantana commented Apr 4, 2016

udibr commented Apr 4, 2016

udibr commented Apr 4, 2016

fchollet commented Apr 4, 2016

udibr commented Apr 4, 2016

udibr commented Apr 4, 2016

fchollet commented Apr 4, 2016

udibr commented Apr 5, 2016

fchollet commented Apr 5, 2016

cooijmanstim commented Apr 6, 2016

udibr commented Apr 6, 2016

cooijmanstim commented Apr 6, 2016

udibr commented Apr 6, 2016

henry0312 commented Apr 24, 2016

xingdi-eric-yuan commented May 6, 2016

udibr commented May 6, 2016

xingdi-eric-yuan commented May 6, 2016

xingdi-eric-yuan commented May 7, 2016 •

edited

fchollet commented May 10, 2016

udibr commented May 10, 2016 •

edited

braingineer commented May 11, 2016 •

edited

udibr commented May 12, 2016

udibr commented May 12, 2016

braingineer commented May 12, 2016 •

edited

udibr commented May 12, 2016

braingineer commented May 12, 2016

pxlong commented Jun 28, 2016

matthiasplappert commented Jun 30, 2016

anirudh9119 commented Jul 4, 2016

dgboy2000 commented Jul 29, 2017

visionscaper commented Aug 18, 2017

liangsun-ponyai commented Apr 3, 2020

LSTM with Batch Normalization #2183

LSTM with Batch Normalization #2183

Conversation

udibr commented Apr 4, 2016

EderSantana commented Apr 4, 2016

udibr commented Apr 4, 2016

EderSantana commented Apr 4, 2016

udibr commented Apr 4, 2016

udibr commented Apr 4, 2016

fchollet commented Apr 4, 2016

udibr commented Apr 4, 2016

udibr commented Apr 4, 2016

fchollet commented Apr 4, 2016

udibr commented Apr 5, 2016

fchollet commented Apr 5, 2016

cooijmanstim commented Apr 6, 2016

udibr commented Apr 6, 2016

cooijmanstim commented Apr 6, 2016

udibr commented Apr 6, 2016

henry0312 commented Apr 24, 2016

xingdi-eric-yuan commented May 6, 2016

udibr commented May 6, 2016

xingdi-eric-yuan commented May 6, 2016

xingdi-eric-yuan commented May 7, 2016 • edited

fchollet commented May 10, 2016

udibr commented May 10, 2016 • edited

braingineer commented May 11, 2016 • edited

udibr commented May 12, 2016

udibr commented May 12, 2016

braingineer commented May 12, 2016 • edited

udibr commented May 12, 2016

braingineer commented May 12, 2016

pxlong commented Jun 28, 2016

matthiasplappert commented Jun 30, 2016

anirudh9119 commented Jul 4, 2016

dgboy2000 commented Jul 29, 2017

visionscaper commented Aug 18, 2017

liangsun-ponyai commented Apr 3, 2020

xingdi-eric-yuan commented May 7, 2016 •

edited

udibr commented May 10, 2016 •

edited

braingineer commented May 11, 2016 •

edited

braingineer commented May 12, 2016 •

edited