Add dropout and recurrent_dropout to CuDNNLSTM and CuDNNGRU #8935

Closed
bzamecnik opened this Issue Jan 1, 2018 · 18 comments

Comments

@bzamecnik
Contributor

bzamecnik commented Jan 1, 2018

Native Keras GRU and LSTM layers support dropout and recurrent_dropout, but their CuDNN-accelerated counterparts, CuDNNLSTM and CuDNNGRU, do not. It would be good to add these features. Although CuDNN RNNs do not support dropout natively, it seems possible to implement it outside of CuDNN; at least TensorFlow is capable of that. In Keras, dropout can be applied either to the inputs (dropout), which should be straightforward, or to the previous hidden state (recurrent_dropout). I'm not sure whether the latter is possible, though.

The motivation is to use the CuDNN RNN implementation for fast training while still allowing dropout regularization.

Please comment on whether this makes sense and is wanted. I'd be happy to try implementing it. Thanks.
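
For reference, a minimal sketch of the API gap described above (Keras 2.x layer names; keras.layers.CuDNNLSTM requires the TensorFlow backend and a GPU):

from keras.layers import LSTM, CuDNNLSTM

# Native implementation: both kinds of dropout are available (but no cuDNN speed-up).
slow = LSTM(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)

# cuDNN-accelerated implementation: neither argument is accepted, so the
# commented-out call should raise a TypeError ("Keyword argument not understood").
# fast = CuDNNLSTM(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)
fast = CuDNNLSTM(128, return_sequences=True)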

@whatever1983

whatever1983 commented Jan 3, 2018

This is something that is desperately needed. Also, in layer_CuDNN_LSTM, dtype = float16 is not enabled for NVIDIA's CUDA 9.1 FP16 training.

@tRosenflanz

tRosenflanz commented Jan 3, 2018

I guess applying Dropout(x)(inputs) before the LSTM layer will do the same as the dropout argument, right? Or do you think it could cause a slowdown?

@ml-pickle

ml-pickle commented Jan 3, 2018

This would be tremendously helpful to many, many people. Not being able to use dropout often renders the CuDNN layers virtually useless for training on smaller datasets.

@tRosenflanz

tRosenflanz commented Jan 7, 2018

Recurrent dropout is still not supported in TensorFlow; if you would like to see it, please submit the request there. Input dropout can easily be achieved by manually adding a Dropout layer before the CuDNN RNN layer.
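
For example, a minimal sketch of that workaround with the functional API (layer sizes and shapes are just example values):

from keras.layers import Input, Dropout, CuDNNLSTM, Dense
from keras.models import Model

inputs = Input(shape=(100, 32))              # (timesteps, features), example values
x = Dropout(0.2)(inputs)                     # input dropout applied outside of cuDNN
x = CuDNNLSTM(128)(x)                        # recurrent dropout is still unavailable
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)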

@tRosenflanz

tRosenflanz commented Feb 14, 2018

If I am reading the TensorFlow thread right, it says that the dropout they support is applied between layers only, not on the hidden states that are passed from timestep to timestep within a single layer. As far as I understand, the dropout they support is equivalent to adding a dropout layer yourself.

@fchollet

Collaborator

fchollet commented Feb 14, 2018

Recurrent dropout is not implemented in the cuDNN RNN ops at the cuDNN level, so we can't have it in Keras.

The dropout option in the cuDNN API is not recurrent dropout (unlike the recurrent_dropout in Keras), so it is basically useless (regular dropout doesn't work with RNNs).

Actually using such dropout in a stacked RNN will wreck training.

@fchollet fchollet closed this Feb 14, 2018

@smyskoff

Contributor

smyskoff commented Mar 9, 2018

Will time-distributed dropout solve the problem? Something like this:

...
for idx in range(num_layers):
    top_layer = idx == num_layers - 1
    # intermediate layers must return sequences so the next CuDNNLSTM receives 3D input
    layer = CuDNNLSTM(..., return_sequences=not top_layer)(layer)
    if not top_layer:
        layer = TimeDistributed(Dropout(dropout))(layer)

...
@smyskoff

Contributor

smyskoff commented Mar 9, 2018

Oh, it looks even simpler according to the documentation: https://keras.io/layers/core/#dropout

noise_shape: 1D integer tensor representing the shape of the binary dropout mask that will be multiplied with the input. For instance, if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features).
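
Applied between stacked CuDNN layers, that could look roughly like the sketch below (shapes and rates are example values; note this is still inter-layer dropout, not recurrent dropout):

from keras.layers import Input, Dropout, CuDNNLSTM
from keras.models import Model

inputs = Input(shape=(100, 32))
x = CuDNNLSTM(128, return_sequences=True)(inputs)
# noise_shape=(None, 1, 128): one binary mask per sample, shared across all timesteps.
# In older Keras versions the batch entry may need to be a concrete batch size;
# SpatialDropout1D(0.2) should have the same effect.
x = Dropout(0.2, noise_shape=(None, 1, 128))(x)
x = CuDNNLSTM(128)(x)
model = Model(inputs, x)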

@tRosenflanz

tRosenflanz commented Mar 9, 2018

This will not produce recurrent dropout. It will apply dropout between layers of the network, while recurrent dropout works on the states that are passed between timesteps within the same layer. Since the CuDNN layer works by calling the cuDNN kernel directly and doesn't rely on the cell implementation, the Keras team cannot do anything about it.

@evictor

evictor commented Apr 23, 2018

@fchollet can you elaborate on these comments:

regular dropout doesn't work with RNNs

Actually using such dropout in a stacked RNN will wreck training.

@changbinlu

changbinlu commented May 27, 2018

Any updates on this problem? The built-in dropout truly wrecks training.

@rsmith49

rsmith49 commented Jun 2, 2018

This paper mentions using DropConnect (Dropout applied to the weights, instead of the state vector) on the recurrent weights in an LSTM in order to have some dropout without changing the cuDNN implementation. They say that for each batch in training they perform dropout on the weights before the forward and backward propagation, and repeat for the next batch. From the paper:

We propose the use of DropConnect (Wan et al., 2013) on the recurrent hidden to hidden weight matrices which does not require any modifications to an RNN's formulation. As the dropout operation is applied once to the weight matrices, before the forward and backward pass, the impact on training speed is minimal and any standard RNN implementation can be used, including inflexible but highly optimized black box LSTM implementations such as NVIDIA's cuDNN LSTM.

By performing DropConnect on the hidden-to-hidden weight matrices [Ui,Uf,Uo,Uc] within the LSTM, we can prevent overfitting from occurring on the recurrent connections of the LSTM. This regularization technique would also be applicable to preventing overfitting on the recurrent weight matrices of other RNN cells.

Is there any interest in implementing this as an option? I am not totally familiar with how dropout is applied in the Model and Sequential classes, but hopefully this would not be too hard to implement.
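
For illustration, a minimal NumPy sketch of the idea (the dropconnect helper and the shapes are hypothetical and not hooked into Keras; gradient handling and rescaling are omitted):

import numpy as np

def dropconnect(U, rate):
    # Return a copy of the hidden-to-hidden weight matrix U with a random
    # subset of entries zeroed out; the unmodified cuDNN kernel then runs
    # the forward/backward pass with this masked copy for the current batch.
    mask = np.random.binomial(1, 1.0 - rate, size=U.shape)
    return U * mask

U = np.random.randn(128, 4 * 128)     # example LSTM recurrent kernel: units=128, 4 gates
U_masked = dropconnect(U, rate=0.3)   # recomputed once per training batch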

@brunoalano

brunoalano commented Jun 5, 2018

@rsmith49 You can use the TensorLayer implementation [1] of DropConnect directly with Keras. There's an example showing how Keras & TensorLayer can be used together [2].

[1] http://tensorlayer.readthedocs.io/en/latest/modules/layers.html#dropconnect-dense-layer
[2] https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_keras.py

@scotthuang1989

scotthuang1989 commented Jun 5, 2018

@fchollet when you say that:

Actually using such dropout in a stacked RNN will wreck training.

Do you refer to this paper?

@rsmith49

rsmith49 commented Jun 5, 2018

@brunoalano Do you know of any implementations of DropConnect applied to an LSTM layer? The link you provided only has DropconnectDenseLayer, and I did not find any in TensorLayer's recurrent.py.
