Deep Gate Recurrent Neural Network #2387
Conversation
Just curious, do you have new figures for figure 9/10/11 in your paper with dropout applied?
@xingdi-eric-yuan not yet. But if you are interested, I will try to run several experiments with dropout applied this week. In my experience, the dropout rate cannot be too high when using DSGU.
@gaoyuankidult Yes, please run experiments. What happens when the dropout rate is high (how high)? Sometimes it may be related to your init method 👻
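As background on why a high dropout rate is harmful, here is a minimal NumPy sketch of inverted dropout (the standard formulation; this is an illustration, not code from the PR). At a rate of 0.8, only about 20% of units survive each step, so a recurrent state that is masked this aggressively at every timestep loses most of its signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(x, rate, rng):
    # Zero out each activation with probability `rate`, and rescale
    # the survivors by 1/keep so the expected value is unchanged.
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

x = np.ones(10000)
low = inverted_dropout(x, 0.3, rng)   # ~70% of units survive
high = inverted_dropout(x, 0.8, rng)  # only ~20% survive
print((low != 0).mean(), (high != 0).mean())
```

This is why moderate rates (around 0.2-0.3) are the usual starting point for recurrent layers.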
Have you considered adding Batch Normalization to your layers as well? Given its success with LSTMs, it may be beneficial here too.
That's a good point!
@xingdi-eric-yuan I think it should not be more than 0.3. The dropout function was added to make sure DSGU is consistent with the other RNN classes in Keras.
@the-moliver Thanks for your input. I also think it could be very beneficial to this model.
@xingdi-eric-yuan These are the dropout results for the IMDB example. I don't see clear differences; maybe that is because the IMDB dataset is small.
This model uses the sigmoid activation function for both binary (link) and multi-class classification problems (link). While that is fine in the binary case, in the multi-class case it cannot provide a proper probability distribution over classes (it only identifies the best class). As a consequence, the applicability of this model is limited. After discussing with several people, I decided to close this pull request. If you are still interested in this model, some discussions are also here.
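To make the multi-class limitation concrete, here is a small NumPy illustration (not from the PR): independent sigmoids score each class in (0, 1), but the scores do not sum to 1, whereas softmax yields a proper probability distribution over the same logits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical 3-class scores

sig = sigmoid(logits)   # per-class scores, each in (0, 1)
soft = softmax(logits)  # normalized probabilities

print(sig.sum())   # > 1: not a probability distribution
print(soft.sum())  # exactly 1 (up to floating point)
```

Both pick the same argmax here, but only the softmax output can be read as class probabilities, which matters for calibrated multi-class losses.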
Thanks for letting us know, and best of luck with future iterations of this research. In general we won't merge into Keras algorithms that aren't widely accepted or haven't been covered in a peer-reviewed paper. At the same time, we try to stay on top of things and incorporate the latest advances, as soon as we are confident in their viability.
Thanks for the information. It is good to know the principles for merging into the Keras library. I will be more careful when I make a pull request next time.
I designed a new structure called the Deep Simple Gated Unit.
The structure has shown some advantages compared with LSTM and GRU. (Details can be found in this paper: http://arxiv.org/abs/1604.02910)
Originally the experiments were done using an early version of Keras. (https://github.com/gaoyuankidult/einstein/blob/master/einstein/layers/recurrent.py#L345)
I have done several initial experiments with this model, but the paper is still under development. You are welcome to test the model. If it proves useful, then maybe we can add it to the Keras library.