
Deep Gate Recurrent Neural Network #2387

Closed
wants to merge 6 commits into from

Conversation

gaoyuankidult

@gaoyuankidult gaoyuankidult commented Apr 18, 2016

I designed a new structure called the Deep Simple Gated Unit (DSGU).
The structure has shown some advantages compared with LSTM and GRU (details can be found in this paper: http://arxiv.org/abs/1604.02910).

Originally the experiments were done using an early version of Keras. (https://github.com/gaoyuankidult/einstein/blob/master/einstein/layers/recurrent.py#L345)

I have done several initial experiments with this model, but the paper is still under development. You are welcome to test the model; if it proves useful, then maybe we can add it to the Keras library.

@xingdi-eric-yuan
Contributor

Just curious, do you have new figures for figures 9/10/11 in your paper with dropout applied?

@gaoyuankidult
Author

@xingdi-eric-yuan Not yet. But if you are interested, I will try to run several experiments with dropout applied this week. In my experience, the dropout rate cannot be too high when using DSGU.

@xingdi-eric-yuan
Contributor

@gaoyuankidult Yes, please run the experiments. What happens when the dropout rate is high (and how high)? Sometimes it may be related to your init method 👻

@the-moliver
Contributor

Have you considered adding Batch Normalization to your layers as well? Given its success with LSTMs, it may be beneficial here as well...

@xingdi-eric-yuan
Contributor

Have you considered adding Batch Normalization to your layers as well? Given its success with LSTMs, it may be beneficial here as well...

That's a good point!

@gaoyuankidult
Author

gaoyuankidult commented May 6, 2016

@xingdi-eric-yuan I think it should not be more than 0.3. The dropout function was added to make sure DSGU is consistent with the other RNN classes in Keras.
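For context on what a rate like 0.3 means here (a generic inverted-dropout sketch in NumPy, not the DSGU or Keras implementation itself): each activation is zeroed with probability `rate` during training, and survivors are rescaled so the expected output matches the input.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero each element with probability `rate`,
    then scale survivors by 1/(1-rate) so E[output] == x."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(10000)
y = dropout(x, 0.3, rng)
print(y.mean())  # close to 1.0 in expectation
```

With a high rate, most units are silenced at every step, which compounds across the recurrence and can plausibly explain why high dropout hurts a gated recurrent unit.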

@gaoyuankidult
Author

@the-moliver Thanks for your input. I also think it could be very beneficial to this model.

@gaoyuankidult
Author

gaoyuankidult commented May 9, 2016

@xingdi-eric-yuan These are the dropout results for the IMDB example.

I don't see clear differences. Maybe it is because the IMDB dataset is small.
This figure shows one and a half epochs for each configuration; every 25 iterations is one epoch.

[figure: dropout-iters]

@gaoyuankidult
Author

gaoyuankidult commented May 9, 2016

The complete result.
[figure: dropout-iters 1]

@gaoyuankidult
Author

gaoyuankidult commented May 16, 2016

This model uses a sigmoid activation function for both binary (link) and multi-class classification problems (link). Although that is fine in the binary case, in the multi-class case it cannot provide a proper probability distribution (it only gives the best class). As a consequence, the applicability of this model is limited. After discussing with different people, I decided to close this pull request. If you are still interested in this model, some discussions are also here.
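To illustrate the limitation above (a minimal NumPy sketch, independent of the DSGU code itself): a sigmoid output scores each class independently, so the scores need not sum to 1, whereas a softmax output forms a proper probability distribution over the classes.

```python
import numpy as np

def sigmoid(z):
    # Element-wise: each class scored independently in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Normalized over classes: outputs sum to 1 (shift by max for stability).
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])

sig = sigmoid(logits)
soft = softmax(logits)

print(sig.sum())   # generally != 1: not a distribution over classes
print(soft.sum())  # 1 (up to floating point): a proper distribution
```

Both pick the same best class (the argmax), but only the softmax output can be read as class probabilities, which is what multi-class losses and calibrated predictions rely on.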

@fchollet
Member

Thanks for letting us know, and best of luck with future iterations of this research.

In general, we won't merge into Keras algorithms that aren't widely accepted or haven't been covered in a peer-reviewed paper. At the same time, we try to stay on top of things and incorporate the latest advances, as soon as we are confident in their viability.

@gaoyuankidult
Author

Thanks for the information.

It is good to know the principles for merging into the Keras library. I will be more careful when I make a pull request next time.

4 participants