Implementation of RNN #89
Comments
It is great news to hear about an RNN implementation for Mocha.
Regarding the approximate gradient computation issue in the Caffe discussion: the gradient is approximate because back-propagation through time gets truncated at the boundary of mini-batches when the sequence is longer than the mini-batch size.
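To make that truncation concrete, here is a minimal sketch in plain Julia (not Mocha's API; `bptt_minibatch`, `Wx`, `Wh`, and `b` are made-up names for illustration) of BPTT over a single mini-batch of a vanilla RNN, where the gradient that would flow back into the previous mini-batch is simply dropped:

```julia
# Sketch only (not Mocha API): vanilla RNN h_t = tanh(Wx*x_t + Wh*h_{t-1} + b),
# unrolled over one mini-batch of T steps. The hidden state h0 carried in from the
# previous mini-batch is treated as a constant, so the gradient that should flow
# into it is dropped -- that is the approximation.
function bptt_minibatch(Wx, Wh, b, xs, h0, dhs)
    T = length(xs)
    # forward pass: exact, only needs the inputs and the last hidden state
    hs, h = typeof(h0)[], h0
    for t in 1:T
        h = tanh.(Wx * xs[t] + Wh * h + b)
        push!(hs, h)
    end
    # backward pass: accumulate gradients within this mini-batch only
    dWx, dWh, db = zero(Wx), zero(Wh), zero(b)
    dh_next = zero(h0)                       # gradient arriving from step t+1
    for t in T:-1:1
        dh  = dhs[t] + dh_next               # loss gradient at t plus future gradient
        da  = dh .* (1 .- hs[t].^2)          # derivative of tanh
        hm1 = t == 1 ? h0 : hs[t-1]
        dWx += da * xs[t]'
        dWh += da * hm1'
        db  += da
        dh_next = Wh' * da
    end
    # dh_next now belongs to the last step of the *previous* mini-batch;
    # truncated BPTT discards it, so the overall gradient is approximate.
    return dWx, dWh, db
end
```

The exact computation would require keeping the activations of earlier mini-batches around and continuing the backward loop into them.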
Ah, ok. Any other comments on my post?
Just thought I would mention that there is a pure Julia implementation of various RNN models (RNN, LSTM, etc.) in the RecurrentNN.jl package: https://github.com/Andy-P/RecurrentNN.jl. That might be a useful starting point. Andre
That's wonderful information.
Thanks for pointing me to that @Andy-P, I'll definitely take a look at those when I need help with the conceptual stuff :)
@RatanRSur Thanks for your interest in doing this! Some other comments: …
Oops, didn't mean to close the issue.
It happens sometimes, that's okay. Now I have a question: is there any special point which makes Mocha …
Thank you for sharing your model, Ratan. Then, while we are implementing RNN, we can put modes for full RNN and canonical …
@RatanRSur Yes, conceptually the unrolled network looks exactly like what you described.
For those who are interested in RNN/LSTM in Julia: please check out the char-rnn LSTM example in MXNet.jl. It uses explicit unrolling, so everything fits in the current …
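As a rough illustration (in plain Julia, not MXNet.jl or Mocha code; all names here are made up), "explicit unrolling" just means replicating the same recurrent step a fixed number of times with shared weights, so the result is an ordinary feed-forward graph:

```julia
# Hypothetical sketch of explicit unrolling: one set of weights, reused at every
# time step, so an RNN over a fixed-length sequence is just a deep feed-forward net.
struct RNNStep
    Wx::Matrix{Float64}
    Wh::Matrix{Float64}
    b::Vector{Float64}
end

# one shared step: h_t = tanh(Wx*x_t + Wh*h_{t-1} + b)
(step::RNNStep)(x, h) = tanh.(step.Wx * x + step.Wh * h + step.b)

# "unroll" by chaining the same step over the whole (fixed-length) sequence
function unroll(step::RNNStep, xs::Vector{Vector{Float64}}, h0::Vector{Float64})
    hs = Vector{Float64}[]
    h = h0
    for x in xs
        h = step(x, h)
        push!(hs, h)
    end
    return hs   # hidden state at each time step, computed by a plain feed-forward pass
end
```

The price is that the sequence length has to be fixed (or padded) when the network is defined, which is presumably how the char-rnn example fits into a feed-forward framework.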
Awesome, thanks! |
So I want to work on adding RNN functionality, mainly to help myself understand RNNs better and to do something of a larger scale in Julia! I did want to open this issue, though, so that there would be a forum for discussion about the implementation.
Here are my current thoughts. I don't know if they're consistent with Mocha's architecture, or even with the principles of RNNs, as I've only spent a little time getting acquainted, but here goes. Please point out any of my misunderstandings!
RNN Specific Stuff
Topology of an RNN in Mocha
To my understanding, there are `split` layers which allow a layer's output to be sent to two different layers and still play nice with backprop. An RNN implementation would likely need to use this. Additionally, would something like a `join` layer be necessary?
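Here is a rough, framework-agnostic sketch (plain Julia; not Mocha's actual layer API) of what a split and a join have to do during the forward and backward passes:

```julia
# Split: forward copies one blob to several consumers; backward must SUM the
# gradients coming back from those consumers, since the same values fed all of them.
split_forward(x, n::Int) = [copy(x) for _ in 1:n]
split_backward(grads)    = reduce(+, grads)

# Join (here: concatenation): forward stacks its inputs; backward slices the
# incoming gradient back into one piece per input.
join_forward(xs) = vcat(xs...)

function join_backward(grad, sizes)
    pieces = typeof(grad)[]
    offset = 0
    for s in sizes
        push!(pieces, grad[offset+1:offset+s])
        offset += s
    end
    return pieces
end
```

Whether a dedicated join layer is needed probably depends on how the hidden state and the input are combined at each time step (concatenation versus two separate weight matrices whose products are summed).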
Caffe
I think BVLC/caffe#1873 is the relevant thread from Caffe.
If I'm understanding correctly, one of the inputs to a recurrent layer is a stream that represents the past states of that layer. Understandably, the forward prop is exact, as it only depends on the current value of an input layer and the most recent past value, presumably stored at one end of the stream. He mentions, however, that the backprop is approximate. This is the part I don't understand at all: how is the backprop being approximated?
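For what it's worth, here is a small sketch (plain Julia, hypothetical names, not Caffe or Mocha code) of why the forward pass can stay exact across mini-batches: the only thing the next mini-batch needs from the past is the most recent hidden state.

```julia
rnn_step(Wx, Wh, b, x, h) = tanh.(Wx * x + Wh * h + b)

# Feeding the mini-batches one after another, carrying only `h` across the boundary,
# produces exactly the same activations as running the whole sequence at once.
function forward_over_batches(Wx, Wh, b, batches, h0)
    h = h0
    outputs = typeof(h0)[]
    for batch in batches          # each batch is a vector of input vectors
        for x in batch
            h = rnn_step(Wx, Wh, b, x, h)
            push!(outputs, h)
        end
        # only `h` crosses the boundary here; exact BPTT, by contrast, would need the
        # earlier activations of this batch when the *next* batch is back-propagated,
        # and dropping that dependency is what makes the backward pass approximate
    end
    return outputs
end
```

This is the truncation described in the comment earlier in the thread.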
Thanks for reading!