
Implementation of RNN #89

Closed
RatanRSur opened this issue Aug 23, 2015 · 14 comments

@RatanRSur

So I want to work on adding RNN functionality mainly to help myself understand them better and to do something of a larger scale in Julia! I did want to open this issue though so that there would be a forum for discussion about implementation.

Here are my current thoughts. I don't know if they're consistent with Mocha's architecture, or even with the principles of RNNs, since I only spent a little time getting acquainted, but here goes. Please point out any of my misunderstandings!

RNN Specific Stuff

  • Not strictly a single forward and backward pass
  • Backprop is instead unrolled through time, which essentially yields an "equivalent" feed-forward net whose depth depends on the number of time steps being backpropagated through (see the sketch after this list)
  • LSTM to prevent exploding/vanishing gradients
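
To make the unrolling concrete, here is a minimal, self-contained sketch of backprop through time for a vanilla RNN in plain Julia (not Mocha code; the function name, dimensions, and loss are made up for illustration). The forward pass unrolls the recurrence over T steps, and the backward pass walks the same chain in reverse, accumulating gradients into the single shared weight matrices:

```julia
# Minimal BPTT sketch for a vanilla RNN (plain Julia, hypothetical names):
# h_t = tanh.(Wxh * x_t .+ Whh * h_{t-1}),  loss = 0.5 * sum((h_T - target).^2)
function bptt(Wxh, Whh, xs, h0, target)
    T  = length(xs)
    hs = [h0]
    for t in 1:T                                  # forward: unroll over time
        push!(hs, tanh.(Wxh * xs[t] .+ Whh * hs[end]))
    end

    dWxh, dWhh = zero(Wxh), zero(Whh)
    dh = hs[end] .- target                        # dL/dh_T
    for t in T:-1:1                               # backward through the unrolled net
        dpre  = dh .* (1 .- hs[t + 1] .^ 2)       # through the tanh nonlinearity
        dWxh .+= dpre * xs[t]'                    # same weights at every step,
        dWhh .+= dpre * hs[t]'                    # so gradients accumulate
        dh    = Whh' * dpre                       # pass gradient to h_{t-1}
    end
    return dWxh, dWhh
end

# toy usage with random data
Wxh, Whh = randn(4, 3), randn(4, 4)
xs = [randn(3) for _ in 1:5]
dWxh, dWhh = bptt(Wxh, Whh, xs, zeros(4), randn(4))
```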

Topology of an RNN in Mocha

To my understanding, there are split layers which allow a layer's output to be sent to two different layers and still play nicely with backprop. An RNN implementation would likely need to use this. Additionally, would something like a join layer be necessary?

Caffe

I think BVLC/caffe#1873 is the relevant thread from Caffe.

If I'm understanding correctly, one of the inputs to a recurrent layer is a stream that represents the past states of that layer. Understandably, the forward prop is exact, as it only depends on the current value of an input layer and the most recent past value, presumably stored at one end of the stream. He mentions, however, that the backprop is approximate. This is the part I don't understand at all: how is the backprop being approximated?

Thanks for reading!

@jskDr

jskDr commented Aug 23, 2015

It is great news to hear about an RNN implementation for Mocha. I guess it will be useful for modeling highly nonlinear chemical molecule properties.


@pluskid

pluskid commented Aug 23, 2015

Regarding the approximate gradient computation issue in the Caffe discussion: it is because backpropagation through time gets truncated at the boundary of minibatches when the sequence is longer than the minibatch size, so the gradient is approximate.
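
A toy illustration of that truncation (plain Julia, not Mocha or Caffe code; all names here are hypothetical): the same gradient is computed once with BPTT over the whole sequence and once with the backward pass stopped after a minibatch-sized window of k steps. The forward pass, and hence the carried hidden state, stays exact, but the gradient terms that would flow through earlier minibatches are dropped, which is the approximation.

```julia
using LinearAlgebra

# Gradient of a toy loss w.r.t. the recurrent weights Whh, backpropagating
# through only the last k time steps (k = length(xs) gives full, exact BPTT).
function grad_Whh(Wxh, Whh, xs, h0, target; k = length(xs))
    T  = length(xs)
    hs = [h0]
    for t in 1:T                                   # forward pass is always exact
        push!(hs, tanh.(Wxh * xs[t] .+ Whh * hs[end]))
    end
    dWhh = zero(Whh)
    dh   = hs[end] .- target
    for t in T:-1:(T - k + 1)                      # truncation: stop after k steps
        dpre  = dh .* (1 .- hs[t + 1] .^ 2)
        dWhh .+= dpre * hs[t]'
        dh    = Whh' * dpre
    end
    return dWhh
end

Wxh, Whh = randn(4, 3), randn(4, 4)
xs, tgt  = [randn(3) for _ in 1:30], randn(4)
exact     = grad_Whh(Wxh, Whh, xs, zeros(4), tgt)          # full BPTT
truncated = grad_Whh(Wxh, Whh, xs, zeros(4), tgt; k = 10)  # minibatch-sized window
println(norm(exact - truncated) / norm(exact))             # nonzero truncation error
```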

@RatanRSur

Ah, ok. Any other comments on my post?

@Andy-P

Andy-P commented Aug 24, 2015

Just thought I would mention that there is a pure Julia implementation of various RNN models (RNN, LSTM, etc.) in the RecurrentNN.jl package.

https://github.com/Andy-P/RecurrentNN.jl

That might be a useful starting point.

Andre

@jskDr

jskDr commented Aug 24, 2015

That's wonderful information.


@RatanRSur

Thanks for pointing me to that @Andy-P, I'll definitely take a look at those when I need help with the conceptual stuff :)

@pluskid

pluskid commented Aug 24, 2015

@RatanRSur Thanks for your interest in doing this! Some other comments:

  • Yes, a SplitLayer is needed to pass the output blob both to the current time step's outputs and to the input of the next time step. I do not think a join layer is needed, because ultimately the outputs at each time step go to the loss layer, and Mocha automatically accumulates all the losses.
  • When unrolled in time, an RNN becomes an ordinary deep net with a large depth (depth = number of time steps unrolled). You will need to use the parameter sharing mechanism so that the unrolled layers share the same parameters (see the sketch below).
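
A rough, untested sketch of what those two points could look like for two unrolled time steps. The constructors and keyword names (InnerProductLayer, SplitLayer, bottoms, tops, param_key) are recalled from the Mocha docs and should be treated as assumptions to verify; how the input at step 2 gets combined with the recurrent blob is deliberately left out.

```julia
using Mocha

# time step 1: hidden layer, then split its output so it can feed both the
# loss side of the net and the next time step (the SplitLayer point above)
hidden_t1 = InnerProductLayer(name="hidden-t1", param_key="hidden-shared",
                              output_dim=128, bottoms=[:x_t1], tops=[:h_t1])
split_t1  = SplitLayer(name="split-t1", bottoms=[:h_t1],
                       tops=[:h_t1_to_loss, :h_t1_to_next])

# time step 2: the same param_key means this unrolled copy shares weights with
# hidden-t1 (the parameter-sharing point above). The blob :x_t2_with_h_t1 is a
# stand-in for however :x_t2 and :h_t1_to_next get combined.
hidden_t2 = InnerProductLayer(name="hidden-t2", param_key="hidden-shared",
                              output_dim=128, bottoms=[:x_t2_with_h_t1],
                              tops=[:h_t2])
```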

@RatanRSur reopened this Aug 24, 2015
@RatanRSur

Oops, didn't mean to close the issue.

@jskDr

jskDr commented Aug 24, 2015

It happens sometimes. That's okay.

Now, I have a question: is there anything special in Mocha yet that would support adopting RNNs? Maybe not, right?


@RatanRSur

Assuming a simple net like:

X_t-1 ----> Y_t-1 ----> H_t-1
                          |
                          V
X_t   ----> Y_t   ----> H_t

So, in some way, the user specifies the recurrence of the hidden layer (more on this later) and it is converted into the unrolled RNN by Net.jl? Is this what the solver eventually sees?
[image: imag0486 (https://cloud.githubusercontent.com/assets/4733314/9446758/4bab76be-4a61-11e5-8cae-0eefff87003b.jpg)]

Regarding designating a layer as recurrent, I'm guessing this would be implemented through a characterization?

@jskDr

jskDr commented Aug 24, 2015

Thank you for sharing your model, Ratan. It seems to be a canonical form of RNN. Is my understanding right?

Then, while we are implementing RNN, we could include modes for both the full RNN and the canonical RNN. In linear models, if I remember correctly, this is called a decision feedback model.


@pluskid

pluskid commented Aug 25, 2015

@RatanRSur Yes, conceptually the unrolled network looks exactly like what you described.

@pluskid

pluskid commented Nov 13, 2015

For those who are interested in RNN/LSTM in Julia: please check out the char-rnn LSTM example in MXNet.jl. It uses explicit unrolling, so everything fits in the current FeedForward model and multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.

@RatanRSur

Awesome, thanks!

@pluskid closed this as completed Mar 23, 2016