
'clipping_threshold' LSTM parameter same as 'clip_gradient' Caffe parameter? #18

Closed
aurotripathy opened this issue Aug 15, 2016 · 7 comments

Comments


aurotripathy commented Aug 15, 2016

@junhyukoh

I'm porting your simple LSTM example to the Caffe mainline tree. As expected, some keywords and parameters differ, since the implementations were developed independently.

My question is about the clipping_threshold parameter.
In your lstm implementation, I see (in the backward lstm computation):

      // Clip derivatives before nonlinearity
      if (clipping_threshold_ > Dtype(0.)) {
        caffe_bound(4*H_, pre_gate_diff_t, -clipping_threshold_,
            clipping_threshold_, pre_gate_diff_t);
      }

I don't see this in the Caffe mainline code. There, clip_gradients is converted into a scale factor:

Dtype scale_factor = clip_gradients / l2norm_diff;

Is it the same parameter? Does it have the same effect? Is one a scaled version of the other?

Could you share your insight?

Thank you
Auro

@junhyukoh (Owner)

I think they are different.
I followed Alex Graves' paper for gradient clipping.
The idea is to clip each dimension of the gradient separately.

As far as I know, the Caffe mainline code scales the whole gradient based on its L2 norm.
In this case, the gradient direction is preserved.
I think this (scaling) is more widely used these days.
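
A rough numpy sketch of the two behaviors (my own illustration, not code from either repo; the element-wise version mirrors what caffe_bound does, the norm version mirrors the solver's clip_gradients):

    import numpy as np

    def clip_elementwise(grad, threshold):
        # Bound each component to [-threshold, threshold]; the direction may change.
        return np.clip(grad, -threshold, threshold)

    def clip_by_norm(grad, clip_gradients):
        # If the L2 norm exceeds clip_gradients, rescale the whole vector;
        # the direction is preserved.
        l2norm = np.linalg.norm(grad)
        if l2norm > clip_gradients:
            return grad * (clip_gradients / l2norm)
        return grad

    g = np.array([0.05, -0.8, 3.0])
    print(clip_elementwise(g, 0.1))  # [ 0.05 -0.1   0.1 ]
    print(clip_by_norm(g, 0.1))      # same direction as g, scaled so its norm is 0.1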


aurotripathy commented Aug 18, 2016

@junhyukoh
Thank you for the clarification.

I have yet to find the right 'clip_gradients' value with Caffe mainline. I have tried values between 1 and 10, but they do not reproduce the signal as faithfully as your 'clipping_threshold' value of 0.1.

Any guidance would be very valuable. I'm using the simple single-stack LSTM signal-following example.

Thank you.

P.S. Do you think gradient bounding should be an option in addition to gradient scaling?

@junhyukoh (Owner)

Are you initializing the bias of the forget gate to some large positive value (say 5)?
This trick was very important for training RNNs in my toy experiments.


aurotripathy commented Aug 20, 2016

Thank you for looking into this.

Yes, that was very clear from the Clockwork RNN paper (and your code). I've attached my entire Python script; kindly take a look.

solver.net.params['lstm'][1].data[15:30] = 5.0
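
For clarity, my reading of that slice (an assumption about the bias blob layout, not verified against the layer source): with hidden size H = 15 and the bias stored as four H-sized blocks in gate order [input, forget, output, cell], indices 15:30 are the forget-gate biases.

    H = 15  # hidden size in this toy example (assumption)
    # Bias blob holds 4*H values; the second H-sized block is assumed to be the forget gate.
    solver.net.params['lstm'][1].data[H:2*H] = 5.0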

train_toy_lstm.txt

The prototxt file below feeds the data in the 'proper' Caffe way (T x N x 1).

toy_lstm.txt

@soulslicer

soulslicer commented Mar 6, 2018

Did you solve this? What do I use in place of clip_gradients: 0.1? Could you share the code you used?

Also could you share your test solver for the toy example?

@aurotripathy


aurotripathy commented Mar 6, 2018 via email

@soulslicer

soulslicer commented Mar 6, 2018

I see. I have implemented the test model as well, following your example. The training loss appears to be decreasing. However, when I run it, the prediction always gives the same output, and I am not sure why.

If you have any ideas why that might occur, I would like to know. In the lstm_deploy model, I merely changed the shape from 320 to 2.
