
fix inconsistent decay of beta1 in Adam #1144

Merged: 1 commit, merged on Jan 4, 2017
Conversation

@SwordYork (Contributor)
According to the Adam paper, the beta1 used to calculate the step size should be the same as the one used to update the biased first moment estimate. However, this bug won't cause a noticeable problem when decay_factor is very close to 1 or t1 is small.
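To make the inconsistency concrete, here is a minimal NumPy sketch of one Adam step with the decayed coefficient (hypothetical names, not the actual Blocks code; the correction term follows the $\alpha_t$ formula discussed at the end of this thread):

```python
import numpy as np

def adam_step(param, grad, m, v, t, alpha=0.002, beta1=0.9,
              beta2=0.999, decay_factor=1 - 1e-8, eps=1e-8):
    """One Adam step with exponentially decayed beta1 (sketch only)."""
    beta1_t = beta1 * decay_factor ** (t - 1)   # decayed first-moment coefficient
    m = beta1_t * m + (1 - beta1_t) * grad      # biased first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # biased second moment estimate
    # The fix: the step size must use the *same* decayed beta1_t here;
    # the old code used the undecayed beta1 in this correction term.
    alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1_t ** t)
    return param - alpha_t * m / (np.sqrt(v) + eps), m, v
```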

@SwordYork changed the title from "fix inconsistency decay of beta1 in Adam" to "fix inconsistent decay of beta1 in Adam" on Sep 7, 2016
@rizar (Contributor) commented on Jan 2, 2017
Sorry for the extremely late reply! To be honest, I can't understand why, in the current Blocks implementation, the first-order moment estimate is updated using \beta_1^t. It seems that the paper uses just \beta_1. Do you have an idea why that is the case?

@SwordYork (Contributor, Author)
Never mind. It's mentioned above Theorem 4.1 in the paper:

> the first moment running average coefficient β_{1,t} decay exponentially with λ, that is typically close to 1, e.g. 1 − 10^{−8}
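In the paper's notation, the schedule assumed in the convergence analysis is, as I read Theorem 4.1:

```latex
\beta_{1,t} = \beta_1 \lambda^{t-1}, \qquad \lambda \in (0, 1),
\quad \text{typically } \lambda = 1 - 10^{-8}
```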

I will close this PR; it would be better to rewrite the Adam algorithm (issue #1159).

@SwordYork closed this on Jan 2, 2017
@rizar (Contributor) commented on Jan 2, 2017

Oh, I see. I only read Algorithm 1, in which \beta_1 is not decayed. The other hyperparameter, \lambda, is only introduced in Theorem 4.1.

Rewriting Adam as proposed in #1159 would break a lot of existing code, but maybe we should still do it.

I am not sure why you closed the PR; the change looks good to me now.

@rizar reopened this on Jan 2, 2017
@SwordYork (Contributor, Author)
Thanks! I closed this PR because I thought it might be out of date.
Would you please fix beta_1t? I don't have a suitable device to update the code right now.

@SwordYork reopened this on Jan 3, 2017
@SwordYork (Contributor, Author)
I have synchronized the code, but the checks still fail.

They report:

FAIL: tests.bricks.test_conv.FunctionTestCase (test_untied_biases)
AssertionError: AbstractConv shape mismatch: shape of image does not match given imshp.

but in `test_conv.py` the corresponding check is

assert_raises_regexp(ValueError, 'Input dimension mis-match.*', wrongsize)

I don't know why; is it related to the newer version of Theano, for example this commit? I think `test_conv.py` should be modified.

@rizar (Contributor) commented on Jan 4, 2017
You are right, thanks for the heads-up! I created #1172.

@rizar (Contributor) commented on Jan 4, 2017
The tests pass, except for a clearly unrelated issue, #1173. The only thing I was worried about is that the outputs did not change at all in the test that covers Adam. Apparently that's because the impact of this change is negligible during the first iterations of training, when \lambda^t is still almost one. Otherwise, the change seems legitimate to me, because what is called learning_rate in the code is in fact simply a trick to speed things up a bit, as mentioned in the second-to-last paragraph of page 2 of the paper.
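A rough numeric check (plain Python, not the Blocks test suite; the defaults below are assumed) of why the test outputs are unchanged: with a decay factor of $1 - 10^{-8}$, the bias-correction terms before and after the fix agree to within about $10^{-6}$ over the first thousands of steps.

```python
beta1, lam = 0.9, 1 - 1e-8      # assumed paper-style defaults
for t in (1, 10, 100, 1000):
    beta1_t = beta1 * lam ** (t - 1)   # decayed coefficient
    old = 1 - beta1 ** t               # correction before the fix
    new = 1 - beta1_t ** t             # correction after the fix
    print(t, abs(old - new))           # stays below ~1e-6
```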

@rizar merged commit e89cf93 into mila-iqia:master on Jan 4, 2017
@rizar (Contributor) commented on Jan 4, 2017
Thanks for your contribution, @SwordYork!

@SwordYork (Contributor, Author)
Thanks! It may be problematic when decay_factor is not close to 1 and t1 is large (for example, $(1 - 10^{-5})^{10000} \approx 0.9048$), because after changing the order of computation in Algorithm 1, the step size should become $\alpha_t = \alpha \sqrt{1 - \beta_2^t} / (1 - \beta_{1,t}^t)$ instead of $\alpha_t = \alpha \sqrt{1 - \beta_2^t} / (1 - \beta_1^t)$ (the previous code).
Nevertheless, I think it is rare to set decay_factor to such a value.
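For the record, a quick check of the numbers quoted above (plain Python, hypothetical variable names):

```python
lam, beta1, t = 1 - 1e-5, 0.9, 10_000
print(lam ** t)                # ~0.9048, the figure quoted above
print(beta1 * lam ** (t - 1))  # decayed beta_{1,t} ~0.8144, well below beta1 = 0.9
```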
Thanks for this wonderful framework!
