This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Update for pytorch 0.2. Fix LockedDropout to broadcast correct axis. Use relative path for default data source. #5

Merged

2 commits merged into salesforce:master on Sep 7, 2017

Conversation

jph00 commented Sep 1, 2017

No description provided.

- Fix LockedDropout to broadcast correct axis.
- Use relative path for default data source.
salesforce-cla bot commented Sep 1, 2017

Thanks for the contribution! Before we can merge this, we need @racheltho to sign the Salesforce Contributor License Agreement.

Smerity (Contributor) commented Sep 5, 2017

As a heads up, I've not forgotten about this ;)

The default PTB parameters as given in the README result in an untuned model validation perplexity of 65.2, which is far from the 61.2 that existed before the v0.2 update. My current theory is that it's a result of the fixed variational / locked dropout: either locked dropout doesn't work as well as expected (which may well be possible, as I've never seen it directly compared for non-recurrent connections where it has a massive advantage) or it requires substantially different hyperparameters.

I am currently testing what happens if you just run v0.2 with normal dropout. Once I've ascertained the source of the discrepancy I'll move forward with merging :)
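
For reference, "locked" (variational) dropout samples one dropout mask per sequence and reuses it at every time step, which is what the "broadcast correct axis" fix in this PR is about. Below is a minimal sketch of the idea, assuming (seq_len, batch, features) inputs as in the test further down; LockedDropoutSketch is an illustrative name, not necessarily the repository's implementation.

import torch
import torch.nn as nn
from torch.autograd import Variable

class LockedDropoutSketch(nn.Module):
    # Variational ("locked") dropout: sample one mask per sequence and reuse
    # it at every time step, instead of resampling a fresh mask per step.
    def forward(self, x, dropout=0.5):
        if not self.training or not dropout:
            return x
        # x is assumed to be (seq_len, batch, features); the mask is
        # (1, batch, features) so it broadcasts across the time axis.
        m = x.data.new(1, x.size(1), x.size(2)).bernoulli_(1 - dropout)
        mask = Variable(m, requires_grad=False) / (1 - dropout)
        return mask.expand_as(x) * x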

jph00 (Author) commented Sep 6, 2017

I'll be interested to hear what you find!

Smerity (Contributor) commented Sep 6, 2017

Unfortunately the weight dropout update appears to be broken. After testing [fixed locked dropout, Smerity's silly locked dropout, normal dropout] and finding the same problematic convergence, I had a bug hunt.

import torch
from weight_drop import WeightDrop

wdrnn = WeightDrop(torch.nn.LSTM(10, 10), ['weight_hh_l0'], dropout=0.9)

# Input is (seq, batch, input)
x = torch.autograd.Variable(torch.randn(2, 1, 10))
h0 = None

print(wdrnn.module._all_weights)

run1 = [t.sum() for t in wdrnn(x, h0)[0].data]
run2 = [t.sum() for t in wdrnn(x, h0)[0].data]

print('Run 1:', run1)
print('Run 2:', run2)

# The first time step is not influenced by the hidden-to-hidden weights, so it should be equal across runs
assert run1[0] == run2[0]
# The second time step should differ, as a fresh dropout mask should be sampled on each forward pass
assert run1[1] != run2[1]

On PyTorch 0.2 we end up with:

Applying weight drop of 0.9 to weight_hh_l0
[['weight_ih_l0', 'weight_hh_l0_raw', 'bias_ih_l0', 'bias_hh_l0']]
Run 1: [-0.12040333496406674, -0.45819888915866613]
Run 2: [-0.12040333496406674, -0.45819888915866613]
Traceback (most recent call last):
  File "wdrop_test.py", line 21, in <module>
    assert run1[1] != run2[1]
AssertionError

while on PyTorch 0.1.12 we end up with:

Applying weight drop of 0.9 to weight_hh_l0
[['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']]
Run 1: [-0.003149353200569749, -0.43234687950462103]
Run 2: [-0.003149353200569749, -0.5475110579282045]

This is the expected behaviour: the first output should always be the same (as we're only performing weight drop on the hidden-to-hidden weight matrix, which hasn't been used yet at the first time step) but the second outputs should always be different (unless we're very unlucky and get the exact same dropout masks :P). I'll also replace the if __name__ == "__main__" mini test in weight_drop with this, as it'll be a more useful mini test than whatever in the world I wrote there previously ;)

Further pondering on weight drop may be necessary.
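
For context on what this mini test expects: the general weight-drop (DropConnect) pattern keeps the original matrix registered under a '_raw' name, as the _all_weights printout above shows, and writes a freshly dropped-out copy back onto the wrapped module before every forward call. The following is a hypothetical sketch of that pattern only, not the repository's actual WeightDrop implementation or the eventual fix.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropSketch(nn.Module):
    # Hypothetical sketch: DropConnect on selected RNN weight matrices.
    def __init__(self, module, weights, dropout=0.0):
        super(WeightDropSketch, self).__init__()
        self.module, self.weights, self.dropout = module, weights, dropout
        for name in self.weights:
            raw = getattr(self.module, name)
            del self.module._parameters[name]
            self.module.register_parameter(name + '_raw', nn.Parameter(raw.data))

    def forward(self, *args):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            # A new dropout mask is sampled on every call, which is why the
            # second time step in the test above should differ between runs.
            setattr(self.module, name, F.dropout(raw, p=self.dropout, training=self.training))
        return self.module.forward(*args)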

Smerity (Contributor) commented Sep 6, 2017

I've got a fix that I believe works. I'm testing it now and will submit a pull request to your branch with that singular fix if it works :)

Related issue (and why this breakage also affects weight norm):
pytorch/pytorch#2515 (comment)

jph00 (Author) commented Sep 7, 2017

Yup, I just figured out the same thing :) I started working on language modeling for the course today, so this came up at just the right time!

Smerity merged commit 28683b2 into salesforce:master on Sep 7, 2017
Smerity (Contributor) commented Sep 7, 2017

With weight drop working on PyTorch 0.2, python -u main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.225 --seed 28 --epoch 500 gives 61.2 / 58.9 before finetuning, which approximately matches the previous results. As locked dropout is now different, the hyperparameters will likely need updating. I also want to retry using weight dropping to perform variational dropout.
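
On that last point, one reading of "using weight dropping to perform variational dropout" (an assumption, not spelled out in this thread) is to share a single mask across whole columns of weight_hh_l0: dropping column j of the hidden-to-hidden matrix removes hidden unit j's recurrent contribution at every time step, which is what variational dropout does on the recurrent connections. A hypothetical sketch of just that masking step:

import torch.nn.functional as F
from torch.autograd import Variable

def variational_weight_drop(raw_w, dropout=0.5, training=True):
    # raw_w is assumed to be the hidden-to-hidden matrix of shape
    # (4 * hidden_size, hidden_size); one mask is sampled over columns, so
    # each column (one incoming hidden unit) is kept or dropped in its
    # entirety for the whole forward pass.
    if not training or not dropout:
        return raw_w
    ones = Variable(raw_w.data.new(1, raw_w.size(1)).fill_(1))
    mask = F.dropout(ones, p=dropout, training=True)
    return raw_w * mask.expand_as(raw_w)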

I'll merge this and update the README (PyTorch 0.2 instructions, remove exact reproduction as that no longer holds, point to the PyTorch==0.1.12 release).

This also closes #3.
