Update for PyTorch 0.2 #5
- Fix LockedDropout to broadcast correct axis.
- Use relative path for default data source.
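To illustrate what broadcasting a locked dropout mask along one axis means, here is a minimal framework-free sketch. It is not the code from this pull request: the function name `locked_dropout`, the `(seq_len, batch, features)` layout, and the choice of time as the shared axis are assumptions made for illustration.

```python
import random


def locked_dropout(x, p=0.5, rng=random):
    """Apply one dropout mask per sequence, shared across all time steps.

    x is a nested list shaped (seq_len, batch, features); this layout is
    an assumption of the sketch, not something stated in the thread.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    batch, features = len(x[0]), len(x[0][0])
    keep = 1.0 - p
    # Sample the mask once with shape (batch, features). Kept units are
    # scaled by 1 / keep so the expected activation is unchanged.
    mask = [[(1.0 / keep) if rng.random() < keep else 0.0
             for _ in range(features)]
            for _ in range(batch)]
    # Reuse the same mask at every time step (axis 0).
    return [[[x[t][b][f] * mask[b][f] for f in range(features)]
             for b in range(batch)]
            for t in range(len(x))]
```

If the mask were instead sampled with a size-1 batch axis, every sequence in the batch would lose the same units at each step, which is a different regulariser from the per-sequence mask sketched here.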
Thanks for the contribution! Before we can merge this, we need @racheltho to sign the Salesforce Contributor License Agreement.
As a heads up, I've not forgotten about this ;) The default PTB parameters as given in the README result in an untuned model validation perplexity of [...]. I am currently testing what happens if you just run with v0.2 with normal dropout. Once I've ascertained the source of the discrepancy I'll move forward with merging :)
I'll be interested to hear what you find!
Unfortunately the weight dropout update appears to be broken. After testing [fixed locked dropout, Smerity's silly locked dropout, normal dropout] and finding the same problematic convergence, I had a bug hunt.
On PyTorch 0.2 we end up with:
while on PyTorch 0.1.12 we end up with:
The latter is the expected behaviour: the first output should always be the same (as we're only performing weight drop on the hidden-to-hidden weight matrix, which hasn't been used yet), but the second outputs should always be different (unless we're very unlucky and draw the exact same dropout masks :P). I'll also replace the [...]. Further pondering on weight drop may be necessary.
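The reasoning behind that check can be sketched without PyTorch. The snippet below is a hypothetical toy, not the test actually run in this thread: `drop_weights`, `rnn_outputs`, and `check_weight_drop` are illustrative names, the RNN is a plain tanh cell starting from a zero hidden state, and the inverted-dropout scaling of the weights is an assumption.

```python
import math
import random


def drop_weights(w_hh, p, rng):
    """Return a copy of w_hh with each entry zeroed with probability p.

    Surviving weights are divided by (1 - p); that scaling is an
    assumption of this sketch.
    """
    keep = 1.0 - p
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in w_hh]


def rnn_outputs(xs, w_ih, w_hh):
    """Run a tiny tanh RNN from a zero hidden state; return every hidden state."""
    hidden = [0.0] * len(w_hh)
    outputs = []
    for x in xs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_ih[i], x))
                + sum(w * hj for w, hj in zip(w_hh[i], hidden))
            )
            for i in range(len(w_hh))
        ]
        outputs.append(hidden)
    return outputs


def check_weight_drop(xs, w_ih, w_hh, p=0.5, seeds=(1, 2)):
    """Run the RNN twice with independently dropped hidden-to-hidden weights."""
    runs = [
        rnn_outputs(xs, w_ih, drop_weights(w_hh, p, random.Random(seed)))
        for seed in seeds
    ]
    # Step 1 multiplies w_hh by the zero initial state, so the mask cannot matter.
    first_same = runs[0][0] == runs[1][0]
    # Step 2 is the first time w_hh meets a non-zero hidden state.
    second_differs = runs[0][1] != runs[1][1]
    return first_same, second_differs
```

Under these assumptions `first_same` holds for any pair of masks, while `second_differs` holds whenever the two sampled masks differ in a way that changes the recurrent term. If both runs agree at the second step across many different masks, the dropped weights are probably not the ones the recurrent computation is using.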
I've got a fix that I believe works, now testing it, and will submit a pull request to your branch with that singular fix if it should work :) Related issue (and why this breakage is also similar for weight norm):
Yup, I just figured out the same thing :) I started working on language modeling for the course today so this came up at just the right time!
With weight drop working with PyTorch 0.2, I'll merge this and update the README (PyTorch 0.2 instructions, remove exact reproduction as that no longer holds, point to the PyTorch==0.1.12 release). This also closes #3.