This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Fixed reproducibility issue with newer PyTorch versions. #107

Closed
wants to merge 1 commit into from

Conversation

florianmai

With PyTorch 1.0.0, you have to specify how torch.nn.CrossEntropyLoss aggregates losses over the batch when calling the constructor; changing the attribute manually afterwards no longer has any effect. This causes the reproducibility issue in #96, which my commit fixes.
It also fixes crashes that occur with PyTorch 1.0.0 due to improper access of scalar values.
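To illustrate the two fixes described above, here is a minimal sketch (not the repo's actual code): the loss aggregation must be requested in the `CrossEntropyLoss` constructor rather than by mutating an attribute afterwards, and 0-dim tensor scalars should be read with `.item()` instead of indexing.

```python
import torch
import torch.nn as nn

# Toy batch: 2 examples, 2 classes (illustrative values only).
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])
targets = torch.tensor([0, 1])

# Broken pattern on PyTorch >= 1.0: setting the attribute after
# construction (e.g. loss_fn.size_average = False) has no effect.
# Working pattern: pass the aggregation mode to the constructor.
loss_fn = nn.CrossEntropyLoss(reduction='sum')   # sum over the batch
total_loss = loss_fn(logits, targets)

# Scalar access fix: total_loss is a 0-dim tensor on newer versions,
# so indexing it crashes; use .item() to get a Python float.
loss_value = total_loss.item()
print(loss_value)
```

With `reduction='sum'` the loss is the batch size times the `reduction='mean'` value, which is exactly the difference that silently changed results when the attribute assignment stopped working.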

@facebook-github-bot
Contributor

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@N-Almarwani

Did you get the same result as the one in the paper on SNLI using this change? I've tried it and got the following result:
Dev is 84.78 and Test is 84.43.

@shawnkx

shawnkx commented Apr 18, 2019

@NadaOH how many epochs did you use to get that? Thanks!

@N-Almarwani

@shawnkx I used the same number of epochs, but I used the code from @florianmai's repository.

@florianmai
Author

@NadaOH With more recent versions of PyTorch you will not get the exact same results as reported in the InferSent paper, which were produced with older versions of PyTorch. See here: https://pytorch.org/docs/stable/notes/randomness.html
I believe one major reason it is not reproducible is that the order in which random numbers are generated changes. That's why with newer versions of PyTorch you do not get the same weight initialization as with previous versions.
The differences you observe (84.78 instead of 85 and 84.43 instead of 84.5) are very small and easily explained by randomness in initialization.
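To make the point above concrete, here is a minimal seeding sketch (an assumed setup, not the repo's training script): within a single PyTorch version, reseeding reproduces the same weight initialization, but across versions the sequence of random draws consumed can differ, so the same seed can yield different weights.

```python
import random
import torch

def set_seed(seed=1234):
    # Seed every RNG the training code might touch.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Within one PyTorch version, reseeding gives identical initializations.
set_seed()
w1 = torch.empty(3, 3).normal_()
set_seed()
w2 = torch.empty(3, 3).normal_()
assert torch.equal(w1, w2)

# Across versions, however, the order/number of draws from the generator
# can change, so w1 would generally differ between, say, 0.4 and 1.0.
```

This is why small deviations such as 84.78 vs 85 are expected rather than a sign of a remaining bug.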

@eedeleon

eedeleon commented Jun 5, 2019

What is the timeline on this PR being checked in?

We are having issues reproducing the published results with the pre-trained models. Would this also be resolved by the bug fix if the models were re-trained?

@aconneau aconneau closed this Aug 29, 2019
@aconneau
Contributor

Closing as train_nli.py was removed. Please refer to the XLM repo for training a model on NLI with much better performance.
