Non-deterministic results despite doing everything required for determinism #39849
Comments
If you use torch 1.4, is the result still nondeterministic?
I see that you are using cuDNN 7.4.2. Can you try the latest cuDNN, 7.6.5 or v8? https://developer.nvidia.com/cudnn This may or may not be related to LSTM. There is a known LSTM nondeterminism issue, #35661, but that one only happens with non-zero dropout.
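A small sketch for gathering the versions asked about above and for checking whether a model meets the non-zero-dropout condition of #35661. It relies only on standard PyTorch attributes (`torch.version.cuda`, `torch.backends.cudnn.version()`, the `dropout` attribute of `nn.LSTM`); the helper names are hypothetical.

```python
import torch
from torch import nn


def report_backend_versions() -> dict:
    """Collect the library versions relevant to an LSTM nondeterminism report."""
    cudnn = torch.backends.cudnn
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        # An integer such as 7402 for cuDNN 7.4.2; None when cuDNN is unavailable.
        "cudnn": cudnn.version() if cudnn.is_available() else None,
    }


def lstm_uses_dropout(model: nn.Module) -> bool:
    """True if any nn.LSTM inside `model` has non-zero inter-layer dropout,
    the condition under which #35661 is reported to occur."""
    return any(isinstance(m, nn.LSTM) and m.dropout > 0 for m in model.modules())
```

If `lstm_uses_dropout` returns False for the model in question, #35661 alone would not explain the differing results.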
Using PyTorch 1.4.0 solved the problem (at least in the current situation).
It might be worth reopening this bug, because I have had the same issue with torch 1.5 and cuDNN 7.6.5.
After some tests, it seems this issue only occurs on Volta (Titan V), not on Turing (2070). I tested with CUDA 10.2 and cuDNN 7.6.5; the PyTorch version (master/1.5/1.4) doesn't matter. This is very likely an LSTM issue, and I will check with the cuDNN team for a fix. As a temporary workaround, you can use …
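Because the behavior reportedly depends on GPU architecture, it can help to log which one a run used. A hedged sketch: the Titan V has compute capability 7.0 (Volta) and the RTX 2070 has 7.5 (Turing); the helper names are hypothetical, and the mapping only covers the 7.x cases mentioned in this thread.

```python
def classify_sm7(capability):
    """Map a (major, minor) compute capability to the 7.x architectures
    discussed here; anything else is reported as 'other'."""
    major, minor = capability
    if major == 7 and minor < 5:
        return "volta"   # e.g. Titan V = (7, 0)
    if major == 7 and minor == 5:
        return "turing"  # e.g. RTX 2070 = (7, 5)
    return "other"


def current_gpu_arch(device=0):
    """Return the architecture label of a CUDA device, or None without CUDA."""
    import torch  # imported lazily so classify_sm7 works without PyTorch

    if not torch.cuda.is_available():
        return None
    return classify_sm7(torch.cuda.get_device_capability(device))
```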
This is a known issue in cuDNN 7.6.5 and v8, and it will be fixed in the next release. An explanation and a workaround are given here: https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_8.html#rel-800-Preview__section_qhc_jc1_5kb (see the Limitations section). Specifically, you can use … I'll close this issue. Please feel free to reopen it if there is anything new. Edit: please track this issue at #35661.
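Independently of the maintainer's snippet, the PyTorch `nn.LSTM` documentation lists environment variables for deterministic RNNs: `CUDA_LAUNCH_BLOCKING=1` on CUDA 10.1, and `CUBLAS_WORKSPACE_CONFIG=:16:8` (or `:4096:2`) on CUDA 10.2 or later. The sketch below encodes that rule; the function names are hypothetical, and this is not necessarily the workaround the maintainer posted.

```python
import os


def rnn_determinism_env(cuda_version):
    """Environment variables the PyTorch nn.LSTM docs suggest for deterministic
    RNNs, keyed on a CUDA version string such as torch.version.cuda."""
    if cuda_version is None:
        return {}  # CPU-only build: the cuDNN/cuBLAS issue does not apply
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if (major, minor) >= (10, 2):
        # The leading colon is required; ":4096:2" is the other documented value.
        return {"CUBLAS_WORKSPACE_CONFIG": ":16:8"}
    if (major, minor) == (10, 1):
        return {"CUDA_LAUNCH_BLOCKING": "1"}  # may slow training down
    return {}


def apply_env(env):
    """Set each variable unless the user already set it explicitly."""
    for key, value in env.items():
        os.environ.setdefault(key, value)
```

The environment in this report uses CUDA 10.2, so `apply_env(rnn_determinism_env("10.2"))` at the very top of `main.py`, before any CUDA work (ideally before `import torch`), would be the assumed usage.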
Non-deterministic outputs have been discussed at length, and I thought I had found the formula for deterministic outputs:
Apparently it's not enough.
The problem, of course, is that the results differ between runs, even though training uses the same seed and the settings above.
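For reference, a commonly used seeding setup of the kind described is sketched below. It is an assumption about what such settings typically contain, not the reporter's code, and as this thread shows, settings like these did not by themselves make the LSTM results identical on the reported setup.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed every RNG a typical training script touches and pin cuDNN to
    deterministic kernels. A common setup, not the reporter's exact code."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU and all CUDA generators
    torch.cuda.manual_seed_all(seed)  # explicit; safe to call without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```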
Reproducing is easy: I'm attaching the code under "Additional context". Save it as main.py, download the data from https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews, put it in the same directory, and run:
python3 main.py
(It would have been much easier if I could upload the .py and .csv files.)
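Instead of comparing two separate `python3 main.py` runs by eye, the check can also be done in one process. The sketch below uses a tiny synthetic LSTM in place of the Kaggle data (an assumption made to keep it self-contained); `train_losses` and `is_reproducible` are hypothetical names. On CPU it is expected to return True; passing `device="cuda"` exercises the cuDNN path discussed in this thread.

```python
import torch
import torch.nn.functional as F
from torch import nn


def train_losses(seed: int, device: str = "cpu", steps: int = 5) -> list:
    """Train a tiny LSTM regressor on random data and return per-step losses."""
    torch.manual_seed(seed)
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).to(device)
    head = nn.Linear(16, 1).to(device)
    opt = torch.optim.SGD(list(lstm.parameters()) + list(head.parameters()), lr=0.1)
    x = torch.randn(4, 10, 8, device=device)  # (batch, seq_len, features)
    y = torch.randn(4, 1, device=device)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        out, _ = lstm(x)
        loss = F.mse_loss(head(out[:, -1]), y)  # predict from the last time step
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


def is_reproducible(seed: int = 0, device: str = "cpu") -> bool:
    """Run the same seeded training twice and compare losses bit for bit."""
    return train_losses(seed, device) == train_losses(seed, device)
```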
Environment
Collecting environment information
...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: version 3.10.2
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN V
GPU 1: TITAN V
GPU 2: TITAN V
GPU 3: TITAN V
Nvidia driver version: 440.33.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.2
Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] torch==1.5.0
[pip3] torchfile==0.1.0
[pip3] torchnet==0.0.4
[pip3] torchvision==0.2.2
[conda] Could not collect
How you installed PyTorch (conda, pip, source):
Additional context
The full code:
cc @csarofeen @ptrblck