🐛 Bug
See #130 for a repro: with 4+ GPUs there is a measurable accuracy discrepancy on the same problem versus normal DDP and OSS+DDP. It does not show with 2 GPUs.
To Reproduce
Steps to reproduce the behavior:
Run `python3 fairscale/benchmark/oss.py` on a machine with 4 or more GPUs.
Observe that the first two runs (DDP and OSS+DDP) match, but that the third one differs measurably.
Example with CircleCI
Expected behavior
The logs should exactly match for all three methods
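For reference, a minimal sketch of the kind of exact-match check this implies (the helper and its arguments are hypothetical, not part of the benchmark): collect the per-step losses from each of the three runs and require bitwise equality.

```python
import torch

def assert_logs_match(losses_ddp, losses_oss_ddp, losses_third_run):
    """Require bitwise-identical loss curves from the three training setups.

    Each argument is a list of per-step losses (floats or scalar tensors)
    recorded on the same data with the same seed.
    """
    runs = zip(losses_ddp, losses_oss_ddp, losses_third_run)
    for step, (a, b, c) in enumerate(runs):
        a, b, c = (torch.as_tensor(x, dtype=torch.float64) for x in (a, b, c))
        assert torch.equal(a, b), f"step {step}: DDP != OSS+DDP ({a.item()} vs {b.item()})"
        assert torch.equal(a, c), f"step {step}: DDP != third run ({a.item()} vs {c.item()})"
```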
Environment
```
Torch version: 1.6.0+cu101
Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla M60
GPU 1: Tesla M60
GPU 2: Tesla M60
GPU 3: Tesla M60
Nvidia driver version: 418.87.00
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.6.0+cu101
[pip3] torchtext==0.6.0
[pip3] torchvision==0.7.0
[conda] Could not collect
```
Additional context
In this toy example all the ranks get the same seed, but the data served to each rank differs (as it should).
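A minimal sketch of that setup (helper name and sizes hypothetical): the shared seed makes the initial weights identical on every rank, while the sampler still partitions the data so each rank sees a different shard.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def make_rank_loader(rank: int, world_size: int, batch_size: int = 32) -> DataLoader:
    # Same seed on every rank -> identical model init everywhere.
    torch.manual_seed(42)

    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

    # The sampler partitions the dataset per rank, so the data served to
    # each rank differs even though the seed is shared.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```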
Looks like the bug is actually in computing the update: we're possibly not using the params we should be using. It was not visible with DDP because the all_reduce meant the data was there anyway.
See P146437109: the first gradient matches completely, but the first update differs slightly.
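A sketch of the comparison behind that observation (hypothetical helper; P146437109 is the actual trace): run one identical step on two replicas that start from the same weights, check gradients bitwise before the update, then look for drift after it.

```python
import torch

def compare_first_step(model_a, model_b, optim_a, optim_b, batch, loss_fn):
    """One identical step on two setups under test (e.g. SGD vs OSS-wrapped SGD)."""
    inputs, targets = batch
    for model, optim in ((model_a, optim_a), (model_b, optim_b)):
        optim.zero_grad()
        loss_fn(model(inputs), targets).backward()

    # First gradient: expected to match bitwise.
    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        assert torch.equal(pa.grad, pb.grad), "gradients diverge before the update"

    optim_a.step()
    optim_b.step()

    # First update: this is where the reported divergence shows up.
    for (name, pa), (_, pb) in zip(model_a.named_parameters(), model_b.named_parameters()):
        diff = (pa - pb).abs().max().item()
        if diff > 0:
            print(f"{name}: max abs diff after first update = {diff}")
```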
Changing torch versions also introduces discrepancies in the reduced gradients, so that seems beyond the reach of ShardedDDP. If anything, the DDP <> ShardedDDP discrepancy is reduced with PyTorch 1.7.
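One way to quantify that across versions (a sketch; paths and tolerances hypothetical): dump the post-all_reduce gradients after the first step under each torch version, then diff the dumps offline.

```python
import torch

def dump_grads(model, path):
    # Save post-all_reduce gradients so runs under different torch
    # versions can be compared offline.
    torch.save({n: p.grad.detach().cpu() for n, p in model.named_parameters()}, path)

def diff_grads(path_a, path_b, atol=0.0):
    grads_a, grads_b = torch.load(path_a), torch.load(path_b)
    for name, grad_a in grads_a.items():
        if not torch.allclose(grad_a, grads_b[name], rtol=0.0, atol=atol):
            delta = (grad_a - grads_b[name]).abs().max().item()
            print(f"{name}: max abs diff = {delta}")
```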