RandomSampler / DistributedSampler does not seem really random #64986
cc @VitalyFedyunin @ejguan regarding the RandomSampler / general data loader question.
That was indeed the issue with DistributedSampler. However, the issue (if any) with RandomSampler remains. I've run many experiments, replacing torch.randperm() with Python's random.shuffle(), and got the same surprising results. I have no idea why or how those sawtooth shapes happen, but they do, and that's unwanted in applications such as GANs that are sensitive to sudden changes in gradients.
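For anyone who wants to repeat that comparison, here is a minimal sketch of a drop-in sampler that uses Python's random.shuffle instead of torch.randperm. The class name and constructor are made up for illustration; it is not part of torch:

```python
import random


class PyShuffleSampler:
    """Hypothetical sampler: yields a fresh random permutation of
    dataset indices each epoch using Python's random.shuffle
    (Fisher-Yates on top of the Mersenne Twister)."""

    def __init__(self, data_source, seed=None):
        self.data_source = data_source
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.data_source)))
        self.rng.shuffle(indices)  # in-place shuffle, new order per call
        return iter(indices)

    def __len__(self):
        return len(self.data_source)
```

To use it, pass `sampler=PyShuffleSampler(dataset)` to the DataLoader instead of `shuffle=True` (the two options are mutually exclusive).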
The logic in RandomSampler is really straightforward. Without replacement, RandomSampler simply uses pytorch/torch/utils/data/sampler.py Line 124 in dfbd030
And the generator used by the sampler is set up in pytorch/torch/utils/data/sampler.py Lines 114 to 116 in dfbd030
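In other words, the without-replacement path boils down to seeding a fresh torch.Generator once and drawing a full permutation per epoch. A paraphrase of the linked lines (not a verbatim copy of sampler.py):

```python
import torch


def random_sampler_indices(n, generator=None):
    """Roughly what RandomSampler does without replacement:
    if no generator was supplied, draw a seed from torch's global
    RNG and seed a fresh torch.Generator, then yield a full
    permutation of range(n)."""
    if generator is None:
        # Same pattern as the linked lines: random int64 seed
        seed = int(torch.empty((), dtype=torch.int64).random_().item())
        generator = torch.Generator()
        generator.manual_seed(seed)
    yield from torch.randperm(n, generator=generator).tolist()
```

Each call produces a uniform permutation, so by itself this should not introduce any correlation between the end of one epoch and the start of the next.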
If it's something related to RandomSampler, I think a statistical test on the result of
For PyTorch and Python's random module, the CPU generators use the same Mersenne Twister algorithm. Could you try a NumPy generator to test your result? It uses a different random algorithm.
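One quick way to run that comparison is a sampler backed by NumPy's default_rng, which uses PCG64 rather than Mersenne Twister. A sketch; the class name is invented for this test:

```python
import numpy as np


class NumpyRandomSampler:
    """Hypothetical sampler using NumPy's PCG64-based generator
    (a different algorithm from the Mersenne Twister used by both
    torch's CPU RNG and Python's random module), for A/B testing
    the sawtooth effect."""

    def __init__(self, data_source, seed=None):
        self.data_source = data_source
        self.rng = np.random.default_rng(seed)  # PCG64 by default

    def __iter__(self):
        # A fresh permutation of all indices each epoch
        return iter(self.rng.permutation(len(self.data_source)).tolist())

    def __len__(self):
        return len(self.data_source)
```

If the sawtooth persists with this sampler as well, the RNG algorithm itself can be ruled out as the cause.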
I can, but not immediately; it will take hours of compute. I need that compute for my job and I'm pretty busy these days. I'll update you with further results once I'm able to run those experiments.
🐛 Bug
Training a net with DataLoader(..., shuffle=True) produces weird sawtooth artifacts in both loss and accuracy (train & test), indicating that the end of an epoch kind of looks like the beginning of the next one. The dataset needs to be big enough to observe this bias.
https://discuss.pytorch.org/t/observing-strange-loss-jumps-between-epochs/64066/15
This looks bad, as it clearly biases gradients and might interfere with momentum-based optimizers.
To Reproduce
I can reproduce this on ImageNet and can provide the code, but it just boils down to shuffle=True or RandomSampler without replacement. It does not happen with some other datasets, AFAIK.
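A stripped-down sketch of the reproduction setup, with no model involved: it only records which sample indices each epoch yields under shuffle=True, so the ordering near epoch boundaries can be inspected directly. The function name and sizes are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def epoch_orders(n_items=1000, n_epochs=2, batch_size=100):
    """Record the order in which sample indices are drawn with
    shuffle=True, one list per epoch, to inspect epoch-boundary
    behavior."""
    data = torch.arange(n_items)
    loader = DataLoader(TensorDataset(data), batch_size=batch_size,
                        shuffle=True)
    orders = []
    for _ in range(n_epochs):
        order = []
        for (batch,) in loader:
            order.extend(batch.tolist())
        orders.append(order)
    return orders
```

With a full training script, one would compare the loss curve against the positions of these indices; the sketch above only isolates the sampling side.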
Expected behavior
There shouldn't be any sawtooth shape like that.
Environment
How you installed PyTorch (conda, pip, source): pip

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @gcramer23 @ssnl @VitalyFedyunin @ejguan @cbalioglu