We have also occasionally encountered dataloader deadlocks in some environments, while in others they never occur. The problem is hard to reproduce because it happens rarely. Our solution is to resume from the latest checkpoint (the resume_from field in the config file).
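Picking the checkpoint to resume from can be scripted. The sketch below is a hypothetical helper (find_latest_checkpoint is not part of PyTorch or any training framework). It assumes checkpoints are saved as epoch_<N>.pth in a single work directory and returns the one with the highest epoch number, whose path can then go into the resume_from field of the config file.

```python
import os
import re
from typing import Optional

# Assumption: checkpoints are named "epoch_<N>.pth" inside one work directory.
# Adjust the pattern if your training runner names them differently.
_CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pth$")


def find_latest_checkpoint(work_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered epoch checkpoint, or None."""
    best_epoch = -1
    best_path = None
    for name in os.listdir(work_dir):
        match = _CKPT_PATTERN.match(name)
        if match is None:
            continue  # skip logs, "latest.pth" links and other files
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch = epoch
            best_path = os.path.join(work_dir, name)
    return best_path


if __name__ == "__main__":
    import sys

    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    latest = find_latest_checkpoint(work_dir)
    # This path is what would be written into resume_from in the config file.
    print(f"resume_from = {latest!r}")
```

The helper compares epoch numbers as integers rather than sorting file names, because a plain string sort would place epoch_9.pth after epoch_10.pth.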
Compiling PyTorch from source seems to reduce the likelihood of a deadlock; at least, we have not encountered one in our last 50 experiments.
I'm training Mask R-CNN and got this error during the 9th epoch. It looks like a dataloader deadlock.
I previously ran into dataloader deadlocks with the Detectron.pytorch code as well, and the solution there was to train with 1 image per GPU (check this issue).
Any idea what might cause this problem? I'm not sure whether it is a PyTorch dataloader problem or a problem in the dataset's __getitem__() function.
Thanks in advance.
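One way to tell the two apart is to call __getitem__() directly in the main process, with no DataLoader worker processes involved (similar in spirit to setting num_workers=0 on the DataLoader). The sketch below is a minimal, hypothetical helper (probe_getitem is not part of PyTorch); it assumes only that the dataset supports len() and indexing, and runs each item in a daemon thread with a time limit so that a hung item is reported instead of freezing the check.

```python
import threading
from typing import Any, Dict, List, Tuple


def probe_getitem(dataset: Any, timeout_s: float = 30.0) -> List[Tuple[int, str]]:
    """Call dataset[i] for every index, one at a time, in the main process.

    Returns (index, problem) pairs; an empty list means every item loaded.
    """
    problems: List[Tuple[int, str]] = []
    for idx in range(len(dataset)):
        result: Dict[str, str] = {}

        def load() -> None:
            try:
                dataset[idx]
            except Exception as exc:  # record the error instead of crashing
                result["error"] = repr(exc)

        # A daemon thread lets us give up on a hung item without blocking exit.
        worker = threading.Thread(target=load, daemon=True)
        worker.start()
        worker.join(timeout_s)
        if worker.is_alive():
            problems.append((idx, f"no result after {timeout_s}s"))
            break  # the hung call may hold locks, so stop probing here
        if "error" in result:
            problems.append((idx, result["error"]))
    return problems
```

If a full pass returns an empty list while training still deadlocks with multiple workers, that suggests (but does not prove) the problem lies in the worker processes rather than in __getitem__() itself. A pass over a large dataset can be slow, so probing a subset first may be more practical.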