PyG's DataLoader hangs when I use it in a subprocess created by the "fork" start method #3565
Comments
That's super interesting. It's indeed caused by the
As this seems to be a PyTorch issue, I'm not sure if it's worth fixing on our end. WDYT?
Created a PR nonetheless. Please let me know what you think.
Hi @rusty1s! Thank you for your PR! This indeed seems to be a PyTorch issue. However, it is good to avoid using Cheers, Artem.
BTW, I can confirm that the PR fixes this error.
I raised the issue on the PyTorch repo.
🐛 Bug
There are 2 identical, completely independent functions that use torch_geometric.loader.DataLoader. If we run function 1 in the main process and then function 2 in a subprocess created by the "fork" start method, the program hangs between these 2 lines. Note that with the standard torch.utils.data.DataLoader everything works. With any other start method everything works too. The program also works if we run only function 2 in a subprocess created by any start method (without calling function 1 first). And, most interesting of all, the program runs fine if we move the tensors to the GPU.
I attach the example code for the broken case above.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The queue is not Empty or the timeout value can be omitted
Environment
Additional context
I ran into this issue while running unit tests in the library I develop. There are several unit tests, and some of them check parallel runs. When I ran a single unit test for the fork start method, it passed. However, the run hung when I ran multiple tests using python -m unittest discover. I use PyTorch Lightning for model training, and the lines containing next(enumerate(dataloader)) come from there. So I could not find a good workaround other than using the other start methods (which are more expensive).