Segmentation fault in DataLoader worker in PyTorch 1.8.0 if set_num_threads is called beforehand #54752
Comments
Isn't this just the thing where you can't fork after spawning threads? cc @malfet
DataLoader calls …
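The fork-after-threads hazard mentioned here can be seen without PyTorch at all. The sketch below (my own toy example, not from the issue) spawns a few threads and then forks a child via `multiprocessing`'s `fork` start method: the child inherits a copy of the parent's memory, but only the forking thread survives, so any thread-pool bookkeeping copied across the fork points at threads that no longer exist.

```python
import multiprocessing as mp
import threading
import time


def _sleepy():
    # Keep the thread alive so it still exists at fork time.
    time.sleep(10)


def _child_report(q):
    # Count threads *before* touching the queue: a put() spawns a
    # feeder thread of its own, which would skew the count.
    count = threading.active_count()
    q.put(count)


def fork_thread_counts(n_threads=4):
    """Spawn n_threads, fork a child, return (parent_count, child_count)."""
    ctx = mp.get_context("fork")  # DataLoader workers also use fork on Linux
    for _ in range(n_threads):
        threading.Thread(target=_sleepy, daemon=True).start()
    parent_count = threading.active_count()
    q = ctx.Queue()
    p = ctx.Process(target=_child_report, args=(q,))
    p.start()
    child_count = q.get()
    p.join()
    return parent_count, child_count


if __name__ == "__main__":
    # e.g. (5, 1): the extra threads vanish in the child, but any data
    # structures that referenced them were copied over verbatim.
    print(fork_thread_counts())
```

This is why calling `set_num_threads` (which spins up an intra-op thread pool) before creating `fork`-based DataLoader workers is dangerous: the workers inherit pool state whose threads are gone.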
Thanks for posting this! I was stuck the whole day because of this. After removing torch.set_num_threads(32) the segfaults are gone.
The above example creates a crash for me: ERROR: Unexpected segmentation fault encountered in worker.
@malfet Sorry, I don't have a (useful) backtrace. The Python backtrace starts in …
@ezyang I thought so too, but …
I tried to reproduce the issue on macOS. No segfault, but instead this warning appears:
The message is repeated …
I managed to obtain a backtrace of the segfault following the instructions in #53894. It seems to happen in …
I'm pretty sure that whatever thread data structures are responsible for this segfault always have their first three entries in a virtual memory page that's shared (mapped to the same physical memory page) between the parent process and child process due to the …
While I haven't verified my hypothesis, since the segfault occurs in …
@malfet says when we do the fork, we should destroy all thread pools. (In this case, the caffe2 threadpool is the thing that is causing the problem.)
@ezyang, please clarify if you mean that a fix would entail somehow resetting Caffe2's thread pool in the child process before calling …
EDIT: Removed info duplicated from my previous comment.
I hit the same error with 1.8.0/1.8.1; for now I have to stick with 1.7.1.
@malfet should clarify, but I think the idea is to get rid of the threads before forking.
Yes, from the crash it looks like we are trying to call …
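The "destroy all thread pools when we fork" fix proposed above can be sketched in pure Python with `os.register_at_fork` (Python 3.7+), which plays the role of `pthread_atfork` handlers at the C level. The `TinyPool` class below is my own hypothetical stand-in for caffe2's thread pool, not PyTorch code: the point is only that the child rebuilds the pool from scratch instead of touching state that references threads killed by the fork.

```python
import os
import queue
import threading


class TinyPool:
    """A toy thread pool standing in for caffe2's pool (hypothetical)."""

    def __init__(self, n=2):
        self.n = n
        self._start()

    def _start(self):
        # Fresh queue and fresh threads: no stale state carried over a fork.
        self.tasks = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(self.n)
        ]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            fn = self.tasks.get()
            if fn is None:
                return
            fn()

    def reset_after_fork(self):
        # The worker threads do not survive fork(); rebuild the pool
        # in the child rather than dereferencing dead-thread state.
        self._start()


POOL = TinyPool()
# Re-create the pool's threads in every forked child before it is used.
os.register_at_fork(after_in_child=POOL.reset_after_fork)
```

A C++ implementation would register the equivalent handler with `pthread_atfork` and tear down / lazily re-initialize the intra-op pool in the child.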
🐛 Bug
A segmentation fault occurs if one uses `DataLoader` with `num_workers` > 0 after calling `set_num_threads` with a sufficiently high value. I observed this behaviour in PyTorch 1.8.0 and 1.8.1, but I am unable to reproduce it with PyTorch 1.7.1.
To Reproduce
The above code crashes when `set_num_threads` is called with 4 or more as its argument. Incidentally (or maybe not), 4 is the number of vCPUs on the AWS EC2 instance I am using.
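The repro snippet itself did not survive this capture. A minimal sketch consistent with the description (the `ToyDataset` class and its sizes are my own invention, not the reporter's code) might look like:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    # Hypothetical stand-in dataset; the original snippet is not in this capture.
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)


def main():
    torch.set_num_threads(4)  # 4 or more reportedly triggers the crash
    loader = DataLoader(ToyDataset(), num_workers=2)
    for _ in loader:  # segfaults in a worker under 1.8.0/1.8.1 per the report
        pass


if __name__ == "__main__":
    main()
```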
Expected behavior
No crash.
Environment
Additional context
Perhaps this issue is related to #53894 and #54716.
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ssnl @VitalyFedyunin @ejguan