
Segmentation fault in DataLoader worker in PyTorch 1.8.0 if set_num_threads is called beforehand #54752

Closed
zhenlin opened this issue Mar 26, 2021 · 15 comments
Labels: high priority, module: dataloader, module: multiprocessing, module: multithreading, triaged

Comments

zhenlin commented Mar 26, 2021

🐛 Bug

A segmentation fault occurs if one uses DataLoader with num_workers > 0 after calling set_num_threads with a sufficiently high value.
I observed this behaviour in PyTorch 1.8.0 and 1.8.1, but I am unable to reproduce it with PyTorch 1.7.1.

To Reproduce

import torch

def main():
    torch.set_num_threads(4)

    dataloader = torch.utils.data.DataLoader([1, 2, 3], num_workers=1)
    next(iter(dataloader))

    return

if __name__ == '__main__':
    main()

The above code crashes when set_num_threads is called with 4 or more as its argument.
Incidentally (or perhaps not coincidentally), 4 is also the number of vCPUs on the AWS EC2 instance I am using.
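
A small probe like the following can check how the crash threshold relates to the machine's vCPU count (an illustrative sketch, not part of the original report; it assumes a Linux host where the worker segfault makes the child run exit non-zero):

import os
import subprocess
import sys
import textwrap

# Run the repro in a fresh subprocess per thread count, so a crash
# in one run does not take down the probe itself.
REPRO = textwrap.dedent("""
    import sys, torch
    torch.set_num_threads(int(sys.argv[1]))
    loader = torch.utils.data.DataLoader([1, 2, 3], num_workers=1)
    next(iter(loader))
""")

print(f"os.cpu_count() = {os.cpu_count()}")
for n in range(1, (os.cpu_count() or 4) + 2):
    result = subprocess.run([sys.executable, "-c", REPRO, str(n)])
    print(f"set_num_threads({n}) -> exit code {result.returncode}")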

Expected behavior

No crash.

Environment

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.8.1
[conda] Could not collect

Additional context

Perhaps this issue is related to #53894 and #54716.

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ssnl @VitalyFedyunin @ejguan

VitalyFedyunin added the high priority and module: dataloader labels Mar 26, 2021
VitalyFedyunin self-assigned this Mar 26, 2021
ezyang added the triaged label Mar 26, 2021
ezyang (Contributor) commented Mar 26, 2021

Isn't this just the thing where you can't fork after spawning threads? cc @malfet

malfet (Contributor) commented Mar 26, 2021

DataLoader calls torch.set_num_threads(1) right after the fork, so that should not be the issue.
@zhenlin, can you please share a backtrace from 1.8.1?
Update: the above example does not crash with the latest nightly on macOS.
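
For context, the worker-side sequence looks roughly like the sketch below (illustrative only, not the actual torch/utils/data/_utils/worker.py code); the torch.set_num_threads(1) call mentioned above is the first thing the freshly forked child does before serving fetch requests:

import torch

def worker_loop(dataset, index_queue, data_queue):
    # Runs in the forked worker process; in this issue, this very call is
    # where the segfault happens.
    torch.set_num_threads(1)
    while True:
        indices = index_queue.get()
        if indices is None:  # sentinel from the parent: shut down
            break
        data_queue.put((indices, [dataset[i] for i in indices]))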

maxidl commented Mar 26, 2021

Thanks for posting this! I was stuck the whole day on it. After removing torch.set_num_threads(32), the segfaults are gone.
My env:

conda create -n wav2vec python=3.8
conda install -c pytorch -c nvidia pytorch torchaudio cudatoolkit=11.1

The above example also crashes for me with: ERROR: Unexpected segmentation fault encountered in worker.
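
In other words, the workaround amounts to not issuing a large set_num_threads call before a multi-worker DataLoader forks. A rough sketch (USE_DATALOADER_WORKERS and REQUESTED_THREADS are made-up names for this example):

import torch

USE_DATALOADER_WORKERS = True
REQUESTED_THREADS = 32

if not USE_DATALOADER_WORKERS:
    # Safe: no forked workers will inherit a grown intra-op thread pool.
    torch.set_num_threads(REQUESTED_THREADS)

loader = torch.utils.data.DataLoader(range(8), num_workers=2 if USE_DATALOADER_WORKERS else 0)
for batch in loader:
    pass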

zhenlin (Author) commented Mar 26, 2021

@malfet Sorry, I don't have a (useful) backtrace. The Python backtrace starts in _try_get_data on the receiving side. I tried to get a gdb backtrace from the worker process core dump but it seems the stack is corrupted – the displayed caller is 0x00...00.

@ezyang I thought so too, but torch.set_num_threads(3) doesn't segfault for me.

zhenlin (Author) commented Mar 26, 2021

I tried to reproduce the issue on macOS. No segfault, but instead this warning appears:

[W ParallelNative.cpp:206] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)

The message is repeated num_workers times. I guess this is related to what @malfet said about the worker process calling set_num_threads(1) after forking. This message does not appear on the Linux machine I was using.

zhenlin (Author) commented Mar 29, 2021

I managed to obtain a backtrace of the segfault by following the instructions in #53894. The crash happens in set_num_threads. I'm guessing that pthreadpool_destroy is being called on a thread pool that no longer exists after forking... but if that's the case, why does it not crash when the initial set_num_threads is called with a lower number? Anyway, I hope this helps narrow down the problem.
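
One lightweight way to at least see which Python frame the worker dies in (complementary to the native gdb backtrace below) is to enable faulthandler before the workers fork; the forked child inherits the SIGSEGV handler and dumps its traceback on crash. Sketch:

import faulthandler
import torch

faulthandler.enable()  # must run before the DataLoader forks its workers

torch.set_num_threads(4)
loader = torch.utils.data.DataLoader([1, 2, 3], num_workers=1)
next(iter(loader))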

#0  0x00007ffff7fa1aab in __pthread_clockjoin_ex (threadid=140735666181888, thread_return=0x0, clockid=0, abstime=0x0, block=true) at pthread_join_common.c:89
        pd = 0x7fff9363d700
        self = <optimized out>
        result = <optimized out>
        pd_result = <optimized out>
#1  0x00007fffe70b88db in pthreadpool_destroy () from /home/ubuntu/.local/share/virtualenvs/test-2nMYMFG7/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
No symbol table info available.
#2  0x00007fffe4cc6a07 in caffe2::PThreadPool::set_thread_count(unsigned long) () from /home/ubuntu/.local/share/virtualenvs/test-2nMYMFG7/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
No symbol table info available.
#3  0x00007fffe38dacbf in at::set_num_threads(int) () from /home/ubuntu/.local/share/virtualenvs/test-2nMYMFG7/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
No symbol table info available.
#4  0x00007ffff571a9d6 in THPModule_setNumThreads(_object*, _object*) () from /home/ubuntu/.local/share/virtualenvs/test-2nMYMFG7/lib/python3.7/site-packages/torch/lib/libtorch_python.so
No symbol table info available.
#5  0x000055555566e803 in _PyCFunction_FastCallKeywords ()
No symbol table info available.
#6  0x0000555555703ed4 in ?? ()
No symbol table info available.
#7  0x0000555555700fe2 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#8  0x00005555556716ba in _PyFunction_FastCallDict ()
No symbol table info available.
#9  0x00005555556fe0fd in _PyEval_EvalFrameDefault ()
No symbol table info available.
#10 0x0000555555670a0a in _PyFunction_FastCallKeywords ()
No symbol table info available.
#11 0x0000555555703dcb in ?? ()
No symbol table info available.
#12 0x00005555556fcc78 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#13 0x0000555555670a0a in _PyFunction_FastCallKeywords ()
No symbol table info available.
#14 0x0000555555703dcb in ?? ()
No symbol table info available.
#15 0x00005555556fcc78 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#16 0x0000555555670a0a in _PyFunction_FastCallKeywords ()
No symbol table info available.
#17 0x0000555555703dcb in ?? ()
No symbol table info available.
#18 0x00005555556fcc78 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#19 0x0000555555671cfa in _PyObject_Call_Prepend ()
No symbol table info available.
#20 0x00005555556c6d1c in ?? ()
No symbol table info available.
#21 0x00005555556c2f19 in ?? ()
No symbol table info available.
#22 0x000055555566fd65 in _PyObject_FastCallKeywords ()
No symbol table info available.
#23 0x0000555555703f51 in ?? ()
No symbol table info available.
#24 0x00005555556fcbe8 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#25 0x0000555555670a0a in _PyFunction_FastCallKeywords ()
No symbol table info available.
#26 0x0000555555703dcb in ?? ()
No symbol table info available.
#27 0x0000555555700fe2 in _PyEval_EvalFrameDefault ()
No symbol table info available.
#28 0x0000555555670a0a in _PyFunction_FastCallKeywords ()
No symbol table info available.
#29 0x0000555555703dcb in ?? ()
No symbol table info available.
#30 0x0000555555700fe2 in _PyEval_EvalFrameDefault ()
No symbol table info available.

imaginary-person (Contributor) commented Mar 29, 2021

why does it not crash when the initial set_num_threads is called with a low number?

I'm fairly sure the thread data structures responsible for this segfault have their first three entries in a virtual memory page that is still mapped to the same physical page in both the parent and the child, due to the copy-on-write technique used by fork. The entries for the parent's remaining threads (the fourth and beyond) appear to live in another virtual page that is not shared between the two processes, since not all pages are shared in the current fork implementation. So when the child reads the entry for the parent's 4th thread, it sees gibberish: its virtual page is backed by a different physical page than the parent's.

I haven't verified this hypothesis, but since the segfault occurs in nptl, a workaround would have to be used to resolve this issue anyway. Perhaps a new variant of set_num_threads in ParallelOpenMP.cpp for a newly forked process, one that only handles its initial set_num_threads(1)?

ezyang (Contributor) commented Mar 29, 2021

@malfet says that when we do the fork, we should destroy all thread pools. (In this case, the caffe2 threadpool is the thing causing the problem.)
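
The underlying constraint, shown with a minimal sketch (Linux/macOS): after fork, only the calling thread exists in the child, so thread handles inherited from the parent refer to threads that no longer exist, and trying to join or destroy them (as pthreadpool_destroy does) goes wrong.

import os
import threading
import time

def idle():
    time.sleep(60)

workers = [threading.Thread(target=idle, daemon=True) for _ in range(3)]
for t in workers:
    t.start()

print("parent threads:", threading.active_count())  # 4: main thread + 3 workers

pid = os.fork()
if pid == 0:
    # In the child, only the forking thread is running; the parent's worker
    # threads do not exist here, even though their handles were inherited.
    print("child threads:", threading.active_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)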

ezyang assigned malfet and unassigned VitalyFedyunin Mar 29, 2021
ezyang added the module: multiprocessing and module: multithreading labels and removed the triage review label Mar 29, 2021
imaginary-person (Contributor) commented Mar 29, 2021

when we do the fork, we should destroy all thread pools. (In this case, the caffe2 threadpool is the thing causing the problem.)

@ezyang, please clarify whether you mean that a fix would entail somehow resetting Caffe2's thread pool in the child process before calling set_num_threads(1) there, or that the parent process would first call set_num_threads(1), then fork, and afterwards restore set_num_threads(old_value) in the parent. Thank you!

EDIT: Removed info duplicated from my previous comment.

hiyyg commented Mar 29, 2021

I hit the same error with 1.8.0/1.8.1; for now I have to stick with 1.7.1.

ezyang (Contributor) commented Mar 30, 2021

@malfet should clarify, but I think the idea is to get rid of the threads before forking.

malfet (Contributor) commented Mar 30, 2021

Yes, from the crash it looks like we are trying to call pthread_join from the child process, where those threads do not exist.
The idea for a fix is to register a pthread_atfork handler that leaks the pthread pool in https://github.com/pytorch/pytorch/blob/master/caffe2/utils/threadpool/pthreadpool-cpp.cc, or to propose a similar fix directly against https://github.com/Maratyszcza/pthreadpool.git
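
To illustrate the shape of that idea (a conceptual sketch only, using Python's os.register_at_fork as a stand-in for pthread_atfork; the actual fix would be C++ in pthreadpool-cpp.cc): the child-side hook abandons the inherited pool instead of trying to destroy threads that no longer exist, and a fresh pool is created lazily on first use.

import os
from concurrent.futures import ThreadPoolExecutor

_pool = None

def get_pool():
    # Return the process-wide pool, creating it lazily on first use.
    global _pool
    if _pool is None:
        _pool = ThreadPoolExecutor(max_workers=4)
    return _pool

def _leak_pool_in_child():
    # After fork, the pool's threads do not exist in the child. Do not try to
    # join or shut them down; just drop the reference ("leak" it) so the child
    # builds a fresh pool the next time get_pool() is called.
    global _pool
    _pool = None

os.register_at_fork(after_in_child=_leak_pool_in_child)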
