Warning: Leaking Caffe2 thread-pool after fork when using `DataLoader` with `num_workers>0` and `pin_memory=True` #57273
Comments
It's supposed to be normal behavior. The Caffe2 thread-pool is intentionally leaked and recreated in child processes after fork, in order to prevent a segfault. The segfault would otherwise occur while handling data structures pertaining to the parent process's Caffe2 thread-pool, whose threads no longer exist in the child process (which is single-threaded right after fork). Perhaps the warning message can be revised so that it doesn't cause confusion among users.
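As a toy analogy in plain Python (hypothetical names; not PyTorch's actual C++ implementation), the leak-and-recreate pattern amounts to keying the pool on the current PID and abandoning, rather than destroying, any pool object inherited across fork:

```python
import os


class Pool:
    """Stand-in for a native thread-pool object."""
    def __init__(self):
        self.pid = os.getpid()
        self.threads = ["worker-%d" % i for i in range(4)]  # placeholders


_pool = None


def get_pool():
    """Return the process-local pool, discarding ("leaking") any pool
    inherited from a parent process: after fork the inherited threads
    no longer exist, so joining or destroying them would be unsafe."""
    global _pool
    if _pool is None or _pool.pid != os.getpid():
        # Intentionally leak the stale pool rather than tear it down.
        _pool = Pool()
    return _pool
```

The PID check is what makes a forked child build a fresh pool instead of touching the parent's stale one.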
We're seeing the same issue. After that PR (#54895), I got a lot of warnings in one of our projects.
If you're referring to filtering this warning in another project, then yes.
If both the fork from the DataLoader and this warning are normal behavior, I would say possible suppression options other than a complete removal are: …
I personally believe that if this is normal behaviour, then this warning should be removed. Also, I don't really understand the point of the warning: "The Caffe2 thread-pool was leaked." As a user, do I need to do anything about that? Should I even care?
Hi, facing a similar warning in the latest 1.9 rc1. I tried suppressing the warning with …

Neither option seems to suppress the warning, as it is emitted from C++ (`pthreadpool-cpp.cc`) rather than through Python's `warnings` machinery.
I confirm the OP's observation that the warnings disappear with `pin_memory=False`. Either way, they do not seem to correspond to a meaningful problem for end users. The warnings are slightly different in a recent copy of the PyTorch 1.9 nightly preview. This is part of my error log:

[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Hello @malfet, some users reported in this thread that they were misled by the warnings about leaking the Caffe2 thread-pool, and that they didn't find them useful. EDIT: Oops, I had missed the …
Just wondering if lazily constructing the caffe2 threadpool could help resolve this issue. When we call …
Hi, this problem still exists in v1.9.
Is there a way to suppress this? That's all we users care about at this point.
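Since the message is emitted by native C++ code writing directly to file descriptor 2, Python-level `warnings` filters can't intercept it. One blunt user-space workaround is to redirect fd 2 itself. This is a hypothetical helper sketched here, not a PyTorch API, and it silences everything written to stderr inside the block, including legitimate errors, so use it narrowly:

```python
import contextlib
import os


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point fd 2 at /dev/null, silencing stderr output
    from native (C/C++) code as well as from Python itself."""
    saved_fd = os.dup(2)
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)
        yield
    finally:
        # Restore the original stderr and release the extra descriptors.
        os.dup2(saved_fd, 2)
        os.close(devnull_fd)
        os.close(saved_fd)
```

For example, one could wrap only the `DataLoader` iteration in `with suppress_native_stderr(): ...`; anything native code prints to stderr during that window is lost.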
Hello @ejguan, when this warning was added, the segfault had been happening even with …
I'm sorry, I couldn't fully understand this solution. (See pytorch/aten/src/ATen/ParallelOpenMP.cpp, lines 44 to 65 at cac9ae1.)
BTW @ejguan, one of the preconditions for this warning to be produced is that the Caffe2 thread-pool already exists in the parent process (for instance, after a `torch.set_num_threads()` call). So, the following script would also still produce a warning:
```python
from torch.utils.data import DataLoader
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor
import torch

torch.set_num_threads(4)

def main():
    # body reconstructed from the reproduction script quoted later in this thread
    data = FakeData(transform=ToTensor())
    dataloader = DataLoader(data, num_workers=2, pin_memory=True)
    for _ in dataloader:
        pass

if __name__ == '__main__':
    main()
```
@ejguan, actually, I'm unable to reproduce this issue with @rafi-cohen's script in this issue. UPDATE: It seems the caffe2 pthreadpool already exists because of some dependency in the binary distribution, so a `torch.set_num_threads()` invocation isn't required to reproduce the issue there.
@rafi-cohen @LinxiFan @moskomule @Jeffrey-Ede @mansimane I'm unable to reproduce the issue with @rafi-cohen's script, probably because this warning requires a `torch.set_num_threads()` call when PyTorch is built from the master branch's source. Can you please confirm whether you can reproduce this issue with the script in this issue?
The 'Leaking Caffe2 thread-pool' warning is now appearing in the YOLOv5 (https://github.com/ultralytics/yolov5) Docker image with PyTorch 1.9. To reproduce:

t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t
python train.py --epochs 3

This is a lot of verbose warning content that is going to worry users; can we please take some action to suppress these warnings if they are of no consequence?
@imaginary-person Hi, I could reproduce @rafi-cohen's example by using …
Thanks for the info, @moskomule! It seems the caffe2 pthreadpool already exists because of some dependency in the binary distribution, so a `torch.set_num_threads()` invocation isn't required to reproduce the issue. The same goes for YOLOv5.
Thanks. By the way, does it actually cause any memory leak?
@moskomule, no, the warning was added after fixing a segfault. We are leaking the Caffe2 pthreadpool in child processes (workers): child processes inherit its data structures from the parent process, but they are single-threaded right after fork, so they don't actually have those threads, yet would still call … BTW, are you unable to reproduce the issue with this issue's script with …
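The single-threaded-after-fork behavior is easy to demonstrate in plain Python on a Unix system (a sketch, unrelated to PyTorch's internals): the child sees only the one thread that called `fork()`, even though the parent had several running.

```python
import os
import threading
import time


def snooze():
    time.sleep(30)


# Start a few background threads in the parent.
for _ in range(3):
    threading.Thread(target=snooze, daemon=True).start()

parent_count = threading.active_count()  # main thread + 3 workers

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: only the thread that called fork() survives. The worker
    # threads' data structures were copied into the child's address
    # space, but the threads themselves do not exist here.
    os.write(w, str(threading.active_count()).encode())
    os._exit(0)

os.close(w)
child_count = int(os.read(r, 16))
os.waitpid(pid, 0)
print("parent threads:", parent_count, "child threads:", child_count)
```

This mismatch (inherited thread-pool state with no backing threads) is exactly why tearing down the inherited pool in the child would be unsafe.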
@imaginary-person Hi, thanks for the explanation. Yes, when …
I can confirm the script reproduces the warnings as-is (no need for …).
I noticed that if I first run the script with `pin_memory=True`, then running it again with `pin_memory=False` in the same session also produces the warnings.

Output of running both options:

Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
... data = FakeData(transform=ToTensor())
... dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
... for e in range(1, 6):
... print(f'epoch {e}:')
... for _ in dataloader:
... pass
...
>>> main(True)
epoch 1:
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>> main(False)
epoch 1:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>>

Output of running with `pin_memory=False` only:

Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
... data = FakeData(transform=ToTensor())
... dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
... for e in range(1, 6):
... print(f'epoch {e}:')
... for _ in dataloader:
... pass
...
>>> main(False)
epoch 1:
epoch 2:
epoch 3:
epoch 4:
epoch 5:
>>>
Thanks for the info, @rafi-cohen & @moskomule! Basically, when …
Having the same issue; would appreciate a way to at least suppress the warning from user space if it's not affecting training. In my case, the reproduction steps are to install the latest version of PyTorch and try the sample …
This is annoying enough that I hope the torch team seriously considers releasing a hotfix patch to 1.9 to suppress this warning. That is, unless 1.10 would come out sooner than a hotfix could. If you're going to release 1.10 in the next 2 weeks, I'm fine with waiting, but if it's going to take longer, please consider releasing a patched version of 1.9 so I can see the text above my tqdm progress bars. I printed things before the training loop for a reason: I want to see my own messages, not "Leaking Caffe2" warnings that I have no control over. Or maybe provide a link to a pip install command for a 1.10 release candidate? My scrolling finger thanks you for any consideration given to this issue.
If it's annoying enough, see if you can ask it to shut up with some glog env var, e.g. … Alternatively, you may try …
Setting that didn't work. But piping stderr to null works! Much better than the tail workaround above.
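For reference, the stderr workarounds mentioned above can be wrapped as small shell helpers (a sketch; `train.py` stands in for your own entry point, and the selective form assumes bash process substitution):

```shell
# Blunt: run a command while discarding everything it writes to stderr.
run_quiet() { "$@" 2>/dev/null; }

# Selective: drop only the Caffe2 thread-pool warning lines from stderr,
# passing all other stderr output through untouched (bash-only).
run_filtered() { "$@" 2> >(grep -v 'Leaking Caffe2 thread-pool' >&2); }

# Usage (train.py stands in for your script):
#   run_quiet    python train.py
#   run_filtered python train.py
```

The selective form is usually preferable, since the blunt form also hides real errors and stack traces.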
Confirming that this is still an issue and EXTREMELY annoying; it's also prohibitive to testing. Any chance this could get a hotfix?
I can also confirm this. I use 6 workers to speed up my training and get 6 red warnings for every epoch I train. That's about 30% of my output...
Can you try PyTorch nightly, as the PR to disable the warning has landed in the master branch? https://pytorch.org/get-started/locally/
The next PyTorch release won't have this warning. If it's not possible for you to use the nightly release (which has this warning disabled), is it possible for you to build from the 1.9 release branch while also adding the change from #60318? Thanks
I ran: …

It looked like it worked, but the warning came back :( P.S.: The above command only updated the PyTorch certificate; it seems it did not actually install anything.
This is still present in the nightly build, even after …
@Bernd1969 @subhacom
My apologies for the false alarm. Upon running the script, I realized that torch was still at version 1.9.0. Earlier I ran: …

And the installation messages looked like this: …

So it appears that uninstalling just `torch` was not enough. Once I uninstalled all three packages, torch 1.10.0 installed properly, and now the warning is gone.
I suspect @Bernd1969 is experiencing the same with …
Correct. @subhacom's uninstall and reinstall also worked for me.
Would it be possible to include a fix for this issue in 1.9.1?
I can also confirm that with today's nightly: … the warnings are gone! Thank you!
Is there a way to suppress this before 1.9.1? My output is flooded with it, but I can't downgrade PyTorch.
Summary: Fixes #57273. Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it. It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch). #60171 test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code. cc malfet & ejguan, who have the authority to make a decision. Pull Request resolved: #60318 Reviewed By: albanD Differential Revision: D29265771 Pulled By: ezyang fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b Co-authored-by: Winston Smith <76181208+imaginary-person@users.noreply.github.com>
Hi. Did changing to `pin_memory=False` help to avoid this warning? I have the same issue.
🐛 Bug
When using a `DataLoader` with `num_workers>0` and `pin_memory=True`, warnings about `Leaking Caffe2 thread-pool after fork` are triggered. The warning shows multiple times and fills the screen. It doesn't trigger when either `num_workers=0` or `pin_memory=False`.

To Reproduce
Steps to reproduce the behavior:
Output:
Expected behavior
No warnings
Environment
Additional context
It looks like this warning was introduced in #54895. I don't quite follow the details there, though.
cc @ssnl @VitalyFedyunin @ejguan