
Warning: Leaking Caffe2 thread-pool after fork when using DataLoader with num_workers>0 and pin_memory=True #57273

Closed
rafi-cohen opened this issue Apr 29, 2021 · 56 comments
Labels
module: dataloader (Related to torch.utils.data.DataLoader and Sampler), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@rafi-cohen

rafi-cohen commented Apr 29, 2021

🐛 Bug

When using a DataLoader with num_workers>0 and pin_memory=True, warnings about "Leaking Caffe2 thread-pool after fork" are triggered. The warning is shown multiple times and floods the screen.
The warning doesn't trigger when either num_workers=0 or pin_memory=False.

To Reproduce

Steps to reproduce the behavior:

  1. Run the following:
     from torch.utils.data import DataLoader
     from torchvision.datasets import FakeData
     from torchvision.transforms import ToTensor
     
     
     def main():
         data = FakeData(transform=ToTensor())
         dataloader = DataLoader(data, num_workers=2, pin_memory=True)
         for e in range(1, 6):
             print(f'epoch {e}:')
             for _ in dataloader:
                 pass
     
     
     if __name__ == '__main__':
         main()

Output:

epoch 1:
epoch 2:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

Expected behavior

No warnings

Environment

PyTorch version: 1.9.0.dev20210428
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
GPU 4: GeForce RTX 2080 Ti
GPU 5: GeForce RTX 2080 Ti
GPU 6: GeForce RTX 2080 Ti
GPU 7: GeForce RTX 2080 Ti

Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.9.0.dev20210428
[pip3] torchaudio==0.9.0a0+999026d
[pip3] torchvision==0.10.0.dev20210428
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py38h27cfd23_1
[conda] mkl_fft                   1.3.0            py38h42c9631_2
[conda] mkl_random                1.2.1            py38ha9443f7_2
[conda] numpy                     1.20.1           py38h93e21f0_0
[conda] numpy-base                1.20.1           py38h7d8b39e_0
[conda] pytorch                   1.9.0.dev20210428 py3.8_cuda11.1_cudnn8.0.5_0    pytorch-nightly
[conda] torchaudio                0.9.0.dev20210428            py38    pytorch-nightly
[conda] torchvision               0.10.0.dev20210428      py38_cu111    pytorch-nightly

Additional context

It looks like this warning was introduced in #54895. I don't quite follow the details there, though.

cc @ssnl @VitalyFedyunin @ejguan

@gchanan added the module: dataloader and triaged labels Apr 29, 2021
@imaginary-person
Contributor

It's supposed to be normal behavior. The Caffe2 thread-pool is leaked and recreated in child processes after fork in order to prevent a segfault, which would otherwise occur while handling data structures belonging to the parent process's Caffe2 thread-pool, whose threads no longer exist in the child process (which is single-threaded right after fork).
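
Here's a small, PyTorch-free Python sketch (Linux-only, purely illustrative) of the mechanism: a forked child inherits the parent's thread-pool data structures in memory, but the OS threads behind them do not survive the fork.

import os
import threading
import time

def os_thread_count():
    # number of kernel threads in this process (Linux-specific)
    return len(os.listdir(f'/proc/{os.getpid()}/task'))

# stand-in for a thread-pool worker in the parent process
worker = threading.Thread(target=time.sleep, args=(60,), daemon=True)
worker.start()
print('parent OS threads:', os_thread_count())   # 2: main thread + worker

pid = os.fork()
if pid == 0:
    # the child inherited the pool's memory, but only the forking thread
    # exists here, so any bookkeeping about other threads is stale
    print('child OS threads:', os_thread_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)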

Perhaps the warning message can be revised so that it doesn't cause any confusion among users.

@xkszltl
Contributor

xkszltl commented May 16, 2021

We're seeing the same issue.

After that PR (#54895), I got a lot of warnings in one of our projects.
It looks like 4 warnings for each epoch, emitted within the same second.
My guess is they come from the DataLoader restarting.
Because this particular case has a high epoch count with a small dataset, the warnings are flooding our log.
If this is expected, should we suppress it after a certain number of occurrences?

@imaginary-person
Contributor

TORCH_WARN_ONCE would've worked the same way as TORCH_WARN in #54895, as this warning is displayed only once per child process that leaks & rebuilds the Caffe2 thread-pool.

If this is expected, should we suppress it after a certain number of occurrences?

If you're referring to filtering this warning in another project, then yes.
But if you're referring to suppressing this warning in the PyTorch source code, then it'd have to be removed.

@xkszltl
Contributor

xkszltl commented May 18, 2021

If both the fork from the DataLoader and this warning are normal behavior, I would say the possible suppression options other than complete removal are:

  • Suppress it directly on the user side, e.g. inside the DataLoader.
  • Use an IPC-capable counter for that "ONCE", e.g. with mmap (a sketch of the idea follows below).
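
A rough Python sketch of the second option, purely to illustrate the idea (the real warning lives in C++, and the names below are made up): a one-byte anonymous shared mapping created before fork can act as a cross-process "already warned" flag.

import mmap
import os
import sys

warned = mmap.mmap(-1, 1)  # anonymous MAP_SHARED mapping, visible across fork

def warn_once_across_processes(msg):
    # not atomic, but enough to sketch an IPC-capable "warn once"
    if warned[0] == 0:
        warned[0] = 1
        print(msg, file=sys.stderr)

for _ in range(2):
    if os.fork() == 0:   # child, i.e. what a DataLoader worker would be
        warn_once_across_processes('Leaking thread-pool after fork.')
        os._exit(0)
    os.wait()            # only the first child actually prints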

@rafi-cohen
Author

rafi-cohen commented May 18, 2021

I personally believe that if this is normal behaviour, then this warning should be removed.
This is a very basic use case in PyTorch, which, in my opinion, shouldn't trigger a warning unless something is wrong.

Also, I don't really understand the point of the warning. "The Caffe2 thread-pool was leaked". As a user, do I need to do anything about that? Should I even care?

@mansimane

Hi, I'm facing a similar warning in the latest 1.9 RC1. I tried suppressing it by:

  1. Launching the script as python -W ignore
  2. Filtering warnings as below:
     import warnings
     warnings.filterwarnings("ignore")

Neither option suppresses the warning, since it comes from a C++ module. Can someone suggest a way to suppress it without modifying the PyTorch code?

@Jeffrey-Ede

I confirm OP's observation that the error messages disappear for pin_memory = False. Either way, the error messages do not seem to correspond to a meaningful problem for end-users.

The warnings are very slightly different in a recent copy of the PyTorch 1.9 nightly preview. This is part of my error log:

[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

@imaginary-person
Contributor

imaginary-person commented Jun 1, 2021

Hello @malfet, some users reported in this thread that they were misled by the warnings pertaining to leaking the Caffe2 thread-pool, and that they didn't find it to be useful.
Please confirm if it'd be okay to remove this warning. I can submit a PR to do so.
Thank you!

EDIT: Oops, I had missed the pin_memory part. Will look into it.

@ejguan
Contributor

ejguan commented Jun 1, 2021

Just wondering if lazily constructing the Caffe2 threadpool could help resolve this issue. When we call set_num_threads, we could just write the number to a global variable without creating the threadpool. In that case, would the pin_memory_thread still have the COW problem?
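
To make the suggestion concrete, here is a minimal Python sketch of the lazy-construction idea (illustrative only, not the actual ATen/Caffe2 code; all names are invented): set_num_threads only records the request, and the pool is built on first use, so a process that never touches the pool never has one to leak.

import os
from concurrent.futures import ThreadPoolExecutor

_requested_threads = None
_pool = None

def set_num_threads(n):
    global _requested_threads
    _requested_threads = n      # record the request; don't build the pool yet

def pool():
    global _pool
    if _pool is None:           # built lazily, in the process that needs it
        _pool = ThreadPoolExecutor(max_workers=_requested_threads or os.cpu_count())
    return _pool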

@moskomule
Contributor

Hi, this problem still exists in v 1.9.

@DrJimFan

Is there a way to suppress this? That's all we users care about at this point.

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

Hello @ejguan, when this warning was added, the segfault had been happening even with pin_memory=False, e.g. #54752.
In that issue, pin_memory was implicitly False.
In fact, if you use this script, you'd still get a warning, even if you explicitly set pin_memory to False -

import torch

def main():
    torch.set_num_threads(4)

    dataloader = torch.utils.data.DataLoader([1, 2, 3], num_workers=1)
    next(iter(dataloader))

    return

if __name__ == '__main__':
    main()

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

Just wondering if lazily constructing the Caffe2 threadpool could help resolve this issue. When we call set_num_threads, we could just write the number to a global variable without creating the threadpool.

I'm sorry I couldn't fully understand this solution. set_num_threads's code-flow is common for child processes & the main-process, so wouldn't this mean that even in the main process, set_num_threads won't modify the number of threads of an existing pthreadpool (which is why a patch was added to enable the following behavior, although I do see your point that pthreadpool is being created even if it isn't required)?

void set_num_threads(int nthreads) {
  TORCH_CHECK(nthreads > 0, "Expected positive number of threads");
  num_threads.store(nthreads);
#ifdef _OPENMP
  omp_set_num_threads(nthreads);
#endif
#ifdef TH_BLAS_MKL
  mkl_set_num_threads(nthreads);
  // because PyTorch uses OpenMP outside of MKL invocations
  // as well, we want this flag to be false, so that
  // threads aren't destroyed and recreated across every
  // MKL / non-MKL boundary of OpenMP usage
  // See https://github.com/pytorch/pytorch/issues/13757
  mkl_set_dynamic(false);
#endif
#ifdef USE_PTHREADPOOL
  // because PyTorch uses caffe2::pthreadpool() in QNNPACK
  caffe2::PThreadPool* const pool = caffe2::pthreadpool();
  TORCH_INTERNAL_ASSERT(pool, "Invalid thread pool!");
  pool->set_thread_count(nthreads);
#endif
}

BTW @ejguan, one of the preconditions for this warning to be produced is that torch.set_num_threads() should've been called at least once by the parent process.

So, the following script would also still produce a warning -

from torch.utils.data import DataLoader
import torch

torch.set_num_threads(4)


def main():
    dataloader = DataLoader([1, 2, 3, 4], num_workers=1, pin_memory=False)
    for e in range(1, 6):
        print(f'epoch {e}:')
        for _ in dataloader:
            pass


if __name__ == '__main__':
    main()

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

In that case, would the pin_memory_thread still have the COW problem?

@ejguan, actually, I'm unable to reproduce this issue with @rafi-cohen's script in this issue.
That's probably because there's no torch.set_num_threads() invocation in it.
I'm able to reproduce the issue only after adding a torch.set_num_threads() invocation.
I tested after building from source the master branches of PyTorch & Vision.

UPDATE: It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so torch.set_num_threads() invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

@imaginary-person
Contributor

@rafi-cohen @LinxiFan @moskomule @Jeffrey-Ede @mansimane

I'm unable to reproduce the issue with @rafi-cohen's script, probably because this warning requires a torch.set_num_threads() invocation to have occurred first (I can reproduce the issue if I add a torch.set_num_threads() invocation to the script).
However, I tested with the current master branches of PyTorch and Vision.

Can you please confirm whether you can reproduce this warning with the script in this issue?
Thank you!

@glenn-jocher

glenn-jocher commented Jun 19, 2021

The 'Leaking Caffe2 thread-pool' warning is now appearing in the YOLOv5 (https://github.com/ultralytics/yolov5) Docker Image with PyTorch 1.9. To reproduce:

t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t
python train.py --epochs 3

[Screenshot from 2021-06-19 14:44:58: training console output showing the repeated thread-pool warnings]

This is a lot of verbose warning content that is going to worry users. Can we please take some action to suppress these warnings if they are of no consequence?

@moskomule
Contributor

@imaginary-person Hi, I could reproduce @rafi-cohen's example by using 1.9.0-py3.9_cuda11.1_cudnn8.0.5_0 installed via conda on Ubuntu 20.04.

@imaginary-person
Contributor

Thanks for the info, @moskomule! It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so torch.set_num_threads() invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

The same goes for YOLOv5 - torch.set_num_threads() invocation isn't required to reproduce the issue.

@moskomule
Contributor

Thanks. By the way, does it actually cause any memory leak?

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

@moskomule, no, the warning was added after fixing a segfault. The Caffe2 pthreadpool is deliberately leaked in child processes (workers): they inherit its data structures from the parent process, but they are single-threaded right after fork, so they don't actually have those threads, yet they would still call pthread_destroy on them!

BTW, are you unable to reproduce the warning with this issue's script when pin_memory=False?
If so, I'd have to check why that happens. Thanks!

@moskomule
Contributor

@imaginary-person Hi, thanks for the explanation.

Yes, when pin_memory=False, the warning disappears.

@rafi-cohen
Author

rafi-cohen commented Jun 20, 2021

@imaginary-person

I'm unable to reproduce the issue with @rafi-cohen's script, probably because this warning requires a torch.set_num_threads() invocation to have occurred first (I can reproduce the issue if I add a torch.set_num_threads() invocation to the script).
However, I tested with the current master branches of PyTorch and Vision.

Can you please confirm whether you can reproduce this warning with the script in this issue?
Thank you!

I can confirm the script reproduces the warnings as-is (no need for torch.set_num_threads()) with PyTorch 1.9.0 (py3.8_cuda11.1_cudnn8.0.5_0), running on Ubuntu 18.04.5 LTS

are you unable to reproduce the warning with this issue's script when pin_memory=False?
If so, I'd have to check why that happens. Thanks!

I noticed that if I first run the script with pin_memory=True, and then run it again with pin_memory=False (on the same python instance), the warnings trigger in both cases. However, if I run a new python instance and run the script with pin_memory=False, there are no warnings.

Output of running both options
Python 3.8.10 (default, Jun  4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
...     data = FakeData(transform=ToTensor())
...     dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
...     for e in range(1, 6):
...         print(f'epoch {e}:')
...         for _ in dataloader:
...             pass
...
>>> main(True)
epoch 1:
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>> main(False)
epoch 1:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>>
Output of running with `pin_memory=False` only
Python 3.8.10 (default, Jun  4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
...     data = FakeData(transform=ToTensor())
...     dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
...     for e in range(1, 6):
...         print(f'epoch {e}:')
...         for _ in dataloader:
...             pass
...
>>> main(False)
epoch 1:
epoch 2:
epoch 3:
epoch 4:
epoch 5:
>>>

@imaginary-person
Contributor

Thanks for the info, @rafi-cohen & @moskomule!
Yes, I was able to reproduce the issue with the Conda binary.

Basically, when pin_memory is True, a torch.set_num_threads call is made, which I could see with gdb (at::set_num_threads).
So, it seems that it'd be safe to disable the warning.
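
For context, this is roughly what the pin-memory thread does at startup (paraphrased from torch/utils/data/_utils/pin_memory.py in the releases discussed here; check your installed version for the exact code):

import torch

def _pin_memory_loop(in_queue, out_queue, device_id, done_event):
    # this thread-local setting keeps the pin-memory copies from using all
    # CPU cores; the set_num_threads call is what ends up touching the
    # Caffe2 pthreadpool mentioned above
    torch.set_num_threads(1)
    # ... the loop then pins fetched batches and forwards them to out_queue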

@joaqo

joaqo commented Jun 22, 2021

I'm having the same issue and would appreciate a way to at least suppress the warning from user space if it's not affecting training. In my case the reproduction steps are to install the latest version of PyTorch and run the sample detection reference script in the torchvision repo to train a Faster R-CNN model.

@Erotemic
Contributor

This is annoying enough that I hope the torch team seriously considers releasing a hotfix patch to 1.9 to suppress this warning. That is, unless 1.10 would come out sooner than a hotfix could be done.

If you guys are going to release 1.10 in the next 2 weeks, I'm fine to wait, but if it's going to take longer please consider releasing a patched version of 1.9 so I can see the text above my TQDM progress bars. I printed stuff out before the training loop for a reason. I want to see messages I printed, not "Leaking Caffe2" warnings that I don't have any control over.

Or maybe provide a link to a pip install command for a release candidate for 1.10? My scrolling finger thanks you for any consideration that could be given to this issue.

@xkszltl
Contributor

xkszltl commented Jul 30, 2021

This is annoying enough that I hope the torch team seriously considers releasing a hotfix patch to 1.9 to suppress this warning. That is, unless 1.10 would come out sooner than a hotfix could be done.

If you guys are going to release 1.10 in the next 2 weeks, I'm fine to wait, but if it's going to take longer please consider releasing a patched version of 1.9 so I can see the text above my TQDM progress bars. I printed stuff out before the training loop for a reason. I want to see messages I printed, not "Leaking Caffe2" warnings that I don't have any control over.

Or maybe provide a link to a pip install command for a release candidate for 1.10? My scrolling finger thanks you for any consideration that could be given to this issue.

@Erotemic

If it's annoying enough, see if you can ask it to shut up with some glog env, e.g. GLOG_minloglevel=2. Note that this will disable INFO as well.

Alternatively you may try 2>/dev/null if it's on stderr.
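
For completeness, the same redirection can be done from inside Python, since the warning is written to the OS-level stderr (fd 2) and never goes through Python's warnings machinery. A minimal sketch (a hypothetical helper, not a PyTorch API; it hides all stderr output, including from worker processes, while active):

import contextlib
import os
import sys

@contextlib.contextmanager
def silence_native_stderr():
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(2)        # keep a handle on the real stderr
    sys.stderr.flush()
    os.dup2(devnull, 2)      # point fd 2 (inherited by forked workers) at /dev/null
    try:
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved, 2)    # restore stderr
        os.close(saved)
        os.close(devnull)

# usage: wrap only the training loop
# with silence_native_stderr():
#     for batch in dataloader:
#         ...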

@Erotemic
Contributor

Setting GLOG_minloglevel=2 didn't change the behavior.

But piping stderr to null works! Much better than the above tail workaround.

@jmskinner

jmskinner commented Aug 5, 2021

Confirming that this is still an issue and EXTREMELY annoying; it's also prohibitive for testing. Any idea whether this could get a hotfix?

@Bernd1969

I can also confirm this. I use 6 worker threads to speed up my training and I get 6 red warnings for every epoch I train. That's about 30% of my output...
And why was this bug closed!? It's not fixed, and there doesn't seem to be even a valid workaround.
Piping stderr to /dev/null is like "switch off your monitor, then you don't see it anymore" - not a solution.

@ejguan
Contributor

ejguan commented Aug 5, 2021

Can you try the PyTorch nightly, since the PR to disable the warning has landed in the master branch? https://pytorch.org/get-started/locally/

@imaginary-person
Contributor

And why was this bug closed!? It's not fixed, and there doesn't seem to be even a valid workaround.

The next PyTorch release won't have this warning.

If it's not possible for you to use the nightly release (which has this warning disabled), is it possible for you to build from the 1.9 release branch, while also adding the change from #60318? Thanks

@Bernd1969

Bernd1969 commented Aug 5, 2021

I made:

conda install pytorch -c pytorch-nightly

It looked like it worked, but the warning just came again :(
I will retry in a few days with stable build
Thx 4 fast support

P.S.: above command only updated the certificate of pytorch, but it seems it did not install anything.
Do I need to start from scratch with my environment to actually get the nightly pytorch package? Or can I enforce the package change somehow?

@subhacom

I made:

conda install pytorch -c pytorch-nightly

It looked like it worked, but the warning just came again :(
I will retry in a few days with stable build
Thx 4 fast support

P.S.: above command only updated the certificate of pytorch, but it seems it did not install anything.
Do I need to start from scratch with my environment to actually get the nightly pytorch package? Or can I enforce the package change somehow?

This is still present in the nightly-build, even after pip uninstall torch followed by installing the nightly build using pip.

@ejguan
Contributor

ejguan commented Aug 19, 2021

@Bernd1969 @subhacom
Can you use https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py to verify your environment?
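
For convenience, with torch installed the same script can also be run as a module:

python -m torch.utils.collect_env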

@subhacom

@Bernd1969 @subhacom
Can you use https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py to verify your environment?

My apologies for a false alarm. Upon running the script I realized that torch was still at version 1.9.0. Earlier I ran:

pip uninstall torch
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html

And the installation messages look like this:

Successfully uninstalled torch-1.9.0+cu102
Looking in links: https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
Collecting torch
Downloading https://download.pytorch.org/whl/nightly/cu102/torch-1.10.0.dev20210819%2Bcu102-cp37-cp37m-linux_x86_64.whl (879.9 MB)
|██████████████████████████████▍ | 834.1 MB 1.4 MB/s eta 0:00:32tcmalloc: large alloc 1147494400 bytes == 0x55f3bf90c000 @ 0x7fef8c5fd615 0x55f3bc70b02c 0x55f3bc7eb17a 0x55f3bc70de4d 0x55f3bc7ffc0d 0x55f3bc7820d8 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc781f40 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77e93b 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc70fb99 0x55f3bc752e79 0x55f3bc70e7b2 0x55f3bc781e65 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77e93b 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77db0e 0x55f3bc70f65a 0x55f3bc77dd67 0x55f3bc77cc35
|████████████████████████████████| 879.9 MB 14 kB/s
Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (0.10.0+cu102)
Collecting torchaudio
Downloading https://download.pytorch.org/whl/nightly/torchaudio-0.10.0.dev20210819-cp37-cp37m-linux_x86_64.whl (2.0 MB)
|████████████████████████████████| 2.0 MB 155 kB/s
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch) (3.7.4.3)
Collecting torch
Downloading torch-1.9.0-cp37-cp37m-manylinux1_x86_64.whl (831.4 MB)
|████████████████████████████████| 831.4 MB 2.9 kB/s
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision) (1.19.5)
Requirement already satisfied: pillow>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision) (7.1.2)
...
...
Installing collected packages: torch, torchaudio
Successfully installed torch-1.9.0 torchaudio-0.9.0

So it appears that uninstalling just torch leaves torchaudio and torchvision at their older versions, and the pip install reinstalls torch 1.9.0 in order to satisfy their requirements.

Once I uninstalled all three, torch 1.10.0 installed properly, and now the warning is gone.

pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html

I suspect @Bernd1969 is experiencing the same with conda.

@Bernd1969

Correct. Subhacom's uninstall and reinstall also worked for me.
I had to use cu113 for my environment though.
And I now get another warning for every worker thread, but this seems to be a valid one that I can actually fix in my code :)

@aluo-x

aluo-x commented Aug 26, 2021

Would it be possible to include a fix to this issue in 1.9.1?
@malfet

@stas00
Contributor

stas00 commented Aug 27, 2021

I too can confirm that with today's nightly:

conda install -y pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly -c nvidia

the warnings are gone!

Thank you!

@imjiangjun

Is there a way to suppress this before 1.9.1? My output is flooded with it. But I can't downgrade pytorch.

@xkszltl
Contributor

xkszltl commented Aug 31, 2021

Is there a way to suppress this before 1.9.1? My output is flooded with it. But I can't downgrade pytorch.

2>/dev/null if you're OK with it.

malfet pushed a commit to malfet/pytorch that referenced this issue Sep 8, 2021
Summary:
Fixes pytorch#57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

pytorch#60171 test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: pytorch#60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
malfet added a commit that referenced this issue Sep 8, 2021
Summary:
Fixes #57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

#60171 test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: #60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b

Co-authored-by: Winston Smith <76181208+imaginary-person@users.noreply.github.com>
@12194916

I confirm OP's observation that the error messages disappear for pin_memory = False. Either way, the error messages do not seem to correspond to a meaningful problem for end-users.

The warnings are very slightly different in a recent copy of the PyTorch 1.9 nightly preview. This is part of my error log:

[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

Hi. Did changing to "pin_memory=False" help to avoid this warning? I have the same issue.
