
Warning: Leaking Caffe2 thread-pool after fork when using DataLoader with num_workers>0 and pin_memory=True #57273

Closed
rafi-cohen opened this issue Apr 29, 2021 · 56 comments
Labels
module: dataloader (Related to torch.utils.data.DataLoader and Sampler), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@rafi-cohen

rafi-cohen commented Apr 29, 2021

🐛 Bug

When using a DataLoader with num_workers>0 and pin_memory=True, warnings about "Leaking Caffe2 thread-pool after fork" are triggered. The warning is shown multiple times and floods the screen.
The warning doesn't trigger when either num_workers=0 or pin_memory=False.

To Reproduce

Steps to reproduce the behavior:

  1. Run the following:
     from torch.utils.data import DataLoader
     from torchvision.datasets import FakeData
     from torchvision.transforms import ToTensor
     
     
     def main():
         data = FakeData(transform=ToTensor())
         dataloader = DataLoader(data, num_workers=2, pin_memory=True)
         for e in range(1, 6):
             print(f'epoch {e}:')
             for _ in dataloader:
                 pass
     
     
     if __name__ == '__main__':
         main()

Output:

epoch 1:
epoch 2:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:88] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

Expected behavior

No warnings

Environment

PyTorch version: 1.9.0.dev20210428
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
GPU 4: GeForce RTX 2080 Ti
GPU 5: GeForce RTX 2080 Ti
GPU 6: GeForce RTX 2080 Ti
GPU 7: GeForce RTX 2080 Ti

Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.9.0.dev20210428
[pip3] torchaudio==0.9.0a0+999026d
[pip3] torchvision==0.10.0.dev20210428
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py38h27cfd23_1
[conda] mkl_fft                   1.3.0            py38h42c9631_2
[conda] mkl_random                1.2.1            py38ha9443f7_2
[conda] numpy                     1.20.1           py38h93e21f0_0
[conda] numpy-base                1.20.1           py38h7d8b39e_0
[conda] pytorch                   1.9.0.dev20210428 py3.8_cuda11.1_cudnn8.0.5_0    pytorch-nightly
[conda] torchaudio                0.9.0.dev20210428            py38    pytorch-nightly
[conda] torchvision               0.10.0.dev20210428      py38_cu111    pytorch-nightly

Additional context

It looks like this warning was introduced in #54895. I don't quite follow the details there, though.

cc @ssnl @VitalyFedyunin @ejguan

@gchanan added the module: dataloader and triaged labels Apr 29, 2021
@imaginary-person
Contributor

It's supposed to be normal behavior. The Caffe2 thread-pool is leaked and recreated in child processes after fork in order to prevent a segfault, which would otherwise occur while handling data structures belonging to the parent process's Caffe2 thread-pool, whose threads no longer exist in the child process (which is single-threaded right after fork).
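
Here's a small, PyTorch-free Python sketch (Linux-only, purely illustrative) of the mechanism: a forked child inherits the parent's thread-pool data structures in memory, but the OS threads behind them do not survive the fork.

import os
import threading
import time

def os_thread_count():
    # number of kernel threads in this process (Linux-specific)
    return len(os.listdir(f'/proc/{os.getpid()}/task'))

# stand-in for a thread-pool worker in the parent process
worker = threading.Thread(target=time.sleep, args=(60,), daemon=True)
worker.start()
print('parent OS threads:', os_thread_count())   # 2: main thread + worker

pid = os.fork()
if pid == 0:
    # the child inherited the pool's memory, but only the forking thread
    # exists here, so any bookkeeping about other threads is stale
    print('child OS threads:', os_thread_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)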

Perhaps the warning message can be revised so that it doesn't cause any confusion among users.

@xkszltl
Contributor

xkszltl commented May 16, 2021

We're seeing the same issue.

After that PR (#54895), I got a lot of warnings in one of our projects.
It looks like 4 warnings for each epoch, emitted within the same second.
My guess is they come from the DataLoader restarting.
Because this particular case has a high epoch count with a small dataset, the warnings are flooding our log.
If this is expected, should we suppress it after a certain number of occurrences?

@imaginary-person
Contributor

TORCH_WARN_ONCE would've worked the same way as TORCH_WARN in #54895, as this warning is displayed only once per child process that leaks & rebuilds the Caffe2 thread-pool.

If this is expected, should we suppress it after a certain number of occurrences?

If you're referring to filtering this warning in another project, then yes.
But if you're referring to suppressing this warning in the PyTorch source code, then it'd have to be removed.

@xkszltl
Contributor

xkszltl commented May 18, 2021

If both the fork from the DataLoader and this warning are normal behavior, I would say the possible suppression options other than complete removal are:

  • Suppress it directly on the user side, e.g. inside the DataLoader.
  • Use an IPC-capable counter for that "ONCE", e.g. with mmap (a sketch of the idea follows below).
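
A rough Python sketch of the second option, purely to illustrate the idea (the real warning lives in C++, and the names below are made up): a one-byte anonymous shared mapping created before fork can act as a cross-process "already warned" flag.

import mmap
import os
import sys

warned = mmap.mmap(-1, 1)  # anonymous MAP_SHARED mapping, visible across fork

def warn_once_across_processes(msg):
    # not atomic, but enough to sketch an IPC-capable "warn once"
    if warned[0] == 0:
        warned[0] = 1
        print(msg, file=sys.stderr)

for _ in range(2):
    if os.fork() == 0:   # child, i.e. what a DataLoader worker would be
        warn_once_across_processes('Leaking thread-pool after fork.')
        os._exit(0)
    os.wait()            # only the first child actually prints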

@rafi-cohen
Author

rafi-cohen commented May 18, 2021

I personally believe that if this is normal behaviour, then this warning should be removed.
This is a very basic use case in PyTorch, which, in my opinion, shouldn't trigger a warning unless something is wrong.

Also, I don't really understand the point of the warning. "The Caffe2 thread-pool was leaked". As a user, do I need to do anything about that? Should I even care?

@mansimane

Hi, I'm facing a similar warning in the latest 1.9 RC1. I tried suppressing it by:

  1. Launching the script as python -W ignore
  2. Filtering warnings as below:
     import warnings
     warnings.filterwarnings("ignore")

Neither option suppresses the warning, since it comes from a C++ module. Can someone suggest a way to suppress it without modifying the PyTorch code?

@Jeffrey-Ede

I confirm OP's observation that the error messages disappear for pin_memory = False. Either way, the error messages do not seem to correspond to a meaningful problem for end-users.

The warnings are very slightly different in a recent copy of the PyTorch 1.9 nightly preview. This is part of my error log:

[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

@imaginary-person
Contributor

imaginary-person commented Jun 1, 2021

Hello @malfet, some users reported in this thread that they were misled by the warnings pertaining to leaking the Caffe2 thread-pool, and that they didn't find it to be useful.
Please confirm if it'd be okay to remove this warning. I can submit a PR to do so.
Thank you!

EDIT: Oops, I had missed the pin_memory part. Will look into it.

@ejguan
Contributor

ejguan commented Jun 1, 2021

Just wondering if lazily constructing the Caffe2 threadpool could help resolve this issue. When we call set_num_threads, we could just write the number to a global variable without creating the threadpool. In that case, would the pin_memory_thread still have the COW problem?
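
To make the suggestion concrete, here is a minimal Python sketch of the lazy-construction idea (illustrative only, not the actual ATen/Caffe2 code; all names are invented): set_num_threads only records the request, and the pool is built on first use, so a process that never touches the pool never has one to leak.

import os
from concurrent.futures import ThreadPoolExecutor

_requested_threads = None
_pool = None

def set_num_threads(n):
    global _requested_threads
    _requested_threads = n      # record the request; don't build the pool yet

def pool():
    global _pool
    if _pool is None:           # built lazily, in the process that needs it
        _pool = ThreadPoolExecutor(max_workers=_requested_threads or os.cpu_count())
    return _pool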

@moskomule
Contributor

Hi, this problem still exists in v 1.9.

@DrJimFan

Is there a way to suppress this? That's all we users care about at this point.

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

Hello @ejguan, when this warning was added, the segfault had been happening even with pin_memory=False, e.g. #54752.
In that issue, pin_memory was implicitly False.
In fact, if you use this script, you'd still get a warning, even if you explicitly set pin_memory to False -

import torch

def main():
    torch.set_num_threads(4)

    dataloader = torch.utils.data.DataLoader([1, 2, 3], num_workers=1)
    next(iter(dataloader))

    return

if __name__ == '__main__':
    main()

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

Just wondering if lazily constructing the Caffe2 threadpool could help resolve this issue. When we call set_num_threads, we could just write the number to a global variable without creating the threadpool.

I'm sorry I couldn't fully understand this solution. set_num_threads's code-flow is common for child processes & the main-process, so wouldn't this mean that even in the main process, set_num_threads won't modify the number of threads of an existing pthreadpool (which is why a patch was added to enable the following behavior, although I do see your point that pthreadpool is being created even if it isn't required)?

void set_num_threads(int nthreads) {
  TORCH_CHECK(nthreads > 0, "Expected positive number of threads");
  num_threads.store(nthreads);
#ifdef _OPENMP
  omp_set_num_threads(nthreads);
#endif
#ifdef TH_BLAS_MKL
  mkl_set_num_threads(nthreads);
  // because PyTorch uses OpenMP outside of MKL invocations
  // as well, we want this flag to be false, so that
  // threads aren't destroyed and recreated across every
  // MKL / non-MKL boundary of OpenMP usage
  // See https://github.com/pytorch/pytorch/issues/13757
  mkl_set_dynamic(false);
#endif
#ifdef USE_PTHREADPOOL
  // because PyTorch uses caffe2::pthreadpool() in QNNPACK
  caffe2::PThreadPool* const pool = caffe2::pthreadpool();
  TORCH_INTERNAL_ASSERT(pool, "Invalid thread pool!");
  pool->set_thread_count(nthreads);
#endif
}

BTW @ejguan, one of the preconditions for this warning to be produced is that torch.set_num_threads() should've been called at least once by the parent process.

So, the following script would also still produce a warning -

from torch.utils.data import DataLoader
import torch

torch.set_num_threads(4)


def main():
    dataloader = DataLoader([1, 2, 3, 4], num_workers=1, pin_memory=False)
    for e in range(1, 6):
        print(f'epoch {e}:')
        for _ in dataloader:
            pass


if __name__ == '__main__':
    main()

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

In that case, would the pin_memory_thread still have the COW problem?

@ejguan, actually, I'm unable to reproduce this issue with @rafi-cohen's script in this issue.
That's probably because there's no torch.set_num_threads() invocation in it.
I'm able to reproduce the issue only after adding a torch.set_num_threads() invocation.
I tested after building from source the master branches of PyTorch & Vision.

UPDATE: It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so torch.set_num_threads() invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

@imaginary-person
Contributor

@rafi-cohen @LinxiFan @moskomule @Jeffrey-Ede @mansimane

I'm unable to reproduce the issue with @rafi-cohen's script, probably because this warning requires a torch.set_num_threads() invocation to have occurred first (I can reproduce the issue if I add a torch.set_num_threads() invocation to the script).
However, I tested with the current master branches of PyTorch and Vision.

Can you please confirm whether you can reproduce this warning with the script in this issue?
Thank you!

@glenn-jocher

glenn-jocher commented Jun 19, 2021

The 'Leaking Caffe2 thread-pool' warning is now appearing in the YOLOv5 (https://github.com/ultralytics/yolov5) Docker Image with PyTorch 1.9. To reproduce:

t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t
python train.py --epochs 3

[Screenshot from 2021-06-19 14:44:58: training console output showing the repeated thread-pool warnings]

This is a lot of verbose warning content that is going to worry users. Can we please take some action to suppress these warnings if they are of no consequence?

@moskomule
Contributor

@imaginary-person Hi, I could reproduce @rafi-cohen's example by using 1.9.0-py3.9_cuda11.1_cudnn8.0.5_0 installed via conda on Ubuntu 20.04.

@imaginary-person
Contributor

Thanks for the info, @moskomule! It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so torch.set_num_threads() invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

The same goes for YOLOv5 - torch.set_num_threads() invocation isn't required to reproduce the issue.

@moskomule
Contributor

Thanks. By the way, does it actually cause any memory leak?

@imaginary-person
Contributor

imaginary-person commented Jun 19, 2021

@moskomule, no, the warning was added after fixing a segfault. The Caffe2 pthreadpool is deliberately leaked in child processes (workers): they inherit its data structures from the parent process, but they are single-threaded right after fork, so they don't actually have those threads, yet they would still call pthread_destroy on them!

BTW, are you unable to reproduce the warning with this issue's script when pin_memory=False?
If so, I'd have to check why that happens. Thanks!

@moskomule
Contributor

@imaginary-person Hi, thanks for the explanation.

Yes, when pin_memory=False, the warning disappears.

@rafi-cohen
Author

rafi-cohen commented Jun 20, 2021

@imaginary-person

I'm unable to reproduce the issue with @rafi-cohen's script, probably because this warning requires a torch.set_num_threads() invocation to have occurred first (I can reproduce the issue if I add a torch.set_num_threads() invocation to the script).
However, I tested with the current master branches of PyTorch and Vision.

Can you please confirm whether you can reproduce this warning with the script in this issue?
Thank you!

I can confirm the script reproduces the warnings as-is (no need for torch.set_num_threads()) with PyTorch 1.9.0 (py3.8_cuda11.1_cudnn8.0.5_0), running on Ubuntu 18.04.5 LTS

are you unable to reproduce the warning with this issue's script when pin_memory=False?
If so, I'd have to check why that happens. Thanks!

I noticed that if I first run the script with pin_memory=True, and then run it again with pin_memory=False (on the same python instance), the warnings trigger in both cases. However, if I run a new python instance and run the script with pin_memory=False, there are no warnings.

Output of running both options
Python 3.8.10 (default, Jun  4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
...     data = FakeData(transform=ToTensor())
...     dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
...     for e in range(1, 6):
...         print(f'epoch {e}:')
...         for _ in dataloader:
...             pass
...
>>> main(True)
epoch 1:
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>> main(False)
epoch 1:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 2:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 3:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 4:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
epoch 5:
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
>>>
Output of running with `pin_memory=False` only
Python 3.8.10 (default, Jun  4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import FakeData
>>> from torchvision.transforms import ToTensor
>>>
>>>
>>> def main(pin_memory):
...     data = FakeData(transform=ToTensor())
...     dataloader = DataLoader(data, num_workers=2, pin_memory=pin_memory)
...     for e in range(1, 6):
...         print(f'epoch {e}:')
...         for _ in dataloader:
...             pass
...
>>> main(False)
epoch 1:
epoch 2:
epoch 3:
epoch 4:
epoch 5:
>>>

@imaginary-person
Contributor

Thanks for the info, @rafi-cohen & @moskomule!
Yes, I was able to reproduce the issue with the Conda binary.

Basically, when pin_memory is True, a torch.set_num_threads call is made, which I could see with gdb (at::set_num_threads).
So, it seems that it'd be safe to disable the warning.
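
For context, this is roughly what the pin-memory thread does at startup (paraphrased from torch/utils/data/_utils/pin_memory.py in the releases discussed here; check your installed version for the exact code):

import torch

def _pin_memory_loop(in_queue, out_queue, device_id, done_event):
    # this thread-local setting keeps the pin-memory copies from using all
    # CPU cores; the set_num_threads call is what ends up touching the
    # Caffe2 pthreadpool mentioned above
    torch.set_num_threads(1)
    # ... the loop then pins fetched batches and forwards them to out_queue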

@joaqo

joaqo commented Jun 22, 2021

I'm having the same issue and would appreciate a way to at least suppress the warning from user space if it's not affecting training. In my case the reproduction steps are to install the latest version of PyTorch and run the sample detection reference script in the torchvision repo to train a Faster R-CNN model.

@Erotemic
Contributor

This is annoying enough that I hope the torch team seriously considers releasing a hotfix patch to 1.9 to suppress this warning. That is, unless 1.10 would come out sooner than a hotfix could be done.

If you guys are going to release 1.10 in the next 2 weeks, I'm fine to wait, but if it's going to take longer please consider releasing a patched version of 1.9 so I can see the text above my TQDM progress bars. I printed stuff out before the training loop for a reason. I want to see messages I printed, not "Leaking Caffe2" warnings that I don't have any control over.

Or maybe provide a link to a pip install command for a release candidate for 1.10? My scrolling finger thanks you for any consideration that could be given to this issue.

@xkszltl
Contributor

xkszltl commented Jul 30, 2021

This is annoying enough that I hope the torch team seriously considers releasing a hotfix patch to 1.9 to suppress this warning. That is, unless 1.10 would come out sooner than a hotfix could be done.

If you guys are going to release 1.10 in the next 2 weeks, I'm fine to wait, but if it's going to take longer please consider releasing a patched version of 1.9 so I can see the text above my TQDM progress bars. I printed stuff out before the training loop for a reason. I want to see messages I printed, not "Leaking Caffe2" warnings that I don't have any control over.

Or maybe provide a link to a pip install command for a release candidate for 1.10? My scrolling finger thanks you for any consideration that could be given to this issue.

@Erotemic

If it's annoying enough, see if you can ask it to shut up with some glog env, e.g. GLOG_minloglevel=2. Note that this will disable INFO as well.

Alternatively you may try 2>/dev/null if it's on stderr.
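
For completeness, the same redirection can be done from inside Python, since the warning is written to the OS-level stderr (fd 2) and never goes through Python's warnings machinery. A minimal sketch (a hypothetical helper, not a PyTorch API; it hides all stderr output, including from worker processes, while active):

import contextlib
import os
import sys

@contextlib.contextmanager
def silence_native_stderr():
    devnull = os.open(os.devnull, os.O_WRONLY)
    saved = os.dup(2)        # keep a handle on the real stderr
    sys.stderr.flush()
    os.dup2(devnull, 2)      # point fd 2 (inherited by forked workers) at /dev/null
    try:
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved, 2)    # restore stderr
        os.close(saved)
        os.close(devnull)

# usage: wrap only the training loop
# with silence_native_stderr():
#     for batch in dataloader:
#         ...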

@Erotemic
Contributor

Setting GLOG_minloglevel=2 didn't change the behavior.

But piping stderr to null works! Much better than the above tail workaround.

@jmskinner

jmskinner commented Aug 5, 2021

Confirming that this is still an issue and EXTREMELY annoying; it's also prohibitive for testing. Any idea whether this could get a hotfix?

@Bernd1969

I can also confirm this. I use 6 worker threads to speed up my training and I get 6 red warnings for every epoch I train. That's about 30% of my output...
And why was this bug closed!? It's not fixed, and there doesn't seem to be even a valid workaround.
Piping stderr to /dev/null is like "switch off your monitor, then you don't see it anymore" - not a solution.

@ejguan
Contributor

ejguan commented Aug 5, 2021

Can you try the PyTorch nightly, since the PR to disable the warning has landed in the master branch? https://pytorch.org/get-started/locally/

@imaginary-person
Contributor

And why was this bug closed!? It's not fixed, and there doesn't seem to be even a valid workaround.

The next PyTorch release won't have this warning.

If it's not possible for you to use the nightly release (which has this warning disabled), is it possible for you to build from the 1.9 release branch, while also adding the change from #60318? Thanks

@Bernd1969

Bernd1969 commented Aug 5, 2021

I made:

conda install pytorch -c pytorch-nightly

It looked like it worked, but the warning just came again :(
I will retry in a few days with stable build
Thx 4 fast support

P.S.: above command only updated the certificate of pytorch, but it seems it did not install anything.
Do I need to start from scratch with my environment to actually get the nightly pytorch package? Or can I enforce the package change somehow?

@subhacom

I made:

conda install pytorch -c pytorch-nightly

It looked like it worked, but the warning just came again :(
I will retry in a few days with stable build
Thx 4 fast support

P.S.: above command only updated the certificate of pytorch, but it seems it did not install anything.
Do I need to start from scratch with my environment to actually get the nightly pytorch package? Or can I enforce the package change somehow?

This is still present in the nightly-build, even after pip uninstall torch followed by installing the nightly build using pip.

@ejguan
Contributor

ejguan commented Aug 19, 2021

@Bernd1969 @subhacom
Can you use https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py to verify your environment?
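
For convenience, with torch installed the same script can also be run as a module:

python -m torch.utils.collect_env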

@subhacom

@Bernd1969 @subhacom
Can you use https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py to verify your environment?

My apologies for a false alarm. Upon running the script I realized that torch was still at version 1.9.0. Earlier I ran:

pip uninstall torch
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html

And the installation messages look like this:

Successfully uninstalled torch-1.9.0+cu102
Looking in links: https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
Collecting torch
Downloading https://download.pytorch.org/whl/nightly/cu102/torch-1.10.0.dev20210819%2Bcu102-cp37-cp37m-linux_x86_64.whl (879.9 MB)
|██████████████████████████████▍ | 834.1 MB 1.4 MB/s eta 0:00:32tcmalloc: large alloc 1147494400 bytes == 0x55f3bf90c000 @ 0x7fef8c5fd615 0x55f3bc70b02c 0x55f3bc7eb17a 0x55f3bc70de4d 0x55f3bc7ffc0d 0x55f3bc7820d8 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc781f40 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77e93b 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc800a56 0x55f3bc77dfb3 0x55f3bc70fb99 0x55f3bc752e79 0x55f3bc70e7b2 0x55f3bc781e65 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77e93b 0x55f3bc77cc35 0x55f3bc70f73a 0x55f3bc77db0e 0x55f3bc70f65a 0x55f3bc77dd67 0x55f3bc77cc35
|████████████████████████████████| 879.9 MB 14 kB/s
Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (0.10.0+cu102)
Collecting torchaudio
Downloading https://download.pytorch.org/whl/nightly/torchaudio-0.10.0.dev20210819-cp37-cp37m-linux_x86_64.whl (2.0 MB)
|████████████████████████████████| 2.0 MB 155 kB/s
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch) (3.7.4.3)
Collecting torch
Downloading torch-1.9.0-cp37-cp37m-manylinux1_x86_64.whl (831.4 MB)
|████████████████████████████████| 831.4 MB 2.9 kB/s
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision) (1.19.5)
Requirement already satisfied: pillow>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision) (7.1.2)
...
...
Installing collected packages: torch, torchaudio
Successfully installed torch-1.9.0 torchaudio-0.9.0

So it appears that uninstalling just torch leaves torchaudio and torchvision at their older versions, and the pip install reinstalls torch 1.9.0 in order to satisfy their requirements.

Once I uninstalled all three, torch 1.10.0 installed properly, and now the warning is gone.

pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html

I suspect @Bernd1969 is experiencing the same with conda.

@Bernd1969

Correct. Subhacom's uninstall and reinstall also worked for me.
I had to use cu113 for my environment though.
And I now get another warning for every worker thread, but this seems to be a valid one that I can actually fix in my code :)

@aluo-x

aluo-x commented Aug 26, 2021

Would it be possible to include a fix to this issue in 1.9.1?
@malfet

@stas00
Contributor

stas00 commented Aug 27, 2021

I too can confirm that with today's nightly:

conda install -y pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly -c nvidia

the warnings are gone!

Thank you!

@imjiangjun

Is there a way to suppress this before 1.9.1? My output is flooded with it. But I can't downgrade pytorch.

@xkszltl
Contributor

xkszltl commented Aug 31, 2021

Is there a way to suppress this before 1.9.1? My output is flooded with it. But I can't downgrade pytorch.

2>/dev/null if you're OK with it.

malfet pushed a commit to malfet/pytorch that referenced this issue Sep 8, 2021
Summary:
Fixes pytorch#57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

pytorch#60171 test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: pytorch#60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
malfet added a commit that referenced this issue Sep 8, 2021
Summary:
Fixes #57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

#60171 test script does have a `set_num_threads` invocation & hence that's why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: #60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b

Co-authored-by: Winston Smith <76181208+imaginary-person@users.noreply.github.com>
@12194916

I confirm OP's observation that the error messages disappear for pin_memory = False. Either way, the error messages do not seem to correspond to a meaningful problem for end-users.

The warnings are very slightly different in a recent copy of the PyTorch 1.9 nightly preview. This is part of my error log:

[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

Hi. Did changing to "pin_memory=False" help to avoid this warning? I have the same issue.
