RuntimeError: _share_filename_: only available on CPU with num_workers>0 #87688

Closed
lucadiliello opened this issue Oct 25, 2022 · 12 comments

Labels: module: data torch.utils.data · module: dataloader Related to torch.utils.data.DataLoader and Sampler · module: mps Related to Apple Metal Performance Shaders framework · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


lucadiliello commented Oct 25, 2022

🐛 Describe the bug

I'm getting the following error when setting the number of workers in the DataLoader to be greater than 0.

  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 437, in __iter__
    return self._get_iterator()
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 383, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1036, in __init__
    w.start()
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 355, in reduce_storage
    metadata = storage._share_filename_cpu_()
RuntimeError: _share_filename_: only available on CPU

I can try to create a working example if needed, but since the bug comes from a large project, it will take some time to strip out everything that can't be published.

Versions

PyTorch version: 1.14.0.dev20221025
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.6.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.102)
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.9.12 (main, Jun 1 2022, 06:34:44) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.3
[pip3] pytorch-lightning==1.7.7
[pip3] torch==1.14.0.dev20221025
[pip3] torchmetrics==0.10.0
[conda] numpy 1.23.3 pypi_0 pypi
[conda] pytorch-lightning 1.7.7 pypi_0 pypi
[conda] torch 1.14.0.dev20221025 pypi_0 pypi
[conda] torchmetrics 0.10.0 pypi_0 pypi

cc @ssnl @VitalyFedyunin @ejguan @NivekT @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

bdhirsh added module: dataloader Related to torch.utils.data.DataLoader and Sampler, triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module, and module: data torch.utils.data labels Oct 27, 2022
ejguan (Contributor) commented Oct 27, 2022

This error comes from:

    reinterpret_cast<THPStorage*>(_self)->cdata->device_type() == at::kCPU,
    "_share_filename_: only available on CPU");

Do you have any CUDA tensor created within your Dataset?
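As a quick, illustrative check (my_dataset here is a placeholder for your actual Dataset, and this assumes __getitem__ returns a single tensor):

sample = my_dataset[0]
print(sample.device)  # expected to be 'cpu' if samples are created on the CPU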

lucadiliello (Author) commented:

No, I'm getting this error only when using MPS. With CUDA it works fine.

ejguan (Contributor) commented Oct 27, 2022

It would be good if you could provide a minimal reproducible example for us.

Also, I'm wondering whether the problem is an MPS tensor being shared via multiprocessing?
cc: @albanD
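For what it's worth, a hypothetical minimal sketch of what such a repro might look like, assuming the trigger is a Dataset whose tensors are created directly on the MPS device combined with num_workers > 0; the class name, sizes, and worker count below are made up for illustration and not taken from the reporters' code:

import torch
from torch.utils.data import Dataset, DataLoader

class MPSDataset(Dataset):
    # Toy dataset whose samples already live on the MPS device.
    def __init__(self, n=8):
        self.data = torch.randn(n, 4, device="mps")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":
    # num_workers > 0 makes the DataLoader start worker processes, which pickle
    # the dataset; sharing the non-CPU storage is where _share_filename_ is
    # reported to fail on MPS.
    loader = DataLoader(MPSDataset(), batch_size=2, num_workers=2)
    for batch in loader:
        print(batch.shape)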

albanD added the module: mps Related to Apple Metal Performance Shaders framework label Oct 31, 2022
albanD (Collaborator) commented Oct 31, 2022

Oh, I haven't looked into that in detail. It might indeed be a problem on MPS.
Given the error just above the one reported here, I guess this code is special-cased for CUDA so that it never reaches this point, and we need to add a similar special case for MPS.


lucacorbucci commented Nov 3, 2022

Hi, I have the same error when using MPS. Everything works fine on CPU and on CUDA. I already checked, and the number of workers in the DataLoader is 0.

I'm attaching the error:

Traceback (most recent call last):
  File "/Users/lucacorbucci/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/Users/lucacorbucci/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Users/lucacorbucci/.pyenv/versions/hierarchical_FL/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 355, in reduce_storage
    metadata = storage._share_filename_cpu_()
RuntimeError: _share_filename_: only available on CPU

(The same traceback is raised by each worker process, so it appears many times interleaved in the original output; only one copy is shown here.)

ejguan (Contributor) commented Nov 3, 2022

I already checked, and the number of workers in the DataLoader is 0.

With the number of workers set to 0, why would multiprocessing get involved at all? Do you mean a number of workers larger than 0?

ejguan (Contributor) commented Nov 3, 2022

@lucacorbucci
Could you please share a minimal repro script for us?

jeffreykthomas commented:

I experienced this same error after cloning the MNIST Hogwild example, so it could work as a minimal reproducible example: pytorch/examples#1105

lucadiliello (Author) commented:
I was able to solve the issue by adding this argument to the DataLoader:

multiprocessing_context='fork' if torch.backends.mps.is_available() else None
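For illustration, a minimal sketch of how this workaround might slot into a DataLoader setup; the dataset, batch size, and worker count below are placeholders rather than anything from the original project. As far as I understand, with the 'fork' start method the worker processes inherit the dataset instead of receiving it through pickling, which is the step that was failing:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; substitute your own.
dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    # Workaround from this thread: use the 'fork' start method when MPS is
    # available, falling back to the default context otherwise.
    multiprocessing_context='fork' if torch.backends.mps.is_available() else None,
)

for x, y in loader:
    pass  # training step would go here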

kulinseth (Collaborator) commented:
Interesting, nice find @lucadiliello. This solution of using multiprocessing_context='fork' seems unrelated to the tensor being on MPS. Did you see this issue on CPU as well on Mac?


Neptune-Trojans commented Oct 16, 2023

This issue happens only when trying to use MPS; when using the CPU on a Mac it does not happen.

picografix commented:

I was able to solve the issue by adding this argument to the DataLoader:

multiprocessing_context='fork' if torch.backends.mps.is_available() else None

It worked, thanks!
