RuntimeError: _share_filename_: only available on CPU with num_workers>0 #87688

Closed
lucadiliello opened this issue Oct 25, 2022 · 12 comments

Labels: module: data torch.utils.data · module: dataloader Related to torch.utils.data.DataLoader and Sampler · module: mps Related to Apple Metal Performance Shaders framework · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


lucadiliello commented Oct 25, 2022

🐛 Describe the bug

I'm getting the following error when setting the number of workers in the DataLoader to be greater than 0.

  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 437, in __iter__
    return self._get_iterator()
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 383, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1036, in __init__
    w.start()
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/lucadiliello/anaconda3/envs/native/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 355, in reduce_storage
    metadata = storage._share_filename_cpu_()
RuntimeError: _share_filename_: only available on CPU

I can try to create a working example if needed, but since the bug comes from a large project, it will take some time to strip out everything that can't be published.

Versions

PyTorch version: 1.14.0.dev20221025
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.6.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.102)
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.9.12 (main, Jun 1 2022, 06:34:44) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.3
[pip3] pytorch-lightning==1.7.7
[pip3] torch==1.14.0.dev20221025
[pip3] torchmetrics==0.10.0
[conda] numpy 1.23.3 pypi_0 pypi
[conda] pytorch-lightning 1.7.7 pypi_0 pypi
[conda] torch 1.14.0.dev20221025 pypi_0 pypi
[conda] torchmetrics 0.10.0 pypi_0 pypi

cc @ssnl @VitalyFedyunin @ejguan @NivekT @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

bdhirsh added module: dataloader Related to torch.utils.data.DataLoader and Sampler, triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module, and module: data torch.utils.data labels Oct 27, 2022
ejguan (Contributor) commented Oct 27, 2022

This error comes from:

    reinterpret_cast<THPStorage*>(_self)->cdata->device_type() == at::kCPU,
    "_share_filename_: only available on CPU");

Do you have any CUDA tensor created within your Dataset?
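As a quick, illustrative check (my_dataset here is a placeholder for your actual Dataset, and this assumes __getitem__ returns a single tensor):

sample = my_dataset[0]
print(sample.device)  # expected to be 'cpu' if samples are created on the CPU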

lucadiliello (Author) commented:

No, I'm getting this error only when using MPS. With CUDA it works fine.

ejguan (Contributor) commented Oct 27, 2022

It would be good if you could provide a minimal reproducible example for us.

Also, I'm wondering whether the problem is an MPS tensor being shared via multiprocessing?
cc: @albanD
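For what it's worth, a hypothetical minimal sketch of what such a repro might look like, assuming the trigger is a Dataset whose tensors are created directly on the MPS device combined with num_workers > 0; the class name, sizes, and worker count below are made up for illustration and not taken from the reporters' code:

import torch
from torch.utils.data import Dataset, DataLoader

class MPSDataset(Dataset):
    # Toy dataset whose samples already live on the MPS device.
    def __init__(self, n=8):
        self.data = torch.randn(n, 4, device="mps")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":
    # num_workers > 0 makes the DataLoader start worker processes, which pickle
    # the dataset; sharing the non-CPU storage is where _share_filename_ is
    # reported to fail on MPS.
    loader = DataLoader(MPSDataset(), batch_size=2, num_workers=2)
    for batch in loader:
        print(batch.shape)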

albanD added the module: mps Related to Apple Metal Performance Shaders framework label Oct 31, 2022
albanD (Collaborator) commented Oct 31, 2022

Oh, I haven't looked into that in detail. It might indeed be a problem on MPS.
Given the error just above the one reported here, I guess this code is special-cased for CUDA so that it never reaches this point, and we need to add a similar special case for MPS.


lucacorbucci commented Nov 3, 2022

Hi, I have the same error when using MPS. Everything works fine on CPU and on CUDA. I already checked, and the number of workers in the DataLoader is 0.

I'm attaching the error:

Traceback (most recent call last):
  File "/Users/lucacorbucci/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/Users/lucacorbucci/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Users/lucacorbucci/.pyenv/versions/hierarchical_FL/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 355, in reduce_storage
    metadata = storage._share_filename_cpu_()
RuntimeError: _share_filename_: only available on CPU

(The same traceback is raised by each worker process, so it appears many times interleaved in the original output; only one copy is shown here.)

ejguan (Contributor) commented Nov 3, 2022

I already checked, and the number of workers in the DataLoader is 0.

With the number of workers set to 0, why would multiprocessing get involved at all? Do you mean a number of workers larger than 0?

ejguan (Contributor) commented Nov 3, 2022

@lucacorbucci
Could you please share a minimal repro script for us?

jeffreykthomas commented:

I experienced this same error after cloning the MNIST Hogwild example, so it could work as a minimal reproducible example: pytorch/examples#1105

lucadiliello (Author) commented:
I was able to solve the issue by adding this argument to the DataLoader:

multiprocessing_context='fork' if torch.backends.mps.is_available() else None
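For illustration, a minimal sketch of how this workaround might slot into a DataLoader setup; the dataset, batch size, and worker count below are placeholders rather than anything from the original project. As far as I understand, with the 'fork' start method the worker processes inherit the dataset instead of receiving it through pickling, which is the step that was failing:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; substitute your own.
dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    # Workaround from this thread: use the 'fork' start method when MPS is
    # available, falling back to the default context otherwise.
    multiprocessing_context='fork' if torch.backends.mps.is_available() else None,
)

for x, y in loader:
    pass  # training step would go here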

kulinseth (Collaborator) commented:
Interesting, nice find @lucadiliello. This solution of using multiprocessing_context='fork' seems unrelated to the tensor being on MPS. Did you see this issue on CPU as well on Mac?


Neptune-Trojans commented Oct 16, 2023

This issue happens only when trying to use MPS; when using the CPU on a Mac it does not happen.

picografix commented:

I was able to solve the issue by adding this argument to the DataLoader:

multiprocessing_context='fork' if torch.backends.mps.is_available() else None

It worked, thanks!
