cuda runtime error (3): we're not detecting bad forks #17357

@mrshenli

Description

🐛 Bug

I am not sure whether this reproduces in every environment, but I hit the following error when calling torch.cuda.set_device in child processes. Oddly, the error disappears if I remove the line x = torch.rand(20, 2).cuda(), which is the first statement inside the for loop and runs in the parent before each child process is started.

Traceback (most recent call last):
  File "/home/shenli/local/miniconda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/shenli/local/miniconda/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "dist_bug.py", line 5, in run
    torch.cuda.set_device(rank)
  File "/home/shenli/project/pytorch/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (3) : initialization error at ../torch/csrc/cuda/Module.cpp:33

Edit: based on the discussion below, the fix should be to repair the "bad fork" error detection, which makes this issue a duplicate of #17359.
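To make the "bad fork" idea concrete, here is a minimal pure-Python sketch of how such detection could work. It is an illustration only, not PyTorch's implementation: the names `_initialized`, `_in_bad_fork`, `lazy_init`, and `child_init_exit_code` are hypothetical, and a plain flag stands in for "this process has created a CUDA context". It assumes a POSIX system and Python 3.7+ for `os.register_at_fork`.

```python
import os

_initialized = False   # hypothetical stand-in for "a CUDA context exists in this process"
_in_bad_fork = False   # set in a child that was forked from an initialized parent


def _mark_bad_fork():
    # Runs in the child right after fork(); the child inherits _initialized.
    global _in_bad_fork
    if _initialized:
        _in_bad_fork = True


os.register_at_fork(after_in_child=_mark_bad_fork)


def lazy_init():
    """Initialize the fake runtime, refusing to do so after a bad fork."""
    global _initialized
    if _in_bad_fork:
        raise RuntimeError(
            "cannot re-initialize in a forked subprocess; "
            "use the 'spawn' start method instead"
        )
    _initialized = True


def child_init_exit_code():
    """Fork, call lazy_init() in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            lazy_init()
            code = 0
        except RuntimeError:
            code = 1
        os._exit(code)  # leave the child without running parent cleanup
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

With a check like this in place, a child forked after the parent initialized would fail with a descriptive message instead of the opaque "initialization error" shown in the traceback.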

To Reproduce

import torch
from torch.multiprocessing import Process

def run(rank):
    torch.cuda.set_device(rank)

if __name__ == "__main__":
    size = 2
    processes = []
    for rank in range(size):
        # it would work fine without the line below
        x = torch.rand(20, 2).cuda()
        p = Process(target=run, args=(rank,))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()
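A possible workaround, consistent with the "bad fork" diagnosis, is to start the workers with the "spawn" start method, so that each child begins in a fresh interpreter and does not inherit the parent's CUDA state. The sketch below is hedged: the `launch` helper and the availability guards are additions for illustration and are not part of the original script. It uses the standard-library `multiprocessing` module; `torch.multiprocessing` wraps that module and could be substituted.

```python
import multiprocessing as mp


def run(rank):
    # Imported lazily so the sketch also runs on machines without PyTorch.
    try:
        import torch
    except ImportError:
        return
    # Guard added for illustration: skip ranks that have no matching GPU.
    if torch.cuda.is_available() and rank < torch.cuda.device_count():
        torch.cuda.set_device(rank)


def launch(size, start_method="spawn"):
    """Start `size` workers with the given start method; return their exit codes."""
    ctx = mp.get_context(start_method)
    processes = [ctx.Process(target=run, args=(rank,)) for rank in range(size)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            # The parent touches CUDA first, as in the script above.
            x = torch.rand(20, 2).cuda()
    except ImportError:
        pass
    print(launch(2))
```

The `if __name__ == "__main__":` guard matters more with "spawn" than with "fork", because each spawned child re-imports the main module before running its target.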

Environment

PyTorch version: 1.1.0a0+63214b5
Is debug build: No
CUDA used to build PyTorch: 9.2.88

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
CMake version: version 3.12.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.2.88
GPU models and configuration:
GPU 0: Tesla M40
GPU 1: Tesla M40

Nvidia driver version: 396.26
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.15.4
[pip] torch==1.1.0a0+63214b5
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl-include 2019.1 144
[conda] mkl_fft 1.0.6 py37hd81dba3_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] torch 1.1.0a0+63214b5 dev_0

Metadata

Labels

module: cuda: Related to torch.cuda, and CUDA support in general
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
