[utils.bottleneck] Bottleneck crashes with multi-threaded data loader #6313

@fmassa

torch.utils.bottleneck crashes when the profiled script uses a DataLoader with num_workers > 0, that is, with at least one worker process.

Minimal reproducible example (mwe.py):

import argparse
import torch
import torch.utils.data

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='mwe')
    parser.add_argument('--num-workers', default=0, type=int)
    args = parser.parse_args()

    data = torch.rand(10, 1000)
    target = torch.rand(10)
    dataset = torch.utils.data.TensorDataset(data, target)
    data_loader = torch.utils.data.DataLoader(dataset,
        batch_size=2, num_workers=args.num_workers)
    for i, batch in enumerate(data_loader):
        pass

Running the script via:

python -m torch.utils.bottleneck -- mwe.py --num-workers 0

works fine, while

python -m torch.utils.bottleneck -- mwe2.py --num-workers 1

crashes with the following stack trace:

Traceback (most recent call last):
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 280, in <module>
    main()
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 261, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 155, in run_autograd_prof
    result.append(run_prof(use_cuda=True))
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "mwe2.py", line 15, in <module>
    for i, batch in enumerate(data_loader):
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 285, in __next__
    return self._process_next_batch(batch)
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 306, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 40, in __getitem__
    return tuple(tensor[index] for tensor in self.tensors)
  File "/private/home/fmassa/.conda/envs/detectron_v2/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 40, in <genexpr>
    return tuple(tensor[index] for tensor in self.tensors)
RuntimeError: /private/home/fmassa/github/pytorch/torch/csrc/autograd/profiler.h:52: initialization error
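
The trace above has two parts because the error originates in the worker and is re-raised in the main process (`raise batch.exc_type(batch.exc_msg)`), with the worker's formatted traceback used as the message. A minimal stdlib sketch of that pattern follows. The names `WorkerError`, `run_in_worker`, `process_result`, and `failing_getitem` are hypothetical, the "worker" is a plain function call rather than a separate process, and this is an illustration of the pattern, not PyTorch's actual implementation.

```python
import traceback


class WorkerError:
    """Record of a failure inside a worker (hypothetical name)."""

    def __init__(self, exc_type, exc_msg):
        self.exc_type = exc_type  # exception class raised in the worker
        self.exc_msg = exc_msg    # the worker's formatted traceback, as a string


def run_in_worker(fn, *args):
    # Stand-in for a worker loop: instead of letting the exception escape,
    # capture its type and formatted traceback so they can be sent back.
    try:
        return fn(*args)
    except Exception as exc:
        return WorkerError(type(exc), traceback.format_exc())


def process_result(result):
    # Stand-in for the main-process side: re-raise the worker's exception,
    # using the worker traceback text as the message.
    if isinstance(result, WorkerError):
        raise result.exc_type(result.exc_msg)
    return result


def failing_getitem(index):
    # Assumption: mimics the failure seen when indexing a tensor in the worker.
    raise RuntimeError("initialization error")
```

Calling `process_result(run_in_worker(failing_getitem, 0))` raises a `RuntimeError` whose message itself starts with `Traceback (most recent call last):`, which is why the last line of the outer trace reads `RuntimeError: Traceback ...` and the real cause (the profiler's initialization error) only shows up at the very end.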

Assigning this to @zou3519, even though I'm not sure if it's a problem in the profiler or in the bottleneck tool.
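
One way to narrow this down might be to skip the bottleneck tool and wrap the same loop in the autograd profiler directly, since the trace shows the crash happening inside `run_prof(use_cuda=True)`. The sketch below is untested against the affected build; `iterate_under_profiler` is a hypothetical helper name, and `use_cuda` defaults to `False` here so the function also runs on machines without CUDA.

```python
import torch
import torch.utils.data
from torch.autograd import profiler


def iterate_under_profiler(num_workers, use_cuda=False):
    """Iterate the MWE's DataLoader inside the autograd profiler.

    Returns the number of batches seen. Hypothetical helper, not part of
    torch.utils.bottleneck.
    """
    data = torch.rand(10, 1000)
    target = torch.rand(10)
    dataset = torch.utils.data.TensorDataset(data, target)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, num_workers=num_workers)

    n_batches = 0
    # Same profiler the bottleneck tool invokes, without the tool around it.
    with profiler.profile(use_cuda=use_cuda):
        for _ in loader:
            n_batches += 1
    return n_batches
```

If `iterate_under_profiler(1, use_cuda=True)`, called from a script's `if __name__ == '__main__':` block, fails with the same initialization error, that would point at the profiler rather than at the bottleneck tool; if it completes with 5 batches, the tool itself becomes the more likely suspect.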

PyTorch version: '0.4.0a0+b21e135'

cc @ezyang @gchanan @zou3519 @ssnl

Labels: module: bottleneck; module: dataloader; quansight-nack; triaged
