test_dataloader fails in restricted CPU environments #44368

@Flamefire

🐛 Bug

The test case TestSetAffinity in test_dataloader.py (class TestSetAffinity(TestCase):) calls os.sched_setaffinity(0, [2]), which fails if the current user has no access to CPU id 2. This happens often in shared environments where multiple users share a CPU and each is assigned different core(s).

What I don't get: why does this test modify a global property at all? Judging from the (very simple) test, it would be enough to set some local (or global) variable instead.

Or, if the affinity really needs to be set: why isn't it reset afterwards?
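To illustrate what resetting afterwards could look like, here is a minimal sketch. The context manager name restored_affinity is hypothetical and not part of the test suite; it relies only on os.sched_getaffinity and os.sched_setaffinity, which are available on Linux. Per the traceback in this report, the call runs inside a DataLoader worker, so this shows the general pattern rather than a patch for the test.

```python
import contextlib
import os


@contextlib.contextmanager
def restored_affinity(pid=0):
    """Hypothetical helper: save the CPU affinity mask and restore it on exit."""
    original = os.sched_getaffinity(pid)
    try:
        yield original
    finally:
        # Restore the mask even if the body raised.
        os.sched_setaffinity(pid, original)


with restored_affinity() as allowed:
    # Pin to a CPU that is known to be available instead of a fixed id.
    target = min(allowed)
    os.sched_setaffinity(0, {target})
    inside = os.sched_getaffinity(0)

after = os.sched_getaffinity(0)
```

Here inside holds the single pinned CPU, and after matches the mask the process had before entering the block.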

At the very least, I'd suggest calling os.sched_getaffinity and using one of the ids it returns rather than an arbitrary, fixed one.
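A rough sketch of that suggestion follows. It assumes the id is chosen in the parent process and handed to the worker init function through functools.partial; the argument order and the cpu_id variable are illustrative assumptions, not the actual test code. Only worker_set_affinity is a name taken from the traceback.

```python
import functools
import os


def worker_set_affinity(cpu_id, _worker_id):
    """Pin the calling process to cpu_id (an id known to be allowed)."""
    os.sched_setaffinity(0, [cpu_id])


# Choose an id from the set this process may actually use,
# instead of hard-coding CPU 2.
cpu_id = min(os.sched_getaffinity(0))

# A partial of a module-level function can be pickled, so it could be
# passed as DataLoader(..., worker_init_fn=init_fn).
init_fn = functools.partial(worker_set_affinity, cpu_id)
```

The test could then compare the affinity observed in the worker against cpu_id instead of the literal 2.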

To Reproduce

Steps to reproduce the behavior:

  1. Use cgroups to restrict the current user/process to a set of CPU ids that does not include 2
  2. Run test_dataloader

Environment

  • PyTorch Version (e.g., 1.0): 1.6.0
  • OS (e.g., Linux): Linux

Additional context

======================================================================
ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_dataloader.py", line 2004, in test_set_affinity_in_worker_init
    for sample in dataloader:
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 142, in _worker_loop
    init_fn(worker_id)
  File "test_dataloader.py", line 1992, in worker_set_affinity
    os.sched_setaffinity(0, [2])
OSError: [Errno 22] Invalid argument

cc @ssnl @VitalyFedyunin
