Closed
Labels
module: dataloader (Related to torch.utils.data.DataLoader and Sampler), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Bug
The test `class TestSetAffinity(TestCase):` in pytorch/test/test_dataloader.py (line 2139 in 106459a) uses `os.sched_setaffinity(0, [2])`, which fails if the current user has no access to CPU id 2. This happens often in shared environments where multiple users share a CPU and each is assigned different cores.
What I don't understand is why this test modifies a global property at all. Judging from the (very simple) test, setting a local (or global) variable instead would be enough.
And if setting the affinity really is required, why isn't it reset afterwards?
At the very least, I'd suggest using `os.sched_getaffinity` and picking one of the CPU ids it returns rather than an arbitrary, hard-coded one.
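A minimal sketch of that suggestion, assuming Linux (where `os.sched_getaffinity` and `os.sched_setaffinity` are available). The helpers `pick_allowed_cpu` and `pinned_to_allowed_cpu`, and this version of `worker_set_affinity`, are hypothetical illustrations, not PyTorch's actual test code. The CPU id comes from the set the process is already allowed to use, and the context manager restores the original mask so the global property is reset afterwards.

```python
import os
from contextlib import contextmanager


def pick_allowed_cpu() -> int:
    """Return a CPU id the calling process may run on (Linux only)."""
    allowed = sorted(os.sched_getaffinity(0))
    # Pick an id from the allowed set instead of hard-coding one such as 2.
    return allowed[0]


@contextmanager
def pinned_to_allowed_cpu():
    """Temporarily pin this process to one allowed CPU, then restore the old mask."""
    original = os.sched_getaffinity(0)
    cpu = pick_allowed_cpu()
    os.sched_setaffinity(0, {cpu})
    try:
        yield cpu
    finally:
        # Reset the process-wide affinity so later tests are unaffected.
        os.sched_setaffinity(0, original)


def worker_set_affinity(worker_id: int) -> None:
    # Hypothetical worker_init_fn: pin each DataLoader worker to a CPU it
    # is actually allowed to use. The worker process exits when the
    # DataLoader is done, so no reset is needed here.
    os.sched_setaffinity(0, {pick_allowed_cpu()})
```

In a DataLoader test, `worker_set_affinity` would be passed as `worker_init_fn`, and the value the test compares against would need to be computed the same way (via `pick_allowed_cpu()` in the parent, assuming workers inherit the parent's affinity mask) instead of being hard-coded to `[2]`.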
To Reproduce
Steps to reproduce the behavior:
- Use cgroups to restrict the current user/process to all CPU ids except 2
- Run test_dataloader
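To check up front whether an environment is affected, one can probe whether the process may be pinned to a given CPU at all. The sketch below assumes Linux; `can_pin_to` is a hypothetical helper. It attempts the same `os.sched_setaffinity` call the test makes, treats `EINVAL` (errno 22, the error the test reports) as "not allowed", and restores the original mask after a successful probe.

```python
import errno
import os


def can_pin_to(cpu_id: int) -> bool:
    """Probe whether this process may be pinned to cpu_id (Linux only)."""
    original = os.sched_getaffinity(0)
    try:
        os.sched_setaffinity(0, {cpu_id})
    except OSError as exc:
        # EINVAL: cpu_id is outside the allowed cpuset or not an online CPU.
        if exc.errno == errno.EINVAL:
            return False
        raise
    # Undo the probe so the process keeps its original affinity mask.
    os.sched_setaffinity(0, original)
    return True
```

If `can_pin_to(2)` returns `False`, the hard-coded `os.sched_setaffinity(0, [2])` in the test will most likely fail the same way.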
Environment
- PyTorch Version (e.g., 1.0): 1.6.0
- OS (e.g., Linux): Linux
## Additional context
```
======================================================================
ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_dataloader.py", line 2004, in test_set_affinity_in_worker_init
    for sample in dataloader:
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/tmp/eb-wcTuci/tmpc5vukK/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 142, in _worker_loop
    init_fn(worker_id)
  File "test_dataloader.py", line 1992, in worker_set_affinity
    os.sched_setaffinity(0, [2])
OSError: [Errno 22] Invalid argument
```