Open
Labels
enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix)
has workaround
module: elastic (Related to torch.distributed.elastic)
module: tests (Issues related to tests (not the torch.testing module))
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
I ran the following command from the test/distributed/elastic/multiprocessing directory:
PYTORCH_TESTING_DEVICE_ONLY_FOR='cuda' python -m pytest redirects_test.py
Output:
=============================================================================================================================================== test session starts ================================================================================================================================================
platform linux -- Python 3.10.8, pytest-8.1.1, pluggy-1.4.0
rootdir: /projs/framework/fooooo/code/pytorch_new
configfile: pytest.ini
plugins: hypothesis-6.15.0, rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1
collected 6 items
Running 6 items in this shard
redirects_test.py first stdout from c
first stderr from c
F.FFbar first from c
bar first from cmd
Ffoo first from c
foo first from cmd
F [100%]
===================================================================================================================================================== FAILURES =====================================================================================================================================================
_________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_both _________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
self._callTestMethod(testMethod)
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
method()
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 97, in test_redirect_both
with redirect_stdout(stdout_log), redirect_stderr(stderr_log):
File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
first stdout from python
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------------------------------------------------
first stderr from python
____________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_large_buffer_c ____________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
self._callTestMethod(testMethod)
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
method()
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 140, in test_redirect_large_buffer_c
self._redirect_large_buffer(c_print)
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 121, in _redirect_large_buffer
with redirect_stdout(stdout_log):
File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
___________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_large_buffer_py ____________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
self._callTestMethod(testMethod)
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
method()
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 134, in test_redirect_large_buffer_py
self._redirect_large_buffer(py_print)
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 121, in _redirect_large_buffer
with redirect_stdout(stdout_log):
File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_stderr ________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
self._callTestMethod(testMethod)
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
method()
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 72, in test_redirect_stderr
with redirect_stderr(stderr_log):
File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
bar first from python
________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_stdout ________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
yield
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
self._callTestMethod(testMethod)
File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
method()
File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 47, in test_redirect_stdout
with redirect_stdout(stdout_log):
File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
foo first from python
============================================================================================================================================= short test summary info ==============================================================================================================================================
FAILED [0.0016s] redirects_test.py::RedirectsTest::test_redirect_both - io.UnsupportedOperation: fileno
FAILED [0.0005s] redirects_test.py::RedirectsTest::test_redirect_large_buffer_c - io.UnsupportedOperation: fileno
FAILED [0.0005s] redirects_test.py::RedirectsTest::test_redirect_large_buffer_py - io.UnsupportedOperation: fileno
FAILED [0.0035s] redirects_test.py::RedirectsTest::test_redirect_stderr - io.UnsupportedOperation: fileno
FAILED [0.0020s] redirects_test.py::RedirectsTest::test_redirect_stdout - io.UnsupportedOperation: fileno
=========================================================================================================================================== 5 failed, 1 passed in 8.13s ============================================================================================================================================
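All five failures raise io.UnsupportedOperation at the same call, python_std.fileno() in torch/distributed/elastic/multiprocessing/redirects.py. This seems to have the same cause as #115069.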
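The failing call in every traceback above is python_std.fileno(). As a minimal sketch of that failure mode, independent of PyTorch and pytest, the snippet below shows that a text stream backed by memory rather than by an OS file descriptor raises io.UnsupportedOperation from fileno(). The helper has_real_fd is a hypothetical name introduced only for illustration; it is not part of torch.distributed.elastic. That the test runner's output capture is what replaced the standard streams in this run is an assumption, not something the log confirms.

```python
import io
import tempfile


def has_real_fd(stream) -> bool:
    """Return True if the stream is backed by an OS-level file descriptor.

    Hypothetical helper for illustration only.
    """
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return False
    return True


# An in-memory text stream. Assumption: a capturing test runner may install
# something similar in place of sys.stdout / sys.stderr.
in_memory = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

error_message = None
try:
    in_memory.fileno()
except io.UnsupportedOperation as exc:
    # Same exception type as in the tracebacks above.
    error_message = str(exc)

# A stream backed by a real file does expose a descriptor.
with tempfile.TemporaryFile(mode="w+") as real_file:
    real_file_ok = has_real_fd(real_file)

print("in-memory stream has fd:", has_real_fd(in_memory))
print("temporary file has fd:", real_file_ok)
print("error from in-memory fileno():", error_message)
```

A check of this kind could be used to skip or adapt descriptor-level redirection when the standard streams have been replaced, though whether that is the right fix for redirects.py is a design decision for the maintainers.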
Versions
main
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @mruberry @ZainRizvi @dzhulgakov @rohan-varma