Many cases in distributed/elastic/multiprocessing/redirects_test.py fail when run with pytest #124906

@cdzhan

🐛 Describe the bug

Command, run from the test/distributed/elastic/multiprocessing directory:

PYTORCH_TESTING_DEVICE_ONLY_FOR='cuda' python -m pytest redirects_test.py

Output:

=============================================================================================================================================== test session starts ================================================================================================================================================
platform linux -- Python 3.10.8, pytest-8.1.1, pluggy-1.4.0
rootdir: /projs/framework/fooooo/code/pytorch_new
configfile: pytest.ini
plugins: hypothesis-6.15.0, rerunfailures-14.0, flakefinder-1.1.0, xdist-3.3.1
collected 6 items
Running 6 items in this shard

redirects_test.py first stdout from c
first stderr from c
F.FFbar first from c
bar first from cmd
Ffoo first from c
foo first from cmd
F                                                                                                                                                                                                                                                                                     [100%]

===================================================================================================================================================== FAILURES =====================================================================================================================================================
_________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_both _________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 97, in test_redirect_both
    with redirect_stdout(stdout_log), redirect_stderr(stderr_log):
  File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
    std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
first stdout from python
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------------------------------------------------
first stderr from python
____________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_large_buffer_c ____________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 140, in test_redirect_large_buffer_c
    self._redirect_large_buffer(c_print)
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 121, in _redirect_large_buffer
    with redirect_stdout(stdout_log):
  File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
    std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
___________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_large_buffer_py ____________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 134, in test_redirect_large_buffer_py
    self._redirect_large_buffer(py_print)
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 121, in _redirect_large_buffer
    with redirect_stdout(stdout_log):
  File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
    std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_stderr ________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 72, in test_redirect_stderr
    with redirect_stderr(stderr_log):
  File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
    std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
bar first from python
________________________________________________________________________________________________________________________________________ RedirectsTest.test_redirect_stdout ________________________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/python3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/projs/framework/fooooo/code/pytorch_new/test/distributed/elastic/multiprocessing/redirects_test.py", line 47, in test_redirect_stdout
    with redirect_stdout(stdout_log):
  File "/usr/local/python3.10/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/projs/framework/fooooo/code/pytorch_new/torch/distributed/elastic/multiprocessing/redirects.py", line 88, in redirect
    std_fd = python_std.fileno()
io.UnsupportedOperation: fileno
----------------------------------------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------------------------------------
foo first from python
============================================================================================================================================= short test summary info ==============================================================================================================================================
FAILED [0.0016s] redirects_test.py::RedirectsTest::test_redirect_both - io.UnsupportedOperation: fileno
FAILED [0.0005s] redirects_test.py::RedirectsTest::test_redirect_large_buffer_c - io.UnsupportedOperation: fileno
FAILED [0.0005s] redirects_test.py::RedirectsTest::test_redirect_large_buffer_py - io.UnsupportedOperation: fileno
FAILED [0.0035s] redirects_test.py::RedirectsTest::test_redirect_stderr - io.UnsupportedOperation: fileno
FAILED [0.0020s] redirects_test.py::RedirectsTest::test_redirect_stdout - io.UnsupportedOperation: fileno
=========================================================================================================================================== 5 failed, 1 passed in 8.13s ============================================================================================================================================

This seems to have the same cause as #115069.
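
A possible mechanism, stated as an assumption rather than something confirmed in this report: every traceback above ends at `python_std.fileno()`, which is what happens when `sys.stdout` or `sys.stderr` has been replaced by an in-memory stream with no OS-level file descriptor, as a sys-level output capture would do. The sketch below reproduces that error class outside of pytest. `has_real_fd` is a hypothetical helper written for illustration and is not part of torch or pytest.

```python
import io
import tempfile


def has_real_fd(stream):
    """Return True if `stream` is backed by an OS-level file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # In-memory streams raise io.UnsupportedOperation("fileno").
        return False
    return True


# An in-memory text stream, standing in for a captured sys.stdout (assumption).
in_memory = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(has_real_fd(in_memory))  # False

# A real temporary file has a descriptor, so fd-level redirection can target it.
with tempfile.TemporaryFile("w") as real_file:
    print(has_real_fd(real_file))  # True
```

If this is the cause, running pytest with capture disabled (`-s`) may avoid the failures. That has not been verified here.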

Versions

main

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @mruberry @ZainRizvi @dzhulgakov @rohan-varma

Metadata

Labels

enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix)
has workaround
module: elastic (Related to torch.distributed.elastic)
module: tests (Issues related to tests (not the torch.testing module))
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)