macOS tests in the StatefulDataLoader CI action fail intermittently during shutdown. On macOS, shutdown also takes far longer than on Windows or Ubuntu (10 minutes vs. 1 second). I'm not sure what causes GitHub Actions to mark the test as failed, but I created an issue on actions/setup-python and have had no response so far: actions/setup-python#857
Although we still get positive signals from the tests themselves, the run shows up as a failed check (an X) on GitHub.
Versions
Nightly / main branch in CI
I've been trying to isolate the problem on this branch: #1255. I can't reproduce it on my Mac laptop, so I'm bisecting by kicking off CI runs. So far it's definitely due to test_state_dict.py.
The best clue I get is that the "Complete job" or Python cleanup logs sometimes show a bunch of "Terminate orphan process: pid (xxxxx) (torch_shm_manager)" messages.
It seems like on macOS, multiprocessing behaves like spawn rather than fork (spawn has been the default start method on macOS since Python 3.8), so every worker has to import all the modules again. Something about increasing the total number of worker subprocesses in the test causes massive slowdowns in cleanup. The simplest thing to do at this point is to shard the tests. I'll probably give this a shot tomorrow.
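As a rough illustration of the two ideas above, here is a minimal sketch (not what the CI actually runs). It first reports which start method multiprocessing would pick by default on the current platform, then splits a list of test files into shards deterministically. The helper names `default_start_method`, `shard_for`, and `select_shard` are hypothetical, as are all file names other than test_state_dict.py.

```python
import multiprocessing as mp
import zlib


def default_start_method() -> str:
    """Return the start method multiprocessing would use by default here."""
    # allow_none=True avoids fixing the start method as a side effect of asking.
    method = mp.get_start_method(allow_none=True)
    if method is None:
        # The first entry of get_all_start_methods() is the platform default.
        method = mp.get_all_start_methods()[0]
    return method


def shard_for(test_file: str, num_shards: int) -> int:
    """Map a test file name to a shard index in [0, num_shards)."""
    # crc32 is stable across processes, unlike the built-in hash() for str.
    return zlib.crc32(test_file.encode("utf-8")) % num_shards


def select_shard(test_files, shard_index: int, num_shards: int):
    """Return the subset of test_files that belongs to the given shard."""
    return [f for f in sorted(test_files) if shard_for(f, num_shards) == shard_index]


if __name__ == "__main__":
    # Hypothetical file list; only test_state_dict.py is named in this thread.
    files = ["test_state_dict.py", "test_example_a.py", "test_example_b.py"]
    print("default start method:", default_start_method())
    for i in range(2):
        print(f"shard {i}:", select_shard(files, i, 2))
```

Each CI job could then pass its own shard index and run only the files that `select_shard` returns, which would cap the number of worker subprocesses created (and later cleaned up) per job.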