Force early initialization of OpenMP in forked children #29006

peterbell10 · 2019-10-31T23:01:16Z

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader worker_init_fn.

peterbell10 · 2019-11-01T00:11:59Z

@pytorchbot rebase this please

ssnl · 2019-11-01T03:41:21Z

torch/__init__.py

+    torch.get_num_threads()
+
+import multiprocessing as _mp
+_mp.util.register_after_fork(_torch_at_fork, _torch_at_fork)


multiprocessing.util is not imported automatically, so you should do something like from multiprocessing.util import register_after_fork.

CI is still failing after making that change. It looks like it's actually a Python 2 issue.

I suppose this will need to be done using pthread_atfork in C++ unless there's a way to do it in Python 2.

The handler definitely exists in Python 2; see https://github.com/pytorch/pytorch/blob/master/torch/multiprocessing/reductions.py#L8. I think the problem is the Python 2 import system. Adding from __future__ import absolute_import will likely fix it.

Okay, we can see if that fixes it.

ezyang

Sure. Waiting on CI.

peterbell10 · 2019-11-01T19:18:08Z

I don't think register_after_fork works with os.fork(). See the second part of #23401 (comment).

ezyang · 2019-11-03T21:26:38Z

@peterbell10 What would you like to do, in that case?

peterbell10 · 2019-11-03T21:52:09Z

I think it would be better to reset to f494962e2ad5738c792ece722df119d1e467cf5a which achieves the same effect using pthread_atfork.

ezyang · 2019-11-07T15:00:04Z

Hmm, but this doesn't pass tests:

Nov 06 20:54:09 OMP: Error #13: Assertion failure at z_Linux_util.cpp(2338).
Nov 06 20:54:09 OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Nov 06 20:54:09 Traceback (most recent call last):

ezyang

CI is failing.

yf225 · 2019-11-07T16:29:33Z

@peterbell10 Would you like to look at the CI error? Thanks!

peterbell10 · 2019-11-07T16:40:53Z

Yes, I'm looking into it now.

mrshenli · 2019-11-08T19:17:02Z

I hit similar errors in rpc tests:

OMP: Error #13: Assertion failure at kmp_csupport.cpp(675).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Fatal Python error: Aborted

and

Nov 07 23:04:05 test_nested_remote (__main__.RpcTestWithFork) ... OMP: Error #13: Assertion failure at kmp_runtime.cpp(1407).
Nov 07 23:04:05 OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Nov 07 23:09:05 ERROR

RPC does not use OMP but does fork processes. @peterbell10 any suggestions on how to debug this?

peterbell10 · 2019-11-08T21:50:13Z

@mrshenli I was able to find the source code which might be useful to match up with the error messages. For example, it looks like your second error corresponds to an assert in __kmp_fork_call.

    KMP_DEBUG_ASSERT(
        __kmp_init_serial); // AC: potentially unsafe, not in sync with shutdown

Python 2 does not have mp.util.register_after_fork so dropping down to native pthreads looks like the only option.

This reverts commit f494962e2ad5738c792ece722df119d1e467cf5a.

peterbell10 · 2019-11-10T16:46:46Z

I think the failures were caused by our pthread_atfork handler being called before OpenMP's own handler, so the OpenMP runtime is in an invalid state when we call omp_get_max_threads.

Since we can't specify the ordering of pthread_atfork handlers, I don't think there's any way to go ahead with that approach. Instead, I've gone back to registering the callback in Python, conditionally using os.register_at_fork if it is available. This solves the os.fork() issue for Python 3.7 and above.

ezyang · 2019-11-11T01:58:28Z

I double-checked and confirmed that Python's atfork functionality is not implemented with pthread_atfork.

ssnl · 2019-11-11T02:02:33Z

One workaround (that is also used by the python signal handlers) is to simply set a flag in the pthread_atfork handler, and actually do the thing when the control is returned to us, e.g., in at::globalContext() or something.

ezyang · 2019-11-11T02:05:30Z

I know this sort of thing is a pain to test, but this kind of subtle thing needs a test.


peterbell10 · 2019-11-14T18:17:33Z

Okay, the test is ready and I've manually confirmed that it fails on master. This took a while because my PyTorch build randomly stopped using Intel OpenMP and I couldn't reproduce the issue. I was able to work around that using LD_PRELOAD as suggested in #12535.

ezyang · 2019-11-14T20:31:20Z

Windows failure is real

19:18:44 ======================================================================
19:18:44 ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
19:18:44 ----------------------------------------------------------------------
19:18:44 Traceback (most recent call last):
19:18:44   File "test_dataloader.py", line 1829, in test_set_affinity_in_worker_init
19:18:44     for sample in dataloader:
19:18:44   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 278, in __iter__
19:18:44     return _MultiProcessingDataLoaderIter(self)
19:18:44   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 682, in __init__
19:18:44     w.start()
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 105, in start
19:18:44     self._popen = self._Popen(self)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\context.py", line 223, in _Popen
19:18:44     return _default_context.get_context().Process._Popen(process_obj)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\context.py", line 322, in _Popen
19:18:44     return Popen(process_obj)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
19:18:44     reduction.dump(process_obj, to_child)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
19:18:44     ForkingPickler(file, protocol).dump(obj)
19:18:44 AttributeError: Can't pickle local object 'TestSetAffinity.test_set_affinity_in_worker_init.<locals>.worker_init_fn'
19:18:44 
19:18:44 ----------------------------------------------------------------------
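The traceback above comes from the spawn start method, which pickles the process object, including `worker_init_fn`. A minimal sketch of why a function defined inside a test method fails there, using hypothetical names (`module_level_init`, `make_local_init`, `can_pickle`):

```python
import pickle


def module_level_init(worker_id):
    # Defined at module scope, so pickle can refer to it by qualified name.
    return worker_id


def make_local_init():
    def local_init(worker_id):
        # Defined inside another function: its qualified name contains
        # "<locals>", which pickle cannot resolve on import.
        return worker_id
    return local_init


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Moving the init function to module scope, as with `module_level_init` here, is the usual way to make it usable on platforms that spawn instead of fork.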

ezyang · 2019-11-14T20:31:39Z

OS X failure is real too

Nov 14 11:26:33 ======================================================================
Nov 14 11:26:33 ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
Nov 14 11:26:33 ----------------------------------------------------------------------
Nov 14 11:26:33 Traceback (most recent call last):
Nov 14 11:26:33   File "test_dataloader.py", line 1829, in test_set_affinity_in_worker_init
Nov 14 11:26:33     for sample in dataloader:
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
Nov 14 11:26:33     return self._process_data(data)
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
Nov 14 11:26:33     data.reraise()
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
Nov 14 11:26:33     raise self.exc_type(msg)
Nov 14 11:26:33 AttributeError: Caught AttributeError in DataLoader worker process 0.
Nov 14 11:26:33 Original Traceback (most recent call last):
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 135, in _worker_loop
Nov 14 11:26:33     init_fn(worker_id)
Nov 14 11:26:33   File "test_dataloader.py", line 1825, in worker_init_fn
Nov 14 11:26:33     os.sched_setaffinity(0, [2])
Nov 14 11:26:33 AttributeError: module 'os' has no attribute 'sched_setaffinity'
Nov 14 11:26:33
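`os.sched_setaffinity` is not available on every platform, which is what the macOS traceback shows. A hedged sketch of guarding the call follows; `set_affinity_if_supported` and `demo` are hypothetical helpers, not part of the PR's test.

```python
import os


def set_affinity_if_supported(cpus):
    """Pin the current process to cpus; return the new mask, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS and Windows
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)


def demo():
    """Pin to one currently allowed CPU, then restore the original mask."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    original = os.sched_getaffinity(0)
    target = {min(original)}  # choose a CPU that is already permitted
    try:
        result = set_affinity_if_supported(target)
        os.sched_setaffinity(0, original)  # restore the previous mask
    except OSError:
        return None  # some sandboxes refuse affinity changes
    return result == target
```

A test that depends on this call would typically be skipped when the attribute is missing, instead of failing.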

facebook-github-bot

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
Fixes pytorch#28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.

Pull Request resolved: pytorch#29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3

peterbell10 added the open source label Oct 31, 2019

peterbell10 requested a review from ezyang October 31, 2019 23:01

peterbell10 force-pushed the dataloader-affinity branch from 63b3f27 to 58355db Compare November 1, 2019 00:17

ssnl reviewed Nov 1, 2019

peterbell10 force-pushed the dataloader-affinity branch from 28c1aa2 to f494962 Compare November 1, 2019 13:25

ezyang approved these changes Nov 1, 2019

ezyang approved these changes Nov 7, 2019

ezyang requested changes Nov 7, 2019

yf225 added the triaged label Nov 7, 2019

ezyang mentioned this pull request Nov 8, 2019

RpcTestWithFork.test_nested_rref is flaky in CI #29382

Closed

peterbell10 force-pushed the dataloader-affinity branch from 2c90f8b to 078fde1 Compare November 10, 2019 16:41

peterbell10 added 6 commits November 10, 2019 16:42

Force early initialization of OpenMP in forked children

9896a7c

Fix import

f51dff3

Move atfork handler into C++

bfc3de8

Python 2 does not have mp.util.register_after_fork so dropping down to native pthreads looks like the only option.

Revert "Move atfork handler into C++"

112c436

This reverts commit f494962e2ad5738c792ece722df119d1e467cf5a.

Use absolute_import

a7211b2

Revert to pthread_atfork

0d3f5ac

peterbell10 force-pushed the dataloader-affinity branch 2 times, most recently from 6e92fd9 to 9a097e2 Compare November 10, 2019 16:44

peterbell10 force-pushed the dataloader-affinity branch from 9a097e2 to d5e0637 Compare November 10, 2019 19:55

Use python atfork handling but handle os.fork() for python >= 3.7

191276d

peterbell10 force-pushed the dataloader-affinity branch from d5e0637 to 191276d Compare November 10, 2019 20:27

ezyang approved these changes Nov 11, 2019

peterbell10 added 2 commits November 14, 2019 18:05

Add test that setting affinity is preserved in dataloader

69cf8c6

Merge branch 'master' of https://github.com/pytorch/pytorch into data…

2054192

…loader-affinity

pep8 fix

b4d0ec7

Fix tests

d81fdc1

facebook-github-bot reviewed Dec 3, 2019

facebook-github-bot closed this in dcd1216 Dec 3, 2019
