Force early initialization of OpenMP in forked children #29006

Closed
wants to merge 11 commits into from

peterbell10
Collaborator

Fixes #28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader worker_init_fn.

@peterbell10
Collaborator Author

@pytorchbot rebase this please

torch.get_num_threads()

import multiprocessing as _mp
_mp.util.register_after_fork(_torch_at_fork, _torch_at_fork)
Collaborator

mp.util is not imported automatically, so you should do something like from multiprocessing.util import register_after_fork.

Collaborator Author

CI is still failing after making that change. It looks like it's actually a Python 2 issue.

I suppose this will need to be done using pthread_atfork in C++, unless there's a way to do it in Python 2.

Collaborator

The handler definitely exists in Python 2; see https://github.com/pytorch/pytorch/blob/master/torch/multiprocessing/reductions.py#L8. I think the problem is the Python 2 import system. Adding from __future__ import absolute_import will likely fix it.

Collaborator Author

Okay, we can see if that fixes it.

Contributor

@ezyang ezyang left a comment

Sure. Waiting on CI.

@peterbell10
Collaborator Author

I don't think register_after_fork works with os.fork(); its handlers only run in children started through multiprocessing. See the second part of #23401 (comment)
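
The limitation can be checked with a small experiment, sketched here for POSIX systems only. The names `handler_calls`, `_handler` and `handler_ran_after_bare_fork` are hypothetical. A handler is registered through multiprocessing.util.register_after_fork, the process is forked directly with os.fork(), and the child reports through a pipe whether the handler ran.

```python
import os
from multiprocessing.util import register_after_fork

handler_calls = []  # filled in only if the handler actually runs


def _handler(obj):
    handler_calls.append(os.getpid())


register_after_fork(_handler, _handler)


def handler_ran_after_bare_fork():
    """Fork with os.fork() and report whether the handler ran in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send one byte describing what it observed, then exit at once.
        try:
            os.write(write_fd, b"1" if handler_calls else b"0")
        finally:
            os._exit(0)
    os.close(write_fd)
    os.waitpid(pid, 0)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    return answer == b"1"
```

Calling `handler_ran_after_bare_fork()` is expected to return False, because multiprocessing only runs these handlers while bootstrapping its own child processes.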

@ezyang
Contributor

ezyang commented Nov 3, 2019

@peterbell10 What would you like to do, in that case?

@peterbell10
Collaborator Author

I think it would be better to reset to f494962e2ad5738c792ece722df119d1e467cf5a which achieves the same effect using pthread_atfork.

@ezyang
Contributor

ezyang commented Nov 7, 2019

Hmm, but this doesn't pass tests:

Nov 06 20:54:09 OMP: Error #13: Assertion failure at z_Linux_util.cpp(2338).
Nov 06 20:54:09 OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Nov 06 20:54:09 Traceback (most recent call last):

Contributor

@ezyang ezyang left a comment

CI is failing.

@yf225
Contributor

yf225 commented Nov 7, 2019

@peterbell10 Would you like to look at the CI error? Thanks!

@yf225 yf225 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 7, 2019
@peterbell10
Collaborator Author

Yes, I'm looking into it now.

@mrshenli
Contributor

mrshenli commented Nov 8, 2019

I hit similar errors in rpc tests:

OMP: Error #13: Assertion failure at kmp_csupport.cpp(675).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Fatal Python error: Aborted

and

Nov 07 23:04:05 test_nested_remote (__main__.RpcTestWithFork) ... OMP: Error #13: Assertion failure at kmp_runtime.cpp(1407).
Nov 07 23:04:05 OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Nov 07 23:09:05 ERROR

RPC does not use OMP but does fork processes. @peterbell10 any suggestions on how to debug this?

@peterbell10
Collaborator Author

@mrshenli I was able to find the source code, which might be useful for matching up with the error messages. For example, it looks like your second error corresponds to an assert in __kmp_fork_call:

    KMP_DEBUG_ASSERT(
        __kmp_init_serial); // AC: potentially unsafe, not in sync with shutdown

Python 2 does not have mp.util.register_after_fork so dropping down to native
pthreads looks like the only option.
This reverts commit f494962e2ad5738c792ece722df119d1e467cf5a.
@peterbell10 peterbell10 force-pushed the dataloader-affinity branch 2 times, most recently from 6e92fd9 to 9a097e2 Compare November 10, 2019 16:44
@peterbell10
Collaborator Author

I think that the failures were caused by our pthread_atfork handler being called before OpenMP's own handler, so the OpenMP runtime is in an invalid state when we call omp_get_max_threads.

Since we can't specify the ordering of pthread_atfork handlers, I don't think there's any way to go ahead with that approach. Instead, I've gone back to registering the callback in Python, conditionally using os.register_at_fork if it is available. That solves the os.fork() issue for Python 3.7 and above.
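
A rough sketch of what such conditional registration could look like; this is not the code from the PR. The helper `register_fork_hook`, the `state` dict and `_mark` are hypothetical names, and `_mark` stands in for the call that touches the OpenMP runtime.

```python
import os


def register_fork_hook(hook):
    """Register hook to run in forked children; return the mechanism used."""
    if hasattr(os, "register_at_fork"):
        # Python 3.7+ on platforms with fork: also fires for a bare os.fork().
        os.register_at_fork(after_in_child=hook)
        return "os.register_at_fork"
    # Fallback: only fires for children started through multiprocessing.
    from multiprocessing.util import register_after_fork
    register_after_fork(hook, lambda _obj: hook())
    return "multiprocessing.util.register_after_fork"


state = {"hook_ran": False}


def _mark():
    # Stand-in for the real warm-up call.
    state["hook_ran"] = True


mechanism = register_fork_hook(_mark)
```

With the fallback path, the caller has to keep `hook` alive (for example as a module-level function), since multiprocessing only holds a weak reference to the registered object.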

@ezyang
Contributor

ezyang commented Nov 11, 2019

I double-checked and confirmed that Python's atfork functionality is not implemented with pthread_atfork.

@ssnl
Collaborator

ssnl commented Nov 11, 2019

One workaround (also used by the Python signal handlers) is to simply set a flag in the pthread_atfork handler and do the actual work once control returns to us, e.g., in at::globalContext() or something similar.

@ezyang
Contributor

ezyang commented Nov 11, 2019

I know this sort of thing is a pain to test, but this kind of subtle thing needs a test.

@peterbell10
Collaborator Author

Okay, the test is ready and I've manually confirmed that it fails on master. This took a while because my PyTorch build randomly stopped using Intel OpenMP and I couldn't reproduce the issue. I was able to work around that using LD_PRELOAD, as suggested in #12535.

@ezyang
Contributor

ezyang commented Nov 14, 2019

The Windows failure is real:

19:18:44 ======================================================================
19:18:44 ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
19:18:44 ----------------------------------------------------------------------
19:18:44 Traceback (most recent call last):
19:18:44   File "test_dataloader.py", line 1829, in test_set_affinity_in_worker_init
19:18:44     for sample in dataloader:
19:18:44   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 278, in __iter__
19:18:44     return _MultiProcessingDataLoaderIter(self)
19:18:44   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 682, in __init__
19:18:44     w.start()
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 105, in start
19:18:44     self._popen = self._Popen(self)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\context.py", line 223, in _Popen
19:18:44     return _default_context.get_context().Process._Popen(process_obj)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\context.py", line 322, in _Popen
19:18:44     return Popen(process_obj)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
19:18:44     reduction.dump(process_obj, to_child)
19:18:44   File "C:\Jenkins\Miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
19:18:44     ForkingPickler(file, protocol).dump(obj)
19:18:44 AttributeError: Can't pickle local object 'TestSetAffinity.test_set_affinity_in_worker_init.<locals>.worker_init_fn'
19:18:44 
19:18:44 ----------------------------------------------------------------------
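
The traceback goes through popen_spawn_win32, so the worker's arguments, including worker_init_fn, are pickled before the child starts. A function defined inside a test method cannot be pickled by reference. The sketch below reproduces that with the standard library only; `is_picklable` and `make_local_init_fn` are hypothetical helpers, and the usual remedy, assumed here rather than taken from the PR, is to define worker_init_fn at module level.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False


def make_local_init_fn():
    # The nested function has "<locals>" in its qualified name, so pickle
    # cannot look it up by name when serializing it.
    def worker_init_fn(worker_id):
        return worker_id
    return worker_init_fn
```

Here `is_picklable(make_local_init_fn())` is False, while a function reachable by its module-level name, such as the builtin `len`, pickles fine.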

@ezyang
Contributor

ezyang commented Nov 14, 2019

The OS X failure is real too:

Nov 14 11:26:33 ======================================================================
Nov 14 11:26:33 ERROR: test_set_affinity_in_worker_init (__main__.TestSetAffinity)
Nov 14 11:26:33 ----------------------------------------------------------------------
Nov 14 11:26:33 Traceback (most recent call last):
Nov 14 11:26:33   File "test_dataloader.py", line 1829, in test_set_affinity_in_worker_init
Nov 14 11:26:33     for sample in dataloader:
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
Nov 14 11:26:33     return self._process_data(data)
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
Nov 14 11:26:33     data.reraise()
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
Nov 14 11:26:33     raise self.exc_type(msg)
Nov 14 11:26:33 AttributeError: Caught AttributeError in DataLoader worker process 0.
Nov 14 11:26:33 Original Traceback (most recent call last):
Nov 14 11:26:33   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 135, in _worker_loop
Nov 14 11:26:33     init_fn(worker_id)
Nov 14 11:26:33   File "test_dataloader.py", line 1825, in worker_init_fn
Nov 14 11:26:33     os.sched_setaffinity(0, [2])
Nov 14 11:26:33 AttributeError: module 'os' has no attribute 'sched_setaffinity'
Nov 14 11:26:33 
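
os.sched_setaffinity only exists on some Unix platforms, which is why the call fails on OS X. A guarded helper along these lines avoids the AttributeError; `pin_current_process` is a hypothetical name, and a test that depends on affinity could be skipped whenever the helper reports that pinning is unsupported.

```python
import os


def pin_current_process(cpus):
    """Pin the calling process to cpus where supported; return True if applied."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS and Windows
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return True
```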

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
Fixes pytorch#28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: pytorch#29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Labels
open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Development

Successfully merging this pull request may close these issues.

The affinity of a worker process is reset after torch.randperm is called
6 participants