-
Notifications
You must be signed in to change notification settings - Fork 22.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force early initialization of OpenMP in forked children #29006
Conversation
@pytorchbot rebase this please |
63b3f27
to
58355db
Compare
torch/__init__.py
Outdated
torch.get_num_threads() | ||
|
||
import multiprocessing as _mp | ||
_mp.util.register_after_fork(_torch_at_fork, _torch_at_fork) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mp.util is not automatically imported. so you should do something like from multiprocessing.util import register_after_fork
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is still failing after making that change. Looks like it's actually a python 2 issue.
I suppose this will need to be done using pthread_atfork
in c++ unless there's a way to do it in python 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handler definitely is a thing in python 2, see https://github.com/pytorch/pytorch/blob/master/torch/multiprocessing/reductions.py#L8. I think it is the python 2 import system. Likely a from __future__ import absolute_import
will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, we can see if that fixes it.
28c1aa2
to
f494962
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Waiting on CI.
I don't think |
@peterbell10 What would you like to do, in that case? |
I think it would be better to reset to f494962e2ad5738c792ece722df119d1e467cf5a which achieves the same effect using |
Hmm but this doesn't pass tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ci failing
@peterbell10 Would you like to look at the CI error? Thanks! |
Yes, I'm looking into it now. |
I hit similar errors in rpc tests:
and
RPC does not use OMP but does fork processes. @peterbell10 any suggestions on how to debug this? |
@mrshenli I was able to find the source code which might be useful to match up with the error messages. For example, it looks like your second error corresponds to an assert in KMP_DEBUG_ASSERT(
__kmp_init_serial); // AC: potentially unsafe, not in sync with shutdown |
2c90f8b
to
078fde1
Compare
Python 2 does not have mp.util.register_after_fork so dropping down to native pthreads looks like the only option.
This reverts commit f494962e2ad5738c792ece722df119d1e467cf5a.
6e92fd9
to
9a097e2
Compare
I think that the failures were caused by the Since we can't specify the ordering of |
9a097e2
to
d5e0637
Compare
d5e0637
to
191276d
Compare
I double checked and confirmed that Python's atfork functionality is not implement with |
One workaround (that is also used by the python signal handlers) is to simply set a flag in the pthread_atfork handler, and actually do the thing when the control is returned to us, e.g., in at::globalContext() or something. |
I know this sort of thing is a pain to test, but this kind of subtle thing needs a test. |
Okay, the test is ready and I've manually confirmed that it fails on master. This took a while because my pytorch build randomly stopped using Intel OpenMP and I couldn't reproduce the issue. Was able to work around that using |
Windows failure is real
|
OS X failure is real too
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes pytorch#28389 Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`. Pull Request resolved: pytorch#29006 Differential Revision: D18782456 Pulled By: ezyang fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Fixes #28389
Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader
worker_init_fn
.