BUG: distutils, add compatibility python parallelization #9050
Conversation
Placing them all under the same name in the top level folder breaks when using the parallel extension compilation option of python 3.5.
should fix the scipy part of gh-7139
Force-pushed from cdd2303 to c0ceda3
Python 3.5 also added build parallelization at the extension level instead of the file level numpy uses. This causes two problems:

- numpy.distutils is not thread-safe with duplicated source files. When source files are duplicated in multiple extensions, the output objects are overwritten, which can truncate them in a parallel context. This is fixed by keeping track of the files being worked on and waiting for completion if another thread is already using the object name.
- The parallelization on two nested levels causes oversubscription. When building multiple extensions with multiple source files, the number of jobs running is multiplied. This is fixed by adding a semaphore that limits the number of jobs numpy starts to the defined amount.

closes numpygh-7139
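The oversubscription fix described in the commit message can be illustrated with a minimal sketch (hypothetical names, not numpy's actual code): a single global semaphore shared by every per-extension pool, so the total number of running compile jobs stays bounded even when several pools exist at once.

```python
# Hedged sketch of the semaphore approach: each extension still gets its
# own thread pool, but a shared global semaphore caps how many compile
# jobs actually run at the same time across all pools.
import threading
from concurrent.futures import ThreadPoolExecutor

N_JOBS = 4
# one process-wide semaphore shared by every pool
_job_semaphore = threading.Semaphore(N_JOBS)

def compile_source(src):
    with _job_semaphore:          # block until a job slot is free
        return f"compiled {src}"  # stand-in for the real compiler call

def build_extension(sources):
    # each extension spawns its own pool, mirroring the nested parallelism
    with ThreadPoolExecutor(max_workers=N_JOBS) as pool:
        return list(pool.map(compile_source, sources))

objs = build_extension(["a.c", "b.c", "c.c"])
```

Even if several `build_extension` calls run concurrently from different threads, at most `N_JOBS` compiles are in flight at any moment.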
Force-pushed from c0ceda3 to 67bfabf
added a commit that fixes the numpy problem without losing the benefits; a bit more complex than the sledgehammer lock solution, but imo reasonable.
technically the second commit makes the first obsolete, but using subfolders for fortranobject is probably a good idea in any case.
```python
try:
    # retrieve slot from our job semaphore and build
    with _job_semaphore:
        self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
```
Is there any situation when this semaphore wouldn't be acquired? Aren't there the same number of pool workers as available semaphores?
Nevermind, I get it now: `CCompiler_compile` itself can be called from multiple threads simultaneously, resulting in multiple pools.
Correct. An alternative implementation could have one global pool that each compile thread shares.
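The global-pool alternative mentioned here could look roughly like the following sketch (hypothetical names; not how numpy implemented it). A single process-wide executor is created once and shared by every compile thread, so concurrency is bounded by the pool size without a separate semaphore.

```python
# Hedged sketch of the alternative: one process-wide pool shared by all
# compile threads, instead of a fresh pool per CCompiler_compile call.
from concurrent.futures import ThreadPoolExecutor

# created once at import time and deliberately never shut down
_global_pool = ThreadPoolExecutor(max_workers=4)

def _compile_one(src):
    return f"compiled {src}"  # stand-in for the real compiler call

def compile_all(sources):
    # every caller submits into the same pool, so the total number of
    # concurrent jobs is capped by the pool size
    futures = [_global_pool.submit(_compile_one, s) for s in sources]
    return [f.result() for f in futures]
```

The trade-off raised later in this thread applies: such a pool cannot be closed after each build, and submitting from multiple threads relies on the executor being thread-safe.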
Right now, am I right in thinking that we end up allocating `N` pools of `N` workers, but then sharing an `N`-item semaphore between all `N^2` of them? That seems pretty wasteful
It's a few hundred threads in the worst case, so not too bad.
With a global pool we can't close it again, but that shouldn't be an issue, and I don't think map is guaranteed to be thread-safe (though it probably is).
Implementation LGTM.
Should probably still go into 1.13; any other opinions on semaphore vs global pool?
Thanks Julian.
BUG: distutils, place fortranobject files in subfolder
Placing them all under the same name in the top level folder breaks when
using the parallel extension compilation option of python 3.5.
BUG: distutils, add compatibility python parallelization
Python 3.5 also added build parallelization at the extension level
instead of the file level numpy uses.
This causes two problems:
numpy.distutils is not threadsafe with duplicated source files
When source files are duplicated in multiple extensions the output
objects are overwritten which can truncate in a parallel context.
This is fixed by keeping track of the files being worked on and waiting
for completion if another thread is already using the object name.
The parallelization on two nested levels causes oversubscription.
When building multiple extensions with multiple source files the number
of jobs running is multiplied.
This is fixed by adding a semaphore that limits the number of jobs numpy
starts to the defined amount.
closes gh-7139