BUG: distutils, add compatibility python parallelization #9050
Conversation
Placing them all under the same name in the top level folder breaks when using the parallel extension compilation option of python 3.5.
should fix the scipy part of gh-7139
Force-pushed from cdd2303 to c0ceda3
Python 3.5 also added build parallelization at the extension level instead of the file level numpy uses. This causes two problems:

- numpy.distutils is not thread-safe with duplicated source files. When source files are duplicated in multiple extensions, the output objects are overwritten, which can truncate them in a parallel context. This is fixed by keeping track of the files being worked on and waiting for completion if another thread is already using the object name.
- The parallelization on two nested levels causes oversubscription. When building multiple extensions with multiple source files, the number of jobs running is multiplied. This is fixed by adding a semaphore that limits the number of jobs numpy starts to the defined amount.

closes numpygh-7139
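The oversubscription fix described in the commit message can be illustrated with a minimal sketch (hypothetical names, not numpy's actual code): a single global semaphore shared by every per-extension pool, so the total number of running compile jobs stays bounded even when several pools exist at once.

```python
# Hedged sketch of the semaphore approach: each extension still gets its
# own thread pool, but a shared global semaphore caps how many compile
# jobs actually run at the same time across all pools.
import threading
from concurrent.futures import ThreadPoolExecutor

N_JOBS = 4
# one process-wide semaphore shared by every pool
_job_semaphore = threading.Semaphore(N_JOBS)

def compile_source(src):
    with _job_semaphore:          # block until a job slot is free
        return f"compiled {src}"  # stand-in for the real compiler call

def build_extension(sources):
    # each extension spawns its own pool, mirroring the nested parallelism
    with ThreadPoolExecutor(max_workers=N_JOBS) as pool:
        return list(pool.map(compile_source, sources))

objs = build_extension(["a.c", "b.c", "c.c"])
```

Even if several `build_extension` calls run concurrently from different threads, at most `N_JOBS` compiles are in flight at any moment.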
Force-pushed from c0ceda3 to 67bfabf
added a commit that fixes the numpy problem without losing the benefits; a bit more complex than the sledgehammer lock solution, but imo reasonable.
technically the second commit makes the first obsolete, but using subfolders for fortranobject is probably a good idea in any case.
```python
try:
    # retrieve slot from our job semaphore and build
    with _job_semaphore:
        self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
```
Is there any situation when this semaphore wouldn't be acquired? Aren't there the same number of pool workers as available semaphores?
Nevermind, I get it now: `CCompiler_compile` itself can be called from multiple threads simultaneously, resulting in multiple pools.
Correct. An alternative implementation could have one global pool that each compile thread shares.
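The global-pool alternative mentioned here could look roughly like the following sketch (hypothetical names; not how numpy implemented it). A single process-wide executor is created once and shared by every compile thread, so concurrency is bounded by the pool size without a separate semaphore.

```python
# Hedged sketch of the alternative: one process-wide pool shared by all
# compile threads, instead of a fresh pool per CCompiler_compile call.
from concurrent.futures import ThreadPoolExecutor

# created once at import time and deliberately never shut down
_global_pool = ThreadPoolExecutor(max_workers=4)

def _compile_one(src):
    return f"compiled {src}"  # stand-in for the real compiler call

def compile_all(sources):
    # every caller submits into the same pool, so the total number of
    # concurrent jobs is capped by the pool size
    futures = [_global_pool.submit(_compile_one, s) for s in sources]
    return [f.result() for f in futures]
```

The trade-off raised later in this thread applies: such a pool cannot be closed after each build, and submitting from multiple threads relies on the executor being thread-safe.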
Right now, am I right in thinking that we end up allocating `N` pools of `N` workers, but then sharing an `N`-item semaphore between all `N^2` of them? That seems pretty wasteful
It's a few hundred threads in the worst case, so not too bad.
With a global pool we can't close it again, but that shouldn't be an issue, and I don't think map is guaranteed to be thread-safe (though it probably is).
Implementation LGTM.
Should probably still go into 1.13; any other opinions on semaphore vs global pool?
Thanks Julian.
BUG: distutils, place fortranobject files in subfolder
Placing them all under the same name in the top level folder breaks when
using the parallel extension compilation option of python 3.5.
BUG: distutils, add compatibility python parallelization
Python 3.5 also added build parallelization at the extension level
instead of the file level numpy uses.
This causes two problems:
numpy.distutils is not threadsafe with duplicated source files
When source files are duplicated in multiple extensions the output
objects are overwritten which can truncate in a parallel context.
This is fixed by keeping track of the files being worked on and waiting
for completion if another thread is already using the object name.
The parallelization on two nested levels causes oversubscription.
When building multiple extensions with multiple source files the number
of jobs running is multiplied.
This is fixed by adding a semaphore that limits the number of jobs numpy
starts to the defined amount.
closes gh-7139