juliantaylor
Contributor

Allow extensions using numpy.distutils to compile in parallel.
By passing --jobs=n or -j n to setup.py build, the compilation of
extensions is now performed in n parallel processes.
Additionally, the environment variable NUMPY_NUM_BUILD_JOBS is used as
the default value; if it is unset, the default is serial compilation.

The parallelization is limited to the files within an extension, so
only numpy's multiarraymodule really profits, but it's still a nice
improvement when you have 2-4 cores.
Unfortunately, Cython will not profit at all, as it tends to build one
module per file.

Currently only CCompiler is adapted, but adding Fortran and C++ should be
straightforward.
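
For readers following along, the idea of compiling the files of one extension concurrently can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `compile_one` and `compile_sources` are hypothetical names, and the "compile" step is faked so the sketch runs without a compiler. A real build would launch the compiler as a subprocess from each worker.

```python
from multiprocessing.pool import ThreadPool


def compile_one(source):
    """Hypothetical stand-in for compiling one source file.

    A real build would spawn the compiler as a subprocess here; this
    sketch only derives the object file name.
    """
    return source.rsplit(".", 1)[0] + ".o"


def compile_sources(sources, jobs=1):
    """Compile the sources of a single extension with `jobs` workers."""
    if jobs <= 1 or len(sources) <= 1:
        # Serial path, matching the default of no parallelism.
        return [compile_one(src) for src in sources]
    pool = ThreadPool(jobs)
    try:
        # map() returns results in the same order as the inputs.
        return pool.map(compile_one, sources)
    finally:
        pool.close()
        pool.join()


objects = compile_sources(["a.c", "b.c", "c.c"], jobs=2)
```

Threads are enough in a sketch like this because each worker would mostly wait on an external compiler process. An extension with a single source file takes the serial path and gains nothing, which is the Cython limitation mentioned above.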

@juliantaylor
Contributor Author

Is the exec_command.py file a remnant from the time before Python had the subprocess module? It looks very ugly and seems to be performing a task that is very simple to do with subprocess.

@njsmith
Member

njsmith commented Oct 8, 2014

The file says "Created: 11 January 2003"; subprocess was first shipped in
the stdlib in python 2.4, which was released in Nov 2004. So, yes.


Reply to this email directly or view it on GitHub
#5161 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
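
As an illustration of the point about subprocess, running a command and collecting its exit status and output takes only a few lines with the standard library. This is a sketch under assumptions: `run_command` is a hypothetical helper, and the `(status, output)` return shape is chosen for illustration rather than taken from exec_command.py.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return (exit_status, combined_output)."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode output to text
    )
    output, _ = proc.communicate()
    return proc.returncode, output.rstrip("\n")


# Use the running interpreter so the example does not depend on a compiler.
status, output = run_command([sys.executable, "-c", "print('hello')"])
```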

Member

Don't we name most of these NPY_*? No idea to be honest.

Contributor Author

Right, the strides and separate-compilation variables start with NPY; will change.

@juliantaylor
Contributor Author

The Fortran compile loop has this comment:

    # build any sources in same order as they were originally specified
    #   especially important for fortran .f90 files using modules

which might mean it can't be parallelized as easily, but then the loop is over build.keys(), which is nondeterministic in its order, so I'm assuming the comment is wrong or only applies to the outer loop. Can a Fortran expert confirm?

@juliantaylor
Contributor Author

Oh, never mind, it's a double loop to get rid of the nondeterministic dictionary ordering. I guess one can still parallelize f77 without extra dependency handling.
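
The double-loop pattern described here can be sketched as follows: iterate over the ordered list of objects and use the dictionary only for lookup, so the build order never depends on dictionary ordering. The names `ordered_build_items`, `objects`, and `build` are illustrative and not taken from the actual source. Order matters for .f90 files because a module has to be compiled before any file that uses it; f77 has no modules, so its files can be compiled independently.

```python
def ordered_build_items(objects, build):
    """Yield (obj, src) pairs in the order given by `objects`.

    `objects` is the ordered list of object files; `build` maps each
    object that still needs compiling to its source file.
    """
    for obj in objects:
        if obj in build:
            yield obj, build[obj]


objects = ["mod_a.o", "mod_b.o", "main.o"]
# mod_b.o is assumed to be up to date, so it is absent from `build`.
build = {"main.o": "main.f90", "mod_a.o": "mod_a.f90"}
items = list(ordered_build_items(objects, build))
```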

@charris
Member

charris commented Oct 10, 2014

@juliantaylor Is this still in progress?

@juliantaylor
Contributor Author

Hm, it seems to be working well for me; I just need to fix the env variable, then it could be merged.
It's probably best to ping the mailing list about the interface, and possibly also python-dev so they don't clobber our flags in future distutils updates.

I'd like to parallelize Fortran too, but I don't know what the comment about ordering refers to. Do you know what is meant?

@charris
Member

charris commented Oct 10, 2014

I have no idea.

I'm thinking of going through the f2py tickets and maybe I'll learn something in the process.

@juliantaylor juliantaylor force-pushed the parallel-distutils branch 3 times, most recently from e078139 to de7c59e Compare October 13, 2014 23:54
@juliantaylor
Contributor Author

Updated; it now also supports Fortran 77, but it got a bit more hacky in order to support the argument for the build_ext and build_clib targets as well. Ideally we'd pass the jobs to the compiler, but that might break custom compiler classes.
Also updated the install instructions a bit.

@juliantaylor juliantaylor force-pushed the parallel-distutils branch 3 times, most recently from 3e1dd2a to 59830e6 Compare October 14, 2014 00:11
@juliantaylor
Contributor Author

No negative replies on the mailing list; any more comments?

Member

Might as well put this in a tuple rather than use \.

@charris
Member

charris commented Oct 27, 2014

LGTM, although this is not something I know much about.

Member

A for loop might be more straightforward here.

Contributor Author

but the symmetry! :)

Member

Maybe I'm an old fuddy duddy ;) I find it strange to build a list just to consume an iterator. OTOH, it might be the new Python3 fashion.
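
The code under review is not shown in this thread, so the following is only a guess at the two styles being compared. `record` is a hypothetical side-effecting task standing in for a compile step.

```python
def record(log, item):
    """Hypothetical side-effecting task standing in for a compile step."""
    log.append(item.upper())


items = ["a", "b", "c"]

# Style 1: wrap map() in list() purely to force evaluation.
# On Python 3, map() returns a lazy iterator, so nothing runs until
# it is consumed; the resulting list of None values is thrown away.
log_from_list = []
list(map(lambda item: record(log_from_list, item), items))

# Style 2: a plain for loop, with no throwaway list.
log_from_loop = []
for item in items:
    record(log_from_loop, item)
```

Both styles do the same work; the first mirrors the shape of a `pool.map(...)` call (presumably the "symmetry" in question), while the second states the intent more directly.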

Allow extensions using numpy.distutils to compile in parallel.
By passing `--jobs=n` or `-j n` to `setup.py build`, the compilation of
extensions is now performed in `n` parallel processes.
Additionally, the environment variable NPY_NUM_BUILD_JOBS is used as
the default value; if it is unset, the default is serial compilation.

The parallelization is limited to the files within an extension, so
only numpy's multiarraymodule really profits, but it's still a nice
improvement when you have 2-4 cores.
Unfortunately Cython will not profit at all as it tends to build one
module per file.
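
The defaulting rule in this commit message can be sketched as a small helper. This is an assumption-laden illustration, not the merged implementation: `resolve_build_jobs` is a hypothetical name, and the precedence shown (command-line value first, then NPY_NUM_BUILD_JOBS, then serial) is how the message reads rather than something verified against the code.

```python
import os


def resolve_build_jobs(cli_jobs=None, environ=None):
    """Return the number of parallel build jobs (hypothetical helper).

    Assumed precedence: explicit command-line value, then the
    NPY_NUM_BUILD_JOBS environment variable, then serial (1).
    """
    if environ is None:
        environ = os.environ
    if cli_jobs is not None:
        return max(1, int(cli_jobs))
    value = environ.get("NPY_NUM_BUILD_JOBS")
    if not value:
        # Unset or empty: fall back to serial compilation.
        return 1
    return max(1, int(value))


jobs_default = resolve_build_jobs(None, {})
jobs_from_env = resolve_build_jobs(None, {"NPY_NUM_BUILD_JOBS": "4"})
jobs_from_cli = resolve_build_jobs(2, {"NPY_NUM_BUILD_JOBS": "4"})
```
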
@juliantaylor
Contributor Author

Thanks for looking, merging it.

juliantaylor added a commit that referenced this pull request Nov 2, 2014
ENH: support parallel compilation of extensions
@juliantaylor juliantaylor merged commit 066be28 into numpy:master Nov 2, 2014
@juliantaylor juliantaylor deleted the parallel-distutils branch November 2, 2014 14:37