New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set number of OpenMP threads during runtime #739

Merged
merged 13 commits into from Oct 22, 2015

Conversation

Projects
None yet
7 participants
@Garyfallidis
Member

Garyfallidis commented Oct 18, 2015

This should be working now but it maybe need a few more checks with the tests. I will do that asap.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 19, 2015

@fmorency, @jchoude and @arokem this should resolve issue #608. Try it and let me know if it works for you. If it does the job we should have this in ready for the release which will happen in the next days.

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

Works on my machine for nlmeans. As you said, we might want a few tests for the changes in bundlemin and streamlinear as well. Let me know when you want me to take a look.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 19, 2015

Of course. I wanted some feedback first just in case I missed something big. Are you testing this on a Linux or a Mac?

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

Yep - no problem. I'm testing this on a Mac.

On Mon, Oct 19, 2015 at 11:57 AM, Eleftherios Garyfallidis <
notifications@github.com> wrote:

Of course. I wanted some feedback first just in case I missed something
big. Are you testing this on a Linux or a Mac?


Reply to this email directly or view it on GitHub
#739 (comment).

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 19, 2015

Cool. That means that you can compile openmp with no problem. Correct? Is your compiler clang or gcc?

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

clang. I do get errors like this one:


gcc -fno-strict-aliasing -I/Users/arokem/anaconda/envs/py2/include -arch
x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I/Users/arokem/anaconda/envs/py2/include/python2.7 -c test.c -o test.o
-fopenmp

test.c:1:10: fatal error: 'omp.h' file not found

#include <omp.h>

         ^

1 error generated.

Flags ['-fopenmp', '-fopenmp'] omitted because of compile or link error

When I compile dipy on my machine. I'm not sure that's a great sign.

On Mon, Oct 19, 2015 at 12:11 PM, Eleftherios Garyfallidis <
notifications@github.com> wrote:

Cool. That means that you can compile openmp with no problem. Correct? Is
your compiler clang or gcc?


Reply to this email directly or view it on GitHub
#739 (comment).

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 19, 2015

That means that you are not using openmp in your machine. So, you cannot provide much feedback right now. Except if you compile and link openmp by yourself. But, I suppose this is secondary right now we need to figure out first that travis problem for the viz PR.

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

Yup

On Mon, Oct 19, 2015 at 12:17 PM, Eleftherios Garyfallidis <
notifications@github.com> wrote:

That means that you are not using openmp in your machine. So, you cannot
provide much feedback right now. Except if you compile and link openmp by
yourself. But, I suppose this is secondary right now we need to figure out
first that travis problem.


Reply to this email directly or view it on GitHub
#739 (comment).

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 19, 2015

Later you can try to compile OpenMP http://openmp.llvm.org
If you do that depending on the number of cores in your machine you will get a descent speedup with nlmeans and bundle registration.

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

Whoa. It would be good to add these instructions to the docs.

On Mon, Oct 19, 2015 at 12:20 PM, Eleftherios Garyfallidis <
notifications@github.com> wrote:

Later you can try to compile OpenMP here http://openmp.llvm.org
If you do that depending on the number of cores in your machine you will
get a descent speedup with nlmeans and bundle registration.


Reply to this email directly or view it on GitHub
#739 (comment).

@arokem

This comment has been minimized.

Member

arokem commented Oct 19, 2015

LGTM. @fmorency, if you could please take a look, that would be great!

@samuelstjean

This comment has been minimized.

Contributor

samuelstjean commented Oct 20, 2015

Well, doesn't that men that all mac users will need to recompile stuff with their own version with openmp support, so a minority of people will bother? Or it works the same on al platform provided you have openmp installed and linked?

@jchoude

This comment has been minimized.

Contributor

jchoude commented Oct 20, 2015

@samuelstjean You were not in the lab's discussion, but normally, at least on OSX 10.11, clang is supposed to ship with openmp enabled (according to memory). So that's one case solved.

For the rest, yes, probably it would be needed to compile it or use some version of gcc from MacPorts.

That raises a question: is there a plan to provide wheels for dipy? That would be a perfect use case... Maybe there are some technical considerations that I don't know.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 20, 2015

Okay @fmorency will give us feedback today about this. He just told me. :) So we can merge this at some point tonight 💃

@fmorency

This comment has been minimized.

fmorency commented Oct 20, 2015

@Garyfallidis there are some strange stuff happening at runtime. First, the cpu usage when using a mask is much lower (±2.5 cores) than without it (8 cores).

Also, if I call nlmeans(num_threads=1), the number of core used is 1, which is okay. However, if I call nlmeans() afterwards, the number of core used is still 1. I can reproduce it for any number of core.

My tests confirm that the resulting data is not affected by the number of threads.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 20, 2015

Cool, I can replicate the second issue. I am on it.

For the first it's probably normal because the computation is much easier with the mask. I don't think there are any racing conditions trying to access the same amount of memory and skipping threads. It looks very unlikely but will double check.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 20, 2015

Hi @fmorency. I believe this fix will solve what you have reported. If None then all the cores of the system will be used independently of what was set before. Give it a go and gives us thumbs up to move forward please! Thx!

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 20, 2015

Also, what you see with mask is normal. Probably some cycles were moving to L1/L2 and you wouldn't see any further speedups. You will stop seeing this effect when you go to using bigger masks and datasets. For now I have disabled dynamic scheduling. I think we will better of for this stage. As we play more with openmp in the future we can decide which scheduling is better for which algorithms. But that needs some good time investment. I hope that helps.

@arokem

This comment has been minimized.

Member

arokem commented Oct 20, 2015

@fmorency - if you give the thumbs up here, I can go ahead and merge this.

@fmorency

This comment has been minimized.

fmorency commented Oct 21, 2015

@Garyfallidis the num_threads parameter doesn't do anything now. All the cores are used all the time. I tried to make clean and make ext and it doesn't fix the issue.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 21, 2015

Hi @fmorency,

This PR still works well for me. I tried it in a different computer now with 4 cores.
If you can run the following test function you should clearly see initially only 1 cpu taking the load and then all of them (for me its 4) and then again only 2 of the cpus working.

import numpy as np
from dipy.denoise.nlmeans import nlmeans
def test_nlmeans_4d_3dsigma_and_threads():
    # Input is 4D data and 3D sigma
    data = np.ones((50, 50, 50, 5))
    sigma = np.ones(data.shape[:3])
    mask = np.zeros(data.shape[:3])
    mask[:] = 1

    print('1')
    new_data = nlmeans(data, sigma, mask, num_threads=1)
    print('All (for me all 4)')
    new_data2 = nlmeans(data, sigma, mask, num_threads=None)
    print('2')
    new_data3 = nlmeans(data, sigma, mask, num_threads=2)

if __name__ == '__main__':

    test_nlmeans_4d_3dsigma_and_threads()

Give it a go please. @arokem maybe you can confirm too in a Linux box where omp is installed?

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 21, 2015

Just to clarify here. This should work in any machine with OpenMP installed. Independent of OS. At least this is what it should happen.

@arokem

This comment has been minimized.

Member

arokem commented Oct 22, 2015

🚢

arokem added a commit that referenced this pull request Oct 22, 2015

Merge pull request #739 from Garyfallidis/omp_num_threads
Set number of OpenMP threads during runtime

@arokem arokem merged commit 05d135a into nipy:master Oct 22, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@mdesco

This comment has been minimized.

Contributor

mdesco commented Oct 23, 2015

Guys, this does not work in MacOS 10.11.
gcc -fno-strict-aliasing -I/Users/desm2239/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/desm2239/anaconda/include/python2.7 -c test.c -o test.o -fopenmp
test.c:1:10: fatal error: 'omp.h' file not found
#include <omp.h>
^
1 error generated.

@arokem

This comment has been minimized.

Member

arokem commented Oct 25, 2015

Yup. Same here.

I am going to try to follow these instructions: https://clang-omp.github.io/
and I'll report back

On Fri, Oct 23, 2015 at 12:17 PM, Maxime Descoteaux <
notifications@github.com> wrote:

Guys, this does not work in MacOS 10.11.
gcc -fno-strict-aliasing -I/Users/desm2239/anaconda/include -arch x86_64
-DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
-I/Users/desm2239/anaconda/include/python2.7 -c test.c -o test.o -fopenmp
test.c:1:10: fatal error: 'omp.h' file not found
#include
^
1 error generated.


Reply to this email directly or view it on GitHub
#739 (comment).

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

OK - that was a dead-end. But I did eventually find a solution: it turns out that you can get an OMP-enabled gcc from homebrew, which I use for package management on my laptop. It requires that you install it without multilib (http://stackoverflow.com/questions/30049486/what-does-gcc-without-multilib-mean):

brew reinstall gcc --without-multilib

It takes a little while (about 45 minutes), and when it is done, you need to over-ride your system gcc to instead point to the homebrew cellar (in my case it's /usr/local/Cellar/gcc/5.2.0/bin/gcc-5). To do that, I set:

ln -s /usr/local/Cellar/gcc/5.2.0/bin/gcc-5 ~/usr/bin/gcc

(and I have export PATH="/Users/arokem/usr/bin:/usr/local/sbin:$PATH" in my bash_profile file, so that I can do things like this)

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

Needless to say (I hope), but caveat emptor. This might affect other things that use gcc!

@samuelstjean

This comment has been minimized.

Contributor

samuelstjean commented Oct 26, 2015

Well, so does it means that it does not compile anymore for mac users without building their own gcc version? I don't have a mac, so I'm asking in case you would have tons of people flooding the mailing list saying they can't build it anymore.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

As far as I can tell, this is not a new thing on the 10.11.

I don't think that I ever had an openmp-enabled gcc on any mac (certainly
not since I've had clang, which is a couple of years...!)

And I didn't build my own gcc - I installed it from homebrew.

On Mon, Oct 26, 2015 at 2:38 AM, Samuel St-Jean notifications@github.com
wrote:

Well, so does it means that it does not compile anymore for mac users
without building their own gcc version? I don't have a mac, so I'm asking
in case you would have tons of people flooding the mailing list saying they
can't build it anymore.


Reply to this email directly or view it on GitHub
#739 (comment).

@samuelstjean

This comment has been minimized.

Contributor

samuelstjean commented Oct 26, 2015

So, it does not work either with the default clang? And homebrew is just a fancy package manager where it does build stuff for you or you can grab precompiled versions like any linux repo (or else I don't understand why it took 45 minutes to install)? Anywa, I'll let you mac users talk about it, I just though it would be strange if people cannot build anymore the dev version without a compiler with openmp support. And also they don't get any multithreading support, which is sad.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

Just to correct the impression: you can build the dev version without an
openmp-supported comiler, but you won't get the benefits of openmp. That
is, everything runs single-thread.

On Mon, Oct 26, 2015 at 8:24 AM, Samuel St-Jean notifications@github.com
wrote:

So, it does not work either with the default clang? And homebrew is just a
fancy package manager where it does build stuff for you or you can grab
precompiled versions like any linux repo (or else I don't understand why it
took 45 minutes to install)? Anywa, I'll let you mac users talk about it, I
just though it would be strange if people cannot build anymore the dev
version without a compiler with openmp support. And also they don't get any
multithreading support, which is sad.


Reply to this email directly or view it on GitHub
#739 (comment).

@jchoude

This comment has been minimized.

Contributor

jchoude commented Oct 26, 2015

@arokem I'll let @mdesco confirm, but I don't think that's what happened to him... The error he pasted last week happened when he tried to make ext with the basic setup on his OSX 10.11, and it just failed... So it seems there might a problem with the omp check somewhere... He was not able to build the dev version at that point.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

Hmm - I didn't understand that. I had seen exactly the same issue (see that I pasted exactly the same error earlier in this very thread!), but in my case clang just kept chugging through, compiling all the extensions to work just fine (but without multithreading support)

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 26, 2015

Yes, it is very depressing that Apple decided to switch to their new compiler without OpenMP support and that it takes them so much time to release this properly. This change is severely affecting thousands of software projects around the globe. Thx Apple!
But at least if we can figure out a nice alternative with a) compiling OpenMP for Clang or b) build and using the standard GCC and not Clang we can at least provide the right information to our users.

@jchoude last week when I checked with @mdesco his El Capitan machine there was no omp.h in his system but it was compiling single threaded.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

OK - I think we're all on the same page then. I think that we should instruct Apple users to choose one of the following options (in order of difficulty):

  1. Run things single thread
  2. Use homebrew/macports to get an openmp-enabled compiler
  3. Build your own openmp compiler following the instructions on https://clang-omp.github.io/
@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 26, 2015

Yes @arokem I can confirm that this was what it happened to Maxime's computer too. we had another look late at Friday night last week. I should have reported here. That means of course that we need to change the installation docs. But let's make sure that we know the best way to have openmp in osx.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 26, 2015

Have you tried the Xcode solution?
brew install clang-omp

@jchoude

This comment has been minimized.

Contributor

jchoude commented Oct 26, 2015

@Garyfallidis @arokem Oh, ok, I was missing the last update about @mdesco 's situation. Thanks for letting me know.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

I did try brew install clang-omp. It did not work (could not compile at all).

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 26, 2015

Should we ask for advice from the Cython lads?

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 26, 2015

Hmm.. here the answer is exactly what @arokem is proposing. Use GCC! So maybe that should be our advice. http://stackoverflow.com/questions/28010801/compiling-parallelized-cython-with-clang
@jchoude were you able to compile clang with openmp?

@samuelstjean

This comment has been minimized.

Contributor

samuelstjean commented Oct 26, 2015

So, does it still builds with people who don't want to bother building clang/gcc themselves? Ensuring that it still works single threaded without openmp is probably more important than having it only work multithreaded.

@arokem

This comment has been minimized.

Member

arokem commented Oct 26, 2015

Yep - it compiles fine (but single threaded!) out of the box --- with the clang that ships with xcode. That's what I had until yesterday.

@stephanmeesters

This comment has been minimized.

Contributor

stephanmeesters commented Oct 28, 2015

Hi, I've had success compiling DiPy with Clang-OpenMP

First install clang-omp with Homebrew
brew install clang-omp

Then add this specific compiler to the setup.py
os.environ['CC'] = '/usr/local/bin/clang-omp'

Now it should compile. If not, try updating your DiPy.

@arokem

This comment has been minimized.

Member

arokem commented Oct 28, 2015

Happy to hear there's another way to do this. When I try that, I get the
following (indicating something's screwed up about include paths for this
compiler?):


(py2)D-69-91-185-47:dipy (master) $python setup.py build_ext --inplace

running build_ext

/usr/local/bin/clang-omp -fno-strict-aliasing
-I/Users/arokem/anaconda/envs/py2/include -arch x86_64 -DNDEBUG -g -fwrapv
-O3 -Wall -Wstrict-prototypes
-I/Users/arokem/anaconda/envs/py2/include/python2.7 -c test.c -o test.o
/arch:SSE2

clang-3.5: error: no such file or directory: '/arch:SSE2'

Flags ['/arch:SSE2'] omitted because of compile or link error

/usr/local/bin/clang-omp -fno-strict-aliasing
-I/Users/arokem/anaconda/envs/py2/include -arch x86_64 -DNDEBUG -g -fwrapv
-O3 -Wall -Wstrict-prototypes
-I/Users/arokem/anaconda/envs/py2/include/python2.7 -c test.c -o test.o
-msse2 -mfpmath=sse

/usr/local/bin/clang-omp -bundle -undefined dynamic_lookup
-L/Users/arokem/anaconda/envs/py2/lib -arch x86_64 -arch x86_64 test.o
-L/Users/arokem/anaconda/envs/py2/lib -o libtestlib.so

ld: library not found for -lbundle1.o

clang-3.5: error: linker command failed with exit code 1 (use -v to see
invocation)

Flags ['-msse2', '-mfpmath=sse'] omitted because of compile or link error

/usr/local/bin/clang-omp -fno-strict-aliasing
-I/Users/arokem/anaconda/envs/py2/include -arch x86_64 -DNDEBUG -g -fwrapv
-O3 -Wall -Wstrict-prototypes
-I/Users/arokem/anaconda/envs/py2/include/python2.7 -c test.c -o test.o
-fopenmp

In file included from test.c:1:

/usr/local/opt/libiomp/include/libiomp/omp.h:134:13: fatal error: 'stdlib.h'

      file not found

#   include <stdlib.h>

            ^

1 error generated.

Flags ['-fopenmp', '-fopenmp'] omitted because of compile or link error

skipping 'dipy/reconst/peak_direction_getter.c' Cython extension
(up-to-date)

skipping 'dipy/reconst/recspeed.c' Cython extension (up-to-date)

skipping 'dipy/reconst/vec_val_sum.c' Cython extension (up-to-date)

skipping 'dipy/reconst/quick_squash.c' Cython extension (up-to-date)

skipping 'dipy/tracking/distances.c' Cython extension (up-to-date)

skipping 'dipy/tracking/streamlinespeed.c' Cython extension (up-to-date)

building 'dipy.tracking.streamlinespeed' extension

/usr/local/bin/clang-omp -fno-strict-aliasing
-I/Users/arokem/anaconda/envs/py2/include -arch x86_64 -DNDEBUG -g -fwrapv
-O3 -Wall -Wstrict-prototypes
-I/Users/arokem/anaconda/envs/py2/lib/python2.7/site-packages/numpy/core/include
-Isrc -Ibuild -I/Users/arokem/anaconda/envs/py2/include/python2.7 -c
dipy/tracking/streamlinespeed.c -o
build/temp.macosx-10.5-x86_64-2.7/dipy/tracking/streamlinespeed.o

In file included from dipy/tracking/streamlinespeed.c:16:

/Users/arokem/anaconda/envs/py2/include/python2.7/Python.h:33:10: fatal
error:

      'stdio.h' file not found

#include <stdio.h>

         ^

1 error generated.

error: command '/usr/local/bin/clang-omp' failed with exit status 1

On Wed, Oct 28, 2015 at 7:16 AM, stephanmeesters notifications@github.com
wrote:

Hi, I've had success compiling DiPy with Clang-OpenMP

First install clang-omp with Homebrew
brew install clang-omp

Then add this specific compiler to the setup.py
os.environ['CC'] = '/usr/local/bin/clang-omp'

Now it should compile. If not, try updating your DiPy.


Reply to this email directly or view it on GitHub
#739 (comment).

@stephanmeesters

This comment has been minimized.

Contributor

stephanmeesters commented Oct 28, 2015

Some things come to mind, can you try

sudo xcodebuild -license

If that doesn't help, try

xcode-select --install

Otherwise maybe homebrew has some error messages, try brew doctor

I am using Homebrew's Python instead of Anaconda, but I'm not sure if that makes any difference.

@arokem

This comment has been minimized.

Member

arokem commented Oct 28, 2015

On Wed, Oct 28, 2015 at 1:12 PM, stephanmeesters notifications@github.com
wrote:

Some things come to mind, can you try

sudo xcodebuild -license

If that doesn't help, try

xcode-select --install

Otherwise maybe homebrew has some error messages, try brew doctor

I am using Homebrew's Python instead of Anaconda, but I'm not sure if that
makes any difference.

It might make a difference. Out of curiosity: using the homebrew Python,
how did you install vtk?


Reply to this email directly or view it on GitHub
#739 (comment).

@stephanmeesters

This comment has been minimized.

Contributor

stephanmeesters commented Oct 28, 2015

VTK was the main reason I switched to the Homebrew Python

brew install vtk --with-python

@arokem

This comment has been minimized.

Member

arokem commented Oct 28, 2015

Nice.

On Wed, Oct 28, 2015 at 2:55 PM, stephanmeesters notifications@github.com
wrote:

VTK was the main reason I switched to the Homebrew Python

homebrew install vtk --with-python


Reply to this email directly or view it on GitHub
#739 (comment).

@arokem arokem referenced this pull request Feb 21, 2016

Closed

Wheels are built without OMP #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment