New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: expand OpenMP utilities and move from denspeed.pyx to dipy.utils #1050

Merged
merged 5 commits into from Jan 2, 2017

Conversation

Projects
None yet
4 participants
@grlee77
Contributor

grlee77 commented May 13, 2016

I was recently working on adding OpenMP support to some functions in dipy.align and needed some utilities similar to the ones defined within denoise/denspeed.pyx.

I propose moving these to a common location dipy.utils.omp where they can be more easily found and reused.

The only change in functionality to nlmeans would be the default number of threads set when num_threads is not specified,but the user has also defined the OMP_NUM_THREADS environement variable. Previously the default number of threads was always openmp.omp_get_num_procs(), but now it will try to default to the OMP_NUM_THREADS instead. At least on my system, omp_get_num_procs returns the total number of possible threads including hyperthreading which actually makes things slower than just using the number of physical cores.

@riddhishb

This comment has been minimized.

Contributor

riddhishb commented May 14, 2016

@grlee77 this is a good idea, just a quick question.
Does this mean that when nlmeans is run without specifying num_threads, it would use all the cores, I believe thats what OMP_NUM_THREADS does right?
I am not clear on that part

@grlee77

This comment has been minimized.

Contributor

grlee77 commented May 14, 2016

By default OMP_NUM_THREADS is probably not set on most systems unless the user has explicitly done so. In that case the max number of threads returned by omp_get_num_procs would be used.

OMP_NUM_THREADS is an environment variable that can be set by the user (e.g. via export OMP_NUM_THREADS=4 in a bash terminal on linux or osx prior to starting python). For the laptop I am on now there are 4 physical CPU cores, but 8 maximum threads. In practice I find using 4 threads is faster than 8 so I set OMP_NUM_THREADS=4 and would like dipy to honor that choice rather than always forcing to 8.

However, looking more closely at these docs (https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html), I see that technically OMP_NUM_THREADS can also be set to a comma separated list. I have not supported that case here. The code as implemented here would fail to convert the list to an integer and fallback to the maximum number of threads as returned by omp_get_num_procs.

Edit
MSVC docs don't mention the possibility of comma separated OMP_NUM_THREADS values, so maybe that case is particular to gcc or is only in a more recent OpenMP standard
https://msdn.microsoft.com/en-us/library/yw6c0z19.aspx

@riddhishb

This comment has been minimized.

Contributor

riddhishb commented May 14, 2016

Got it! :) and this is only when the num_threads keyword is not specified, right?

@grlee77

This comment has been minimized.

Contributor

grlee77 commented May 14, 2016

Right.

@arokem

This comment has been minimized.

Member

arokem commented Oct 21, 2016

@omarocegueda : would you mind giving this one a look as well? Is it still relevant?

if have_openmp and num_threads is not None:
openmp.omp_set_num_threads(all_cores)
if num_threads is not None:
restore_default_num_threads()

This comment has been minimized.

@omarocegueda

omarocegueda Oct 21, 2016

Contributor

Just out of curiosity, is it possible that this logic has unexpected side effects? What I understand is that we are "restoring" the number of threads after finishing the parallel block. But what if the number of threads was not the default when we called this function?

This comment has been minimized.

@grlee77

grlee77 Oct 21, 2016

Contributor

I think this would happen if the user had manually set a number of threads via openmp.omp_set_num_threads on their own prior to calling the function. Then after exiting they would be back to the default number of threads rather than whatever was set upon entering the function.

However it seems this was also the case prior to this PR (it always restores to the max number of threads at the end).

Perhaps it would be better to just record the current number of threads upon entering the function and then restore to that?

This comment has been minimized.

@omarocegueda

omarocegueda Oct 24, 2016

Contributor

Yes, I think that would have been better. Resetting the number of threads to the default seems to me as unexpected as not restoring it at all. Anyway, that issue is unrelated to this PR =).

@arokem

This comment has been minimized.

Member

arokem commented Dec 13, 2016

Other than your previous comment, @omarocegueda : what do you think here?

@arokem

This comment has been minimized.

Member

arokem commented Dec 18, 2016

OK - this seems ready to go, so unless someone pipes up, I will merge it in a couple of days.

@arokem arokem merged commit d489c86 into nipy:master Jan 2, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018

Merge pull request nipy#1050 from grlee77/omp_utils
ENH: expand OpenMP utilities and move from denspeed.pyx to dipy.utils
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment