-
-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] BUILD: check OpenMP support and add a switch to build without it #13543
Conversation
38d9a1e
to
56f1786
Compare
I think I came up with something working. There's still one issue. The common test I'm not sure how we want to deal with that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's better if scikit-learn can still be built without OpenMP support if needed. Setting that flag via an env variable sounds reasonable.
" and rerun the build command. Notice however that it will hurt the " | ||
"performances of some estimators, which will run in sequential mode " | ||
"and for which the `n_jobs` parameter will have no effect anymore." | ||
"\n\n ***\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve readability better to use a multi-line string wrapped in textwrap.dedent
openmp_supported = False | ||
|
||
finally: | ||
os.chdir(start_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe clean up the temp folder here?
printf("nthreads=%d\\n", omp_get_num_threads()); | ||
return 0; | ||
} | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also maybe wrap it in textwrap.dendent
to keep correct indentation.
"- Make sure you have followed the installation instructions:\n\n" | ||
" " | ||
"https://scikit-learn.org/dev/developers/advanced_installation.html" | ||
"\n\n- If your compiler should support OpenMP but the build still " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"should support" -> "supports"
" " | ||
"https://scikit-learn.org/dev/developers/advanced_installation.html" | ||
"\n\n- If your compiler should support OpenMP but the build still " | ||
"fails, please submit a bug report here:\n\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"bug report at"
"fails, please submit a bug report here:\n\n" | ||
" " | ||
"https://github.com/scikit-learn/scikit-learn/issues\n\n" | ||
"- If you explicitly want to build scikit-learn without OpenMP " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove "explicitly"
"support, you can set the following environment variable:\n\n" | ||
" SKLEARN_NO_OPENMP=TRUE\n\n" | ||
" and rerun the build command. Notice however that it will hurt the " | ||
"performances of some estimators, which will run in sequential mode " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove "hurt the performance", people needing to build without OpenMP support should know what OpenMP does.
|
||
|
||
def get_openmp_flag(compiler): | ||
if os.getenv('SKLEARN_NO_OPENMP') == 'TRUE': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, for binary env variables, their existence should interpreted as True, i.e. if os.getenv('SKLEARN_NO_OPENMP')
. That removes the need to think about how to define boolean True (or "True"
string) in environment variable in BASH and Windows CMD: e.g. whether it's true
, "TRUE"
, True
or maybe 'True'
...
I think it's fair to fail if the variable isn't there anymore. It makes sense to force So we are certain that even openmp-specific code like e.g. even if openMP isn't installed on the machine? It's one thing to compile without the openMP flags, but it's another thing to run without openMP being installed. Also, do we want a CI that doesn't have openMP to make sure everything works? |
It's not only related to
Needs to be tested indeed, but I don't see how it would be possible that the program have access to the openmp symbols if it's not linked to openmp. So my guess is that it will work.
I'm +0 on that. |
Sorry I'm don't I understand what you mean. Using #ifdef _OPENMP
#include <omp.h>
#endif /* _OPENMP */
/* ... */
__pyx_v_n_threads = omp_get_max_threads(); How does this compile if |
Honestly I don't know, but if |
Oh, actually I just noticed that the #include <omp.h>
/* ... */
#ifdef _OPENMP
#include <omp.h>
#endif /* _OPENMP */
/* ... */
__pyx_v_n_threads = omp_get_max_threads(); Now I'm just really confused. |
I'm testing on the macos job where I removed the installation of libomp. |
You were right :) But then there's something I don't understand. Why is this flag necessary ? is Anyway, this is a problem, and I guess we can't switch to no OpenMP build mode. |
Thanks for the test! I think I have a vague idea of what's going on: If your code uses #include <omp.h> If you only use e.g. #ifdef _OPENMP
#include <omp.h>
#endif /* _OPENMP */ If you use both, it will add both. So if you don't import anything with #ifdef _OPENMP
#pragma omp for firstprivate(__pyx_v_thread_idx) lastprivate(__pyx_v_thread_idx)
#endif /* _OPENMP */
for (__pyx_t_11 = 0; __pyx_t_11 < __pyx_t_3; __pyx_t_11++){ You must have tested without using |
Agreed :/ |
Is there any way to guard the cimport with the ifdef explicitly?
|
(we can't Not without changing Cython since it's Cython that generates the .c code. The only solution I can think of is the one I mentioned before: Basically, for every single function that you can That sounds like a bad idea to me |
Actually there's a compile time conditional statement in cython. The syntax is So one possibility would be to protect every direct call to openmp, IF SKLEARN_OPENMP:
num_threads = openmp.omp_get_num_threads()
ELSE:
num_threads = 1
Do you think it's worth ? |
If we support this we need a CI testing it
My 2c
|
No, see #13390 (comment) (We're having the same discussion here and on #13390, might be better to stick to the issue) |
Yes I think it'd be clearer. Maybe |
I added some documentation. Let me know what you think. |
It is possible to build scikit-learn without OpenMP support by setting the | ||
``SKLEARN_NO_OPENMP`` environment variable. This is not recommanded since it | ||
will force some estimators to run in sequential mode and their ``n_jobs`` | ||
parameter will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this needs to happen before cythonizing
doc/developers/performance.rst
Outdated
@@ -68,7 +68,7 @@ following: | |||
with the gold standard, easy to debug Python version. | |||
|
|||
5. Once the code is optimized (not simple bottleneck spottable by | |||
profiling), check whether it is possible to have **coarse grained | |||
profiling), check whether it is possible to have **coarse graineds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did that end up here ?
Co-Authored-By: jeremiedbb <34657725+jeremiedbb@users.noreply.github.com>
…into build-no-openmp
@@ -0,0 +1,9 @@ | |||
# the build will fail if OpenMP support is not handled correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explicitly describe the cases where this file fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't know exactly. I know one case of failure: building with a compiler which does not support openmp.
But with the added functionalities, either the build crashes before, with an explicit error message or one has set SKLEARN_NO_OPENMP
and the build passes.
It's more like a unit test for the build. I'm not sure we should keep it. But there might be other reasons of failure that we don't know yet (there always are :/).
What explanation do you have in mind ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What explanation do you have in mind ?
Well, I asked because I can't think of any reason to keep it either. It seems redundant with check_openmp_support()
.
there might be other reasons of failure that we don't know yet (there always are :/).
Definitely agree. However, the fact that it's here makes me think there is a reason I don't understand, and it's making everything confusing ^^
I would suggest to remove it then, along with the changes to test_configure()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function check_openmp is never called? Yes, I think if can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to remove it then, along with the changes to test_configure().
I removed it. However the changes to test_configure()
are unrelated to that. Without those, the test would fail on osx with default apple clang compiler. We don't want to have to set all the env variables to enable openmp on default apple clang to run the test suite. One can use sklearn on a system which could not build scikit learn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without those, the test would fail on osx with default apple clang compiler.
Which tests would fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_configure
, the one we are talking about :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand it correctly, the only use-case for this are mac users that:
- want to run the tests
- don't need to build scikit-learn (else they would have set the env variables already and the checks would work)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly because they would have set the env variable when building but it won't be set anymore when testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks
sklearn/tests/test_common.py
Outdated
|
||
# OpenMP checks are build checks done before cythonization to allow | ||
# fast fail. Since sklearn can be built and tested on 2 different | ||
# machines, we disable it for a configure check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this addresses of very specific use-case, I think we need to be very clear on the explanation here. This has been confusing me for a while (I might still be missing something but feel free to correct me). Here's my attempt:
This test will run every setup.py and eventually call check_openmp_support(), which tries to compile a C file that uses openmp, unless SKLEARN_NO_OPENMP is set. Some users might want to run the tests without having build-support for openmp. In particular, mac users need to set some env variables to build with openmp support, and these might not be set anymore at test time. We thus temporarily set SKLEARN_NO_OPENMP, so that this test runs smoothly.
# users need to set some environment variables to build with openmp | ||
# support, and these might not be set anymore at test time. We thus | ||
# temporarily set SKLEARN_NO_OPENMP, so that this test runs smoothly. | ||
old_env = os.getenv('SKLEARN_NO_OPENMP') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this old_sklearn_no_openmp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick but LGTM
Thanks a lot @jeremiedbb
Let's just do this and see how it goes |
Thanks to you both! |
scikit-learn#13543)" This reverts commit bc1e414.
scikit-learn#13543)" This reverts commit bc1e414.
Fixes #13390
This PR adds a check for OpenMP support at the beginning of the build. If the check fails, the build crashes with an explicit error message.
It also add the possibility to build without OpenMP support by explicitly setting the environment variable
SKLEARN_NO_OPENMP
.