Limited to single CPU core on v16.x+ #91098
@llvm/issue-subscribers-openmp Author: None (kmpanilla)
@kmpanilla, do you mind running the app with both these environment variables set: for both a broken OpenMP runtime and a working OpenMP runtime, and attaching the OpenMP runtime outputs here?
@jpeyton52, see attached for the outputs: 15-output.txt = 15.x (working fine), 17-output.txt = 17.x (broken). We ran these on a single-CPU (4-core) server to minimize the output. The behavior is the same on a multi-CPU server, though. Let me know what else I can supply. Thank you!
Thanks for the logs. Can you try setting KMP_AFFINITY=none with one of the broken OpenMP runtimes to see whether the desired behavior is restored?
Setting KMP_AFFINITY=none as suggested made all the CPU cores start working as expected on the previously broken versions! Is there a reason this would be required for proper functionality on FreeBSD with v16+, and what would a permanent fix be?
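For anyone who needs the workaround before a fixed runtime is available, here is a minimal sketch of launching a command with KMP_AFFINITY=none in its environment. The helper name `run_with_affinity_none` is hypothetical and not part of any library. It assumes the OpenMP-linked program reads the variable from the environment it is started with.

```python
import os
import subprocess


def run_with_affinity_none(cmd):
    """Run cmd with KMP_AFFINITY=none added to a copy of the current environment."""
    env = dict(os.environ)        # copy, so the parent's environment is left untouched
    env["KMP_AFFINITY"] = "none"  # the workaround suggested in this thread
    # check=True raises CalledProcessError if the command exits with a non-zero status
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```

A call might look like `run_with_affinity_none(["magick", "in.png", "-resize", "50%", "out.png"])`, where the file names are placeholders. For a long-running service such as PHP, the same variable would have to be set in the environment of the service itself.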
It's a bug in the |
When a child process is forked with OpenMP already initialized, the child process resets its affinity mask and sets proc-bind-var to false so that the entire original affinity mask is used. This patch corrects an issue with the affinity initialization code setting affinity to compact instead of none for this special case of forked children. Fixes: llvm#91098
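To illustrate the behavior this description expects (a forked child ends up with the entire original affinity mask instead of a narrowed one), here is a minimal sketch. It is not libomp's code. It assumes a platform where Python provides `os.sched_getaffinity` and `os.sched_setaffinity` (Linux, for example), and the function name `child_mask_after_fork` is hypothetical.

```python
import os


def child_mask_after_fork():
    """Fork with a narrowed mask; the child restores the original mask and reports it."""
    original = os.sched_getaffinity(0)   # mask before any pinning
    pinned = {min(original)}             # stand-in for a runtime pinning the thread
    os.sched_setaffinity(0, pinned)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits the narrowed mask, then widens back to the original one.
        os.close(read_fd)
        os.sched_setaffinity(0, original)
        report = ",".join(str(cpu) for cpu in sorted(os.sched_getaffinity(0)))
        os.write(write_fd, report.encode())
        os._exit(0)

    # Parent: read the child's report, reap it, and undo the pinning.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        report = reader.read()
    os.waitpid(pid, 0)
    os.sched_setaffinity(0, original)
    return original, {int(cpu) for cpu in report.split(",")}
```

If the child skipped the restore step, it would stay on the single CPU in `pinned`, which matches the single-core symptom reported in this issue.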
Your PR fix works great for me when added to v18.x. Thanks! |
When a child process is forked with OpenMP already initialized, the child process resets its affinity mask and sets proc-bind-var to false so that the entire original affinity mask is used. This patch corrects an issue with the affinity initialization code setting affinity to compact instead of none for this special case of forked children. The test intended to catch this only tested an explicit setting of KMP_AFFINITY=none. Add a test run with no KMP_AFFINITY setting. Fixes: #91098
When a child process is forked with OpenMP already initialized, the child process resets its affinity mask and sets proc-bind-var to false so that the entire original affinity mask is used. This patch corrects an issue with the affinity initialization code setting affinity to compact instead of none for this special case of forked children. The test intended to catch this only tested an explicit setting of KMP_AFFINITY=none. Add a test run with no KMP_AFFINITY setting. Fixes: llvm#91098 (cherry picked from commit 73bb8d9)
We run ImageMagick linked with OpenMP to use multiple CPUs, and have done so for years without issue.
We recently upgraded some of our servers to FreeBSD 13.3 or 14.0, which ship llvm-17.0.6 (FreeBSD 13.3) and llvm-16.0.6 (FreeBSD 14.0). Previously, we were on FreeBSD 13.2, which shipped llvm-14.0.5.
After upgrading to llvm-16 or llvm-17, we're noticing that ImageMagick, when linked with OpenMP, is stuck on a single CPU core. We tracked it down to the libomp.so version we're using from the OS, /usr/lib/libomp.so. If we manually install llvm15 or llvm14 and copy over the older libomp.so from either, the problem goes away.
The question is, what changed between llvm15 and llvm16 that would cause us to be limited to a single CPU core?
Our application is called from PHP using pecl-imagick and ImageMagick7 (compiled with OpenMP). The PHP version doesn't matter (we've reproduced on 8.1, 8.2 and 8.3). We only have the issue when LLVM is upgraded from 14 to 16 or 17. In fact, if we hot-copy the v15 or v14 libomp.so back to /usr/lib, everything works as expected again.
Here's the breakdown of working vs broken:
llvm14-14.0.6 - WORKS
llvm15-15.0.7 - WORKS
llvm16-16.0.6 - BROKE
llvm17-17.0.6 (13.3 default) - BROKE
llvm19-19.0.d20240426 - BROKE
Anything I can supply to help narrow this down? I'd love to get something pushed to all the branches of 16 and later, if we can solve the issue.
Thanks.