[OpenMP] kernel performance regression on AMDGPU #60300

ye-luo · 2023-01-25T21:42:59Z

regression caused by 5d1dc9f

Using https://github.com/ye-luo/miniqmc testing commit 6f526b6062682ec892fb02d2919484c8b4db0875

mkdir build_llvm_offload_real; cd build_llvm_offload_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx906 ..
make -j32 check_spo_batched; OMP_NUM_THREADS=8 rocprof --stats ./bin/check_spo_batched -m 2 -g "2 2 1" -w 80 -n 1
cat results.stats.csv

"Name","Calls","TotalDurationNs","AverageNs","Percentage"
"__omp_offloading_10304_1c0438c__ZN11qmcplusplus17einspline_spo_ompIfE18multi_evaluate_vghERKSt6vectorIPNS_6SPOSetESaIS4_EERKS2_IPNS_11ParticleSetESaISA_EEib_l406.kd",1536,320975426,208968,99.85018277279889

The total time, 320975426 ns, is much larger than the previous one, 192408385 ns.
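To put the two totals in perspective, here is a minimal sketch that reads a `results.stats.csv`-style row and computes the slowdown. The kernel name is shortened for readability, the helper name `slowdown` is hypothetical, and the per-call baseline assumes the earlier run also made 1536 calls, which the report does not state.

```python
import csv
import io

# Row taken from the results.stats.csv output above.
# The mangled kernel name is shortened here for readability.
STATS_CSV = '''"Name","Calls","TotalDurationNs","AverageNs","Percentage"
"multi_evaluate_vgh_l406.kd",1536,320975426,208968,99.85018277279889
'''

# Total kernel time reported for the build before the regressing commit.
PREVIOUS_TOTAL_NS = 192408385


def slowdown(stats_csv: str, baseline_total_ns: int) -> dict:
    """Compare the first kernel row of a rocprof stats CSV against a baseline total."""
    row = next(csv.DictReader(io.StringIO(stats_csv)))
    calls = int(row["Calls"])
    total_ns = int(row["TotalDurationNs"])
    return {
        "ratio": total_ns / baseline_total_ns,
        "avg_ns": total_ns / calls,
        # Assumption: the baseline run made the same number of kernel calls.
        "baseline_avg_ns": baseline_total_ns / calls,
    }


if __name__ == "__main__":
    result = slowdown(STATS_CSV, PREVIOUS_TOTAL_NS)
    print(f"slowdown: {result['ratio']:.2f}x")
    print(f"average per call: {result['avg_ns']:.0f} ns "
          f"vs {result['baseline_avg_ns']:.0f} ns")
```

With these numbers the kernel takes roughly two thirds longer than before.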

ye-luo · 2023-01-25T21:59:44Z

Adding CMAKE_CXX_FLAGS=-foffload-lto, I got 320684008.

llvmbot · 2023-01-25T22:18:14Z

@llvm/issue-subscribers-openmp

jdoerfert · 2023-01-26T00:48:58Z

@jhuber6 will fix this and backport it.

jhuber6 · 2023-01-26T19:25:17Z

Please backport 6185246 and 0bdde9d.

tstellar · 2023-01-26T22:55:02Z

/cherry-pick 6185246 0bdde9d

llvmbot · 2023-01-26T23:01:47Z

/branch llvm/llvm-project-release-prs/issue60300

The `OpenMPOpt` pass is pivotal to the performance of many OpenMP offloading programs. When we perform non-LTO builds with OpenMP, we used to link the OpenMP deviceRTL individually for each TU. This led to an additional Attributor run on the combined runtime and user code. When we used LTO, we lost that run and suffered a large performance degradation. This patch adds the missing extra `OpenMPOpt` pass to the LTO pipeline, which fixes the performance regression seen in applications that use OpenMP offloading in LTO mode. Previously, this wasn't legal to do, as we could emit new runtime calls into the module. That was fixed by D142646.

Depends on D142646

Fixes llvm/llvm-project#60300

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142650

(cherry picked from commit 6185246)

llvmbot · 2023-01-26T23:09:18Z

/pull-request llvm/llvm-project-release-prs#239

ye-luo assigned jdoerfert and jhuber6 Jan 25, 2023

github-actions bot added the new issue label Jan 25, 2023

EugeneZelenko added openmp performance and removed new issue labels Jan 25, 2023

jhuber6 closed this as completed in 6185246 Jan 26, 2023

jhuber6 added this to the LLVM 16.0.0 Release milestone Jan 26, 2023

EugeneZelenko reopened this Jan 26, 2023

EugeneZelenko added the release:backport label Jan 26, 2023

llvmbot mentioned this issue Jan 26, 2023

PR for llvm/llvm-project#60300 llvm/llvm-project-release-prs#239

Merged

CarlosAlbertoEnciso closed this as completed in SNSystems/llvm-debuginfo-analyzer@f5d8d17 Jan 27, 2023

nikic reopened this Jan 27, 2023

tstellar closed this as completed in llvm/llvm-project-release-prs#239 Jan 28, 2023

EugeneZelenko added the release:merged label Jan 28, 2023
