Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP] kernel performance regression on AMDGPU #60300

Closed
ye-luo opened this issue Jan 25, 2023 · 7 comments · Fixed by llvm/llvm-project-release-prs#239
Closed

[OpenMP] kernel performance regression on AMDGPU #60300

ye-luo opened this issue Jan 25, 2023 · 7 comments · Fixed by llvm/llvm-project-release-prs#239

Comments

@ye-luo
Copy link
Contributor

ye-luo commented Jan 25, 2023

regression caused by 5d1dc9f

Using https://github.com/ye-luo/miniqmc testing commit 6f526b6062682ec892fb02d2919484c8b4db0875

mkdir build_llvm_offload_real; cd build_llvm_offload_real
cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_OFFLOAD=ON -DOFFLOAD_TARGET=amdgcn-amdhsa -DOFFLOAD_ARCH=gfx906 ..
make -j32 check_spo_batched; OMP_NUM_THREADS=8 rocprof --stats ./bin/check_spo_batched -m 2 -g "2 2 1" -w 80 -n 1
cat results.stats.csv
"Name","Calls","TotalDurationNs","AverageNs","Percentage"
"__omp_offloading_10304_1c0438c__ZN11qmcplusplus17einspline_spo_ompIfE18multi_evaluate_vghERKSt6vectorIPNS_6SPOSetESaIS4_EERKS2_IPNS_11ParticleSetESaISA_EEib_l406.kd",1536,320975426,208968,99.85018277279889

time 320975426 is much larger than previous one 192408385

@ye-luo
Copy link
Contributor Author

ye-luo commented Jan 25, 2023

Adding CMAKE_CXX_FLAGS=-foffload-lto, I got 320684008.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 25, 2023

@llvm/issue-subscribers-openmp

@jdoerfert
Copy link
Member

@jhuber6 will fix this and backport it.

@jhuber6 jhuber6 added this to the LLVM 16.0.0 Release milestone Jan 26, 2023
@jhuber6
Copy link
Contributor

jhuber6 commented Jan 26, 2023

Please backport 6185246 and 0bdde9d.

@tstellar
Copy link
Collaborator

/cherry-pick 6185246 0bdde9d

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 26, 2023

/branch llvm/llvm-project-release-prs/issue60300

llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Jan 26, 2023
The `OpenMPOpt` pass is pivotal to the performance of many OpenMP
offloading programs. When we perform non-LTO builds with OpenMP we used
to link the OpenMP deviceRTL individually for each TU. This lead to us
getting an additional attributor run on the combined runtime and user
code. When we used LTO we lost a run and suffered a large performance
degradation. This patch simply adds in the extra `OpenMPOpt` pass that
we miss into the LTO pipeline. This patch fixes the performance
regression shown in applications that used OpenMP offloading in LTO
mode.

Previously, this wasn't legal to do as we could emit new runtime calls
into the module. That was fixed by D142646.

Depends on D142646

Fixes llvm/llvm-project#60300

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142650

(cherry picked from commit 6185246)
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 26, 2023

/pull-request llvm/llvm-project-release-prs#239

@nikic nikic reopened this Jan 27, 2023
tstellar pushed a commit to llvm/llvm-project-release-prs that referenced this issue Jan 28, 2023
The `OpenMPOpt` pass is pivotal to the performance of many OpenMP
offloading programs. When we perform non-LTO builds with OpenMP we used
to link the OpenMP deviceRTL individually for each TU. This lead to us
getting an additional attributor run on the combined runtime and user
code. When we used LTO we lost a run and suffered a large performance
degradation. This patch simply adds in the extra `OpenMPOpt` pass that
we miss into the LTO pipeline. This patch fixes the performance
regression shown in applications that used OpenMP offloading in LTO
mode.

Previously, this wasn't legal to do as we could emit new runtime calls
into the module. That was fixed by D142646.

Depends on D142646

Fixes llvm/llvm-project#60300

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142650

(cherry picked from commit 6185246)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment