Spack environment with ROCm 5.6 fails to build PETSc #65

Open
2 of 12 tasks
nkoukpaizan opened this issue Nov 7, 2023 · 4 comments
@nkoukpaizan
Collaborator

Issue type

  • New feature
  • Bug
  • Discussion
  • Other

Relates to

  • OPFLOW
  • SOPFLOW
  • SCOPFLOW
  • TCOPFLOW
  • CMake build system
  • Spack configuration
  • Manual
  • Web docs
  • Other

Summary

While attempting to upgrade to ROCm 5.6 on Frontier (see nicholson/frontier-rocm5.6), I found that PETSc fails to build.

The error is an ICE (Internal Compiler Error):
>> 2973 fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
>> 3021 clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)

Full log: spack-build-out.txt

I'll try a few more things. Building PETSc from source outside of the Spack environment seems to work fine with ROCm 5.6.

@nkoukpaizan nkoukpaizan self-assigned this Nov 7, 2023
@nkoukpaizan
Collaborator Author

I tracked the issue down to an architecture-specific portion of the PETSc code. The Spack environment sets target: [zen3], so Spack appends -march=znver3 -mtune=znver3 to the compiler flags. Because znver3 supports AVX2 and FMA, these flags cause __AVX2__ and __FMA__ to be defined, and the compiler fails on a section of code guarded by #if defined(__AVX2__) && defined(__FMA__) .... The code compiles fine when that section is excluded, which is what happens with -march=x86-64 -mtune=generic (target: [x86-64] in Spack).

I can now reproduce the issue outside of my Spack environment by adding --CFLAGS="-march=znver3 -mtune=znver3" to the PETSc configure line. My reproducer compiles with amdclang from amd/5.2 through amd/5.5.1, but not with amd/5.6 or amd/5.7, which points to a regression in the compiler.

I'll simplify the reproducer so that I can file a bug report with OLCF and AMD. In the meantime, I am rebuilding our software stack with target: [x86-64], though that may hurt performance, since generic x86-64 code cannot use the AVX2/FMA paths.

CC: @pelesh @cameronrutherford

@cameronrutherford
Contributor

Huh. Fascinating. Hopefully we can fix this in the newest version of that compiler toolchain. cc @balay

@balay

balay commented Nov 8, 2023

cc: @jczhang07

@jczhang07

@balay it failed on code that is not related to the GPU. As mentioned above, it seems like a compiler bug. I am not sure what we can do on the PETSc side to work around it (with --CFLAGS="-march=znver3 -mtune=znver3" and amdclang 5.6+).

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /opt/rocm-5.6.0/llvm/bin/clang -march=znver3 -mtune=znver3 -I/lustre/orion/csc359/proj-shared/nkouk/spack-cache/build-stage/spack-stage-petsc-3.19.6-a75cj3sn6y5jpcseh5zemwmih53g2oto/spack-src/include -I/lustre/orion/csc359/proj-shared/nkouk/spack-cache/build-stage/spack-stage-petsc-3.19.6-a75cj3sn6y5jpcseh5zemwmih53g2oto/spack-src/arch-linux-c-opt/include -I/opt/rocm-5.6.0/include -I/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/include -I/lustre/orion/csc359/proj-shared/nkouk/spack-install/linux-sles15-zen3/gcc-12.2.0-mixed/openblas-0.3.20-7hydqmqje2llj2tehcwgr55bhtp5bul2/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/llvm/include -I/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/include -c -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3 -MMD -MP /lustre/orion/csc359/proj-shared/nkouk/spack-cache/build-stage/spack-stage-petsc-3.19.6-a75cj3sn6y5jpcseh5zemwmih53g2oto/spack-src/src/mat/impls/baij/seq/baij2.c -o arch-linux-c-opt/obj/mat/impls/baij/seq/baij2.o
