Installation issue: HIP package sets excessive environment variables in dependent package modulefiles #44295

Open
2 of 4 tasks
hagertnl opened this issue May 21, 2024 · 0 comments


Steps to reproduce the issue

A ROCm-enabled Kokkos modulefile is pasted below. Its bottom 12 lines duplicate settings made by a rocm modulefile. Users will presumably also have a rocm modulefile loaded when compiling Kokkos code, so whichever of the Kokkos and rocm modulefiles is loaded last wins, and you can only hope that both refer to the same ROCm version. See the Additional information section at the end of this ticket for a proposed fix.

prepend_path("PATH","/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/kokkos-3.6.00-coe525zpyny2uj2qzosexagb2uf6idcf/bin")
prepend_path("LD_LIBRARY_PATH","/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/kokkos-3.6.00-coe525zpyny2uj2qzosexagb2uf6idcf/lib64")
prepend_path("CMAKE_PREFIX_PATH","/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/kokkos-3.6.00-coe525zpyny2uj2qzosexagb2uf6idcf/")
prepend_path("PATH","/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/kokkos-3.6.00-coe525zpyny2uj2qzosexagb2uf6idcf/./bin")
prepend_path("CMAKE_PREFIX_PATH","/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/cce-15.0.0/kokkos-3.6.00-coe525zpyny2uj2qzosexagb2uf6idcf/./")
setenv("ROCM_PATH","/opt/rocm-5.3.0")
setenv("HIP_PLATFORM","amd")
setenv("HIP_COMPILER","clang")
setenv("HIP_CLANG_PATH","/opt/rocm-5.3.0/llvm/bin")
setenv("HSA_PATH","/opt/rocm-5.3.0/hsa")
setenv("ROCMINFO_PATH","/opt/rocm-5.3.0")
setenv("DEVICE_LIB_PATH","/opt/rocm-5.3.0/amdgcn/bitcode")
setenv("HIP_DEVICE_LIB_PATH","/opt/rocm-5.3.0/amdgcn/bitcode")
setenv("HIP_PATH","/opt/rocm-5.3.0")
setenv("LLVM_PATH","/opt/rocm-5.3.0/llvm")
append_path{"HIPCC_COMPILE_FLAGS_APPEND","--rocm-path=/opt/rocm-5.3.0",delim=" "}
setenv("HCC_AMDGPU_TARGET","gfx90a")

See

for why this is happening.

Error message

No error message, just a bad modulefile.

Information on your system

N/A for this issue. Can be discussed directly based on package source code.

Additional information

Maintainers @haampie @renjithravindrankannath @srekolam, CC @becker33 since we talked about this at CUG24.

Current behavior: the HIP package sets all the variables traditionally defined in a rocm modulefile in the modulefile of every dependent package. Kokkos+rocm is a very simple example of this. This can cause conflicts and undefined behavior. Suppose a user loads rocm/5.7.1, then loads kokkos, which was built with ROCm 5.3.0 and carries all those ROCm 5.3.0 environment variables in its modulefile. The environment now has a mix of ROCm versions, and the user is actually getting ROCm 5.3.0 libraries because Kokkos was loaded last.
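The load-order problem can be illustrated with a toy simulation. This is a minimal sketch, not Lmod or Spack code: `load_module` is a hypothetical helper that applies `setenv` and `prepend_path` operations to a dictionary in load order, and the contents of the `rocm/5.7.1` modulefile are assumed for illustration.

```python
# Toy model of module loading (hypothetical, not Lmod or Spack code).
# Each module is a list of (operation, variable, value) tuples that are
# applied in load order, so a later setenv overwrites an earlier one.

def load_module(env, ops):
    """Apply a module's operations to env and return env."""
    for kind, name, value in ops:
        if kind == "setenv":
            env[name] = value
        elif kind == "prepend_path":
            existing = env.get(name)
            env[name] = value if not existing else value + ":" + existing
        else:
            raise ValueError("unknown operation: " + kind)
    return env

# Assumed contents of a site rocm/5.7.1 modulefile (illustrative only).
rocm_571 = [
    ("prepend_path", "PATH", "/opt/rocm-5.7.1/bin"),
    ("setenv", "ROCM_PATH", "/opt/rocm-5.7.1"),
    ("setenv", "HIP_PATH", "/opt/rocm-5.7.1"),
]

# Subset of the variables set by the Kokkos modulefile in this issue.
kokkos_built_with_rocm_530 = [
    ("setenv", "ROCM_PATH", "/opt/rocm-5.3.0"),
    ("setenv", "HIP_PATH", "/opt/rocm-5.3.0"),
    ("setenv", "HIP_CLANG_PATH", "/opt/rocm-5.3.0/llvm/bin"),
]

env = {}
load_module(env, rocm_571)                    # user loads rocm/5.7.1 first
load_module(env, kokkos_built_with_rocm_530)  # then loads kokkos

# PATH still points at 5.7.1 while ROCM_PATH and HIP_PATH point at 5.3.0.
for name in sorted(env):
    print(name, "=", env[name])
```

In this model, swapping the two `load_module` calls flips `ROCM_PATH` and `HIP_PATH` back to the 5.7.1 values, which is the load-order dependence described above.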

Proposed behavior: when hip is enabled by +rocm, autoload or prereq the rocm module (ideally the specific version of rocm that the package was built with) instead of setting all the environment variables. This results in much cleaner modulefiles.
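As a rough sketch of what the proposal could produce, the snippet below filters the ROCm-owned variables out of a dependent's modulefile and emits a single Lmod `depends_on` line instead. Everything here is hypothetical: `render_modulefile` and `ROCM_OWNED_VARS` are illustrative names, not Spack APIs, the module name `rocm/5.3.0` assumes a site naming scheme, and the Kokkos prefix is a placeholder.

```python
# Hypothetical sketch of the proposed behavior; none of these names are
# Spack APIs. The variable list is taken from the modulefile in this issue.
ROCM_OWNED_VARS = {
    "ROCM_PATH", "HIP_PLATFORM", "HIP_COMPILER", "HIP_CLANG_PATH",
    "HSA_PATH", "ROCMINFO_PATH", "DEVICE_LIB_PATH", "HIP_DEVICE_LIB_PATH",
    "HIP_PATH", "LLVM_PATH", "HIPCC_COMPILE_FLAGS_APPEND", "HCC_AMDGPU_TARGET",
}

def render_modulefile(ops, rocm_module):
    """Render Lmod lines, replacing ROCm-owned variables with depends_on."""
    lines = ['depends_on("{}")'.format(rocm_module)]
    for kind, name, value in ops:
        if name in ROCM_OWNED_VARS:
            continue  # left to the rocm module that depends_on loads
        lines.append('{}("{}","{}")'.format(kind, name, value))
    return "\n".join(lines)

# Placeholder prefix; the real install prefix is much longer.
kokkos_ops = [
    ("prepend_path", "PATH", "/path/to/kokkos/bin"),
    ("prepend_path", "LD_LIBRARY_PATH", "/path/to/kokkos/lib64"),
    ("setenv", "ROCM_PATH", "/opt/rocm-5.3.0"),
    ("setenv", "HIP_PLATFORM", "amd"),
]

output = render_modulefile(kokkos_ops, "rocm/5.3.0")
print(output)
```

With this approach the ROCm variables are defined in exactly one place, the rocm modulefile, so a dependent module can no longer silently override them.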

General information

  • I have run spack debug report and reported the version of Spack/Python/Platform
  • I have run spack maintainers <name-of-the-package> and @mentioned any maintainers
  • I have uploaded the build log and environment files
  • I have searched the issues of this repo and believe this is not a duplicate