Building KHARMA
First, be sure to check out all of KHARMA's dependencies by running
$ git submodule update --init --recursive
This will grab KHARMA's two main dependencies (as well as some incidental things):
- The Parthenon AMR framework from LANL (accompanying documentation). Note that KHARMA actually uses a fork of Parthenon; see here.
- The Kokkos performance-portability library, originally from SNL. Many common questions and problems that are not covered here are answered by the Kokkos wiki and tutorials. Parthenon includes a list of its Parthenon-specific wrappers for Kokkos functions in its developer guide.
The dependencies KHARMA needs from the system are essentially the same as those of Parthenon and Kokkos:
- A C++14 compliant compiler with OpenMP (tested on several of GCC >= 8, Intel >= 19, nvc++ (formerly PGI) >= 21.9, clang++ and derivatives >= 12)
- MPI of some sort
- Parallel HDF5 compiled against said MPI
And optionally
- CUDA >= 10.2 and a CUDA-supported C++ compiler
OR
- Intel oneAPI < 21.4 and a compatible OpenCL compiler for Intel GPUs
All of these should come either as distribution packages for local Linux systems, or as modules on HPC systems, with the exception of parallel HDF5. Luckily it is quite easy to compile manually (a script is planned to be included with KHARMA soon). Once compiled, the installation location can be specified with the PREFIX_PATH variable when building KHARMA, as described below.
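As a rough illustration of the manual build mentioned above, the sketch below prints (and optionally runs) the usual autotools steps for a parallel HDF5 install. It is not the script planned for KHARMA. The names HDF5_SRC, PHDF5_PREFIX, DRY_RUN, run and build_phdf5 are hypothetical, the source tree is assumed to be already unpacked and to ship a configure script, and mpicc is assumed to be the name of your MPI compiler wrapper.

```shell
# Sketch only: configure and install parallel HDF5 from an unpacked source tree.
# HDF5_SRC, PHDF5_PREFIX and DRY_RUN are hypothetical names, not KHARMA variables.
HDF5_SRC="${HDF5_SRC:-$HOME/src/hdf5}"
PHDF5_PREFIX="${PHDF5_PREFIX:-$HOME/libs/phdf5}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
  # Always print the command; execute it only when DRY_RUN=0
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_phdf5() {
  run cd "$HDF5_SRC" || return 1
  # --enable-parallel builds the MPI-enabled library; CC must be the MPI wrapper
  run env CC=mpicc ./configure --enable-parallel --prefix="$PHDF5_PREFIX" || return 1
  run make -j4 || return 1
  run make install || return 1
}

build_phdf5
```

With DRY_RUN=0 the commands are executed rather than only printed. The resulting install location (PHDF5_PREFIX here) is what you would then pass as PREFIX_PATH.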
KHARMA uses cmake for building, and has a small set of bash scripts to handle loading the correct modules and giving the correct arguments to cmake on specific systems. Contributions with additional machine-specific code are welcome; see the examples in machines/.
Generally, on systems with a parallel HDF5 module, one can then run the following to compile for CPU with OpenMP:
./make.sh clean
You may be able to use the following to compile for GPU with CUDA:
./make.sh clean cuda
cmake will check default directories for each dependency, e.g. /usr/local, and provide detailed error messages about which libraries it is missing. In many cases, you will only need to add the path to your parallel HDF5:
PREFIX_PATH=/absolute/path/to/phdf5 HOST_ARCH=CPUVER ./make.sh clean
If you need to specify multiple custom-installed dependencies (e.g. CUDA), you can set PREFIX_PATH="/path/to/one;/path/to/two". PREFIX_PATH does not support spaces in paths, because shell escapes are hard.
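The semicolon-separated form is easy to get wrong by hand, since an unquoted ; ends a shell command. A minimal sketch of a helper that joins paths and refuses any path containing a space follows. The function join_prefix_paths is hypothetical and not part of KHARMA, and the two paths are placeholders.

```shell
# join_prefix_paths is a hypothetical helper, not part of KHARMA.
# The ( ... ) body runs in a subshell, so its variables stay local.
join_prefix_paths() (
  joined=""
  for p in "$@"; do
    case "$p" in
      *" "*) echo "error: path contains a space: $p" >&2; exit 1 ;;
    esac
    # Append with a ';' separator, but no leading ';' for the first entry
    joined="${joined:+$joined;}$p"
  done
  printf '%s\n' "$joined"
)

# Placeholder install locations
join_prefix_paths /opt/phdf5 /usr/local/cuda
# prints: /opt/phdf5;/usr/local/cuda
```

The result would then be passed in quotes, e.g. PREFIX_PATH="$(join_prefix_paths /opt/phdf5 /usr/local/cuda)" ./make.sh clean cuda.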
Several notes:
- After compiling once successfully, you do not need to specify clean any longer, unless you change the compiler or the location of dependencies. Invocations without clean will recompile only the files you've changed in KHARMA.
- Since many conda environments include a serial version of HDF5, having a conda environment loaded can prevent cmake from correctly finding the parallel version. Unload your conda environments before compiling code!
- To avoid adding the prefix variables during every compile, create a file machines/hostname.sh in the style of other files in that directory, setting any necessary environment variables and/or loading any necessary modules.
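A machine file along these lines might look like the sketch below. This is an assumption about the general shape, not a copy of any file in machines/, so check the existing files for the conventions make.sh actually expects. The helper conda_is_active, the module names and the paths are hypothetical. PREFIX_PATH and HOST_ARCH are the variables described on this page, and CONDA_PREFIX is the variable conda sets while an environment is active.

```shell
# Sketch of a hypothetical machines/mycluster.sh; compare with the real
# files in machines/ before relying on it.

# Warn if a conda environment is loaded, since it may shadow parallel HDF5
conda_is_active() { [ -n "${CONDA_PREFIX:-}" ]; }
if conda_is_active; then
  echo "warning: conda environment active at $CONDA_PREFIX; deactivate it before compiling" >&2
fi

# Module names are site-specific placeholders; uncomment and adapt on HPC systems
# module load gcc openmpi hdf5

# Placeholder install location and host architecture
export PREFIX_PATH="$HOME/libs/phdf5"
export HOST_ARCH="HSW"
```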
There are two additional useful arguments to make.sh (in addition to clean and cuda described above).
- debug will enable the DEBUG flag in the code, and more importantly enable bounds-checking in all Kokkos arrays. Useful for very weird undesired behavior and segfaults. Note, however, that most KHARMA checks, prints, and debugging output are actually enabled at runtime, under the <debug> section of the input deck.
- trace will print each part of a step to stderr as it is being run (technically, anywhere with a FLAG() statement in the code). This is useful for pinning down where segfaults are occurring, without manually bisecting the whole code with print statements.
The build script make.sh tries to guess an architecture when compiling, defaulting to code which will be reasonably fast on modern machines. However, you can manually specify a host and/or device architecture. For example, when compiling for CUDA:
PREFIX_PATH=/absolute/path/to/phdf5 HOST_ARCH=CPUVER DEVICE_ARCH=GPUVER ./make.sh clean cuda
Here CPUVER and GPUVER are the strings used by Kokkos to denote a particular architecture and set of compile flags, e.g. "SKX" for Skylake-X, "HSW" for Haswell, or "AMDAVX" for Ryzen/EPYC processors, and "VOLTA70", "TURING75", or "AMPERE80" for Nvidia GPUs. A list of a few common architecture strings is provided in make.sh, and a full, (usually) up-to-date list is kept in the Kokkos documentation. (Note that make.sh needs only the portion of the flag after Kokkos_ARCH_.)
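The numeric suffix of the GPU strings is the CUDA compute capability without the dot (7.0, 7.5, 8.0). A small sketch of that mapping, covering only the three GPU names mentioned above, is given here. The function kokkos_gpu_arch is hypothetical and not part of KHARMA or Kokkos.

```shell
# kokkos_gpu_arch is a hypothetical helper; it covers only the three
# Nvidia architecture strings named in the text.
kokkos_gpu_arch() {
  case "$1" in
    7.0) echo "VOLTA70" ;;
    7.5) echo "TURING75" ;;
    8.0) echo "AMPERE80" ;;
    *) echo "unknown compute capability: $1" >&2; return 1 ;;
  esac
}

kokkos_gpu_arch 8.0
# prints: AMPERE80
```

Assuming your driver's nvidia-smi supports the compute_cap query field, the input could come from nvidia-smi --query-gpu=compute_cap --format=csv,noheader. For any other GPU, consult the list in make.sh or the Kokkos documentation.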
KHARMA (or, especially, Kokkos and Parthenon) pushes C++14 to its limits, out to where compiler issues or new and untested backend incompatibilities can be exposed. Here's an incomplete list of known bad combinations:
- GCC version 7.3.0 exactly has a bug that makes it incapable of compiling a particular Parthenon function. For unfathomable reasons, it is very widely deployed as the compiler on machines.
- NVHPC toolkit versions 21.1 through 21.7, with the nvc/nvc++ compiler, do not compile Parthenon well. nvc++ works again in versions 21.9+, which are much more commonly deployed even as of 2/2022.
- Intel SYCL backend, any version before 2022. SYCL is a moving target in general, YMMV.
The error looks like:
kharma/external/parthenon/src/../../variant/include/mpark/variant.hpp(1613): error: parameter pack "Ts" was referenced but not expanded
This is because of a bug detailed here but fixed only for the Intel compiler. You can monkey-patch a similar change by making the following edit to external/variant/include/mpark/config.hpp:
-#if __has_builtin(__type_pack_element) && !(defined(__ICC))
+//#if __has_builtin(__type_pack_element) && !(defined(__ICC))
+#if 0
#define MPARK_TYPE_PACK_ELEMENT
#endif
This is due to throwing an error, which SYCL disallows in device code. Removing the throw statement restores the compile, diff soon^TM
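The GCC and NVHPC entries in the list of known bad combinations are simple version ranges, so they can be checked before starting a long build. The sketch below encodes just those two entries. The function names are hypothetical, the NVHPC check expects a plain MAJOR.MINOR string, and gcc -dumpfullversion is assumed to be available (it is on GCC 7 and later).

```shell
# Hypothetical helpers encoding two entries from the known-bad list above.

# GCC: exactly 7.3.0 is affected
is_known_bad_gcc() { [ "$1" = "7.3.0" ]; }

# NVHPC: 21.1 through 21.7 are affected; expects "MAJOR.MINOR"
is_known_bad_nvhpc() (
  major="${1%%.*}"
  minor="${1#*.}"
  [ "$major" = "21" ] && [ "$minor" -ge 1 ] 2>/dev/null && [ "$minor" -le 7 ] 2>/dev/null
)

# Check the gcc on PATH, if there is one
gcc_version="$(gcc -dumpfullversion 2>/dev/null || true)"
if is_known_bad_gcc "$gcc_version"; then
  echo "warning: GCC $gcc_version is known not to compile Parthenon" >&2
fi
```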