To get NVTX working, the nvhpc/21.9 module must be available and loaded. In the mkmf.template files, we have been using an additional variable, ACCFLAGS, to set options for the nvfortran compiler.
To run the DART get_close kernel, these are the additional compiler flags:
ACCFLAGS = -acc -ta=tesla:cc70,deepcopy,pinned -Minfo=accel -Mnofma -r8
deepcopy is of particular concern for us. DART has a lot of nested derived types: type%type%type. The compiler was not reliably able to determine that the nested types needed copying to the GPU. The deepcopy flag forces this, but ideally you would not force a deep copy on everything. Improvements to the compiler would be needed to fix this. There is a workaround that forces the correct copy in the code: adding a loop around the OpenACC directives. However, this hurts code readability because it looks like a pointless loop.

-Mnofma was to force less optimization while debugging (it disables fused multiply-add instructions).

-r8 was to force double precision type conversions. This was a sanity check while debugging memory problems. It was not needed in the end.

-Minfo=accel prints out at compile time what the compiler was able to parallelize. It is similar to the old Intel -vec-report flag.

cc70 is the compute capability, so this depends on the graphics card. Ascent (Oak Ridge's machine) and Casper have V100 GPUs, so you use cc70. Perlmutter has A100 GPUs (same as Derecho), so you use cc80. This is not intuitive at all for users.
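Since the GPU-to-compute-capability mapping is easy to get wrong, a small shell helper could generate the ACCFLAGS line for a given GPU model. This is only a sketch: the function names cc_for_gpu and accflags_for_gpu are hypothetical and not part of DART or mkmf, and the only mappings included are the two stated above (V100 to cc70, A100 to cc80).

```shell
# Hypothetical helper (not part of DART): map a GPU model name to the
# compute-capability option used in -ta=tesla:<cc>. Only the two
# mappings mentioned in these notes are covered.
cc_for_gpu() {
    case "$1" in
        V100) echo "cc70" ;;   # Casper, Ascent
        A100) echo "cc80" ;;   # Perlmutter, Derecho
        *)    echo "unknown GPU model: $1" >&2
              return 1 ;;
    esac
}

# Hypothetical helper: print an ACCFLAGS line suitable for pasting into
# an mkmf.template file, using the deepcopy and pinned options from above.
accflags_for_gpu() {
    _cc=$(cc_for_gpu "$1") || return 1
    echo "ACCFLAGS = -acc -ta=tesla:${_cc},deepcopy,pinned"
}
```

For example, running accflags_for_gpu V100 prints "ACCFLAGS = -acc -ta=tesla:cc70,deepcopy,pinned". An unrecognized model name returns a non-zero status rather than guessing a value.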
ACCFLAGS = -acc -ta=tesla:cc80,deepcopy (3x)
ACCFLAGS = -acc -ta=tesla:cc80,deepcopy,pinned (15x)