[WIP] GPU Abstraction to also target HIP in cudacpp implementation & Profiling infrastructure #718

Jooorgen · 2023-06-21T17:31:50Z

This is the current state of the GPU abstraction I have been working on. It also includes the profiling infrastructure, but bear in mind that the SYCL part does not currently work as intended: I first need some documentation from Nathan on how to compile it.
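To make the idea concrete, here is a minimal sketch of a single-header abstraction. It assumes that every runtime call goes through `gpu*` names that expand to the CUDA runtime under `nvcc` (`__CUDACC__`) and to the HIP runtime under `hipcc` (`__HIPCC__`). `gpuSuccess` and `MGONGPUCPP_GPUIMPL` appear in this PR. The other `gpu*` names, the host-only fallback branch and the `roundTrip` helper are hypothetical, so this is an illustration of the pattern, not the actual content of `GpuRuntime.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

#if defined(__CUDACC__)
#include <cuda_runtime.h>
#define MGONGPUCPP_GPUIMPL // defined only for GPU builds (value not used here)
typedef cudaError_t gpuError_t;
#define gpuSuccess cudaSuccess
#define gpuMalloc cudaMalloc
#define gpuFree cudaFree
#define gpuMemcpy cudaMemcpy
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
// Hypothetical kernel-launch wrapper: hides the launch syntax behind one name
#define gpuLaunchKernel(kernel, blocks, threads, ...) kernel<<<(blocks), (threads)>>>(__VA_ARGS__)
#elif defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define MGONGPUCPP_GPUIMPL
typedef hipError_t gpuError_t;
#define gpuSuccess hipSuccess
#define gpuMalloc hipMalloc
#define gpuFree hipFree
#define gpuMemcpy hipMemcpy
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#define gpuLaunchKernel(kernel, blocks, threads, ...) kernel<<<(blocks), (threads)>>>(__VA_ARGS__)
#else
// Hypothetical host-only stand-in (no GPU compiler): "device" memory is heap memory.
enum gpuError_t { gpuSuccess = 0, gpuErrorMemoryAllocation = 2 };
enum gpuMemcpyKind { gpuMemcpyHostToDevice, gpuMemcpyDeviceToHost };
inline gpuError_t gpuMalloc(void** ptr, std::size_t bytes) {
  *ptr = std::malloc(bytes);
  return *ptr != nullptr ? gpuSuccess : gpuErrorMemoryAllocation;
}
inline gpuError_t gpuFree(void* ptr) {
  std::free(ptr);
  return gpuSuccess;
}
inline gpuError_t gpuMemcpy(void* dst, const void* src, std::size_t bytes, gpuMemcpyKind) {
  std::memcpy(dst, src, bytes);
  return gpuSuccess;
}
#endif

// Hypothetical helper written only against the gpu* names: the same source
// compiles for CUDA, for HIP, or for the host fallback above.
inline bool roundTrip(const double* in, double* out, std::size_t n) {
  void* dev = nullptr;
  const std::size_t bytes = n * sizeof(double);
  if (gpuMalloc(&dev, bytes) != gpuSuccess) return false;
  const bool ok = gpuMemcpy(dev, in, bytes, gpuMemcpyHostToDevice) == gpuSuccess &&
                  gpuMemcpy(out, dev, bytes, gpuMemcpyDeviceToHost) == gpuSuccess;
  gpuFree(dev);
  return ok;
}
```

The design choice in this sketch is that host and kernel code are written once against the `gpu*` names; switching vendor only changes which branch of the header the preprocessor selects.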

I just ran it successfully on LUMI-G with MI250x GPUs. As a side note, I seem to have an issue running on the GPUs we got from LHCb, whereas on LUMI it apparently runs fine.

LHCb development GPUs (MI250x):

[jteig@n4051701 P1_epem_mupmum]$ ./gcheck.exe 32 32 10

ERROR! assertGpu: 'no ROCm-capable device is detected' (100) in ./GpuRuntime.h:62

gcheck.exe: ./GpuRuntime.h:21: void assertGpu(hipError_t, const char *, int, bool): Assertion `code == gpuSuccess' failed.

Aborted (core dumped)

vs.

LUMI-G GPUs (MI250x):

teigjorg@nid005027:~/madgraph4gpu/epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_epem_mupmum> ./gcheck.exe 32 32 10

..........

teigjorg@nid005027:~/madgraph4gpu/epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_epem_mupmum>
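The LHCb failure is reported by the `assertGpu` check in `GpuRuntime.h`. Judging from the log, it prints the runtime's error string and code together with the call site, then asserts `code == gpuSuccess`. Error code 100 is `hipErrorNoDevice` in HIP (and `cudaErrorNoDevice` in CUDA): the runtime saw no usable GPU at all. That suggests a node-level problem (driver or device visibility) rather than a problem in the generated code, which would fit the same binary running on LUMI. Below is a minimal sketch of this error-checking pattern, assuming the message format shown in the log. The `checkGpu` macro, the `formatGpuError` helper and the host-only fallback branch are hypothetical, and the real `GpuRuntime.h` may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#if defined(__CUDACC__)
#include <cuda_runtime.h>
typedef cudaError_t gpuError_t;
#define gpuSuccess cudaSuccess
#define gpuGetErrorString cudaGetErrorString
#elif defined(__HIPCC__)
#include <hip/hip_runtime.h>
typedef hipError_t gpuError_t;
#define gpuSuccess hipSuccess
#define gpuGetErrorString hipGetErrorString
#else
// Hypothetical host-only stand-in so the sketch builds without a GPU toolkit.
enum gpuError_t { gpuSuccess = 0, gpuErrorNoDevice = 100 };
inline const char* gpuGetErrorString(gpuError_t code) {
  return code == gpuSuccess ? "no error" : "no GPU device is detected";
}
#endif

// Builds a diagnostic line in the same format as the log above (hypothetical helper).
inline std::string formatGpuError(const char* what, int code, const char* file, int line) {
  char buf[512];
  std::snprintf(buf, sizeof(buf), "ERROR! assertGpu: '%s' (%d) in %s:%d", what, code, file, line);
  return buf;
}

// Print the error and, if requested, stop via assert, as the log suggests.
inline void assertGpu(gpuError_t code, const char* file, int line, bool abortOnError = true) {
  if (code != gpuSuccess) {
    std::printf("%s\n", formatGpuError(gpuGetErrorString(code), static_cast<int>(code), file, line).c_str());
    if (abortOnError) assert(code == gpuSuccess);
  }
}

// Hypothetical wrapper: reports the file and line of the wrapped call.
#define checkGpu(code) { assertGpu((code), __FILE__, __LINE__); }
```

Each runtime call is then wrapped, e.g. `checkGpu( hipSetDevice( 0 ) );` in a HIP build, so a failure points at the call site.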


I am now working in branch valassi/jthip24. This is my first commit over the joorgen/gpu_abstraction branch (also known as valassi/jthip), as of commit 229ffeb (Tue Aug 15 11:33:01 2023 +0200). This branch contains fewer features than jooorgen/master (PR madgraph5#718), but it is more advanced than joorgen/gpu_abstraction_only (PR madgraph5#774). I will probably need some of the commits here to fix PR madgraph5#774 in branch valassi/jt774. I regenerate the ten processes as follows:

./CODEGEN/generateAndCompare.sh ee_mumu --mad
./CODEGEN/generateAndCompare.sh gg_tt --mad
./CODEGEN/generateAndCompare.sh gg_ttg --mad
./CODEGEN/generateAndCompare.sh gg_ttgg --mad
./CODEGEN/generateAndCompare.sh gg_ttggg --mad
./CODEGEN/generateAndCompare.sh ee_mumu
./CODEGEN/generateAndCompare.sh gg_tt
./CODEGEN/generateAndCompare.sh gg_ttg
./CODEGEN/generateAndCompare.sh gg_ttgg
./CODEGEN/generateAndCompare.sh gg_ttggg

These five processes (each in its .mad and .sa variant) are the ones that would get conflicts when I merge upstream/master here, so I guess that these ten process directories are the only ones touched in this branch. The fact that I can regenerate them with no real differences (except for irrelevant files such as me5_configuration.txt, aloha_file.inc and py3_model.pkl) shows that ALL IMPORTANT CHANGES BY JORGEN HERE ARE IN THE CODEGEN. I can therefore merge upstream/master, fix the conflicts in CODEGEN and regenerate.

Jooorgen and others added 30 commits March 31, 2023 17:11

Removed status badges because they eventually have to get added again

7eb8da7

Merge branch 'madgraph5:master' into master

27cb7d4

Added script for starting a container running the profiling

8946a0c

Merge branch 'master' of https://github.com/Jooorgen/madgraph4gpu

abe5064

Reverted changes to the sycl directory

c5d0fb7

Merge branch 'madgraph5:master' into master

f531206

Remove CVMFS from profiler workflows

cc39fe8

Merge branch 'master' of https://github.com/Jooorgen/madgraph4gpu

ac6112f

Removed CVMFS from CXX variable

d98e8f6

Changed GCC version in CUDA a100 Profiler to whats in container

6848061

Testing abstraction of CUDA function to seperate header file

5d87a1e

Fleshed out HIP macros and added missing macros in code

9608778

Added new GpuRuntime to replace CudaRuntime and added macros for kernel launches

e4dc25e

Added macro for __CUDACC_

39836e0

Changed name of gpu_abstraction to fall inline with naming scheme

98d02bb

Changed name of GPUCC macro to MGONGPUCPP_GPUIMPL also did some cleanup

ecee14d

Added GPU abstraction in src directory as well

78a8119

Merge branch 'madgraph5:master' into master

4f98bcd

Added some WIP changes to compile with HIP

098219a

Merge branch 'master' into gpu_abstraction

9ecf523

Dont know what happened here

11e392d

Cleanup of sync with master

e8779eb

More cleanup from sync with master

e5f1070

Added first round of fixes from sync with master

a7da6ef

Added back include for abstraction in mgOnGpuConfig.h

d69762d

Added some fixes

5459bbc

Made a change to tthe cudaccpp.mk file

abbc9af

Made small fix to makefile

58174f6

Added compilation for HIP in makefile

a46b3f9

Removed typo

ce8a20c

Jooorgen added 28 commits August 9, 2023 14:17

[CODEGEN] Regenerated all .sa and .mad processes with new HIP compilation

e89ae42

Reverted changes to FC env variable

b79fe00

Merge branch 'gpu_abstraction' of github.com:Jooorgen/madgraph4gpu into gpu_abstraction

fafcb95

Merge pull request #9 from Jooorgen/gpu_abstraction

9bb4d3f

CI checks

Changed position of exporting GPUCC and GPUFLAGS in cudacpp.mk

3b8ce7e

Added correct HIP_PLATFORM when compiling for HIP in cudacpp_src.mk

609d548

Moved "-c -x cu" behind an ifeq nvcc

e1bb745

Changed ifdef back to __CUDACC__ in mgOnGpuCxtypes.h

5097e9d

Revert changes from last commit because it is handled elsewhere inn code

762122b

Added back -c to HIP compilation in src mkfile

ab3a60b

Fix for compilation error with std::complex using cxsmpl

e3e11dc

Export HIPARCHFLAGS and set AMD ARCH in cudacpp_src.mk, also change CUFLAGS to GPUFLAGS in cudacpp_src.mk

87031ad

[CODEGEN] Added changes from gg_ttgg.mad to code generator

e859159

[CODEGEN] Added export of GPUCC and GPUFLAGS to codegen

2920ff2

[CODEGEN] Regenerate all 5 .sa/.mad processes (ee_mumu -> gg_ttggg) based on gg_ttgg.mad

be4bf04

Merge pull request #10 from Jooorgen/gpu_abstraction

ac8a2e8

[CODEGEN] Regenerate all 5 .sa/.mad processes based on gg_ttgg.mad

Fixed warning and changed HIPARCHFLAGS export so it exports to cudacpp_src.mk

4defb73

Added -std=c++17 to GPUFLAGS in cudacpp_src.mk

088b329

Revert changes to cudacpp_src.mk, exporting GPUFLAGS now working as expected

1899fe3

[CODEGEN] Fixed error in runTest.cc and reverted changes in cudacpp_src.mk and cudacpp.mk

d38ba00

Added changes from CODEGEN into gg_ttgg.mad

62b3e36

[CODEGEN] Regeneratead all 5 .sa/.mad processes ro remove all errors in code

efef15d

Merge pull request #11 from Jooorgen/gpu_abstraction

dbe3240

Removed all warnings in HIP compilation

Merge remote-tracking branch 'upstream/master'

8949571

Added warnings if name prefix variable is not set

d889085

Merge remote-tracking branch 'upstream/master'

3821d30

Merge latest remote-tracking branch 'upstream/master'

a30343c

Merge branch 'madgraph5:master' into master

2f97348

valassi mentioned this pull request Jan 25, 2024

NOT TO BE MERGED - ("jthip24") Jorgen's earlier work on HIP before PR #774 and my fixes on that #800

Closed
