Rename AMD GPU architectures #6266

Rombur · 2023-07-06T20:51:35Z

This is a proposal to rename the AMD architectures in Kokkos. Currently, to enable an AMD GPU we need to use names like Kokkos_ARCH_VEGA90A, i.e. VEGA + the GFX flag or NAVI + the GFX flag. This works pretty well for the NAVI cards but it's terrible for the VEGA cards. The code name Vega corresponds to the MI50/60 cards. The code name for the MI100 is Arcturus and the code name for the MI200 series is Aldebaran. Every year the Vega name becomes more obsolete, so we need a new name.

At first, I was thinking about using MI50, MI100, etc. The problem is that an MI50 and an MI60 are identical from our point of view, and there is a similar issue with the MI200 series. CMake also supports AMD GPUs and simply uses GFX90A. We could mimic that and use Kokkos_ARCH_GFX90A, but I don't find it obvious from that name that we are compiling for an AMD GPU. Instead, inspired by what we are doing for Intel, I propose that we use Kokkos_ARCH_AMD_GFX90A. This makes it clear that we are using an AMD GPU, and the GFX flag matches what we are giving to the compiler.

I also propose to drop our VEGA and NAVI macros, which were rarely used, and instead introduce an AMD_GPU macro similar to what we do for Intel. Users can keep using the current names, but new GPUs will only support the new naming convention.

This PR is a draft until we agree on the new naming convention.

Still missing in the PR: setting Kokkos_ARCH_AMD_GFX90A should define both KOKKOS_ARCH_AMD_GFX90A and KOKKOS_ARCH_VEGA90A and vice versa.
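A minimal sketch of the intended end state, seen from user code. The two `#define` lines stand in for the generated configuration header and are an assumption for illustration; `uses_gfx90a_new_name` and `uses_gfx90a_old_name` are hypothetical helpers, not part of Kokkos.

```cpp
// Stand-in for the generated config header (assumption: once the export of
// the macros is fixed, both spellings are defined together).
#define KOKKOS_ARCH_AMD_GFX90A
#define KOKKOS_ARCH_VEGA90A

// Hypothetical helper: true when the new macro name is defined.
constexpr bool uses_gfx90a_new_name() {
#ifdef KOKKOS_ARCH_AMD_GFX90A
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: true when the old macro name is defined.
constexpr bool uses_gfx90a_old_name() {
#ifdef KOKKOS_ARCH_VEGA90A
  return true;
#else
  return false;
#endif
}

// Code guarded by either spelling should see the same configuration.
static_assert(uses_gfx90a_new_name() == uses_gfx90a_old_name(),
              "old and new macro names should be defined together");
```

With this in place, downstream code that still tests the old macro and code that tests the new one would take the same branch.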

cc: @arghdos

skyreflectedinmirrors

I agree with the change, and I think the arch name works well (and is more descriptive than what we currently have).

skyreflectedinmirrors · 2023-07-10T14:11:00Z

Makefile.kokkos

@@ -13,7 +13,7 @@ KOKKOS_DEVICES ?= "Threads"
 # NVIDIA:   Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,Volta70,Volta72,Turing75,Ampere80,Ampere86,Ada89,Hopper90
 # ARM:      ARMv80,ARMv81,ARMv8-ThunderX,ARMv8-TX2,A64FX
 # IBM:      BGQ,Power7,Power8,Power9
-# AMD-GPUS: Vega906,Vega908,Vega90A,Navi1030
+# AMD-GPUS: Gfx906,Gfx908,Gfx90A,Gfx1030


Any reason for these to not be fully capitalized? I guess the rest in the list here aren't, but AFAICT this is the only case where they're not.

You are right, I'll change this.

ldh4 · 2023-07-12T18:00:50Z

Will this change break any existing build scripts if a user had manually specified a kokkos arch (i.e. -DKokkos_ARCH_VEGA906) in their cmake scripts?

Rombur · 2023-07-12T18:03:36Z

No, this will be backward compatible once I fix the export of the macros.
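To make the compatibility claim concrete, here is a rough sketch of the old-to-new keyword mapping implied by the renames in this PR. It is written in C++ purely for illustration; the actual handling lives in the CMake and Makefile logic, and `Alias` and `canonical_amd_arch` are hypothetical names.

```cpp
#include <string>

// Hypothetical table of legacy keywords and their replacements, mirroring
// the renames shown in this PR's diff.
struct Alias {
  const char* old_name;
  const char* new_name;
};

static const Alias amd_aliases[] = {
    {"VEGA906", "AMD_GFX906"},   {"VEGA908", "AMD_GFX908"},
    {"VEGA90A", "AMD_GFX90A"},   {"NAVI1030", "AMD_GFX1030"},
    {"NAVI1100", "AMD_GFX1100"},
};

// Hypothetical helper: return the new keyword for a legacy one.
// Names that are not in the table (including new-style names) pass through.
std::string canonical_amd_arch(const std::string& name) {
  for (const Alias& alias : amd_aliases) {
    if (name == alias.old_name) return alias.new_name;
  }
  return name;
}
```

Under this scheme a build script that passes the old keyword ends up with the same architecture as one that passes the new keyword.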

nmm0 · 2023-07-12T19:03:06Z

Per the meeting, this will need the docs updated. @dalg24 suggested putting a * next to the old names, and having code names/easily searchable values.

masterleinad · 2023-07-14T15:46:57Z

Makefile.kokkos

+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX906), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX906")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx906
 endif
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_VEGA908), 1)
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA908")
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA")
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX908), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX908")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx908
 endif
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_VEGA90A), 1)
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA90A")
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_VEGA")
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX90A), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX90A")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx90a
 endif
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NAVI1030), 1)
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_NAVI1030")
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_NAVI")
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1030), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX1030")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx1030
 endif
-ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NAVI1100), 1)
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_NAVI1100")
-  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_NAVI")
+ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AMD_GFX1100), 1)
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GFX1100")
+  tmp := $(call kokkos_append_header,"$H""define KOKKOS_ARCH_AMD_GPU")
  KOKKOS_INTERNAL_HIP_ARCH_FLAG := --offload-arch=gfx1100
 endif


This means we don't set the old macros anymore when using Makefiles but we do for CMake? Shouldn't we be consistent?

One more reason to switch to cmake

dalg24

Do you plan any replacement for the VEGA and NAVI macros (to regroup families of similar architectures)?

dalg24 · 2023-07-14T20:54:28Z

cmake/KokkosCore_config.h.in

+#cmakedefine KOKKOS_ARCH_AMD_GFX90A
+#cmakedefine KOKKOS_ARCH_AMD_GFX1030
+#cmakedefine KOKKOS_ARCH_AMD_GFX1100
+#cmakedefine KOKKOS_ARCH_AMD_GPU


The NVIDIA equivalent macro is an impl only (KOKKOS_IMPL_ARCH_NVIDIA_GPU)
while we have KOKKOS_ARCH_INTEL_GPU.
Please confirm you don't want it to be KOKKOS_IMPL_ARCH_AMD_GPU and if so explain why in a few words.

not sure that there is a good consistent value to be set for AMD which actually implies growing capability?

> Please confirm you don't want it to be KOKKOS_IMPL_ARCH_AMD_GPU and if so explain why in a few words.

My idea was to have a macro that users can use for sanity checks, or in their code if they want to do something special on AMD GPUs.

> not sure that there is a good consistent value to be set for AMD which actually implies growing capability?

No, I don't think so.
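A minimal sketch of the kind of user-side check described above, assuming the macro keeps the public name KOKKOS_ARCH_AMD_GPU used in this revision of the PR. `built_for_amd_gpu` and `MYAPP_REQUIRE_AMD_GPU` are invented names for illustration.

```cpp
// In real code this would follow #include <Kokkos_Core.hpp>, which pulls in
// the generated configuration; it is omitted here so the sketch stands alone.

// Hypothetical helper: true only when Kokkos was configured for an AMD GPU.
constexpr bool built_for_amd_gpu() {
#if defined(KOKKOS_ARCH_AMD_GPU)
  return true;
#else
  return false;
#endif
}

// Hypothetical sanity check: a downstream project could define
// MYAPP_REQUIRE_AMD_GPU in its AMD-only build to fail early on a mismatch.
#if defined(MYAPP_REQUIRE_AMD_GPU)
static_assert(built_for_amd_gpu(),
              "this build expects Kokkos configured for an AMD GPU");
#endif
```

The same `#if defined(...)` guard could wrap any AMD-specific code path in user code.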

The gfxarch ID does at least indicate a card's capabilities according to the compiler: https://llvm.org/docs/AMDGPUUsage.html#processors, though I agree that it won't necessarily align with 'growing capabilities'.

Edit: note that most of these are things user code does not have to care about, with the large exception of Wave32 vs Wave64.

dalg24 · 2023-07-14T21:03:14Z

cmake/KokkosCore_config.h.in

Maybe add // deprecated on the old macros

Rombur · 2023-07-17T00:00:27Z

> Do you plan any replacement for the VEGA and NAVI macros?

No, I don't. Right now, the difference between VEGA and NAVI is that VEGA uses a wavefront size of 64 and NAVI uses 32. However, on some NAVI architectures you can switch the wavefront size to 64. The current VEGA/NAVI division does not make sense, and I don't feel confident introducing a new 32/64 wavefront macro since we don't have any NAVI hardware to test the different configurations.
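Since the family name does not reliably determine the wavefront size, code that depends on it can take the size as an explicit input (for example, a value reported by the runtime) rather than inferring it from a VEGA/NAVI macro. A small sketch under that assumption; `wavefronts_per_block` is a hypothetical helper, not a Kokkos function.

```cpp
// Hypothetical helper: number of wavefronts needed to cover a block of
// threads. wavefront_size is supplied by the caller (e.g. 32 or 64) and is
// assumed to be non-zero.
constexpr unsigned wavefronts_per_block(unsigned block_size,
                                        unsigned wavefront_size) {
  // Round up so a partially filled wavefront is still counted.
  return (block_size + wavefront_size - 1) / wavefront_size;
}
```

The same block of 256 threads maps to 4 wavefronts at size 64 and 8 at size 32, which is why a compile-time family macro is a poor proxy on hardware that supports both modes.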

skyreflectedinmirrors · 2023-07-17T14:33:04Z

> Do you plan any replacement for the VEGA and NAVI macros?
>
> No, I don't. Right now, the difference between VEGA and NAVI is that VEGA uses a wavefront size of 64 and NAVI uses 32. However, on some NAVI architectures you can switch the wavefront size to 64. The current VEGA/NAVI division does not make sense, and I don't feel confident introducing a new 32/64 wavefront macro since we don't have any NAVI hardware to test the different configurations.

I'm also a bit dubious of Wave64 support on Navi. Yes, it's supposed to work, but...

dalg24 · 2023-07-19T12:54:11Z

Makefile.kokkos

@stanmoore1 make sure you see this

We might need to be more cautious with that change.
Although I tend to agree with Bruno's sentiment (https://github.com/kokkos/kokkos/pull/6266/files#r1263934282), we need to protect downstream users.

Thanks for the note. Maintaining backwards compatibility for Makefiles would be nice, or at least put it into a deprecation cycle.

> Thanks for the note. Maintaining backwards compatibility for Makefiles would be nice, or at least put it into a deprecation cycle.

The old HIP architecture names or the whole Makefile shenanigans?

Clarification: what changes is not the configuration keywords; those we will continue to honor.
In the current form of the PR, we would no longer define the KOKKOS_ARCH_VEGA* macros. Bruno actually looked for them in LAMMPS, and they are not used.

Sorry about the confusion.

dalg24 · 2023-07-20T20:16:55Z

Makefile.kokkos

-KOKKOS_INTERNAL_USE_ARCH_VEGA908 := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega908)
-KOKKOS_INTERNAL_USE_ARCH_VEGA90A := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega90A)
-KOKKOS_INTERNAL_USE_ARCH_NAVI1030 := $(call kokkos_has_string,$(KOKKOS_ARCH),Navi1030)
+KOKKOS_INTERNAL_USE_ARCH_AMD_GFX906 := $(or $(call kokkos_has_string,$(KOKKOS_ARCH),VEGA906),$(call kokkos_has_string,$(KOKKOS_ARCH),AMD_GFX906))


Did you mean to change the case? Vega906 -> VEGA906

This needs fixing though.

I can change it back, but it makes no difference because kokkos_has_string converts lowercase to uppercase.
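A rough C++ analogue of the check discussed here, to show why the case of the keyword does not matter. It assumes the Makefile helper amounts to a case-insensitive substring match; `to_upper`, `has_string`, and `use_arch_amd_gfx906` are hypothetical names for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: upper-case copy of s.
std::string to_upper(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return s;
}

// Hypothetical analogue of the Makefile check: true if needle occurs in
// arch_list, ignoring case (assumption: substring matching).
bool has_string(const std::string& arch_list, const std::string& needle) {
  return to_upper(arch_list).find(to_upper(needle)) != std::string::npos;
}

// Mirrors the $(or ...) in the diff: accept the old or the new keyword.
bool use_arch_amd_gfx906(const std::string& arch_list) {
  return has_string(arch_list, "VEGA906") ||
         has_string(arch_list, "AMD_GFX906");
}
```

Because both sides are upper-cased before comparing, `Vega906`, `VEGA906`, and `amd_gfx906` all select the same architecture in this sketch.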

Rombur · 2023-07-24T19:48:18Z

There is a problem with the architecture auto-detection. I am working on fixing the issue.

Rombur · 2023-07-24T20:26:58Z

It should be good now. I just had to move a function to a different place. I also updated the OpenACC and OpenMPTarget backends to use the new macros.

Rombur · 2023-08-02T15:16:05Z

Can we get this merged? I think @arghdos is waiting on this to be merged before creating a PR for MI300

skyreflectedinmirrors approved these changes Jul 10, 2023

skyreflectedinmirrors reviewed Jul 10, 2023

Rombur force-pushed the renaming_vega branch 3 times, most recently from 3fa8a0b to ce1d065 Compare July 14, 2023 15:39

masterleinad reviewed Jul 14, 2023

Rombur mentioned this pull request Jul 14, 2023

Add documentation for new AMD GPU naming convention kokkos/kokkos-core-wiki#438

Merged

dalg24 reviewed Jul 14, 2023

dalg24 mentioned this pull request Jul 15, 2023

OpenACC CMakechange Clacc #6250

Merged

dalg24 reviewed Jul 19, 2023

dalg24 reviewed Jul 20, 2023

dalg24 approved these changes Jul 22, 2023

Rombur added 2 commits July 24, 2023 16:23

Rename AMD GPU architectures

e421c33

Use new AMD gpus macros in OpenACC and OpenMPTarget

c17e904

Rombur force-pushed the renaming_vega branch from 6f984e2 to c17e904 Compare July 24, 2023 20:24

masterleinad approved these changes Aug 9, 2023

crtrott merged commit 7e91f11 into kokkos:develop Aug 9, 2023
26 of 28 checks passed

Rombur mentioned this pull request Jul 21, 2023

CHANGELOG: 4.2.0 #6197

Closed

Rombur deleted the renaming_vega branch September 7, 2023 15:51

Rombur commented Jul 6, 2023

skyreflectedinmirrors left a comment

skyreflectedinmirrors Jul 10, 2023

Rombur Jul 10, 2023

ldh4 commented Jul 12, 2023

Rombur commented Jul 12, 2023

nmm0 commented Jul 12, 2023

masterleinad Jul 14, 2023

Rombur Jul 14, 2023

dalg24 left a comment

dalg24 Jul 14, 2023

crtrott Jul 14, 2023

Rombur Jul 17, 2023

skyreflectedinmirrors Jul 17, 2023 (edited)

dalg24 Jul 14, 2023

Rombur commented Jul 17, 2023

skyreflectedinmirrors commented Jul 17, 2023

dalg24 Jul 19, 2023

stanmoore1 Jul 20, 2023

dalg24 Jul 20, 2023

dalg24 Jul 20, 2023

dalg24 Jul 20, 2023

dalg24 Jul 20, 2023

Rombur Jul 20, 2023

Rombur commented Jul 24, 2023

Rombur commented Jul 24, 2023

Rombur commented Aug 2, 2023
