AMDGPU: Add description for new atomicrmw metadata #85052

arsenm · 2024-03-13T09:42:53Z

Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.

Could use a better name

I couldn't figure out how to nicely embed a table within a table column. Copy the formatting that LangRef uses for metadata, and introduce a metadata section with subsections for each item. Also fix using subsection markers in place of section markers to avoid sphinx errors.

llvmbot · 2024-03-13T09:43:14Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.

Could use a better name

Full diff: https://github.com/llvm/llvm-project/pull/85052.diff

2 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+61-12)
(modified) llvm/docs/ReleaseNotes.rst (+2)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index fd9ad7fac19a95..a6556bbd1752b9 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1312,24 +1312,73 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 
    List AMDGPU intrinsics.
 
+.. _amdgpu_metadata:
+
 LLVM IR Metadata
-------------------
+================
+
+The AMDGPU backend implements the following target custom LLVM IR
+metadata.
+
+.. _amdgpu_last_use:
+
+'``amdgpu.last.use``' Metadata
+------------------------------
+
+Sets TH_LOAD_LU temporal hint on load instructions that support it.
+Takes priority over nontemporal hint (TH_LOAD_NT). This takes no
+arguments.
+
+.. code-block:: llvm
+
+  %val = load i32, ptr %in, align 4, !amdgpu.last.use !{}
 
-The AMDGPU backend implements the following LLVM IR metadata.
+.. _amdgpu_no_access_location_types:
 
-.. list-table:: AMDGPU LLVM IR Metatdata
-  :name: amdgpu-llvm-ir-metadata-table
+'``amdgpu.no.access.location.types``' Metadata
+----------------------------------------------
 
-  * - Metadata Name
+Asserts a memory access does not access bytes residing in certain
+allocation kinds. This is intended for use with :ref:`atomicrmw
+<i_atomicrmw>` and other atomic instructions. This is required to emit
+a native hardware instruction for some :ref:`system scope
+<amdgpu-memory-scopes>` atomic operations on some subtargets. An
+:ref:`atomicrmw <i_atomicrmw>` without metadata will be treated
+conservatively as required to preserve the operation behavior in all
+cases.
+
+If the memory operation does access an address in an indicated region,
+any stored values and any returned results are :ref:`poison
+<poisonvalues>`. This has a single integer argument, interpreted as a
+bitfield. A 0 value is equivalent to removing the metadata.
+
+.. list-table::
+
+  * - Bit
     - Description
-    - Values
-  * - !amdgpu.last.use
-    - Sets TH_LOAD_LU temporal hint on load instructions that support it.
-      Takes priority over nontemporal hint (TH_LOAD_NT).
-    - {}
+  * - 0
+    - Not in fine-grained host memory.
+  * - 1
+    - Not in a remote connected peer device (address must be device local)
+
+.. code-block:: llvm
+
+  ; Indicates the access does not access fine-grained memory, or
+  ; remote device memory.
+  %old0 = atomicrmw sub ptr %ptr0, i32 1 acquire, !amdgpu.no.access.location.types !0
+
+  ; Indicates the access does not access fine-grained memory.
+  %old1 = atomicrmw sub ptr %ptr1, i32 1 acquire, !amdgpu.no.access.location.types !1
+
+  ; Indicates the access does not access peer device memory.
+  %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.access.location.types !2
+
+  !0 = !{i32 3}
+  !1 = !{i32 1}
+  !2 = !{i32 2}
 
 LLVM IR Attributes
-------------------
+==================
 
 The AMDGPU backend supports the following LLVM IR attributes.
 
@@ -1451,7 +1500,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
      ======================================= ==========================================================
 
 Calling Conventions
--------------------
+===================
 
 The AMDGPU backend supports the following calling conventions:
 
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index b34a5f31c5eb0a..95ebbb74fbbd7f 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -71,6 +71,8 @@ Changes to the AMDGPU Backend
 -----------------------------
 
 * Implemented the ``llvm.get.fpenv`` and ``llvm.set.fpenv`` intrinsics.
+* Added ``!amdgpu.no.access.location.types`` metadata to control
+  atomic behavior.
 
 Changes to the ARM Backend
 --------------------------

yxsamliu · 2024-03-13T14:39:32Z

llvm/docs/AMDGPUUsage.rst

-      Takes priority over nontemporal hint (TH_LOAD_NT).
-    - {}
+  * - 0
+    - Not in fine-grained host memory.


I feel that using a negatively defined bit may cause confusion, especially when we try to turn it on or off in clang with pragmas.

    #pragma clang atomic begin non_fine_grained(off) non_remote(off)
    #pragma clang atomic end

I am wondering whether it is better to have positive bits:

- 0 - Maybe in fine-grained host memory
- 1 - Maybe in a remote connected peer device

Different bits are or-ed.

Then in clang

    #pragma clang atomic begin maybe-fine-grained(off) maybe-remote(off)
    #pragma clang atomic end

arsenm · 2024-03-13

Negatively expressing it is the less confusing direction. Every attribute specified positively has found a way to be a problem at some point.

pasaulais · 2024-03-14T16:36:46Z

llvm/docs/AMDGPUUsage.rst


-.. list-table:: AMDGPU LLVM IR Metatdata
-  :name: amdgpu-llvm-ir-metadata-table
+'``amdgpu.no.access.location.types``' Metadata


To make the name shorter, what about amdgpu.mem.constraint or amdgpu.access.constraint? Then the names of each constraint could be something like no_fine_grained_access and no_remote_access or similar.

pasaulais · 2024-03-14T16:49:19Z

llvm/docs/AMDGPUUsage.rst

+  ; Indicates the access does not access peer device memory.
+  %old2 = atomicrmw sub ptr %ptr2, i32 1 acquire, !amdgpu.no.access.location.types !2
+
+  !0 = !{i32 3}


For better clarity, I think it would help to have comments translating the bits to the actual values in documentation examples and lit tests. For example:

!0 = !{i32 3} ; no_fine_grained_access | no_remote_access

In general I think it would be more clear to use strings in the MD nodes (I am thinking of someone looking at the IR produced by the compiler and not knowing what the numbers mean without looking them up), but I am aware that would take away the simplicity of the current design.

arsenm · 2024-04-15

IR readability isn't really a priority, but if we're going to split this out into named pieces, I think it's better to just use separate top-level metadata, so the latest revision does that.
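For readers following the bitfield encoding used in the first revision of the diff, the integer values 3, 1, and 2 in the examples can be decoded as in the minimal Python sketch below. The constant and function names are hypothetical and chosen only for illustration; the bit positions follow the table in the diff, and the string names are the ones suggested in the review comments above. Nothing here is part of LLVM.

```python
# Illustrative only: these names are assumptions, not LLVM APIs.
# Bit positions follow the table in the first revision of the diff.
NO_FINE_GRAINED_HOST = 1 << 0  # bit 0: not in fine-grained host memory
NO_REMOTE_PEER = 1 << 1        # bit 1: not in a remote connected peer device


def describe(value: int) -> list[str]:
    """Return the assertions encoded in a bitfield metadata argument."""
    names = []
    if value & NO_FINE_GRAINED_HOST:
        names.append("no_fine_grained_access")
    if value & NO_REMOTE_PEER:
        names.append("no_remote_access")
    return names


if __name__ == "__main__":
    # The values used by !0, !1 and !2 in the diff, plus 0, which the
    # proposed text treats as equivalent to having no metadata at all.
    for value in (3, 1, 2, 0):
        print(value, " | ".join(describe(value)) or "(no assertion)")
```

Because each bit is a negative assertion, OR-ing more bits together only adds guarantees, and a value of 0 asserts nothing, which matches the "correct by default" direction described in the PR summary.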

yxsamliu · 2024-04-19T16:36:02Z

LGTM

arsenm · 2024-04-19T20:27:15Z

Wondering if both pieces should end in "memory" or "memory.access" for consistency

yxsamliu · 2024-04-19T20:41:53Z

> Wondering if both pieces should end in "memory" or "memory.access" for consistency

I think "access" is unnecessary.

b-sumner · 2024-04-19T20:45:55Z

> > Wondering if both pieces should end in "memory" or "memory.access" for consistency
>
> I think "access" is unnecessary.

I tend to agree.

Add baseline tests which should comprehensively test the new atomic metadata. Test codegen / expansion, and preservation in a few transforms. New metadata defined in #85052

AMDGPU reflect pass is needed to choose between safe and unsafe atomics at the libclc level. In the long run we will delete this patch as work is being done to ensure correct lowering of atomic instructions. See patches: llvm/llvm-project#85052 llvm/llvm-project#69229 This work is necessary as malloc shared atomics rely on PCIe atomics which can have patchy and unreliable support. We therefore want to be able to choose at compile time whether we should use safe atomics using CAS (which PCIe should support), or if we want to rely on the availability of the newest PCIe atomics, if malloc shared atomics are desired. Also changes the implementation of Or, And so that they can choose between the safe or unsafe version based on the AMDGPU reflect value.

… AMDGPU atomics (#11467) AMDGPU reflect pass is needed to choose between safe and unsafe atomics at the libclc level. In the long run we will delete this patch as work is being done to ensure correct lowering of atomic instructions. See patches: llvm/llvm-project#85052 llvm/llvm-project#69229 This work is necessary as malloc shared atomics rely on PCIe atomics which can have patchy and unreliable support. Therefore, we want to be able to choose at compile time whether we should use safe atomics using CAS (which PCIe should support), or if we want to rely on the availability of the newest PCIe atomics, if malloc shared atomics are desired. Also changes the implementation of `atomic_or`, `atomic_and` so that they can choose between the safe or unsafe version based on the AMDGPU reflect value.

arsenm added 2 commits March 13, 2024 14:19

arsenm added the backend:AMDGPU label Mar 13, 2024

arsenm requested review from yxsamliu, pasaulais and AlexVlx March 13, 2024 09:42

yxsamliu reviewed Mar 13, 2024

yxsamliu requested a review from b-sumner March 13, 2024 14:41

pasaulais requested changes Mar 14, 2024

pasaulais mentioned this pull request Mar 14, 2024

[AMDGPU] Add an option to disable unsafe uses of atomic xor #69229

Open

hdelan mentioned this pull request Apr 1, 2024

[SYCL][HIP] Add AMDGPU reflect pass to choose between safe and unsafe AMDGPU atomics intel/llvm#11467

Merged

arsenm added 3 commits April 15, 2024 12:31

Merge branch 'main' into amdgpu-atomic-metadata

852d895

Add comments to metadata examples

6155303

Split into separate metadata components

4c5c29f

arsenm changed the title from "AMDGPU: Add description for amdgpu.no.access.location.types metadata" to "AMDGPU: Add description for new atomicrmw metadata" Apr 15, 2024

arsenm added 5 commits April 16, 2024 13:34

Fix metadata reference links

b8f471c

Define denormal mode atomic metadata

2ee7ee6

Alternatively, ignore.fpenv

Drop host part of no fined grained metadata

9a15f6f

Add note that amdgpu.no.remote.memory.access is usually sufficient to emit an instruction

ec02ead

Rename

79db05f

arsenm added 5 commits April 18, 2024 16:45

Reorder documentation sections

0678218

Clarify no remote implies host or peer device

e6cb77f

Consistently spell fine-grained

e359880

Another note about amdgpu.ignore.denormal.mode

f46a8aa

Merge branch 'main' into amdgpu-atomic-metadata

47c30e1

arsenm mentioned this pull request Apr 18, 2024

AMDGPU: Add tests for atomicrmw handling of new metadata #89248

Merged

arsenm requested review from pasaulais and yxsamliu April 19, 2024 13:53

b-sumner reviewed Apr 19, 2024

Fix release notes

0516ba2

arsenm added 2 commits April 22, 2024 11:35

Merge branch 'main' into amdgpu-atomic-metadata

1c949ff

Remove access from metadata name

5e68ec4

arsenm mentioned this pull request Apr 22, 2024

AMDGPU: Add amdgpu.no.remote.memory when upgrading old atomic intrinsics #89655

Open

arsenm mentioned this pull request Apr 25, 2024

[Issue]: enabling amdgpu-unsafe-fp-atomics for gfx90a ROCm/llvm-project#47

Open

arsenm mentioned this pull request May 2, 2024

AtomicExpand: Preserve metadata when expanding partword RMW #89769

Merged

arsenm added 5 commits May 3, 2024 11:18

Merge branch 'main' into amdgpu-atomic-metadata

8c5372d

Merge branch 'main' into amdgpu-atomic-metadata

e3e9985

Merge branch 'main' into amdgpu-atomic-metadata

3fe49d5

Merge branch 'main' into amdgpu-atomic-metadata

7e66bd2

Merge branch 'main' into amdgpu-atomic-metadata

6a286a3

arsenm mentioned this pull request Jun 6, 2024

[ROCDL] Add the global.atomic.fadd intrinsic in ROCDL #94486

Closed
