[mlir][AMDGPU] Use the correct values for OOB_SELECT on gfx10
Differential Revision: https://reviews.llvm.org/D129320
krzysz00 committed Jul 7, 2022
1 parent 516915b commit db59054
Showing 3 changed files with 14 additions and 10 deletions.
16 changes: 10 additions & 6 deletions mlir/include/mlir/Dialect/AMDGPU/AMDGPU.td
@@ -52,14 +52,17 @@ def AMDGPU_RawBufferLoadOp :
available on AMD GPUs, including extensions in newer GPUs.

The index into the buffer is computed as for `memref.load` with the addition
- of `indexOffset` and `sgprOffset` (which is added after bounds checks and
- includes any offset present on the memref type if it's non-zero).
+ of `indexOffset` and `sgprOffset` (which **may or may not** be considered
+ in bounds checks and includes any offset present on the memref type if it's
+ non-zero).

All indices and offsets are in units of the memref's data type and are
converted to bytes during lowering.

When a load is out of bounds, the instruction returns zero.
- Vector instructions bounds check each component's address.
+ Partially out-of-bounds loads have chipset-dependent behavior: whether reading
+ 2 elements starting at index 7 of a `memref<8xf32>` returns the last element
+ in the first vector component depends on the architecture.

The memref struct is converted into a buffer resource (a V#) and the arguments
are translated to intrinsic arguments as follows:
@@ -71,7 +74,7 @@ def AMDGPU_RawBufferLoadOp :
- The offset enable bit is 1, the index enable bit is 0.
- The thread ID addition bit is off
- If `boundsCheck` is false and the target chipset is RDNA, OOB_SELECT is set
- to 2 to disable bounds checks, otherwise it is 0
+ to 2 to disable bounds checks, otherwise it is 3
- The cache coherency bits are off
}];
let assemblyFormat = [{
@@ -108,8 +111,9 @@ def AMDGPU_RawBufferStoreOp :
All index components are in terms of the elements of the memref, not bytes,
and are scaled up appropriately.

- Out of bounds stores are ignored in hardware, including the out of bounds
- components of vector writes.
+ Out of bounds stores are ignored in hardware.
+ Whether a vector write that includes some in-bounds and some out-of-bounds
+ components is partially completed is chipset-dependent.

See `amdgpu.raw_buffer_load` for a description of how the underlying
instruction is constructed.
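
The `memref<8xf32>` example in the load documentation can be made concrete with a small host-side model. This is only a sketch: the two policies below are assumptions chosen to show the two outcomes the text allows, `OobPolicy` and `modelLoad2` are hypothetical names, and nothing here claims which policy a given chipset implements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative model only: these two policies are assumptions chosen to show
// the two possible outcomes, not a description of any particular chipset.
enum class OobPolicy {
  PerComponent, // each vector component is bounds-checked on its own
  WholeAccess   // the whole access is rejected if any component is out of bounds
};

// Hypothetical helper: models a 2-element load from an N-element buffer.
template <std::size_t N>
std::array<float, 2> modelLoad2(const std::array<float, N> &buf,
                                std::size_t index, OobPolicy policy) {
  std::array<float, 2> result = {0.0f, 0.0f}; // out-of-bounds reads return zero
  if (policy == OobPolicy::WholeAccess && index + 2 > N)
    return result;
  for (std::size_t i = 0; i < 2; ++i)
    if (index + i < N)
      result[i] = buf[index + i];
  return result;
}
```

Under this model, loading 2 elements at index 7 of an 8-element buffer yields the last element followed by zero with `PerComponent`, and two zeros with `WholeAccess`. Code that must behave the same on every target should therefore avoid relying on either result.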
6 changes: 3 additions & 3 deletions mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -166,13 +166,13 @@ struct RawBufferOpLowering : public ConvertOpToLLVMPattern<GpuOp> {
// bit 24: Reserved to 1 (RDNA) or 0 (CDNA)
// bits 25-26: Reserved (0)
// bit 27: Buffer is non-volatile (CDNA only)
- // bits 28-29: Out of bounds select (0 = structured, 1 = raw, 2 = none, 3 =
- // swizzles) RDNA only
+ // bits 28-29: Out of bounds select (0 = structured, 1 = check index, 2 =
+ // none, 3 = either swizzles or testing against offset field) RDNA only
// bits 30-31: Type (must be 0)
uint32_t word3 = (7 << 12) | (4 << 15);
if (chipset.majorVersion == 10) {
word3 |= (1 << 24);
- uint32_t oob = adaptor.getBoundsCheck() ? 1 : 2;
+ uint32_t oob = adaptor.getBoundsCheck() ? 3 : 2;
word3 |= (oob << 28);
}
Value word3Const = createI32Constant(rewriter, loc, word3);
2 changes: 1 addition & 1 deletion mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
@@ -6,7 +6,7 @@ func.func @gpu_gcn_raw_buffer_load_i32(%buf: memref<64xi32>, %idx: i32) -> i32 {
// CHECK: %[[numRecords:.*]] = llvm.mlir.constant(256 : i32)
// CHECK: llvm.insertelement{{.*}}%[[numRecords]]
// CHECK: %[[word3:.*]] = llvm.mlir.constant(159744 : i32)
- // RDNA: %[[word3:.*]] = llvm.mlir.constant(285372416 : i32)
+ // RDNA: %[[word3:.*]] = llvm.mlir.constant(822243328 : i32)
// CHECK: %[[resource:.*]] = llvm.insertelement{{.*}}%[[word3]]
// CHECK: %[[ret:.*]] = rocdl.raw.buffer.load %[[resource]], %{{.*}}, %{{.*}}, %{{.*}} : i32
// CHECK: return %[[ret]]
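
The constants in the test expectations follow directly from the `word3` arithmetic in `AMDGPUToROCDL.cpp`. The sketch below reproduces that arithmetic on the host so the values can be checked by hand. `bufferWord3` is a hypothetical helper, not a function in the patch, and it takes the OOB_SELECT value as a plain parameter instead of reading it from the op adaptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the word3 computation in the lowering.
// `oobSelect` is only used when targeting gfx10, as in the patch.
constexpr uint32_t bufferWord3(bool isGfx10, uint32_t oobSelect) {
  uint32_t word3 = (7u << 12) | (4u << 15); // same base fields as the patch
  if (isGfx10) {
    word3 |= (1u << 24);        // bit 24: reserved to 1 on RDNA
    word3 |= (oobSelect << 28); // bits 28-29: out of bounds select
  }
  return word3;
}

// Non-gfx10 targets: matches the plain CHECK line.
static_assert(bufferWord3(false, 0) == 159744u, "base word3");
// gfx10 with OOB_SELECT = 1: the value the test expected before this commit.
static_assert(bufferWord3(true, 1) == 285372416u, "old RDNA word3");
// gfx10 with OOB_SELECT = 3: the value the test expects after this commit.
static_assert(bufferWord3(true, 3) == 822243328u, "new RDNA word3");
```

The only difference between the old and new RDNA constants is the two-bit field at bits 28-29, which accounts for the jump of `2 << 28` between them.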