[mlir][amdgpu] Add scaled_ext_packed{8,16} operations #159830

amd-eochoalo · 2025-09-19T18:36:05Z

No description provided.

llvmbot · 2025-09-19T18:49:48Z

@llvm/pr-subscribers-mlir-amdgpu
@llvm/pr-subscribers-mlir-gpu
@llvm/pr-subscribers-backend-amdgpu

Author: Erick Ochoa Lopez (amd-eochoalo)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/159830.diff

2 Files Affected:

(modified) mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td (+42)
(modified) mlir/test/Dialect/AMDGPU/ops.mlir (+55)

diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index a24a918357f2d..d5ea737e229ff 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -112,6 +112,48 @@ def AMDGPU_ExtPackedFp8Op :
   }];
 }
 
+def AMDGPU_ScaledExtPacked8Op
+    : AMDGPU_Op<"scaled_ext_packed8", [Pure]>,
+      Arguments<(
+          ins VectorOfLengthAndType<[8], [F4E2M1FN,F8E4M3FN,F8E5M2]>:$source,
+          F32:$scale,
+          ConfinedAttr<I32Attr, [IntNonNegative, IntMaxValue<7>]>:$index)>,
+      Results<(
+          outs AnyTypeOf<[FixedVectorOfLengthAndType<[8], [F32]>,
+                          FixedVectorOfLengthAndType<[8], [F16]>,
+                          FixedVectorOfLengthAndType<[8], [BF16]>]>:$res)> {
+  let summary = "Extend a vector of packed floating point values";
+
+  let description = [{
+    Extend and scale eight packed floats in to eight floats and return them.
+  }];
+
+  let assemblyFormat = [{
+    attr-dict $source `,` $scale `[` $index `]` `:` type($source) `to` type($res)
+  }];
+}
+
+def AMDGPU_ScaledExtPacked16Op
+    : AMDGPU_Op<"scaled_ext_packed16", [Pure]>,
+      Arguments<(
+          ins VectorOfLengthAndType<[16], [F6E2M3FN, F6E3M2FN]>:$source,
+          F32:$scale,
+          ConfinedAttr<I32Attr, [IntNonNegative, IntMaxValue<7>]>:$index)>,
+      Results<(
+          outs AnyTypeOf<[FixedVectorOfLengthAndType<[16], [F32]>,
+                          FixedVectorOfLengthAndType<[16], [F16]>,
+                          FixedVectorOfLengthAndType<[16], [BF16]>]>:$res)> {
+  let summary = "Extend a vector of packed floating point values";
+
+  let description = [{
+    Extend and scale 16 packed floats to 16 floats and return them.
+  }];
+
+  let assemblyFormat = [{
+    attr-dict $source `,` $scale `[` $index `]` `:` type($source) `to` type($res)
+  }];
+}
+
 def AMDGPU_ScaledExtPackedOp
     : AMDGPU_Op<"scaled_ext_packed", [Pure]>,
       Arguments<(
diff --git a/mlir/test/Dialect/AMDGPU/ops.mlir b/mlir/test/Dialect/AMDGPU/ops.mlir
index 369e0fff538e1..1841c0815b435 100644
--- a/mlir/test/Dialect/AMDGPU/ops.mlir
+++ b/mlir/test/Dialect/AMDGPU/ops.mlir
@@ -221,6 +221,61 @@ func.func @scaled_ext_scalar_f4e2m1_bf16(%v: vector<2xf4E2M1FN>, %scale: f32) ->
   func.return %ret : vector<2xbf16>
 }
 
+// CHECK-LABEL: func.func @scaled_ext_packed8_fp4
+func.func @scaled_ext_packed8_fp4(%v: vector<8xf4E2M1FN>, %scale: f32) -> (vector<8xf16>, vector<8xbf16>, vector<8xf32>) {
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret0 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf4E2M1FN> to vector<8xf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret1 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf4E2M1FN> to vector<8xbf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret2 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf4E2M1FN> to vector<8xf32>
+  func.return %ret0, %ret1, %ret2 : vector<8xf16>, vector<8xbf16>, vector<8xf32>
+}
+
+// CHECK-LABEL: func.func @scaled_ext_packed8_fp8
+func.func @scaled_ext_packed8_fp8(%v: vector<8xf8E4M3FN>, %scale: f32) -> (vector<8xf16>, vector<8xbf16>, vector<8xf32>) {
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret0 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E4M3FN> to vector<8xf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret1 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E4M3FN> to vector<8xbf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret2 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E4M3FN> to vector<8xf32>
+  func.return %ret0, %ret1, %ret2 : vector<8xf16>, vector<8xbf16>, vector<8xf32>
+}
+
+// CHECK-LABEL: func.func @scaled_ext_packed8_bf8
+func.func @scaled_ext_packed8_bf8(%v: vector<8xf8E5M2>, %scale: f32) -> (vector<8xf16>, vector<8xbf16>, vector<8xf32>) {
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret0 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E5M2> to vector<8xf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret1 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E5M2> to vector<8xbf16>
+  // CHECK: amdgpu.scaled_ext_packed8
+  %ret2 = amdgpu.scaled_ext_packed8 %v, %scale[0] : vector<8xf8E5M2> to vector<8xf32>
+  func.return %ret0, %ret1, %ret2 : vector<8xf16>, vector<8xbf16>, vector<8xf32>
+}
+
+// CHECK-LABEL: func.func @scaled_ext_packed16_fp6
+func.func @scaled_ext_packed16_fp6(%v: vector<16xf6E2M3FN>, %scale: f32) -> (vector<16xf16>, vector<16xbf16>, vector<16xf32>) {
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret0 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E2M3FN> to vector<16xf16>
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret1 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E2M3FN> to vector<16xbf16>
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret2 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E2M3FN> to vector<16xf32>
+  func.return %ret0, %ret1, %ret2 : vector<16xf16>, vector<16xbf16>, vector<16xf32>
+}
+
+// CHECK-LABEL: func.func @scaled_ext_packed16_bf16
+func.func @scaled_ext_packed16_bf16(%v: vector<16xf6E3M2FN>, %scale: f32) -> (vector<16xf16>, vector<16xbf16>, vector<16xf32>) {
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret0 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E3M2FN> to vector<16xf16>
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret1 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E3M2FN> to vector<16xbf16>
+  // CHECK: amdgpu.scaled_ext_packed16
+  %ret2 = amdgpu.scaled_ext_packed16 %v, %scale[0] : vector<16xf6E3M2FN> to vector<16xf32>
+  func.return %ret0, %ret1, %ret2 : vector<16xf16>, vector<16xbf16>, vector<16xf32>
+}
+
 // CHECK-LABEL: func.func @packed_scaled_trunc_f8e4m3_f32
 // CHECK: amdgpu.packed_scaled_trunc
 func.func @packed_scaled_trunc_f8e4m3_f32(%v: vector<2xf32>, %scale: f32) -> vector<4xf8E4M3FN> {

llvmbot · 2025-09-19T18:49:49Z

@llvm/pr-subscribers-mlir


krzysz00 · 2025-09-19T18:54:37Z

mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td

+  }];
+}
+
+def AMDGPU_ScaledExtPacked16Op


Don't make distinct operations here. Instead, loosen the definition of scaled_ext_packed and add checks for chip compatibility to the lowering.

If that's not feasible, get back to me.

amd-eochoalo · 2025-09-19

I did it here: c3832b0. Is the assembly format acceptable to you?

krzysz00 · 2025-09-19T19:48:08Z

On further offline discussion, I'll need more context / we'll want to see if two separate ops are actually the better design here.

amd-eochoalo · 2025-09-19T20:01:59Z

Between f92db34 and c3832b0, I prefer f92db34. With two optional attributes in an XOR relationship, the constructors for this operation will always require a nullptr, and getting the attributes will always return a std::optional<int32_t>.
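A minimal sketch of that ergonomics concern, in plain C++ rather than the MLIR-generated op classes. The struct and function names (`XorAttrs`, `exactlyOneSet`, `activeValue`) and the field names are hypothetical; the thread does not name the two attributes used in c3832b0.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of an op carrying two optional attributes of which
// exactly one must be present. The names are illustrative only.
struct XorAttrs {
  std::optional<int32_t> first;
  std::optional<int32_t> second;
};

// Returns true when exactly one of the two attributes is set.
bool exactlyOneSet(const XorAttrs &attrs) {
  return attrs.first.has_value() != attrs.second.has_value();
}

// Every reader has to unwrap an optional, even though a valid op always
// carries one value. Returns std::nullopt for an invalid combination.
std::optional<int32_t> activeValue(const XorAttrs &attrs) {
  if (!exactlyOneSet(attrs))
    return std::nullopt;
  return attrs.first ? attrs.first : attrs.second;
}
```

In this model, each construction site has to spell out the unused attribute as empty, and each use site has to handle the empty case; that is the cost described above.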

Between f92db34 and merging these two operations into a single separate operation, one thing to note is that we would need a verifier to make sure the types are correctly matched (not a big deal). We would also need to choose a name for this operation, since scaled_ext_packed is already taken. Maybe scaled_ext_packed_8_or_16?
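The type-matching check such a verifier would need can be sketched in standalone C++. This is only an illustration under assumptions: `SourceElem` and `verifyMergedOp` are hypothetical names, the accepted combinations are taken from the two TableGen definitions in the diff above, and a real verifier would be written against the MLIR op and type classes instead.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-ins for the source element types listed in the diff.
enum class SourceElem { F4E2M1FN, F8E4M3FN, F8E5M2, F6E2M3FN, F6E3M2FN };

// Returns an error message when the combination is rejected, or std::nullopt
// when it is accepted. Mirrors the constraints of the two separate ops:
// length 8 takes the 4-bit and 8-bit types, length 16 takes the 6-bit types.
std::optional<std::string> verifyMergedOp(int64_t sourceLen, SourceElem elem,
                                          int64_t resultLen) {
  if (sourceLen != resultLen)
    return std::string("source and result vectors must have the same length");

  const bool isSixBit =
      elem == SourceElem::F6E2M3FN || elem == SourceElem::F6E3M2FN;
  if (sourceLen == 8 && !isSixBit)
    return std::nullopt;
  if (sourceLen == 16 && isSixBit)
    return std::nullopt;

  return std::string(
      "unsupported combination of vector length and source element type");
}
```

With two separate ops, these constraints are expressed directly in the argument and result type lists, so no hand-written check of this kind is needed.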

[mlir][amdgpu] Add scaled_ext_packed{8,16} operations

f92db34

amd-eochoalo marked this pull request as ready for review September 19, 2025 18:49

llvmbot added backend:AMDGPU mlir:gpu mlir mlir:amdgpu labels Sep 19, 2025

amd-eochoalo requested review from krzysz00 and kuhar September 19, 2025 18:51

krzysz00 requested changes Sep 19, 2025

[mlir][amdgpu] Use existing scaled_ext_packed instead of new ops

c3832b0

amd-eochoalo requested a review from krzysz00 September 19, 2025 19:30

Revert "[mlir][amdgpu] Use existing scaled_ext_packed instead of new …

d34b02d

…ops" This reverts commit c3832b0.

merge into a single op

0d09fc6
