[mlir][SVE] Add an e2e test for vectorization of linalg.matmul #70372

Merged (1 commit) on Oct 27, 2023

banach-space
Contributor

Adds an end-to-end test for scalable vectorization of linalg.matmul.
@llvmbot
Collaborator

llvmbot commented Oct 26, 2023

@llvm/pr-subscribers-mlir-sve
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)

Changes

Adds an end-to-end test for scalable vectorization of linalg.matmul.


Full diff: https://github.com/llvm/llvm-project/pull/70372.diff

1 Files Affected:

  • (added) mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/matmul.mlir (+68)
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/matmul.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/matmul.mlir
new file mode 100644
index 000000000000000..2024da2a585d99f
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/matmul.mlir
@@ -0,0 +1,68 @@
+// RUN: mlir-opt %s -test-transform-dialect-interpreter -test-transform-dialect-erase-schedule \
+// RUN:   -one-shot-bufferize -func-bufferize -cse -canonicalize -convert-vector-to-scf -arm-sve-legalize-vector-storage \
+// RUN:   -convert-vector-to-llvm="enable-arm-sve" -test-lower-to-llvm | \
+// RUN: %mcr_aarch64_cmd -e=matmul_f32 -entry-point-result=void --march=aarch64 --mattr="+sve" -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @matmul_f32() {
+  // Matrix dimensions
+  %K = arith.constant 3 : index
+  %M = arith.constant 5 : index
+  %N = arith.constant 15 : index
+  %c0_f32 = arith.constant 0.0 : f32
+
+  // Allocate the matrices
+  %A_alloc = bufferization.alloc_tensor(%M, %K) : tensor<?x?xf32>
+  %B_alloc = bufferization.alloc_tensor(%K, %N) : tensor<?x?xf32>
+  %C_alloc = bufferization.alloc_tensor(%M, %N) : tensor<?x?xf32>
+
+  // Initialise the matrices
+  %pi = arith.constant  3.14 : f32
+  %A = linalg.fill ins(%pi : f32) outs(%A_alloc : tensor<?x?xf32>) -> tensor<?x?xf32>
+  %B = linalg.fill ins(%pi : f32) outs(%B_alloc : tensor<?x?xf32>) -> tensor<?x?xf32>
+  %C_in = linalg.fill ins(%c0_f32 : f32) outs(%C_alloc : tensor<?x?xf32>) -> tensor<?x?xf32>
+
+  // Matmul
+  %C_out = linalg.matmul ins(%A, %B: tensor<?x?xf32>, tensor<?x?xf32>) outs(%C_in: tensor<?x?xf32>) -> tensor<?x?xf32>
+
+  // Print and verify the output
+  // CHECK-LABEL: SVE: START OF TEST OUTPUT
+  vector.print str "SVE: START OF TEST OUTPUT"
+
+  // CHECK-NEXT: Unranked Memref {{.*}} rank = 2 offset = 0 sizes = [5, 15] strides = [15, 1] data =
+  // CHECK-COUNT-5: [29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788, 29.5788]
+  %xf = tensor.cast %C_out : tensor<?x?xf32> to tensor<*xf32>
+  call @printMemrefF32(%xf) : (tensor<*xf32>) -> ()
+
+  // CHECK-NEXT: SVE: END OF TEST OUTPUT
+  vector.print str "SVE: END OF TEST OUTPUT"
+
+  return
+}
+
+transform.sequence failures(propagate) {
+^bb1(%module_op: !transform.any_op):
+  // Step 1: Tile
+  %matmul = transform.structured.match ops{["linalg.matmul"]} in %module_op : (!transform.any_op) -> !transform.any_op
+  %func_op = get_parent_op %matmul : (!transform.any_op) -> !transform.op<"func.func">
+  %module_with_tiled_loops, %loops:3 = transform.structured.tile_using_for %matmul [2, [4], 1] : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
+
+  // Step 2: Vectorize
+  %tiled_matmul = transform.structured.match ops{["linalg.matmul"]} in %module_with_tiled_loops : (!transform.any_op) -> !transform.any_op
+  transform.structured.vectorize %tiled_matmul vector_sizes [2, [4], 1] : !transform.any_op
+
+  // Step 3: Lower vector.multi_reduction to vector.contract (+ some helpful patterns)
+  transform.apply_patterns to %func_op {
+    transform.apply_patterns.vector.reduction_to_contract
+    transform.apply_patterns.vector.transfer_permutation_patterns
+    transform.apply_patterns.vector.lower_masked_transfers
+  } : !transform.op<"func.func">
+
+  // Step 4: Lower vector.contract to vector.fma
+  transform.apply_patterns to %func_op {
+    transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
+    transform.apply_patterns.vector.lower_outerproduct
+  } : !transform.op<"func.func">
+}
+
+func.func private @printMemrefF32(%ptr : tensor<*xf32>)
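
A note for readers checking the CHECK lines above: every element of C is a dot product of K = 3 terms, each equal to 3.14 * 3.14, so the expected value is roughly 29.5788. The Python sketch below reproduces that arithmetic in single precision. It is only an illustration under stated assumptions: `f32` and `matmul_fill` are hypothetical helper names (not part of the PR), the rounding model (multiply, round, add, round) is a simplification of what the lowered `vector.fma` code does, and the six-significant-digit formatting is an assumption about how the runner utility prints floats.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul_fill(m: int, k: int, n: int, a_val: float, b_val: float):
    """C = A x B where A (m x k) and B (k x n) are filled with constants."""
    c = [[f32(0.0) for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = c[i][j]
            for _ in range(k):
                # Round after the multiply and after the add (no fused rounding).
                acc = f32(acc + f32(a_val * b_val))
            c[i][j] = acc
    return c


# Same sizes and fill value as the test: M = 5, K = 3, N = 15, fill = 3.14.
M, K, N = 5, 3, 15
pi = f32(3.14)
C = matmul_fill(M, K, N, pi, pi)

# Format one row with six significant digits.
row = [format(v, ".6g") for v in C[0]]
print(len(C), "rows of", row)
```

Under these assumptions each of the 5 rows holds 15 copies of 29.5788, which matches the `CHECK-COUNT-5` line in the test.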

@banach-space
Contributor Author

@MacDue, rather than fixing #69592, I decided to upgrade it a bit (the bot failure made me realize that the test should not depend on the SVE implementation). Hence a new PR. Apologies for the noise!

Member

MacDue left a comment

Makes sense to me; keeping the matrices a fixed size seems like a more realistic test 👍
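
To illustrate the point about fixed sizes: in the transform script the tile size `[4]` is scalable, meaning 4 x vscale elements, so the number of tiles along N depends on the hardware vector length, while the 5 x 15 result does not. The Python sketch below is a rough model of that loop nest, not the code MLIR generates. `tiled_matmul` is a hypothetical name, masking is modelled simply by clamping each tile to the matrix bounds, and the vscale values 1, 2 and 4 (128-, 256- and 512-bit SVE vectors) are chosen only for illustration.

```python
def tiled_matmul(A, B, M, N, K, vscale):
    """Model of the loops produced by tile sizes [2, [4], 1].

    The bracketed size [4] is scalable, i.e. 4 * vscale elements.
    """
    tile_m, tile_n, tile_k = 2, 4 * vscale, 1
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile_m):
        for j0 in range(0, N, tile_n):
            for k0 in range(0, K, tile_k):
                # "Masking": only the in-bounds part of each tile is computed.
                for i in range(i0, min(i0 + tile_m, M)):
                    for j in range(j0, min(j0 + tile_n, N)):
                        for k in range(k0, min(k0 + tile_k, K)):
                            C[i][j] += A[i][k] * B[k][j]
    n_tiles_along_n = len(range(0, N, tile_n))
    return C, n_tiles_along_n


# Same sizes and fill value as the test.
M, K, N = 5, 3, 15
A = [[3.14] * K for _ in range(M)]
B = [[3.14] * N for _ in range(K)]

results = {vs: tiled_matmul(A, B, M, N, K, vs) for vs in (1, 2, 4)}
for vs, (C, n_tiles) in results.items():
    print(f"vscale={vs}: {n_tiles} tile(s) along N, C is {len(C)}x{len(C[0])}")
```

In this model the tile count along N drops from 4 to 2 to 1 as vscale grows, yet the computed matrix is identical in every case, which is why a fixed-size test can use one set of CHECK lines for any vector length.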

@banach-space banach-space merged commit 45e2e03 into llvm:main Oct 27, 2023
6 checks passed
3 participants