[MLIR][Affine] Refuse affine-super-vectorize on non-unit access strides#196973
[MLIR][Affine] Refuse affine-super-vectorize on non-unit access strides#196973swjng wants to merge 1 commit into
Conversation
|
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-affine Author: Soowon Jeong (swjng) Changes[MLIR][Affine] Refuse affine-super-vectorize on non-unit access stridesSummary
This patch requires the IV coefficient in the varying memref dimension to be Reproducerfunc.func @<!-- -->non_unit_stride_store_unsupported(%arg0: memref<64xf32>) {
%cst = arith.constant 0.000000e+00 : f32
affine.for %i = 0 to 16 {
affine.store %cst, %arg0[%i * 4] : memref<64xf32>
}
return
}Apply: mlir-opt repro.mlir -affine-super-vectorize="virtual-vector-size=4"Before this patch, the pass can emit a WhyThe vectorization legality check treats the access as vectorizable because the For FixInline Composition is required to see through TestAdded Updated Ran: ninja -C build mlir-opt FileCheck count not
build/bin/llvm-lit -sv mlir/test/Dialect/Affine/Result: 70/70 tests pass (no regressions in How it was foundSurfaced by an in-progress SMT-solver-based analysis tool for MLIR affine Tool-use disclosureThis change was drafted with OpenAI Codex assistance per LLVM's AI Tool Use Full diff: https://github.com/llvm/llvm-project/pull/196973.diff 3 Files Affected:
diff --git a/mlir/lib/Dialect/Affine/Analysis/LoopAnalysis.cpp b/mlir/lib/Dialect/Affine/Analysis/LoopAnalysis.cpp
index 166d39e88d41e..caac64d86cc46 100644
--- a/mlir/lib/Dialect/Affine/Analysis/LoopAnalysis.cpp
+++ b/mlir/lib/Dialect/Affine/Analysis/LoopAnalysis.cpp
@@ -12,6 +12,7 @@
#include "mlir/Dialect/Affine/Analysis/LoopAnalysis.h"
+#include "mlir/Analysis/FlatLinearValueConstraints.h"
#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Dialect/Affine/Analysis/AffineAnalysis.h"
#include "mlir/Dialect/Affine/Analysis/AffineStructures.h"
@@ -331,8 +332,25 @@ bool mlir::affine::isContiguousAccess(Value iv, LoadOrStoreOp memoryOp,
int uniqueVaryingIndexAlongIv = -1;
auto accessMap = memoryOp.getAffineMap();
SmallVector<Value, 4> mapOperands(memoryOp.getMapOperands());
+ // Inline affine.apply chains so the iv-coefficient check below sees iv even
+ // when it arrives through a derived value.
+ fullyComposeAffineMapAndOperands(&accessMap, &mapOperands);
unsigned numDims = accessMap.getNumDims();
+ unsigned numSymbols = accessMap.getNumSymbols();
for (unsigned i = 0, e = memRefType.getRank(); i < e; ++i) {
+ SmallVector<int64_t, 4> flattenedExpr;
+ if (failed(getFlattenedAffineExpr(accessMap.getResult(i), numDims,
+ numSymbols, &flattenedExpr)))
+ return false;
+
+ int64_t ivCoeff = 0;
+ for (unsigned dim = 0; dim < numDims; ++dim)
+ if (mapOperands[dim] == iv)
+ ivCoeff += flattenedExpr[dim];
+ for (unsigned sym = 0; sym < numSymbols; ++sym)
+ if (mapOperands[numDims + sym] == iv)
+ ivCoeff += flattenedExpr[numDims + sym];
+
// Gather map operands used in result expr 'i' in 'exprOperands'.
SmallVector<Value, 4> exprOperands;
auto resultExpr = accessMap.getResult(i);
@@ -352,6 +370,8 @@ bool mlir::affine::isContiguousAccess(Value iv, LoadOrStoreOp memoryOp,
uniqueVaryingIndexAlongIv = i;
}
}
+ if (uniqueVaryingIndexAlongIv == static_cast<int>(i) && ivCoeff != 1)
+ return false;
}
if (uniqueVaryingIndexAlongIv == -1)
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_non_unit_stride.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_non_unit_stride.mlir
new file mode 100644
index 0000000000000..d0b69078fad36
--- /dev/null
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_non_unit_stride.mlir
@@ -0,0 +1,17 @@
+// RUN: mlir-opt %s -affine-super-vectorize="virtual-vector-size=4" | FileCheck %s
+
+// A non-unit coefficient on the vectorized IV is not a contiguous access. A
+// plain vector.transfer_write would touch consecutive cells even though the
+// scalar loop writes only every fourth cell.
+
+// CHECK-LABEL: func.func @non_unit_stride_store_unsupported
+// CHECK: affine.for
+// CHECK: affine.store
+// CHECK-NOT: vector.transfer_write
+func.func @non_unit_stride_store_unsupported(%arg0: memref<64xf32>) {
+ %cst = arith.constant 0.000000e+00 : f32
+ affine.for %i = 0 to 16 {
+ affine.store %cst, %arg0[%i * 4] : memref<64xf32>
+ }
+ return
+}
diff --git a/mlir/test/Dialect/Affine/access-analysis.mlir b/mlir/test/Dialect/Affine/access-analysis.mlir
index 789de646a8f9e..1bb230aebbfe1 100644
--- a/mlir/test/Dialect/Affine/access-analysis.mlir
+++ b/mlir/test/Dialect/Affine/access-analysis.mlir
@@ -11,17 +11,16 @@ func.func @loop_simple(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// expected-remark@above {{invariant along loop 1}}
affine.load %A[%c0, 8 * %i + %j] : memref<?x?xf32>
// expected-remark@above {{contiguous along loop 1}}
- // Note/FIXME: access stride isn't being checked.
- // expected-remark@-3 {{contiguous along loop 0}}
+ // Access stride 8 along loop 0 in the last memref dim is no longer
+ // reported as contiguous.
// These are all non-contiguous along both loops. Nothing is emitted.
affine.load %A[%i, %c0] : memref<?x?xf32>
// expected-remark@above {{invariant along loop 1}}
- // Note/FIXME: access stride isn't being checked.
affine.load %A[%i, 8 * %j] : memref<?x?xf32>
- // expected-remark@above {{contiguous along loop 1}}
+ // Access stride 8 along loop 1; no contiguity remark emitted.
affine.load %A[%j, 4 * %i] : memref<?x?xf32>
- // expected-remark@above {{contiguous along loop 0}}
+ // Access stride 4 along loop 0 in the last memref dim; no remark.
}
}
return
@@ -70,8 +69,9 @@ func.func @tiled(%arg0: memref<*xf32>) {
// expected-remark@above {{invariant along loop 4}}
affine.store %0, %alloc_0[0, %arg1 * -16 + %arg4, 0, %arg3 * -16 + %arg5] : memref<1x16x1x16xf32>
// expected-remark@above {{contiguous along loop 4}}
- // expected-remark@above {{contiguous along loop 2}}
// expected-remark@above {{invariant along loop 1}}
+ // The %arg3 contribution to dim 3 is `%arg3 * -16`, so the
+ // access is not contiguous along loop 2.
}
}
affine.for %arg4 = #map(%arg1) to #map1(%arg1) {
@@ -79,7 +79,8 @@ func.func @tiled(%arg0: memref<*xf32>) {
affine.for %arg6 = #map(%arg3) to #map1(%arg3) {
%0 = affine.load %alloc_0[0, %arg1 * -16 + %arg4, -%arg2 + %arg5, %arg3 * -16 + %arg6] : memref<1x16x1x16xf32>
// expected-remark@above {{contiguous along loop 5}}
- // expected-remark@above {{contiguous along loop 2}}
+ // The %arg3 contribution to dim 3 is `%arg3 * -16`, so the
+ // access is not contiguous along loop 2.
affine.store %0, %alloc[0, %arg5, %arg6, %arg4] : memref<1x224x224x64xf32>
// expected-remark@above {{contiguous along loop 3}}
// expected-remark@above {{invariant along loop 0}}
|
`isContiguousAccess` checked only whether the candidate vector IV appears
in one memref dimension; it didn't check the IV's *coefficient*. As a
result `%a[%i * 4]` was treated as contiguous, and super-vectorize
emitted a `vector.transfer_write` that touches `{%i*4, %i*4+1, %i*4+2,
%i*4+3}` while the scalar loop only writes `{0, 4, 8, ...}`.
After composing intervening `affine.apply` chains with
`fullyComposeAffineMapAndOperands`, flatten each result expression and
require the IV-varying dimension's coefficient on the candidate IV to be
exactly `1`.
`mlir/test/Dialect/Affine/access-analysis.mlir` previously documented
this as a `Note/FIXME: access stride isn't being checked` next to
expected-remarks that asserted spurious "contiguous along loop N" for
stride-8 / stride-(-16) accesses. With the fix in place those remarks
are correctly suppressed; the FIXME annotations and the remarks they
guarded are dropped.
Adds a regression in
`mlir/test/Dialect/Affine/SuperVectorize/vectorize_non_unit_stride.mlir`.
96134f9 to
c68902c
Compare
Summary
affine-super-vectorizecan currently treat%a[%i * 4]as a contiguousaccess because the vectorized IV affects only one memref dimension. It then
emits a contiguous
vector.transfer_write, touching cells that the scalar loopnever writes.
This patch requires the IV coefficient in the varying memref dimension to be
1before treating an access as contiguous.Reproducer
Apply:
mlir-opt repro.mlir -affine-super-vectorize="virtual-vector-size=4"Before this patch, the pass can emit a
vector.transfer_writeat%i * 4.That writes
%i * 4 + 1,%i * 4 + 2, and%i * 4 + 3, which are not part ofthe scalar loop's memory footprint.
Why
The vectorization legality check treats the access as vectorizable because the
loop IV affects only one memref dimension. It does not require the IV's
coefficient in that dimension to be
1.For
%a[%i * 4], the scalar loop writes only{0, 4, 8, ...}. The vectorizedloop writes a full vector at
%i * 4, touching{%i*4, %i*4+1, %i*4+2, %i*4+3}. Those extra lanes are in-bounds but outside the source loop'siteration footprint.
Fix
Inline
affine.applychains viafullyComposeAffineMapAndOperandsso theaccess map's operand list refers to leaf SSA values. After the existing
per-dimension invariance scan picks the unique IV-varying dimension, flatten
that dimension's result expression and read off the IV coefficient. If it is
not
1, reject vectorization. Other dimensions never constrain stride, sothe coefficient computation is deferred until the varying dimension is
identified rather than run on every dimension.
Composition is required to see through
%a[%apply(%i)]-style operands: theexisting super-vec machinery (including
%i mod Nstrided-broadcast patterns)already relies on this canonical form, so reusing it keeps the check
consistent and avoids over-rejection.
Test
Added
mlir/test/Dialect/Affine/SuperVectorize/vectorize_non_unit_stride.mlir,checking that
%arg0[%i * 4]remains scalar and novector.transfer_writeisemitted.
Updated
mlir/test/Dialect/Affine/access-analysis.mlir: the test previouslycarried
Note/FIXME: access stride isn't being checkedannotations next toexpected-remarklines that asserted buggy "contiguous along loop N"diagnostics for stride-8 and stride-(-16) accesses. With the fix in place
those remarks are correctly suppressed; the FIXME comments and corresponding
expected-remarks are dropped.
Ran:
Result: 70/70 tests pass (no regressions in
vectorize_affine_apply.mlir,which exercises the affine.apply-composition path).
How it was found
Surfaced by an in-progress SMT-solver-based analysis tool for MLIR affine
passes.
Tool-use disclosure
This change was drafted with OpenAI Codex assistance per LLVM's AI Tool Use
Policy. All code was read, reviewed, and tested locally; the commit should
carry an
Assisted-by: OpenAI Codextrailer.