[MLIR] Create memref dialect and move dialect-specific ops from std.
Create the memref dialect and move dialect-specific ops
from the std dialect to the new memref dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
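
The rename is purely a change in op spelling; operands, results, and semantics
are untouched. As a quick editorial sketch (not part of this commit), IR that
previously used the std spellings now reads as follows:

```mlir
// Before: buffer ops spelled in the `std` dialect.
func @before() {
  %c0 = constant 0 : index
  %buf = alloc() : memref<8xf32>
  %v = load %buf[%c0] : memref<8xf32>
  store %v, %buf[%c0] : memref<8xf32>
  dealloc %buf : memref<8xf32>
  return
}

// After: the same ops, now spelled in the `memref` dialect.
func @after() {
  %c0 = constant 0 : index
  %buf = memref.alloc() : memref<8xf32>
  %v = memref.load %buf[%c0] : memref<8xf32>
  memref.store %v, %buf[%c0] : memref<8xf32>
  memref.dealloc %buf : memref<8xf32>
  return
}
```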

Differential Revision: https://reviews.llvm.org/D98041
dfki-jugr committed Mar 15, 2021
1 parent a883714 commit e231070
Showing 367 changed files with 10,173 additions and 9,642 deletions.
10 changes: 5 additions & 5 deletions mlir/docs/BufferDeallocationInternals.md
@@ -779,8 +779,8 @@ the deallocation of the source value.
## Known Limitations

BufferDeallocation introduces additional copies using allocations from the
-“std” dialect (“std.alloc”). Analogously, all deallocations use the “std”
-dialect’s free operation, “std.dealloc”. The actual copy process is realized
-using “linalg.copy”. Furthermore, buffers are essentially immutable after
-their creation in a block. Other limitations are known in the case of
-unstructured control flow.
+“memref” dialect (“memref.alloc”). Analogously, all deallocations use the
+“memref” dialect’s free operation, “memref.dealloc”. The actual copy process
+is realized using “linalg.copy”. Furthermore, buffers are essentially
+immutable after their creation in a block. Other limitations are known in the
+case of unstructured control flow.
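
(Editorial aside, not part of the diff: with this change, one of the copies
described above looks roughly as follows, assuming a 2-element buffer.)

```mlir
func @copy_sketch(%source: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>
  // The actual copy is realized by linalg.copy.
  linalg.copy(%source, %0) : memref<2xf32>, memref<2xf32>
  memref.dealloc %0 : memref<2xf32>
  return
}
```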
8 changes: 4 additions & 4 deletions mlir/docs/Bufferization.md
@@ -190,8 +190,8 @@ One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

-In this case, the `StandardOpsDialect` is marked as legal, so the `tensor_load`
-and `tensor_to_memref` ops, which are inserted automatically by the dialect
+In this case, the `MemRefOpsDialect` is marked as legal, so the `tensor_load`
+and `buffer_cast` ops, which are inserted automatically by the dialect
conversion framework as materializations, are legal. There is a helper
`populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
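
(Editorial aside, not part of the diff: a materialization pair inserted by the
type converter looks roughly like this, assuming a `4xf32` tensor.)

```mlir
func @materializations(%t: tensor<4xf32>) -> tensor<4xf32> {
  // The conversion framework bridges the two type systems with a
  // buffer_cast / tensor_load pair.
  %m = memref.buffer_cast %t : memref<4xf32>
  %t2 = memref.tensor_load %m : memref<4xf32>
  return %t2 : tensor<4xf32>
}
```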
@@ -247,7 +247,7 @@ from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the `tensor_load` /
-`tensor_to_memref` materialization ops inserted by partial bufferization passes
+`buffer_cast` materialization ops inserted by partial bufferization passes
and emits an error if that is not sufficient to remove all tensors from the
program.

@@ -268,7 +268,7 @@ recommended in new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
is available for such passes to provide patterns that eliminate `tensor_load`
-and `tensor_to_memref`.
+and `buffer_cast`.

## Changes since [the talk](#the-talk)

10 changes: 5 additions & 5 deletions mlir/docs/Dialects/Linalg.md
@@ -406,9 +406,9 @@ into a form that will resemble:
#map0 = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
func @example(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
-%0 = memref_cast %arg0 : memref<?x?xf32> to memref<?x?xf32, #map0>
-%1 = memref_cast %arg1 : memref<?x?xf32> to memref<?x?xf32, #map0>
-%2 = memref_cast %arg2 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%0 = memref.cast %arg0 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%1 = memref.cast %arg1 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%2 = memref.cast %arg2 : memref<?x?xf32> to memref<?x?xf32, #map0>
call @pointwise_add(%0, %1, %2) : (memref<?x?xf32, #map0>, memref<?x?xf32, #map0>, memref<?x?xf32, #map0>) -> ()
return
}
@@ -518,9 +518,9 @@ A set of ops that manipulate metadata but do not move memory. These ops take
generally alias the operand `view`. At the moment the existing ops are:

```
-* `std.view`,
+* `memref.view`,
* `std.subview`,
-* `std.transpose`.
+* `memref.transpose`.
* `linalg.range`,
* `linalg.slice`,
* `linalg.reshape`,
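
(Editorial aside, not part of the diff: these metadata ops produce views that
alias the operand's storage instead of copying it. A sketch using
`memref.view` to reinterpret a byte buffer, with assumed sizes:)

```mlir
func @metadata_only() {
  %c0 = constant 0 : index
  %raw = memref.alloc() : memref<2048xi8>
  // No data moves: %view aliases %raw (64 * 8 * 4 bytes = 2048 bytes).
  %view = memref.view %raw[%c0][] : memref<2048xi8> to memref<64x8xf32>
  memref.dealloc %raw : memref<2048xi8>
  return
}
```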
76 changes: 76 additions & 0 deletions mlir/docs/Dialects/MemRef.md
@@ -0,0 +1,76 @@
# 'memref' Dialect

This page provides documentation for the operations within the MemRef dialect.

**Please post an RFC on the [forum](https://llvm.discourse.group/c/mlir/31)
before adding or changing any operation in this dialect.**

[TOC]

## Operations

[include "Dialects/MemRefOps.md"]

### 'dma_start' operation

Syntax:

```
operation ::= `dma_start` ssa-use`[`ssa-use-list`]` `,`
                ssa-use`[`ssa-use-list`]` `,` ssa-use `,`
                ssa-use`[`ssa-use-list`]` (`,` ssa-use `,` ssa-use)?
                `:` memref-type `,` memref-type `,` memref-type
```

Starts a non-blocking DMA operation that transfers data from a source memref to
a destination memref. The operands include the source and destination memrefs,
each followed by its indices, the size of the data transfer in terms of the
number of elements (of the elemental type of the memref), a tag memref with its
indices, and optionally two additional arguments corresponding to the stride (in
terms of number of elements) and the number of elements to transfer per stride.
The tag location is used by a dma_wait operation to check for completion. The
indices of the source memref, destination memref, and the tag memref have the
same restrictions as any load/store operation in an affine context (whenever DMA
operations appear in an affine context). See
[restrictions on dimensions and symbols](Affine.md#restrictions-on-dimensions-and-symbols)
in affine contexts. This allows powerful static analysis and transformations in
the presence of such DMAs, including rescheduling, pipelining / overlap with
computation, and checking for matching start/end operations. The source and
destination memrefs need not be of the same dimensionality, but need to have the
same elemental type.

For example, a `dma_start` operation that transfers 32 vector elements from a
memref `%src` at location `[%i, %j]` to memref `%dst` at `[%k, %l]` would be
specified as shown below.

Example:

```mlir
%size = constant 32 : index
%tag = alloc() : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
%idx = constant 0 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx] :
  memref<40 x 8 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 0>,
  memref<2 x 4 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 2>,
  memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```

### 'dma_wait' operation

Syntax:

```
operation ::= `dma_wait` ssa-use`[`ssa-use-list`]` `,` ssa-use `:` memref-type
```

Blocks until the completion of a DMA operation associated with the tag element
specified with a tag memref and its indices. The operands include the tag memref
followed by its indices and the number of elements associated with the DMA being
waited on. The indices of the tag memref have the same restrictions as
load/store indices.

Example:

```mlir
dma_wait %tag[%idx], %size : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```
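
(Editorial aside, not part of the diff: the optional stride operands from the
grammar above are not exercised by the example. A sketch with assumed values
and names follows; `%stride` is the element distance between consecutive
chunks and `%elts_per_stride` the chunk size, both in elements.)

```mlir
%stride = constant 32 : index
%elts_per_stride = constant 16 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx], %stride,
          %elts_per_stride : memref<40 x 128 x f32>,
          memref<2 x 1024 x f32, 2>, memref<1 x i32, 4>
dma_wait %tag[%idx], %size : memref<1 x i32, 4>
```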
64 changes: 0 additions & 64 deletions mlir/docs/Dialects/Standard.md
@@ -13,67 +13,3 @@ before adding or changing any operation in this dialect.**
## Operations

[include "Dialects/StandardOps.md"]

### 'dma_start' operation

Syntax:

```
operation ::= `dma_start` ssa-use`[`ssa-use-list`]` `,`
                ssa-use`[`ssa-use-list`]` `,` ssa-use `,`
                ssa-use`[`ssa-use-list`]` (`,` ssa-use `,` ssa-use)?
                `:` memref-type `,` memref-type `,` memref-type
```

Starts a non-blocking DMA operation that transfers data from a source memref to
a destination memref. The operands include the source and destination memrefs,
each followed by its indices, the size of the data transfer in terms of the
number of elements (of the elemental type of the memref), a tag memref with its
indices, and optionally two additional arguments corresponding to the stride (in
terms of number of elements) and the number of elements to transfer per stride.
The tag location is used by a dma_wait operation to check for completion. The
indices of the source memref, destination memref, and the tag memref have the
same restrictions as any load/store operation in an affine context (whenever DMA
operations appear in an affine context). See
[restrictions on dimensions and symbols](Affine.md#restrictions-on-dimensions-and-symbols)
in affine contexts. This allows powerful static analysis and transformations in
the presence of such DMAs, including rescheduling, pipelining / overlap with
computation, and checking for matching start/end operations. The source and
destination memrefs need not be of the same dimensionality, but need to have the
same elemental type.

For example, a `dma_start` operation that transfers 32 vector elements from a
memref `%src` at location `[%i, %j]` to memref `%dst` at `[%k, %l]` would be
specified as shown below.

Example:

```mlir
%size = constant 32 : index
%tag = alloc() : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
%idx = constant 0 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx] :
  memref<40 x 8 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 0>,
  memref<2 x 4 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 2>,
  memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```

### 'dma_wait' operation

Syntax:

```
operation ::= `dma_wait` ssa-use`[`ssa-use-list`]` `,` ssa-use `:` memref-type
```

Blocks until the completion of a DMA operation associated with the tag element
specified with a tag memref and its indices. The operands include the tag memref
followed by its indices and the number of elements associated with the DMA being
waited on. The indices of the tag memref have the same restrictions as
load/store indices.

Example:

```mlir
dma_wait %tag[%idx], %size : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```
2 changes: 1 addition & 1 deletion mlir/docs/Rationale/UsageOfConst.md
@@ -200,7 +200,7 @@ for.
### The `OpPointer` and `ConstOpPointer` Classes

The "typed operation" classes for registered operations (e.g. like `DimOp` for
the "std.dim" operation in standard ops) contain a pointer to an operation and
the "memref.dim" operation in memref ops) contain a pointer to an operation and
provide typed APIs for processing it.

However, this is a problem for our current `const` design - `const DimOp` means
2 changes: 1 addition & 1 deletion mlir/docs/Traits.md
@@ -211,7 +211,7 @@ are nested inside of other operations that themselves have this trait.
This trait is carried by region holding operations that define a new scope for
automatic allocation. Such allocations are automatically freed when control is
transferred back from the regions of such operations. As an example, allocations
-performed by [`std.alloca`](Dialects/Standard.md#stdalloca-allocaop) are
+performed by [`memref.alloca`](Dialects/MemRef.md#memrefalloca-allocaop) are
automatically freed when control leaves the region of its closest surrounding op
that has the trait AutomaticAllocationScope.

10 changes: 6 additions & 4 deletions mlir/docs/Tutorials/Toy/Ch-5.md
@@ -50,8 +50,9 @@ framework, we need to provide two things (and an optional third):
## Conversion Target

For our purposes, we want to convert the compute-intensive `Toy` operations into
-a combination of operations from the `Affine` and `Standard` dialects for further
-optimization. To start off the lowering, we first define our conversion target:
+a combination of operations from the `Affine`, `MemRef` and `Standard` dialects
+for further optimization. To start off the lowering, we first define our
+conversion target:

```c++
void ToyToAffineLoweringPass::runOnFunction() {
@@ -61,8 +62,9 @@ void ToyToAffineLoweringPass::runOnFunction() {

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<mlir::AffineDialect, mlir::StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<mlir::AffineDialect, mlir::memref::MemRefDialect,
+                         mlir::StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
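
(Editorial sketch, not part of the diff, with assumed shapes: the lowered IR
now mixes all three dialects.)

```mlir
func @lowered_sketch() {
  %buf = memref.alloc() : memref<2x3xf64>     // `memref` dialect
  %cst = constant 1.0 : f64                   // `std` dialect
  affine.for %i = 0 to 2 {                    // `affine` dialect
    affine.for %j = 0 to 3 {
      affine.store %cst, %buf[%i, %j] : memref<2x3xf64>
    }
  }
  memref.dealloc %buf : memref<2x3xf64>
  return
}
```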
20 changes: 11 additions & 9 deletions mlir/examples/toy/Ch5/mlir/LowerToAffineLoops.cpp
@@ -7,15 +7,16 @@
//===----------------------------------------------------------------------===//
//
// This file implements a partial lowering of Toy operations to a combination of
-// affine loops and standard operations. This lowering expects that all calls
-// have been inlined, and all shapes have been resolved.
+// affine loops, memref operations and standard operations. This lowering
+// expects that all calls have been inlined, and all shapes have been resolved.
//
//===----------------------------------------------------------------------===//

#include "toy/Dialect.h"
#include "toy/Passes.h"

#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"
@@ -36,15 +37,15 @@ static MemRefType convertTensorToMemRef(TensorType type) {
/// Insert an allocation and deallocation for the given MemRefType.
static Value insertAllocAndDealloc(MemRefType type, Location loc,
                                   PatternRewriter &rewriter) {
-  auto alloc = rewriter.create<AllocOp>(loc, type);
+  auto alloc = rewriter.create<memref::AllocOp>(loc, type);

  // Make sure to allocate at the beginning of the block.
  auto *parentBlock = alloc->getBlock();
  alloc->moveBefore(&parentBlock->front());

  // Make sure to deallocate this alloc at the end of the block. This is fine
  // as toy functions have no control flow.
-  auto dealloc = rewriter.create<DeallocOp>(loc, alloc);
+  auto dealloc = rewriter.create<memref::DeallocOp>(loc, alloc);
  dealloc->moveBefore(&parentBlock->back());
  return alloc;
}
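
(Editorial sketch, not part of the diff: for an assumed 2x3 buffer, the helper
yields this placement, with the alloc hoisted to the block entry and the
dealloc just before the terminator.)

```mlir
func @placement() {
  %0 = memref.alloc() : memref<2x3xf64>   // moved before parentBlock->front()
  %c0 = constant 0 : index
  %v = memref.load %0[%c0, %c0] : memref<2x3xf64>
  memref.dealloc %0 : memref<2x3xf64>     // moved before parentBlock->back()
  return
}
```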
@@ -152,8 +153,8 @@ struct ConstantOpLowering : public OpRewritePattern<toy::ConstantOp> {

    if (!valueShape.empty()) {
      for (auto i : llvm::seq<int64_t>(
-              0, *std::max_element(valueShape.begin(), valueShape.end())))
-      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
+               0, *std::max_element(valueShape.begin(), valueShape.end())))
+        constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
    } else {
      // This is the case of a tensor of rank 0.
      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, 0));
@@ -257,7 +258,7 @@ namespace {
struct ToyToAffineLoweringPass
    : public PassWrapper<ToyToAffineLoweringPass, FunctionPass> {
  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<AffineDialect, StandardOpsDialect>();
+    registry.insert<AffineDialect, memref::MemRefDialect, StandardOpsDialect>();
  }
  void runOnFunction() final;
};
@@ -283,8 +284,9 @@

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<AffineDialect, StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<AffineDialect, memref::MemRefDialect,
+                         StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
20 changes: 11 additions & 9 deletions mlir/examples/toy/Ch6/mlir/LowerToAffineLoops.cpp
@@ -7,15 +7,16 @@
//===----------------------------------------------------------------------===//
//
// This file implements a partial lowering of Toy operations to a combination of
-// affine loops and standard operations. This lowering expects that all calls
-// have been inlined, and all shapes have been resolved.
+// affine loops, memref operations and standard operations. This lowering
+// expects that all calls have been inlined, and all shapes have been resolved.
//
//===----------------------------------------------------------------------===//

#include "toy/Dialect.h"
#include "toy/Passes.h"

#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"
@@ -36,15 +37,15 @@ static MemRefType convertTensorToMemRef(TensorType type) {
/// Insert an allocation and deallocation for the given MemRefType.
static Value insertAllocAndDealloc(MemRefType type, Location loc,
                                   PatternRewriter &rewriter) {
-  auto alloc = rewriter.create<AllocOp>(loc, type);
+  auto alloc = rewriter.create<memref::AllocOp>(loc, type);

  // Make sure to allocate at the beginning of the block.
  auto *parentBlock = alloc->getBlock();
  alloc->moveBefore(&parentBlock->front());

  // Make sure to deallocate this alloc at the end of the block. This is fine
  // as toy functions have no control flow.
-  auto dealloc = rewriter.create<DeallocOp>(loc, alloc);
+  auto dealloc = rewriter.create<memref::DeallocOp>(loc, alloc);
  dealloc->moveBefore(&parentBlock->back());
  return alloc;
}
@@ -152,8 +153,8 @@ struct ConstantOpLowering : public OpRewritePattern<toy::ConstantOp> {

    if (!valueShape.empty()) {
      for (auto i : llvm::seq<int64_t>(
-              0, *std::max_element(valueShape.begin(), valueShape.end())))
-      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
+               0, *std::max_element(valueShape.begin(), valueShape.end())))
+        constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
    } else {
      // This is the case of a tensor of rank 0.
      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, 0));
@@ -256,7 +257,7 @@ namespace {
struct ToyToAffineLoweringPass
    : public PassWrapper<ToyToAffineLoweringPass, FunctionPass> {
  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<AffineDialect, StandardOpsDialect>();
+    registry.insert<AffineDialect, memref::MemRefDialect, StandardOpsDialect>();
  }
  void runOnFunction() final;
};
@@ -282,8 +283,9 @@

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<AffineDialect, StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<AffineDialect, memref::MemRefDialect,
+                         StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
