[MLIR] Create memref dialect and move dialect-specific ops from std.
Create the memref dialect and move dialect-specific ops
from the std dialect to the new memref dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
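
The rename is purely a change in op spelling; operands, results, and semantics
are untouched. As a quick editorial sketch (not part of this commit), IR that
previously used the std spellings now reads as follows:

```mlir
// Before: buffer ops spelled in the `std` dialect.
func @before() {
  %c0 = constant 0 : index
  %buf = alloc() : memref<8xf32>
  %v = load %buf[%c0] : memref<8xf32>
  store %v, %buf[%c0] : memref<8xf32>
  dealloc %buf : memref<8xf32>
  return
}

// After: the same ops, now spelled in the `memref` dialect.
func @after() {
  %c0 = constant 0 : index
  %buf = memref.alloc() : memref<8xf32>
  %v = memref.load %buf[%c0] : memref<8xf32>
  memref.store %v, %buf[%c0] : memref<8xf32>
  memref.dealloc %buf : memref<8xf32>
  return
}
```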

Differential Revision: https://reviews.llvm.org/D98041
dfki-jugr committed Mar 15, 2021
1 parent a883714 commit e231070
Showing 367 changed files with 10,173 additions and 9,642 deletions.
10 changes: 5 additions & 5 deletions mlir/docs/BufferDeallocationInternals.md
@@ -779,8 +779,8 @@ the deallocation of the source value.
## Known Limitations

BufferDeallocation introduces additional copies using allocations from the
-“std” dialect (“std.alloc”). Analogously, all deallocations use the “std”
-dialect’s free operation, “std.dealloc”. The actual copy process is realized
-using “linalg.copy”. Furthermore, buffers are essentially immutable after
-their creation in a block. Other limitations are known in the case of
-unstructured control flow.
+“memref” dialect (“memref.alloc”). Analogously, all deallocations use the
+“memref” dialect’s free operation, “memref.dealloc”. The actual copy process
+is realized using “linalg.copy”. Furthermore, buffers are essentially
+immutable after their creation in a block. Other limitations are known in the
+case of unstructured control flow.
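
(Editorial aside, not part of the diff: with this change, one of the copies
described above looks roughly as follows, assuming a 2-element buffer.)

```mlir
func @copy_sketch(%source: memref<2xf32>) {
  %0 = memref.alloc() : memref<2xf32>
  // The actual copy is realized by linalg.copy.
  linalg.copy(%source, %0) : memref<2xf32>, memref<2xf32>
  memref.dealloc %0 : memref<2xf32>
  return
}
```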
8 changes: 4 additions & 4 deletions mlir/docs/Bufferization.md
@@ -190,8 +190,8 @@ One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

-In this case, the `StandardOpsDialect` is marked as legal, so the `tensor_load`
-and `tensor_to_memref` ops, which are inserted automatically by the dialect
+In this case, the `MemRefOpsDialect` is marked as legal, so the `tensor_load`
+and `buffer_cast` ops, which are inserted automatically by the dialect
conversion framework as materializations, are legal. There is a helper
`populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
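
(Editorial aside, not part of the diff: a materialization pair inserted by the
type converter looks roughly like this, assuming a `4xf32` tensor.)

```mlir
func @materializations(%t: tensor<4xf32>) -> tensor<4xf32> {
  // The conversion framework bridges the two type systems with a
  // buffer_cast / tensor_load pair.
  %m = memref.buffer_cast %t : memref<4xf32>
  %t2 = memref.tensor_load %m : memref<4xf32>
  return %t2 : tensor<4xf32>
}
```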
@@ -247,7 +247,7 @@ from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the `tensor_load` /
-`tensor_to_memref` materialization ops inserted by partial bufferization passes
+`buffer_cast` materialization ops inserted by partial bufferization passes
and emits an error if that is not sufficient to remove all tensors from the
program.

@@ -268,7 +268,7 @@ recommended in new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
is available for such passes to provide patterns that eliminate `tensor_load`
-and `tensor_to_memref`.
+and `buffer_cast`.

## Changes since [the talk](#the-talk)

10 changes: 5 additions & 5 deletions mlir/docs/Dialects/Linalg.md
@@ -406,9 +406,9 @@ into a form that will resemble:
#map0 = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
func @example(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
-%0 = memref_cast %arg0 : memref<?x?xf32> to memref<?x?xf32, #map0>
-%1 = memref_cast %arg1 : memref<?x?xf32> to memref<?x?xf32, #map0>
-%2 = memref_cast %arg2 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%0 = memref.cast %arg0 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%1 = memref.cast %arg1 : memref<?x?xf32> to memref<?x?xf32, #map0>
+%2 = memref.cast %arg2 : memref<?x?xf32> to memref<?x?xf32, #map0>
call @pointwise_add(%0, %1, %2) : (memref<?x?xf32, #map0>, memref<?x?xf32, #map0>, memref<?x?xf32, #map0>) -> ()
return
}
@@ -518,9 +518,9 @@ A set of ops that manipulate metadata but do not move memory. These ops take
generally alias the operand `view`. At the moment the existing ops are:

```
-* `std.view`,
+* `memref.view`,
* `std.subview`,
-* `std.transpose`.
+* `memref.transpose`.
* `linalg.range`,
* `linalg.slice`,
* `linalg.reshape`,
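
(Editorial aside, not part of the diff: these metadata ops produce views that
alias the operand's storage instead of copying it. A sketch using
`memref.view` to reinterpret a byte buffer, with assumed sizes:)

```mlir
func @metadata_only() {
  %c0 = constant 0 : index
  %raw = memref.alloc() : memref<2048xi8>
  // No data moves: %view aliases %raw (64 * 8 * 4 bytes = 2048 bytes).
  %view = memref.view %raw[%c0][] : memref<2048xi8> to memref<64x8xf32>
  memref.dealloc %raw : memref<2048xi8>
  return
}
```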
76 changes: 76 additions & 0 deletions mlir/docs/Dialects/MemRef.md
@@ -0,0 +1,76 @@
# 'memref' Dialect

This page provides documentation for the operations within the MemRef dialect.

**Please post an RFC on the [forum](https://llvm.discourse.group/c/mlir/31)
before adding or changing any operation in this dialect.**

[TOC]

## Operations

[include "Dialects/MemRefOps.md"]

### 'dma_start' operation

Syntax:

```
operation ::= `dma_start` ssa-use`[`ssa-use-list`]` `,`
                ssa-use`[`ssa-use-list`]` `,` ssa-use `,`
                ssa-use`[`ssa-use-list`]` (`,` ssa-use `,` ssa-use)?
                `:` memref-type `,` memref-type `,` memref-type
```

Starts a non-blocking DMA operation that transfers data from a source memref to
a destination memref. The operands include the source and destination memrefs,
each followed by its indices, the size of the data transfer in terms of the
number of elements (of the elemental type of the memref), a tag memref with its
indices, and optionally two additional arguments corresponding to the stride (in
terms of number of elements) and the number of elements to transfer per stride.
The tag location is used by a dma_wait operation to check for completion. The
indices of the source memref, destination memref, and the tag memref have the
same restrictions as any load/store operation in an affine context (whenever DMA
operations appear in an affine context). See
[restrictions on dimensions and symbols](Affine.md#restrictions-on-dimensions-and-symbols)
in affine contexts. This allows powerful static analysis and transformations in
the presence of such DMAs, including rescheduling, pipelining / overlap with
computation, and checking for matching start/end operations. The source and
destination memrefs need not be of the same dimensionality, but need to have the
same elemental type.

For example, a `dma_start` operation that transfers 32 vector elements from a
memref `%src` at location `[%i, %j]` to memref `%dst` at `[%k, %l]` would be
specified as shown below.

Example:

```mlir
%size = constant 32 : index
%tag = alloc() : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
%idx = constant 0 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx] :
  memref<40 x 8 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 0>,
  memref<2 x 4 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 2>,
  memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```

### 'dma_wait' operation

Syntax:

```
operation ::= `dma_wait` ssa-use`[`ssa-use-list`]` `,` ssa-use `:` memref-type
```

Blocks until the completion of a DMA operation associated with the tag element
specified with a tag memref and its indices. The operands include the tag memref
followed by its indices and the number of elements associated with the DMA being
waited on. The indices of the tag memref have the same restrictions as
load/store indices.

Example:

```mlir
dma_wait %tag[%idx], %size : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```
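
(Editorial aside, not part of the diff: the optional stride operands from the
grammar above are not exercised by the example. A sketch with assumed values
and names follows; `%stride` is the element distance between consecutive
chunks and `%elts_per_stride` the chunk size, both in elements.)

```mlir
%stride = constant 32 : index
%elts_per_stride = constant 16 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx], %stride,
          %elts_per_stride : memref<40 x 128 x f32>,
          memref<2 x 1024 x f32, 2>, memref<1 x i32, 4>
dma_wait %tag[%idx], %size : memref<1 x i32, 4>
```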
64 changes: 0 additions & 64 deletions mlir/docs/Dialects/Standard.md
@@ -13,67 +13,3 @@ before adding or changing any operation in this dialect.**
## Operations

[include "Dialects/StandardOps.md"]

### 'dma_start' operation

Syntax:

```
operation ::= `dma_start` ssa-use`[`ssa-use-list`]` `,`
                ssa-use`[`ssa-use-list`]` `,` ssa-use `,`
                ssa-use`[`ssa-use-list`]` (`,` ssa-use `,` ssa-use)?
                `:` memref-type `,` memref-type `,` memref-type
```

Starts a non-blocking DMA operation that transfers data from a source memref to
a destination memref. The operands include the source and destination memrefs,
each followed by its indices, the size of the data transfer in terms of the
number of elements (of the elemental type of the memref), a tag memref with its
indices, and optionally two additional arguments corresponding to the stride (in
terms of number of elements) and the number of elements to transfer per stride.
The tag location is used by a dma_wait operation to check for completion. The
indices of the source memref, destination memref, and the tag memref have the
same restrictions as any load/store operation in an affine context (whenever DMA
operations appear in an affine context). See
[restrictions on dimensions and symbols](Affine.md#restrictions-on-dimensions-and-symbols)
in affine contexts. This allows powerful static analysis and transformations in
the presence of such DMAs, including rescheduling, pipelining / overlap with
computation, and checking for matching start/end operations. The source and
destination memrefs need not be of the same dimensionality, but need to have the
same elemental type.

For example, a `dma_start` operation that transfers 32 vector elements from a
memref `%src` at location `[%i, %j]` to memref `%dst` at `[%k, %l]` would be
specified as shown below.

Example:

```mlir
%size = constant 32 : index
%tag = alloc() : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
%idx = constant 0 : index
dma_start %src[%i, %j], %dst[%k, %l], %size, %tag[%idx] :
  memref<40 x 8 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 0>,
  memref<2 x 4 x vector<16xf32>, affine_map<(d0, d1) -> (d0, d1)>, 2>,
  memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```

### 'dma_wait' operation

Syntax:

```
operation ::= `dma_wait` ssa-use`[`ssa-use-list`]` `,` ssa-use `:` memref-type
```

Blocks until the completion of a DMA operation associated with the tag element
specified with a tag memref and its indices. The operands include the tag memref
followed by its indices and the number of elements associated with the DMA being
waited on. The indices of the tag memref have the same restrictions as
load/store indices.

Example:

```mlir
dma_wait %tag[%idx], %size : memref<1 x i32, affine_map<(d0) -> (d0)>, 4>
```
2 changes: 1 addition & 1 deletion mlir/docs/Rationale/UsageOfConst.md
@@ -200,7 +200,7 @@ for.
### The `OpPointer` and `ConstOpPointer` Classes

The "typed operation" classes for registered operations (e.g. like `DimOp` for
the "std.dim" operation in standard ops) contain a pointer to an operation and
the "memref.dim" operation in memref ops) contain a pointer to an operation and
provide typed APIs for processing it.

However, this is a problem for our current `const` design - `const DimOp` means
2 changes: 1 addition & 1 deletion mlir/docs/Traits.md
@@ -211,7 +211,7 @@ are nested inside of other operations that themselves have this trait.
This trait is carried by region holding operations that define a new scope for
automatic allocation. Such allocations are automatically freed when control is
transferred back from the regions of such operations. As an example, allocations
-performed by [`std.alloca`](Dialects/Standard.md#stdalloca-allocaop) are
+performed by [`memref.alloca`](Dialects/MemRef.md#memrefalloca-allocaop) are
automatically freed when control leaves the region of its closest surrounding op
that has the trait AutomaticAllocationScope.

10 changes: 6 additions & 4 deletions mlir/docs/Tutorials/Toy/Ch-5.md
@@ -50,8 +50,9 @@ framework, we need to provide two things (and an optional third):
## Conversion Target

For our purposes, we want to convert the compute-intensive `Toy` operations into
-a combination of operations from the `Affine` and `Standard` dialects for further
-optimization. To start off the lowering, we first define our conversion target:
+a combination of operations from the `Affine`, `MemRef` and `Standard` dialects
+for further optimization. To start off the lowering, we first define our
+conversion target:

```c++
void ToyToAffineLoweringPass::runOnFunction() {
@@ -61,8 +62,9 @@ void ToyToAffineLoweringPass::runOnFunction() {

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<mlir::AffineDialect, mlir::StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<mlir::AffineDialect, mlir::memref::MemRefDialect,
+                         mlir::StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
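
(Editorial sketch, not part of the diff, with assumed shapes: the lowered IR
now mixes all three dialects.)

```mlir
func @lowered_sketch() {
  %buf = memref.alloc() : memref<2x3xf64>     // `memref` dialect
  %cst = constant 1.0 : f64                   // `std` dialect
  affine.for %i = 0 to 2 {                    // `affine` dialect
    affine.for %j = 0 to 3 {
      affine.store %cst, %buf[%i, %j] : memref<2x3xf64>
    }
  }
  memref.dealloc %buf : memref<2x3xf64>
  return
}
```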
20 changes: 11 additions & 9 deletions mlir/examples/toy/Ch5/mlir/LowerToAffineLoops.cpp
@@ -7,15 +7,16 @@
//===----------------------------------------------------------------------===//
//
// This file implements a partial lowering of Toy operations to a combination of
-// affine loops and standard operations. This lowering expects that all calls
-// have been inlined, and all shapes have been resolved.
+// affine loops, memref operations and standard operations. This lowering
+// expects that all calls have been inlined, and all shapes have been resolved.
//
//===----------------------------------------------------------------------===//

#include "toy/Dialect.h"
#include "toy/Passes.h"

#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"
@@ -36,15 +37,15 @@ static MemRefType convertTensorToMemRef(TensorType type) {
/// Insert an allocation and deallocation for the given MemRefType.
static Value insertAllocAndDealloc(MemRefType type, Location loc,
                                   PatternRewriter &rewriter) {
-  auto alloc = rewriter.create<AllocOp>(loc, type);
+  auto alloc = rewriter.create<memref::AllocOp>(loc, type);

  // Make sure to allocate at the beginning of the block.
  auto *parentBlock = alloc->getBlock();
  alloc->moveBefore(&parentBlock->front());

  // Make sure to deallocate this alloc at the end of the block. This is fine
  // as toy functions have no control flow.
-  auto dealloc = rewriter.create<DeallocOp>(loc, alloc);
+  auto dealloc = rewriter.create<memref::DeallocOp>(loc, alloc);
  dealloc->moveBefore(&parentBlock->back());
  return alloc;
}
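
(Editorial sketch, not part of the diff: for an assumed 2x3 buffer, the helper
yields this placement, with the alloc hoisted to the block entry and the
dealloc just before the terminator.)

```mlir
func @placement() {
  %0 = memref.alloc() : memref<2x3xf64>   // moved before parentBlock->front()
  %c0 = constant 0 : index
  %v = memref.load %0[%c0, %c0] : memref<2x3xf64>
  memref.dealloc %0 : memref<2x3xf64>     // moved before parentBlock->back()
  return
}
```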
@@ -152,8 +153,8 @@ struct ConstantOpLowering : public OpRewritePattern<toy::ConstantOp> {

    if (!valueShape.empty()) {
      for (auto i : llvm::seq<int64_t>(
-              0, *std::max_element(valueShape.begin(), valueShape.end())))
-      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
+               0, *std::max_element(valueShape.begin(), valueShape.end())))
+        constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
    } else {
      // This is the case of a tensor of rank 0.
      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, 0));
@@ -257,7 +258,7 @@ namespace {
struct ToyToAffineLoweringPass
    : public PassWrapper<ToyToAffineLoweringPass, FunctionPass> {
  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<AffineDialect, StandardOpsDialect>();
+    registry.insert<AffineDialect, memref::MemRefDialect, StandardOpsDialect>();
  }
  void runOnFunction() final;
};
@@ -283,8 +284,9 @@

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<AffineDialect, StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<AffineDialect, memref::MemRefDialect,
+                         StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
20 changes: 11 additions & 9 deletions mlir/examples/toy/Ch6/mlir/LowerToAffineLoops.cpp
@@ -7,15 +7,16 @@
//===----------------------------------------------------------------------===//
//
// This file implements a partial lowering of Toy operations to a combination of
-// affine loops and standard operations. This lowering expects that all calls
-// have been inlined, and all shapes have been resolved.
+// affine loops, memref operations and standard operations. This lowering
+// expects that all calls have been inlined, and all shapes have been resolved.
//
//===----------------------------------------------------------------------===//

#include "toy/Dialect.h"
#include "toy/Passes.h"

#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"
@@ -36,15 +37,15 @@ static MemRefType convertTensorToMemRef(TensorType type) {
/// Insert an allocation and deallocation for the given MemRefType.
static Value insertAllocAndDealloc(MemRefType type, Location loc,
                                   PatternRewriter &rewriter) {
-  auto alloc = rewriter.create<AllocOp>(loc, type);
+  auto alloc = rewriter.create<memref::AllocOp>(loc, type);

  // Make sure to allocate at the beginning of the block.
  auto *parentBlock = alloc->getBlock();
  alloc->moveBefore(&parentBlock->front());

  // Make sure to deallocate this alloc at the end of the block. This is fine
  // as toy functions have no control flow.
-  auto dealloc = rewriter.create<DeallocOp>(loc, alloc);
+  auto dealloc = rewriter.create<memref::DeallocOp>(loc, alloc);
  dealloc->moveBefore(&parentBlock->back());
  return alloc;
}
@@ -152,8 +153,8 @@ struct ConstantOpLowering : public OpRewritePattern<toy::ConstantOp> {

    if (!valueShape.empty()) {
      for (auto i : llvm::seq<int64_t>(
-              0, *std::max_element(valueShape.begin(), valueShape.end())))
-      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
+               0, *std::max_element(valueShape.begin(), valueShape.end())))
+        constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, i));
    } else {
      // This is the case of a tensor of rank 0.
      constantIndices.push_back(rewriter.create<ConstantIndexOp>(loc, 0));
@@ -256,7 +257,7 @@ namespace {
struct ToyToAffineLoweringPass
    : public PassWrapper<ToyToAffineLoweringPass, FunctionPass> {
  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<AffineDialect, StandardOpsDialect>();
+    registry.insert<AffineDialect, memref::MemRefDialect, StandardOpsDialect>();
  }
  void runOnFunction() final;
};
@@ -282,8 +283,9 @@

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
-  // `Affine` and `Standard` dialects.
-  target.addLegalDialect<AffineDialect, StandardOpsDialect>();
+  // `Affine`, `MemRef` and `Standard` dialects.
+  target.addLegalDialect<AffineDialect, memref::MemRefDialect,
+                         StandardOpsDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
