diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index d9d0751cae8c9..ea3593549ca15 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -101,10 +101,28 @@ bufferization strategy would be unacceptable for high-performance codegen.
 When choosing an already existing buffer, we must be careful not to
 accidentally overwrite data that is still needed later in the program.
 
-To simplify this problem, One-Shot Bufferize was designed for ops that are in
-*destination-passing style*. For every tensor result, such ops have a tensor
-operand, whose buffer could be utilized for storing the result of the op in the
-absence of other conflicts. We call such tensor operands the *destination*.
+To simplify this problem, One-Shot Bufferize was designed to take advantage of
+*destination-passing style*. This form exists independently of bufferization
+and is tied to SSA semantics: many ops "update" part of their input SSA
+value. For example, the LLVM instruction
+[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
+inserts an element into a vector. Since SSA values are immutable, the
+operation returns a copy of the input vector with the element inserted.
+Another example in MLIR is `linalg.generic`, which always has an extra `outs`
+operand that provides the initial values to update (for example, when the
+operation performs a reduction).
+
+This operand is referred to as the "destination" in the following (the quotes
+are important, as this operand is not modified in place but copied) and comes
+into play in the context of bufferization as a possible "anchor" for the
+bufferization algorithm. By carefully choosing the SSA value used as the
+"destination", the user can shape the input in a form that guarantees a
+close-to-optimal bufferization result.
+
+For every tensor result, a destination-passing style op has a corresponding
+tensor operand. If there are no other uses of this tensor, the bufferization
+can alias it with the op result and perform the operation "in-place" by
+reusing the buffer allocated for this "destination" input.
 
 As an example, consider the following op: `%0 = tensor.insert %cst into
 %t[%idx] : tensor<?xf32>`
@@ -112,15 +130,16 @@ As an example, consider the following op: `%0 = tensor.insert %cst into
 `%t` is the destination in this example. When choosing a buffer for the result
 `%0`, One-Shot Bufferize considers only two options:
 
-1. buffer(`%0`) = buffer(`%t`).
-2. buffer(`%0`) is a newly allocated buffer.
+1. buffer(`%0`) = buffer(`%t`): alias the destination tensor with the
+   result and perform the operation in-place, as sketched below.
+2. buffer(`%0`) is a newly allocated buffer.
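+
+As a sketch, the two options could bufferize as follows (illustrative IR only:
+`%t_buf` stands for buffer(`%t`) and `%size` for its dynamic size; the exact
+IR produced by One-Shot Bufferize depends on the chosen options):
+
+```mlir
+// Option 1: in-place. buffer(%0) aliases buffer(%t), so the store directly
+// overwrites the destination buffer.
+memref.store %cst, %t_buf[%idx] : memref<?xf32>
+
+// Option 2: out-of-place. A new buffer is allocated and initialized with a
+// copy of the destination before the store.
+%alloc = memref.alloc(%size) : memref<?xf32>
+memref.copy %t_buf, %alloc : memref<?xf32> to memref<?xf32>
+memref.store %cst, %alloc[%idx] : memref<?xf32>
+```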
+"`linalg.generic` on memrefs" that overwrites the memory of the subview, assuming +that the slice `%t` has no other user. The `tensor.insert_slice` then bufferizes +to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`). RaW conflicts are detected with an analysis of SSA use-def chains (details later). One-Shot Bufferize works best if there is a single SSA use-def chain,