# StableHLO Sparsity RFC

## Motivation

The [StableHLO](https://github.com/openxla/stablehlo) operation set is
intended as a compiler IR for ML computations. The set was originally
bootstrapped from the
[MHLO dialect](https://github.com/tensorflow/mlir-hlo#meta-hlo-dialect-mhlo),
but adds functionality such as [de]serialization and versioning.
StableHLO is meant to be a portability layer between ML frameworks
and ML compilers. The “stable” in StableHLO refers to the intended
stability of changes to the definition, making this operation set a
suitable target IR for a wide variety of ML frameworks, including
TensorFlow, JAX, and PyTorch, as well as a suitable source IR for
ML compilers, including XLA and IREE.

Because sparse tensor algebra is expected to become ubiquitous in
many ML problems, this RFC proposes adding **sparsity support** to
StableHLO, in particular by introducing a new **sparse tensor type**
that combines transparently with a restricted subset of StableHLO
operations. In this context, transparency means that a single,
regular StableHLO operation can be made sparse by merely marking one
or several of its operands and outputs with sparse tensor types.
For example, a 2-dim dot operation for a regular, dense matrix
multiplication (`GEMM`) can be made sparse by marking one (`SpMM`)
or two (`SpMSpM`) operands as sparse, together with a specification
of the desired sparse storage scheme for each. With this novel
approach of **treating sparsity as a type, and not as a tedious
explicit implementation detail**, we avoid the explosion in the
number of required operations that is inherent to the explicit
approach, where a matrix multiplication must be supported for each
combination of sparse operands and outputs, including the exact
sparse storage scheme (such as separate `SpMM` operations for `COO`,
`CSR`, `CSC`, `DCSR`, etc., and separate `SpMSpM` operations for even
more combinations of inputs and outputs). Also, in this initial
proposal, “sparsity” exclusively refers to tensors with many zeros
and the obvious operations that can take advantage of them (such as
`x+0=x` and `x*0=0`). In principle, however, the proposal can be
extended to deal with other forms of frequently occurring values
(like ones, Inf, and NaN) and semi-ring algebras over these values
that can be exploited to reduce the storage and/or computational
requirements of solving ML problems.

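To make this transparency concrete, the sketch below contrasts a
dense `GEMM` with an `SpMM` in which one operand is annotated sparse;
the `#CSR` encoding is defined later in this RFC, and the textual
form shown here is illustrative rather than normative.

    // Dense GEMM: all operands use regular tensor types.
    %0 = stablehlo.dot %a, %b
        : (tensor<10x20xf32>, tensor<20x30xf32>) -> tensor<10x30xf32>

    // SpMM: the same operation, with one operand marked sparse.
    %1 = stablehlo.dot %a_sparse, %b
        : (tensor<10x20xf32, #CSR>, tensor<20x30xf32>) -> tensor<10x30xf32>
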
The concept of treating sparsity as a property, and not as a tedious
implementation detail, has been studied extensively in academia
(see [MT1](https://www.aartbik.com/sparse.php) and
[TACO](http://tensor-compiler.org/) as two prominent example
projects). In this new paradigm for developing sparse code,
developers define a computation in a completely sparsity-agnostic
(viz. “dense”) way, and merely annotate some tensor types as sparse.
After this, the compiler automatically maps the sparsity-agnostic
computation to possibly many sparsity-aware implementations, each
tailored to the specific sparsity properties of the involved tensors.
This RFC is meant to extend StableHLO with an IR that fits this new
paradigm well. ML frameworks that follow this paradigm, such as
Sparse JAX, can use this IR as their target, while ML compilers that
support it, such as MLIR, can use this IR as the source from which
actual sparse code is generated. Providing a standardized way to
represent sparsity in the IR will hopefully contribute to an
ecosystem in which the new paradigm of sparse compilation can thrive.

The MLIR compiler infrastructure added sparse tensor types by means
of a new
[Sparse Tensor Dialect](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/)
(for more details on its design, see this
[ACM publication](https://dl.acm.org/doi/10.1145/3544559)).
Previous experience with converting the sparse extension of JAX to
MHLO with sparse tensor types demonstrated the ease with which the
concept of treating sparsity as a property can propagate all the way
from a source framework with sparse extensions into a target IR that
uses sparse tensor types, and then ultimately down to an actual
sparse compilation pass that makes sparsity concrete in the IR before
handing it off to a backend compiler such as LLVM for execution on
a target platform. The experience obtained with adding sparse tensor
types to MHLO gave rise to proposing similar support for StableHLO,
as formulated in this RFC.

This RFC proposes two new concepts for StableHLO:

1. Sparse tensor types (in addition to regular “dense” tensor types)
2. StableHLO operations that support sparse tensor types

## Sparse Tensor Types

Tensor types are the cornerstone of the StableHLO type system. These
types are well-defined in the existing specification. For example,
a three-dimensional immutable array with static dimensions of sizes
`10`, `20`, and `30`, and with double-precision elements, is
represented by the following type (for more details, see the
original specification).

    tensor<10x20x30xf64>

The extension is inspired by the TACO formalization of sparse
tensors, which was also adopted in MLIR for sparse tensor types.
Below we use **dimension** to refer to the axes of the semantic
tensor, and **level** to refer to the axes of the actual storage
scheme. We propose to extend the regular tensor types of StableHLO
with a new encoding field that provides:

1. An ordered sequence of level specifications, each of which includes:
   1. a required **level-type**, which defines *how* the level is
      stored, including:
      1. a required **level-format**
      2. a collection of **level-properties**
   2. a **level-expression**, which defines *what* is stored.
2. An ordered sequence of dimension specifications, each of which includes:
   1. the **dimension-size** (although this is already given by the
      tensor’s dimension-shape)
   2. an optional **dimension-expression**

Different level-formats may have different collections of
level-properties, and whether these properties are optional or
required also depends on the level-format; this is why we group the
level-format and level-properties together as a level-type. Whether
the level-expression is optional or required may also depend on the
level-type. For the three level-formats discussed below, both the
level-properties and the level-expressions are optional. (In
contrast, the level-format for supporting `ITPACK/ELLPACK` has no
properties, and its level-expression is required.)

For the three level-types discussed below, each level-expression is
an affine expression over dimension-variables. Thus, the
level-expressions collectively define an affine map from
dimension-coordinates to level-coordinates, and the
dimension-expressions collectively define the inverse map, which
only needs to be provided for elaborate cases where it cannot be
inferred automatically. However, we avoid *defining* the
level-/dimension-expressions as affine maps, because other
level-types may require different sorts of expressions. (One known
example is the level-type for supporting `ITPACK/ELLPACK`.)

The supported level-formats should consist of at least the following:

* **dense**: the level is dense; all entries along the level are stored,
* **compressed**: the level is sparse; only the nonzeros along the
  level are stored,
* **singleton**: a variant of the compressed format, used when
  coordinates have no siblings.

The specification should be easy to extend with new level-formats.
For example, a parameterized type **compressed(2:4)** could be used
to define the 2-out-of-4 structured sparsity that is well supported
by some vendors (viz. NVidia GPUs). The three level-formats above
have two properties: **non/unique** (are duplicates allowed at that
level) and **un/ordered** (do coordinates appear sorted at that
level). Both properties are optional, with the defaults being unique
and ordered, but the encoding should allow specifying the non-unique
and unordered settings. However, the dense format must always be
ordered and unique, so the encoding should also prohibit users from
specifying invalid level-types like “unordered dense”.

The encoding outlined above allows for a formal specification of most
common sparse storage schemes (e.g., `COO`, `CSR`, `CSC`, `DCSR`,
`DCSC`, `CSF`, `BCSR`, and `ELL`). Other extensions to the encoding,
such as those found in the paper by
[Chou et al.](http://tensor-compiler.org/publications.html),
are open to debate as well.

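As one illustration of how the level-properties compose, a `COO`
matrix could combine a compressed level that allows duplicates with
a singleton level. The sketch below uses the same illustrative
notation as the examples that follow; the exact spelling of the
property annotation (here `non-unique`) is an assumption, not a
settled design.

    #COO = #sparse_tensor.encoding<
      (i, j) -> (i : compressed(non-unique), j : singleton)
    >
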
The exact syntax for the encoding is still open for debate (it will
be finalized and formalized in the final spec). We will give a
concrete proposal in a separate RFC, since doing so gets into some
technical details that are independent of the rest of the current
RFC. Regardless of the particular syntax used for the encoding, the
essence of the proposal is that a regular tensor type, such as the
following 10×20 matrix type with 32-bit floating-point elements,

    tensor<10x20xf32>

is made sparse by adding a single field

    tensor<10x20xf32, #CSR>

where the field is defined elsewhere to specify the encoding parts
(1)-(2) described above. The `CSR` storage scheme is defined by
the following encoding, where each dimension is mapped to a level
in the same order:

    #CSR = #sparse_tensor.encoding<
      (i, j) -> (i : dense, j : compressed)
    >

The `CSC` storage scheme is very similar, but the mapping from
dimensions to levels is a permutation:

    #CSC = #sparse_tensor.encoding<
      (i, j) -> (j : dense, i : compressed)
    >

Block sparsity with 2×3 blocks is expressed as follows:

    #BCSR = #sparse_tensor.encoding<
      (i, j) ->
      ( i floordiv 2 : compressed
      , j floordiv 3 : compressed
      , i mod 2      : dense
      , j mod 3      : dense
      )
    >

For the block sparsity example, the dimension-expressions can be
inferred automatically. But here is what it would look like to
specify them explicitly (just to give an idea of the syntax):

    #BCSR = #sparse_tensor.encoding<
      ( i = ib * 2 + ii
      , j = jb * 3 + jj
      ) ->
      ( ib = i floordiv 2 : compressed
      , jb = j floordiv 3 : compressed
      , ii = i mod 2      : dense
      , jj = j mod 3      : dense
      )
    >

Combining 1×4 blocks together with the parameterized compressed
level-type alluded to above would provide direct support for
expressing a computation using NVidia 2:4 sparsity acceleration.

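As a sketch only (assuming the hypothetical `compressed(2:4)`
level-type from above), such an encoding might look as follows:

    #NV_2_4 = #sparse_tensor.encoding<
      (i, j) ->
      ( i            : dense
      , j floordiv 4 : dense
      , j mod 4      : compressed(2:4)
      )
    >

Here every row is viewed as a sequence of 1×4 blocks, and within each
block at most two entries are stored.
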
As said, the final syntax will be formalized using feedback on this
RFC and on the separate RFC that concretely specifies the syntax used
above. Likewise, the exact way of encoding the sparsity information
inside the tensor type will be formalized after considering other
ongoing RFCs that extend the existing tensor type (e.g., with
dynamism and quantization).

## StableHLO Ops Support of Sparse Tensor Types

The following two considerations need to be addressed when sparse
tensor types are added to the StableHLO operations:

1. which operations support sparse tensor types (all ops vs.
   a restricted set of ops), and
2. what additions are required to the type system for sparse
   tensor types (in particular, type inference vs. explicitly
   defining all types while building the IR for the ops).

As for consideration (1), the most ambitious extension would be to
allow sparse tensor types for all operations and for all operands
and outputs where regular tensor types are allowed. Such an
“**open world**” sparsity extension would, in theory, allow
arbitrary programs written in a language such as JAX or NumPy
to be converted into an equivalent sparse program by merely
changing some properties on the tensor (viz. array) types
of that language. However, generating efficient sparse code
for such an “open world” system is generally an unsolved problem that
is still under active research in academia. More realistically,
in a “**closed world**” sparsity extension, sparse tensor types
are *restricted to a subset of operations* that can be handled
efficiently by a downstream “sparse compilation” pass, i.e., with
some basic guarantees of being able to generate a well-performing
sparse version of the input code.

Although we are open to debating an “open world” extension, this RFC
initially proposes a “closed world” extension where sparse tensor
types can only be applied to the following operations. (The current
set is based on our experiences with sparse JAX and MHLO; adding more
operations already in the initial spec is also open for debate, and
even under the “closed world” approach, the set of supported
operations should remain easily extensible as the need arises.)

1. The common arithmetic binary and unary “element-wise” operations:
   `add`, `subtract`, `multiply`, `negate`
2. Zero-preserving unary math “element-wise” operations:
   `abs`, `exponential_minus_one`, `log_plus_one`,
   `sign`, `sine`, `sqrt`, `tanh`, `ceil`, `floor`,
   `bessel_i1e`, `real`, `imag`
3. General dot products and reductions:
   `dot`, `dot_general`, `reduce`
4. Reshaping operations:
   `reshape`, `broadcast_in_dim`, `concatenate`, `squeeze`, `transpose`
5. Conversions (between dense and sparse tensors, and between
   different sparsity encodings): `convert`

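As a sketch of how one of these operations could be used with sparse
types (again using the illustrative syntax of this RFC, with `#CSR`
as defined above, and `%zero` a scalar zero), a `reduce` that sums
away the column dimension of a sparse matrix might be written as:

    // Hypothetical example: row sums of a sparse matrix; only the
    // stored nonzeros need to be visited.
    %rows = stablehlo.reduce(%m init: %zero) applies stablehlo.add
        across dimensions = [1]
        : (tensor<10x20xf32, #CSR>, tensor<f32>) -> tensor<10xf32>
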
As for consideration (2), we could come up with type inference rules
for all operations that, given the types of the input operands,
define the types of the result. However, for sparsity, such type
inference rules would always be heuristic in nature. For example,
when adding two sparse tensors element-wise, should the result be
dense (when the two nonzero patterns pretty much cover the complete
coordinate space) or sparse (when the union of the two nonzero
patterns still remains sparse)? Either heuristic would fail for a
certain class of sparse tensors. Furthermore, it would introduce
difficult choices for the other components of sparsity. For example,
in the following element-wise addition of a row-wise and a
column-wise sparse matrix, it is unclear what ordering should be
inferred for the output sparse matrix.

    %result = stablehlo.add %matrix1, %matrix2
        : (tensor<10x10xf32, #CSR>,
           tensor<10x10xf32, #CSC>) -> tensor<10x10xf32, #???>

Therefore, this RFC proposes that “builders” of StableHLO IR with
sparse tensor types must explicitly define the sparsity types of
inputs and outputs (which accommodates both language implementations
with their own type inference and language implementations that
require the programmer to define the sparsity of results explicitly).

Alternatively, the spec could provide “builders” that accept an
explicit output type, or otherwise fall back to the (heuristic)
inference alluded to above. As yet another alternative, we could
relax this proposal with “builders” that define the sparsity of input
types only, with output types that are always dense (but, of course,
following the standard inference rules for all other components of
the output tensor types, such as shape and element type). An ML
framework can then still force the desired output sparsity as
follows,

    x_dense_inferred = a_sparse1 OP b_sparse2
    x_sparse_desired = convert x_dense_inferred

which is subsequently “cleaned up” in an early pass by the ML
compiler that accepts this IR:

    x_sparse_desired = a_sparse1 OP b_sparse2

For all operations, adding sparse types to the operation changes the
dense semantics into sparse semantics, as illustrated below for
element-wise addition, where the IR represents adding a sparse
matrix in CSR format to a sparse matrix in DCSR format and storing
the result in a new matrix in CSR format.

    %0 = stablehlo.add %arg0, %arg1
        : (tensor<10x20xf32, #CSR>,
           tensor<10x20xf32, #DCSR>) -> tensor<10x20xf32, #CSR>

The sparse semantics imply that the generated code can ultimately
**take advantage of the zeros by relying on common mathematical laws
that allow skipping operations** (such as `x+0=x` and `x*0=0`), while
**ignoring strict IEEE requirements that would prevent such savings**
(like `0*NaN=NaN` and `0*Inf=NaN`). Furthermore, the IR is not
necessarily concerned with the efficiency of any of the expressed
operations. For example, in the following, a column-wise stored
sparse matrix is added to a row-wise stored sparse matrix.

    %0 = stablehlo.add %arg0, %arg1
        : (tensor<10x20xf32, #CSC>,
           tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>

The sparse compiler will eventually be responsible for generating
efficient code, possibly by inserting sparse storage scheme
conversions into the code. Nevertheless, in a healthy sparse
ecosystem, ML frameworks are expected to generate IR that can
ultimately map to efficient sparse code with the current
state-of-the-art sparse compilation technology.

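For the mixed CSC/CSR addition above, for instance, a sparse compiler
might rewrite the IR along the following lines before code
generation; this is merely a sketch of one possible strategy, not a
mandated lowering.

    // Hypothetical rewrite: first convert the CSC operand to CSR, so
    // that the addition operates on two identically stored matrices.
    %t = stablehlo.convert %arg0
        : (tensor<10x20xf32, #CSC>) -> tensor<10x20xf32, #CSR>
    %0 = stablehlo.add %t, %arg1
        : (tensor<10x20xf32, #CSR>,
           tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>
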
The conversion operator can be used for all sorts of conversions
(dense to sparse, sparse to dense, and sparse to sparse for
storage scheme transformations). An example of converting a dense
tensor into a sparse tensor is shown below. In such a conversion,
only the nonzeros found in the dense tensor are transferred into
the sparse tensor, here stored in the CSF fiber format.

    %st = stablehlo.convert %dt
        : (tensor<?x?x?xf64>) -> tensor<?x?x?xf64, #CSF>

Conversely, converting a sparse vector to a dense vector can be
represented as follows. As expected, the dense vector receives all
explicitly stored values from the sparse vector, while all other
values are set to zero.

    %dv = stablehlo.convert %sv
        : (tensor<100xi32, #SparseVec>) -> tensor<100xi32>

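Here `#SparseVec` could, for example, be defined as a single
compressed level:

    #SparseVec = #sparse_tensor.encoding<
      (i) -> (i : compressed)
    >
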
Finally, a storage scheme transformation converts one sparse tensor
into another sparse tensor, as illustrated below by converting CSR
into CSC.

    %x = stablehlo.convert %y
        : (tensor<10x10xf64, #CSR>) -> tensor<10x10xf64, #CSC>

Once this RFC has been debated and approved, the final specification
will precisely define all StableHLO operations that can accept sparse
tensor types as operands and outputs, together with the semantics of
such operations.