# StableHLO Sparsity RFC

## Motivation

The [StableHLO](https://github.com/openxla/stablehlo) operation set is
intended as a compiler IR for ML computations. The set was
originally bootstrapped from the
[MHLO dialect](https://github.com/tensorflow/mlir-hlo#meta-hlo-dialect-mhlo)
but adds functionality such as (de)serialization and versioning.
StableHLO is meant to be a portability layer between ML frameworks
and ML compilers. The “stable” in StableHLO refers to the intended
stability of the definition over time, making this operation set a
suitable target IR for a wide variety of ML frameworks, including
TensorFlow, JAX, and PyTorch, as well as a suitable source IR for
ML compilers such as XLA and IREE.

Because sparse tensor algebra is expected to become rather ubiquitous
in many ML problems, this RFC proposes adding **sparsity support**
to StableHLO, in particular by introducing a new **sparse tensor type**
that combines transparently with a restricted subset of StableHLO
operations. In this context, transparency implies that a regular,
single StableHLO operation can be made sparse by merely marking one
or several of its operands and outputs with sparse tensor types.
For example, a two-dimensional dot operation expressing a regular dense
matrix multiplication (`GEMM`) can be made sparse by marking one (`SpMM`)
or two (`SpMSpM`) operands as sparse, together with a specification of the
desired sparse storage scheme for each (a sketch follows this paragraph).
With this novel
approach of **treating sparsity as a type, and not a tedious
explicit implementation detail**, we avoid the typical explosion
of the number of required operations that is inherent to the explicit
approach, where we need to support a matrix multiplication for each
combination of sparse operands and outputs, including the exact
sparse storage scheme (such as separate `SpMM` operations for `COO`,
`CSR`, `CSC`, `DCSR`, etc. and separate `SpMSpM` operations for even
more combinations of inputs and outputs). Also, in this initial
proposal, “sparsity” exclusively refers to tensors with many zeros
and the obvious simplifications that take advantage of these zeros
(such as `x+0=x` and `x*0=0`). In principle, however, the proposal
can be extended to deal with other forms of frequently occurring
values (like ones, Inf, or NaN) and semi-ring algebras over these
values, which can be exploited to reduce the storage and/or
computational requirements of solving ML problems.
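
To make this transparency concrete, here is a hedged sketch of the dense
`GEMM` and its `SpMM` counterpart, assuming a `#CSR` encoding attribute as
introduced later in this RFC (the exact syntax is illustrative only):

```mlir
// Dense GEMM: an ordinary dot operation over dense tensor types.
%gemm = stablehlo.dot %a, %b
    : (tensor<16x32xf32>, tensor<32x8xf32>) -> tensor<16x8xf32>

// SpMM: the same operation, made sparse merely by marking the
// left-hand operand with a sparse tensor type.
%spmm = stablehlo.dot %a_sparse, %b
    : (tensor<16x32xf32, #CSR>, tensor<32x8xf32>) -> tensor<16x8xf32>
```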

The concept of treating sparsity as a property and not as a tedious
implementation detail has been studied extensively in academia
(see [MT1](https://www.aartbik.com/sparse.php) and
[TACO](http://tensor-compiler.org/) as two prominent example
projects). In this new paradigm for developing sparse code,
developers define a computation in a completely sparsity-agnostic
(viz. “dense”) way, and merely annotate some tensor types as sparse.
After this, the compiler automatically maps the sparsity-agnostic
computation to possibly many sparsity-aware implementations, each
tailored to specific sparsity properties of the involved tensors.
This RFC proposes an extension to StableHLO that yields an IR
that fits this new paradigm well. ML frameworks that follow
this paradigm, such as Sparse JAX, can use this IR as their
target, while ML compilers that support the paradigm, such as
MLIR, can use this IR as their source from which actual sparse code
is generated. Providing a standardized way to represent sparsity
in the IR will hopefully foster an ecosystem in which
the new paradigm of sparse compilation can thrive.

The MLIR compiler infrastructure added sparse tensor types by means
of a new
[Sparse Tensor Dialect](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/)
(for more details on its design, see this
[ACM publication](https://dl.acm.org/doi/10.1145/3544559)).
Previous experience with converting a sparse extension of JAX to MHLO
with sparse tensor types demonstrated the ease with which the concept
of treating sparsity as a property can propagate all the way from
a source framework with sparse extensions into a target IR that uses
sparse tensor types, and then ultimately down to an actual sparse
compilation pass that makes sparsity concrete in the IR before
handing it off to a backend compiler such as LLVM for execution on
a target platform. The experience gained from adding sparse tensor
types to MHLO gave rise to proposing similar support for StableHLO,
as formulated in this RFC.

This RFC proposes two new concepts for StableHLO:

1. Sparse tensor types (in addition to regular “dense” tensor types)
2. StableHLO operations that support sparse tensor types

## Sparse Tensor Types

Tensor types are the cornerstone of the StableHLO type system. These
types are well-defined in the existing specification. For example,
a three-dimensional immutable array with static dimensions of sizes
`10`, `20`, and `30` and double-precision floating-point elements is
represented by the following type (for more details,
see the original specification):

```mlir
tensor<10x20x30xf64>
```

The extension is inspired by the TACO formalization of sparse tensors
that was also adopted in MLIR for sparse tensor types. Below we use
**dimension** to refer to the axes of the semantic tensor, and
**level** to refer to the axes of the actual storage scheme.
We propose to extend the regular tensor types of StableHLO
with a new encoding field that allows for providing:

1. An ordered sequence of level specifications, each of which includes:
   1. a required **level-type**, which defines *how* the level is stored,
      including:
      1. a required **level-format**
      2. a collection of **level-properties**
   2. a **level-expression**, which defines *what* is stored.
2. An ordered sequence of dimension specifications, each of which includes:
   1. the **dimension-size**; however, that is already given by the
      tensor's dimension-shape
   2. an optional **dimension-expression**

Different level-formats may have different collections of level-properties,
and whether these properties are optional or required also depends on the
level-format; this is why we group the level-format and level-properties
together as a level-type. Whether the level-expression is optional or
required may also depend on the level-type. For the three level-formats
discussed below, both the level-properties and level-expressions are
optional. (By contrast, the level-format for supporting `ITPACK/ELLPACK`
has no properties, and its level-expression is required.)

For the three level-types discussed below, each level-expression is an
affine expression over dimension-variables. Thus, the level-expressions
collectively define an affine map from dimension-coordinates to
level-coordinates, and the dimension-expressions collectively define the
inverse map, which only needs to be provided for elaborate cases
where it cannot be inferred automatically. However, we avoid
*defining* the level-/dimension-expressions as affine maps, because
other level-types may require different sorts of expressions.
(One known example is for supporting `ITPACK/ELLPACK`.)

The supported level-formats should consist of at least the following:

* **dense** : level is dense, all entries along the level are stored,
* **compressed** : level is sparse, only nonzeros along the level are stored,
* **singleton** : a variant of the compressed format,
for when coordinates have no siblings.

The specification should be easy to extend for adding new level-formats.
For example, a parameterized type **compressed(2:4)** could be used to
define 2-out-of-4 structured sparsity that is well-supported by some
vendors (viz. NVIDIA GPUs). These three level-formats have two properties:
**non/unique** (are duplicates allowed at that level) and
**un/ordered** (do coordinates appear sorted at that level). Both
are optional, with the defaults being unique and ordered, but
the encoding should allow specifying the non-unique and unordered
settings. However, the dense format must always be ordered and
unique, so the encoding should also be able to prohibit users from
specifying invalid level-types like “unordered dense”.

The encoding outlined above allows for a formal specification of most
common sparse storage schemes (e.g., `COO`, `CSR`, `CSC`, `DCSR`,
`DCSC`, `CSF`, `BCSR`, and `ELL`). Other extensions to the
encoding, such as found in the paper by
[Chou et al.](http://tensor-compiler.org/publications.html)
are open to debate as well.

The exact syntax for the encoding is still open for debate
(it will be finalized and formalized in the final spec).
We will give a concrete proposal as a separate RFC, since
doing so gets into some technical details that are independent
of the rest of the current RFC. Regardless of the particular
syntax used for the encoding, the essence of the proposal
is that a regular tensor type such as the following 10×20
matrix type with 32-bit floating-point elements

```mlir
tensor<10x20xf32>
```

is made sparse by adding a single field

```mlir
tensor<10x20xf32, #CSR>
```

where the field is defined elsewhere to specify the encoding parts
(1)-(2) described above. The `CSR` storage scheme is defined by
the following encoding, where each dimension is mapped to a level
in the same order:

```mlir
#CSR = #sparse_tensor.encoding<
  (i, j) -> (i : dense, j : compressed)
>
```

The `CSC` storage scheme is very similar, but the mapping from
dimensions to levels is a permutation:

```mlir
#CSC = #sparse_tensor.encoding<
  (i, j) -> (j : dense, i : compressed)
>
```
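
Using the same illustrative syntax, other common schemes can be expressed
as well. For example, `DCSR` (used in a later example in this RFC)
compresses both levels, and a `COO` scheme can be modeled with a
non-unique compressed level followed by a singleton level (the exact
spelling of the level-properties below is an assumption, pending the
separate syntax RFC):

```mlir
#DCSR = #sparse_tensor.encoding<
  (i, j) -> (i : compressed, j : compressed)
>

#COO = #sparse_tensor.encoding<
  (i, j) -> (i : compressed(nonunique), j : singleton)
>
```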

Block sparsity with 2×3 blocks is expressed as follows:

```mlir
#BCSR = #sparse_tensor.encoding<
  (i, j) ->
    ( i floordiv 2 : compressed
    , j floordiv 3 : compressed
    , i mod 2      : dense
    , j mod 3      : dense
    )
>
```

For the block sparsity example, the dimension-expressions can
be inferred automatically. But here’s what it would look like
to specify them explicitly (just to give an idea of the syntax):

```mlir
#BCSR = #sparse_tensor.encoding<
  ( i = ib * 2 + ii
  , j = jb * 3 + jj
  ) ->
    ( ib = i floordiv 2 : compressed
    , jb = j floordiv 3 : compressed
    , ii = i mod 2      : dense
    , jj = j mod 3      : dense
    )
>
```

Combining 1×4 blocks together with the parameterized compressed level type
alluded to above would provide direct support for expressing a computation
using NVIDIA 2:4 sparsity acceleration, as sketched below.
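
A hedged sketch of what such an encoding could look like, assuming the
hypothetical parameterized level-format `compressed(2:4)` mentioned above
(neither the attribute name `#NV_24` nor the parameter syntax is final):

```mlir
// Hypothetical encoding: rows stay dense, columns are grouped into 1x4
// blocks, and each block stores at most 2 of its 4 entries.
#NV_24 = #sparse_tensor.encoding<
  (i, j) ->
    ( i            : dense
    , j floordiv 4 : dense
    , j mod 4      : compressed(2:4)
    )
>
```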

As noted above, the final syntax will be formalized using feedback on this
RFC and the separate RFC that concretely specifies the syntax used above.
Likewise, the exact way of encoding the sparsity information inside the
tensor type will be formalized after considering other ongoing RFCs that
extend the existing tensor type (e.g., with dynamism and quantization).

## StableHLO Op Support for Sparse Tensor Types

The following two considerations need to be addressed when sparse
tensor types are added to the StableHLO operations:

1. what operations support sparse tensor types (all ops vs
a restricted set of ops), and
2. what additions are required to the type system for sparse
tensor types (in particular, type inference vs explicitly
defining all types while building the IR for the ops).

As for consideration (1), the most ambitious extension would be to
allow sparse tensor types for all operations and all operands and
outputs where regular tensor types are allowed. Such an
"**open world**" sparsity extension would, in theory, allow
arbitrary programs written in a language such as JAX or NumPy
to be converted into an equivalent sparse program by merely
changing some properties on the tensor (viz. array) types
of that language. However, generating efficient sparse code
for such an “open world” system is generally an unsolved problem that
is still under active research in academia. More realistically,
in a “**closed world**” sparsity extension, sparse tensor types
are *restricted to a subset of operations* that can be handled
efficiently by a downstream “sparse compilation” pass,
i.e., with some basic guarantees of being able to generate a
well-performing sparse version of the input code.

Although we are open to debating an "open world" extension, this RFC
initially proposes a "closed world" extension in which sparse tensor types
can only be applied to the operations listed below. The current set is
based on our experience with sparse JAX and MHLO; adding more operations
to the initial spec is open for debate, and even with the "closed world"
approach the set of supported operations should remain easily extensible
as the need arises.

1. The common arithmetic binary and unary “element-wise” operations:
   `add`, `subtract`, `multiply`, `negate`
2. Zero-preserving unary math “element-wise” operations
   (one of which is illustrated in the sketch after this list):
   `abs`, `exponential_minus_one`, `log_plus_one`,
   `sign`, `sine`, `sqrt`, `tanh`, `ceil`, `floor`,
   `bessel_i1e`, `real`, `imag`
3. General dot products and reductions:
   `dot`, `dot_general`, `reduce`
4. Reshaping operations:
   `reshape`, `broadcast_in_dim`, `concatenate`, `squeeze`, `transpose`
5. Conversions (between dense/sparse tensors and between different
   sparsity encodings): `convert`
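
As a hedged illustration of the zero-preserving unary category, annotating
the operand and result with the same sparse type is all that is needed to
make the operation sparse (the `#CSR` attribute is assumed to be defined as
shown earlier; the syntax follows the style of the examples below):

```mlir
// Element-wise |x| preserves zeros, so the result can reuse the
// operand's sparse storage scheme.
%0 = stablehlo.abs %arg0
    : (tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>
```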

As for consideration (2), we could come up with type inference rules
for all operations that, given the types of the input operands,
define the types of the result. However, for sparsity, such type
inference rules would always be heuristic in nature. For example,
when adding two sparse tensors element-wise, should the result be
dense (when the two nonzero patterns pretty much cover the complete
coordinate space) or sparse (when the union of the two nonzero patterns
still remains sparse)? Either heuristic would fail for a certain
class of sparse tensors. Furthermore, it would introduce difficult
choices for the other components of sparsity. For example, in the
following element-wise addition of a row-wise and a column-wise
sparse matrix, it is unclear what ordering should be inferred
for the output sparse matrix:

```mlir
%result = stablehlo.add %matrix1, %matrix2
    : (tensor<10x10xf32, #CSR>,
       tensor<10x10xf32, #CSC>) -> tensor<10x10xf32, #???>
```

Therefore, this RFC proposes that “builders” of the StableHLO IR with
sparse tensor types must explicitly define sparsity types of inputs
and outputs (which allows language implementations with their own
type inference as well as language implementations that require the
programmer to define sparsity of results explicitly).

Alternatively, the spec could provide “builders” that accept an explicit
output type or otherwise fall back to the (heuristic) inference alluded
to above. As yet another alternative, we can relax this proposal with
“builders” that define the sparsity of the input types only, with output
types that are always dense (but, of course, following the standard
inference rules for all other components of the output tensor types, such
as shape and element type). An ML framework can then still force the
desired output sparsity as follows:

```
x_dense_inferred = a_sparse1 OP b_sparse2
x_sparse_desired = convert x_dense_inferred
```

This is subsequently “cleaned up” in an early pass by the ML compiler
that accepts this IR:

```
x_sparse_desired = a_sparse1 OP b_sparse2
```
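
In StableHLO IR terms, this pattern could look roughly as follows; the
sketch assumes the `#CSR` and `#CSC` attributes defined earlier and is
only meant to illustrate the dense-output-plus-convert idiom:

```mlir
// Inferred dense output, followed by an explicit conversion to the
// desired sparse storage scheme.
%dense  = stablehlo.add %a, %b
    : (tensor<10x10xf32, #CSR>,
       tensor<10x10xf32, #CSC>) -> tensor<10x10xf32>
%sparse = stablehlo.convert %dense
    : (tensor<10x10xf32>) -> tensor<10x10xf32, #CSR>

// After the early clean-up pass in the consuming ML compiler:
//   %sparse = stablehlo.add %a, %b
//       : (tensor<10x10xf32, #CSR>,
//          tensor<10x10xf32, #CSC>) -> tensor<10x10xf32, #CSR>
```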

For all operations, adding sparse types to the operation changes the
dense semantics into sparse semantics, as illustrated below
for element-wise addition, where the IR represents adding a
sparse matrix in CSR format to a sparse matrix in DCSR format
and storing the result into a new matrix in CSR format:

```mlir
%0 = stablehlo.add %arg0, %arg1
    : (tensor<10x20xf32, #CSR>,
       tensor<10x20xf32, #DCSR>) -> tensor<10x20xf32, #CSR>
```

The sparse semantics imply that the generated code ultimately
can **take advantage of the zeros by relying on common
mathematical laws that allow skipping operations**
(such as `x+0=x` and `x*0=0`) while **ignoring strict
IEEE requirements that would prevent such savings**
(like `0*NaN=NaN` and `0*Inf=NaN`). Furthermore, the IR
is not necessarily concerned with the efficiency of any of
the expressed operations. For example, in the following, a
column-wise stored sparse matrix (`CSC`) is added to a
row-wise stored sparse matrix (`CSR`):

```mlir
%0 = stablehlo.add %arg0, %arg1
    : (tensor<10x20xf32, #CSC>,
       tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>
```

The sparse compiler will eventually be responsible for generating
efficient code, possibly by inserting sparse storage scheme conversions
into the code. Nevertheless, in a healthy sparse ecosystem,
ML frameworks are expected to generate IR that can ultimately map
to efficient sparse code with the current state-of-the-art
sparse compilation technology.

The conversion operator can be used for all sorts of conversions
(dense to sparse, sparse to dense, and sparse to sparse for
storage scheme transformations). An example of converting a dense
tensor into a sparse tensor is shown below. In such a conversion,
only the nonzeros found in the dense tensor are transferred into
the sparse tensor, here stored in compressed sparse fiber (`CSF`) format.

```mlir
%st = stablehlo.convert %dt
    : (tensor<?x?x?xf64>) -> tensor<?x?x?xf64, #CSF>
```

Conversely, converting a sparse vector to a dense vector can be
represented as follows. As expected, the dense vector will receive all
explicitly stored values from the sparse vector, while all other values
are set to zero.

```mlir
%dv = stablehlo.convert %sv
    : (tensor<100xi32, #SparseVec>) -> tensor<100xi32>
```

Finally, a storage scheme transformation converts one sparse tensor
into another sparse tensor, as illustrated below by converting `CSR` into `CSC`.

```mlir
%x = stablehlo.convert %y
    : (tensor<10x10xf64, #CSR>) -> tensor<10x10xf64, #CSC>
```

Once this RFC has been debated and approved, the final specification will
precisely define all StableHLO operations that can accept sparse tensor
types as operands and outputs, together with the semantics of such operations.
