# StableHLO Sparsity RFC

## Motivation

The [StableHLO](https://github.com/openxla/stablehlo) operation set is
intended as a compiler IR for ML computations. The set was
originally bootstrapped from the
[MHLO dialect](https://github.com/tensorflow/mlir-hlo#meta-hlo-dialect-mhlo)
but adds functionality such as (de)serialization and versioning.
StableHLO is meant to be a portability layer between ML frameworks
and ML compilers. The “stable” in StableHLO refers to the intended
stability of the definition over time, making this operation set a
suitable target IR for a wide variety of ML frameworks, including
TensorFlow, JAX, and PyTorch, as well as a suitable source IR for
ML compilers such as XLA and IREE.

Because sparse tensor algebra is expected to become rather ubiquitous
in many ML problems, this RFC proposes adding **sparsity support**
to StableHLO, in particular by introducing a new **sparse tensor type**
that combines transparently with a restricted subset of StableHLO
operations. In this context, transparency implies that a regular,
single StableHLO operation can be made sparse by merely marking one
or several of its operands and outputs with sparse tensor types.
For example, a two-dimensional dot operation expressing a regular dense
matrix multiplication (`GEMM`) can be made sparse by marking one (`SpMM`)
or two (`SpMSpM`) operands as sparse, together with a specification of the
desired sparse storage scheme for each (a sketch follows this paragraph).
With this novel
approach of **treating sparsity as a type, and not a tedious
explicit implementation detail**, we avoid the typical explosion
of the number of required operations that is inherent to the explicit
approach, where we need to support a matrix multiplication for each
combination of sparse operands and outputs, including the exact
sparse storage scheme (such as separate `SpMM` operations for `COO`,
`CSR`, `CSC`, `DCSR`, etc. and separate `SpMSpM` operations for even
more combinations of inputs and outputs). Also, in this initial
proposal, “sparsity” exclusively refers to tensors with many zeros
and the obvious simplifications that take advantage of these zeros
(such as `x+0=x` and `x*0=0`). In principle, however, the proposal
can be extended to deal with other forms of frequently occurring
values (like ones, Inf, or NaN) and semi-ring algebras over these
values, which can be exploited to reduce the storage and/or
computational requirements of solving ML problems.
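
To make this transparency concrete, here is a hedged sketch of the dense
`GEMM` and its `SpMM` counterpart, assuming a `#CSR` encoding attribute as
introduced later in this RFC (the exact syntax is illustrative only):

```mlir
// Dense GEMM: an ordinary dot operation over dense tensor types.
%gemm = stablehlo.dot %a, %b
    : (tensor<16x32xf32>, tensor<32x8xf32>) -> tensor<16x8xf32>

// SpMM: the same operation, made sparse merely by marking the
// left-hand operand with a sparse tensor type.
%spmm = stablehlo.dot %a_sparse, %b
    : (tensor<16x32xf32, #CSR>, tensor<32x8xf32>) -> tensor<16x8xf32>
```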

The concept of treating sparsity as a property and not as a tedious
implementation detail has been studied extensively in academia
(see [MT1](https://www.aartbik.com/sparse.php) and
[TACO](http://tensor-compiler.org/) as two prominent example
projects). In this new paradigm for developing sparse code,
developers define a computation in a completely sparsity-agnostic
(viz. “dense”) way, and merely annotate some tensor types as sparse.
After this, the compiler automatically maps the sparsity-agnostic
computation to possibly many sparsity-aware implementations, each
tailored to specific sparsity properties of the involved tensors.
This RFC proposes an extension to StableHLO that yields an IR
that fits this new paradigm well. ML frameworks that follow
this paradigm, such as Sparse JAX, can use this IR as their
target, while ML compilers that support the paradigm, such as
MLIR, can use this IR as their source from which actual sparse code
is generated. Providing a standardized way to represent sparsity
in the IR will hopefully foster an ecosystem in which
the new paradigm of sparse compilation can thrive.

The MLIR compiler infrastructure added sparse tensor types by means
of a new
[Sparse Tensor Dialect](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/)
(for more details on its design, see this
[ACM publication](https://dl.acm.org/doi/10.1145/3544559)).
Previous experience with converting a sparse extension of JAX to MHLO
with sparse tensor types demonstrated the ease with which the concept
of treating sparsity as a property can propagate all the way from
a source framework with sparse extensions into a target IR that uses
sparse tensor types, and then ultimately down to an actual sparse
compilation pass that makes sparsity concrete in the IR before
handing it off to a backend compiler such as LLVM for execution on
a target platform. The experience gained from adding sparse tensor
types to MHLO gave rise to proposing similar support for StableHLO,
as formulated in this RFC.

This RFC proposes two new concepts for StableHLO:

1. Sparse tensor types (in addition to regular “dense” tensor types)
2. StableHLO operations that support sparse tensor types

## Sparse Tensor Types

Tensor types are the cornerstone of the StableHLO type system. These
types are well-defined in the existing specification. For example,
a three-dimensional immutable array with static dimensions of sizes
`10`, `20`, and `30` and double-precision floating-point elements is
represented by the following type (for more details,
see the original specification):

```mlir
tensor<10x20x30xf64>
```

The extension is inspired by the TACO formalization of sparse tensors
that was also adopted in MLIR for sparse tensor types. Below we use
**dimension** to refer to the axes of the semantic tensor, and
**level** to refer to the axes of the actual storage scheme.
We propose to extend the regular tensor types of StableHLO
with a new encoding field that allows for providing:

1. An ordered sequence of level specifications, each of which includes:
   1. a required **level-type**, which defines *how* the level is stored,
      including:
      1. a required **level-format**
      2. a collection of **level-properties**
   2. a **level-expression**, which defines *what* is stored.
2. An ordered sequence of dimension specifications, each of which includes:
   1. the **dimension-size**; however, that is already given by the
      tensor's dimension-shape
   2. an optional **dimension-expression**

Different level-formats may have different collections of level-properties,
and whether these properties are optional or required also depends on the
level-format; this is why we group the level-format and level-properties
together as a level-type. Whether the level-expression is optional or
required may also depend on the level-type. For the three level-formats
discussed below, both the level-properties and level-expressions are
optional. (By contrast, the level-format for supporting `ITPACK/ELLPACK`
has no properties, and its level-expression is required.)

For the three level-types discussed below, each level-expression is an
affine expression over dimension-variables. Thus, the level-expressions
collectively define an affine map from dimension-coordinates to
level-coordinates, and the dimension-expressions collectively define the
inverse map, which only needs to be provided for elaborate cases
where it cannot be inferred automatically. However, we avoid
*defining* the level-/dimension-expressions as affine maps, because
other level-types may require different sorts of expressions.
(One known example is for supporting `ITPACK/ELLPACK`.)

The supported level-formats should consist of at least the following:

* **dense** : level is dense, all entries along the level are stored,
* **compressed** : level is sparse, only nonzeros along the level are stored,
* **singleton** : a variant of the compressed format,
for when coordinates have no siblings.

The specification should be easy to extend for adding new level-formats.
For example, a parameterized type **compressed(2:4)** could be used to
define 2-out-of-4 structured sparsity that is well-supported by some
vendors (viz. NVIDIA GPUs). These three level-formats have two properties:
**non/unique** (are duplicates allowed at that level) and
**un/ordered** (do coordinates appear sorted at that level). Both
are optional, with the defaults being unique and ordered, but
the encoding should allow specifying the non-unique and unordered
settings. However, the dense format must always be ordered and
unique, so the encoding should also be able to prohibit users from
specifying invalid level-types like “unordered dense”.

The encoding outlined above allows for a formal specification of most
common sparse storage schemes (e.g., `COO`, `CSR`, `CSC`, `DCSR`,
`DCSC`, `CSF`, `BCSR`, and `ELL`). Other extensions to the
encoding, such as found in the paper by
[Chou et al.](http://tensor-compiler.org/publications.html)
are open to debate as well.

The exact syntax for the encoding is still open for debate
(it will be finalized and formalized in the final spec).
We will give a concrete proposal as a separate RFC, since
doing so gets into some technical details that are independent
of the rest of the current RFC. Regardless of the particular
syntax used for the encoding, the essence of the proposal
is that a regular tensor type such as the following 10×20
matrix type with 32-bit floating-point elements

```mlir
tensor<10x20xf32>
```

is made sparse by adding a single field

```mlir
tensor<10x20xf32, #CSR>
```

where the field is defined elsewhere to specify the encoding parts
(1)-(2) described above. The `CSR` storage scheme is defined by
the following encoding, where each dimension is mapped to a level
in the same order:

```mlir
#CSR = #sparse_tensor.encoding<
  (i, j) -> (i : dense, j : compressed)
>
```

The `CSC` storage scheme is very similar, but the mapping from
dimensions to levels is a permutation:

```mlir
#CSC = #sparse_tensor.encoding<
  (i, j) -> (j : dense, i : compressed)
>
```
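
Using the same illustrative syntax, other common schemes can be expressed
as well. For example, `DCSR` (used in a later example in this RFC)
compresses both levels, and a `COO` scheme can be modeled with a
non-unique compressed level followed by a singleton level (the exact
spelling of the level-properties below is an assumption, pending the
separate syntax RFC):

```mlir
#DCSR = #sparse_tensor.encoding<
  (i, j) -> (i : compressed, j : compressed)
>

#COO = #sparse_tensor.encoding<
  (i, j) -> (i : compressed(nonunique), j : singleton)
>
```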

Block sparsity with 2×3 blocks is expressed as follows:

```mlir
#BCSR = #sparse_tensor.encoding<
  (i, j) ->
    ( i floordiv 2 : compressed
    , j floordiv 3 : compressed
    , i mod 2      : dense
    , j mod 3      : dense
    )
>
```

For the block sparsity example, the dimension-expressions can
be inferred automatically. But here’s what it would look like
to specify them explicitly (just to give an idea of the syntax):

```mlir
#BCSR = #sparse_tensor.encoding<
  ( i = ib * 2 + ii
  , j = jb * 3 + jj
  ) ->
    ( ib = i floordiv 2 : compressed
    , jb = j floordiv 3 : compressed
    , ii = i mod 2      : dense
    , jj = j mod 3      : dense
    )
>
```

Combining 1×4 blocks together with the parameterized compressed level type
alluded to above would provide direct support for expressing a computation
using NVIDIA 2:4 sparsity acceleration, as sketched below.
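
A hedged sketch of what such an encoding could look like, assuming the
hypothetical parameterized level-format `compressed(2:4)` mentioned above
(neither the attribute name `#NV_24` nor the parameter syntax is final):

```mlir
// Hypothetical encoding: rows stay dense, columns are grouped into 1x4
// blocks, and each block stores at most 2 of its 4 entries.
#NV_24 = #sparse_tensor.encoding<
  (i, j) ->
    ( i            : dense
    , j floordiv 4 : dense
    , j mod 4      : compressed(2:4)
    )
>
```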

As noted above, the final syntax will be formalized using feedback on this
RFC and the separate RFC that concretely specifies the syntax used above.
Likewise, the exact way of encoding the sparsity information inside the
tensor type will be formalized after considering other ongoing RFCs that
extend the existing tensor type (e.g., with dynamism and quantization).

## StableHLO Op Support for Sparse Tensor Types

The following two considerations need to be addressed when sparse
tensor types are added to the StableHLO operations:

1. what operations support sparse tensor types (all ops vs
a restricted set of ops), and
2. what additions are required to the type system for sparse
tensor types (in particular, type inference vs explicitly
defining all types while building the IR for the ops).

As for consideration (1), the most ambitious extension would be to
allow sparse tensor types for all operations and all operands and
outputs where regular tensor types are allowed. Such an
"**open world**" sparsity extension would, in theory, allow
arbitrary programs written in a language such as JAX or NumPy
to be converted into an equivalent sparse program by merely
changing some properties on the tensor (viz. array) types
of that language. However, generating efficient sparse code
for such an “open world” system is generally an unsolved problem that
is still under active research in academia. More realistically,
in a “**closed world**” sparsity extension, sparse tensor types
are *restricted to a subset of operations* that can be handled
efficiently by a downstream “sparse compilation” pass,
i.e., with some basic guarantees of being able to generate a
well-performing sparse version of the input code.

Although we are open to debating an "open world" extension, this RFC
initially proposes a "closed world" extension in which sparse tensor types
can only be applied to the operations listed below. The current set is
based on our experience with sparse JAX and MHLO; adding more operations
to the initial spec is open for debate, and even with the "closed world"
approach the set of supported operations should remain easily extensible
as the need arises.

1. The common arithmetic binary and unary “element-wise” operations:
   `add`, `subtract`, `multiply`, `negate`
2. Zero-preserving unary math “element-wise” operations
   (one of which is illustrated in the sketch after this list):
   `abs`, `exponential_minus_one`, `log_plus_one`,
   `sign`, `sine`, `sqrt`, `tanh`, `ceil`, `floor`,
   `bessel_i1e`, `real`, `imag`
3. General dot products and reductions:
   `dot`, `dot_general`, `reduce`
4. Reshaping operations:
   `reshape`, `broadcast_in_dim`, `concatenate`, `squeeze`, `transpose`
5. Conversions (between dense/sparse tensors and between different
   sparsity encodings): `convert`
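
As a hedged illustration of the zero-preserving unary category, annotating
the operand and result with the same sparse type is all that is needed to
make the operation sparse (the `#CSR` attribute is assumed to be defined as
shown earlier; the syntax follows the style of the examples below):

```mlir
// Element-wise |x| preserves zeros, so the result can reuse the
// operand's sparse storage scheme.
%0 = stablehlo.abs %arg0
    : (tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>
```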

As for consideration (2), we could come up with type inference rules
for all operations that, given the types of the input operands,
define the types of the result. However, for sparsity, such type
inference rules would always be heuristic in nature. For example,
when adding two sparse tensors element-wise, should the result be
dense (when the two nonzero patterns pretty much cover the complete
coordinate space) or sparse (when the union of the two nonzero patterns
still remains sparse)? Either heuristic would fail for a certain
class of sparse tensors. Furthermore, it would introduce difficult
choices for the other components of sparsity. For example, in the
following element-wise addition of a row-wise and a column-wise
sparse matrix, it is unclear what ordering should be inferred
for the output sparse matrix:

```mlir
%result = stablehlo.add %matrix1, %matrix2
    : (tensor<10x10xf32, #CSR>,
       tensor<10x10xf32, #CSC>) -> tensor<10x10xf32, #???>
```

Therefore, this RFC proposes that “builders” of the StableHLO IR with
sparse tensor types must explicitly define sparsity types of inputs
and outputs (which allows language implementations with their own
type inference as well as language implementations that require the
programmer to define sparsity of results explicitly).

Alternatively, the spec could provide “builders” that accept an explicit
output type or otherwise fall back to the (heuristic) inference alluded
to above. As yet another alternative, we can relax this proposal with
“builders” that define the sparsity of the input types only, with output
types that are always dense (but, of course, following the standard
inference rules for all other components of the output tensor types, such
as shape and element type). An ML framework can then still force the
desired output sparsity as follows:

```
x_dense_inferred = a_sparse1 OP b_sparse2
x_sparse_desired = convert x_dense_inferred
```

This is subsequently “cleaned up” in an early pass by the ML compiler
that accepts this IR:

```
x_sparse_desired = a_sparse1 OP b_sparse2
```
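
In StableHLO IR terms, this pattern could look roughly as follows; the
sketch assumes the `#CSR` and `#CSC` attributes defined earlier and is
only meant to illustrate the dense-output-plus-convert idiom:

```mlir
// Inferred dense output, followed by an explicit conversion to the
// desired sparse storage scheme.
%dense  = stablehlo.add %a, %b
    : (tensor<10x10xf32, #CSR>,
       tensor<10x10xf32, #CSC>) -> tensor<10x10xf32>
%sparse = stablehlo.convert %dense
    : (tensor<10x10xf32>) -> tensor<10x10xf32, #CSR>

// After the early clean-up pass in the consuming ML compiler:
//   %sparse = stablehlo.add %a, %b
//       : (tensor<10x10xf32, #CSR>,
//          tensor<10x10xf32, #CSC>) -> tensor<10x10xf32, #CSR>
```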

For all operations, adding sparse types to the operation changes the
dense semantics into sparse semantics, as illustrated below
for element-wise addition, where the IR represents adding a
sparse matrix in CSR format to a sparse matrix in DCSR format
and storing the result into a new matrix in CSR format:

```mlir
%0 = stablehlo.add %arg0, %arg1
    : (tensor<10x20xf32, #CSR>,
       tensor<10x20xf32, #DCSR>) -> tensor<10x20xf32, #CSR>
```

The sparse semantics imply that the generated code ultimately
can **take advantage of the zeros by relying on common
mathematical laws that allow skipping operations**
(such as `x+0=x` and `x*0=0`) while **ignoring strict
IEEE requirements that would prevent such savings**
(like `0*NaN=NaN` and `0*Inf=NaN`). Furthermore, the IR
is not necessarily concerned with the efficiency of any of
the expressed operations. For example, in the following, a
column-wise stored sparse matrix (`CSC`) is added to a
row-wise stored sparse matrix (`CSR`):

```mlir
%0 = stablehlo.add %arg0, %arg1
    : (tensor<10x20xf32, #CSC>,
       tensor<10x20xf32, #CSR>) -> tensor<10x20xf32, #CSR>
```

The sparse compiler will eventually be responsible for generating
efficient code, possibly by inserting sparse storage scheme conversions
into the code. Nevertheless, in a healthy sparse ecosystem,
ML frameworks are expected to generate IR that can ultimately map
to efficient sparse code with the current state-of-the-art
sparse compilation technology.

The conversion operator can be used for all sorts of conversions
(dense to sparse, sparse to dense, and sparse to sparse for
storage scheme transformations). An example of converting a dense
tensor into a sparse tensor is shown below. In such a conversion,
only the nonzeros found in the dense tensor are transferred into
the sparse tensor, here stored in compressed sparse fiber (`CSF`) format.

```mlir
%st = stablehlo.convert %dt
    : (tensor<?x?x?xf64>) -> tensor<?x?x?xf64, #CSF>
```

Conversely, converting a sparse vector to a dense vector can be
represented as follows. As expected, the dense vector will receive all
explicitly stored values from the sparse vector, while all other values
are set to zero.

```mlir
%dv = stablehlo.convert %sv
    : (tensor<100xi32, #SparseVec>) -> tensor<100xi32>
```

Finally, a storage scheme transformation converts one sparse tensor
into another sparse tensor, as illustrated below by converting `CSR` into `CSC`.

```mlir
%x = stablehlo.convert %y
    : (tensor<10x10xf64, #CSR>) -> tensor<10x10xf64, #CSC>
```

Once this RFC has been debated and approved, the final specification will
precisely define all StableHLO operations that can accept sparse tensor
types as operands and outputs, together with the semantics of such operations.
