
Sparsity RFC #1143

Merged 7 commits into openxla:main on Apr 12, 2023

Conversation

@aartbik (Member) commented Feb 10, 2023

A proposal to add sparse tensor types to StableHLO.

@burmako burmako added the RFC label Feb 10, 2023
@burmako burmako self-assigned this Feb 10, 2023
Comment on lines +205 to +213
#BCSR = #sparse_tensor.encoding<
( i = ib * 3 + ii
, j = jb * 2 + jj
) ->
( ib = i floordiv 2 : compressed
, jb = j floordiv 3 : compressed
, ii = i mod 2 : dense
, jj = j mod 3 : dense
)

A reviewer commented:

I really like this concept of expressing the affine mapping of physical (data-structure) indices to logical indices. I do think it's a bit unclear how this mapping will be used to generate good sparse code. For example, how would this mapping enable the compiler to generate an element-wise CSR * BCSR that properly coiterates the j dimension?

@aartbik (Member Author) commented Mar 16, 2023


First, thanks for all your feedback! Much appreciated.

You bring up a good point about the tension between keeping the sparse tensor types very general and the ability of a sparse compiler to ultimately map them to efficient code. I was trying to address this in my "healthy sparse ecosystem" comment below, but should probably spell it out a bit more clearly. The general direction I want to go is that sparse compilers should map any combination of sparsity to functional code, even if some combinations are not very efficient (I am a bit wary of just giving up on the inefficient cases). Finding which combinations work well is then a "healthy" cooperation between the ML developer, the framework that generates StableHLO, and the sparse compiler.
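
For reference, here is a rough sketch of the two layouts under discussion in the upstream MLIR sparse_tensor "map" syntax (the existing upstream notation, not necessarily the one the RFC will adopt), together with a mixed-encoding element-wise op written in the style of the RFC's own examples; the shapes, operand names, and the choice of multiply are illustrative only.

#CSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i : dense, j : compressed)
}>

#BCSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : compressed,
                   j floordiv 3 : compressed,
                   i mod 2      : dense,
                   j mod 3      : dense)
}>

// Element-wise op whose operands use different encodings; a sparse
// compiler has to co-iterate the j dimension across both layouts.
%result = stablehlo.multiply %matrix1, %matrix2
    : tensor<10x12xf32, #CSR>,
      tensor<10x12xf32, #BCSR> -> tensor<10x12xf32, #CSR>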

Comment on lines +288 to +290
%result = stablehlo.add %matrix1, %matrix2
: tensor<10x10xf32, #CSR>,
tensor<10x10xf32, #CSC> -> tensor<10x10xf32, #???>

A reviewer commented:

Has there been any consideration given to disallowing certain operations on certain combinations of data structures? Element-wise combining of CSR and CSC matrices requires a discordant traversal of at least one of them, or a data-structure conversion to be inserted, both of which are generally slow.

The reviewer added:

Ah, I should have read further.

> Furthermore, the IR is not necessarily concerned with the efficiency of any of the expressed operations.

It does seem like supporting code generation for all possible combinations may place a lot of strain on compiler writers unless there is a default (e.g., "always insert a conversion if the mode orderings do not line up").
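
As a rough illustration of that default (not part of the RFC; the operand names, shapes, and the choice of #CSR as the common layout are made up), a compiler could materialize a layout conversion with the existing upstream sparse_tensor.convert op before the element-wise op:

// Convert the CSC operand to CSR so both operands share one layout,
// then perform the element-wise op entirely in CSR.
%matrix2_csr = sparse_tensor.convert %matrix2
    : tensor<10x10xf32, #CSC> to tensor<10x10xf32, #CSR>
%result = stablehlo.add %matrix1, %matrix2_csr
    : tensor<10x10xf32, #CSR>,
      tensor<10x10xf32, #CSR> -> tensor<10x10xf32, #CSR>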

@aartbik (Member Author) commented Mar 16, 2023


In my PhD thesis sparse compiler, I had a "panic" default fallback. If none of the efficient techniques worked, it would fall back to simply "looking up" each next element. Very inefficient, but it would at least generate functional code, together with a prompt back to the user that this version would need a better sparsity design. I feel this approach (or something similar) is slightly better than a priori disallowing sparsity combinations at the StableHLO IR level (perhaps some clever researcher finds an efficient way of doing something that was originally deemed too hard ;--)

The reviewer replied:

That seems like a reasonable approach. To clarify, I did not intend to restrict the IR, as that seems particularly difficult to do; I was arguing more for disallowing some types of codegen (i.e. "panic" by failing to codegen), but you make a good point that it is better to generate inefficient but functional code than to fail at codegen.

Comment on lines +342 to +345
into the code. Nevertheless, in a healthy sparse ecosystem,
ML frameworks are expected to generate IR that can ultimately map
to efficient sparse code with the current state-of-the-art
sparse compilation technology.

A reviewer commented:

I won't pretend to be an ML framework expert (I am far from it), but has there been any debate on whether the compiler should simply not support inefficient operations (such as the discordant traversal in my comment above), instead of requiring the upstream frameworks to just not generate inefficient IR?

@aartbik (Member Author) replied

Open to that debate, although, as phrased above, I am leaning towards a feedback loop of always generating functionally correct code, but prompting back (somehow) on cases that were not dealt with efficiently.

@burmako (Contributor) commented Apr 12, 2023

It would be great to have sparsity support in StableHLO, and the proposed high-level vision sounds compelling, especially given that it is based on the work that has been going on for 2.5 years.

By design, this RFC doesn't introduce any changes to StableHLO, so there's nothing specific to approve at this point. However, I think it would be great to merge this PR anyway to capture the importance of sparsity to StableHLO, as well as the community interest in this work.

As a meta comment, evaluating manifesto RFCs is a gap in our RFC process at the moment, and this will need to be addressed in the future, once we complete the transition to OpenXLA governance and roll out a formal RFC process.

Looking forward to subsequent RFCs that would propose to introduce sparse tensor types etc! Figuring out the details would be an interesting discussion to have, and we'll evaluate these RFCs separately.

@burmako burmako merged commit 5313209 into openxla:main Apr 12, 2023
@burmako burmako mentioned this pull request Apr 13, 2023
@aartbik aartbik deleted the bik branch September 26, 2023 19:30
GleasonK pushed a commit that referenced this pull request Apr 2, 2024
After StableHLO => Linalg lowering was implemented in the [linalg
RFC](#1610), most of the ops listed by @aartbik in the [sparsity
RFC](#1143) now generate correct linalg code with sparse encodings.
That code can then be lowered through mlir-opt with the --sparsifier
pipeline and run with mlir-runner.

The `sparse.mlir` file in this PR checks the generated linalg code with
sparse encodings. These are the same tests as for the dense ops, but
with sparse inputs and outputs.

The only op that couldn't be lowered to correct linalg with sparsity is
`stablehlo.concatenate`. Therefore, after discussing with @GleasonK, I
added an `enable-sparse-ops` flag to the linalg pass that rewrites
`stablehlo.concatenate` to `sparse_tensor.concatenate` when the flag is
set to true and the inputs/outputs have sparse encodings. If more
linalg converters need modification, a new sparsity pass will be
created to run before the linalg pass.
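
A rough sketch of that rewrite (the shapes, the #CSR encoding, and the dimension are illustrative and not taken from the PR):

// Before: sparse operands still use the StableHLO op (generic form shown).
%r = "stablehlo.concatenate"(%a, %b) { dimension = 0 : i64 }
    : (tensor<4x8xf32, #CSR>, tensor<4x8xf32, #CSR>) -> tensor<8x8xf32, #CSR>

// After, with enable-sparse-ops set: the op is rewritten to its
// sparse_tensor dialect counterpart, which the sparsifier can lower.
%r = sparse_tensor.concatenate %a, %b { dimension = 0 : index }
    : tensor<4x8xf32, #CSR>, tensor<4x8xf32, #CSR> to tensor<8x8xf32, #CSR>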