
question on blocksparse. #117

Closed
Young768 opened this issue May 21, 2021 · 10 comments
@Young768

Hi, thanks for the open-source contribution. I see there is a block-sparse implementation in the tests.
Is this the block-sparse attention mechanism specific to transformers, or is it a general spMM?

Thanks

@ptillet
Collaborator

ptillet commented May 21, 2021

This is general block-sparse spMM. There are three modes:

  • SDD: sparse = dense x dense, a.k.a. sampled dense-dense matrix multiplication
  • DSD: dense = sparse x dense, the lhs is sparse
  • DDS: dense = dense x sparse, the rhs is sparse

The output of SDD is in a block-sparse format that can be re-used for triton.ops.blocksparse.softmax and also by DSD for attention mechanisms.
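For concreteness, here is a minimal sketch of how the three modes can be constructed, assuming the triton.ops.blocksparse.matmul API as used in the project's tests around this time (the exact constructor arguments may differ between Triton versions):

import torch
import triton

H, BLOCK, NB = 2, 16, 4                    # heads, block size, blocks per side
layout = torch.randint(2, (H, NB, NB))     # 0/1 tensor: which blocks are kept

# SDD: sparse = dense x dense (e.g. Q @ K^T, computing only the blocks marked in `layout`)
sdd = triton.ops.blocksparse.matmul(layout, BLOCK, 'sdd', trans_a=False, trans_b=True)
# DSD: dense = sparse x dense (lhs is block-sparse, e.g. attention weights @ V)
dsd = triton.ops.blocksparse.matmul(layout, BLOCK, 'dsd', trans_a=False, trans_b=False)
# DDS: dense = dense x sparse (rhs is block-sparse)
dds = triton.ops.blocksparse.matmul(layout, BLOCK, 'dds', trans_a=False, trans_b=False)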

@Young768
Author

thx

@Young768
Author

Young768 commented May 23, 2021

Hi, what's the definition of a block here? Is the idea to blockify the matrix and then do the MM?
I'm looking to implement the block-sparse attention proposed by BigBird using Triton ops, and I want to know how to obtain the blocks in Triton. Thanks

[Screenshot: BigBird block-sparse attention pattern, 2021-05-22]

Young768 reopened this May 23, 2021
@ptillet
Collaborator

ptillet commented May 25, 2021

The sparsity layout is specified as a tensor of 0s and 1s. In your example, this would only work if each colored square corresponds to a 16x16 (or 32x32, 64x64, 128x128) block of data. Also note that triton.ops.blocksparse doesn't support overlapping blocks.
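As an illustration, a layout for a BigBird-style pattern (a sliding window plus a few global blocks) could be built by hand as a 0/1 tensor of shape (heads, rows_in_blocks, cols_in_blocks); this is only a rough approximation of the BigBird pattern, not code from the repo:

import torch

H, NB = 1, 8                                   # heads, sequence length in blocks
layout = torch.zeros((H, NB, NB), dtype=torch.int64)
for i in range(NB):                            # band of one block on each side of the diagonal
    lo, hi = max(0, i - 1), min(NB, i + 2)
    layout[:, i, lo:hi] = 1
layout[:, 0, :] = 1                            # first block row acts like global tokens
layout[:, :, 0] = 1                            # every block also attends to the first block
# each 1 marks a BLOCK x BLOCK tile that will actually be computed and stored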

@Young768
Author

Thanks. There won't be any overlapping blocks during attention, and the block size by default is 64x64.
Does Triton provide any optimized softmax or layernorm kernel for the output of the block-sparse MM?

@ptillet
Collaborator

ptillet commented May 25, 2021

No layernorm, but you can use triton.ops.blocksparse.softmax to reduce the rows of the output of triton.ops.blocksparse.matmul('SDD').

@Young768
Author

Could you please explain how to reduce the rows in the Triton implementation?

@ptillet
Collaborator

ptillet commented May 26, 2021

Sure! You can find a block-sparse attention example here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L150-L159. You can create a block-sparse softmax operation as follows:

sparse_softmax = triton.ops.blocksparse.softmax(layout, block)

and then call it on the output of an SDD matmul.
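Putting the pieces together, the referenced test builds block-sparse attention roughly as in the sketch below (assuming the same API as above; q, k, v are dense (Z, H, N_CTX, D_HEAD) float16 tensors on the GPU, and the exact signatures may differ between versions):

import triton

def blocksparse_attention(layout, block, q, k, v, scale):
    # sparse = dense x dense: compute only the score blocks selected by `layout`
    sdd = triton.ops.blocksparse.matmul(layout, block, 'sdd', trans_a=False, trans_b=True)
    # dense = sparse x dense: multiply the sparse probabilities by V
    dsd = triton.ops.blocksparse.matmul(layout, block, 'dsd', trans_a=False, trans_b=False)
    sparse_softmax = triton.ops.blocksparse.softmax(layout, block)

    w = sdd(q, k)                      # block-sparse attention scores
    w = sparse_softmax(w, scale=scale) # row-wise softmax over the kept blocks
    return dsd(w, v)                   # dense attention output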

@Young768
Author

Thanks. Last question: is this softmax specifically different from the softmax used for full attention, i.e. a full spMM?

@ptillet
Collaborator

ptillet commented May 26, 2021

Hmm, it should be the same. You can also pass dense masks, since a block-triangular matrix is not triangular (the blocks on the diagonal are dense). There are example usages here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L47
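For reference, passing dense masks to the block-sparse softmax looks roughly like the sketch below; the keyword names (attn_mask, key_padding_mask, attn_mask_mode, key_padding_mask_mode) are assumptions based on the linked test and on downstream users such as DeepSpeed's sparse attention, and may differ between Triton versions:

import torch
import triton

Z, H, N, BLOCK, D_HEAD = 2, 4, 256, 16, 64
layout = torch.randint(2, (H, N // BLOCK, N // BLOCK))
sdd = triton.ops.blocksparse.matmul(layout, BLOCK, 'sdd', trans_a=False, trans_b=True)
sparse_softmax = triton.ops.blocksparse.softmax(layout, BLOCK)

q = torch.randn((Z, H, N, D_HEAD), dtype=torch.float16, device='cuda')
k = torch.randn((Z, H, N, D_HEAD), dtype=torch.float16, device='cuda')
w = sdd(q, k)                                                    # block-sparse scores

# dense masks over the full (non-blockified) dimensions (assumed kwargs, see above)
attn_mask = torch.randint(2, (N, N), device='cuda').to(torch.float16)   # 0/1, multiplicative
key_padding_mask = torch.zeros((Z, N), dtype=torch.float16, device='cuda')  # additive; set padded positions to a large negative value

out = sparse_softmax(w, scale=1.0,
                     attn_mask=attn_mask, attn_mask_mode='mul',
                     key_padding_mask=key_padding_mask, key_padding_mask_mode='add')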

dfukalov pushed a commit to dfukalov/triton that referenced this issue on Feb 19, 2023: …nack_warnings_navi21 (Fix warning on some amdgpu arch, i.e. navi21)