
question on blocksparse. #117

Closed
Young768 opened this issue May 21, 2021 · 10 comments
@Young768

Hi, thanks for the open-source contribution. I see there is a block-sparse implementation in the tests.
Is this the block-sparse attention mechanism specific to transformers, or is it a general spMM?

Thanks

@ptillet
Collaborator

ptillet commented May 21, 2021

This is general block-sparse spMM. There are three modes:

  • SDD: sparse = dense x dense, a.k.a. sampled dense-dense matrix multiplication
  • DSD: dense = sparse x dense, the lhs is sparse
  • DDS: dense = dense x sparse, the rhs is sparse

The output of SDD is in a block-sparse format that can be re-used for triton.ops.blocksparse.softmax and also by DSD for attention mechanisms.
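For concreteness, here is a minimal sketch of how the three modes can be constructed, assuming the triton.ops.blocksparse.matmul API as used in the project's tests around this time (the exact constructor arguments may differ between Triton versions):

import torch
import triton

H, BLOCK, NB = 2, 16, 4                    # heads, block size, blocks per side
layout = torch.randint(2, (H, NB, NB))     # 0/1 tensor: which blocks are kept

# SDD: sparse = dense x dense (e.g. Q @ K^T, computing only the blocks marked in `layout`)
sdd = triton.ops.blocksparse.matmul(layout, BLOCK, 'sdd', trans_a=False, trans_b=True)
# DSD: dense = sparse x dense (lhs is block-sparse, e.g. attention weights @ V)
dsd = triton.ops.blocksparse.matmul(layout, BLOCK, 'dsd', trans_a=False, trans_b=False)
# DDS: dense = dense x sparse (rhs is block-sparse)
dds = triton.ops.blocksparse.matmul(layout, BLOCK, 'dds', trans_a=False, trans_b=False)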

@Young768
Author

thx

@Young768
Author

Young768 commented May 23, 2021

Hi, what's the definition of a block here? Is the idea to blockify the matrix and then do the MM?
I'm looking to implement the block-sparse attention proposed by BigBird using Triton ops, and I want to know how to obtain the blocks in Triton. Thanks

[Screenshot: BigBird block-sparse attention pattern, 2021-05-22]

Young768 reopened this May 23, 2021
@ptillet
Collaborator

ptillet commented May 25, 2021

The sparsity layout is specified as a tensor of 0s and 1s. In your example, this would only work if each colored square corresponds to a 16x16 (or 32x32, 64x64, 128x128) block of data. Also note that triton.ops.blocksparse doesn't support overlapping blocks.
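As an illustration, a layout for a BigBird-style pattern (a sliding window plus a few global blocks) could be built by hand as a 0/1 tensor of shape (heads, rows_in_blocks, cols_in_blocks); this is only a rough approximation of the BigBird pattern, not code from the repo:

import torch

H, NB = 1, 8                                   # heads, sequence length in blocks
layout = torch.zeros((H, NB, NB), dtype=torch.int64)
for i in range(NB):                            # band of one block on each side of the diagonal
    lo, hi = max(0, i - 1), min(NB, i + 2)
    layout[:, i, lo:hi] = 1
layout[:, 0, :] = 1                            # first block row acts like global tokens
layout[:, :, 0] = 1                            # every block also attends to the first block
# each 1 marks a BLOCK x BLOCK tile that will actually be computed and stored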

@Young768
Author

Thanks. There won't be any overlapping blocks during attention, and the block size by default is 64x64.
Does Triton provide any optimized softmax or layernorm kernel for the output of the block-sparse MM?

@ptillet
Collaborator

ptillet commented May 25, 2021

No layernorm, but you can use triton.ops.blocksparse.softmax to reduce the rows of the output of triton.ops.blocksparse.matmul('SDD').

@Young768
Author

Could you please explain how to reduce the rows in the Triton implementation?

@ptillet
Collaborator

ptillet commented May 26, 2021

Sure! You can find a block-sparse attention example here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L150-L159. You can create a block-sparse softmax operation as follows:

sparse_softmax = triton.ops.blocksparse.softmax(layout, block)

and then call it on the output of an SDD matmul.
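Putting the pieces together, the referenced test builds block-sparse attention roughly as in the sketch below (assuming the same API as above; q, k, v are dense (Z, H, N_CTX, D_HEAD) float16 tensors on the GPU, and the exact signatures may differ between versions):

import triton

def blocksparse_attention(layout, block, q, k, v, scale):
    # sparse = dense x dense: compute only the score blocks selected by `layout`
    sdd = triton.ops.blocksparse.matmul(layout, block, 'sdd', trans_a=False, trans_b=True)
    # dense = sparse x dense: multiply the sparse probabilities by V
    dsd = triton.ops.blocksparse.matmul(layout, block, 'dsd', trans_a=False, trans_b=False)
    sparse_softmax = triton.ops.blocksparse.softmax(layout, block)

    w = sdd(q, k)                      # block-sparse attention scores
    w = sparse_softmax(w, scale=scale) # row-wise softmax over the kept blocks
    return dsd(w, v)                   # dense attention output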

@Young768
Author

Thanks. Last question: is this softmax specifically different from the softmax used for full attention, i.e. a full spMM?

@ptillet
Collaborator

ptillet commented May 26, 2021

Hmm, it should be the same. You can also pass dense masks, since a block-triangular matrix is not triangular (the blocks on the diagonal are dense). There are example usages here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L47
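For reference, passing dense masks to the block-sparse softmax looks roughly like the sketch below; the keyword names (attn_mask, key_padding_mask, attn_mask_mode, key_padding_mask_mode) are assumptions based on the linked test and on downstream users such as DeepSpeed's sparse attention, and may differ between Triton versions:

import torch
import triton

Z, H, N, BLOCK, D_HEAD = 2, 4, 256, 16, 64
layout = torch.randint(2, (H, N // BLOCK, N // BLOCK))
sdd = triton.ops.blocksparse.matmul(layout, BLOCK, 'sdd', trans_a=False, trans_b=True)
sparse_softmax = triton.ops.blocksparse.softmax(layout, BLOCK)

q = torch.randn((Z, H, N, D_HEAD), dtype=torch.float16, device='cuda')
k = torch.randn((Z, H, N, D_HEAD), dtype=torch.float16, device='cuda')
w = sdd(q, k)                                                    # block-sparse scores

# dense masks over the full (non-blockified) dimensions (assumed kwargs, see above)
attn_mask = torch.randint(2, (N, N), device='cuda').to(torch.float16)   # 0/1, multiplicative
key_padding_mask = torch.zeros((Z, N), dtype=torch.float16, device='cuda')  # additive; set padded positions to a large negative value

out = sparse_softmax(w, scale=1.0,
                     attn_mask=attn_mask, attn_mask_mode='mul',
                     key_padding_mask=key_padding_mask, key_padding_mask_mode='add')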

dfukalov pushed a commit to dfukalov/triton that referenced this issue on Feb 19, 2023: …nack_warnings_navi21 (Fix warning on some amdgpu arch, i.e. navi21)