question on blocksparse. #117
Hi, thanks for the open-sourced contribution. I see there is a block-sparse implementation in the tests.
Is this the block-sparse attention mechanism specific to transformers, or is it general spMM?
Thanks

Comments
This is general block-sparse spMM. There are three modes: SDD (sparse output = dense x dense), DSD (dense output = sparse x dense), and DDS (dense output = dense x sparse). The output of SDD is in a block-sparse format that can be re-used by the other block-sparse operations, such as triton.ops.blocksparse.softmax.
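As a rough illustration of the three modes (a sketch against the triton.ops.blocksparse API of that era; newer versions may also take a device argument, and the layout/block values here are made up):

```python
import torch
import triton.ops.blocksparse

block = 64                     # each layout entry covers a block x block tile
H, num_blocks = 1, 4           # heads, and blocks per row/column of the full matrix
# 0/1 layout: here a made-up lower-triangular block pattern
layout = torch.tril(torch.ones((H, num_blocks, num_blocks), dtype=torch.int64))

# SDD: sparse output = dense A x dense B (e.g., Q @ K^T restricted to the layout)
sdd_matmul = triton.ops.blocksparse.matmul(layout, block, 'sdd', trans_a=False, trans_b=True)
# DSD: dense output = sparse A x dense B (e.g., block-sparse scores @ V)
dsd_matmul = triton.ops.blocksparse.matmul(layout, block, 'dsd', trans_a=False, trans_b=False)
# DDS: dense output = dense A x sparse B
dds_matmul = triton.ops.blocksparse.matmul(layout, block, 'dds', trans_a=False, trans_b=False)
```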
thx
Hi, what's the definition of a block here? Is it to blockify the matrix and then do the MM?
The sparsity layout is specified as a tensor of 0s and 1s. In your example, this would only work if each colored square corresponds to a 16x16 (or 32x32, 64x64, 128x128) block of data. Also note that blocks in the layout cannot overlap.
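For example (an illustrative layout, not the one from your figure), with a 64x64 block size the layout just marks which blocks are kept:

```python
import torch

block = 64                       # every 1 below stands for a 64x64 tile of data
seq_len = 256
num_blocks = seq_len // block    # a 4x4 grid of blocks
heads = 1

# keep the diagonal blocks plus the first block-column (a made-up pattern)
layout = torch.zeros((heads, num_blocks, num_blocks), dtype=torch.int64)
layout[:, torch.arange(num_blocks), torch.arange(num_blocks)] = 1
layout[:, :, 0] = 1
```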
Thanks. There won't be any overlapping blocks during attention, and the default block size is 64x64.
No layernorm, but you can use triton.ops.blocksparse.softmax to reduce the rows of the output of triton.ops.blocksparse.matmul('SDD').
Could you please explain how to reduce the rows in the Triton implementation?
Sure! You can find a block-sparse attention example here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L150-L159. You can create a block-sparse softmax operation as follows:
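(A sketch, assuming the same `layout` tensor and `block` size used for the matmul; the exact constructor signature may differ between Triton versions.)

```python
sparse_softmax = triton.ops.blocksparse.softmax(layout, block)
```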
and then call it on the output of an SDD matmul:
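Roughly like this, where `q` and `k` are assumed dense tensors of shape (batch, heads, seq_len, head_dim) and `sdd_matmul` / `sparse_softmax` are the objects constructed above:

```python
w = sdd_matmul(q, k)           # block-sparse attention scores, in SDD layout
w = sparse_softmax(w)          # row-wise softmax over the stored blocks only
```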
Thanks. Last question: is this block-triangular layout equivalent to the usual causal masking in attention?
Hmm, it should be the same. You can also pass dense masks, since a block-triangular matrix is not exactly triangular (the blocks on the diagonal are dense). There are example usages here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L47
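A hedged sketch of passing a dense causal mask to the block-sparse softmax, reusing the names from the sketches above (`q`, `k`, `v`, `seq_len`, `sdd_matmul`, `dsd_matmul`, `sparse_softmax`); the keyword names follow the test file of that era and may have changed since:

```python
import torch

# dense causal mask over the full sequence: 1 = attend, 0 = masked
causal_mask = torch.tril(torch.ones((seq_len, seq_len), dtype=torch.float16, device='cuda'))

w = sdd_matmul(q, k)                               # block-sparse scores
w = sparse_softmax(w, attn_mask=causal_mask,
                   attn_mask_mode='mul')           # masked positions get zero probability
out = dsd_matmul(w, v)                             # dense attention output
```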