Ah, 512 is not an option; it's 16/32/64 top of mind. I need to guard that! Thanks a lot for the report @xwhan, will fix ASAP.
I was wrong about 64: the block size needs to be a multiple of 16 (because of tensor cores), but there is no obvious upper bound, except that you don't really win above a given size (you can of course reproduce a 512 block with multiple 64 blocks). I'll add an assert to make sure that users stay within reasonable bounds.
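The guard mentioned above could look something like the sketch below. The function name and the specific upper bound are assumptions for illustration, not xformers' actual implementation; the only constraints stated in this thread are "multiple of 16" and "reasonable bounds".

```python
def validate_block_size(block_size: int) -> None:
    """Sketch of a block-size guard; the exact upper bound is an assumption."""
    # Tensor cores require the block size to be a multiple of 16.
    assert block_size % 16 == 0, (
        f"block_size must be a multiple of 16, got {block_size}"
    )
    # Hypothetical cap: larger blocks bring no benefit, since a big block
    # can be reproduced with several smaller ones (per the comment above).
    assert 16 <= block_size <= 128, (
        f"block_size {block_size} out of supported range [16, 128]; "
        "tile larger blocks with multiple smaller ones instead"
    )
```

With such a guard in place, the `BLOCK_SIZE = 512` repro below would raise a clear `AssertionError` instead of a CUDA illegal-address error.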
Thanks @blefaudeux, I'm not sure I understand "repro a 512 block with 64 blocks" correctly --- won't the softmax still be calculated within each 64-sized block?
Ah no, "blocksparse" typically just means that the sparsity pattern is blocky; the softmax is computed over all the coefficients that are computed, not per tile, unless I misunderstood your question.
If you want the normalization to be over a neighborhood only, that's different; you could typically get that by summing blocksparse results with non-overlapping patterns.
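A small NumPy sketch of the point above: the block layout only controls *which* coefficients exist, while the softmax normalizes each row over all kept coefficients, across tiles. The helper name and shapes here are illustrative, not xformers' API.

```python
import numpy as np

def blocksparse_softmax(scores: np.ndarray, layout: np.ndarray, block: int) -> np.ndarray:
    """Row softmax over a block-sparse score matrix (illustrative sketch).

    layout: (n_blocks, n_blocks) 0/1 matrix of which tiles are kept.
    scores: (n_blocks * block, n_blocks * block) dense attention scores.
    """
    # Expand the tile layout into a dense element-wise mask.
    mask = np.kron(layout, np.ones((block, block), dtype=int)).astype(bool)
    # Masked-out positions get -inf so they contribute zero probability.
    masked = np.where(mask, scores, -np.inf)
    # One softmax per ROW, spanning every kept tile in that row -- not per tile.
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    e = np.where(mask, e, 0.0)
    return e / e.sum(axis=-1, keepdims=True)
```

For a row whose layout keeps two separate tiles, the resulting probabilities sum to 1 across both tiles jointly, which is why several small blocks can reproduce the result of one large block.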
🐛 Bug
I'm not sure whether this is a bug or simply a restriction of Triton, but it shows up when I follow the doc here: https://github.com/facebookresearch/xformers/blob/main/HOWTO.md#blocksparseattention
The sample works fine, but it fails when I increase the block size.
To Reproduce
Steps to reproduce the behavior:
Simply replace the sample's hyperparameters like this:

```python
BATCH = 1
HEADS = 16
SEQ = 8192
EMB = 64 * HEADS
BLOCK_SIZE = 512
DROPOUT = 0.1
```

This should reproduce the error `RuntimeError: CUDA: Error- illegal address`.
How you installed xformers (conda, pip, source): pip

Additional context