
blocksparse gives RuntimeError: CUDA: Error- illegal address when increasing the block size #206

Closed
xwhan opened this issue Feb 5, 2022 · 4 comments · Fixed by #207
xwhan (Contributor) commented Feb 5, 2022

🐛 Bug

I'm not sure whether this is a bug or simply a restriction of Triton, but I followed your doc here: https://github.com/facebookresearch/xformers/blob/main/HOWTO.md#blocksparseattention

The sample works fine, but it fails when I increase the block size.

To Reproduce

Steps to reproduce the behavior:

Simply replace the hyperparameters as follows:
BATCH = 1
HEADS = 16
SEQ = 8192
EMB = 64 * HEADS
BLOCK_SIZE = 512
DROPOUT = 0.1
This should reproduce the error "RuntimeError: CUDA: Error- illegal address".
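For completeness, here is a self-contained sketch of the repro, following the BlockSparseAttention snippet from the linked HOWTO with the hyperparameters above swapped in. The import paths, constructor arguments, and `MultiHeadDispatch` parameters mirror the HOWTO as of early 2022 and may have drifted since:

```python
import torch

from xformers.components import MultiHeadDispatch
from xformers.components.attention import BlockSparseAttention

BATCH = 1
HEADS = 16
SEQ = 8192
EMB = 64 * HEADS
BLOCK_SIZE = 512  # triggers "RuntimeError: CUDA: Error- illegal address"
DROPOUT = 0.1

# The layout is expressed in units of blocks:
# [HEADS, SEQ // BLOCK_SIZE, SEQ // BLOCK_SIZE]
blocks = SEQ // BLOCK_SIZE
causal_layout = torch.tril(torch.ones([HEADS, blocks, blocks], dtype=torch.long))

attention = BlockSparseAttention(
    layout=causal_layout, block_size=BLOCK_SIZE, dropout=DROPOUT, num_heads=HEADS
)

multi_head = (
    MultiHeadDispatch(
        seq_len=SEQ,
        dim_model=EMB,
        residual_dropout=DROPOUT,
        num_heads=HEADS,
        attention=attention,
    )
    .cuda()
    .half()
)

inputs = torch.rand((BATCH, SEQ, EMB), device="cuda").half()
outputs = multi_head(query=inputs, key=inputs, value=inputs)
```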

  • PyTorch Version (e.g., 1.0): 1.10.2
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: 3.8
  • CUDA/cuDNN version: 11.6
  • GPU models and configuration: A100

blefaudeux (Contributor) commented

Ah, 512 is not an option; off the top of my head it's 16/32/64. I need to guard against that! Thanks a lot for the report @xwhan, will fix ASAP.

blefaudeux (Contributor) commented

I was wrong about 64: the block size needs to be a multiple of 16 (because of tensor cores), but there is no obvious upper bound, except that you don't really gain anything above a given size (you can reproduce a 512 block with multiple 64 blocks, of course, as sketched below). I'll add an assert to make sure that users stay within reasonable bounds.
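To make the "reproduce a 512 block with multiple 64 blocks" point concrete, here is a small sketch, assuming the layout semantics from the HOWTO example (one layout cell per BLOCK_SIZE × BLOCK_SIZE tile):

```python
import torch

SEQ = 8192
SMALL_BLOCK = 64    # a supported block size (multiple of 16)
COARSE_BLOCK = 512  # the size the issue tried to use directly

blocks = SEQ // SMALL_BLOCK          # 128 layout cells per side
ratio = COARSE_BLOCK // SMALL_BLOCK  # 8 small blocks per coarse block

# Each dense 512x512 region on the diagonal becomes an 8x8 patch of ones
# in the 64-block layout; the set of computed attention coefficients is
# identical to what a single 512 block would cover.
layout = torch.zeros(blocks, blocks, dtype=torch.long)
for i in range(0, blocks, ratio):
    layout[i : i + ratio, i : i + ratio] = 1
```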

xwhan (Contributor, Author) commented Feb 7, 2022

Thanks @blefaudeux, I'm not sure I understand "reproduce a 512 block with multiple 64 blocks" correctly --- won't the softmax still be calculated within each 64-sized block?

blefaudeux (Contributor) commented

Ah no; "blocksparse" typically just means that the sparsity pattern is blocky, but the softmax is computed over all the coefficients that are computed, not per tile, unless I misunderstood your question?

If you want the normalization to be restricted to a neighborhood only, then that's different; typically you could get that by summing blocksparse results computed with non-overlapping patterns, as sketched below.
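A hedged sketch of that suggestion, assuming the `BlockSparseAttention` constructor from the HOWTO and a `(q, k, v)` forward signature (the actual call convention and expected tensor shapes should be checked against the xformers version in use). Because each attention call normalizes its softmax over its own pattern only, summing the outputs of attentions built from disjoint layouts keeps each term's normalization local to its own neighborhood:

```python
import torch
from xformers.components.attention import BlockSparseAttention

def locally_normalized_attention(q, k, v, layouts, block_size, num_heads):
    """Sum blocksparse attention outputs over disjoint (non-overlapping) layouts.

    Each BlockSparseAttention instance computes its softmax over its own
    pattern only, so every term in the sum is normalized within the
    neighborhood that its layout defines.
    """
    out = None
    for layout in layouts:
        attn = BlockSparseAttention(
            layout=layout, block_size=block_size, num_heads=num_heads
        ).to(q.device)
        y = attn(q=q, k=k, v=v)
        out = y if out is None else out + y
    return out
```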
