
[BUG] [BENCHMARKING.PY] RuntimeError: No available kernel. Aborting execution. #9

Closed

kyegomez opened this issue Jul 17, 2023 · 0 comments
Labels: bug (Something isn't working)

kyegomez (Owner) commented Jul 17, 2023

: /root/.cache/pip/wheels/20/7b/3f/2807682bad2fba40ed888e6309597a5fda545ab30964c835aa
Successfully built deepspeed
Installing collected packages: tokenizers, SentencePiece, safetensors, ninja, hjson, bitsandbytes, xxhash, rouge, einops, dill, multiprocess, huggingface-hub, transformers, datasets, lion-pytorch, deepspeed, accelerate
Successfully installed SentencePiece-0.1.99 accelerate-0.21.0 bitsandbytes-0.40.2 datasets-2.13.1 deepspeed-0.10.0 dill-0.3.6 einops-0.6.1 hjson-3.1.0 huggingface-hub-0.16.4 lion-pytorch-0.1.2 multiprocess-0.70.14 ninja-1.11.1 rouge-1.0.1 safetensors-0.3.1 tokenizers-0.13.3 transformers-4.30.2 xxhash-3.2.0
[2023-07-17 22:42:48,068] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-07-17 22:42:50.272490: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
A100 GPU detected, using flash attention if input tensor is on cuda
/content/Andromeda/Andromeda/optimus_prime/attend.py:168: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:545.)
  out = F.scaled_dot_product_attention(
/content/Andromeda/Andromeda/optimus_prime/attend.py:168: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:338.)
  out = F.scaled_dot_product_attention(
/content/Andromeda/Andromeda/optimus_prime/attend.py:168: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:547.)
  out = F.scaled_dot_product_attention(
/content/Andromeda/Andromeda/optimus_prime/attend.py:168: UserWarning: Both fused kernels do not support non-null attn_mask. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:191.)
  out = F.scaled_dot_product_attention(
Traceback (most recent call last):
  File "/content/Andromeda/benchmarking.py", line 237, in <module>
    forward_pass_time = speed_metrics.forward_pass_time()
  File "/content/Andromeda/benchmarking.py", line 66, in forward_pass_time
    model_input = self.model.decoder.forward(torch.randint(0, 50304, (1, 8192), device=device, dtype=torch.long))[0]
  File "/content/Andromeda/Andromeda/optimus_prime/autoregressive_wrapper.py", line 141, in forward
    logits = self.net(inp, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Andromeda/Andromeda/optimus_prime/x_transformers.py", line 1422, in forward
    x = self.attn_layers(x, mask = mask, mems = mems, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Andromeda/Andromeda/optimus_prime/x_transformers.py", line 1155, in forward
    out, inter = block(x, mask = mask, context_mask = self_attn_context_mask, attn_mask = attn_mask, rel_pos = self.rel_pos, rotary_pos_emb = rotary_pos_emb, prev_attn = prev_attn, mem = layer_mem)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Andromeda/Andromeda/optimus_prime/x_transformers.py", line 581, in forward
    return self.fn(x, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Andromeda/Andromeda/optimus_prime/x_transformers.py", line 863, in forward
    out, intermediates = self.attend(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Andromeda/Andromeda/optimus_prime/attend.py", line 198, in forward
    return self.flash_attn(q, k, v, mask = mask, attn_bias = attn_bias)
  File "/content/Andromeda/Andromeda/optimus_prime/attend.py", line 168, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel.  Aborting execution.
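
The warnings above explain the failure: the benchmark calls F.scaled_dot_product_attention with a non-null attn_mask, which rules out both fused SDPA backends (flash and memory-efficient), and the math fallback appears to have been disabled in attend.py, so PyTorch has no kernel left to dispatch. Below is a minimal sketch (not the Andromeda code) that reproduces the same error with plain PyTorch 2.0-style sdp_kernel toggles and illustrative tensor shapes, plus one possible workaround:

```python
# Hedged sketch: reproduce "No available kernel" and show a possible workaround.
# Tensor shapes, dtype, and the sdp_kernel settings are assumptions for illustration.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# (batch, heads, seq_len, head_dim) query/key/value and a broadcastable boolean mask.
q = k = v = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
attn_mask = torch.ones(1, 1, 128, 128, device=device, dtype=torch.bool)

# With only the flash backend allowed, a non-null attn_mask leaves no usable kernel
# on CUDA, which raises the same RuntimeError seen in the traceback.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=False, enable_math=False):
    try:
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
    except RuntimeError as e:
        print("reproduced:", e)

# Possible workaround: keep the math kernel enabled so PyTorch can fall back
# whenever the mask rules out both fused kernels.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True, enable_math=True):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```

If the mask is purely causal, passing is_causal=True instead of an explicit attn_mask would also let the flash kernel run.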

@kyegomez kyegomez added the bug Something isn't working label Jul 17, 2023
@kyegomez kyegomez changed the title [BUG] RuntimeError: No available kernel. Aborting execution. [BUG] [BENCHMARKING.PY] RuntimeError: No available kernel. Aborting execution. Jul 17, 2023
@kyegomez kyegomez closed this as completed Jan 7, 2024