Describe the bug
Opening a GitHub issue for public tracking purposes. (internal #4489257)
megatron_gpt_eval.py throws the following error:
...
core_attn_out = super().forward(
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py", line 2099, in forward
qkv_layout, query_layer, key_layer, value_layer = _get_qkv_layout(
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py", line 1209, in _get_qkv_layout
raise Exception("The provided qkv memory layout is not supported!")
Exception: The provided qkv memory layout is not supported!
This only happens when all of the following are true:
- micro batch size = 1 (mbs = 2 works; see the sketch below)
- apply_rope_fusion = True
- RoPE fusion is available (i.e. the container has the latest Apex)
- the workload is inference: megatron_gpt_eval.py, megatron_gpt_generate.py, or the validation loop during training (training itself works with both mbs = 1 and mbs = 2)
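For context on the exception: `_get_qkv_layout` in Transformer Engine infers the qkv layout from tensor shapes and strides. The sketch below is a generic illustration of why a batch dimension of size 1 is fragile for stride-based layout detection; whether this is the exact mechanism behind this failure is an assumption based on the mbs = 1 symptom, not a reproduction of TE's actual code.

```python
import torch

s, b, h, d = 8, 1, 4, 16  # sequence, batch (mbs = 1), heads, head_dim

# Query produced directly in sbhd layout.
q_direct = torch.randn(s, b, h, d)

# The same logical values produced in bshd layout, then transposed to
# sbhd -- the kind of view a fused kernel can hand back.
q_view = torch.randn(b, s, h, d).transpose(0, 1)

assert q_direct.shape == q_view.shape  # identical shapes
assert q_direct.is_contiguous()        # True
assert q_view.is_contiguous()          # also True: PyTorch ignores the
                                       # strides of size-1 dimensions

# But the raw strides differ in the size-1 batch dimension:
print(q_direct.stride())  # (64, 64, 16, 1)
print(q_view.stride())    # (64, 512, 16, 1)

# A layout detector that compares raw stride tuples classifies these two
# tensors differently, even though both are valid contiguous sbhd tensors.
print(q_direct.stride() == q_view.stride())  # False
```

At mbs = 2 the batch stride is constrained by the data, so the two layouts cannot collide this way, which is consistent with mbs = 2 working.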
Steps/Code to reproduce bug
Run inference with any GPT-style model (e.g. Llama 2); a sketch of a typical invocation follows.
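A minimal sketch of an invocation that should hit the failing path. Option names follow the megatron_gpt_inference.yaml example config in NeMo and may differ between releases; passing a single prompt so that the effective micro batch size is 1 is an assumption about how the script batches prompts.

```bash
python examples/nlp/language_modeling/megatron_gpt_eval.py \
    gpt_model_file=/path/to/model.nemo \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    inference.greedy=True \
    inference.tokens_to_generate=32 \
    prompts='["How are you?"]'
```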
Environment overview
nvcr.io/nvidia/nemo:24.01.framework