
qkv format error in GPT Eval when bs=1 and using fused rope kernel #8590

Closed
cuichenx opened this issue Mar 5, 2024 · 2 comments
Labels
bug Something isn't working
stale

Comments

cuichenx (Collaborator) commented Mar 5, 2024

Describe the bug

Opening a GitHub issue for public tracking purposes (internal #4489257).

megatron_gpt_eval.py throws the following error:

...
    core_attn_out = super().forward(
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py", line 2099, in forward
    qkv_layout, query_layer, key_layer, value_layer = _get_qkv_layout(
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py", line 1209, in _get_qkv_layout
    raise Exception("The provided qkv memory layout is not supported!")
Exception: The provided qkv memory layout is not supported!
This only happens when all of the following are true:

micro batch size = 1 (mbs = 2 works)
apply_rope_fusion = True
rope fusion is available (i.e. container has latest Apex)
running an inference workload (megatron_gpt_eval.py, megatron_gpt_generate.py, or the validation loop during training); training itself works with both mbs = 1 and mbs = 2

Steps/Code to reproduce bug

run inference with any gpt-style model (e.g. llama2)

torchrun --nproc_per_node=1 /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
    gpt_model_file=<path to model> \
    inference.greedy=True \
    inference.add_BOS=True \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    prompts='["deep learning is"]'

Environment overview (please complete the following information)

nvcr.io/nvidia/nemo:24.01.framework

@cuichenx cuichenx added the bug Something isn't working label Mar 5, 2024
github-actions bot (Contributor) commented Apr 5, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Apr 5, 2024
cuichenx (Collaborator, Author) commented Apr 5, 2024

Fixed in the NeMo 24.03 release / Megatron Core 0.6.0 release.
