Hi @SkanderBS2024, I see you are not using the NeMo container nvcr.io/nvidia/nemo:24.07 and are instead mounting NeMo into the container. I tested the conversion script in nvcr.io/nvidia/nemo:24.07 and it works fine. However, the latest main needs an update, for which I have raised PR #10224. You can either check out this PR or use the 24.07 NeMo container. Thanks for reporting the issue!
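For anyone following along, the two options mentioned above can be set up roughly as follows; this is a minimal sketch (the remote name and local branch name are assumptions), not commands taken from the thread:

```bash
# Option A: use the NeMo 24.07 container mentioned above.
docker pull nvcr.io/nvidia/nemo:24.07

# Option B: on a NeMo source checkout, fetch PR #10224 through GitHub's
# standard pull-request refs ("origin" is assumed to point at NVIDIA/NeMo,
# and "pr-10224" is an arbitrary local branch name).
git fetch origin pull/10224/head:pr-10224
git checkout pr-10224
```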
Describe the bug
As described in the title, the conversion fails after finishing all of the installs and building NeMo and Megatron-LM from source, assuming that the model has been trained with Megatron-LM.
Steps/Code to reproduce bug
Expected behavior
Expected the Mamba model trained with Megatron-LM to be converted to .nemo format for fine-tuning.
Environment overview (please complete the following information)
docker pull & docker run commands used:
Docker pull command:
Docker run command:
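The reporter's exact commands were not captured above; for the PyTorch container named under Additional context, a typical invocation might look like the sketch below (the image tag suffix, mount paths, and shm-size are illustrative assumptions, not the reporter's actual flags):

```bash
# Illustrative only -- not the commands actually used in this report.
docker pull nvcr.io/nvidia/pytorch:24.07-py3

docker run --gpus all -it --rm \
    --shm-size=16g \
    -v /path/to/NeMo:/workspace/NeMo \
    -v /path/to/Megatron-LM:/workspace/Megatron-LM \
    -v /path/to/checkpoints:/workspace/checkpoints \
    nvcr.io/nvidia/pytorch:24.07-py3 bash
```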
Environment details
If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:
Additional context
NVIDIA PyTorch container: 24.07 (assuming training was done with 24.03)
GPUs: 2x A100 80GB
Followed the steps here: tutorials/llm/mamba/mamba.rst
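For context, the conversion step in that tutorial amounts to running NeMo's Mamba checkpoint converter on the Megatron-LM checkpoint. A rough sketch is below; the script path, flag names, and all values are assumptions recalled from the tutorial and should be verified against tutorials/llm/mamba/mamba.rst:

```bash
# Rough sketch of the Megatron-LM -> .nemo conversion; script path and
# arguments are assumptions -- verify against mamba.rst before running.
CUDA_VISIBLE_DEVICES=0 python /opt/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
    --input_name_or_path /path/to/megatron_mamba_checkpoint \
    --output_path /path/to/mamba_model.nemo \
    --precision bf16
```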