Fixes for llama 3.1 training #823
base: master
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
recheck
Added comments on the issues. Can you please revert the Git-based changes? The other two look good.
Done ✅
* Remove subpath from pretrain_llama.py
* Install toml package
* Adjust --gres=gpu:8 to the number of user-specified devices

Signed-off-by: Sean Smith <seasmith@nvidia.com>
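The third commit item replaces the hard-coded `gres="gpu:8"` with a value derived from the user's device count. A minimal sketch of that idea, assuming one GPU per requested device per node; `gres_for_devices` is a hypothetical helper for illustration, not the function used in the PR. It only builds the standard Slurm `gpu:N` GRES string.

```python
def gres_for_devices(devices: int) -> str:
    """Build a Slurm GRES GPU request string for `devices` GPUs per node.

    Hypothetical helper (not from the PR): the result would be passed as
    the `gres=` value shown in the diff instead of the literal "gpu:8".
    """
    # Reject non-positive counts so Slurm never receives "gpu:0" or "gpu:-1".
    if devices < 1:
        raise ValueError("devices must be a positive integer")
    # Standard Slurm syntax: --gres=gpu:<count>
    return f"gpu:{devices}"


if __name__ == "__main__":
    print(gres_for_devices(4))  # gpu:4
```

With this, a run configured for 4 devices requests `gpu:4` rather than always reserving all 8 GPUs on the node.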
mem="0",
exclusive=True,
gres="gpu:8",
packager=run.GitArchivePackager(subpath="large_language_model_pretraining/nemo", ref="HEAD"),
Can you please revert packager=run.GitArchivePackager(subpath="large_language_model_pretraining/nemo", ref="HEAD") as well?
I can; however, I don't understand who is able to run this as-is. If you follow the instructions in the README, this will fail because the path is wrong.
It might work if you copy the Dockerfile to the repository root and build from there, e.g.:
cp Dockerfile ../..
cd ../..
docker build -t nemo .
Fixes #821, #820, #824