Add support for finetuning with use_distributed_optimizer #68
Conversation
Could you comment on what issue this fix solves (compared to the finetuning code and scripts we provide)?
Yes. It fixes the missing function when you add the --use-distributed-optimizer argument to the finetuning scripts.
It also fixes #67 (comment).
Any update?
Hi, sorry, no update: the whole team is working on a big run right now, and changing the function signature for checkpoint loading is obviously not something we're keen to do in the middle of it. We should be done in about a month.
Hi @dumpmemory, if I use --use_checkpoint_args together with --use_distributed_optimizer, an assertion error is raised in checkpointing.py because mpu is not initialized.
The root cause is that _finish_mpu_init() is called after load_args_from_checkpoint(args) in initialize.py; the ordering is sketched below.
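For readers following along, here is a minimal, self-contained sketch of the ordering being described. The names (mpu, load_args_from_checkpoint, _finish_mpu_init, initialize_megatron) mirror upstream Megatron-LM, but the bodies are stubs written purely for illustration; in particular, the assumption that the checkpoint-name logic consults mpu when --use-distributed-optimizer is set is mine, not a quote of the fork's code.

```python
# Self-contained sketch of the ordering problem (illustration only,
# not the fork's actual code).

class _MPU:
    """Stand-in for Megatron's model-parallel state (mpu)."""
    def __init__(self):
        self._initialized = False

    def initialize(self):
        self._initialized = True

    def get_data_parallel_rank(self):
        # The kind of assertion the comment above refers to.
        assert self._initialized, "mpu is not initialized"
        return 0


mpu = _MPU()


def load_args_from_checkpoint(args):
    # Assumption for illustration: with --use-distributed-optimizer the
    # checkpoint-name logic in checkpointing.py ends up consulting mpu.
    if args.get("use_distributed_optimizer"):
        mpu.get_data_parallel_rank()  # fails: mpu not yet initialized


def _finish_mpu_init():
    mpu.initialize()


def initialize_megatron(args):
    # The --use_checkpoint_args path runs BEFORE the mpu is initialized ...
    if args.get("use_checkpoint_args"):
        load_args_from_checkpoint(args)
    # ... and only afterwards is the model-parallel state set up.
    _finish_mpu_init()


try:
    initialize_megatron({"use_checkpoint_args": True,
                         "use_distributed_optimizer": True})
except AssertionError as exc:
    print(f"AssertionError: {exc}")  # -> AssertionError: mpu is not initialized
```

With both flags set, the stubbed load_args_from_checkpoint() touches mpu before _finish_mpu_init() has run, reproducing the ordering failure described above.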
I will update the code. I have fixed this.
Please try the new one.
Hello @dumpmemory, we're working on clearing the open issues and will get to this one soon. Thank you for your patience.
Thank you for your contribution @dumpmemory. We'll not merge this, to keep our own complexity down. Sorry if this wasn't clear, but this repo is meant more as replication code for an upcoming paper than as a long-lived fork of NVIDIA's Megatron, and we are not interested in allocating time to adding features that we're not using. I'll add a note to the docs saying this :).
@kylematoba So does that mean the main branch doesn't support use_distributed_optimizer?
Fix issues for finetuning with the use_distributed_optimizer option