Add CodeLlama support #51
Conversation
The vocab size of the 34b model is still 32000.
I think the mapping also needs to be updated: main...panx27:Megatron-LLM:codellama_test#diff-ec4ffbea89a2008356ff84590b89264ef61b775a46422546296f030923e6c3bbL16-L20
The tokenizer situation is indeed a bit special and unfortunately not only dependent on size; a suitable check might be "larger than 13b, or 'Python' in the name":
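For illustration, a minimal sketch of such a check, assuming the facts mentioned in this thread (7b/13b base checkpoints carry the extra infilling tokens, while the 34b and the Python variants do not); `codellama_vocab_size` is a hypothetical helper, not code from this repository:

```python
def codellama_vocab_size(model_name: str, size_billion: int) -> int:
    """Pick the tokenizer vocab size for a CodeLlama checkpoint (sketch)."""
    # 34b and all "Python" variants ship without the infilling tokens.
    if size_billion > 13 or "Python" in model_name:
        return 32000
    # 7b/13b base/Instruct models include the <PRE>/<MID>/<SUF>/<EOT> tokens.
    return 32016

# Example usage:
assert codellama_vocab_size("CodeLlama-34b", 34) == 32000
assert codellama_vocab_size("CodeLlama-13b-Python", 13) == 32000
assert codellama_vocab_size("CodeLlama-13b", 13) == 32016
```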
@panx27 So far I have only tested fine-tuning the vanilla 13b CodeLlama version. If you happen to try the 34b, please let me know whether it works with the current version of the code.
@andreaskoepf I have tested the conversion process (meta -> megatron, sharding, megatron -> hf) and the training process for the 34b model on the current version of your code. The results look good to me. One small modification is needed: here, the default value should be set to 32000. As it stands, the current logic assigns a vocab size of 32016 to the 34b model, which is not what we want.
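A hedged sketch of the change being described, not the repository's actual code (the `--vocab-size` flag name is illustrative only): the conversion script's default should be 32000 so the 34b checkpoint is not silently given 32016.

```python
import argparse

parser = argparse.ArgumentParser(description="meta -> megatron conversion (sketch)")
parser.add_argument("--vocab-size", type=int, default=32000,
                    help="Tokenizer vocab size; pass 32016 explicitly for the "
                         "CodeLlama 7b/13b checkpoints that include infilling tokens")

args = parser.parse_args([])   # default case: 32000, correct for the 34b model
print(args.vocab_size)         # -> 32000
```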
Thanks a lot for testing!
Oh yes, I forgot to remove that line; will correct it now.
Main differences of CodeLlama (as discussed above): the rope_theta value of 1e6 instead of Llama's default of 10000, and the tokenizer/vocab-size handling.
Note: Once support for rope_theta has been added to HF transformers (see huggingface/transformers#25740), the args.rope_theta value should also be applied in megatron2hf.py.
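For reference, a sketch of what applying it could look like once that transformers change is available; the `build_hf_config` helper and the `args` attribute names are assumptions, only the `rope_theta` field of `LlamaConfig` comes from the linked PR.

```python
from transformers import LlamaConfig

def build_hf_config(args) -> LlamaConfig:
    """Build the HF config during megatron -> hf conversion (sketch)."""
    # args is assumed to carry the megatron-side hyperparameters, including
    # rope_theta (1_000_000 for CodeLlama instead of the Llama default 10_000).
    return LlamaConfig(
        hidden_size=args.hidden_size,
        num_hidden_layers=args.num_layers,
        num_attention_heads=args.num_attention_heads,
        vocab_size=args.vocab_size,
        rope_theta=args.rope_theta,
    )
```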