
Add CodeLlama support #51

Merged
merged 6 commits into epfLLM:main on Aug 29, 2023

Conversation

andreaskoepf (Contributor)

Main differences of codellama:

  • sequence_length: 16384
  • rope_theta: 1e6
  • vocabulary size: 32016

Note: Once support for rope_theta has been added to hf transformers (see huggingface/transformers#25740), the args.rope_theta value should also be applied in megatron2hf.py.
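For illustration, a minimal sketch (not code from this PR; the function name and head dimension are hypothetical) of how rope_theta enters the rotary-embedding frequency computation and why 1e6 matters for the 16384-token context:

import torch

def rope_inv_freq(head_dim: int, rope_theta: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i = 0 .. d/2 - 1.
    # CodeLlama uses rope_theta = 1e6 instead of the LLaMA default of 1e4,
    # which lengthens the rotation periods and suits longer contexts.
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

llama_freqs = rope_inv_freq(128, rope_theta=1e4)      # LLaMA / Llama 2 default
codellama_freqs = rope_inv_freq(128, rope_theta=1e6)  # CodeLlama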

panx27 commented Aug 27, 2023

The vocab size of the 34b model is still 32000

andreaskoepf (Contributor, Author)

The tokenizer situation is indeed a bit special and unfortunately not dependent on size alone; a check might be >13b or "Python" in the name:

find -iname tokenizer.model -exec md5sum '{}' \;
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-7b/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-13b/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-7b-Instruct/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-13b-Instruct/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-7b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-13b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b-Instruct/tokenizer.model
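A quick way to confirm what the two checksum groups correspond to is to load each tokenizer.model and read its vocabulary size directly; a minimal sketch using sentencepiece, with paths taken from the listing above:

import sentencepiece as spm

def tokenizer_vocab_size(path: str) -> int:
    # Load the SentencePiece model and return its piece count.
    sp = spm.SentencePieceProcessor()
    sp.Load(path)
    return sp.vocab_size()

# Per this thread, the first checksum group should report 32016 pieces
# (extra special tokens) and the second group the plain 32000.
print(tokenizer_vocab_size("./CodeLlama-13b/tokenizer.model"))
print(tokenizer_vocab_size("./CodeLlama-34b/tokenizer.model"))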

andreaskoepf (Contributor, Author) commented Aug 27, 2023

@panx27 So far I have only tested fine-tuning the vanilla 13b CodeLlama version; if you happen to try 34b, please let me know whether it works with the current version of the code.

panx27 commented Aug 28, 2023

@andreaskoepf I have tested the conversion process (meta -> megatron, sharding, megatron -> hf) and the training process for the 34b model on the current version of your code. The results look good to me.

There's a small modification needed. In here, the default value should be set to 32,000. As it stands, the current logic will assign a vocab size of 32,016 to the 34b model, which is not what we want.
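Putting the two observations together, a hedged sketch of the intended selection logic (a hypothetical helper, not the repo's actual argument handling): default to 32000 and switch to 32016 only for the 7b/13b base and Instruct checkpoints.

def codellama_vocab_size(model_name: str, n_params_billion: int) -> int:
    # Heuristic from this thread: the Python variants and all 34b
    # checkpoints keep the plain LLaMA vocabulary (32000), while the
    # 7b/13b base and Instruct checkpoints ship 16 extra tokens (32016).
    if n_params_billion > 13 or "Python" in model_name:
        return 32000
    return 32016

assert codellama_vocab_size("CodeLlama-34b", 34) == 32000
assert codellama_vocab_size("CodeLlama-13b-Instruct", 13) == 32016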

andreaskoepf (Contributor, Author)

Thanks a lot for testing!

> There's a small modification needed.

Oh yes, I forgot to remove that line; will correct now.

martinjaggi merged commit 15b051d into epfLLM:main on Aug 29, 2023