
Add CodeLlama support #51

Merged
merged 6 commits into epfLLM:main on Aug 29, 2023

Conversation

andreaskoepf (Contributor)

Main differences of codellama:

  • sequence_length: 16384
  • rope_theta: 1e6
  • vocabulary size: 32016

Note: Once support for rope_theta has been added to hf transformers (see huggingface/transformers#25740), the args.rope_theta value should also be applied in megatron2hf.py.
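For illustration, a minimal sketch (not code from this PR; the function name and head dimension are hypothetical) of how rope_theta enters the rotary-embedding frequency computation and why 1e6 matters for the 16384-token context:

import torch

def rope_inv_freq(head_dim: int, rope_theta: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta^(-2i/d) for i = 0 .. d/2 - 1.
    # CodeLlama uses rope_theta = 1e6 instead of the LLaMA default of 1e4,
    # which lengthens the rotation periods and suits longer contexts.
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

llama_freqs = rope_inv_freq(128, rope_theta=1e4)      # LLaMA / Llama 2 default
codellama_freqs = rope_inv_freq(128, rope_theta=1e6)  # CodeLlama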

panx27 commented Aug 27, 2023

The vocab size of the 34b model is still 32000

andreaskoepf (Contributor, Author)

The tokenizer situation is indeed a bit special and unfortunately not dependent on size alone; a check might be >13b or "Python" in the name:

find -iname tokenizer.model -exec md5sum '{}' \;
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-7b/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-13b/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-7b-Instruct/tokenizer.model
9e597e72392fd4005529a33f2bf708ba  ./CodeLlama-13b-Instruct/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-7b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-13b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b-Python/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  ./CodeLlama-34b-Instruct/tokenizer.model
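A quick way to confirm what the two checksum groups correspond to is to load each tokenizer.model and read its vocabulary size directly; a minimal sketch using sentencepiece, with paths taken from the listing above:

import sentencepiece as spm

def tokenizer_vocab_size(path: str) -> int:
    # Load the SentencePiece model and return its piece count.
    sp = spm.SentencePieceProcessor()
    sp.Load(path)
    return sp.vocab_size()

# Per this thread, the first checksum group should report 32016 pieces
# (extra special tokens) and the second group the plain 32000.
print(tokenizer_vocab_size("./CodeLlama-13b/tokenizer.model"))
print(tokenizer_vocab_size("./CodeLlama-34b/tokenizer.model"))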

andreaskoepf (Contributor, Author) commented Aug 27, 2023

@panx27 So far I have only tested fine-tuning the vanilla 13b CodeLlama version; if you happen to try 34b, please let me know whether it works with the current version of the code.

panx27 commented Aug 28, 2023

@andreaskoepf I have tested the conversion process (meta -> megatron, sharding, megatron -> hf) and the training process for the 34b model on the current version of your code. The results look good to me.

There's a small modification needed. In here, the default value should be set to 32,000. As it stands, the current logic will assign a vocab size of 32,016 to the 34b model, which is not what we want.
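Putting the two observations together, a hedged sketch of the intended selection logic (a hypothetical helper, not the repo's actual argument handling): default to 32000 and switch to 32016 only for the 7b/13b base and Instruct checkpoints.

def codellama_vocab_size(model_name: str, n_params_billion: int) -> int:
    # Heuristic from this thread: the Python variants and all 34b
    # checkpoints keep the plain LLaMA vocabulary (32000), while the
    # 7b/13b base and Instruct checkpoints ship 16 extra tokens (32016).
    if n_params_billion > 13 or "Python" in model_name:
        return 32000
    return 32016

assert codellama_vocab_size("CodeLlama-34b", 34) == 32000
assert codellama_vocab_size("CodeLlama-13b-Instruct", 13) == 32016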

andreaskoepf (Contributor, Author)

Thanks a lot for testing!

> There's a small modification needed.

Oh yes, I forgot to remove that line; will correct now.

martinjaggi merged commit 15b051d into epfLLM:main on Aug 29, 2023