How to fine-tune the instruct MPT-7B model? #108
I see in train.py under scripts/train that it gets a model when given a model configuration. I took a look at this yaml, 7b_dolly_sft.yaml. Do you think I could further tune the instruct model somehow? What name do I specify in the yaml for the instruct model? Or can I give it a custom path to that instruct model if I have saved it locally? I'm looking to freeze the weights for all layers other than the last one and then fine-tune.
Please guide me. Thanks a bunch!
Comments
A solution was found here: #94 (comment)
To set your starting model to our MPT-7B-Instruct model on the Hugging Face hub, you'd use this model config in your YAML:

```yaml
# Model
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: mosaicml/mpt-7b-instruct
  init_device: cpu
  pretrained: true
  # Comment out the attn_config block to use default "torch" attention
  config_overrides:
    attn_config:
      attn_impl: triton
```

For freezing all the weights except the last layer, you can add some custom logic to the training script.
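As a concrete illustration of that custom logic, here is a minimal sketch (not code from this repo) of freezing everything except the final transformer block. The parameter names assume the MPT module layout on the Hugging Face hub ("transformer.blocks.<i>", "transformer.norm_f"); verify them against your model with `model.named_parameters()` before relying on them.

```python
# Minimal sketch: freeze all weights except the last transformer block.
# Parameter names below are assumptions based on the MPT layout on the HF hub.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze the final block (MPT-7B has 32 blocks, indexed 0-31)
# and the final layer norm.
for name, param in model.named_parameters():
    if name.startswith("transformer.blocks.31.") or name.startswith("transformer.norm_f"):
        param.requires_grad = True

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_trainable:,}")
```

If you go this route, apply the freezing after the model is built but before the trainer and optimizer are constructed, so the optimizer only tracks the trainable parameters.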
@baptistejamin is it possible to make a Jupyter notebook? That way we could use it to fine-tune MPT-7B using cloud GPUs (paid, of course), and in notebook format it will be easy for others to pick up as well.
@alextrott16 so I managed to run train.py successfully. Two questions, however:
Thank you!
Hi, we fine-tuned an instruction model without freezing any weights. The resulting checkpoint is 77 GB! How do we convert it back to the HF format and use it? Also, when trying to fine-tune, the training loss profile is way better with
Thank you.
Hi @SoumitriKolavennu, the 77 GB is expected given that it is a 7B model and the Composer checkpoint holds both model and optimizer state (roughly 3 copies of 7B parameters at 4 bytes each is about 84 GB, so 77 GB is in the right ballpark). To convert to a HF checkpoint folder, you can use the instructions in the scripts/inference folder.
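For context, here is a minimal sketch of using the converted checkpoint, assuming the conversion wrote a standard HF folder to a hypothetical local path `./mpt-7b-instruct-ft`:

```python
# Minimal sketch: load a Composer-to-HF converted checkpoint for generation.
# "./mpt-7b-instruct-ft" is a hypothetical output path, not one from the repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./mpt-7b-instruct-ft")
model = AutoModelForCausalLM.from_pretrained(
    "./mpt-7b-instruct-ft", trust_remote_code=True
)

inputs = tokenizer("Here is a quick recipe for baking cookies:", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```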
Could you clarify exactly what you see as different between the training loss profiles (with screenshots)? In one case (
Hi @SoumitriKolavennu, I'm closing this issue for now, but feel free to reopen if you need further assistance.
Hi @abhi-mosaic, thank you for the help with converting to Hugging Face. Following your instructions worked great. One minor suggestion would be to include the converter in the training folder instead of the inference folder. My other question about the name is still relevant, but perhaps it deserves a thread of its own. Please close this issue, and thank you for the help.
Hey guys, I just made a video on how to do this in Google Colab: https://youtu.be/3de0Utr9XnI
@VRSEN can you make a Colab or video generalizing the input data preprocessing a bit more? For example, if we want to fine-tune for a news summarization task, how would I preprocess a dataset (e.g., the HF dataset multi_news) which has only two columns, "document" and "summary"?
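As one possible starting point, here is a hedged sketch of mapping multi_news's "document"/"summary" columns into prompt/response pairs. The output keys and prompt wording are assumptions to check against whatever format your fine-tuning dataloader expects.

```python
# Hypothetical preprocessing for the HF multi_news dataset: map its
# "document"/"summary" columns into prompt/response pairs. The exact keys
# and prompt template your training setup wants are assumptions here.
from datasets import load_dataset

def multi_news_to_prompt_response(example: dict) -> dict:
    prompt = (
        "Summarize the following news articles.\n\n"
        f"{example['document']}\n\nSummary:"
    )
    return {"prompt": prompt, "response": example["summary"]}

dataset = load_dataset("multi_news", split="train")
dataset = dataset.map(
    multi_news_to_prompt_response,
    remove_columns=dataset.column_names,  # keep only prompt/response
)
print(dataset[0]["prompt"][:200])
```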