
How to fine-tune instruct mpt-7b model? #108

Closed
dydx-git opened this issue May 11, 2023 · 10 comments

@dydx-git

I see that train.py under scripts/train builds a model from a given model configuration. I took a look at the 7b_dolly_sft.yaml config. Do you think I could use it to further fine-tune the instruct model somehow?

What name do I specify in the yaml for the instruct model? Or can I give it a custom path to that instruct model if I have saved it locally? I'm looking to freeze the weights for all the layers other than the last one and then fine-tune.

Please guide me. Thanks a bunch!

@baptistejamin

A solution was found here: #94 (comment)

@alextrott16
Contributor

alextrott16 commented May 11, 2023

To set your starting model to our MPT-7B-Instruct model on the Hugging Face hub, you'd use this model config in your YAML

  # Model
  model:
    name: hf_causal_lm
    pretrained_model_name_or_path: mosaicml/mpt-7b-instruct
    init_device: cpu
    pretrained: true
    # Comment the attn_config block to use default "torch" attention
    config_overrides:
        attn_config:
            attn_impl: triton

Note that pretrained_model_name_or_path determines what value is passed to Hugging Face's AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=...) when building the model. from_pretrained supports local paths.

For freezing all the weights except the last layer, you can add some custom logic to scripts/train/train.py. The guy in this video does that kind of freezing with our model inside a notebook, which might be a useful reference.
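(Not from the original thread, but as a rough sketch of what that custom freezing logic could look like: the snippet below assumes MPT-style parameter names where transformer blocks show up as "...blocks.<i>...", which may differ across llm-foundry and transformers versions.)

  # Hypothetical sketch: freeze every parameter except those in the last
  # transformer block. The "blocks.<i>." naming is an assumption; inspect
  # model.named_parameters() for your actual model.
  def freeze_all_but_last_block(model):
      # First freeze everything.
      for param in model.parameters():
          param.requires_grad = False

      # Collect the block indices that appear in the parameter names.
      block_ids = sorted({
          int(name.split("blocks.")[1].split(".")[0])
          for name, _ in model.named_parameters()
          if "blocks." in name
      })
      if not block_ids:
          raise ValueError("No 'blocks.<i>.' parameters found; check the naming scheme.")

      # Unfreeze only the parameters belonging to the last block.
      last_block = f"blocks.{block_ids[-1]}."
      for name, param in model.named_parameters():
          if last_block in name:
              param.requires_grad = True

      n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
      print(f"Trainable parameters after freezing: {n_trainable:,}")

You could call something like this on the model right after it is built in train.py and before the trainer is constructed.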

@GeorvityLabs

@baptistejamin is it possible to make a Jupyter notebook? That way we could use it to fine-tune MPT-7B on cloud GPUs (paid, of course), and in notebook format it would be easier for others to pick up as well.

@dydx-git
Author

dydx-git commented May 14, 2023

@alextrott16 so I managed to run train.py successfully. Two questions, however:

  1. I froze all the weights except the last layer, and it took maybe 10 seconds to fine-tune on a dataset of just 6 samples. Is that normal? I'm using an A100-40GB.
  2. Where is the updated model saved? I see the YAML has a property called save_folder; having specified it, I get a file called "latest-rank0.pt" that's about 23.61 GB. Is this the updated model? How do I use it? Sorry if I'm not asking the right questions.

Thank you!

@SoumitriKolavennu

To set your starting model to our MPT-7B-Instruct model on the Hugging Face hub, you'd use this model config in your YAML

  # Model
  model:
    name: hf_causal_lm
    pretrained_model_name_or_path: mosaicml/mpt-7b-instruct
    init_device: cpu
    pretrained: true
    # Comment the attn_config block to use default "torch" attention
    config_overrides:
        attn_config:
            attn_impl: triton

Note that pretrained_model_name_or_path determines what value is passed to Hugging Face's AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=...) when building the model. from_pretrained supports local paths.

Hi, we fine-tuned an instruct model without freezing any weights. The resulting checkpoint is 77 GB! How do we convert it back to the HF format and use it?

Also, when fine-tuning, the training loss profile is way better with name: mpt_causal_lm than with hf_causal_lm. What is the difference?

Thank you.

@abhi-mosaic
Member

abhi-mosaic commented May 18, 2023

Hi, we fine-tuned an instruct model without freezing any weights. The resulting checkpoint is 77 GB! How do we convert it back to the HF format and use it?

Hi @SoumitriKolavennu, the 77 GB is expected, given that it is a 7B model and the Composer checkpoint holds both model and optimizer state. To convert to an HF checkpoint folder, you can follow the instructions in the scripts/inference folder: https://github.com/mosaicml/llm-foundry/tree/main/scripts/inference#converting-a-composer-checkpoint-to-an-hf-checkpoint-folder
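(Not part of the original reply, but once the Composer checkpoint has been converted into an HF checkpoint folder following that README, using it is the standard transformers loading path. A minimal sketch; the folder path is a placeholder, and trust_remote_code is needed because MPT ships custom modeling code.)

  # Illustrative only: load a converted HF checkpoint folder and generate.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  hf_folder = "/path/to/converted-hf-checkpoint"  # placeholder: output of the converter

  tokenizer = AutoTokenizer.from_pretrained(hf_folder)
  model = AutoModelForCausalLM.from_pretrained(
      hf_folder,
      torch_dtype=torch.bfloat16,  # or float16 / float32, depending on hardware
      trust_remote_code=True,      # MPT uses custom modeling code
  )
  model.eval()

  inputs = tokenizer("Summarize the following article: ...", return_tensors="pt")
  with torch.no_grad():
      output_ids = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))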

Also, when fine-tuning, the training loss profile is way better with name: mpt_causal_lm than with hf_causal_lm. What is the difference?

Could you clarify exactly what differences you see between the training loss profiles (with screenshots)? In one case (mpt_causal_lm), you are probably initializing a from-scratch MPT and fine-tuning it. In the other case (hf_causal_lm pointed at the HF model mosaicml/mpt-7b-instruct), you are starting from the pretrained weights of our MPT-7B-Instruct model on the HF Hub. I would expect the latter to have a much lower initial loss and result in a higher-quality model.

@abhi-mosaic
Member

Hi @SoumitriKolavennu, I'm closing this issue for now, but feel free to reopen if you need further assistance.

@SoumitriKolavennu

Hi @SoumitriKolavennu, I'm closing this issue for now, but feel free to reopen if you need further assistance.

Hi @abhi-mosaic, thank you for the help with converting to the Hugging Face format. Following your instructions worked great. One minor suggestion would be to include the converter in the training folder instead of the inference folder. My other question about name is still relevant, but perhaps it deserves a thread of its own. Please close this issue, and thank you for the help.

@VRSEN
Contributor

VRSEN commented Jun 8, 2023

Hey guys, I just made a video on how to do this in Google Colab: https://youtu.be/3de0Utr9XnI
Hope it helps!

@tirthajyoti

Hey guys, I just made a video on how to do this in Google Colab: https://youtu.be/3de0Utr9XnI Hope it helps!

@VRSEN can you make a Colab or video that generalizes the input data preprocessing a bit more? For example, if we want to fine-tune for a news summarization task, how would we preprocess a dataset (e.g., the HF dataset multi_news) that has only two columns, "document" and "summary"?
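(Not an official answer, but a rough sketch: llm-foundry's finetuning dataloader can be pointed at a custom preprocessing function that maps each raw example to a prompt/response pair, so for multi_news it could look something like the snippet below. The function name, prompt template, and the exact expected keys are assumptions; check the finetuning documentation under scripts/train for the real contract.)

  # Hypothetical preprocessing function for the HF multi_news dataset,
  # assuming the finetuning dataloader expects {"prompt": ..., "response": ...}.
  def multi_news_preprocessing_fn(example: dict) -> dict:
      prompt = (
          "Below are one or more news articles. Write a concise summary.\n\n"
          "### Articles:\n"
          f"{example['document']}\n\n"
          "### Summary:\n"
      )
      return {"prompt": prompt, "response": example["summary"]}

You would then reference this function from the dataset section of the finetuning YAML (the exact key names depend on the llm-foundry version).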
