[Question] LLaVA Pretraining with Mixtral 8×7B #1417

ShawnAn-WHU · 2024-04-17T13:19:40Z

Question

Has anyone carried out pretraining with Mixtral 8×7B? When I run the pretraining script, a problem occurs, as shown in the figure below. I only added a llava_mixtral.py to llava/model/language_model, plus some necessary supporting code.

martinakaduc · 2024-04-18T03:08:27Z

I have not faced this issue. Can you share the command to reproduce it?

ShawnAn-WHU · 2024-04-18T04:13:25Z

@martinakaduc Thank you very much for your prompt reply! Below is my pretraining script. The --model_name_or_path is the model I downloaded from HF, mistralai/Mixtral-8x7B-v0.1. Despite the warnings, running this script produces a mm_projector.bin file. During pretraining, the loss decreases from ~15 to ~6 and then stops decreasing. Can you figure out the problem?
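As a rough reference for reading those loss values, the sketch below compares them with the cross-entropy of a uniform guess over the vocabulary. It assumes the logged loss is the mean token-level cross-entropy in nats and that the tokenizer has 32,000 entries, as Mixtral's does. The helper names are hypothetical, and the comparison does not diagnose the cause of the plateau.

```python
import math

VOCAB_SIZE = 32_000  # Mixtral tokenizer vocabulary size


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of predicting every token with equal probability."""
    return math.log(vocab_size)


def perplexity(loss: float) -> float:
    """Perplexity corresponding to a mean cross-entropy loss in nats."""
    return math.exp(loss)


baseline = uniform_loss(VOCAB_SIZE)
print(f"uniform-guess loss: {baseline:.2f}")
for loss in (15.0, 6.0):
    side = "above" if loss > baseline else "below"
    print(f"loss {loss:>4}: {side} the uniform baseline, perplexity ~{perplexity(loss):.0f}")
```

Under these assumptions, a starting loss of ~15 is worse than uniform guessing, and a plateau at ~6 is better than uniform but still corresponds to a perplexity in the hundreds.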

martinakaduc · 2024-04-18T06:36:17Z

Have you merged my pull request that adds Mixtral support? If not, you can use my modified repo here: https://github.com/martinakaduc/LLaVA

My pretraining script:
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3_offload.json \
    --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --version plain \
    --data_path ./playground/data/LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
    --image_folder ./playground/data/LLaVA-Pretrain/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/Mixtral-pt \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 2 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 32768 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to neptune

And fine-tuning script:
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3_offload.json \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --version mistral_instruct \
    --data_path ./playground/data/llava_v1_5_mix665k.json \
    --image_folder ./playground/data \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter ./checkpoints/Mixtral-pt/checkpoint-400/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/Mixtral-sft \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 32768 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to neptune
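When adapting these scripts to a different machine, it may help to work out the global batch size implied by the flags. The sketch below is only arithmetic on --per_device_train_batch_size and --gradient_accumulation_steps; the GPU count is not stated in this thread, so NUM_GPUS = 4 is a hypothetical value, and the function name is made up for illustration.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device * grad_accum * num_gpus


NUM_GPUS = 4  # hypothetical: the thread does not say how many GPUs were used

pretrain_bs = effective_batch_size(per_device=32, grad_accum=2, num_gpus=NUM_GPUS)
finetune_bs = effective_batch_size(per_device=4, grad_accum=8, num_gpus=NUM_GPUS)

# The fine-tuning script loads checkpoint-400 of the pretraining run.
samples_at_step_400 = 400 * pretrain_bs

print(f"pretraining global batch size: {pretrain_bs}")
print(f"fine-tuning global batch size: {finetune_bs}")
print(f"pretraining samples seen at step 400: {samples_at_step_400}")
```

If the GPU count changes, scaling --gradient_accumulation_steps inversely keeps the global batch size, and therefore the meaning of the learning rate, roughly the same.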

ShawnAn-WHU · 2024-04-18T06:50:54Z

@martinakaduc Thank you! I will try it now and find out the problem!

accupham · 2024-04-18T15:21:50Z

Interesting. Would pretraining on Mixtral-8x22B also be possible?

martinakaduc · 2024-04-18T16:06:34Z

I think it is possible. However, I have not tested it yet.

ShawnAn-WHU · 2024-04-25T11:40:39Z

@martinakaduc Hi, I'm using your pretrained MixSUraV model downloaded from HF to fine-tune on my own dataset. The script I use is shown in Figure 1. Is it correct? If so, I found it infeasible on 8 RTX 3090 GPUs (24 GB each), even with 4-bit quantization (setting --bits 4, as marked by the red rectangle in the figure). The model-loading code is shown in Figure 2. However, when I use the code shown in Figure 3, 3 GPUs are more than enough (maybe even 1 is enough). Is there any difference between these two pieces of code? And could you please share your fine-tuning script and the computational resources it needs, if you have done this? Thank you so much!
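A back-of-the-envelope estimate of the weight footprint may help frame the question. The sketch below assumes roughly 46.7 billion total parameters for a Mixtral 8×7B backbone and counts weights only; it ignores the vision tower, activations, LoRA adapters, optimizer state, and quantization overhead, so real usage is higher. Whether a given loading path keeps a full copy of the weights on every GPU or shards one copy across GPUs cannot be seen from the figures here, so both cases are shown as hypothetical scenarios.

```python
TOTAL_PARAMS = 46.7e9  # approximate total parameter count of Mixtral 8x7B
GPU_MEMORY_GB = 24     # RTX 3090


def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weights_gb(TOTAL_PARAMS, bits):.1f} GB")

four_bit = weights_gb(TOTAL_PARAMS, 4)

# Scenario A (assumption): every GPU holds a full 4-bit copy of the weights.
print(f"full copy per GPU: ~{four_bit:.1f} GB of {GPU_MEMORY_GB} GB")

# Scenario B (assumption): one 4-bit copy is split evenly across 3 GPUs.
print(f"sharded over 3 GPUs: ~{four_bit / 3:.1f} GB per GPU")
```

Under these assumptions, a full 4-bit copy alone nearly fills a 24 GB card, while a copy split across three cards leaves most of each card free, which would be consistent with the difference described above.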

fisher75 · 2024-04-25T13:12:45Z

Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA fine-tuning with the default parameters and saw basically no improvement.

ShawnAn-WHU · 2024-04-26T07:06:49Z

> Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA fine-tuning with the default parameters and saw basically no improvement.

@fisher75 I have LoRA fine-tuned LLaVA-v1.5 on my own dataset, and the qualitative results are better than those of the original LLaVA-v1.5.

fisher75 · 2024-04-26T07:11:36Z

Hi @ShawnAn-WHU, thanks for your reply. I am also working on this. May I ask whether the improvement is very obvious? Could I see your training and inference scripts? I am mostly curious about the parameter settings. By the way, if possible, may I add you on WeChat? It would be very helpful to share some details.

ShawnAn-WHU · 2024-04-26T09:01:06Z

@fisher75 Sure, just e-mail me your WeChat ID.

This was referenced Apr 17, 2024

[Usage] Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint When I try to pretrain model. #650

Open

[Feature] Adding support for Mixtral and Gemma models #1247

Open
