Question of training utilizing A6000 #71

Closed
xogud3373 opened this issue Nov 15, 2023 · 6 comments

Comments

xogud3373 commented Nov 15, 2023

Hello, first of all, I would like to express my deep gratitude for your excellent research.

I'm currently running training on 8x A6000 GPUs, but I got the errors below.

(screenshot of the error traceback)

Is there a way to resolve this issue without using flash-attention, or by modifying another part of the code?

I used the training command below.

torchrun --nproc_per_node 8 --master_port 29001 video_chatgpt/train/train_mem.py \
          --model_name_or_path ./LLaVA-Lightning-7B-v1-1 \
          --version v1 \
          --data_path video_chatgpt_training.json \
          --video_folder st_outputs1 \
          --tune_mm_mlp_adapter True \
          --mm_use_vid_start_end \
          --bf16 True \
          --output_dir ./Video-ChatGPT_7B-1.1_Checkpoints \
          --num_train_epochs 3 \
          --per_device_train_batch_size 4 \
          --per_device_eval_batch_size 4 \
          --gradient_accumulation_steps 8 \
          --evaluation_strategy "no" \
          --save_strategy "steps" \
          --save_steps 3000 \
          --save_total_limit 3 \
          --learning_rate 2e-5 \
          --weight_decay 0. \
          --warmup_ratio 0.03 \
          --lr_scheduler_type "cosine" \
          --logging_steps 100 \
          --tf32 True \
          --model_max_length 2048 \
          --gradient_checkpointing True \
          --lazy_preprocess True
leesungjae7469 commented Nov 15, 2023

When I tried to train this model, I couldn't train it on an A6000 either.

@CallmeBOKE

Same issue here.

@JakePark-Kor

I ran into the same issue. If anyone has found a solution, please share it :)

@xogud3373 (Author)

I removed the replace_llama_attn_with_flash_attn() call from video_chatgpt/train/train_mem.py and the training then proceeded. Could removing this code cause any issues with performance?
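
For reference, here is a minimal sketch of what that edit looks like. It assumes train_mem.py follows the usual LLaVA-style layout, where the monkey patch is applied before the trainer is imported; the exact import paths in the repo may differ.

    # video_chatgpt/train/train_mem.py (sketch; actual file contents may differ)

    # The flash-attention monkey patch is normally applied before importing the
    # trainer. On GPUs other than A100/H100 it can be commented out, as above.
    # from video_chatgpt.train.llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn
    # replace_llama_attn_with_flash_attn()

    from video_chatgpt.train.train import train

    if __name__ == "__main__":
        train()

Since FlashAttention computes exact attention, dropping the patch should mainly affect training speed and memory usage rather than model quality.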

Abyss-J commented Dec 3, 2023

I used A40 GPUs and got the same issue. How should I solve this problem?

mmaaz60 (Member) commented Apr 12, 2024

Hi @everyone,

Flash Attention only works on A100 or H100 GPUs. If you want to train on any other GPU, commenting out the line

replace_llama_attn_with_flash_attn()

should work. Thanks and good luck!

Please let me know if you have any questions.
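
If you would rather keep flash attention for A100/H100 runs and skip it elsewhere, a hedged alternative is to guard the call on the detected GPU. This guard is not part of the repo; it is a sketch that uses torch.cuda.get_device_capability() and assumes the A100/H100-only constraint stated above.

    import torch

    def flash_attn_supported() -> bool:
        # A100 reports compute capability (8, 0) and H100 reports (9, 0);
        # A6000 and A40 report (8, 6), so they fall back to standard attention.
        if not torch.cuda.is_available():
            return False
        return torch.cuda.get_device_capability() in [(8, 0), (9, 0)]

    if flash_attn_supported():
        replace_llama_attn_with_flash_attn()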

mmaaz60 closed this as completed Apr 12, 2024