OOM on two 80GB GPUs #49

Open · kyleliang919 opened this issue Jan 9, 2024 · 5 comments

kyleliang919 commented Jan 9, 2024

accelerate launch finetune.py \
    --output-dir output/mistral-yarn-7b-64k \
    --model mistralai/Mistral-7B-v0.1 \
    --architecture mistral \
    --scaling-factor 2 \
    --max-position-embeddings 16384 \
    --dataset emozilla/yarn-train-tokenized-8k-mistral \
    --sliding-window-attention-schedule 4096 \
    --lr-schedule constant \
    --learning-rate 0.000001 \
    --max-train-steps 1000

Both with and without LoRA the run hits an OOM error. This is at only 8K sequence length, so memory consumption should be roughly 4x smaller than training at 16K sequence length.

accelerate is configured to use two GPUs with FSDP.
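
For reference, a minimal sketch of what a fully sharded FSDP launch with CPU parameter offload could look like through accelerate's CLI flags. Flag names and accepted values differ between accelerate versions (older versions expect an integer such as 1 instead of FULL_SHARD), so treat this as an assumption to adapt rather than a verified config:

# FULL_SHARD splits parameters, gradients, and optimizer states across both GPUs;
# --fsdp_offload_params additionally parks the sharded parameters in CPU RAM.
accelerate launch \
    --num_machines 1 \
    --num_processes 2 \
    --mixed_precision bf16 \
    --use_fsdp \
    --fsdp_sharding_strategy FULL_SHARD \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap MistralDecoderLayer \
    --fsdp_offload_params true \
    finetune.py  # ...plus the same finetune.py arguments as above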

@edisonzf2020

+1

@TracyPlus

+1

YL-9 commented May 10, 2024

> Both with and without LoRA the run hits an OOM error [...] accelerate is configured to use two GPUs with FSDP.

I've also encountered this problem. Have you solved it yet? @kyleliang919 @edisonzf2020

@kyleliang919 (Author)

Unfortunately no. I think you probably need at least 320 GB to handle this run.
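
For context, a rough back-of-envelope for full fine-tuning of a ~7.24B-parameter model with AdamW under bf16 mixed precision (standard per-parameter byte counts, not measurements from this run):

    bf16 weights:         7.24e9 × 2 bytes ≈ 14.5 GB
    fp32 master weights:  7.24e9 × 4 bytes ≈ 29 GB
    fp32 Adam m and v:    7.24e9 × 8 bytes ≈ 58 GB
    bf16 gradients:       7.24e9 × 2 bytes ≈ 14.5 GB
    model + optimizer states ≈ 116 GB, before any activations

Even fully sharded across two 80 GB cards that is roughly 58 GB per GPU, leaving only ~20 GB per GPU for activations, the CUDA context, and fragmentation at 8K tokens, which is consistent with the run being this tight.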

YL-9 commented May 13, 2024

> Unfortunately no. I think you probably need at least 320 GB to handle this run.

Thank you for your reply.
I have 4x A100, but there is a separate process on each GPU and it still hits OOM, just like with 2x A100. I don't know how to configure it.
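
In case it helps, a sketch of launching all four GPUs as a single FSDP group on one machine (same caveat as above: flag names vary across accelerate versions, and the finetune.py arguments are elided):

# One machine, one process per GPU, all four processes in one FSDP group, so
# parameters, gradients, and optimizer states are sharded 4 ways instead of replicated.
accelerate launch \
    --num_machines 1 \
    --num_processes 4 \
    --mixed_precision bf16 \
    --use_fsdp \
    --fsdp_sharding_strategy FULL_SHARD \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap MistralDecoderLayer \
    finetune.py  # ...plus the finetune.py arguments from the original report

If every GPU still holds a full copy of the model, the processes are most likely running plain data parallel rather than sharding; double-check that the accelerate config actually being picked up sets distributed_type: FSDP.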
