train speed is too slow #9

Open

jkl375 opened this issue Apr 8, 2024 · 2 comments
jkl375 commented Apr 8, 2024

I found that when the context length is 512k, the training speed is too slow, which differs from your experimental results. It takes 585.85 seconds to train one batch of 512k tokens, i.e. 512000 / 585.85 ≈ 873.94 tokens/s.
I used 8× A100-80G with NVLink.

accelerate launch \
--config_file accelerate_configs/single_node.yaml \
train.py \
--batch-size 1 \
--gradient-accumulate-every 2  \
--output-dir  ./output/7B_0.5M_bs_1M_rope_250M_step_90_lr_2e-5 \
--seed 2027 \
--max-train-steps 90  \
--learning-rate 1e-5  \
--dataset PY007/slimpajama_llama_tokenized_upsample_4096_chunk_1M \
--model meta-llama/Llama-2-7b-hf  \
--seq-length 512000 \
--rope-theta 250000000 \
--parallel_mode zigzag_ring_attn

[image attachment]

jzhang38 (Owner) commented Apr 8, 2024

gradient-accumulate-every is set to 2, so it should be 512000 * 2 / 585.85 ≈ 1748 tokens/s.
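
For reference, a minimal sketch of this arithmetic in Python (the step time is the 585.85 s reported above; variable names are illustrative, not from train.py):

```python
# Each optimizer step processes seq_length * batch_size * gradient_accumulate_every tokens.
seq_length = 512_000    # --seq-length
batch_size = 1          # --batch-size
grad_accum = 2          # --gradient-accumulate-every
step_time_s = 585.85    # measured seconds per optimizer step (reported above)

tokens_per_step = seq_length * batch_size * grad_accum   # 1,024,000 tokens
throughput = tokens_per_step / step_time_s                # ≈ 1748 tokens/s

print(f"{throughput:.1f} tokens/s")
```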

jkl375 (Author) commented Apr 8, 2024

> gradient-accumulate-every is set to 2, so it should be 512000 * 2 / 585.85 ≈ 1748 tokens/s.
I see, thanks.
