train speed is too slow #9

Open

jkl375 opened this issue Apr 8, 2024 · 2 comments
jkl375 commented Apr 8, 2024

I found that when the context length is 512k, the training speed is too slow, which differs from your experimental results. It takes 585.85 seconds to train one batch of 512k tokens, i.e. 512000 / 585.85 ≈ 873.94 tokens/s.
I used 8× A100-80G with NVLink.

accelerate launch \
--config_file accelerate_configs/single_node.yaml \
train.py \
--batch-size 1 \
--gradient-accumulate-every 2  \
--output-dir  ./output/7B_0.5M_bs_1M_rope_250M_step_90_lr_2e-5 \
--seed 2027 \
--max-train-steps 90  \
--learning-rate 1e-5  \
--dataset PY007/slimpajama_llama_tokenized_upsample_4096_chunk_1M \
--model meta-llama/Llama-2-7b-hf  \
--seq-length 512000 \
--rope-theta 250000000 \
--parallel_mode zigzag_ring_attn

[image attachment]

jzhang38 (Owner) commented Apr 8, 2024

gradient-accumulate-every is set to 2, so it should be 512000 * 2 / 585.85 ≈ 1748 tokens/s.
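
For reference, a minimal sketch of this arithmetic in Python (the step time is the 585.85 s reported above; variable names are illustrative, not from train.py):

```python
# Each optimizer step processes seq_length * batch_size * gradient_accumulate_every tokens.
seq_length = 512_000    # --seq-length
batch_size = 1          # --batch-size
grad_accum = 2          # --gradient-accumulate-every
step_time_s = 585.85    # measured seconds per optimizer step (reported above)

tokens_per_step = seq_length * batch_size * grad_accum   # 1,024,000 tokens
throughput = tokens_per_step / step_time_s                # ≈ 1748 tokens/s

print(f"{throughput:.1f} tokens/s")
```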

jkl375 (Author) commented Apr 8, 2024

> gradient-accumulate-every is set to 2, so it should be 512000 * 2 / 585.85 ≈ 1748 tokens/s.
I see, thanks.
