Question about training speed. #5
Comments
Can your dataloaders catch up? i.e. are the GPUs at (almost) full load all the time? In any case, it should take much less than 100h for s0 even with 2x 1080Ti. The most probable reason is a dataloader bottleneck.
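One way to check for this kind of bottleneck is to measure how much wall time the training loop spends waiting on the next batch versus computing. Below is a stdlib-only sketch; `slow_loader` is a purely illustrative stand-in for a real DataLoader, and the sleep durations are made-up numbers, not measurements from this repo:

```python
import time

def slow_loader(num_batches, load_time):
    """Stand-in for a DataLoader: sleeping simulates disk I/O + CPU preprocessing."""
    for _ in range(num_batches):
        time.sleep(load_time)  # time spent producing the batch
        yield "batch"

def wait_fraction(loader, step_time):
    """Fraction of wall time spent waiting on data instead of computing."""
    waited = computed = 0.0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        waited += t1 - t0          # time blocked on the loader
        time.sleep(step_time)      # stands in for the GPU forward/backward pass
        t0 = time.perf_counter()
        computed += t0 - t1        # time spent on the "training step"
    return waited / (waited + computed)

# Loading (20 ms/batch) slower than compute (5 ms/step) -> GPU starved.
frac = wait_fraction(slow_loader(20, 0.020), 0.005)
print(f"time waiting on data: {frac:.0%}")
```

If this fraction is large in a real run (wrap the actual loader and training step the same way), raising `num_workers` is the usual first fix.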
I think it's highly likely, because my GPU utilization is sometimes far below 100%.
I find that a bigger num_workers really speeds things up in my case.
I think the general wisdom is to use higher OMP_NUM_THREADS and num_workers when you have more free CPU cores available.
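That rule of thumb can be sketched as below. This is only an assumption-laden heuristic, not a setting from this repo: `suggest_num_workers` is a hypothetical helper, and note that OMP_NUM_THREADS must be set before the heavy numerical libraries are imported for it to take effect:

```python
import os

# Cap intra-op threads per process so DataLoader workers don't fight over cores.
# Must happen before importing torch/numpy to reliably take effect.
os.environ.setdefault("OMP_NUM_THREADS", "4")

def suggest_num_workers(procs_per_node: int, threads_per_proc: int = 4) -> int:
    """Heuristic (assumption): spread leftover CPU cores across loader workers."""
    cores = os.cpu_count() or 1
    free = max(1, cores - procs_per_node * threads_per_proc)
    return min(16, free)  # 16 matched the best setting reported in this thread

workers = suggest_num_workers(procs_per_node=1)
print(f"OMP_NUM_THREADS={os.environ['OMP_NUM_THREADS']}, num_workers={workers}")
# e.g. then: DataLoader(dataset, batch_size=16, num_workers=workers)
```

The point is simply that the two knobs trade off against each other: every training process consumes `OMP_NUM_THREADS` cores, and whatever is left over is what the loader workers can actually use.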
OK. More num_workers definitely helps, and OMP_NUM_THREADS=4 from your original setting is the fastest in my case (1, 8, and even 16 are slower).
BTW you can try adding the
Thank you, I tried it but it was not really effective.
Sorry, I should correct that: it is about a 25% speed improvement (bigger num_workers=16, 1×3090, --nproc_per_node=1, bs=16).
With 1x 3090 I am getting around 0.7 for [time].
Hmm, it's actually 0.7 around the start of training and stabilizes around 0.5.
I compared 2×2080Ti with 1×3090, and the result is that 1×3090 is a little faster than 2×2080Ti.
It refers to s0+s3. I guess hardware infrastructure affects the training speed a lot.
Ok. Thank you!
May I ask the training time for stage 0 after you used a bigger num_workers? (num_workers=16, 1×3090, --nproc_per_node=1, bs=16) @BWYWTB
First of all, thank you for your great work!!!
Conducting the training of s0 & s2 with 2×2080Ti should take 30h according to your paper. But in practice, it takes me 100h just for s0 with 2×2080Ti (or 1×3090).
So I wanted to confirm the training speed, or maybe something is wrong on my side?