Question about training speed. #5

Closed

PinxueGuo opened this issue Jun 25, 2021 · 14 comments
@PinxueGuo commented Jun 25, 2021

First of all, thank you for your great work!
Training s0 & s2 with 2×2080Ti should take about 30h according to your paper, but in practice it takes about 100h just for s0 with 2×2080Ti (or 1×3090).
So I wanted to confirm the training speed, or find out what I am doing wrong.

@hkchengrex (Owner)

Can your dataloaders keep up? I.e., are the GPUs at (almost) full load all the time?
The reported training time is very rough (we used a mix of hardware at different times). We will re-train and give a better estimate in the next revision of the paper.

In any case, it should take much less than 100h for s0, even with 2× 1080Ti. The most probable cause is a dataloader bottleneck.
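For anyone checking this on their own setup, here is a minimal, self-contained sketch of how to tell whether the dataloader is the bottleneck by timing how long the training loop waits for each batch. `DummyDataset` and all the numbers below are placeholders for illustration, not this repository's actual dataset or settings:

```python
import time
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Stand-in dataset that simulates CPU-heavy loading (illustrative only)."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        time.sleep(0.01)                 # pretend decoding/augmentation cost
        return torch.randn(3, 64, 64)

def measure_wait(num_workers):
    loader = DataLoader(DummyDataset(), batch_size=16,
                        num_workers=num_workers, pin_memory=True)
    start = time.time()
    wait, end = 0.0, start
    for i, batch in enumerate(loader):
        wait += time.time() - end        # time spent blocked on the loader
        # ... forward/backward pass would go here ...
        end = time.time()
        if i == 20:
            break
    total = time.time() - start
    print(f'num_workers={num_workers}: waited {wait:.2f}s of {total:.2f}s')

if __name__ == '__main__':
    measure_wait(2)
    measure_wait(8)
```

If the waiting time is a large fraction of the total, the GPUs are being starved by data loading rather than being limited by compute.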

@PinxueGuo (Author)

I think that's highly likely, because my GPU utilization is sometimes far below 100%.
Can you give me some suggestions to solve it? Should I increase OMP_NUM_THREADS (in the command) or num_workers (in the PyTorch dataloader)?
Thank you!

@PinxueGuo (Author)

I find that a bigger num_workers really speeds things up in my case.

@hkchengrex (Owner)

That's great to hear.
I think the general wisdom is to use a higher OMP_NUM_THREADS and num_workers when you have more free CPU cores available.
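For reference, a minimal sketch of where these two knobs live. The dataset and values below are placeholders, not the repository's actual configuration, and OMP_NUM_THREADS is usually set on the command line (e.g. `OMP_NUM_THREADS=4 python -m torch.distributed.launch ... train.py`) rather than inside the script:

```python
import os

# Limits the CPU threads each process spawns for math libraries.
# Setting it before importing torch has the same effect as the env prefix.
os.environ.setdefault('OMP_NUM_THREADS', '4')

import torch
from torch.utils.data import DataLoader, TensorDataset

# num_workers controls how many worker processes prepare batches in parallel.
# The dataset below is a placeholder for illustration only.
dataset = TensorDataset(torch.randn(64, 3, 64, 64))
loader = DataLoader(dataset, batch_size=16, shuffle=True,
                    num_workers=16,      # raise while free CPU cores remain
                    pin_memory=True)     # speeds up host-to-GPU copies
```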

@PinxueGuo (Author)

OK. More num_workers definitely helps, and OMP_NUM_THREADS=4, as in your original setting, is the fastest in my case (1, 8, and 16 are even slower).
Thank you for your great work and the quick reply!

@hkchengrex (Owner)

BTW you can try adding the --benchmark flag.
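In many PyTorch training scripts, a flag like this toggles cuDNN autotuning. The sketch below is a hypothetical re-creation of that pattern, not necessarily this repository's exact implementation; check the repo's argument parsing for the real definition:

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# Hypothetical flag definition for illustration.
parser.add_argument('--benchmark', action='store_true')
args = parser.parse_args()

if args.benchmark:
    # Lets cuDNN auto-tune convolution algorithms for the observed input sizes.
    # Helps most when input shapes are fixed; with varying shapes it can hurt.
    torch.backends.cudnn.benchmark = True
```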

@PinxueGuo (Author)

Thank you. I tried it, but it was not really effective.
A bigger num_workers only brings about a 10% speed improvement.
Could you tell me the time reported in your log, i.e. "retrain_s0 - It ******* [TRAIN] [time ]: ?"? In my case, time ≈ 1.0+.

PinxueGuo reopened this Jun 26, 2021
@PinxueGuo (Author) commented Jun 26, 2021

Sorry, it should be about a 25% speed improvement (with num_workers=16, 1×3090, --nproc_per_node=1, bs=16).
Log: retrain_s0 - It 51300 [TRAIN] [time ]: 1.0771173

@hkchengrex (Owner)

With 1x 3090 I am getting around 0.7 for [time].
2x 2080Ti should be faster than 1x 3090.

@hkchengrex (Owner)

Hmm, it's actually 0.7 around the start of training and stabilizes around 0.5.

@PinxueGuo (Author) commented Jun 26, 2021

I compared 2×2080Ti with 1×3090, and the result is that 1×3090 is a little faster than 2×2080Ti.
If [time] is around 0.5, s0 needs about 45 hours, right?
In my case, s3 takes exactly 30 hours, so I want to confirm: does "Regular training without BL30K takes around 30 hours" (in the paper) refer to s3 only, or to s0+s3?
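The estimate here is just per-iteration time multiplied by the iteration count. A small sketch of that arithmetic, with 300,000 used purely as a placeholder iteration count for s0; the real value should be read from the training configuration:

```python
def estimate_hours(seconds_per_iter, num_iterations):
    """Rough wall-clock estimate: per-iteration [time] from the log x iteration count."""
    return seconds_per_iter * num_iterations / 3600

# 300_000 is a hypothetical iteration count, not a value from the repo.
print(estimate_hours(0.5, 300_000))   # ~41.7 hours
print(estimate_hours(1.08, 300_000))  # ~90 hours, consistent with [time] ~ 1.0+
```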

@hkchengrex (Owner)

It refers to s0+s3. I guess hardware infrastructure affects the training speed a lot.

@PinxueGuo (Author)

OK. Thank you!

@zhouweii234 commented Jul 17, 2021

May I ask what the training time for stage 0 was after you used a bigger num_workers (num_workers=16, 1×3090, --nproc_per_node=1, bs=16)? @BWYWTB
