Training time too long #32

Closed
cryboykevin opened this issue Jul 20, 2021 · 4 comments

Comments

@cryboykevin

cryboykevin commented Jul 20, 2021

I am training yolox-s with batch size 128 on 8 x V100 GPUs, and the run takes about 2 days 6 hours for 300 epochs. Is this training time normal?

@FateScript
Member

Please provide the data_time and train_time values from your log. Thanks.
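
For readers unsure how to pull these numbers out of a log, here is a minimal sketch. It assumes the `key: value` line layout seen in the log excerpt later in this thread, where the per-iteration field is named `iter_time`. The helper name `extract_times` is made up for illustration, and the two sample lines are abbreviated copies of lines from that excerpt.

```python
import re
from statistics import mean

# Matches fields such as "iter_time: 0.692s" and "data_time: 0.000s".
TIME_FIELD = re.compile(r"(iter_time|data_time): ([0-9.]+)s")


def extract_times(lines):
    """Collect timing fields, in seconds, from trainer log lines (hypothetical helper)."""
    times = {"iter_time": [], "data_time": []}
    for line in lines:
        for name, value in TIME_FIELD.findall(line):
            times[name].append(float(value))
    return times


# Abbreviated sample lines; a real run would read them from the log file.
sample = [
    "epoch: 10/300, iter: 370/925, iter_time: 0.692s, data_time: 0.000s, size: 768",
    "epoch: 10/300, iter: 380/925, iter_time: 0.940s, data_time: 0.001s, size: 768",
]

times = extract_times(sample)
print(f"mean iter_time: {mean(times['iter_time']):.3f}s")
print(f"mean data_time: {mean(times['data_time']):.4f}s")
```

Comparing the two means gives a quick first impression of how much of each iteration is reported as data-loading wait.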

@cryboykevin
Author

cryboykevin commented Jul 21, 2021

Please provide the data_time and train_time values from your log. Thanks.

Here are some of the training logs:

2021-07-21 04:20:05 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 370/925, mem: 34832Mb, iter_time: 0.692s, data_time: 0.000s, total_loss: 7.1, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 2.8, cls_loss: 1.7, lr: 1.999e-02, size: 768, ETA: 2 days, 9:28:22
2021-07-21 04:20:14 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 380/925, mem: 34832Mb, iter_time: 0.940s, data_time: 0.001s, total_loss: 7.4, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.3, cls_loss: 1.7, lr: 1.999e-02, size: 768, ETA: 2 days, 9:29:06
2021-07-21 04:20:22 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 390/925, mem: 34832Mb, iter_time: 0.808s, data_time: 0.000s, total_loss: 7.6, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.4, cls_loss: 1.7, lr: 1.999e-02, size: 800, ETA: 2 days, 9:29:10
2021-07-21 04:20:31 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 400/925, mem: 34832Mb, iter_time: 0.837s, data_time: 0.000s, total_loss: 7.5, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.3, cls_loss: 1.7, lr: 1.999e-02, size: 512, ETA: 2 days, 9:29:23
2021-07-21 04:20:39 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 410/925, mem: 34832Mb, iter_time: 0.859s, data_time: 0.000s, total_loss: 7.8, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.5, cls_loss: 1.8, lr: 1.999e-02, size: 480, ETA: 2 days, 9:29:43
2021-07-21 04:20:46 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 420/925, mem: 34832Mb, iter_time: 0.625s, data_time: 0.000s, total_loss: 7.5, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.2, cls_loss: 1.7, lr: 1.999e-02, size: 704, ETA: 2 days, 9:28:51
2021-07-21 04:20:51 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 430/925, mem: 34832Mb, iter_time: 0.544s, data_time: 0.000s, total_loss: 7.5, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.1, cls_loss: 1.8, lr: 1.999e-02, size: 512, ETA: 2 days, 9:27:34
2021-07-21 04:21:00 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 440/925, mem: 34832Mb, iter_time: 0.923s, data_time: 0.000s, total_loss: 7.4, iou_loss: 2.4, l1_loss: 0.0, conf_loss: 3.3, cls_loss: 1.7, lr: 1.999e-02, size: 608, ETA: 2 days, 9:28:13
2021-07-21 04:21:07 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 450/925, mem: 34832Mb, iter_time: 0.624s, data_time: 0.000s, total_loss: 7.1, iou_loss: 2.7, l1_loss: 0.0, conf_loss: 2.9, cls_loss: 1.5, lr: 1.999e-02, size: 512, ETA: 2 days, 9:27:21
2021-07-21 04:21:15 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 460/925, mem: 34832Mb, iter_time: 0.871s, data_time: 0.000s, total_loss: 7.5, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.2, cls_loss: 1.7, lr: 1.999e-02, size: 800, ETA: 2 days, 9:27:44
2021-07-21 04:21:21 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 470/925, mem: 34832Mb, iter_time: 0.591s, data_time: 0.000s, total_loss: 7.3, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.1, cls_loss: 1.7, lr: 1.999e-02, size: 608, ETA: 2 days, 9:26:42
2021-07-21 04:21:30 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 480/925, mem: 34832Mb, iter_time: 0.892s, data_time: 0.000s, total_loss: 7.7, iou_loss: 2.4, l1_loss: 0.0, conf_loss: 3.5, cls_loss: 1.8, lr: 1.999e-02, size: 704, ETA: 2 days, 9:27:11
2021-07-21 04:21:37 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 490/925, mem: 34832Mb, iter_time: 0.725s, data_time: 0.000s, total_loss: 7.3, iou_loss: 2.4, l1_loss: 0.0, conf_loss: 3.2, cls_loss: 1.7, lr: 1.999e-02, size: 672, ETA: 2 days, 9:26:50
2021-07-21 04:21:47 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 500/925, mem: 34832Mb, iter_time: 0.951s, data_time: 0.000s, total_loss: 8.2, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 3.7, cls_loss: 1.9, lr: 1.999e-02, size: 576, ETA: 2 days, 9:27:37
2021-07-21 04:21:55 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 510/925, mem: 34832Mb, iter_time: 0.763s, data_time: 0.000s, total_loss: 7.2, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.0, cls_loss: 1.6, lr: 1.999e-02, size: 608, ETA: 2 days, 9:27:28
2021-07-21 04:22:03 | INFO | yolox.core.trainer:248 - epoch: 10/300, iter: 520/925, mem: 34832Mb, iter_time: 0.835s, data_time: 0.000s, total_loss: 7.6, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.3, cls_loss: 1.8, lr: 1.999e-02, size: 800, ETA: 2 days, 9:27:40

@GOATmessi7
Member

Hi, we reproduced your training setting, and your training time seems normal. In fact, if you change yolox-s to yolox-l, the total training time is almost the same. We suspect that most of the time is spent in our data augmentation operations, and we plan to accelerate them.
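
As a rough cross-check of the reported duration, the total can be estimated from the log excerpt above: 925 iterations per epoch, 300 epochs, and the 16 logged `iter_time` values. This is only a back-of-the-envelope sketch. It assumes the short excerpt is representative of the whole run and ignores evaluation, checkpointing, and any other overhead outside the training loop.

```python
from datetime import timedelta

# Values read from the log excerpt in this thread.
iters_per_epoch = 925
epochs = 300
iter_times = [
    0.692, 0.940, 0.808, 0.837, 0.859, 0.625, 0.544, 0.923,
    0.624, 0.871, 0.591, 0.892, 0.725, 0.951, 0.763, 0.835,
]  # seconds

mean_iter = sum(iter_times) / len(iter_times)
total_seconds = iters_per_epoch * epochs * mean_iter

print(f"mean iter_time: {mean_iter:.3f}s")
print(f"estimated total: {timedelta(seconds=round(total_seconds))}")
```

The estimate comes out at roughly two and a half days, which is in the same ballpark as both the logged ETA and the duration reported in the original question.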

@RaymondByc

@ruinmessi Hello, do you know what causes the difference in training time between iterations? In the log shown above, iter_time is as high as 0.940 s and as low as 0.544 s.
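
The thread does not answer this question. One thing a reader can check from the log itself is whether `iter_time` tracks the logged input `size`, which varies between 480 and 800 in the excerpt. The sketch below groups the 16 logged `(size, iter_time)` pairs by size. It is an exploratory check under that assumption, not the maintainers' diagnosis.

```python
from collections import defaultdict
from statistics import mean

# (size, iter_time in seconds) pairs copied from the log excerpt in this thread.
samples = [
    (768, 0.692), (768, 0.940), (800, 0.808), (512, 0.837),
    (480, 0.859), (704, 0.625), (512, 0.544), (608, 0.923),
    (512, 0.624), (800, 0.871), (608, 0.591), (704, 0.892),
    (672, 0.725), (576, 0.951), (608, 0.763), (800, 0.835),
]

# Group iteration times by input size.
grouped = defaultdict(list)
for size, iter_time in samples:
    grouped[size].append(iter_time)

# Mean iteration time per size.
by_size = {size: mean(values) for size, values in grouped.items()}

for size in sorted(by_size):
    values = grouped[size]
    print(f"size {size}: n={len(values)}, mean={by_size[size]:.3f}s, "
          f"min={min(values):.3f}s, max={max(values):.3f}s")
```

In this excerpt, size 512 alone spans 0.544 s to 0.837 s, so input size by itself does not obviously explain the spread. A longer stretch of the log would be needed to say more.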
