
About GOT-10k train and test #14

Open · wjc0602 opened this issue Nov 9, 2021 · 6 comments
wjc0602 commented Nov 9, 2021

Thanks for your excellent work. I ran into a problem while trying to reproduce the results in the paper. When I trained with GOT-10k alone, the IoU stayed at about 0.38 and did not improve. Could something be wrong with my configuration?

fzh0917 (Owner) commented Nov 10, 2021

The hyper-parameters batch_size and start_lr are important for training. What are your batch_size and start_lr? 32 and 1e-6 are recommended.
If you cannot set batch_size to 32 due to hardware limitations or something else, you can try increasing start_lr to 1e-2.

Good luck!
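(For reference, a minimal sketch of overriding these two values programmatically with a yacs-style config, which is what the video_analyst framework underlying STMTrack uses. The key paths and the YAML path below are illustrative assumptions, not the repository's actual schema; check the experiment file for the real names.)

```python
# A minimal sketch, assuming a yacs-style config as in video_analyst.
# The key paths below are hypothetical placeholders -- look up the real
# ones in the stmtrack-googlenet-trn YAML before relying on this.
from yacs.config import CfgNode as CN

cfg = CN(new_allowed=True)
cfg.merge_from_file("experiments/stmtrack/train/got10k/stmtrack-googlenet-trn.yaml")

cfg.train.batch_size = 32        # recommended batch size
cfg.train.optim.start_lr = 1e-6  # recommended start_lr; try 1e-2 if batch_size must be smaller
cfg.freeze()
```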

wjc0602 (Author) commented Nov 10, 2021

> The hyper-parameters batch_size and start_lr are important for training. What are your batch_size and start_lr? 32 and 1e-6 are recommended. If you cannot set batch_size to 32 due to hardware limitations or something else, you can try increasing start_lr to 1e-2.
>
> Good luck!

Thanks for your reply. I set batch_size to 32 and start_lr to 1e-2, and I didn't change any other settings in 'stmtrack-googlenet-trn' in the got10k folder. I trained on 3 Tesla V100s. Could the problem be related to my hardware setup?

fzh0917 (Owner) commented Nov 10, 2021

Can the model converge if you use two GPUs?

hekaijie123 commented
I ran into a similar problem. The first time, I trained with one RTX 3090 and only set "amp" to "True", "num_processes" to "1", and "num_workers" to "16", keeping the defaults for everything else. Training on GOT-10k gave the results below:
"ao": 0.8214040485044509,
"sr50": 0.9221410022121765,
"sr75": 0.8248533230739636,
"speed_fps": 30.02842858137559

The second time, I used two RTX 3090s and set "amp" to "False", "num_processes" to "2", and "num_workers" to "16", again keeping the defaults for everything else. I only wanted to see the influence of "amp", but the model did not converge this time:
"ao": 0.18841102443887056,
"sr50": 0.08507261710108685,
"sr75": 0.01986149850918534,
"speed_fps": 31.019533580571974
During training, only "cls" and "ctr" decrease, while "reg" and "iou" barely change.
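(Side note: the two runs above change both "amp" and the number of processes at once, so on their own they do not isolate the effect of "amp". Summarized as plain Python for comparison:)

```python
# The two runs above, side by side. Note that both "amp" and
# "num_processes" changed, so the comparison is confounded.
run_one_gpu  = {"amp": True,  "num_processes": 1, "num_workers": 16}  # ao ~ 0.82 (converged)
run_two_gpus = {"amp": False, "num_processes": 2, "num_workers": 16}  # ao ~ 0.19 (diverged)
```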

luhannan commented
I met the same problem: the model trained with multiple GPUs didn't converge, with or without synchronized BN, though it seems to converge when trained with one GPU.
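(For anyone trying the synchronized-BN variant mentioned above, this is the stock PyTorch conversion, not STMTrack-specific code; `build_model` is a hypothetical stand-in for however the repository constructs its network, and a torch.distributed process group must already be initialized:)

```python
import torch

# Stock PyTorch SyncBatchNorm usage: replace every BatchNorm layer with a
# synchronized counterpart so statistics are computed across all GPUs.
# Assumes torch.distributed.init_process_group(...) has already run.
model = build_model()  # hypothetical stand-in for the repo's model builder
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(
    model.cuda(), device_ids=[torch.cuda.current_device()]
)
```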

Kevoen commented Apr 13, 2022

> I ran into a similar problem. The first time, I trained with one RTX 3090 … and the model converged ("ao" ≈ 0.82). The second time, I used two RTX 3090s … but the model did not converge ("ao" ≈ 0.19); only "cls" and "ctr" decrease, while "reg" and "iou" barely change.

> I met the same problem: the model trained with multiple GPUs didn't converge, with or without synchronized BN, though it seems to converge when trained with one GPU.

The main reason is that training on multiple GPUs requires rewriting the training code, and the author only provides single-GPU training code. Since the author's code framework is video_analyst, I referred to the distributed training code in video_analyst (main/dist-train.py) for training.
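(For readers adapting this themselves, the sketch below shows the general shape of a spawn-based PyTorch DDP entry point, the pattern a dist-train script like video_analyst's typically follows. It is not the actual main/dist-train.py code, and `build_model`, `build_dataloader`, and `train_one_epoch` are hypothetical placeholders:)

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

NUM_EPOCHS = 20  # placeholder

def worker(rank: int, world_size: int):
    # Each spawned process joins the same process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = build_model().cuda()  # hypothetical helper
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    loader = build_dataloader(rank, world_size)  # must use a DistributedSampler
    for epoch in range(NUM_EPOCHS):
        loader.sampler.set_epoch(epoch)  # reshuffles data across processes
        train_one_epoch(model, loader)   # hypothetical helper
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```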
