
Training yolov5 model appears nan #20

Closed
xialuxi opened this issue Nov 29, 2022 · 6 comments
xialuxi commented Nov 29, 2022

No description provided.

xialuxi commented Nov 29, 2022

Adan:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
4/399  3.19G    0.05632  0.0556   0.01426  0.1152  129     416
5/399  3.19G    nan      nan      nan      nan     107     416

SGD:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
4/399  3.02G    0.05808  0.0563   0.01553  0.1285  129     416
5/399  3.02G    0.05504  0.0543   0.0144   0.1113  107     416
6/399  3.02G    0.05382  0.05239  0.0139   0.1092  380     416

XingyuXie (Collaborator) commented:

Hi @xialuxi,
I suggest training with a smaller LR for Adan.
BTW, it seems that most YOLOvX models are trained with SGD. Previous work has run into problems when using adaptive optimizers such as Adam to train YOLO models; several issues in the yolov7 repo mention that Adam may not give good results, see WongKinYiu/yolov7#702 (comment) and WongKinYiu/yolov7#730 (comment).

So I suggest tuning the LR and weight decay starting from YOLOvX's official Adam settings. If the final result is still unsatisfactory, it may not be caused by Adan or Adam themselves, but by the difference between adaptive-type optimizers and SGD-type optimizers.

If you still need further help with using Adan or with parameter tuning, please don't hesitate to leave a message here, or add my WeChat: xyxie_joy.


xialuxi commented Nov 29, 2022

Adam:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
3/399  3.22G    0.06394  0.05859  0.01641  0.1517  173     416
4/399  3.22G    0.06176  0.05778  0.0158   0.1367  129     416
5/399  3.22G    0.05997  0.05672  0.01512  0.1282  107     416
6/399  3.22G    0.05835  0.05481  0.01452  0.1185  112     416

XingyuXie (Collaborator) commented:

Dear @xialuxi,
What are your hyper-parameters for Adan and Adam? Could you please paste them here, so I can suggest a more reasonable LR, WD, or gradient-clip value for Adan?

Best


xialuxi commented Nov 30, 2022

SGD:
lr = 0.01, momentum=0.937, weight_decay=0.0005, nesterov=True
Adam:
lr = 0.01, betas=(0.937, 0.999), weight_decay=0.0005
Adan:
lr = 0.01, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.0005
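For reference, these settings map directly onto PyTorch optimizer constructors (a minimal sketch; the linear model is a hypothetical stand-in for YOLOv5, and the `Adan` import assumes the class provided by this repository, so it is guarded):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the YOLOv5 model; any nn.Module works here.
model = nn.Linear(10, 2)

# SGD settings reported above.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937,
                      weight_decay=0.0005, nesterov=True)

# Adam settings reported above.
adam = torch.optim.Adam(model.parameters(), lr=0.01, betas=(0.937, 0.999),
                        weight_decay=0.0005)

# Adan settings reported above (assumes this repo's Adan class is
# importable; guarded so the sketch also runs without it installed).
try:
    from adan import Adan
    adan = Adan(model.parameters(), lr=0.01, betas=(0.98, 0.92, 0.99),
                eps=1.0e-08, weight_decay=0.0005)
except ImportError:
    adan = None
```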

XingyuXie (Collaborator) commented:

@xialuxi you may try lr = 1e-3, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.02 for Adan.
As usual, the LR for an adaptive optimizer should be 10-100x smaller than the LR used for SGD, hence the suggestion of lr = 1e-3.
We set wd = 0.02 because Adan uses decoupled weight decay.
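The coupled-vs-decoupled distinction can be seen in PyTorch itself: classic Adam folds weight decay into the gradient, where it gets rescaled by the adaptive step size, while AdamW applies decay decoupled, shrinking the weights directly. Adan's weight decay works the same decoupled way, which is why a much larger wd such as 0.02 is appropriate. A minimal sketch (the linear model is hypothetical):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # hypothetical stand-in for the real network

# Coupled (L2) decay: wd is added to the gradient and then rescaled by
# Adam's adaptive step size, so small values like 0.0005 are typical.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0005)

# Decoupled decay (AdamW): wd shrinks the weights directly, independent
# of the adaptive step, so larger values like 0.02 are typical.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.02)
```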

xialuxi closed this as completed Nov 30, 2022