
Training yolov5 model appears nan #20

Closed
xialuxi opened this issue Nov 29, 2022 · 6 comments
xialuxi commented Nov 29, 2022

No description provided.

xialuxi commented Nov 29, 2022

Adan:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
4/399  3.19G    0.05632  0.0556   0.01426  0.1152  129     416
5/399  3.19G    nan      nan      nan      nan     107     416

SGD:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
4/399  3.02G    0.05808  0.0563   0.01553  0.1285  129     416
5/399  3.02G    0.05504  0.0543   0.0144   0.1113  107     416
6/399  3.02G    0.05382  0.05239  0.0139   0.1092  380     416

XingyuXie (Collaborator) commented:

Hi @xialuxi,
I suggest training with a smaller LR for Adan.
BTW, it seems that most YOLOvX models are trained with SGD. Previous work has run into problems when using adaptive optimizers such as Adam to train YOLO models; several issues in the yolov7 repo mention that Adam may not give good results, see WongKinYiu/yolov7#702 (comment) and WongKinYiu/yolov7#730 (comment).

So I suggest tuning the LR and weight decay starting from YOLOvX's official Adam settings. If the final result is still unsatisfactory, it may not be caused by Adan or Adam themselves, but by the difference between adaptive-type optimizers and SGD-type optimizers.

If you still need further help with using Adan or with parameter tuning, please don't hesitate to leave a message here, or add my WeChat: xyxie_joy.


xialuxi commented Nov 29, 2022

Adam:
Epoch  gpu_mem  box      obj      cls      kps     labels  img_size
3/399  3.22G    0.06394  0.05859  0.01641  0.1517  173     416
4/399  3.22G    0.06176  0.05778  0.0158   0.1367  129     416
5/399  3.22G    0.05997  0.05672  0.01512  0.1282  107     416
6/399  3.22G    0.05835  0.05481  0.01452  0.1185  112     416

XingyuXie (Collaborator) commented:

Dear @xialuxi,
What are your hyper-parameters for Adan and Adam? Could you please paste them here, so I can suggest a more reasonable LR, WD, or gradient-clip value for Adan?

Best


xialuxi commented Nov 30, 2022

SGD:
lr = 0.01, momentum=0.937, weight_decay=0.0005, nesterov=True
Adam:
lr = 0.01, betas=(0.937, 0.999), weight_decay=0.0005
Adan:
lr = 0.01, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.0005
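For reference, these settings map directly onto PyTorch optimizer constructors (a minimal sketch; the linear model is a hypothetical stand-in for YOLOv5, and the `Adan` import assumes the class provided by this repository, so it is guarded):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the YOLOv5 model; any nn.Module works here.
model = nn.Linear(10, 2)

# SGD settings reported above.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937,
                      weight_decay=0.0005, nesterov=True)

# Adam settings reported above.
adam = torch.optim.Adam(model.parameters(), lr=0.01, betas=(0.937, 0.999),
                        weight_decay=0.0005)

# Adan settings reported above (assumes this repo's Adan class is
# importable; guarded so the sketch also runs without it installed).
try:
    from adan import Adan
    adan = Adan(model.parameters(), lr=0.01, betas=(0.98, 0.92, 0.99),
                eps=1.0e-08, weight_decay=0.0005)
except ImportError:
    adan = None
```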

XingyuXie (Collaborator) commented:

@xialuxi you may try lr = 1e-3, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.02 for Adan.
As usual, the LR for an adaptive optimizer should be 10-100x smaller than the LR used for SGD, hence the suggestion of lr = 1e-3.
We set wd = 0.02 because Adan uses decoupled weight decay.
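The coupled-vs-decoupled distinction can be seen in PyTorch itself: classic Adam folds weight decay into the gradient, where it gets rescaled by the adaptive step size, while AdamW applies decay decoupled, shrinking the weights directly. Adan's weight decay works the same decoupled way, which is why a much larger wd such as 0.02 is appropriate. A minimal sketch (the linear model is hypothetical):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # hypothetical stand-in for the real network

# Coupled (L2) decay: wd is added to the gradient and then rescaled by
# Adam's adaptive step size, so small values like 0.0005 are typical.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0005)

# Decoupled decay (AdamW): wd shrinks the weights directly, independent
# of the adaptive step, so larger values like 0.02 are typical.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.02)
```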

xialuxi closed this as completed Nov 30, 2022