model can't converge #9

luhavefun · 2022-08-17T07:13:14Z

Thanks for your great work. I ran into some trouble training on HO3D-v2. I trained the model following the given steps, but found that it did not converge properly. Here are the logs from the last epoch:
08-09 04:21:59 Epoch 69/70 itr 4123/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 5.1809 loss_mano_joints: 5.4888 loss_mano_pose: 0.3954 loss_mano_shape: 0.2697 loss_joints_img: 3.1225
08-09 04:22:00 Epoch 69/70 itr 4124/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 11.5330 loss_mano_joints: 12.5328 loss_mano_pose: 0.6882 loss_mano_shape: 0.3608 loss_joints_img: 4.0549
08-09 04:22:00 Epoch 69/70 itr 4125/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 4.1236 loss_mano_joints: 4.4260 loss_mano_pose: 0.3778 loss_mano_shape: 0.5172 loss_joints_img: 3.0202
08-09 04:22:00 Epoch 69/70 itr 4126/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 23.6163 loss_mano_joints: 25.9149 loss_mano_pose: 0.5916 loss_mano_shape: 0.2481 loss_joints_img: 3.1539
08-09 04:22:00 Epoch 69/70 itr 4127/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 10.1710 loss_mano_joints: 10.8513 loss_mano_pose: 0.5306 loss_mano_shape: 0.2882 loss_joints_img: 4.7335
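To make the iteration-to-iteration swings in logs like these easier to see, here is a minimal sketch of a hypothetical helper (not part of the repository) that extracts the `loss_*: value` pairs from each line and reports the min, mean, and max of every loss. The regex is an assumption based only on the five lines shown above, with the terminal color codes removed.

```python
import re
from statistics import mean

# Assumed log format: "... loss_<name>: <value> loss_<name>: <value> ..."
LOSS_RE = re.compile(r"(loss_\w+): ([0-9.]+)")

def summarize_losses(lines):
    """Return {loss_name: (min, mean, max)} computed over the given log lines."""
    values = {}
    for line in lines:
        for name, val in LOSS_RE.findall(line):
            values.setdefault(name, []).append(float(val))
    return {name: (min(v), mean(v), max(v)) for name, v in values.items()}

# The last five iterations from the log above.
LOG = [
    "08-09 04:21:59 Epoch 69/70 itr 4123/4128: lr: 1.17649e-05 loss_mano_verts: 5.1809 loss_mano_joints: 5.4888 loss_mano_pose: 0.3954 loss_mano_shape: 0.2697 loss_joints_img: 3.1225",
    "08-09 04:22:00 Epoch 69/70 itr 4124/4128: lr: 1.17649e-05 loss_mano_verts: 11.5330 loss_mano_joints: 12.5328 loss_mano_pose: 0.6882 loss_mano_shape: 0.3608 loss_joints_img: 4.0549",
    "08-09 04:22:00 Epoch 69/70 itr 4125/4128: lr: 1.17649e-05 loss_mano_verts: 4.1236 loss_mano_joints: 4.4260 loss_mano_pose: 0.3778 loss_mano_shape: 0.5172 loss_joints_img: 3.0202",
    "08-09 04:22:00 Epoch 69/70 itr 4126/4128: lr: 1.17649e-05 loss_mano_verts: 23.6163 loss_mano_joints: 25.9149 loss_mano_pose: 0.5916 loss_mano_shape: 0.2481 loss_joints_img: 3.1539",
    "08-09 04:22:00 Epoch 69/70 itr 4127/4128: lr: 1.17649e-05 loss_mano_verts: 10.1710 loss_mano_joints: 10.8513 loss_mano_pose: 0.5306 loss_mano_shape: 0.2882 loss_joints_img: 4.7335",
]

for name, (lo, avg, hi) in summarize_losses(LOG).items():
    print(f"{name}: min={lo:.4f} mean={avg:.4f} max={hi:.4f}")
```

On these five iterations, `loss_mano_joints` ranges from 4.4260 to 25.9149 even though the learning rate has already decayed to about 1.2e-5, which matches the non-convergence described here.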

namepllet · 2022-08-17T10:11:04Z

We trained our model with 4 GPUs.

So the effective batch size is 64 (16 per GPU * 4 GPUs).

I think you trained the model on a single GPU with the learning rate set to 1e-4.

If you'd like to train the model on a single GPU, please set the learning rate to 1e-4 * 1/4 (= 2.5e-5).
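The adjustment above follows the common linear-scaling heuristic: keep the learning rate proportional to the effective batch size. Here is a minimal sketch, assuming (as the reply implies) that 1e-4 is the rate intended for the 4-GPU setup with an effective batch size of 64, and that the per-GPU batch size stays at 16 when fewer GPUs are used. `scaled_lr` is a hypothetical helper, not a function from the repository.

```python
# Hypothetical helper illustrating linear learning-rate scaling.
# Assumed reference setting: lr 1e-4 at effective batch 64 (16 per GPU x 4 GPUs).
REF_LR = 1e-4
REF_BATCH = 16 * 4

def scaled_lr(per_gpu_batch: int, num_gpus: int,
              ref_lr: float = REF_LR, ref_batch: int = REF_BATCH) -> float:
    """Scale the learning rate in proportion to the effective batch size."""
    effective_batch = per_gpu_batch * num_gpus
    return ref_lr * effective_batch / ref_batch

print(scaled_lr(16, 1))  # single GPU: effective batch 16, a quarter of the reference
print(scaled_lr(16, 4))  # reference setup: lr unchanged
```

With one GPU the effective batch size drops from 64 to 16, so the rate drops by the same factor of 4, which is the 2.5e-5 suggested above.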

luhavefun · 2022-08-17T15:00:16Z

Thank you for your reply! After changing the learning rate, training has been stable so far (20 epochs).

namepllet closed this as completed Sep 17, 2022

namepllet mentioned this issue on Sep 28, 2022 in "Reproduce DexYCB results #16" (Open)
