
About the KD Loss on the RetinaNet One-Stage Object Detectors #4

Closed
JCZ404 opened this issue Nov 25, 2022 · 2 comments

JCZ404 commented Nov 25, 2022

Hi~, thanks for such great work! I saw you released baseline performance for vanilla KD on the one-stage detector RetinaNet, and I wonder how this method is applied. Since RetinaNet's classification predictions are activated by sigmoid and formulated as multiple binary classification problems solved with focal loss, it seems we cannot use vanilla KD on these classification outputs. An output processed by sigmoid, for example [0.4, 0.7, 0.3, 0.2], obviously does not sum to 1. So how is vanilla KD with the KLDiv loss applied in such a situation? Thanks.

hunto (Owner) commented Nov 26, 2022

Hi @Zhangjiacheng144 ,

Thanks for your attention to our work. Distillation on sigmoid probabilistic distributions can also be conducted with a KL divergence loss (see Kullback–Leibler divergence; it does not require the vector to sum to 1). In our RetinaNet experiments we use Equation (1) as the form of the sigmoid KD loss, simply replacing the softmax function with sigmoid.
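To make the idea concrete, here is a minimal sketch (not the repository's actual implementation) of a sigmoid KD loss: each class is treated as an independent Bernoulli distribution (p, 1−p), and the binary KL divergences are summed over classes, so the output vector never needs to sum to 1.

```python
import math

def sigmoid_kd_loss(p_teacher, q_student, eps=1e-7):
    """Sum of per-class binary KL divergences KL(p || q).

    Each sigmoid output p is read as a two-point distribution (p, 1 - p),
    so KL is well defined per class even though the full vector does not
    sum to 1. eps clamps probabilities away from 0/1 for numerical safety.
    """
    total = 0.0
    for p, q in zip(p_teacher, q_student):
        p = min(max(p, eps), 1.0 - eps)
        q = min(max(q, eps), 1.0 - eps)
        total += p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

# The example vector from the question; neither input sums to 1.
teacher = [0.4, 0.7, 0.3, 0.2]
student = [0.5, 0.6, 0.2, 0.3]
loss = sigmoid_kd_loss(teacher, student)
```

The loss is zero exactly when the student matches the teacher class-wise, and positive otherwise, mirroring the behavior of softmax KD.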

BTW, for detectors trained with sigmoid focal loss, it is more practical to use the above sigmoid KL divergence loss together with a focal weight.
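The thread does not spell out the focal weighting, so the factor below, |p − q|**gamma, is only an assumed illustration of the idea: down-weight classes where student and teacher already agree, in the spirit of focal loss.

```python
import math

def focal_sigmoid_kd_loss(p_teacher, q_student, gamma=2.0, eps=1e-7):
    """Binary KL divergence per class, scaled by a focal-style weight.

    NOTE: the weight |p - q| ** gamma is a hypothetical choice, not the
    paper's definition; it shrinks the contribution of classes where the
    student already matches the teacher.
    """
    total = 0.0
    for p, q in zip(p_teacher, q_student):
        p = min(max(p, eps), 1.0 - eps)
        q = min(max(q, eps), 1.0 - eps)
        kl = p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
        total += abs(p - q) ** gamma * kl
    return total

teacher = [0.4, 0.7, 0.3, 0.2]
student = [0.5, 0.6, 0.2, 0.3]
loss = focal_sigmoid_kd_loss(teacher, student)
```

Because |p − q| < 1 here, raising gamma suppresses easy (well-matched) classes more strongly, analogous to how focal loss suppresses easy examples.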

JCZ404 (Author) commented Jan 4, 2023

Ok, I got it, thanks for your kind reply!

JCZ404 closed this as completed Jan 4, 2023