
The kl_div loss of self distillation #43

Open
luyvlei opened this issue Mar 8, 2022 · 1 comment

Comments

luyvlei commented Mar 8, 2022

The following code calculates the KL-divergence loss between the stage-1 teacher and the student model, but the student output is not passed through log_softmax. Is this a mistake?

    student = F.softmax(target_out['out'], dim=1)
    with torch.no_grad():
        teacher_out = self.teacher_DP(target_imageS)
        teacher_out['out'] = F.interpolate(teacher_out['out'], size=threshold_arg.shape[2:], mode='bilinear', align_corners=True)
        teacher = F.softmax(teacher_out['out'], dim=1)

    loss_kd = F.kl_div(student, teacher, reduction='none')
    mask = (teacher != 250).float()
    loss_kd = (loss_kd * mask).sum() / mask.sum()
    loss = loss + self.opt.distillation * loss_kd   
panzhang0104 commented
Yeah, should be log_softmax
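
For reference, F.kl_div expects its first argument as log-probabilities and its second as probabilities, so the student output should go through log_softmax. A minimal standalone sketch of the corrected call (hypothetical random logits stand in for target_out['out'] and the teacher output):

    import torch
    import torch.nn.functional as F

    # Hypothetical (N, C, H, W) logits standing in for target_out['out'] and the teacher output
    student_logits = torch.randn(2, 19, 64, 64)
    teacher_logits = torch.randn(2, 19, 64, 64)

    student = F.log_softmax(student_logits, dim=1)   # log-probabilities: first argument of kl_div
    with torch.no_grad():
        teacher = F.softmax(teacher_logits, dim=1)   # probabilities: second argument

    # With reduction='none', kl_div returns per-element terms of the same shape,
    # which can then be masked and averaged as in the original training loop.
    loss_kd = F.kl_div(student, teacher, reduction='none')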
