
Add Knowledge-Distillation #42

Merged
fmassa merged 10 commits into facebookresearch:main from the distillation branch on Jan 13, 2021

Conversation

@fmassa (Contributor) commented on Jan 13, 2021

This PR adds the implementation of both soft and hard distillation, as described in the paper.

We introduce a DistillationLoss, which wraps a criterion and a teacher_model and applies both the original criterion and the distillation loss derived from the teacher's predictions.
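
A minimal sketch of what such a wrapper could look like. The `base_criterion`, `distillation_type`, `alpha`, and `tau` names, the soft/hard branches, and the loss weighting are assumptions for illustration, not necessarily the exact API in this PR:

```python
import torch
import torch.nn.functional as F


class DistillationLoss(torch.nn.Module):
    # Illustrative sketch: combines a base criterion with a distillation loss
    # computed from a teacher model's predictions. Argument names are assumed.
    def __init__(self, base_criterion, teacher_model,
                 distillation_type='soft', alpha=0.5, tau=1.0):
        super().__init__()
        self.base_criterion = base_criterion
        self.teacher_model = teacher_model
        self.distillation_type = distillation_type  # 'soft' or 'hard'
        self.alpha = alpha  # weight of the distillation term
        self.tau = tau      # temperature used for soft distillation

    def forward(self, inputs, outputs, labels):
        # During training the distilled model returns a pair of predictions:
        # one from the class token and one from the distillation token.
        outputs_cls, outputs_dist = outputs
        base_loss = self.base_criterion(outputs_cls, labels)

        # The teacher is frozen; no gradients flow through it.
        with torch.no_grad():
            teacher_outputs = self.teacher_model(inputs)

        if self.distillation_type == 'soft':
            T = self.tau
            # KL divergence between temperature-softened student and teacher
            # distributions, scaled by T^2 as is standard for distillation.
            distillation_loss = F.kl_div(
                F.log_softmax(outputs_dist / T, dim=1),
                F.log_softmax(teacher_outputs / T, dim=1),
                reduction='batchmean',
                log_target=True,
            ) * (T * T)
        else:
            # Hard distillation: cross-entropy against the teacher's argmax labels.
            distillation_loss = F.cross_entropy(
                outputs_dist, teacher_outputs.argmax(dim=1))

        return (1 - self.alpha) * base_loss + self.alpha * distillation_loss
```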

The distilled models have a different return type than the others (see the sketch after this list):

  • during training, we return both the class_token predictions, as well as the dist_token predictions
  • during inference, we return a single Tensor as the average of both predictions, so that the rest of the inference pipeline stays the same
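
A rough sketch of the dual-head behavior described above; the `DistilledClassifierHead`, `head`, and `head_dist` names are assumptions for illustration and do not necessarily match this PR's code:

```python
import torch
import torch.nn as nn


class DistilledClassifierHead(nn.Module):
    # Illustrative sketch of the two classifier heads of a distilled model;
    # names and structure are assumed, not taken from this PR.
    def __init__(self, embed_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)       # class-token classifier
        self.head_dist = nn.Linear(embed_dim, num_classes)  # dist-token classifier

    def forward(self, cls_token, dist_token):
        x_cls = self.head(cls_token)
        x_dist = self.head_dist(dist_token)
        if self.training:
            # Training: return both predictions so the distillation loss
            # can supervise the class token and the dist token separately.
            return x_cls, x_dist
        # Inference: average the two predictions into a single Tensor so the
        # rest of the inference pipeline stays unchanged.
        return (x_cls + x_dist) / 2
```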

@facebook-github-bot added the CLA Signed label on Jan 13, 2021
@TouvronHugo (Contributor) left a comment


Good for me.

@fmassa (Contributor, Author) commented on Jan 13, 2021

Merging to unblock; will upload the weights in a follow-up PR.

@fmassa merged commit 8eae326 into facebookresearch:main on Jan 13, 2021
@fmassa deleted the distillation branch on January 13, 2021 at 13:19