
Add Knowledge-Distillation #42

Merged
fmassa merged 10 commits into facebookresearch:main from the distillation branch on Jan 13, 2021

Conversation

@fmassa (Contributor) commented on Jan 13, 2021

This PR adds the implementation of both soft and hard distillation, as described in the paper.

We introduce a DistillationLoss, which wraps a criterion and a teacher_model and applies both the original criterion and the distillation loss derived from the teacher's predictions.
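
A minimal sketch of what such a wrapper could look like. The `base_criterion`, `distillation_type`, `alpha`, and `tau` names, the soft/hard branches, and the loss weighting are assumptions for illustration, not necessarily the exact API in this PR:

```python
import torch
import torch.nn.functional as F


class DistillationLoss(torch.nn.Module):
    # Illustrative sketch: combines a base criterion with a distillation loss
    # computed from a teacher model's predictions. Argument names are assumed.
    def __init__(self, base_criterion, teacher_model,
                 distillation_type='soft', alpha=0.5, tau=1.0):
        super().__init__()
        self.base_criterion = base_criterion
        self.teacher_model = teacher_model
        self.distillation_type = distillation_type  # 'soft' or 'hard'
        self.alpha = alpha  # weight of the distillation term
        self.tau = tau      # temperature used for soft distillation

    def forward(self, inputs, outputs, labels):
        # During training the distilled model returns a pair of predictions:
        # one from the class token and one from the distillation token.
        outputs_cls, outputs_dist = outputs
        base_loss = self.base_criterion(outputs_cls, labels)

        # The teacher is frozen; no gradients flow through it.
        with torch.no_grad():
            teacher_outputs = self.teacher_model(inputs)

        if self.distillation_type == 'soft':
            T = self.tau
            # KL divergence between temperature-softened student and teacher
            # distributions, scaled by T^2 as is standard for distillation.
            distillation_loss = F.kl_div(
                F.log_softmax(outputs_dist / T, dim=1),
                F.log_softmax(teacher_outputs / T, dim=1),
                reduction='batchmean',
                log_target=True,
            ) * (T * T)
        else:
            # Hard distillation: cross-entropy against the teacher's argmax labels.
            distillation_loss = F.cross_entropy(
                outputs_dist, teacher_outputs.argmax(dim=1))

        return (1 - self.alpha) * base_loss + self.alpha * distillation_loss
```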

The distilled models have a different return type than the others (see the sketch after this list):

  • during training, we return both the class_token predictions, as well as the dist_token predictions
  • during inference, we return a single Tensor as the average of both predictions, so that the rest of the inference pipeline stays the same
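
A rough sketch of the dual-head behavior described above; the `DistilledClassifierHead`, `head`, and `head_dist` names are assumptions for illustration and do not necessarily match this PR's code:

```python
import torch
import torch.nn as nn


class DistilledClassifierHead(nn.Module):
    # Illustrative sketch of the two classifier heads of a distilled model;
    # names and structure are assumed, not taken from this PR.
    def __init__(self, embed_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)       # class-token classifier
        self.head_dist = nn.Linear(embed_dim, num_classes)  # dist-token classifier

    def forward(self, cls_token, dist_token):
        x_cls = self.head(cls_token)
        x_dist = self.head_dist(dist_token)
        if self.training:
            # Training: return both predictions so the distillation loss
            # can supervise the class token and the dist token separately.
            return x_cls, x_dist
        # Inference: average the two predictions into a single Tensor so the
        # rest of the inference pipeline stays unchanged.
        return (x_cls + x_dist) / 2
```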

@facebook-github-bot added the CLA Signed label on Jan 13, 2021
@TouvronHugo (Contributor) left a comment


Good for me.

@fmassa (Contributor, Author) commented on Jan 13, 2021

Merging to unblock; will upload the weights in a follow-up PR.

@fmassa merged commit 8eae326 into facebookresearch:main on Jan 13, 2021
@fmassa deleted the distillation branch on January 13, 2021 at 13:19