This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@natuan (Contributor) commented Oct 4, 2021

I made the following changes to the distillation modifier:

  • Removed the forward hook that was used to capture the latest inputs/outputs. Forward hooks only see positional arguments and ignore the kwargs dict passed into the model's forward, which is how inputs are typically passed (e.g., in a QA model); see the sketch after this list.
  • Removed the computation of teacher outputs inside the modifier's loss_update. Passing the entire inputs dict (including the labels) into the teacher's forward causes a cross-entropy loss to be computed that we don't need, and it also creates a downstream autograd issue when calling loss.backward.
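To illustrate the first point: a minimal, self-contained sketch (not the modifier's actual code; TinyModel is a hypothetical stand-in) showing that a PyTorch forward hook only receives positional arguments, so a model called entirely with keyword arguments, as QA models typically are, leaves the hook with nothing to capture:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Hypothetical stand-in for a model whose forward takes only kwargs."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, input_ids=None, attention_mask=None):
        return self.linear(input_ids)

model = TinyModel()
captured = {}

def capture_hook(module, inputs, outputs):
    # `inputs` is the tuple of *positional* args only; kwargs never reach the hook
    captured["inputs"] = inputs
    captured["outputs"] = outputs

model.register_forward_hook(capture_hook)

# Everything is passed as kwargs, as in a QA pipeline calling model(**batch)
batch = {"input_ids": torch.randn(3, 4), "attention_mask": torch.ones(3, 4)}
model(**batch)
print(captured["inputs"])  # () -- the hook saw no model inputs at all
```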

@natuan natuan requested review from a team, bfineran, markurtz and spacemanidol October 4, 2021 03:42
@markurtz (Member) left a comment

@natuan our goal is to remove as much complexity for the user with modifiers as possible. This implementation adds complexity back for the user, who now has to manage the complete lifecycle of the teacher model themselves. Additionally, the old code, as set up, would work for other pipelines and keep things simple for users.

@natuan (Contributor, Author) commented Oct 6, 2021

> @natuan our goal is to remove as much complexity for the user with modifiers as possible. This implementation adds complexity back for the user, who now has to manage the complete lifecycle of the teacher model themselves. Additionally, the old code, as set up, would work for other pipelines and keep things simple for users.

Per our discussion, I moved the teacher logic back into the distillation modifier; a rough sketch of the resulting loss_update pattern is below.
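As an illustration only (a minimal sketch, not sparseml's actual DistillationModifier code; the label key names, the .logits attribute, and the temperature/alpha parameters are assumptions based on HF-style models), the teacher can be run inside the modifier's loss update without triggering its own cross-entropy loss or entering the autograd graph:

```python
import torch
import torch.nn.functional as F

def distillation_loss_update(loss, inputs, student_outputs, teacher,
                             temperature=2.0, alpha=0.5):
    """Sketch: blend the task loss with a soft-label KL loss against the teacher.

    Label-like keys are stripped so the teacher returns only logits (no
    cross-entropy), and no_grad keeps the teacher out of the autograd graph,
    so loss.backward() only flows through the student.
    """
    teacher_inputs = {
        k: v for k, v in inputs.items()
        if k not in ("labels", "start_positions", "end_positions")  # assumed label keys
    }
    with torch.no_grad():
        teacher_outputs = teacher(**teacher_inputs)

    # Works for HF-style ModelOutput objects (with .logits) or plain tensors
    student_logits = getattr(student_outputs, "logits", student_outputs)
    teacher_logits = getattr(teacher_outputs, "logits", teacher_outputs)

    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * soft_loss + (1.0 - alpha) * loss
```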

@bfineran (Contributor) left a comment

LGTM

@natuan natuan merged commit 6a47673 into main Oct 7, 2021
@jeanniefinks jeanniefinks deleted the distill_modifier branch February 10, 2022 18:46