
[LayerSkip] Per-Layer Dropout Rate Configuration #640

Open
mostafaelhoushi opened this issue Jul 8, 2024 · 0 comments
Labels: enhancement (New feature or request)

Describe the solution you would like:
I would like to be able to configure a different layer dropout rate for each layer.

Describe the alternatives you have considered:
Currently, layer dropout is implemented in fairseq2 as a single scalar probability shared by all layers (see here).
We could follow an implementation similar to this PR in torchtune to support linear, exponential, or step schedules that increase the dropout rate across layers, as sketched below.
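
As a rough sketch only (this is not an existing fairseq2 or torchtune API; the function name, schedule names, and parameters are hypothetical), a per-layer schedule could be computed like this:

```python
# Hypothetical sketch: derive a dropout probability for each layer from a
# maximum rate and a schedule name, so deeper layers are dropped more often.
import math
from typing import List


def layer_dropout_rates(num_layers: int, max_p: float, schedule: str = "linear") -> List[float]:
    """Return one dropout probability per layer, increasing with depth."""
    if num_layers == 1:
        return [max_p]
    rates = []
    for layer_idx in range(num_layers):
        frac = layer_idx / (num_layers - 1)  # 0.0 for the first layer, 1.0 for the last
        if schedule == "linear":
            p = max_p * frac
        elif schedule == "exp":
            # Exponential ramp: small rates early, reaching max_p at the last layer.
            p = max_p * (math.exp(frac * math.log(2.0)) - 1.0)
        elif schedule == "step":
            # Apply dropout only to the second half of the layers.
            p = max_p if frac >= 0.5 else 0.0
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        rates.append(p)
    return rates


print(layer_dropout_rates(8, 0.2, "linear"))
```

Each returned probability would then be passed to the corresponding layer's layer-drop wrapper, instead of the single shared scalar used today.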

Additional Context:
This will enable implementing:

  • Progressive Layer Dropping, which claims to improve training accuracy and speed when the dropout rate increases linearly across layers
  • LayerSkip, which claims to improve the accuracy of early-exit layers when the dropout rate increases linearly or exponentially across layers
@mostafaelhoushi added the enhancement (New feature or request) label on Jul 8, 2024
@mostafaelhoushi changed the title from "Per-Layer Dropout Rate Configuration [LayerSkip]" to "[LayerSkip] Per-Layer Dropout Rate Configuration" on Jul 8, 2024