[T5] allow config.decoder_layers to control decoder size #7409
Conversation
@@ -57,6 +57,8 @@ class T5Config(PretrainedConfig):
        Size of the intermediate feed forward layer in each :obj:`T5Block`.
    num_layers (:obj:`int`, `optional`, defaults to 6):
        Number of hidden layers in the Transformer encoder.
this was previously documented incorrectly. Now it is correct!
Codecov Report

@@            Coverage Diff             @@
##           master    #7409      +/-   ##
==========================================
- Coverage   77.58%   76.85%   -0.74%
==========================================
  Files         181      181
  Lines       35725    35784      +59
==========================================
- Hits        27719    27501     -218
- Misses       8006     8283     +277

Continue to review the full report at Codecov.
LGTM and thanks for adding the docs and tests :-)
src/transformers/configuration_t5.py
@@ -57,6 +57,8 @@ class T5Config(PretrainedConfig):
        Size of the intermediate feed forward layer in each :obj:`T5Block`.
    num_layers (:obj:`int`, `optional`, defaults to 6):
        Number of hidden layers in the Transformer encoder.
+   decoder_layers (:obj:`int`, `optional`, defaults to num_layers):
Documented defaults should match the signature. It will still effectively default to :obj:`num_layers`, and that should be said, but on the following line. Suggestion:

decoder_layers (:obj:`int`, `optional`):
    Number of hidden layers in the Transformer decoder. Will use the same value as :obj:`num_layers` if not set.
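For context, a minimal sketch of how that fallback could be wired in the constructor (an assumption about the implementation, not the exact PR diff):

```python
from transformers import PretrainedConfig

class T5Config(PretrainedConfig):
    # Hypothetical sketch, not the exact PR diff: fall back to
    # num_layers when the decoder depth is not set, so existing
    # symmetric configs keep working unchanged.
    def __init__(self, num_layers=6, num_decoder_layers=None, **kwargs):
        super().__init__(**kwargs)
        self.num_layers = num_layers
        self.num_decoder_layers = (
            num_decoder_layers if num_decoder_layers is not None else num_layers
        )
```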
Very clean! I would suggest calling the variable num_decoder_layers, though - what do you think?
Renamed it to num_decoder_layers.
Awesome, LGTM!
…#7409)

* Working asymmetrical T5
* rename decoder_layers -> num_decoder_layers
* Fix docstring
* Allow creation of asymmetric t5 students
…gingface#7409)"

This reverts commit c352b87.
Problem

arxiv.org/abs/2006.10369, among others, shows that models with fewer decoder layers than encoder layers can perform well and run generation much faster. Right now it is difficult to do distillation on T5 because there is only T5Config.num_layers, which controls both the encoder and the decoder layers.

Solution

Add config.decoder_layers to control the number of decoder layers. If it is not set, config.decoder_layers = num_layers, so existing configurations keep their current behavior.
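As a quick illustration (a sketch, not part of the PR itself) of the intended fallback behavior; note that after the rename agreed on above, the merged config argument is num_decoder_layers:

```python
from transformers import T5Config

# Sketch of the intended default: when the decoder depth is not given,
# it falls back to num_layers, so symmetric configs are unchanged.
symmetric = T5Config(num_layers=6)
assert symmetric.num_decoder_layers == 6

# An asymmetric student: deep encoder, shallow decoder.
asymmetric = T5Config(num_layers=12, num_decoder_layers=3)
assert asymmetric.num_decoder_layers == 3
```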
Testing
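A minimal sketch of the kind of check this change calls for (assuming the merged num_decoder_layers name, not the PR's actual test), verifying that the encoder and decoder really get independent depths:

```python
from transformers import T5Config, T5ForConditionalGeneration

def test_asymmetric_t5_block_counts():
    # Build a small random-weight model from config; the encoder and
    # decoder stacks should have different numbers of blocks.
    config = T5Config(num_layers=4, num_decoder_layers=2)
    model = T5ForConditionalGeneration(config)
    assert len(model.encoder.block) == 4
    assert len(model.decoder.block) == 2
```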