
[T5] allow config.decoder_layers to control decoder size #7409

Merged
merged 5 commits into huggingface:master on Sep 28, 2020

Conversation

@sshleifer (Contributor) commented Sep 26, 2020

Problem

arxiv.org/abs/2006.10369, among others, shows that models with fewer decoder layers than encoder layers can perform well and run generation much faster. Right now it is difficult to distill T5 because T5Config has only num_layers, which controls both the encoder and the decoder depth.

Solution

  • add config.decoder_layers to control the number of decoder layers (see the sketch after this list)
  • maintain 100% backwards compatibility by defaulting config.decoder_layers to num_layers
  • add tests
  • 4-line PR besides tests + docs :)
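
For illustration, a minimal sketch of what this change enables (not part of the PR diff; it uses the num_decoder_layers name the option ends up with after the rename discussed below, plus the encoder.block / decoder.block attributes of the PyTorch T5 implementation):

    from transformers import T5Config, T5ForConditionalGeneration

    # Encoder keeps 12 blocks; the decoder is cut down to 3 blocks.
    config = T5Config(num_layers=12, num_decoder_layers=3)
    model = T5ForConditionalGeneration(config)

    # Sanity check: the two stacks now have different depths.
    assert len(model.encoder.block) == 12
    assert len(model.decoder.block) == 3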

Testing

  • slow t5 tests pass

@sshleifer linked an issue Sep 26, 2020 that may be closed by this pull request
@@ -57,6 +57,8 @@ class T5Config(PretrainedConfig):
Size of the intermediate feed forward layer in each :obj:`T5Block`.
num_layers (:obj:`int`, `optional`, defaults to 6):
Number of hidden layers in the Transformer encoder.
@sshleifer (Contributor, PR author) commented on this line:

this was previously documented incorrectly. Now it is correct!

codecov bot commented Sep 26, 2020

Codecov Report

Merging #7409 into master will decrease coverage by 0.73%.
The diff coverage is 98.05%.

@@            Coverage Diff             @@
##           master    #7409      +/-   ##
==========================================
- Coverage   77.58%   76.85%   -0.74%     
==========================================
  Files         181      181              
  Lines       35725    35784      +59     
==========================================
- Hits        27719    27501     -218     
- Misses       8006     8283     +277     
Impacted Files Coverage Δ
src/transformers/modeling_longformer.py 74.14% <81.81%> (ø)
src/transformers/configuration_longformer.py 100.00% <100.00%> (ø)
src/transformers/configuration_t5.py 96.55% <100.00%> (+0.12%) ⬆️
src/transformers/configuration_utils.py 97.35% <100.00%> (+0.68%) ⬆️
src/transformers/modeling_albert.py 84.33% <100.00%> (+0.17%) ⬆️
src/transformers/modeling_bert.py 88.48% <100.00%> (+0.11%) ⬆️
src/transformers/modeling_mobilebert.py 89.43% <100.00%> (+0.04%) ⬆️
src/transformers/modeling_roberta.py 96.62% <100.00%> (+0.06%) ⬆️
src/transformers/modeling_t5.py 82.83% <100.00%> (+0.06%) ⬆️
src/transformers/modeling_tf_albert.py 97.35% <100.00%> (+0.01%) ⬆️
... and 25 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sgugger (Collaborator) left a comment:

LGTM and thanks for adding the docs and tests :-)

@@ -57,6 +57,8 @@ class T5Config(PretrainedConfig):
Size of the intermediate feed forward layer in each :obj:`T5Block`.
num_layers (:obj:`int`, `optional`, defaults to 6):
Number of hidden layers in the Transformer encoder.
decoder_layers (:obj:`int`, `optional`, defaults to num_layers):
A collaborator commented on this line:

Documented defaults should match the signature. It will eventually default to num_layers, but that should be said in the line after. Suggestion:

        decoder_layers (:obj:`int`, `optional`):
            Number of hidden layers in the Transformer decoder. Will use the same value as :obj:`num_layers` if not set.
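
A rough sketch (assumed, not quoted from the PR) of how that default could be resolved inside T5Config.__init__; the argument is still called decoder_layers at this point in the thread and is renamed to num_decoder_layers further down:

    # Sketch of T5Config.__init__ (assumed): fall back to a symmetric
    # encoder/decoder when the decoder depth is not given explicitly.
    self.num_layers = num_layers
    self.decoder_layers = (
        decoder_layers if decoder_layers is not None else self.num_layers
    )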

@patrickvonplaten (Contributor) left a comment:

Very clean! I would suggest calling the variable num_decoder_layers though - what do you think?

@sshleifer (Contributor, PR author) commented Sep 27, 2020

Renamed it to num_decoder_layers, and fixed docstring!
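
For context, a hedged sketch (not the distillation script in this repo) of how the merged num_decoder_layers option could be used to build an asymmetric T5 student; "t5-small" as the teacher and the [0, 2, 4] decoder-layer mapping are just one possible choice:

    import copy

    from transformers import T5ForConditionalGeneration

    teacher = T5ForConditionalGeneration.from_pretrained("t5-small")  # 6 encoder / 6 decoder layers

    # Same config as the teacher, but with a 3-layer decoder.
    student_config = copy.deepcopy(teacher.config)
    student_config.num_decoder_layers = 3
    student = T5ForConditionalGeneration(student_config)

    # Re-use teacher weights where shapes match: the whole encoder, plus
    # every other decoder block (one possible layer-selection scheme).
    student.encoder.load_state_dict(teacher.encoder.state_dict())
    for student_idx, teacher_idx in enumerate([0, 2, 4]):
        student.decoder.block[student_idx].load_state_dict(
            teacher.decoder.block[teacher_idx].state_dict()
        )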

@LysandreJik (Member) left a comment:

Awesome, LGTM!

@LysandreJik LysandreJik merged commit 748425d into huggingface:master Sep 28, 2020
thevasudevgupta pushed a commit to thevasudevgupta/transformers that referenced this pull request Sep 29, 2020 (…#7409):

* Working assymmetrical T5
* rename decoder_layers -> num_decoder_layers
* Fix docstring
* Allow creation of asymmetric t5 students

fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020 (…#7409), with the same commit message.

fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
Development

Successfully merging this pull request may close these issues:

• Allow creation of asymmetrical T5
4 participants