
[TF Bart] Refactor TFBart #9029

Merged: 20 commits merged into huggingface:master on Dec 15, 2020

Conversation

@patrickvonplaten (Contributor) commented on Dec 10, 2020:

What does this PR do?

Mirror of #8900 for TFBart.

It applies the same improvements as that PR, except for the torchscript functionality (which does not exist in TF Bart).

  • Keep dims consistent within the model: no switching back and forth between time x batch_size and batch_size x time. We can stick to batch_size x time throughout the whole forward pass, just like other models do.
  • Clean up the Attention layer: replace the dict cache with a past_key_values tuple (consistent with other models, and stateless, which is better IMO); break up the complicated if-else cascade and remove unnecessary parameters. A sketch of the new cache layout follows this list.
  • Correct an error with past_key_values/decoder_input_ids/use_cache.
  • Add inputs_embeds to TFBart.
  • (Very subjectively) better naming.
  • Check that all slow tests are passing.
  • Update the docstring and do a final design-change check.
  • Refactor the Bart tests.
  • Should solve 🐛 [TFBART] LayerDrop not working on TPU #9048.
  • Check that there is no speed regression.
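For readers reviewing the cache change, here is a minimal sketch of what the stateless past_key_values layout looks like. This is an illustration only, not code from this PR: the per-layer four-tuple and the example shapes are assumptions based on the convention used by equivalent models in the library.

```python
import tensorflow as tf

# Hypothetical sizes, for illustration only.
batch_size, num_heads, head_dim = 2, 16, 64
past_len, num_decoder_layers = 7, 6

def cached(seq_len):
    # batch_size x time convention: batch dimension first, time second.
    return tf.zeros((batch_size, num_heads, seq_len, head_dim))

# Assumed layout: one tuple per decoder layer holding the cached
# self-attention key/value and cross-attention key/value tensors,
# instead of a mutable dict cache.
layer_past = (cached(past_len), cached(past_len), cached(past_len), cached(past_len))
past_key_values = tuple(layer_past for _ in range(num_decoder_layers))

# When a cache is passed, only the most recent decoder token is fed in,
# so decoder_input_ids has shape (batch_size, 1).
decoder_input_ids = tf.constant([[42], [7]])
```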

@patrickvonplaten (Contributor, Author) commented:

No speed regression on the brutasse GPU in graph mode. The PR is ready for review IMO.

@sgugger (Collaborator) left a comment:

LGTM, thanks for simplifying this!

@jplu (Contributor) left a comment:

Awesome!! Thanks for taking care of this part!!

Should we merge #9063 before or after this one?

@patrickvonplaten (Contributor, Author) commented on Dec 14, 2020:

> Awesome!! Thanks for taking care of this part!!
>
> Should we merge #9063 before or after this one?

Let's merge after your PR. I'll take the merge conflicts from you :-)
Also this way I can play around a bit with the new not-existing-cast-bool functionality, yaaay!

@LysandreJik (Member) left a comment:

Great, LGTM! Thanks for working on this @patrickvonplaten!

Comment on lines +725 to +726 of src/transformers/models/bart/modeling_tf_bart.py:

```python
if inputs["training"] and (dropout_probability < self.layerdrop):  # skip the layer
    continue
```
Member: Is the continue approach better?

@patrickvonplaten (Contributor, Author): it partly fixes #9048
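For context, LayerDrop skips whole layers at random during training, so a dropped layer is never computed at all. The following is a minimal standalone sketch of the continue-based pattern quoted above, not the actual TFBart decoder loop; the Dense layer stack and the probability value are placeholders:

```python
import random
import tensorflow as tf

layerdrop = 0.1  # probability of skipping a layer during training
training = True
# Placeholder layer stack standing in for the decoder layers.
layers = [tf.keras.layers.Dense(4, activation="relu") for _ in range(6)]

hidden_states = tf.zeros((2, 4))
for layer in layers:
    dropout_probability = random.uniform(0, 1)
    if training and (dropout_probability < layerdrop):  # skip the layer
        continue
    hidden_states = layer(hidden_states)
```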

@patrickvonplaten merged commit abc573f into huggingface:master on Dec 15, 2020
@patrickvonplaten deleted the refactor_tf_bart branch on December 15, 2020, 17:55
guyrosin pushed a commit to guyrosin/transformers that referenced this pull request on Jan 15, 2021:

* reorder file
* delete unnecessary function
* make style
* save intermediate
* fix attention masks
* correct tf bart past key values
* solve merge conflict bug
* correct tensor dims
* save intermediate tf
* change attn layer
* fix typo re-order past
* inputs_embeds
* make fix copies
* finish tests
* fix graph mode
* apply lysandre's suggestions