-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TF Bart] Refactor TFBart #9029
[TF Bart] Refactor TFBart #9029
Conversation
No speed regression on GPU brutasse in graph mode. PR is ready for review IMO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for simplifying this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!! Thanks for taking care of this part!!
Should we merge #9063 before or after this one?
Let's merge after your PR. I'll take the merge conflicts from you :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM! Thanks for working on this @patrickvonplaten!
if inputs["training"] and (dropout_probability < self.layerdrop): # skip the layer | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the continue
approach better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it partly fixes: #9048
* reorder file * delete unnecesarry function * make style * save intermediate * fix attention masks * correct tf bart past key values * solve merge conflict bug * correct tensor dims * save intermediate tf * change attn layer * fix typo re-order past * inputs_embeds * make fix copies * finish tests * fix graph mode * appyl lysandres suggestions
What does this PR do?
Mirror of #8900 for TFBart.
The same improvements are done for Bart except adding torchscript functionality (as it does not exist in tf bart).