[WIP] Add BART for summarization training with CNN/DM using pytorch-lightning #3236
Conversation
Codecov Report

@@            Coverage Diff            @@
##           master    #3236   +/-   ##
=======================================
  Coverage   77.56%   77.56%
=======================================
  Files         100      100
  Lines       16970    16970
=======================================
  Hits        13162    13162
  Misses       3808     3808

Continue to review the full report at Codecov.
Nice! @yjernite might be interested!
Thanks for starting this! I left a couple nitpicks, but looks reasonable to me. Were you planning on running finetuning for longer and posting results?
I made those requested changes. And yes, I'm planning to run finetuning this weekend and share results. I only have access to a K80, so it'll take a while 🤷🏽‍♂️
This looks awesome. Let's coordinate with #3290 as well to share whatever code is possible.
@nateraw can you do a review of this PR as well?
Once #3290 gets merged, you'll have to update a few things, so I marked some here so you can get ahead of the curve on that. Once it's merged I'll be able to give a little more specific advice. Great work 👍
@acarrera94 I will try to get this working this week. If you are in the pytorch-lightning open Slack, we can also chat a bit more about the design.
It's blocked on me, I should be able to get to it tonight.
Thanks! This code is nicely done. If you integrate with @nateraw's work, I think it will eliminate about half the code. I would also recommend moving it all into one file (utils will become very small).
…ing test, removed unused imports and functions
force-pushed from 2dfbb55 to 8e24219
New code looks great. Excited to try it out!
Thanks for sticking with it @ACarrera. I'm really impressed by how concise this became. Next we can get some numbers.
@acarrera94
@sshleifer I usually ran it with --max_seq_length=756, and that used less than 16 GB of memory with a batch size of 4, so we might want to change that default. And I haven't tried it with --fp16. That comes from BaseTransformer, right?
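For reference, a run with the lowered sequence length discussed above might look something like this. The script name and exact flag spellings are assumptions for illustration; check the merged example for the real ones:

```shell
# Hypothetical invocation — script name and flag names may differ in the final example.
python run_bart_sum.py \
  --data_dir=cnn_dm \
  --model_name_or_path=bart-large \
  --max_seq_length=756 \
  --train_batch_size=4 \
  --output_dir=bart_sum_output
```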
This pull request adds training on CNN/DM to the BART summarization example, using the pytorch-lightning NER example as guidance. The example trains, evaluates, and gets decent results, though I haven't trained it on the full dataset just yet. I'm sure there are better defaults for the hyperparameters, but these seem to work.
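As background on the training setup: seq2seq finetuning like this scores the model against the reference summary token by token, and padding positions in the target must be excluded from the loss. A minimal stdlib sketch of that masking step, assuming a pad id of 0 (in the real code these values come from the tokenizer, and -100 is PyTorch's CrossEntropyLoss default ignore_index):

```python
PAD_ID = 0          # hypothetical pad token id; in practice tokenizer.pad_token_id
IGNORE_INDEX = -100  # targets with this value are ignored by PyTorch's CrossEntropyLoss

def mask_padding(label_ids, pad_id=PAD_ID, ignore_index=IGNORE_INDEX):
    """Replace pad positions in a target summary so they don't contribute to the loss."""
    return [ignore_index if tok == pad_id else tok for tok in label_ids]

batch_labels = [[42, 17, 99, PAD_ID, PAD_ID], [5, 6, 7, 8, 9]]
masked = [mask_padding(seq) for seq in batch_labels]
print(masked)  # [[42, 17, 99, -100, -100], [5, 6, 7, 8, 9]]
```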
I based this PR on the code I wrote in this colab.
This would hopefully close #3004
TODO
Happy to hear any feedback!