
[WIP] Add BART for summarization training with CNN/DM using pytorch-lightning #3236

Merged: 37 commits merged into huggingface:master on Mar 25, 2020

Conversation

@andr-ec (Contributor) commented Mar 12, 2020

This pull request adds an example of fine-tuning BART for summarization. I used the pytorch-lightning NER example as guidance. The example trains on CNN/DM and evaluates, and gets decent results, though I haven't trained it on the full dataset just yet. I'm sure there are better defaults for the hyperparameters, but these seem to work.
I based this PR on the code I wrote in this colab.

This would hopefully close #3004
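
For context, a minimal sketch of this kind of setup might look like the following, assuming recent versions of transformers and pytorch-lightning. This is not the code in this PR; the class name, checkpoint, and hyperparameters are illustrative.

import pytorch_lightning as pl
import torch
from transformers import BartForConditionalGeneration, BartTokenizer


class BartSummarizer(pl.LightningModule):
    # Wraps BartForConditionalGeneration so pytorch-lightning can drive
    # the fine-tuning loop (GPU placement, checkpointing, logging).
    def __init__(self, model_name="facebook/bart-large", lr=3e-5):
        super().__init__()
        self.model = BartForConditionalGeneration.from_pretrained(model_name)
        self.tokenizer = BartTokenizer.from_pretrained(model_name)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # batch holds tokenized CNN/DM articles and target summaries;
        # the model returns a cross-entropy loss when labels are passed.
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=self.lr)

Training then amounts to wrapping the tokenized dataset in a DataLoader and calling Trainer.fit with a GPU-enabled Trainer.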

TODO

  • Be able to train the model on a GPU.
  • Remove unused args.
  • Add a test step and save results.

Happy to hear any feedback!

@codecov-io commented Mar 12, 2020

Codecov Report

Merging #3236 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3236   +/-   ##
=======================================
  Coverage   77.56%   77.56%           
=======================================
  Files         100      100           
  Lines       16970    16970           
=======================================
  Hits        13162    13162           
  Misses       3808     3808           


@sshleifer (Contributor):

Nice! @yjernite might be interested!

@sshleifer self-requested a review March 12, 2020 05:28
@sshleifer (Contributor) left a review:

Thanks for starting this! I left a couple nitpicks, but looks reasonable to me. Were you planning on running finetuning for longer and posting results?

(Review comments on examples/summarization/bart/run_bart_sum.py and examples/summarization/bart/utlis.py, now resolved.)
@andr-ec (Contributor, Author) commented Mar 12, 2020

I made those requested changes. And yes, I'm planning to run finetuning this weekend and share results. I only have access to a K80, so it'll take a while 🤷🏽‍♂️

@andr-ec requested a review from sshleifer March 13, 2020 20:59
@srush self-requested a review March 16, 2020 14:57
@srush (Contributor) commented Mar 16, 2020

This looks awesome. Let's coordinate with #3290 as well to share whatever code is possible.

@srush (Contributor) commented Mar 16, 2020

@nateraw can you do a review of this PR as well?

@nateraw (Contributor) left a review:

Once #3290 gets merged, you'll have to update a few things, so I marked some here so you can get ahead of the curve on that. Once it's merged I'll be able to give a little more specific advice. Great work 👍

(Review comments on examples/summarization/bart/run_bart_sum.py, most now resolved.)
@srush (Contributor) commented Mar 17, 2020

@acarrera94 I will try to get this working this week. If you are in the pytorch-lightning open slack we can also chat a bit more about the design.

@sshleifer dismissed their stale review March 18, 2020 05:02:

LGTM pending other reviewers!

@andr-ec (Contributor, Author) commented Mar 18, 2020

@nateraw I've made all of those changes, and it looks like #3290 has been merged. Is there anything else that needs to change? Thanks!

@srush (Contributor) commented Mar 18, 2020

It's blocked on me; I should be able to get to it tonight.

@srush (Contributor) left a review:

Thanks! This code is nicely done. If you integrate with @nateraw's work, I think it will eliminate about half the code. I would also recommend moving it into one file (utils will become very small).

(Review comments on examples/summarization/bart/README.md and examples/summarization/bart/run_bart_sum.py, partially resolved.)
@andr-ec force-pushed the bart_summarization_finetuning branch from 2dfbb55 to 8e24219 on March 24, 2020 20:00
@srush (Contributor) commented Mar 24, 2020

New code looks great. Excited to try it out!

@srush merged commit 3d76df3 into huggingface:master on Mar 25, 2020
@srush (Contributor) commented Mar 25, 2020

Thanks for sticking with it, @ACarrera. I'm really impressed by how concise this became. Next we can get some numbers.

@sshleifer (Contributor):

@acarrera94 run_train.sh is using 19GB on my system.
Does your system use less?
I am also seeing no memory savings from adding --fp16.
Thanks!

@andr-ec (Contributor, Author) commented Mar 29, 2020

@sshleifer I usually ran it with --max_seq_length=756, and that used less than 16 GB of memory with a batch size of 4, so we might want to change that default. I haven't tried it with --fp16; that comes from BaseTransformer, right?
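
As a general note, not specific to this PR's scripts, mixed precision in pytorch-lightning is usually enabled through the Trainer rather than the LightningModule, so whether --fp16 saves memory depends on how that flag is wired through. A minimal sketch, assuming a recent pytorch-lightning version and the illustrative BartSummarizer module sketched above:

import pytorch_lightning as pl

# precision=16 turns on mixed-precision training; a shorter max sequence
# length and a smaller batch size are the other main levers for memory.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision=16,
    max_epochs=1,
)
# trainer.fit(model, train_dataloader)  # model and dataloader defined elsewhere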

Development

Successfully merging this pull request may close: BART: How can I train and evaluate BART on CNN/DM dataset