
Base-size pre-trained models #1651

Closed

XinnuoXu opened this issue Jan 27, 2020 · 7 comments

Comments


XinnuoXu commented Jan 27, 2020

❓ Questions and Help

What is your question?

  1. Does BART offer base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models? In the summarization task, the baseline BERTSUMABS is trained on bert-base (12-layer encoder, 6-layer decoder, both with hidden size 768); have you ever compared a base-size BART against it?

  2. Could you please offer a README file for XSum (similar to the CNN one)?

  3. How much time does the XSum fine-tuning take with smaller GPUs (e.g., 4 11GB GPUs)?

@myleott @yinhanliu @ngoyal2707

@XinnuoXu XinnuoXu changed the title Smaller pre-trained models Bert-base size pre-trained models Jan 28, 2020
@XinnuoXu XinnuoXu changed the title Bert-base size pre-trained models Base-size pre-trained models Jan 28, 2020
@yinhanliu

  1. Our base model is trained on Wikipedia/BookCorpus only.
  2. Will do.
  3. We use 16 32GB GPUs for 1 hour (30K steps), so in your case it would be about 8 hours.


YizhuLiu commented Feb 9, 2020

@XinnuoXu Hi, have you evaluated the bart.large.cnn model? Did you get the same R-2 score on the CNN/DM dataset as published? I used the pre-trained model to fine-tune on CNN/DM, but the ROUGE-2 is 19.19 (the R-2 in the published paper is 21.28).
Thank you very much!

@yinhanliu

@YizhuLiu you need to use the right max-len, min-len, length-penalty, and beam-size values.

@YizhuLiu

@yinhanliu Thank you for your reply. We set these values as shown in "Evaluating the bart.large.cnn model": beam=4, lenpen=2.0, max_len_b=140, min_len=55. With this setting, the R-2 score is 20.03. Are these values correct? If not, how can I get the same R-2 score on CNN/DM as published?
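For reference, a minimal evaluation sketch in the style of fairseq's CNN/DM summarization README, assuming `test.source` holds one article per line; note the `no_repeat_ngram_size=3` setting from that README, which is easy to miss and can noticeably affect ROUGE:

```python
# Minimal sketch: generate CNN/DM summaries with the published settings.
# Assumes fairseq is installed and test.source contains one article per line.
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda()
bart.eval()
bart.half()

batch_size = 32
with open('test.source') as fin, open('test.hypo', 'w') as fout:
    articles = [line.strip() for line in fin]
    for i in range(0, len(articles), batch_size):
        batch = articles[i:i + batch_size]
        with torch.no_grad():
            # no_repeat_ngram_size=3 follows the fairseq CNN/DM README;
            # leaving it out is a common source of lower ROUGE scores.
            hypotheses = bart.sample(
                batch,
                beam=4, lenpen=2.0, max_len_b=140, min_len=55,
                no_repeat_ngram_size=3,
            )
        for hypo in hypotheses:
            fout.write(hypo + '\n')
```

ROUGE can then be computed against the reference file (e.g., with files2rouge, as in the fairseq README).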

@ricardorei

Will the BART base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models be released? I would like to play with them, and it is hard for me to fine-tune the large model.
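If such a checkpoint is published, loading it should mirror the existing hub interface; a hypothetical sketch, where the `bart.base` entry-point name is an assumption rather than a confirmed release:

```python
# Hypothetical sketch: load a base-size BART (6-layer encoder/decoder,
# hidden size 768) the same way as the large checkpoints.
# The 'bart.base' hub name is an assumption, not a confirmed release.
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.base')
bart.eval()

# Quick sanity check with the denoising objective: fill in a masked span.
print(bart.fill_mask(['The cat <mask> on the mat.'], topk=3))
```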


stale bot commented Apr 17, 2022

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

@stale stale bot added the stale label Apr 17, 2022

stale bot commented Apr 28, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

@stale stale bot closed this as completed Apr 28, 2022