
Works for T5/BART? #13

Closed
danyaljj opened this issue Dec 4, 2020 · 4 comments

@danyaljj

danyaljj commented Dec 4, 2020

Very cool work!

Does this work for T5/BART models as well?

@danyaljj
Author

danyaljj commented Dec 4, 2020

Side note: it'd be good to update the transformers dependency to the latest (v4.0.0).

@lvwerra
Member

lvwerra commented Dec 17, 2020

You are right; when I have time I'll upgrade it to v4.0.0. I haven't tested it, but I suspect that any model with a text-generation head should work. Note that you need to add a value head to your model architecture (see here).
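
For illustration, here is a minimal sketch of what such a value head could look like on top of a plain transformers T5 model. The wrapper class and names are just illustrative assumptions, not this repo's API:

```python
# Minimal sketch: attach a scalar value head to a seq2seq model such as T5.
# Illustrative only; T5WithValueHead and its fields are assumptions, not trl's API.
import torch.nn as nn
from transformers import T5ForConditionalGeneration, T5Tokenizer

class T5WithValueHead(nn.Module):
    def __init__(self, model_name="t5-small"):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        # One value estimate per decoder position, computed from the
        # decoder's final hidden states.
        self.value_head = nn.Linear(self.model.config.d_model, 1)

    def forward(self, input_ids, attention_mask, decoder_input_ids):
        outputs = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            output_hidden_states=True,
        )
        hidden = outputs.decoder_hidden_states[-1]    # (batch, tgt_len, d_model)
        values = self.value_head(hidden).squeeze(-1)  # (batch, tgt_len)
        return outputs.logits, values

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5WithValueHead("t5-small")
enc = tokenizer(["translate English to German: Hello"], return_tensors="pt")
dec = tokenizer(["Hallo"], return_tensors="pt")
logits, values = model(enc.input_ids, enc.attention_mask, dec.input_ids)
```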

@danyaljj
Author

I can try it. Other than checking that it runs without errors, how else can I test that the code is working correctly? Is there a benchmark or a quantitative way of verifying it?

@lvwerra
Member

lvwerra commented Jan 17, 2021

Monitoring the rewards on the IMDb dataset would be a good start. For GPT-2 it takes only 1-2h to train.
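
As a rough sketch of that kind of reward monitoring, you can score generations on IMDb prompts with an off-the-shelf sentiment classifier and watch the mean reward rise over training. The model names and sample sizes below are placeholders, not the exact setup used here:

```python
# Sanity check: generate continuations for IMDb prompts and use a sentiment
# classifier's P(positive) as the reward. Model names are placeholders.
import numpy as np
from datasets import load_dataset
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # policy being tuned (placeholder)
sentiment = pipeline("sentiment-analysis")             # reward model (default classifier)

# Short IMDb prompts; generate continuations and score their sentiment.
prompts = [x["text"][:64] for x in load_dataset("imdb", split="test[:16]")]
generations = generator(prompts, max_length=80, do_sample=True)
continuations = [g[0]["generated_text"] for g in generations]

# A mean reward that increases over training epochs is a quick quantitative
# signal that the optimization is doing something sensible.
scores = sentiment(continuations)
rewards = [s["score"] if s["label"] == "POSITIVE" else 1.0 - s["score"] for s in scores]
print(f"mean reward: {np.mean(rewards):.3f}")
```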
