
Should I use t5v1.1, t5narrow and TalkingHeads? #266

Closed
sshleifer opened this issue Jun 16, 2020 · 1 comment

@sshleifer

Are there metrics (or guesses) available on the efficiency and/or performance improvements of the new checkpoints?

I'm trying to determine whether it would be valuable to support them in huggingface/transformers.

@craffel
Collaborator

craffel commented Jun 27, 2020

Hey Sam, the T5.1.1 checkpoints seem to work a little better on some tasks and a little worse on others (see e.g. https://arxiv.org/abs/2002.08910). The other main difference is that they were pre-trained only on the unsupervised task, so if you want to avoid the (possibly positive or negative) effects of multi-task pre-training, you can try these instead.
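For concreteness, here is a minimal sketch of what loading one of these checkpoints in huggingface/transformers could look like, assuming they get published on the Hub under a name like `google/t5-v1_1-base` (that name is an assumption here) and go through the standard `from_pretrained` API:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# "google/t5-v1_1-base" is an assumed Hub name for illustration.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")

# Because T5.1.1 was pre-trained only on the unsupervised span-corruption
# objective, the supervised task prefixes from the original T5
# ("translate ...", "summarize: ...") won't work out of the box;
# the model needs fine-tuning first. Span corruption itself does work:
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```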

The Talking Heads models do work a bit better than their original counterparts (see https://arxiv.org/abs/2003.02436), but Talking Heads Attention incurs a nontrivial increase in computational cost:

[Image: comparison of computational cost for Talking Heads models vs. their baseline counterparts]
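For reference, a minimal PyTorch sketch of the mechanism as described in the paper (an illustrative re-implementation, not the Mesh-TensorFlow code behind the checkpoints): talking heads inserts two learned heads-by-heads projections, one on the attention logits before the softmax and one on the attention weights after it. Both projections act on `[batch, heads, q_len, k_len]` tensors, so their cost grows with `q_len * k_len` like the attention map itself, which is where the nontrivial overhead comes from.

```python
import torch
import torch.nn.functional as F

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """Talking-heads attention (Shazeer et al., 2020, arXiv:2003.02436).

    q, k, v: [batch, heads, seq, dim_per_head]
    proj_logits, proj_weights: [heads, heads] learned mixing matrices
    applied across the heads axis before and after the softmax.
    """
    # Standard scaled dot-product logits: [batch, heads, q_len, k_len].
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    # "Talking" step 1: mix the logits across heads before the softmax.
    logits = torch.einsum("bhqk,hH->bHqk", logits, proj_logits)
    weights = F.softmax(logits, dim=-1)
    # "Talking" step 2: mix the weights across heads after the softmax.
    weights = torch.einsum("bhqk,hH->bHqk", weights, proj_weights)
    # Weighted sum of values, as in standard multi-head attention.
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)

# Shape check with toy inputs:
b, h, n, d = 2, 8, 16, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
out = talking_heads_attention(q, k, v, torch.randn(h, h), torch.randn(h, h))
assert out.shape == (b, h, n, d)
```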

I haven't played with the "narrow" models much.

I wouldn't say it's a huge priority to support any of these unless many people are requesting them, because none are obvious wins (which is why we put them in a separate "experimental" README).
