Hey Sam, the T5.1.1 checkpoints seem to work a little better on some tasks and a little worse on others (see e.g. https://arxiv.org/abs/2002.08910). The other main difference is that they were pre-trained only on the unsupervised task, so if you want to avoid the (possibly positive or negative) effects of multi-task pre-training you can try these instead.
The Talking Heads models do work a bit better than their original counterparts (see https://arxiv.org/abs/2003.02436), but Talking Heads Attention incurs a nontrivial increase in computational cost.
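For context, the extra cost comes from the two head-mixing projections that Talking-Heads Attention inserts before and after the softmax. Here is a minimal NumPy sketch under simplifying assumptions (a single equal head count everywhere, no batching, and hypothetical names `P_l`/`P_w` for the logit- and weight-mixing matrices); it is an illustration of the mechanism, not the reference implementation:

```python
import numpy as np

def talking_heads_attention(Q, K, V, P_l, P_w):
    """Sketch of Talking-Heads Attention (Shazeer et al., 2020).

    Q, K, V : (heads, seq, d) arrays.
    P_l     : (heads, heads) projection mixing attention *logits* across heads.
    P_w     : (heads, heads) projection mixing post-softmax *weights* across heads.
    The two einsums over the head dimension are the added cost relative to
    vanilla multi-head attention.
    """
    d = Q.shape[-1]
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d)        # (h, s, s)
    # "talking" step 1: mix logits across the head dimension
    logits = np.einsum('hij,hg->gij', logits, P_l)
    # softmax over the key axis
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # "talking" step 2: mix the attention weights across heads
    weights = np.einsum('hij,hg->gij', weights, P_w)
    return weights @ V                                    # (h, s, d)
```

With `P_l = P_w = I` this reduces to standard multi-head attention; the overhead of the two mixing steps grows with the square of the head count on top of the usual quadratic cost in sequence length, which is roughly where the slowdown comes from.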
I haven't played with the "narrow" models much.
I wouldn't say supporting any of these is a huge priority unless many people are requesting them, because none are obvious wins (which is why we put them in a separate "experimental" README).
Are there metrics (or guesses) available on the efficiency and/or performance improvements of the new checkpoints?
I'm trying to determine whether it would be valuable to support them in huggingface/transformers.