Hey Sam, the T5.1.1 checkpoints seem to work a little better on some tasks and a little worse on others (see e.g. https://arxiv.org/abs/2002.08910). The other main difference is that they were pre-trained only on the unsupervised task, so if you want to avoid the (possibly positive or negative) effects of multi-task pre-training you can try these instead.
The Talking Heads models do work a bit better than their original counterparts (see https://arxiv.org/abs/2003.02436), but Talking Heads Attention incurs a nontrivial increase in computational cost.
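For context, the extra cost comes from the two head-mixing projections that Talking-Heads Attention inserts before and after the softmax. Here is a minimal NumPy sketch under simplifying assumptions (a single equal head count everywhere, no batching, and hypothetical names `P_l`/`P_w` for the logit- and weight-mixing matrices); it is an illustration of the mechanism, not the reference implementation:

```python
import numpy as np

def talking_heads_attention(Q, K, V, P_l, P_w):
    """Sketch of Talking-Heads Attention (Shazeer et al., 2020).

    Q, K, V : (heads, seq, d) arrays.
    P_l     : (heads, heads) projection mixing attention *logits* across heads.
    P_w     : (heads, heads) projection mixing post-softmax *weights* across heads.
    The two einsums over the head dimension are the added cost relative to
    vanilla multi-head attention.
    """
    d = Q.shape[-1]
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d)        # (h, s, s)
    # "talking" step 1: mix logits across the head dimension
    logits = np.einsum('hij,hg->gij', logits, P_l)
    # softmax over the key axis
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # "talking" step 2: mix the attention weights across heads
    weights = np.einsum('hij,hg->gij', weights, P_w)
    return weights @ V                                    # (h, s, d)
```

With `P_l = P_w = I` this reduces to standard multi-head attention; the overhead of the two mixing steps grows with the square of the head count on top of the usual quadratic cost in sequence length, which is roughly where the slowdown comes from.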
I haven't played with the "narrow" models much.
I wouldn't say supporting any of these is a huge priority unless many people are requesting them, because none are obvious wins (which is why we put them in a separate "experimental" README).
Are there metrics (or guesses) available on the efficiency and/or performance improvements of the new checkpoints?
I'm trying to determine whether it would be valuable to support them in huggingface/transformers.