
What pretrained models using SwiGLU are available? #2184

Answered by rwightman
NightMachinery asked this question in Q&A

@NightMachinery I think you've covered the main models with pretrained weights. It's possible to enable SwiGLU on any of the ViT models, though, with an argument override.

It does yield an improvement, but it is also more expensive: it increases the size of the MLP and especially the memory required for activations. Some performance can be regained with a fused implementation like the one in xformers, but I don't have that as a dependency.
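To put rough numbers on that cost, here is a back-of-the-envelope sketch (not timm code). It assumes a ViT-B/16-like block (width 768, 4x MLP ratio, 197 tokens at 224px), ignores biases, and uses the hidden-width tensors an unfused implementation keeps for backprop as a crude proxy for activation memory. `mlp_costs` is a hypothetical helper.

```python
def mlp_costs(dim: int, hidden: int, tokens: int, gated: bool) -> tuple[int, int]:
    """Return (weight_count, intermediate_elements) for one MLP block.

    Rough model only: biases are ignored, and 'intermediate_elements'
    counts hidden-width tensors an unfused implementation keeps for backprop.
    """
    if gated:
        # SwiGLU: gate = x @ W, value = x @ V, out = (silu(gate) * value) @ W2
        weights = 3 * dim * hidden
        # gate, value, silu(gate), silu(gate) * value
        intermediates = 4 * tokens * hidden
    else:
        # Standard MLP: out = act(x @ W1) @ W2
        weights = 2 * dim * hidden
        # fc1 output, activation output
        intermediates = 2 * tokens * hidden
    return weights, intermediates


# Assumed ViT-B/16-like sizes: dim 768, 197 tokens (196 patches + class token).
dim, tokens = 768, 197
for name, hidden, gated in [
    ("GELU MLP, hidden=3072", 3072, False),
    ("SwiGLU, hidden=3072", 3072, True),
    ("SwiGLU, hidden=2048 (2/3 scaling)", 2048, True),
]:
    w, a = mlp_costs(dim, hidden, tokens, gated)
    print(f"{name}: weights={w:,} intermediates={a:,}")
# GELU MLP, hidden=3072: weights=4,718,592 intermediates=1,210,368
# SwiGLU, hidden=3072: weights=7,077,888 intermediates=2,420,736
# SwiGLU, hidden=2048 (2/3 scaling): weights=4,718,592 intermediates=1,613,824
```

The last row shrinks the hidden width by 2/3, as the GLU-variants paper (arXiv 2002.05202) does to hold the parameter count constant. Even then, this rough model still keeps more intermediate elements around, which is the kind of overhead a fused kernel can reduce by not materializing every intermediate.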

There is a short paper that compares several gated activations ("GLU Variants Improve Transformer"): https://arxiv.org/abs/2002.05202
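For reference, that paper defines the SwiGLU feed-forward layer as FFN_SwiGLU(x, W, V, W2) = (Swish_1(xW) ⊗ xV) W2. Below is a minimal pure-Python sketch of that formula, assuming biases are omitted and β = 1 (so Swish reduces to SiLU). The helper names are hypothetical, and this is not the timm or xformers implementation, just the math written out.

```python
import math


def silu(x: float) -> float:
    """Swish with beta=1 (SiLU): x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(x: list[float], w: list[list[float]]) -> list[float]:
    """Row vector x (length n) times matrix w (n rows x m cols)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def ffn_swiglu(
    x: list[float],
    w: list[list[float]],
    v: list[list[float]],
    w2: list[list[float]],
) -> list[float]:
    """FFN_SwiGLU(x) = (Swish_1(x W) * (x V)) W2, with biases omitted."""
    gate = [silu(g) for g in matvec(x, w)]  # gated branch
    value = matvec(x, v)  # linear branch
    hidden = [g * u for g, u in zip(gate, value)]  # elementwise product
    return matvec(hidden, w2)  # project back to model width
```

Note the three weight matrices (W, V, W2) versus two in a standard MLP; that extra projection is where the larger MLP mentioned above comes from.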

Answer selected by NightMachinery