difference between 'ViT-B-32' and 'ViT-B-32-quickgelu' #444
Also, I am curious whether the final accuracy of the model will be affected if …
@JiachengCheng96 the B/32 was the last model trained with the original QuickGELU (which is actually slower than the native PyTorch GELU that was adopted after the original OpenAI models). There is an accuracy difference, which is why both were released: if you need the full accuracy for zero-shot, the quickgelu variant is needed. It's not a huge difference, though, so if you were fine-tuning I'd use the non-quick version. In any case, the LAION-2B weights are much better, so I'd use those and not worry about this at all :)
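For reference, the two activations can be compared with a minimal, framework-free sketch. This assumes the usual definitions: exact GELU is `0.5 * x * (1 + erf(x / sqrt(2)))`, and QuickGELU (the sigmoid approximation used in the original OpenAI CLIP) is `x * sigmoid(1.702 * x)`; in PyTorch code it would typically be written as `x * torch.sigmoid(1.702 * x)`.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quick_gelu(x: float) -> float:
    # QuickGELU: x * sigmoid(1.702 * x), written out with exp()
    return x / (1.0 + math.exp(-1.702 * x))

# The two functions track each other closely but are not identical,
# which is why weights trained with one see a small accuracy drop
# when evaluated with the other.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  quickgelu={quick_gelu(x):+.4f}")
```

The per-point differences are on the order of a few thousandths, small per activation but large enough to shift zero-shot numbers when compounded across every MLP in the transformer.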
Hi, thank you again for your amazing work!
I noticed that there are multiple pretrained ViT-B/32 models: ('ViT-B-32-quickgelu', 'laion400m_e32') and ('ViT-B-32', 'laion400m_e32'). According to the description in the README, it seems 'laion400m_e32' was trained with QuickGELU, and the only difference between ('ViT-B-32-quickgelu', 'laion400m_e32') and ('ViT-B-32', 'laion400m_e32') is the activation function. I am not sure whether my understanding is correct? Thanks again for your work!