How to replicate ViT results on smaller dataset YFCC15M #162
-
Before I launch long runs on LAION-400M, I want to confirm that my hardware setup is correct. I set up ViT-B with this repo and trained it on YFCC15M with a 32K batch size and the default parameters (weight decay 0.2, lr 5e-4, 32 epochs), but I consistently get top-1 zero-shot accuracy below 10%. ViTs are expected to perform worse in lower-data regimes, but this seems far off. Has anyone encountered similar issues, or does anyone have an idea of what might have gone wrong? Any help would be highly appreciated.
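For concreteness, the setup described above corresponds to a launch roughly like the sketch below (assuming open_clip's `training.main` entrypoint and webdataset shards; the shard pattern, GPU layout, warmup, sample count, and ImageNet path are placeholders rather than the exact command used, and flag names can differ between repo versions — check `python -m training.main --help` against your checkout).

```bash
# Sketch only: an 8-process single-node launch where the per-GPU --batch-size
# times the process count gives the ~32K global batch mentioned above.
# Adapt the GPU count / per-GPU batch (or go multi-node) to what actually fits.
torchrun --nproc_per_node 8 -m training.main \
    --model ViT-B-32 \
    --train-data "/data/yfcc15m/{00000..01500}.tar" \
    --dataset-type webdataset \
    --train-num-samples 15000000 \
    --batch-size 4096 \
    --lr 5e-4 \
    --wd 0.2 \
    --epochs 32 \
    --warmup 2000 \
    --precision amp \
    --workers 8 \
    --imagenet-val /data/imagenet/val \
    --zeroshot-frequency 1 \
    --report-to tensorboard
```

One thing worth double-checking with a launch like this: `--batch-size` is per process, so the effective global batch (and therefore how well the chosen learning rate fits) depends on how many GPUs are actually in the run.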
Replies: 1 comment 1 reply
-
@kyleliang919 I have some old training logs for ViT-B/32 on CC12M that reached ~31% top-1 ImageNet-1k zero-shot accuracy. The best ResNet-50 in that setup was ~36%. CC12M is higher quality than YFCC15M, so you might want to test with that one instead. 10% does seem low, though... but I would definitely expect it to end up worse than 30%.
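One cheap way to separate an evaluation-side problem from a training-side one is to run the repo's zero-shot ImageNet evaluation on a published pretrained checkpoint. A sketch, assuming open_clip's eval-only mode, with the ImageNet path as a placeholder and the pretrained tag taken from the repo README:

```bash
# Evaluation only: with --imagenet-val set and no --train-data, training.main
# runs zero-shot ImageNet-1k classification for the given checkpoint.
python -m training.main \
    --model ViT-B-32 \
    --pretrained laion400m_e32 \
    --imagenet-val /data/imagenet/val
```

If the reference checkpoint lands near its published number, the val data and preprocessing are fine, and the sub-10% result points at the training run itself (data pipeline, loss aggregation across workers, LR schedule, etc.).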