Could RN50 image-encoder performance be improved with LAION-2B or LAION-5B? #361

Answered by rwightman
PatCH0816 asked this question in Q&A
@PatCH0816 they might do better, but given their performance relative to the ViT models in the original paper (for the amount of compute), it didn't seem that enticing. If you haven't noticed, those ResNets aren't like normal ResNets: they have a fairly large self-attention pooling layer at the end, which fixes their input resolution (unless you interpolate the positional embedding), just like a ViT.
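The fixed-resolution constraint comes from the learned positional embedding in that attention-pooling head. A minimal sketch of the interpolation workaround, assuming a CLIP-style `[1 + H*W, D]` embedding with a leading pooling/CLS token (the shapes and function name here are illustrative, not open_clip's actual API):

```python
import math
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a [1 + H*W, D] positional embedding to a new square grid.

    The first row is the pooling/CLS token and is kept as-is; the
    remaining rows are treated as an H x W feature map and resized
    with bicubic interpolation.
    """
    cls_tok, grid_tok = pos_embed[:1], pos_embed[1:]
    old_grid = int(math.sqrt(grid_tok.shape[0]))
    d = grid_tok.shape[1]
    # [H*W, D] -> [1, D, H, W] so F.interpolate sees a spatial map
    grid_tok = grid_tok.reshape(old_grid, old_grid, d).permute(2, 0, 1).unsqueeze(0)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    # back to [new_grid * new_grid, D]
    grid_tok = grid_tok.squeeze(0).permute(1, 2, 0).reshape(new_grid * new_grid, d)
    return torch.cat([cls_tok, grid_tok], dim=0)

# e.g. a 7x7 grid (RN50 attn-pool at 224px input, stride 32) resized to 9x9 (288px)
pe = torch.randn(1 + 7 * 7, 2048)
pe2 = interpolate_pos_embed(pe, 9)
```

After interpolation the resized embedding can be loaded into the pooling layer so the encoder accepts the larger input resolution, at some cost in zero-shot accuracy until fine-tuned.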

I have recently pushed some ConvNeXt-Base models. The convnext_base_w models at 256x256 are sized to be roughly equivalent in compute to the RN50x4 in the original paper. They perform quite a bit better than that model when trained with LAION-2B, and they beat ViT-B-16 on ImageNet zero-shot. More tests need to be perfo…
