It's a theoretical question; of course the first thought is to change the architecture from an RN50 to a transformer if one aims for better overall performance. That said, there is a limit to the complexity in the data that an RN50 image encoder is able to learn. Could the performance of an RN50 image encoder trained on a larger LAION-2B or LAION-5B dataset somehow be estimated? What are the expected performance gains? Perhaps someone in this group has even tried to train the RN50 image encoder on these large LAION datasets?
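Short of actually training, one rough way to attempt such an estimate is to fit a scaling curve to zero-shot accuracies of existing RN50 checkpoints trained on increasing amounts of data and extrapolate to LAION-2B/5B scale. A minimal sketch, assuming the error (1 - accuracy) falls off as a power of the number of samples seen; the data points below are placeholders, not real results, and would need to be replaced with measured numbers:

```python
import numpy as np

# PLACEHOLDER (samples seen, ImageNet zero-shot top-1) pairs for RN50 checkpoints
# trained on increasingly large datasets -- substitute real measurements here.
samples = np.array([3e6, 12e6, 400e6])
accuracy = np.array([0.17, 0.31, 0.60])

# Assumption: log(1 - acc) is roughly linear in log(samples), i.e. the error
# follows a power law in dataset size.
slope, intercept = np.polyfit(np.log(samples), np.log(1.0 - accuracy), 1)

for n in (2e9, 5e9):  # LAION-2B / LAION-5B scale
    pred = 1.0 - np.exp(intercept + slope * np.log(n))
    print(f"{n:.0e} samples -> predicted top-1 ~ {pred:.3f}")
```

The power-law assumption itself is the weak point: accuracy saturates as the architecture's capacity is reached, which is exactly the "limit of complexity" concern above, so such an extrapolation gives at best an optimistic upper bound.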
@PatCH0816 they might do better, but given their performance relative to the ViT models in the original paper (for the amount of compute), it didn't seem that enticing. In case you haven't noticed, those ResNets aren't like normal ResNets: they have a fairly large self-attention pooling layer at the end, which fixes their input resolution (unless you interpolate the pos embed), just like a ViT.
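A rough sketch of what that pos embed interpolation could look like is below. It assumes the OpenCLIP-style modified ResNet layout, where `visual.attnpool.positional_embedding` has shape `(grid*grid + 1, dim)` with the pooled token first; the function name and attribute access are illustrative, not a supported API.

```python
import torch
import torch.nn.functional as F

def interpolate_attnpool_pos_embed(attnpool, new_grid: int):
    """Resize the attention-pool positional embedding so the modified ResNet
    can take a different input resolution. Assumes the embedding is laid out
    as (old_grid*old_grid + 1, dim) with the pooled/CLS-style token first."""
    pos = attnpool.positional_embedding.detach()   # (old_grid**2 + 1, dim)
    cls_tok, grid_tok = pos[:1], pos[1:]
    old_grid = int(grid_tok.shape[0] ** 0.5)
    dim = grid_tok.shape[1]

    # reshape to (1, dim, old_grid, old_grid), resample spatially, flatten back
    grid_tok = grid_tok.reshape(old_grid, old_grid, dim).permute(2, 0, 1).unsqueeze(0)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    grid_tok = grid_tok.squeeze(0).permute(1, 2, 0).reshape(new_grid * new_grid, dim)

    attnpool.positional_embedding = torch.nn.Parameter(
        torch.cat([cls_tok, grid_tok], dim=0))

# e.g. for an RN50 (32x downsample) at 288px input: new grid is 288 // 32 = 9
# interpolate_attnpool_pos_embed(model.visual.attnpool, 9)
```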
I have recently pushed some ConvNeXt-Base models. The `convnext_base_w` models (see open_clip/src/open_clip/pretrained.py, lines 162 to 171 at commit 2ca5893) at 256x256 are sized to be roughly equivalent in compute to the RN50x4 in the original paper. They perform quite a bit better than that when trained with LAION-2B, and they are better than the ViT-B-16 for ImageNet zero-shot. More tests need to be performed…
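For anyone wanting to try those weights, a minimal zero-shot sketch with open_clip might look like the following; the pretrained tag is an assumption, so check `open_clip.list_pretrained()` for the exact names currently published:

```python
import torch
import open_clip
from PIL import Image

# model/pretrained tag pair is an assumption -- verify with open_clip.list_pretrained()
model, _, preprocess = open_clip.create_model_and_transforms(
    "convnext_base_w", pretrained="laion2b_s13b_b82k")
tokenizer = open_clip.get_tokenizer("convnext_base_w")
model.eval()

labels = ["a dog", "a cat", "a car"]
text = tokenizer([f"a photo of {label}" for label in labels])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any local test image

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```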