MobileCLIP is a very fast CLIP architecture for mobile inference, roughly 3x faster on iOS / macOS devices than `convnext_base_w`, the fastest publicly available CLIP backbone.
They introduce 3 novel image backbones: `mci{0|1|2}`. It would be amazing if these models were available directly via `timm`. I believe this would be an essential first step towards getting them into `open_clip` for fine-tuning.
The arch, defined here, uses MobileOne and FastViT components, which are already available in `timm`. I'm not sure how compatible the re-implementation there is with the existing ones in `timm` out of the box, but integration definitely seems possible.
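For what it's worth, both component families can already be instantiated from `timm` today (the model names below are existing `timm` variants, not the MobileCLIP ones):

```python
import timm

# Existing timm variants of the two building-block families MobileCLIP uses.
mobileone = timm.create_model("mobileone_s0", pretrained=False)
fastvit = timm.create_model("fastvit_sa12", pretrained=False)
```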
@rsomani95 the components themselves are equivalent at a functional level, but the naming was remapped, so the weights would have to be remapped for this model as well...
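As a very rough illustration of what that remapping involves, a minimal sketch along these lines (the prefix pairs here are hypothetical placeholders; the real mapping would have to come from diffing the ml-mobileclip state_dict against the timm one):

```python
import torch

# Hypothetical prefix renames, applied in order. These are illustrative only,
# NOT the actual MobileCLIP -> timm mapping.
PREFIX_MAP = {
    "image_encoder.model.": "",  # assumed MobileCLIP wrapper prefix
    "patch_embed.": "stem.",     # illustrative rename, not verified
}

def remap_keys(state_dict):
    """Rename checkpoint keys according to PREFIX_MAP (rules applied in order)."""
    out = {}
    for k, v in state_dict.items():
        for old, new in PREFIX_MAP.items():
            if k.startswith(old):
                k = new + k[len(old):]
        out[k] = v
    return out

ckpt = torch.load("mobileclip_s1.pt", map_location="cpu")  # placeholder path
sd = ckpt.get("state_dict", ckpt)  # checkpoints sometimes nest the weights
remapped = remap_keys(sd)
```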
@rsomani95 I took a closer look at this. s1/s2 (mci1/mci2) are the easiest; those could probably be mapped to OpenCLIP w/ a timm FastViT encoder (after a few additions and a key remapping for weights). I think the text encoder for those is compatible.
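For reference, a rough sketch of what an OpenCLIP model config with a timm vision tower looks like. The `timm_*` keys are OpenCLIP's existing mechanism for timm image encoders; the FastViT variant name and all the dims below are placeholders, not MobileCLIP-S1's actual hyperparameters:

```python
# Placeholder OpenCLIP model config using a timm vision tower.
mobileclip_s1_cfg = {
    "embed_dim": 512,
    "vision_cfg": {
        "timm_model_name": "fastvit_sa24",  # placeholder timm FastViT variant
        "timm_model_pretrained": False,
        "timm_pool": "avg",
        "timm_proj": "linear",
        "image_size": 256,
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 512,
        "heads": 8,
        "layers": 12,
    },
}
```

A config like this would normally be dropped as a JSON file into OpenCLIP's `model_configs/` directory so `open_clip.create_model` can pick it up by name.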
S0 uses a RepMixer-based text encoder, so it would need new code in OpenCLIP as well. The image encoder would map to a tweaked version of FastViT.
The B model uses a ViT w/ a different stem, which is doable. I really like that ViT does NOT have BatchNorm, though, so it's a shame this one is a ViT-Base w/ BN in the stem.
@rwightman thanks for looking into that. That's really great to hear re. s1/s2, as those, in my eyes, sit in the perfect sweet spot of speed + accuracy. Given your observations, maybe it makes sense to port those two alone first? Is there something in particular I could help with?