[CNN architecture support] #12

CaptainEven · 2023-11-07T08:36:55Z

Out of curiosity, may I ask whether there is any possibility of making a CNN-based version of the CroCo self-supervised pipeline?

xjcvip007 · 2023-11-07T08:46:23Z

Maybe "MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features" can meet your requirements.

PhilippeWeinzaepfel · 2023-11-07T09:20:08Z

Hi,

Masked Image Modeling (MIM) methods in general are well designed for patch-based architectures such as ViTs.
There have been some attempts to extend MIM to CNNs, e.g. "Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN" or "ConvMAE: Masked Convolution Meets Masked Autoencoders".
Such approaches could most likely be integrated successfully into CroCo, and we are planning to work on that in the future.
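To illustrate why MIM maps naturally onto patch-based models, here is a minimal sketch, assuming a ViT-style setup with square patches, of the random patch masking that MIM relies on, expanded to pixel resolution so it could be applied to a dense CNN input. The function names `random_patch_mask` and `mask_image`, the patch size of 16 and the 75% mask ratio are hypothetical illustration choices, not CroCo's code or the method of either paper cited above. Simply zeroing pixels like this is a naive baseline: convolutions can still see mask borders and leak information across patches, which is the kind of issue that dedicated approaches (e.g. the masked convolutions in ConvMAE) are designed to handle.

```python
import numpy as np


def random_patch_mask(height, width, patch_size, mask_ratio, rng):
    """Hypothetical helper: boolean (height, width) mask, True = masked pixel.

    The masking decision is made per patch (as in ViT-style MIM) and then
    expanded to pixel resolution so it can be applied to a CNN input.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    grid_h, grid_w = height // patch_size, width // patch_size
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))
    # Pick which patches to hide, uniformly at random.
    masked_ids = rng.permutation(num_patches)[:num_masked]
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(grid_h, grid_w)
    # Expand each patch decision into a patch_size x patch_size pixel block.
    return np.repeat(np.repeat(patch_mask, patch_size, axis=0), patch_size, axis=1)


def mask_image(image, mask):
    """Hypothetical helper: zero out masked pixels of a (C, H, W) image."""
    # The (H, W) mask broadcasts over the channel dimension.
    return np.where(mask, 0.0, image)


rng = np.random.default_rng(0)
img = rng.random((3, 224, 224)).astype(np.float32)
mask = random_patch_mask(224, 224, patch_size=16, mask_ratio=0.75, rng=rng)
masked = mask_image(img, mask)
print(mask.mean())  # fraction of masked pixels: 147 of 196 patches -> 0.75
```

A ViT can simply drop the masked tokens, whereas a CNN has to process the full dense grid, which is the main reason MIM does not transfer to CNNs without extra care.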

Best
Philippe

CaptainEven · 2023-11-07T09:40:58Z

Thanks for the reply!

PhilippeWeinzaepfel closed this as completed Nov 7, 2023
