[CNN architecture support] #12

Closed
CaptainEven opened this issue Nov 7, 2023 · 3 comments

Comments

@CaptainEven

CaptainEven commented Nov 7, 2023

Out of curiosity, may I ask whether it would be possible to make a CNN-based version of the CroCo self-supervised pipeline?

@xjcvip007

Maybe "MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features" can meet your requirements.

@PhilippeWeinzaepfel

Hi,

Masked Image Modeling (MIM) methods in general are well designed for patch-based architectures such as ViTs.
There have been some attempts to extend MIM to CNNs, e.g. "Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN" or "ConvMAE: Masked Convolution Meets Masked Autoencoders".
Such approaches could most likely be successfully integrated into CroCo, but we are planning to work on that in the future.

Best
Philippe
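
For readers wondering why MIM fits patch-based architectures so naturally, here is a minimal NumPy sketch. It is not taken from the CroCo codebase; the helper names `patchify` and `random_mask`, the 224x224 image, the 16-pixel patch size, and the 0.75 mask ratio are all illustrative assumptions. It shows that a token-based encoder can simply drop masked patches, whereas a convolutional encoder still needs a dense grid, so masked regions have to be filled in (here with zeros) and handled some other way.

```python
import numpy as np

def patchify(img, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array that is True where a patch is masked."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

# Illustrative values only (assumptions, not CroCo settings).
PATCH, SIZE, RATIO = 16, 224, 0.75
rng = np.random.default_rng(0)
img = rng.random((SIZE, SIZE, 3))

tokens = patchify(img, PATCH)                 # one row per patch
mask = random_mask(len(tokens), RATIO, rng)

# Token-based (ViT-style) encoder: masked patches are simply dropped,
# so the encoder only processes the visible tokens.
visible = tokens[~mask]

# Convolutional encoder: the input must stay a dense H x W grid,
# so masked patches are zero-filled instead of removed.
grid = SIZE // PATCH
dense_mask = np.repeat(np.repeat(mask.reshape(grid, grid), PATCH, axis=0), PATCH, axis=1)
cnn_input = img * ~dense_mask[..., None]
```

With these assumed values the token-based path processes only a quarter of the patches, while the convolutional path still processes the full-resolution image. That gap is what approaches such as the ones cited above try to address.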

@CaptainEven

Thanks for the reply!
