Out of curiosity, may I ask whether it would be possible to make a CNN-based version of the CroCo self-supervised pipeline?
Maybe "MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features" can meet your requirements.
Hi,
Masked Image Modeling (MIM) methods are, in general, designed for patch-based architectures such as ViTs. There have been some attempts to extend MIM to CNNs, e.g. "Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN" or "ConvMAE: Masked Convolution Meets Masked Autoencoders". Such approaches could most likely be integrated into CroCo successfully, but we are planning to work on that in the future.
Best,
Philippe
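To illustrate why MIM maps more naturally onto patch-based models, here is a minimal sketch in plain Python. It is not taken from CroCo or from the papers mentioned above. The function names are hypothetical, and zero-filling is only an assumed stand-in for whatever masking strategy a CNN adaptation would actually use. A ViT-style encoder can simply drop masked patches and process a shorter sequence, whereas a CNN needs a dense grid, so the masked regions have to be filled with something.

```python
import random

def random_patch_mask(grid_h, grid_w, mask_ratio, seed=0):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch."""
    n = grid_h * grid_w
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), int(n * mask_ratio)))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]

def visible_patches(image, mask, patch):
    """ViT-style input: keep only the visible patches, as a shorter sequence."""
    tokens = []
    for r, row in enumerate(mask):
        for c, is_masked in enumerate(row):
            if not is_masked:
                tokens.append([image[r * patch + i][c * patch:(c + 1) * patch]
                               for i in range(patch)])
    return tokens

def fill_masked_patches(image, mask, patch, fill=0.0):
    """CNN-style input: same spatial size, masked patches overwritten with `fill`.

    Zero-filling is an assumption made for this sketch only.
    """
    out = [list(row) for row in image]
    for r, row in enumerate(mask):
        for c, is_masked in enumerate(row):
            if is_masked:
                for i in range(patch):
                    for j in range(patch):
                        out[r * patch + i][c * patch + j] = fill
    return out

# Toy 8x8 single-channel image with nonzero values, split into 2x2 patches.
patch = 2
image = [[float(y * 8 + x + 1) for x in range(8)] for y in range(8)]
mask = random_patch_mask(4, 4, mask_ratio=0.75)

tokens = visible_patches(image, mask, patch)     # sequence of visible patches only
dense = fill_masked_patches(image, mask, patch)  # full 8x8 grid, masked areas zeroed
print(len(tokens), len(dense), len(dense[0]))
```

In the first case the encoder never sees the masked content and does less work. In the second, the network still convolves over the filled regions, which is one reason CNN adaptations of MIM need extra care.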
Thanks for the reply!