Image reconstruction result comparison of the pretrained ConvNeXt-V1/V2 model #11
Comments
Yes, we have visualized the reconstruction results several times and found them very similar to those obtained by ViT MAE (or Swin SimMIM), even before adding the GRN layer (i.e. using the V1 model). In other words, it seems one cannot really judge representation quality from pixel-space reconstruction quality: even if we get “perfect” reconstructions, the finetuning results can still show a large gap.
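For context, the kind of reconstruction visualization described above typically patchifies the image, drops a random subset of patches, and pastes the model's predictions for the masked patches back into the visible image. The following is a minimal NumPy sketch of that pipeline, not the repo's actual code; the "predictor" here is a hypothetical stand-in (mean of visible patches) where a trained decoder would regress pixels.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, flattened."""
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def unpatchify(patches, H, W, p):
    """Inverse of patchify: reassemble flattened patches into an (H, W) image."""
    return (patches.reshape(H // p, W // p, p, p)
                   .transpose(0, 2, 1, 3).reshape(H, W))

def visualize_reconstruction(img, mask_ratio=0.6, p=4, seed=0):
    """MAE-style visualization: mask random patches, fill them with a
    predictor's output, and paste predictions back among visible patches."""
    rng = np.random.default_rng(seed)
    H, W = img.shape
    patches = patchify(img, p)
    n = patches.shape[0]
    masked = rng.permutation(n)[: int(n * mask_ratio)]
    visible = np.setdiff1d(np.arange(n), masked)

    pred = patches.copy()
    # Stand-in for a trained decoder: predict each masked patch as the
    # mean of the visible patches. A real model would regress pixel values.
    pred[masked] = patches[visible].mean(axis=0)

    return unpatchify(pred, H, W, p), masked
```

The point of the thread is that a figure produced this way can look near-perfect while finetuning accuracy still differs substantially, so the visualization is a sanity check rather than an evaluation metric.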
@s9xie Does the ConvNeXt-V1/V2 model also work well in image reconstruction without sparse conv?
Given the gap between pretraining and finetuning in the self-supervised paradigm, could we use the MAE-based ConvNeXt encoder in a semi-supervised framework such as FixMatch?
@shwoo93 @s9xie Hi, did you compare the image reconstruction results of the pretrained V1/V2 models? And how do they compare with MAE-pretrained ViT?