
image reconstruction result comparison of the pretrained ConvNeXt-V1/V2 models #11

songkq opened this issue Jan 9, 2023 · 4 comments

songkq commented Jan 9, 2023

@shwoo93 @s9xie Hi, did you compare the image reconstruction quality of the pretrained V1/V2 models? How do they compare with an MAE-pretrained ViT?

s9xie (Collaborator) commented Jan 9, 2023

Yes, we have visualized the reconstruction results several times and found them very similar to those obtained by ViT MAE (or Swin SimMIM), even before adding the GRN layer (i.e., using the v1 model). In other words, it seems one cannot really judge representation quality from pixel-space reconstruction quality: even if we get "perfect" reconstructions, the fine-tuning results can still show a large gap.
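
For reference, the GRN (Global Response Normalization) layer mentioned above takes roughly the following form, a minimal sketch following the description in the ConvNeXt-V2 paper; the channels-last tensor layout and the epsilon value are assumptions:

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization (sketch of the layer added in ConvNeXt-V2).

    Operates on channels-last tensors of shape (N, H, W, C).
    """
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):
        # Global feature aggregation: L2 norm over the spatial dimensions.
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)        # (N, 1, 1, C)
        # Divisive normalization across channels.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)          # (N, 1, 1, C)
        # Feature calibration with learnable scale/bias and a residual path.
        return self.gamma * (x * nx) + self.beta + x
```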

songkq (Author) commented Jan 9, 2023

@s9xie Do the ConvNeXt-v1/v2 models also work well for image reconstruction when trained without sparse convolution?
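
For context, "without sparse conv" would roughly mean zeroing out the masked patches and running a standard dense encoder over the full image, with the loss computed only on the masked region. A minimal sketch of that setup (the `encoder`/`decoder` modules, patch size, and mask ratio here are illustrative placeholders, not the repository's API):

```python
import torch

def masked_recon_dense_step(images, encoder, decoder, patch=32, mask_ratio=0.6):
    """One masked-reconstruction step with a dense (non-sparse) encoder.

    Masked patches are zeroed in pixel space, so the dense convolutions still
    see them (as zeros); the reconstruction loss covers masked patches only.
    """
    n, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    # Random binary mask over the patch grid: 1 = masked, 0 = visible.
    mask = (torch.rand(n, 1, gh, gw, device=images.device) < mask_ratio).float()
    pixel_mask = mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    # The dense encoder sees the image with masked patches zeroed out.
    latent = encoder(images * (1.0 - pixel_mask))
    recon = decoder(latent)
    # Mean squared error over the masked region only.
    loss = ((recon - images) ** 2 * pixel_mask).sum() / pixel_mask.sum().clamp(min=1) / c
    return loss, recon
```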

songkq (Author) commented Jan 9, 2023

> Yes, we have visualized the reconstruction results several times and found them very similar to those obtained by ViT MAE (or Swin SimMIM), even before adding the GRN layer (i.e., using the v1 model). In other words, it seems one cannot really judge representation quality from pixel-space reconstruction quality: even if we get "perfect" reconstructions, the fine-tuning results can still show a large gap.

Given this gap between pretraining and fine-tuning in the self-supervised paradigm, could we introduce the MAE-pretrained ConvNeXt encoder into a semi-supervised framework such as FixMatch?
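
For illustration, plugging an MAE-pretrained encoder into a FixMatch-style loop could look roughly like the sketch below; the `model`, the weakly/strongly augmented inputs, the confidence threshold, and the loss weight are illustrative assumptions, not anything from this repository:

```python
import torch
import torch.nn.functional as F

def fixmatch_step(model, labeled_x, labeled_y, unlabeled_weak, unlabeled_strong,
                  threshold=0.95, lambda_u=1.0):
    """One FixMatch-style training step with a pretrained backbone `model`.

    `model` maps images to class logits, e.g. an MAE-pretrained ConvNeXt
    encoder with a linear classification head on top.
    """
    # Supervised loss on the labeled batch.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # Pseudo-labels from the weakly augmented unlabeled views.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_weak), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= threshold  # only confident pseudo-labels contribute

    # Consistency loss: strongly augmented views must match the pseudo-labels.
    unsup = F.cross_entropy(model(unlabeled_strong), pseudo, reduction="none")
    unsup_loss = (unsup * keep.float()).mean()

    return sup_loss + lambda_u * unsup_loss
```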

songkq (Author) commented Jan 9, 2023

@s9xie @shwoo93 Did you compare the supervised ConvNeXt-v2 models across different model sizes?
As model size shrinks, does the advantage of the MAE-pretrained models vanish compared with the supervised ones?

