Confusion about fine-tune #11
Comments
Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second-stage supervised pretraining step after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that are helpful for other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).
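One common way to run such a second stage is to pull the encoder weights out of the masked-image-modeling checkpoint (which typically stores the backbone under an `encoder.` prefix next to a pretraining-only reconstruction decoder) and load only those into the classification model before supervised training. A minimal sketch of that key remapping, assuming that checkpoint layout (the prefix and key names here are illustrative, not the repo's exact format):

```python
def extract_encoder_weights(state_dict, prefix="encoder."):
    """Keep only keys under `prefix`, stripping the prefix, and drop
    pretraining-only parts (e.g. the pixel-reconstruction decoder)."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy SimMIM-style checkpoint: encoder plus reconstruction decoder.
ckpt = {
    "encoder.patch_embed.weight": [0.1],
    "encoder.blocks.0.attn.qkv.weight": [0.2],
    "decoder.pred.weight": [0.3],  # pretraining-only, discarded
}

backbone = extract_encoder_weights(ckpt)
print(sorted(backbone))  # → ['blocks.0.attn.qkv.weight', 'patch_embed.weight']
```

The resulting dict can then be passed to the classification model (in PyTorch, typically via `load_state_dict(..., strict=False)` so the fresh classifier head stays randomly initialized).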
Thank you for replying despite your busy schedule, but my question has not really been answered. I am curious why, after SimMIM pre-training, the training-loss convergence speed on downstream tasks such as segmentation is not much different from a randomly initialized model (perhaps partly because the segmentation network has half of its parameters in the decoder). I have previously tried contrastive self-supervised methods such as DINO, whose downstream training loss converges very quickly, so I am confused by this and am also checking whether there is a problem in my setup.
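The parenthetical point about decoder parameters can be made concrete: if a large share of the segmentation network is a randomly initialized head, the early loss is dominated by it no matter how the backbone was initialized. A quick back-of-the-envelope check (the parameter counts below are illustrative, not measured from any particular model):

```python
# Hypothetical parameter counts (in millions) for a segmentation model.
backbone_params = 88.0   # e.g. a Swin-B-sized encoder, possibly pretrained
decoder_params = 86.0    # e.g. a UPerNet-style head, always random-init

random_fraction = decoder_params / (backbone_params + decoder_params)
print(f"{random_fraction:.0%} of parameters start from random init")
# With roughly half the model random either way, early loss curves for
# pretrained vs. scratch backbones can look similar.
```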
I did not quite follow your steps. Is it the following comparison:
SimMIM pre-training + segmentation fine-tune (red)
SimMIM pre-training backbone + segmentation fine-tune (red)
Thank you for your clarification. In general, a model with pretraining will converge much faster. Yes, it is probably because the head is heavy compared to the backbone. Another possible explanation could be that this problem is relatively simple, so that both methods converge very fast.
Would it be possible to explain what exactly you mean by "second-stage supervised pretraining"? Is there any documentation you could link concerning this? Thanks! |
I also have the same problem. @834799106 Did you solve the problem?
Thank you very much for your work. I tried using your method to pre-train on my own dataset, and then compared it against a randomly initialized model and an ImageNet supervised-pretraining model. The results showed that the convergence speed of the SimMIM-pretrained model was similar to the initialized model, although its accuracy gradually became better than the initialized model's after several iterations; however, it did not converge as well as the supervised-pretrained model. Is your method, like MAE's description of fine-tuning, one that still improves accuracy after many iterations? Looking forward to your reply.