Confusion about fine-tune #11
Comments
Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second-stage supervised pretraining step after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that are helpful for other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).
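One common way to run such a second stage is to pull the encoder weights out of the masked-image-modeling checkpoint (which typically stores the backbone under an `encoder.` prefix next to a pretraining-only reconstruction decoder) and load only those into the classification model before supervised training. A minimal sketch of that key remapping, assuming that checkpoint layout (the prefix and key names here are illustrative, not the repo's exact format):

```python
def extract_encoder_weights(state_dict, prefix="encoder."):
    """Keep only keys under `prefix`, stripping the prefix, and drop
    pretraining-only parts (e.g. the pixel-reconstruction decoder)."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy SimMIM-style checkpoint: encoder plus reconstruction decoder.
ckpt = {
    "encoder.patch_embed.weight": [0.1],
    "encoder.blocks.0.attn.qkv.weight": [0.2],
    "decoder.pred.weight": [0.3],  # pretraining-only, discarded
}

backbone = extract_encoder_weights(ckpt)
print(sorted(backbone))  # → ['blocks.0.attn.qkv.weight', 'patch_embed.weight']
```

The resulting dict can then be passed to the classification model (in PyTorch, typically via `load_state_dict(..., strict=False)` so the fresh classifier head stays randomly initialized).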
Thank you for replying despite your busy schedule, but my question has not really been answered. I am curious why, after SimMIM pre-training, the training-loss convergence speed on downstream tasks such as segmentation is not much different from a randomly initialized model (perhaps partly because the segmentation network has half of its parameters in the decoder). I have previously tried contrastive self-supervised methods such as DINO, whose downstream training loss converges very quickly, so I am confused by this and am also checking whether there is a problem in my setup.
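The parenthetical point about decoder parameters can be made concrete: if a large share of the segmentation network is a randomly initialized head, the early loss is dominated by it no matter how the backbone was initialized. A quick back-of-the-envelope check (the parameter counts below are illustrative, not measured from any particular model):

```python
# Hypothetical parameter counts (in millions) for a segmentation model.
backbone_params = 88.0   # e.g. a Swin-B-sized encoder, possibly pretrained
decoder_params = 86.0    # e.g. a UPerNet-style head, always random-init

random_fraction = decoder_params / (backbone_params + decoder_params)
print(f"{random_fraction:.0%} of parameters start from random init")
# With roughly half the model random either way, early loss curves for
# pretrained vs. scratch backbones can look similar.
```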
I did not quite follow your steps. Is it the following comparison:
SimMIM pre-training + segmentation fine-tune (red)
SimMIM pre-training backbone + segmentation fine-tune (red)
Thank you for your clarification. In general, a model with pretraining will converge much faster. Yes, it is probably because the head is heavy compared to the backbone. Another possible explanation could be that this problem is relatively simple, so that both methods converge very fast.
Would it be possible to explain what exactly you mean by "second-stage supervised pretraining"? Is there any documentation you could link concerning this? Thanks! |
I also have the same problem. @834799106 Did you solve the problem?
Thank you very much for your work. I tried using your method to pre-train on my own dataset, and then compared it against a randomly initialized model and an ImageNet supervised-pretraining model. The results showed that the convergence speed of the SimMIM-pretrained model was similar to the initialized model, although its accuracy gradually became better than the initialized model's after several iterations; however, it did not converge as well as the supervised-pretrained model. Is your method, like MAE's description of fine-tuning, one that still improves accuracy after many iterations? Looking forward to your reply.