
How do I know it is the pretraining that works, rather than longer finetuning epochs? #40

Closed
rayleizhu opened this issue May 22, 2023 · 5 comments

@rayleizhu

I notice that you set the finetuning epochs to 200 or 400.

'convnext_base': (4096, 400, 20, 'adam', 0.0001, 0.7, 0.01, 0.8, 3, 0.4, 0.9999),

However,

  • the standard supervised training runs for only 300 epochs (without any 800- or 1600-epoch pretraining).
  • MIM works (e.g., MAE and ConvNeXt V2) typically finetune for 100 epochs.

Did you try a 100-epoch schedule? Could you also kindly share the results under such a setting?
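
For concreteness, here is a minimal sketch of the change I have in mind, assuming the second field of the tuple above is the finetuning epoch count (the 200/400 values suggest so); the remaining fields are copied verbatim and their meanings are not asserted here:

# Hypothetical 100-epoch entry; only the second field (assumed to be the
# finetuning epoch count) differs from the original 'convnext_base' line above.
'convnext_base': (4096, 100, 20, 'adam', 0.0001, 0.7, 0.01, 0.8, 3, 0.4, 0.9999),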

@keyu-tian
Owner

keyu-tian commented May 22, 2023

We basically follow A2-MIM's 300-epoch finetuning setting (i.e., the ResNet Strikes Back / RSB A2 recipe), and set 200/400 epochs for smaller/larger models, respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

BTW, ConvNeXt V2 uses 400 or 600 epochs for its smaller models.

@rayleizhu
Author

Thanks for your quick response.

We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

I think the 100-epoch setting is important; otherwise, it is difficult for follow-up works to compare fairly with existing works (SparK, ConvNeXt V2, etc.) because of inconsistent evaluation protocols.

Besides, I think it is more reasonable to finetune pretrained models for no more than 300 epochs, the budget used by the supervised baseline. Otherwise, it is hard to say whether the performance gain comes from longer finetuning or from the better initialization provided by MIM.

@keyu-tian
Owner

keyu-tian commented May 23, 2023

I see. But I would suggest not focusing too much on ImageNet finetuning. I feel the best way to judge whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pretraining and finetuning on the same dataset can be a kind of "data leakage" and doesn't match our eventual goals of self-supervised learning.

On real downstream tasks (COCO object detection & instance segmentation), SparK outperforms Swin+MIM, Swin+Supervised, Conv+Supervised, and Conv+Contrastive Learning, so these results are solid evidence of SparK's effectiveness.

@rayleizhu
Author

I see. But I would suggest not focusing too much on ImageNet finetuning. I feel the best way to judge whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pretraining and finetuning on the same dataset can be a kind of "data leakage" and doesn't match our eventual goals of self-supervised learning.

This makes sense to me. Thanks for the explanation.

@ds2268

ds2268 commented Aug 10, 2023

We basically follow A2-MIM's 300-epoch finetuning setting (i.e., the ResNet Strikes Back / RSB A2 recipe), and set 200/400 epochs for smaller/larger models, respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

BTW, ConvNeXt V2 uses 400 or 600 epochs for its smaller models.

But for the B and H/L models, they use 50- and 100-epoch fine-tuning schedules, respectively (ConvNeXt V2 paper, A.1, Table 11). It would be nice to compare apples to apples in terms of fine-tuning epochs. What are the results after 50 epochs of SparK fine-tuning for ConvNeXt-B and 100 epochs for ConvNeXt-H?
