
How do I know it is the pretraining that works, rather than longer finetuning epochs? #40

Closed
rayleizhu opened this issue May 22, 2023 · 5 comments

@rayleizhu

I notice that you set the finetuning epochs to 200 or 400.

'convnext_base': (4096, 400, 20, 'adam', 0.0001, 0.7, 0.01, 0.8, 3, 0.4, 0.9999),

However,

  • the standard supervised training runs for only 300 epochs (without any 800- or 1600-epoch pretraining).
  • MIM works (e.g., MAE and ConvNeXt V2) typically finetune for 100 epochs.

Did you try a 100-epoch schedule? Could you also kindly share the results under such a setting?
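
For concreteness, here is a minimal sketch of the change I have in mind, assuming the second field of the tuple above is the finetuning epoch count (the 200/400 values suggest so); the remaining fields are copied verbatim and their meanings are not asserted here:

# Hypothetical 100-epoch entry; only the second field (assumed to be the
# finetuning epoch count) differs from the original 'convnext_base' line above.
'convnext_base': (4096, 100, 20, 'adam', 0.0001, 0.7, 0.01, 0.8, 3, 0.4, 0.9999),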

@keyu-tian
Owner

keyu-tian commented May 22, 2023

We basically follow A2-MIM's 300-epoch finetuning setting (i.e., the ResNet Strikes Back / RSB A2 recipe), and set 200/400 epochs for smaller/larger models, respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

BTW, ConvNeXt V2 uses 400 or 600 epochs for its smaller models.

@rayleizhu
Author

Thanks for your quick response.

We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

I think the 100-epoch setting is important; otherwise, it is difficult for follow-up works to compare fairly with existing works (SparK, ConvNeXt V2, etc.) because of inconsistent evaluation protocols.

Besides, I think it is more reasonable to finetune pretrained models for no more than 300 epochs, the budget used by the supervised baseline. Otherwise, it is hard to say whether the performance gain comes from longer finetuning or from the better initialization provided by MIM.

@keyu-tian
Owner

keyu-tian commented May 23, 2023

I see. But I would suggest not focusing too much on ImageNet finetuning. I feel the best way to judge whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pretraining and finetuning on the same dataset can be a kind of "data leakage" and doesn't match our eventual goals of self-supervised learning.

On real downstream tasks (COCO object detection & instance segmentation), SparK outperforms Swin+MIM, Swin+Supervised, Conv+Supervised, and Conv+Contrastive Learning, so these results are solid evidence of SparK's effectiveness.

@rayleizhu
Author

I see. But I would suggest not focusing too much on ImageNet finetuning. I feel the best way to judge whether MIM makes sense is to evaluate it on REAL downstream tasks (i.e., not on ImageNet), because doing pretraining and finetuning on the same dataset can be a kind of "data leakage" and doesn't match our eventual goals of self-supervised learning.

This makes sense to me. Thanks for the explanation.

@ds2268

ds2268 commented Aug 10, 2023

We basically follow A2-MIM's 300-epoch finetuning setting (i.e., the ResNet Strikes Back / RSB A2 recipe), and set 200/400 epochs for smaller/larger models, respectively. We exclude the 100-epoch RSB A3 setting since it uses a different resolution (160), but if it is of interest we could give it a try.

BTW, ConvNeXt V2 uses 400 or 600 epochs for its smaller models.

But for the B and H/L models, they use 50- and 100-epoch fine-tuning schedules, respectively (ConvNeXt V2 paper, A.1, Table 11). It would be nice to compare apples to apples in terms of fine-tuning epochs. What are the results after 50 epochs of SparK fine-tuning for ConvNeXt-B and 100 epochs for ConvNeXt-H?
