I need some help to reproduce DeiT-III finetuning results #167
Hi @bhheo,
Thank you for sharing the 192 model.
Hi @TouvronHugo, I still haven't succeeded in reproducing the finetune results, but I want to share my progress. I have tried finetuning with your 192 model weight, and I found that ... Next, I will try to downgrade my library versions. Regards
I got similar results with ... I don't know what I should do next to reproduce. Best
This is quite strange, as the most complex procedure is clearly pre-training and not finetuning.
What is interesting in your logs is that from epoch 0 it looks a bit worse than in our logs. But after epoch 0 the model weights normally don't change too much, since the lr is very low. Maybe there is a problem with the interpolation of the position encoding or with the loading of the weights.
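For reference, a minimal sketch of this kind of position-embedding interpolation, along the lines of what the DeiT repo does bicubically when loading a checkpoint at a new resolution (the function name and shapes here are illustrative, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, new_grid, num_extra_tokens=1):
    # pos_embed: (1, num_extra_tokens + old_grid**2, dim)
    extra = pos_embed[:, :num_extra_tokens]        # class (and distillation) tokens
    patches = pos_embed[:, num_extra_tokens:]      # per-patch position encodings
    dim = patches.shape[-1]
    old_grid = int(patches.shape[1] ** 0.5)
    # reshape to a 2D grid, resize with bicubic interpolation, flatten back
    patches = patches.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patches = F.interpolate(patches, size=(new_grid, new_grid),
                            mode='bicubic', align_corners=False)
    patches = patches.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat((extra, patches), dim=1)

# ViT-B/16: pretraining at 192px gives a 12x12 grid, finetuning at 224px needs 14x14
new_pos_embed = interpolate_pos_embed(torch.randn(1, 1 + 12 * 12, 768), new_grid=14)
print(new_pos_embed.shape)  # torch.Size([1, 197, 768])
```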
Thank you for your kindness. Yes, I'm using the DeiT repo at a2ffd162 with minor changes, such as a logger and a bugfix. I put a tab here, because without it there is an undefined variable: Line 341 in 9bfdc73
Except for this, I think my changes don't affect the training process. I observed that my train loss is much lower than in your logs.
I'll check that. Yes, you are right, a tab was missing. I fixed that ;)
Hi @bhheo, Did you solve your finetuning issue? I haven't had the time to compare my internal code and the public repo yet, but I should have some time in the next months. Best, Hugo
Hi @TouvronHugo, Unfortunately, the finetuning issue is not solved yet. Best
Hi @TouvronHugo and @bhheo, I also failed to reproduce the finetune performance using the officially released code & pre-trained weight. Specifically, under the same setting & configuration as @bhheo (I use the pre-trained weight @ 192 px here #167 (comment) and set crop-pct to 1.0 during inference), my best fine-tuning performance is 83.47 (I got this result within the first 5 epochs), which is also similar to @bhheo's. Since the officially released code has crop-pct=0.875, I guess there must be some other differences between the internal code and the public repo of DeiT III. Best
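For context on what crop-pct changes, here is a minimal torchvision sketch of the timm-style eval pipeline (the helper name is an illustration, not the repo's code):

```python
from torchvision import transforms

def eval_transform(img_size=224, crop_pct=1.0):
    # timm-style eval pipeline: resize the short side to img_size / crop_pct,
    # then center-crop to img_size. crop_pct=1.0 keeps the whole resized image;
    # the default 0.875 resizes to 256 and then crops a 32px border away at 224px.
    scale_size = int(img_size / crop_pct)
    return transforms.Compose([
        transforms.Resize(scale_size, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(img_size),
        transforms.ToTensor(),
    ])
```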
Hi, I haven't reached 83.8% accuracy.
The Lamb optimizer improves the performance. Fine-tuning costs only 20 epochs, so I can test diverse settings. Best
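For illustration, a sketch of such a short LAMB finetuning setup with timm; the model name, lr, and weight decay here are placeholders, not the official DeiT III hyperparameters:

```python
import timm
import timm.optim
import torch

# placeholder model and hyperparameters, not the official DeiT III values
model = timm.create_model('vit_base_patch16_224', num_classes=1000)
optimizer = timm.optim.Lamb(model.parameters(), lr=1e-4, weight_decay=0.02)
# ~20-epoch cosine schedule, matching the short finetuning described above
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
```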
Hi, I also have trouble reproducing some of the results. For example, I tried to reproduce ...
But at my epoch 55, I got ... Best,
Hi @tangjiasheng & @TouvronHugo, I guess there is something wrong with the input size for the ViT-H model.
Hi @tangjiasheng, ... Hugo
Hi @Yuxin-CV, Yes, good catch on the resolutions. With ViT-H/14 at resolutions 128 and 160 the code works; it only removes a few pixels from the border of the image, which does not have a significant effect. But it's cleaner to use 126 and 154 ;) Best, Hugo
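For context: with a patch size of 14 the input side should be a multiple of 14 (126 = 9 x 14, 154 = 11 x 14), while 128 and 160 leave a remainder that the patch embedding crops away. A tiny illustration:

```python
patch = 14  # ViT-H/14
for size in (126, 128, 154, 160):
    grid, leftover = divmod(size, patch)
    print(f"{size}px -> {grid}x{grid} patch grid, {leftover}px of border discarded")
```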
Hi @Yuxin-CV and @bhheo, Line 419 in cb1f48a
The model must be in training mode for the finetuning of DeiT III. So try to replace set_training_mode=args.finetune == '' with set_training_mode=True; if you have the time, don't hesitate to test this ;) (Without training mode, drop-path is not activated.) Best, Hugo (If this doesn't solve the problem I will look into it further, as promised, by early September ;) )
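A quick way to see why this matters: timm's `DropPath` (stochastic depth) is only active in training mode and degenerates to an identity in eval mode, so a wrong `set_training_mode` silently disables the regularization. A minimal demonstration (the import path is the one in timm 0.5.4, as used in this thread):

```python
import torch
from timm.models.layers import DropPath

dp = DropPath(drop_prob=0.2)
x = torch.ones(4, 3)

dp.train()   # training mode: rows are randomly zeroed, survivors rescaled by 1/0.8
print(dp(x))

dp.eval()    # eval mode: identity, so the drop-path regularization vanishes
print(dp(x))
```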
@TouvronHugo ... Best
I use the same setting. Validating the model on the 1k set by mapping the data using ... Best,
Hi, I got the result, and it is almost the same as the official log. Thank you for your advice, @TouvronHugo. Best
Great! I just fixed that in the code by adding a ...
Hi
Thank you for sharing the finetune code & training logs.
On IN-1k pretraining, I got similar results to your log: ViT-S 81.43 and ViT-B 82.88.
But I failed to reproduce the finetune performance, even with your official finetuning setting.
So, I would like to ask for advice or help.
Here is my fine-tune result with ViT-B on IN-1k.
![image](https://user-images.githubusercontent.com/8871141/171699657-f7d2a63c-f687-42a8-9aef-060e67832682.png)
I expected the performance to increase as in your fine-tune log, but instead the finetune degrades the performance.
I can't use `submitit`, so I used the following command on a 1-node, 8-GPU A100 machine, with the full args printed on the command line: ...
I think it is the same as your finetune setting.
I double-checked my code but I still don't know why the result is totally different.
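One sanity check that could narrow this down, as a sketch with placeholder names (assuming the DeiT-style checkpoint layout with weights under the `model` key): load the pretrained weights non-strictly, after any resolution-dependent tensors such as pos_embed have been adapted, and inspect which keys were skipped.

```python
import timm
import torch

# placeholder model/file names; the point is to surface silently skipped weights
model = timm.create_model('vit_base_patch16_224', num_classes=1000)
checkpoint = torch.load('pretrained_192.pth', map_location='cpu')
state_dict = checkpoint.get('model', checkpoint)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)
```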
I'm using different library versions: `torch 1.11.0a0+b6df043`, `torchvision 0.11.0a0`, `timm 0.5.4`.
It might cause some problems, but there was no problem in pretraining and the performance difference is too severe for a simple library version issue.
I'm sorry to keep bothering you, but could you please let me know if there is something wrong with my setting?
Or could you please share the ViT-B weights pretrained on IN-1k at 192x192 resolution, without finetuning at 224x224?
If you share the weights before finetuning, I can verify my finetune code without doubting my pretraining.