
Fine-tuning details #45

Closed
nakashima-kodai opened this issue Jan 14, 2021 · 14 comments
Labels
question Further information is requested

Comments

nakashima-kodai commented Jan 14, 2021

Hi,

I am trying to replicate the results of the paper that have been fine-tuned to datasets such as CIFAR-10 and Stanford Cars. Could you give details about hyper-parameters used (like batch size, learning rate etc.)

Thanks.

@haoweiz23

Following this issue. I have the same question.

@Christine620

I have the same question too

@TouvronHugo
Contributor

Hi @nakashima-kodai, @Zhu-haow and @Christine620,

Thanks for your question. For CIFAR-10 and Cars we use:

  • Image size: 224 or 384 (to keep things simple, we don't change the patch size.)
  • Batch size: 768
  • lr: 0.01
  • optimizer: SGD
  • weight-decay: 1e-4
  • epochs: 1000

We remove random erasing and stochastic depth. All other elements are the same as for training on ImageNet.
You can also use AdamW for fine-tuning; in that case just take a smaller lr and keep the weight decay used for ImageNet training.
Do not hesitate if you have other questions.

Best,

Hugo
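
For readers trying to reproduce this recipe, here is a minimal PyTorch sketch of the setup described above (SGD, lr 0.01, weight decay 1e-4, 224 input, CIFAR images upsampled). It is not the authors' script: the timm model name, the momentum value, and the augmentation choices are assumptions, and the learning-rate schedule and 1000-epoch outer loop are omitted.

```python
# Minimal sketch of the recipe quoted above; not the authors' exact code.
# Assumptions: timm and torchvision are installed, momentum=0.9, bicubic resize,
# and only a basic flip augmentation (the full DeiT augmentation is not reproduced).
import timm
import torch
from torchvision import datasets, transforms

# ImageNet-pretrained DeiT with the head replaced for the 10 CIFAR classes.
model = timm.create_model("deit_base_patch16_224", pretrained=True, num_classes=10).cuda()

# CIFAR images are upsampled from 32x32 to the fine-tuning resolution (224 here).
train_tf = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=768, shuffle=True, num_workers=8)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# One pass over the data; the real recipe repeats this for 1000 epochs with a schedule.
model.train()
for images, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), targets.cuda())
    loss.backward()
    optimizer.step()
```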

@TouvronHugo added the question label on Jan 22, 2021
@nicolas-dufour

Hi @TouvronHugo
I was wondering if any particular procedure is needed to fine-tune the distilled model?
Thank you for your help!

@TouvronHugo
Contributor

Hi @nicolas-dufour,
It is better to keep the distillation signal during fine-tuning; otherwise there is nothing special to do.
Do not hesitate if you have other questions.
Best,
Hugo
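
For context, here is a minimal sketch of what keeping the distillation signal can look like in the DeiT-style hard-distillation setup, where the model has a class head and a distillation head. The function name and signature are illustrative; the repository's losses.py is the reference implementation.

```python
# Sketch of DeiT-style hard distillation during fine-tuning: the class-token head
# is trained on the true labels and the distillation-token head on the teacher's
# hard predictions. Illustrative only.
import torch
import torch.nn.functional as F

def hard_distillation_loss(cls_logits: torch.Tensor,
                           dist_logits: torch.Tensor,
                           teacher_logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
    teacher_labels = teacher_logits.argmax(dim=1)              # hard teacher labels
    loss_cls = F.cross_entropy(cls_logits, targets)            # class token vs. ground truth
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)   # distillation token vs. teacher
    return 0.5 * loss_cls + 0.5 * loss_dist
```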

@claverru

Hello @TouvronHugo, first of all congratulations on your great work, and thanks for replying here. I have two questions regarding this topic:

  • How do we keep the distillation signal when fine-tuning on a different dataset with different classes? Am I supposed to have another conv teacher model trained on the same task beforehand? If that is not needed, what signal is supposed to be OK, the same as in the standard head? Maybe something softer?
  • Is there any way to fine-tune a pretrained model at a higher image size? E.g., I take the 384 base model and fine-tune with images of size 512. I've read in the main.py script that you interpolate the position embedding, but even after reading your paper it isn't clear to me what you do with that. Can I interpolate the position embedding to fine-tune at a higher size for a different task?

@forjiuzhou

> Image size: 224 or 384 (to keep things simple, we don't change the patch size.)

Hi, a question about fine-tuning on CIFAR: how can one train on CIFAR with a 224 or 384 image size? What does a 224 or 384 image size mean here?

@TouvronHugo
Contributor

Hi @claverru,
Thanks for your questions,

  1. For transfer learning, I think that fine-tuning with a teacher that has itself been fine-tuned on the target dataset is best. Nevertheless, not using the distillation signal and doing a classic fine-tuning also works (we did that in DeiT).

  2. You can fine-tune the model at any resolution simply by interpolating the position embedding and then fine-tuning the network.

I hope I have answered your questions,
Best,
Hugo
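
For point 2, the interpolation Hugo mentions is the resize of the patch position embeddings that main.py applies when a checkpoint is loaded at a new resolution. Below is a self-contained sketch of the idea; the helper name and signature are illustrative, not the repository's exact code.

```python
# Sketch of ViT/DeiT position-embedding interpolation for a new input resolution.
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor,
                          new_grid_size: int,
                          num_extra_tokens: int = 1) -> torch.Tensor:
    """Resize a position embedding to a new patch grid.

    pos_embed:        (1, num_extra_tokens + old_grid**2, dim) tensor from the checkpoint.
    new_grid_size:    patches per side at the new resolution, e.g. 512 // 16 = 32.
    num_extra_tokens: 1 for the class token, 2 if a distillation token is present.
    """
    dim = pos_embed.shape[-1]
    extra = pos_embed[:, :num_extra_tokens]      # class (and distillation) tokens, kept as-is
    grid = pos_embed[:, num_extra_tokens:]       # patch position embeddings
    old_grid_size = int(grid.shape[1] ** 0.5)
    grid = grid.reshape(1, old_grid_size, old_grid_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_grid_size, new_grid_size),
                         mode="bicubic", align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid_size * new_grid_size, dim)
    return torch.cat([extra, grid], dim=1)
```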

@TouvronHugo
Contributor

Hi @forjiuzhou,

A 224 or 384 image size means using images with a resolution of 224x224 or 384x384 pixels. On CIFAR it is necessary to interpolate (upsample) the original images, which are 32x32.

Best,
Hugo
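
Concretely, this just means each 32x32 CIFAR image is upsampled before entering the network. An illustrative torchvision snippet follows; this is roughly what the repository's data pipeline does via the --input-size argument, not its exact code.

```python
# A 32x32 CIFAR-sized image upsampled (bicubic) to the 224x224 fine-tuning resolution.
from PIL import Image
from torchvision import transforms

resize_to_finetune_res = transforms.Resize(
    224, interpolation=transforms.InterpolationMode.BICUBIC)

img = Image.new("RGB", (32, 32))              # stand-in for a CIFAR-10 sample
print(resize_to_finetune_res(img).size)       # -> (224, 224)
```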

@zizhaozhang

@TouvronHugo Hi

I tried this script, could you help me verify it?

I found that a batch size of 768 will OOM even with deit_small; I had to decrease it to a batch size of 256.

```bash
MODEL=deit_small_distilled_patch16_224
FT='https://dl.fbaipublicfiles.com/deit/deit_small_distilled_patch16_224-649709d9.pth'
IS=224

LR=0.01
WD=1e-4
EPO=1000

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
    --master_port=10000 --nproc_per_node=4 --use_env main.py \
    --model ${MODEL} --batch-size 768 \
    --data-path ${SAVEPATH} --output_dir ${CKPTPATH} --finetune ${FT} \
    --data-set 'CIFAR' --lr ${LR} --weight-decay ${WD} --epochs ${EPO} \
    --opt 'sgd' --input-size=${IS} --num_workers=4
```

@haoweiz23

@TouvronHugo Hi,
I wonder whether a smaller batch size needs a smaller lr and min-lr?
Is your baseline batch size 768?

Will fine-tuning for more epochs (more than 30) give higher accuracy?

Does a bigger model (deit-large) need more fine-tuning epochs or a bigger learning rate?

@TouvronHugo
Contributor

As there is no more activity on this issue, I will close it, but feel free to reopen it if needed.

@cashincashout

@TouvronHugo Hi, I'm also wondering about the training recipe for the CIFAR-100 and Flowers datasets. Thanks for your help!

@TouvronHugo
Contributor

Hi @jizxny,
For the CIFAR-100 and Flowers datasets you can use the same hparams as for CIFAR-10 and Cars.
Best,
Hugo
