Inquiry on pretraining acc #62

Open
yyou1996 opened this issue Jan 30, 2021 · 22 comments

@yyou1996

Thanks for your excellent work. Would you mind me asking what the pretraining accuracy on imagenet2012 is, i.e. for the checkpoint that is then used for finetuning?

@andsteing (Collaborator)

Note that the published checkpoints were pretrained on imagenet21k. Pretraining accuracy on the validation set at the end of pretraining was:

| name | val_prec_1 |
| --- | --- |
| ViT-B_16 | 47.88% |
| ViT-B_32 | 44.04% |
| ViT-L_16 | 49.90% |
| ViT-L_32 | 45.42% |
| ViT-H_14 | 49.06% |

@cissoidx commented Jul 29, 2021

@andsteing Hello, do you by any chance have the pretraining accuracy on imagenet1k? I mean pretraining from absolute scratch, not finetuning.
I reached 48.4% using ViT-B_16 on the imagenet1k validation set and would like to have a reference if you have one.

@andsteing (Collaborator) commented Jul 29, 2021

Sure, results after 300 epochs (edit: L/32 and L/16 were trained for 90 epochs) of training on i1k from scratch are below:

| name | val_prec_1 |
| --- | --- |
| ViT-B/32 i1k | 69.19% |
| ViT-B/16 i1k | 74.79% |
| ViT-L/32 i1k | 66.90% |
| ViT-L/16 i1k | 72.59% |

@cissoidx

Thanks. I trained for far fewer epochs; will try again.

@andsteing (Collaborator)

For training ViT from scratch, you'll find that data augmentation and model regularization really help with medium-sized datasets (such as ImageNet and ImageNet-21k), albeit with even longer training schedules (1000 epochs for ImageNet, and 300 epochs for ImageNet-21k).
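
As a concrete illustration of one such regularization, here is a minimal mixup sketch in JAX; the `mixup` helper and `alpha=0.2` below are illustrative assumptions, not the exact AugReg recipe used in the papers.

```python
# Minimal mixup sketch in JAX (illustrative assumption, not the exact AugReg recipe).
import jax
import jax.numpy as jnp


def mixup(key, images, onehot_labels, alpha=0.2):
    """Mixes each example with a randomly chosen partner from the same batch."""
    key_lam, key_perm = jax.random.split(key)
    lam = jax.random.beta(key_lam, alpha, alpha)              # mixing coefficient in (0, 1)
    perm = jax.random.permutation(key_perm, images.shape[0])  # partner indices
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * onehot_labels + (1.0 - lam) * onehot_labels[perm]
    return mixed_images, mixed_labels
```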

@cissoidx

Thanks for the info.

@cissoidx

@andsteing What loss function do you use for pretraining? Is it the same one you use for finetuning? I have seen people use a semantic loss; is that necessary to reach SOTA?

@andsteing (Collaborator)

We used sigmoid cross-entropy during pre-training (and softmax cross-entropy for fine-tuning).
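
For reference, here is a minimal JAX sketch of the two losses with one-hot labels (my own illustration, not the repo's actual training code):

```python
# Sketch of the two losses for C-class classification with one-hot labels.
# Illustration only; not the exact loss code behind the numbers in this thread.
import jax.numpy as jnp
from jax import nn


def softmax_xent(logits, onehot):
    # Standard multi-class cross-entropy (used for fine-tuning).
    logp = nn.log_softmax(logits, axis=-1)
    return -jnp.sum(onehot * logp, axis=-1).mean()


def sigmoid_xent(logits, onehot):
    # Treats every class as an independent binary problem (used for pre-training).
    logp = nn.log_sigmoid(logits)
    log1mp = nn.log_sigmoid(-logits)   # log(1 - sigmoid(x)) == log_sigmoid(-x)
    return -jnp.sum(onehot * logp + (1.0 - onehot) * log1mp, axis=-1).mean()
```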

@cissoidx

Hi @andsteing,

AFAIK, we normally use sigmoid CE loss in multi-label tasks, since we assume the labels are independent, and softmax CE loss in single-label tasks, since we are looking for the max class. The two are actually the same in binary classification.

Pretraining is not a multi-label task, so why do you use sigmoid CE loss?
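
As a quick illustration of that binary-case equivalence: with two classes and logits `[z, 0]`, the softmax probability of the first class equals `sigmoid(z)`, so the two losses coincide (my own check, not from the repo).

```python
# Binary case: softmax over [z, 0] gives the same class probability as sigmoid(z).
import jax.numpy as jnp
from jax import nn

z = 1.7                                          # arbitrary example logit
p_softmax = nn.softmax(jnp.array([z, 0.0]))[0]   # ~0.8455
p_sigmoid = nn.sigmoid(jnp.array(z))             # ~0.8455
print(p_softmax, p_sigmoid)
```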

@cissoidx

I found that if I use softmax CE loss, the loss curve decreases quickly, but if I use sigmoid CE loss, the loss curve barely decreases. Is that normal?

@andsteing (Collaborator)

We experimented with both softmax CE and sigmoid CE, and found that sigmoid CE works better even with single-label i1k; see also the "Are we done with ImageNet?" paper for similar results.

As for the training loss, we observed the following evolution:

[image: training loss curve over the course of pre-training]

@cissoidx

Thanks for replying.

@cissoidx commented Sep 6, 2021

@andsteing Are the accuracies you mentioned obtained at 224×224 resolution?

@cissoidx commented Sep 8, 2021

@andsteing Can you please confirm that these pretraining accuracies are obtained at 224×224 resolution?

@andsteing (Collaborator)

(just came back from holiday)

Yes, the i1k pre-training accuracies above are indeed for 224×224 resolution. We only changed the resolution for fine-tuning runs.

@cissoidx

@andsteing Thanks for your help. I finally reached 75.5% validation accuracy when pretraining on in1k with B/16, even without some of the tricks mentioned in your papers, such as stochastic depth (I do not use it), a linear schedule (I used cosine), Adam (I used SGD), and gradient norm clipping (I do not use it). I just wonder whether there is an official statement of the accuracy you mentioned above. How should I cite your work properly?
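
For anyone reproducing this setup, here is a minimal optax sketch of a cosine schedule with warmup plus SGD momentum; every hyperparameter value below (peak learning rate, warmup steps, batch size, momentum) is an illustrative assumption, not the configuration behind the 75.5% above.

```python
# Sketch: cosine learning-rate schedule with linear warmup + SGD momentum in optax.
# All hyperparameter values are illustrative assumptions.
import optax

batch_size = 4096
total_steps = 300 * 1_281_167 // batch_size      # ~300 epochs of ImageNet-1k
warmup_steps = 10_000

schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=3e-3,
    warmup_steps=warmup_steps,
    decay_steps=total_steps,
    end_value=0.0,
)
tx = optax.sgd(learning_rate=schedule, momentum=0.9)
```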

@cissoidx

@andsteing In the paper "How to train your ViT", Figure 4, left plot, ViT on ImageNet-1k at 300 epochs reaches 83%. Your comment above does not match these numbers. I might have missed something; can you please clarify?

> Sure, results after 300 epochs (edit: L/32 and L/16 were trained for 90 epochs) of training on i1k from scratch are below:
>
> | name | val_prec_1 |
> | --- | --- |
> | ViT-B/32 i1k | 69.19% |
> | ViT-B/16 i1k | 74.79% |
> | ViT-L/32 i1k | 66.90% |
> | ViT-L/16 i1k | 72.59% |

@cissoidx

[screenshot of Figure 4, left plot, from "How to train your ViT"]

This is the result I am referring to. Looking forward to your reply.

@andsteing (Collaborator)

Hi @cissoidx

This thread started on January 30th 2021 and is about the i1k from-scratch training in the original ViT paper. The paper "How to train your ViT" applies additional AugReg to improve those numbers, but it was only published in June 2021, so I thought it would not apply to the original question (and mixing numbers from two different papers could make the thread more confusing).

You can find all the data about the pre-training and fine-tuning for "How to train your ViT" in this Colab:
https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb

Best, Andreas

@justHungryMan

Hi @cissoidx,
I'm training ViT-B/16 from scratch on imagenet1k now.
I only get 47.6% val accuracy, and you also got 48.4% (#62 (comment)).
Can you tell me how you improved the accuracy?
All my parameters are the same as in the ViT paper.

@cissoidx

@justHungryMan I guess it is not possible to reach the paper's SOTA with the default hyperparameters. Since they do not release the training code, you have to tune the hyperparameters yourself. Some suggestions: use ImageNet augmentation (as provided by the RandAugment package), and weight decay = 0.004.
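
One way the suggested weight decay of 0.004 could be wired into an optax optimizer (a sketch under my own assumptions; the L2-style placement of the decay term and the learning rate are illustrative):

```python
# Sketch: weight decay = 0.004 combined with SGD momentum in optax.
# The decay is added to the gradients before the SGD update (L2-style);
# the learning rate is an illustrative assumption.
import optax

weight_decay = 4e-3
tx = optax.chain(
    optax.add_decayed_weights(weight_decay),     # adds wd * params to the updates
    optax.sgd(learning_rate=0.1, momentum=0.9),
)
```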

@andsteing (Collaborator)

See also the discussion in #153.
