Question about ViT-augreg ("How to train?") fine-tuning transfer #60

Closed
lucasb-eyer opened this issue Nov 7, 2023 · 2 comments

@lucasb-eyer
Collaborator

We received the following question by e-mail from @alexlioralexli, but think it is of general interest:

  1. What are the details of the fine-tuning process?
  2. What commands reproduce the pre-training and fine-tuning runs from the paper?
@lucasb-eyer added the question label Nov 7, 2023
@lucasb-eyer
Collaborator Author

First answer by @andsteing

  1. We used our default transfer config, big_vision/configs/transfer.py, which uses an inception crop and a random horizontal flip in its preprocessing (a rough sketch of that preprocessing follows this list).
  2. As for pre-training, refer to these configs: big_vision/configs/vit_i21k.py and big_vision/configs/vit_i1k.py (see the module pydoc for more information).
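In case it is useful, here is a minimal sketch of that preprocessing written with plain TensorFlow image ops. It is not the verbatim big_vision pp pipeline; the output size and the crop parameters below are assumptions for illustration only.

```python
import tensorflow as tf

def inception_crop_and_flip(image, resize_to=224):
  """Sketch: inception-style random crop followed by a random horizontal flip.

  The area/aspect-ratio ranges and the 224px output size are assumptions,
  not values read out of big_vision/configs/transfer.py.
  """
  # Sample a random crop covering roughly 5%-100% of the image area with a
  # mildly distorted aspect ratio -- the classic "inception crop".
  begin, size, _ = tf.image.sample_distorted_bounding_box(
      tf.shape(image),
      bounding_boxes=tf.zeros([0, 0, 4], tf.float32),
      area_range=(0.05, 1.0),
      aspect_ratio_range=(0.75, 1.33),
      min_object_covered=0.0,
      use_image_if_no_bounding_boxes=True)
  image = tf.slice(image, begin, size)
  image = tf.image.resize(image, (resize_to, resize_to))
  # Random left-right flip.
  return tf.image.random_flip_left_right(image)
```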

@lucasb-eyer
Collaborator Author

And an addition by me, after checking the old training logs; a free-form summary:

We select these on minival (held out from train); see the selection sketch right after this list:

  1. We sweep the learning rate over 0.03, 0.01, 0.003, 0.001.
  2. We sweep the number of fine-tuning steps over 500, 2500, 10k, 20k.
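As a minimal sketch of that selection (assuming a hypothetical finetune_and_eval helper that runs one fine-tuning job and returns its minival accuracy; it is not a big_vision function):

```python
import itertools

# Sweep grid from the list above; selection is by minival accuracy.
LEARNING_RATES = (0.03, 0.01, 0.003, 0.001)
FINETUNE_STEPS = (500, 2_500, 10_000, 20_000)

def select_best(finetune_and_eval):
  """Return the (lr, steps) pair with the highest minival accuracy.

  `finetune_and_eval(lr, steps) -> float` is a hypothetical callable that
  fine-tunes once with the given hyperparameters and evaluates on minival.
  """
  return max(
      itertools.product(LEARNING_RATES, FINETUNE_STEPS),
      key=lambda hp: finetune_and_eval(*hp))
```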

A couple of probably important settings that are fixed (not swept); they should be visible in the config Andreas linked, and are summarized in a sketch after this list:

  • inception-crop and flip-lr almost everywhere, except where that would be a completely silly thing to do (see the BiT paper appendix)
  • no dropout or stochastic depth
  • no mixup, no randaugment
  • no weight decay
  • batch-size 512
  • softmax cross-entropy loss
  • SGDMomentum optimizer, momentum 0.9 in bfloat16.
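For readability, the fixed settings above can be summarized as an illustrative ml_collections-style config. The field names below are assumptions and may not match big_vision/configs/transfer.py; the authoritative values live in that file.

```python
import ml_collections

def get_fixed_transfer_settings():
  """Illustrative summary of the non-swept transfer settings listed above."""
  config = ml_collections.ConfigDict()
  config.batch_size = 512            # fixed batch size
  config.loss = 'softmax_xent'       # softmax cross-entropy loss
  config.optimizer = 'sgd_momentum'  # SGD with momentum...
  config.momentum = 0.9              # ...of 0.9, kept in bfloat16
  config.weight_decay = 0.0          # no weight decay
  config.dropout = 0.0               # no dropout
  config.stochastic_depth = 0.0      # no stochastic depth
  config.mixup = None                # no mixup
  config.randaugment = None          # no RandAugment
  # Swept separately on minival (see the lists above):
  config.lr = 0.01                   # one of {0.03, 0.01, 0.003, 0.001}
  config.total_steps = 10_000        # one of {500, 2500, 10k, 20k}
  return config
```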

@google-research locked and limited conversation to collaborators Nov 7, 2023
@lucasb-eyer converted this issue into discussion #62 Nov 7, 2023
