Model architecture search in TinyViT framework #141
Hi @NKSagarReddy , thanks for your attention to our work. You can follow the details provided in the supplementary material. We start with a 21M model and generate a set of candidate models around the basic model by adjusting the contraction factors. For example, the embedding dim of each stage can be increased or decreased by 32 × k, and the window size can be 7 or 14. The models that satisfy the constraints on the number of parameters and throughput are selected, and the corresponding config files are generated. We train these models from scratch on 99% of the ImageNet-1k training set and evaluate them on the remaining 1%. The models with the best validation accuracy are used as the basic models of the next step, which applies stricter constraints.
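The candidate-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the base embedding dims, and the parameter-budget check are all assumptions; only the "±32 × k per stage" and "window size 7 or 14" choices come from the comment.

```python
import itertools


def candidate_configs(base_embed_dims, step=32, max_k=1):
    """Enumerate candidates around a base model by adjusting contraction
    factors: each stage's embedding dim moves by +/- step * k (k <= max_k),
    and the window size is chosen from {7, 14}.  (Illustrative sketch.)"""
    per_stage = [
        [d + step * k for k in range(-max_k, max_k + 1)]
        for d in base_embed_dims
    ]
    for dims in itertools.product(*per_stage):
        for window in (7, 14):
            yield {"embed_dims": dims, "window_size": window}


def within_budget(cfg, param_count_fn, max_params):
    """Keep only candidates under the parameter budget; the throughput
    constraint mentioned in the comment is omitted for brevity."""
    return param_count_fn(cfg) <= max_params
```

With four stages and max_k=1 this yields 3⁴ × 2 = 162 candidates, which are then filtered by the parameter and throughput constraints before training.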
@wkcn Thank you for the information. I have one more doubt. Is the population at each stage constant (like in a genetic algorithm), or is it done by changing the contraction factors manually for each stage/generation with a varying population? In the paper, pretraining the 21M model took 140 GPU days, so if I have to train each submodel from scratch via a genetic algorithm, the time taken would be 140 × n × p GPU days (for n generations with population p). Also, could you share how many V100 GPUs were used during the search phase of this paper?
We did not dive into that setting. We used a constant population of 8, i.e. we randomly sample around 8 models which satisfy the constraints. It may be better to vary the population, but we did not try it.
When searching the model architecture, each model is trained on ImageNet-1k without knowledge distillation, which takes around 12 V100 GPU days per model (#107). A NAS method that trains a supernet and evaluates its subnets, like AutoFormer, may be better for reducing the training cost.
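One contraction step under the constant-population setting described above can be sketched like this. It is a hypothetical outline, not the released search code: `sample_fn`, `satisfies_fn`, and `score_fn` are placeholder callables standing in for candidate sampling, the parameter/throughput constraint check, and the train-on-99%/validate-on-1% evaluation respectively.

```python
import random


def contraction_step(base_cfg, sample_fn, satisfies_fn, score_fn,
                     population=8, seed=0):
    """One step of progressive model contraction (illustrative sketch):
    randomly sample a constant population of candidates around the base
    model that satisfy the constraints, score each candidate (in the
    paper: ~12 V100 GPU days of training per model), and keep the best
    one as the basic model for the next, stricter step."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < population:
        cand = sample_fn(base_cfg, rng)
        if satisfies_fn(cand):
            pool.append(cand)
    return max(pool, key=score_fn)
```

Repeating this step with progressively tighter constraints walks the 21M base model down to the smaller variants.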
I have tried to reproduce your work by implementing the search for tinier versions of the parent model, using the "constrained local search" mentioned in the paper.
Could you release the search algorithm that uses the progressive model contraction approach to find better architectures with good performance?