Model architecture search in TinyViT framework #141

Closed
NKSagarReddy opened this issue Dec 21, 2022 · 3 comments

NKSagarReddy commented Dec 21, 2022

I have been trying to reproduce your work by finding the search algorithm that produces tinier versions of the parent model using the "constrained local search" mentioned in the paper.

Could you release the search algorithm in which you used the progressive model contraction approach to find smaller architectures with good performance?


wkcn commented Dec 29, 2022

Hi @NKSagarReddy , thanks for your attention to our work.

You can follow the details provided in the supplementary material.

We start with a 21M model and generate a set of candidate models around this base model by adjusting the contraction factors.

For example, the embedding dim of each stage can be increased or decreased by 32 x k. The window size could be 7 or 14.

The models that satisfy the constraints on the number of parameters and throughput are selected, and a corresponding *.yaml config file is generated for each of them.
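For concreteness, a minimal sketch of this candidate-generation step could look like the code below. All names and config keys here (generate_candidates, count_params, throughput, embed_dims, depths) are illustrative placeholders, not the actual TinyViT search code.

```python
# Hypothetical sketch of one candidate-generation step of the constrained local search.
import itertools
import os
import yaml

def generate_candidates(base, max_params, min_throughput):
    """Enumerate neighbours of `base` by perturbing its contraction factors."""
    candidates = []
    deltas = [-32, 0, 32]      # embedding dim of each stage moves in steps of 32
    window_sizes = [7, 14]     # window size is either 7 or 14
    for shift in itertools.product(deltas, repeat=len(base["embed_dims"])):
        for win in window_sizes:
            cfg = dict(base)
            cfg["embed_dims"] = [d + s for d, s in zip(base["embed_dims"], shift)]
            cfg["window_size"] = win
            # Keep only candidates that respect the parameter / throughput constraints.
            if count_params(cfg) <= max_params and throughput(cfg) >= min_throughput:
                candidates.append(cfg)
    return candidates

def count_params(cfg):
    # Placeholder: in practice, instantiate the model and count its parameters.
    return sum(d * d * n for d, n in zip(cfg["embed_dims"], cfg["depths"])) * 12

def throughput(cfg):
    # Placeholder: in practice, benchmark images/second on the target GPU.
    return 1e9 / count_params(cfg)

def dump_configs(candidates, out_dir="configs/search"):
    """Write one *.yaml config per surviving candidate."""
    os.makedirs(out_dir, exist_ok=True)
    for i, cfg in enumerate(candidates):
        with open(f"{out_dir}/candidate_{i}.yaml", "w") as f:
            yaml.safe_dump(cfg, f)
```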

We train these models from scratch on 99% of the ImageNet-1k training set and evaluate them on the remaining 1%.

The models with the best validation accuracy are used as the base models for the next step, which applies stricter constraints.
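Putting these steps together, a hedged sketch of the outer progressive-contraction loop might look as follows. It builds on the hypothetical generate_candidates above; train_from_scratch and evaluate are assumed placeholders for the usual training/evaluation pipeline on a 99%/1% split of the ImageNet-1k training set.

```python
# Hedged sketch of the progressive model contraction loop described above.
def progressive_contraction(base_config, constraint_schedule):
    """constraint_schedule: list of (max_params, min_throughput) pairs, strictest last."""
    current = base_config
    for max_params, min_throughput in constraint_schedule:
        candidates = generate_candidates(current, max_params, min_throughput)
        scored = []
        for cfg in candidates:
            model = train_from_scratch(cfg, split="train_99")  # 99% of the train set
            acc = evaluate(model, split="train_1")              # held-out 1% as validation
            scored.append((acc, cfg))
        # Keep the best-performing candidate (simplified to a single model here)
        # as the base model for the next, stricter contraction step.
        current = max(scored, key=lambda t: t[0])[1]
    return current

def train_from_scratch(cfg, split):
    # Placeholder: train the model defined by `cfg` on the given split.
    raise NotImplementedError

def evaluate(model, split):
    # Placeholder: return top-1 accuracy on the given split.
    raise NotImplementedError
```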

wkcn added the TinyViT label Dec 29, 2022
@NKSagarReddy
Author

@wkcn Thank you for the information.

I have one more question.

Is the population at each stage constant (as in a genetic algorithm), or do you change the contraction factors manually for each stage/generation with a varying population?

In the paper, pretraining the 21M model takes 140 GPU-days. So if each submodel has to be trained from scratch in a genetic-algorithm-style search, the total time would be roughly 140 · n · p GPU-days,
n - number of generations/evolutions,
p - population size,
which could be quite large for a normal search phase.

Also, could you share how many V100 GPUs were used during the search phase of this paper?


wkcn commented Dec 29, 2022

Is the population at each stage constant (as in a genetic algorithm), or do you change the contraction factors manually for each stage/generation with a varying population?

We did not explore that setting in depth. We used a constant population of 8, i.e. we randomly sample around 8 models that satisfy the constraints. Varying the population might work better, but we did not try it.
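For illustration, the sampling step could be as simple as the following sketch, building on the hypothetical generate_candidates above:

```python
import random

# Hypothetical: keep a constant population of 8 models from the constrained
# candidate set produced by generate_candidates() in the sketch above.
def sample_population(candidates, size=8, seed=0):
    random.seed(seed)
    return random.sample(candidates, k=min(size, len(candidates)))
```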

Training cost for searching the model architecture

When searching the model architecture, each model is trained on ImageNet-1k without knowledge distillation, which takes around 12 V100 GPU-days per model (#107). A NAS method that trains a supernet and evaluates its subnets, like AutoFormer, may reduce the training cost.
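As a rough back-of-the-envelope illustration using only the numbers mentioned in this thread (8 candidates per step, ~12 V100 GPU-days each; the number of contraction steps is not stated here):

```python
# Approximate search cost per contraction step, based on the figures in this thread.
candidates_per_step = 8        # constant population mentioned above
gpu_days_per_model = 12        # ~12 V100 GPU-days per candidate, w/o distillation
print(candidates_per_step * gpu_days_per_model)  # ~96 V100 GPU-days per step
```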

wkcn closed this as completed Feb 5, 2023