Model architecture search in TinyViT framework #141
Hi @NKSagarReddy , thanks for your attention to our work. You can follow the details provided in the supplementary material. We start with a 21M model and generate a set of candidate models around the basic model by adjusting the contraction factors. For example, the embedding dim of each stage can be increased or decreased by 32 × k, and the window size can be 7 or 14. The models that satisfy the constraints on the number of parameters and throughput are selected, and the corresponding config files are generated. We train these models from scratch on 99% of the ImageNet-1k training set and evaluate them on the remaining 1%. The models with the best validation accuracy are used as the basic models of the next step, which applies stricter constraints.
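The candidate-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the base embedding dims, and the parameter-budget check are all assumptions; only the "±32 × k per stage" and "window size 7 or 14" choices come from the comment.

```python
import itertools


def candidate_configs(base_embed_dims, step=32, max_k=1):
    """Enumerate candidates around a base model by adjusting contraction
    factors: each stage's embedding dim moves by +/- step * k (k <= max_k),
    and the window size is chosen from {7, 14}.  (Illustrative sketch.)"""
    per_stage = [
        [d + step * k for k in range(-max_k, max_k + 1)]
        for d in base_embed_dims
    ]
    for dims in itertools.product(*per_stage):
        for window in (7, 14):
            yield {"embed_dims": dims, "window_size": window}


def within_budget(cfg, param_count_fn, max_params):
    """Keep only candidates under the parameter budget; the throughput
    constraint mentioned in the comment is omitted for brevity."""
    return param_count_fn(cfg) <= max_params
```

With four stages and max_k=1 this yields 3⁴ × 2 = 162 candidates, which are then filtered by the parameter and throughput constraints before training.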
@wkcn Thank you for the information. I have one more doubt. Is the population at each stage constant (like in a genetic algorithm), or is it done by changing the contraction factors manually for each stage/generation with a varying population? In the paper, pretraining the 21M model took 140 GPU days, so if I have to train each submodel from scratch via a genetic algorithm, the time taken would be 140 × n × p GPU days (for n generations with population p). Also, could you share how many V100 GPUs were used during the search phase of this paper?
We did not dive into that setting. We used a constant population of 8, i.e. we randomly sample around 8 models which satisfy the constraints. It may be better to vary the population, but we did not try it.
When searching the model architecture, each model is trained on ImageNet-1k without knowledge distillation, which takes around 12 V100 GPU days per model (#107). A NAS method that trains a supernet and evaluates its subnets, like AutoFormer, may be better for reducing the training cost.
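One contraction step under the constant-population setting described above can be sketched like this. It is a hypothetical outline, not the released search code: `sample_fn`, `satisfies_fn`, and `score_fn` are placeholder callables standing in for candidate sampling, the parameter/throughput constraint check, and the train-on-99%/validate-on-1% evaluation respectively.

```python
import random


def contraction_step(base_cfg, sample_fn, satisfies_fn, score_fn,
                     population=8, seed=0):
    """One step of progressive model contraction (illustrative sketch):
    randomly sample a constant population of candidates around the base
    model that satisfy the constraints, score each candidate (in the
    paper: ~12 V100 GPU days of training per model), and keep the best
    one as the basic model for the next, stricter step."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < population:
        cand = sample_fn(base_cfg, rng)
        if satisfies_fn(cand):
            pool.append(cand)
    return max(pool, key=score_fn)
```

Repeating this step with progressively tighter constraints walks the 21M base model down to the smaller variants.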
I have tried to reproduce your work by implementing the search for tinier versions of the parent model, using the "constrained local search" mentioned in the paper.
Could you release the search algorithm that uses the progressive model contraction approach to find better architectures with good performance?