StacNAS

StacNAS proposes a stable and consistent optimization strategy for differentiable architecture search. The mean accuracy is largely improved over DARTS, and the variance over repeated experiments (without a multiple-run-and-pick-the-best strategy) is significantly reduced.

Figure: two-stage DAG (Two Stages)

Note: it is optional to keep the search-stage ZERO operation in the final architecture; it can serve as an attention mechanism that learns the relative contribution of the two paths entering the same node (sketched below).

Figure: StacNAS (Two Stages)
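For intuition, below is a minimal PyTorch sketch (not code from this repo; the module and parameter names are hypothetical) of how a retained ZERO operation can act as an attention weight over the two paths entering a node:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPathNode(nn.Module):
    # Combines two incoming paths; a third "zero" slot in alpha lets the softmax
    # down-weight both paths, so the weights learn their relative contribution.
    def __init__(self, op1: nn.Module, op2: nn.Module):
        super().__init__()
        self.op1, self.op2 = op1, op2
        self.alpha = nn.Parameter(torch.zeros(3))  # [path1, path2, ZERO]

    def forward(self, x1, x2):
        w = F.softmax(self.alpha, dim=0)           # w[2] is the ZERO share
        return w[0] * self.op1(x1) + w[1] * self.op2(x2)  # the ZERO path contributes nothing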

Requirements

Python >= 3.5.5, PyTorch >= 1.0.0, torchvision >= 0.2.0

Datasets

CIFAR-10 and CIFAR-100 can be downloaded automatically by torchvision; ImageNet needs to be downloaded manually (preferably to an SSD) following the instructions here.
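For reference, a minimal sketch of the automatic download via torchvision (the root path and transform are placeholders, not the repo's defaults):

import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()
cifar10 = torchvision.datasets.CIFAR10(
    root="/path/to/dataset", train=True, download=True, transform=transform)
cifar100 = torchvision.datasets.CIFAR100(
    root="/path/to/dataset", train=True, download=True, transform=transform)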

Pretrained models

The easiest way to get started is to evaluate our pretrained StacNAS models.

CIFAR-10/CIFAR-100 (cifar10.pth.tar) (cifar100.pth.tar)

python test.py \
    --name job_name \
    --dataset cifar10 \
    --data_dir /path/to/dataset/ \
    --save_dir /path/to/results/ \
    --seed 1 \
    --stage test \
    --batch_size 96 \
    --load_model_dir "models/cifar10.pth.tar"
  • Expected result (CIFAR-10): 1.88% test error rate with 4.48M model params. For CIFAR-100, replace --dataset cifar10 with --dataset cifar100.
  • Expected result (CIFAR-100): 12.9% test error rate with 4.36M model params.
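To inspect a pretrained checkpoint directly, a minimal sketch is below; the exact contents of cifar10.pth.tar (keys, stored metadata) are an assumption here, and test.py above remains the supported way to evaluate:

import torch

ckpt = torch.load("models/cifar10.pth.tar", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))                    # e.g. a state_dict plus training metadata
# state_dict = ckpt.get("state_dict", ckpt)    # then load into the evaluation network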

ImageNet (mobile setting) (imagenet.pth.tar)

  • Evaluation code and expected results coming soon.

Architecture search (stage 1)

To carry out stage 1 of the architecture search, run

python search.py \
    --name job_name \
    --dataset cifar10 \
    --data_dir /path/to/dataset/ \
    --save_dir /path/to/results/ \
    --seed 6 \
    --stage search1 \
    --batch_size 64 \
    --init_channels 16 \
    --num_cells 14 \
    --epochs 80 \
    --alpha_share

Note that train_ratio is the train/validation split ratio; train_ratio=1 means all 50,000 training images are used to search the architecture and the final result is taken as the best result.
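As an illustration of that split, here is a sketch of the idea (not search.py's actual argument handling):

import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

def make_search_loaders(train_set, train_ratio=0.5, batch_size=64):
    n = len(train_set)                            # 50,000 for CIFAR-10
    split = int(n * train_ratio)
    indices = torch.randperm(n).tolist()
    train_loader = DataLoader(train_set, batch_size=batch_size,
                              sampler=SubsetRandomSampler(indices[:split]))
    # with train_ratio=1 there is no held-out half: the architecture is searched
    # on all 50,000 images and the final result is taken directly
    valid_indices = indices[split:] if split < n else indices
    valid_loader = DataLoader(train_set, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_indices))
    return train_loader, valid_loader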

Architecture search (stage 2)

To carry out stage 2 of the architecture search, run

python search.py \
    --name job_name \
    --dataset cifar10 \
    --data_dir /path/to/dataset/ \
    --save_dir /path/to/results/ \
    --seed 6 \
    --stage search2 \
    --batch_size 64 \
    --init_channels 16 \
    --num_cells 20 \
    --epochs 80 \
    --alpha_share


Architecture evaluation (using full-sized models)

To evaluate our best cells by training from scratch, run

python augment.py \
    --name job_name \
    --dataset cifar10 \
    --data_dir /path/to/dataset/ \
    --save_dir /path/to/results/ \
    --seed 6 \
    --stage augment \
    --batch_size 96 \
    --init_channels 36 \
    --num_cells 20 \
    --grad_clip 5 \
    --aux_weight 0.4 \
    --epochs 600 \
    --alpha_share

Note that aux_weight=0.4 means an auxiliary head is added at the 2/3 position of the network, and aux_weight * auxiliary_loss is added to the final loss.
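A minimal sketch of how such an auxiliary loss is typically combined (function and variable names are illustrative, not augment.py internals):

import torch.nn.functional as F

def training_loss(logits, aux_logits, target, aux_weight=0.4):
    loss = F.cross_entropy(logits, target)
    if aux_logits is not None:                    # auxiliary head is active during training
        loss = loss + aux_weight * F.cross_entropy(aux_logits, target)
    return loss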

Feature clustering

To visualize the feature clustering, run

python feature.py \
    --name job_name \
    --dataset cifar10 \
    --data_dir /path/to/dataset/ \
    --save_dir /path/to/results/ \
    --seed 1 \
    --stage search1 \
    --batch_size 64

Note that feature.py loads the saved model from search stage 1.
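A hypothetical sketch of collecting intermediate features for clustering with a forward hook (feature.py's actual layer selection and clustering code may differ):

import torch

def collect_features(model, layer, loader, device="cpu"):
    feats = []
    handle = layer.register_forward_hook(
        lambda module, inp, out: feats.append(out.detach().flatten(1).cpu()))
    model.eval()
    with torch.no_grad():
        for images, _ in loader:
            model(images.to(device))
    handle.remove()
    return torch.cat(feats)                       # (num_images, feature_dim), ready for k-means etc.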

Run all

To run all the above directly, run

python run_all.py

Note that you should modify the parameters in run_all.py before running.
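A rough sketch of what chaining the stages from Python looks like; the real parameter handling lives in run_all.py itself, and the full flag sets shown in the sections above still apply:

import subprocess

common = ["--name", "job_name", "--dataset", "cifar10",
          "--data_dir", "/path/to/dataset/", "--save_dir", "/path/to/results/"]

subprocess.run(["python", "search.py", "--stage", "search1", *common], check=True)
subprocess.run(["python", "search.py", "--stage", "search2", *common], check=True)
subprocess.run(["python", "augment.py", "--stage", "augment", *common], check=True)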

Results

Table: CIFAR-10 benchmark. We report the mean and standard deviation over 8 single runs (i.e., search once and evaluate once, repeating the procedure 8 times with different seeds). base means the final architecture training procedure is exactly the same as DARTS for a fair comparison; fancy means the final training adopts extra training tricks for comparison with other NAS methods.

Table: ImageNet (mobile setting) benchmark.

Can architecture weights really learn the relative importance?

Figure: correlation of alpha predictions with stand-alone model accuracy.
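One simple way to quantify this is a rank correlation between the alphas and the stand-alone accuracies; the numbers below are made up purely for illustration:

from scipy.stats import kendalltau

alphas     = [0.31, 0.24, 0.18, 0.12, 0.09, 0.06]   # softmax-normalised alpha per candidate op
accuracies = [97.1, 96.8, 96.5, 96.0, 95.7, 95.2]   # accuracy of each op's stand-alone model

tau, p_value = kendalltau(alphas, accuracies)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")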

Found architectures

Figure: normal and reduction cells found on CIFAR-10 (cifar10_normal, cifar10_reduce)

Figure: normal and reduction cells found on CIFAR-100 (cifar100_normal, cifar100_reduce)

Figure: normal and reduction cells found on ImageNet (imagenet_normal, imagenet_reduce)

Reference

https://github.com/quark0/darts
