### Guide: Predictor-based Architecture Search in PIDS

This guide covers the architecture search part of the PIDS paper. In the sample stage of predictor-based NAS, it is most efficient to run one search per GPU and to distribute the sampling workload across GPUs.



In [None]:
# Set your project folder as follows. The working directory should be the same as the project folder. 
import os
os.chdir("../..")

### Download the Dataset

##### The dataset is hosted on Box. You can download it, together with the related runtime files, via the link below.

In [None]:
# Dataset download link (open it in a browser):
# https://duke.box.com/s/dajlest7kwcla1syqio0dma7rk284s3k

### Re-compile CPP wrappers

It is better to re-run the compilation so that the C++ wrappers are built for your local machine. Run the shell script in `pids_core/cpp_wrappers` from within that directory.

## Sample Stage
The sample stage collects a large number of architectures for predictor training. You may need ~2K samples to train a sufficiently good predictor (~2*8 V100s).

### Sample Architecture-Performance Pairs

#### Architecture Search on SemanticKITTI (Sample Stage)

In [None]:
!CUDA_VISIBLE_DEVICES=2 python -u main_sample.py \
    --task semantickitti \
    --search_config search_semantickitti_cfg \
    --model_root ./experiments/experiments-pids-new/semantickitti-pids-distribute0 \
    --budget 500
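
To distribute sampling, one option is to launch the same command once per GPU, giving each run its own `--model_root` suffix so that a `distribute*` pattern can later collect all runs. The sketch below only builds the command strings. The helper name `build_sample_commands`, the even budget split, and the `distribute{i}` numbering are assumptions for illustration, not part of the repository.

```python
import shlex


def build_sample_commands(task, search_config, root, gpus, total_budget):
    """Build one sampling command per GPU (hypothetical helper)."""
    # ASSUMPTION: the total budget is split evenly across workers.
    per_worker = total_budget // len(gpus)
    commands = []
    for worker_id, gpu in enumerate(gpus):
        # ASSUMPTION: each worker writes to its own "-distribute{i}" folder.
        model_root = f"{root}/{task}-pids-distribute{worker_id}"
        args = [
            "python", "-u", "main_sample.py",
            "--task", task,
            "--search_config", search_config,
            "--model_root", model_root,
            "--budget", str(per_worker),
        ]
        quoted = " ".join(shlex.quote(a) for a in args)
        commands.append(f"CUDA_VISIBLE_DEVICES={gpu} {quoted}")
    return commands


for cmd in build_sample_commands(
    task="semantickitti",
    search_config="search_semantickitti_cfg",
    root="./experiments/experiments-pids-new",
    gpus=[0, 1, 2, 3],
    total_budget=2000,
):
    print(cmd)
```

Each printed line can be run in its own terminal or notebook cell; the same pattern applies to the other tasks in this guide.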

#### Architecture Search on ModelNet40 (Sample Stage)

In [None]:
!CUDA_VISIBLE_DEVICES=0 python -u main_sample.py \
    --task modelnet40 \
    --search_config search_modelnet40_cfg \
    --model_root ./experiments/experiments-pids-new/modelnet40-pids-distribute0 \
    --budget 500

#### Architecture Search on S3DIS (Sample Stage)

Warning: the current S3DIS setting is not appropriate for NAS, because the validation split (Area 5) is also the split on which the searched architecture is finally evaluated. If you wish to run NAS on S3DIS, it is better to split a hold-out validation set from the training dataset (Area 1/2/4/6).
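
One way to follow this advice is to hold out part of the training rooms for the search and leave Area 5 untouched until final evaluation. This is a minimal sketch, assuming rooms are identified by strings such as `Area_1/room_0`. The identifiers, the 20% fraction, and the helper name `split_holdout` are illustrative and not taken from the repository.

```python
import random


def split_holdout(train_items, holdout_fraction=0.2, seed=0):
    """Split training items into (search_train, search_val) reproducibly."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be in (0, 1)")
    items = sorted(train_items)          # fixed order before shuffling
    random.Random(seed).shuffle(items)   # seeded, so the split is repeatable
    n_holdout = max(1, int(round(len(items) * holdout_fraction)))
    return items[n_holdout:], items[:n_holdout]


# Hypothetical room identifiers drawn only from the training areas.
rooms = [f"Area_{area}/room_{i}" for area in (1, 2, 4, 6) for i in range(5)]
search_train, search_val = split_holdout(rooms)
print(len(search_train), len(search_val))
```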

In [None]:
!CUDA_VISIBLE_DEVICES=0 python -u main_sample.py \
    --task s3dis \
    --search_config search_s3dis_cfg \
    --model_root ./experiments/experiments-pids-new/s3dis-pids-distribute0 \
    --budget 500

### Sample Architecture-FLOPS Pairs (Optional)

#### FLOPS on SemanticKITTI (Sample Stage)

In [None]:
!CUDA_VISIBLE_DEVICES=2 python -u main_sample.py \
    --task semantickitti-flops \
    --search_config search_semantickitti_cfg \
    --model_root ./experiments/experiments-pids-new/semantickitti-pids-flops-distribute0 \
    --budget 500

#### FLOPS on S3DIS (Sample Stage)

In [None]:
!CUDA_VISIBLE_DEVICES=0 python -u main_sample.py \
    --task s3dis-flops \
    --search_config search_s3dis_cfg \
    --model_root ./experiments/experiments-pids-new/s3dis-pids-flops-distribute0 \
    --budget 500

#### FLOPS on ModelNet40 (Sample Stage)

In [None]:
!CUDA_VISIBLE_DEVICES=0 python -u main_sample.py \
    --task modelnet40-flops \
    --search_config search_modelnet40_cfg \
    --model_root ./experiments/experiments-pids-new/modelnet40-pids-flops-distribute0 \
    --budget 500

## Training the Dense-Sparse Predictor for Accuracy Prediction

A performance predictor maps architectures to their predicted performance. The first step is to train an accurate *FLOPS predictor*. We take SemanticKITTI as an example; the procedure is similar for the other benchmarks.

In [None]:
!CUDA_VISIBLE_DEVICES=0 python train_predictor.py \
    --root_dir ./experiments/experiments-pids-new/ \
    --pattern "semantickitti-pids-flops-distribute*" \
    --record_name semantickitti-flops-pids.records \
    --task semantickitti-flops \
    --map_fn_name dense-sparse \
    --nn_arch dense-sparse-nn \
    --hparams_json_path ./predictor/hparams/hparams_dense_sparse_nn_semantickitti_flops.json \
    --save_ckpt_path ./predictor/semantickitti_predictors/semantickitti_flops_pred_dense_sparse.pt

Change `--save_ckpt_path` as needed.
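
The `--pattern` value looks like a glob that should match every `--model_root` folder written during the sample stage. Before training, it can help to check which folders such a pattern would select. The sketch below uses only the standard library. How `train_predictor.py` resolves the pattern internally is not shown in this guide, so glob semantics are an assumption here, and `matching_run_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def matching_run_dirs(root_dir, pattern):
    """List sub-directories of root_dir whose names match a glob pattern."""
    root = Path(root_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob(pattern) if p.is_dir())


# Self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "semantickitti-pids-flops-distribute0",
        "semantickitti-pids-flops-distribute1",
        "semantickitti-pids-distribute0",
    ):
        (Path(tmp) / name).mkdir()
    demo_matches = matching_run_dirs(tmp, "semantickitti-pids-flops-distribute*")
    print(demo_matches)
```

On a real setup, call `matching_run_dirs("./experiments/experiments-pids-new/", "semantickitti-pids-flops-distribute*")` and confirm that all sample runs are listed.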

Then, proceed to train the accuracy/mIoU predictor.

In [None]:
!CUDA_VISIBLE_DEVICES=0 python train_predictor.py \
    --root_dir ./experiments/experiments-pids-new/ \
    --pattern "semantickitti-pids-distribute*" \
    --record_name semantickitti-pids.records \
    --task semantickitti \
    --map_fn_name dense-sparse \
    --nn_arch dense-sparse-nn \
    --hparams_json_path ./predictor/hparams/hparams_dense_sparse_nn_semantickitti.json \
    --save_ckpt_path ./predictor/acc_prediction/semantickitti_predictive_dense_sparse_nnmodel.pt \
    --pretrain_ckpt_path ./predictor/flops_prediction/semantickitti_predictive_dense_sparse_nnmodel.pt

Change the paths as needed. Note that `--pretrain_ckpt_path` should point to the predictor pretrained on the FLOPS prediction task, that is, the checkpoint written to `--save_ckpt_path` in the previous step. That predictor should already be accurate.

To see the full cross-validation results for the architecture-to-performance mapping (performance prediction), try `train_predictor_multisplit.py`.
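
For intuition, cross-validation over architecture-performance pairs amounts to rotating which samples are held out. Below is a minimal pure-Python sketch of the index bookkeeping. The fold count and the helper name `kfold_indices` are assumptions, and `train_predictor_multisplit.py` may split the data differently.

```python
import random


def kfold_indices(num_samples, num_folds=5, seed=0):
    """Yield (train_indices, val_indices) for each fold."""
    if num_folds < 2 or num_folds > num_samples:
        raise ValueError("need 2 <= num_folds <= num_samples")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        val = folds[k]
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, val


for fold_id, (train_idx, val_idx) in enumerate(kfold_indices(10, num_folds=5)):
    print(fold_id, len(train_idx), len(val_idx))
```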

The hyperparameters selected for each benchmark can be found in `./predictor/hparams/`; check them when you switch to a different task. You can also search the hyperparameters automatically by setting `single_run=False` at **line 330** of `train_predictor.py`. We also provide the predictors we trained at the time for reference.

## Search for the Best Architecture Using the Trained Predictor

We use SemanticKITTI as an example. Set the two checkpoint paths to wherever your trained accuracy and FLOPS predictors were saved.

In [None]:
!CUDA_VISIBLE_DEVICES=2 python main_search.py \
    --task semantickitti \
    --acc_predictor_ckpt_path ./predictor/acc_prediction/semantickitti-predictive-dense_sparse_nnmodel.pt \
    --flops_predictor_ckpt_path ./predictor/flops_prediction/semantickitti-predictive-dense_sparse_nnmodel.pt \
    --method regularized-ea \
    --dump_json_path ./searched_archs/semantickitti/regularized-ea-dense-sparse-fp0.2-new/results.json \
    --flops_penalty_coef 0.2

*Note*: adjust `flops_penalty_coef` according to the predictor training regime. A larger coefficient produces smaller architectures; 0.2-0.5 is generally fine if the samples are not too biased. Training the top-5 models from scratch will generally give you a good model for final evaluation. In this example, the best architectures are stored in `./searched_archs/semantickitti/regularized-ea-dense-sparse-fp0.2-new/results.json`; try the top-5 models to find the best one.
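
To illustrate why a larger coefficient favors smaller architectures, the sketch below ranks a few toy candidates by a penalized score. The linear form `acc - coef * flops`, the normalized FLOPS values, the candidate numbers, and the helper names are all assumptions for illustration; the actual objective in `main_search.py` may differ.

```python
def penalized_score(pred_acc, pred_flops, flops_penalty_coef):
    """Score a candidate from its predicted accuracy and FLOPS."""
    # ASSUMPTION: a linear penalty on normalized FLOPS.
    return pred_acc - flops_penalty_coef * pred_flops


def top_k(candidates, flops_penalty_coef, k=5):
    """Return the k best candidates under the penalized score."""
    ranked = sorted(
        candidates,
        key=lambda c: penalized_score(c["acc"], c["flops"], flops_penalty_coef),
        reverse=True,
    )
    return ranked[:k]


# Toy numbers, not measured results.
candidates = [
    {"name": "small", "acc": 0.60, "flops": 0.2},
    {"name": "medium", "acc": 0.64, "flops": 0.5},
    {"name": "large", "acc": 0.66, "flops": 1.0},
]

for coef in (0.0, 0.05, 0.2):
    best = top_k(candidates, coef, k=1)[0]
    print(coef, best["name"])
```

With these toy values, the preferred candidate shifts from the largest to the smallest as the coefficient grows.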

By default, we use regularized evolution, as it leads to the best results. You can also try the RL implementation if needed.
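
For readers unfamiliar with regularized (aging) evolution, here is a generic sketch of the loop: sample a tournament, mutate its best member, add the child, and discard the oldest member. The toy search space, the scoring function, and all names below are hypothetical; in the real pipeline the score would come from the trained predictors, and the implementation in `main_search.py` may differ in its details.

```python
import collections
import random


def regularized_evolution(random_arch, mutate, score,
                          population_size=20, sample_size=5,
                          cycles=200, seed=0):
    """Generic aging-evolution loop; returns (best_score, best_arch)."""
    rng = random.Random(seed)
    population = collections.deque()
    history = []
    # Initialize the population with random architectures.
    while len(population) < population_size:
        arch = random_arch(rng)
        entry = (score(arch), arch)
        population.append(entry)
        history.append(entry)
    for _ in range(cycles):
        # Tournament selection: the best of a random sample is the parent.
        tournament = rng.sample(list(population), sample_size)
        parent = max(tournament, key=lambda e: e[0])
        child = mutate(parent[1], rng)
        entry = (score(child), child)
        population.append(entry)
        history.append(entry)
        population.popleft()  # aging: drop the oldest member, not the worst
    return max(history, key=lambda e: e[0])


# Toy search space: 4 integer choices in [0, 7]; the optimum is (5, 5, 5, 5).
def toy_random_arch(rng):
    return tuple(rng.randrange(8) for _ in range(4))


def toy_mutate(arch, rng):
    position = rng.randrange(len(arch))
    new_arch = list(arch)
    new_arch[position] = rng.randrange(8)
    return tuple(new_arch)


def toy_score(arch):
    return -sum((x - 5) ** 2 for x in arch)


best_score, best_arch = regularized_evolution(toy_random_arch, toy_mutate, toy_score)
print(best_score, best_arch)
```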