# NAS (Neural Architecture Search) for PointPillars Lidar Detection on Vertex AI with TF-vision

Make sure that you have read the [required documentation](https://cloud.google.com/vertex-ai/docs/training/neural-architecture-search/overview#reading_order)
before executing this notebook.
NOTE: This notebook is meant to run pre-built trainer code with pre-built search-spaces. If you want to run your own trainer
code or create your own NAS search-space from scratch, then do not use this notebook.

Expected best model result:
- The mAP/mAPH on Waymo-Open-Dataset: 49.33/48.59
- The inference latency on 1 V100 GPU: 57.3 ms

The mAP/mAPH of our chosen [baseline model](https://github.com/tensorflow/models/tree/master/official/projects/pointpillars) is 45.96/45.35. The best model found by NAS improves both metrics by about 7% relative to the baseline (49.33 vs. 45.96 mAP and 48.59 vs. 45.35 mAPH) without a significant change in latency. Note that these numbers are not meant to be used as a benchmark.
This example is only meant to demonstrate the NAS LiDAR search space against our chosen baseline.

Experiment setup to reproduce the result:
- Stage-1 search
    - Number of trials: 1300
    - Number of GPUs per trial: 4
    - GPU type: TESLA_V100
    - Average time per trial: 4 hours
    - Number of parallel trials: 10
    - GPU quota used: 40 V100 GPUs
    - Time to run: ~21.7 days. Since this is more than 14 days, you will have to [resume the search job](https://cloud.google.com/vertex-ai/docs/training/neural-architecture-search/nas-client#nas-cli-search-resume). If you have a higher GPU quota, the runtime decreases proportionately.
    - GPU hours: 20,800 V100 GPU hours
- Stage-2 full-training with top 5 models
    - Number of trials: 5
    - Number of GPUs per trial: 8
    - GPU type: TESLA_V100
    - Average time per trial: 12 days
    - Number of parallel trials: 5
    - GPU quota used: 40 V100 GPUs
    - Time to run: 12 days
    - GPU hours: 11,520 V100 GPU hours

Estimated cost:
- Stage-1 search: ~$85,000
- Stage-2 full-training: ~$45,000
- Total: ~$130,000
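
The GPU-hour and runtime figures in the experiment setup follow from simple arithmetic on the listed settings. The sketch below recomputes them; the helper name `stage_budget` is illustrative only and is not part of the NAS client.

```python
def stage_budget(num_trials, gpus_per_trial, hours_per_trial, parallel_trials):
    """Return (gpu_hours, wall_clock_days) for one NAS stage."""
    # Every trial occupies its GPUs for the full trial duration.
    gpu_hours = num_trials * gpus_per_trial * hours_per_trial
    # Trials run `parallel_trials` at a time, so wall-clock time shrinks by that factor.
    wall_clock_days = num_trials * hours_per_trial / parallel_trials / 24
    return gpu_hours, wall_clock_days


# Stage 1: 1300 trials, 4 GPUs each, 4 hours each, 10 in parallel.
stage1_gpu_hours, stage1_days = stage_budget(1300, 4, 4, 10)
# Stage 2: 5 trials, 8 GPUs each, 12 days each, all 5 in parallel.
stage2_gpu_hours, stage2_days = stage_budget(5, 8, 12 * 24, 5)

print(f"Stage 1: {stage1_gpu_hours:,} GPU hours over {stage1_days:.1f} days")
print(f"Stage 2: {stage2_gpu_hours:,} GPU hours over {stage2_days:.1f} days")
```

Changing `parallel_trials` shows how a larger GPU quota shortens the wall-clock time while leaving the total GPU hours unchanged.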


Here are the **pre-requisites** before you can start using this notebook: 
1. Your GCP project must (a) be clear-listed and (b) have a GPU quota allocated for the NAS jobs; refer to [this](https://cloud.google.com/vertex-ai/docs/training/neural-architecture-search/environment-setup#device-quota) for requesting GPU quota. The stage-1 search command in this notebook launches up to 50 parallel trials with 4 GPUs each, so it needs a quota of 200 GPUs.
2. You have selected a python3 kernel to run this notebook.

# Install required libraries

In [None]:
%%sh

# Libraries required for using the NAS client.
pip install pyglove==0.1.0

# Libraries required for pre-processing Waymo-Open-Dataset. 
pip install keras==2.6.0
pip install tensorflow==2.6.0
pip install tf-models-official==2.7.2
pip install waymo-open-dataset-tf-2-6-0
pip install apache-beam[gcp]==2.42.0 --user

**NOTE: Please restart the notebook after the above libraries have been installed successfully.**

# Download source code
This needs to be done just once.


In [None]:
%%sh

# Create the experiment directory. With -p this is a no-op if the directory already exists.
mkdir -p ~/nas_experiment

In [None]:
%%sh
rm -r -f ~/nas_experiment/nas_codes
# Clone over HTTPS so that no SSH key has to be configured on the notebook machine.
git clone https://github.com/google/vertex-ai-nas.git ~/nas_experiment/nas_codes

# Set code path

In [None]:
import os
os.chdir('/home/jupyter/nas_experiment/nas_codes')

# Set up environment variables
Here we set up the environment variables.
NOTE: These have to be set up every time you start a new session because the later code-blocks use them.


In [None]:
# Set a unique USER name. This is used for creating a unique job-name identifier.
%env USER=<fill>
# Set a region to launch jobs into.
%env REGION=<fill>
# Set a unique docker-id for this run. The Docker image built in a later section is tagged with this id.
%env TRAINER_DOCKER_ID=<fill>
# The GCP project-id must be the one that has been clear-listed for the NAS jobs. 
%env PROJECT_ID=<fill>
# Set an output working directory for the NAS jobs. The GCP project should have write access to 
# this GCS output directory. A simple way to ensure this is to use a bucket inside the same GCP project.
# NOTE: The region of the bucket must be the same as the region of the job.
%env GCS_ROOT_DIR=<fill>
# Set the accelerator device type.
%env DEVICE=<fill>
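
Because the later code-blocks read these variables, it can help to check that none of them is missing or still set to the `<fill>` placeholder. The following is a minimal sketch of such a check; the helper `find_unset_vars` is hypothetical and not part of the NAS client.

```python
import os

# Variables set in the cell above.
REQUIRED_VARS = ["USER", "REGION", "TRAINER_DOCKER_ID", "PROJECT_ID", "GCS_ROOT_DIR", "DEVICE"]


def find_unset_vars(names, environ=os.environ):
    """Return the names that are missing, empty, or still the '<fill>' placeholder."""
    return [name for name in names if environ.get(name, "").strip() in ("", "<fill>")]


unset = find_unset_vars(REQUIRED_VARS)
if unset:
    print("Please set these variables before continuing:", ", ".join(unset))
else:
    print("All required environment variables are set.")
```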

**NOTE:** The following setup steps need to be done just once.

In [None]:
%%sh

# Authenticate docker for your artifact registry.
gcloud auth configure-docker ${REGION}-docker.pkg.dev

In [None]:
%%sh

# NOTE: This needs to be done just once. It is ok for this to FAIL if the GCS bucket already exists.

# Create the output bucket in the same region as the NAS jobs.
gsutil mb -l ${REGION} ${GCS_ROOT_DIR}

# Prepare dataset

We use [Waymo-Open-Dataset](https://waymo.com/open/) as the example dataset for training and evaluation. We run a script to pre-process the Waymo-Open-Dataset, which converts the raw [Lidar frame data](https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L370) to a format that can be fed into the model.

In [None]:
%%sh

# Set the GCS path of source Waymo-Open-Dataset. The GCP project should have read access to the data-location.
SRC_DIR='gs://waymo_open_dataset_v_1_2_0_individual_files'
# Set the GCS path of processed dataset. The GCP project should have write access to the data-location.
DST_DIR=<fill>

# Set the runner for beam. See https://beam.apache.org/documentation/#runners for distributed runners.
RUNNER="DirectRunner"

export PYTHONPATH="/home/jupyter/nas_experiment/nas_codes/:$PYTHONPATH"
python3 tf_vision/pointpillars/tools/process_wod.py \
--src_dir=${SRC_DIR} \
--dst_dir=${DST_DIR} \
--pipeline_options="--runner=${RUNNER}"

In [None]:
# Set the GCS paths to the training and validation datasets. The GCP project should have read access to the data-location.
%env STAGE1_TRAINING_DATA_PATH=<fill>
%env STAGE1_VALIDATION_DATA_PATH=<fill>
%env STAGE2_TRAINING_DATA_PATH=<fill>
%env STAGE2_VALIDATION_DATA_PATH=<fill>

# Validate and Visualize data format

The following code verifies that the data can be loaded properly before you run the experiments.

In [None]:
%matplotlib inline
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import tensorflow as tf
from tf_vision.pointpillars.configs import pointpillars as cfg
from tf_vision.pointpillars.dataloaders import decoders

def draw_labels(plt, ax, example):
  classes = example['gt_classes'].numpy()
  boxes = example['gt_boxes'].numpy()
  headings = example['gt_attributes']['heading'].numpy()
  for i in range(len(classes)):
    xmin, xmax, ymin, ymax, heading, clss = \
        boxes[i][1], boxes[i][3], boxes[i][0], boxes[i][2], headings[i][0], classes[i]
    length = xmax - xmin
    width = ymax - ymin
    center_x = xmin + length * 0.5
    center_y = ymin + width * 0.5

    color = 'red'
    if clss == 1: color = 'green'
    elif clss == 2: color = 'red'
    elif clss == 3: color = 'orange'
    t = mpl.transforms.Affine2D().rotate_around(center_x, center_y, -heading) + ax.transData

    rect = patches.Rectangle((xmin, ymin), length, width, linewidth=4, edgecolor=color, facecolor='none')
    rect.set_transform(t)
    ax.add_patch(rect)
    plt.scatter(center_x, center_y, s=100, c=color, marker='o', linewidths=3)

def draw_example(example, height, width):
  pillars = example['pillars'].numpy()
  indices = example['indices'].numpy()
  # figsize is given as (width, height) in inches.
  fig, ax = plt.subplots(figsize=(width*0.08, height*0.08))
  img = np.zeros([height, width])

  for i in range(len(pillars)):
    index = indices[i]
    img[index[0]][index[1]] = 1
  plt.imshow(img)

  draw_labels(plt, ax, example)
  plt.show()


pillars_config = cfg.PillarsConfig()
image_config = cfg.ImageConfig()
decoder = decoders.ExampleDecoder(image_config, pillars_config)
dataset_files = tf.io.gfile.glob(os.environ['STAGE1_TRAINING_DATA_PATH'])
dataset = tf.data.TFRecordDataset(dataset_files[2], compression_type='GZIP')
for example in dataset:
  frame = decoder.decode(example)
  draw_example(frame, image_config.height, image_config.width)
  break
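
In `draw_labels`, each ground-truth box is read as `[ymin, xmin, ymax, xmax]` and then rotated around its center by the negated heading. The sketch below reproduces that geometry in plain Python to make the transform explicit. The helper `rotated_corners` is illustrative only and is not part of the trainer code.

```python
import math


def rotated_corners(box, heading):
    """Return the four (x, y) corners of a [ymin, xmin, ymax, xmax] box rotated by -heading."""
    ymin, xmin, ymax, xmax = box
    center_x = (xmin + xmax) / 2.0
    center_y = (ymin + ymax) / 2.0
    cos_t, sin_t = math.cos(-heading), math.sin(-heading)
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    # Standard 2D rotation of each corner about the box center.
    return [
        (center_x + (x - center_x) * cos_t - (y - center_y) * sin_t,
         center_y + (x - center_x) * sin_t + (y - center_y) * cos_t)
        for x, y in corners
    ]


# A box that is 4 units long in x and 2 units wide in y, with a quarter-turn heading.
for x, y in rotated_corners([0.0, 0.0, 2.0, 4.0], math.pi / 2):
    print(f"({x:.2f}, {y:.2f})")
```

With a quarter-turn heading, the box extents swap: the rotated box spans 2 units in x and 4 units in y around the same center.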

# Build container
The container must be built the first time and again whenever the source code is modified; otherwise you can skip this step. Internally, this step builds the *Dockerfile* in the source-code directory and then pushes the resulting Docker image to the cloud.

In [None]:
%%sh

# NOTE: This step can take several minutes when run for the first time.

python3 vertex_nas_cli.py build \
--project_id=${PROJECT_ID} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--region=${REGION} \
--trainer_docker_file="tf_vision/pointpillars.Dockerfile"

# Launch NAS stage 1 job
If you want to customize this notebook for your own dataset, then you must
read the [required documentation](https://cloud.google.com/vertex-ai/docs/training/neural-architecture-search/overview#reading_order)
to ensure that you set up the proxy-task and other settings properly.

In [None]:
%%sh

DATE="$(date '+%Y%m%d_%H%M%S')"
JOB_ID="${USER}_nas_pointpillars_${DATE}"

CMD="
python3 vertex_nas_cli.py search \
--project_id=${PROJECT_ID} \
--region=${REGION} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--job_name=${JOB_ID} \
--max_nas_trial=3000 \
--max_parallel_nas_trial=50 \
--max_failed_nas_trial=300 \
--use_prebuilt_trainer=True \
--prebuilt_search_space="pointpillars" \
--accelerator_type=${DEVICE} \
--num_gpus=4 \
--master_machine_type="n1-highmem-32" \
--root_output_dir=${GCS_ROOT_DIR} \
--search_docker_flags \
params_override="tf_vision/configs/experiments/pointpillars_search_gpu.yaml" \
training_data_path=${STAGE1_TRAINING_DATA_PATH} \
validation_data_path=${STAGE1_VALIDATION_DATA_PATH} \
model="pointpillars" \
multiple_eval_during_search=false
"

echo Executing command: ${CMD}
    
${CMD}
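
The search command above derives a job name from `USER` and a timestamp, and its peak GPU demand is the product of `--max_parallel_nas_trial` and `--num_gpus`. As a rough sketch, the same two values can be computed in Python before launching; the helpers `make_job_id` and `peak_gpu_demand` are illustrative and not part of `vertex_nas_cli.py`.

```python
import datetime


def make_job_id(user, now=None):
    """Mirror the shell pattern ${USER}_nas_pointpillars_$(date '+%Y%m%d_%H%M%S')."""
    now = now or datetime.datetime.now()
    return f"{user}_nas_pointpillars_{now.strftime('%Y%m%d_%H%M%S')}"


def peak_gpu_demand(max_parallel_trials, gpus_per_trial):
    """GPUs in use when the maximum number of parallel trials is running."""
    return max_parallel_trials * gpus_per_trial


print(make_job_id("alice"))  # "alice" is a placeholder user name.
print(peak_gpu_demand(50, 4), "GPUs needed at peak")
```

If your GPU quota is smaller than the peak demand, lowering `--max_parallel_nas_trial` is the setting that reduces it.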

# Inspect NAS search progress
Evaluating the search periodically while it is running can help you decide whether the search job has converged. This code-block shows how to generate a summary of the top N trials so far.

In [None]:
# Set the stage1 search-job id. It's a numeric value returned by the Vertex service.
%env JOB_ID=<fill>

In [None]:
%%sh

mkdir -p /home/jupyter/nas_experiment/jobs

OUTPUT_FILE=/home/jupyter/nas_experiment/jobs/${JOB_ID}.yaml

python3 vertex_nas_cli.py list_trials \
--project_id=${PROJECT_ID} \
--job_id=${JOB_ID} \
--region=${REGION} \
--trials_output_file=${OUTPUT_FILE}

cat ${OUTPUT_FILE}
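
Once you have read the trial results, picking the top N trials is a simple ranking step. The sketch below assumes the results have already been loaded into a list of dictionaries with the hypothetical keys `trial_id` and `reward`. The actual layout of the YAML file written by `list_trials` may differ, and the values shown are made-up toy numbers.

```python
import heapq


def top_trials(trials, n=5):
    """Return the n trials with the highest reward, best first."""
    return heapq.nlargest(n, trials, key=lambda trial: trial["reward"])


# Toy records for illustration only; these are not real search results.
example_trials = [
    {"trial_id": 1, "reward": 0.31},
    {"trial_id": 2, "reward": 0.42},
    {"trial_id": 3, "reward": 0.27},
    {"trial_id": 4, "reward": 0.39},
]

best = top_trials(example_trials, n=2)
print([trial["trial_id"] for trial in best])
```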

# Launch NAS stage 2 job

In [None]:
%%sh

DATE="$(date '+%Y%m%d_%H%M%S')"

# TRIAL_IDS holds the best-performing stage-1 trials, which are trained to convergence in stage 2.
TRIAL_IDS=<fill>

CMD="

python3 vertex_nas_cli.py train \
--project_id=${PROJECT_ID} \
--region=${REGION} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--use_prebuilt_trainer=True \
--prebuilt_search_space="pointpillars" \
--train_accelerator_type=${DEVICE} \
--train_num_gpus=8 \
--train_master_machine_type="n1-highmem-32" \
--root_output_dir=${GCS_ROOT_DIR} \
--search_job_id=${JOB_ID} \
--search_job_region=${REGION} \
--train_nas_trial_numbers=${TRIAL_IDS} \
--train_job_suffix="stage2_${DATE}" \
--train_docker_flags \
params_override="tf_vision/configs/experiments/pointpillars_search_finetune_gpu.yaml" \
training_data_path=${STAGE2_TRAINING_DATA_PATH} \
validation_data_path=${STAGE2_VALIDATION_DATA_PATH} \
model="pointpillars"
"

echo Executing command: ${CMD}
    
${CMD}