# NAS (Neural Architecture Search) for Object-Detection on Vertex AI with TF-vision

NOTE: This notebook runs pre-built trainer code with pre-built search-spaces. If you want to run your own trainer
code or create your own NAS search-space from scratch, do not use this notebook. Instead, follow the instructions in the
[Using custom trainer with NAS](https://cloud.google.com/vertex-ai/docs/neural-architecture-search/custom-trainer) documentation.

This notebook reproduces an example result from the [SpineNet](https://arxiv.org/pdf/1912.05027.pdf) paper on COCO data.
According to Table 3 of the paper, SpineNet-49 achieves 40.8 AP with 85.4B FLOPs,
which outperforms R50-FPN at 37.8 AP with 96.8B FLOPs.
However, this notebook trains on GPUs instead of TPUs and uses far fewer search trials (1000 in total).
With these settings, the expected Stage-2 result for SpineNet is 39.1 AP with 94.43B FLOPs, which is still better than R50-FPN.
The detailed settings for this notebook are:
- Stage-1 search
    - Number of trials: 1000
    - Number of GPUs per trial: 2
    - GPU type: TESLA_V100
    - Avg single trial training time: 2.5 hours
    - Number of parallel trials: 10
    - GPU quota used: 10*2 = 20 V100 GPUs
    - Time to run: 10.4 days. If you have a higher GPU quota, the runtime decreases proportionately.
    - GPU hours: 5000 V100 GPU hours
- Stage-2 full training with top 5 models
    - Number of trials: 5
    - Number of GPUs per trial: 8
    - GPU type: TESLA_V100
    - Avg single trial training time: 3 days 17 hrs
    - Number of parallel trials: 5
    - GPU quota used: 5*8 = 40 V100 GPUs. You can also run this with just 24 V100 GPUs by running the job twice, with at most 3 models at a time instead of all 5 in parallel.
    - Time to run: 3 days 17 hrs
    - GPU hours: 3560 V100 GPU hours

Approximate cost:
- Stage-1 search cost: ~$22,000
- Stage-2 full-training cost: ~$14,000
- Total cost: ~$36,000
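
The GPU-hour and run-time figures listed above follow from simple arithmetic. The sketch below reproduces them from the listed settings. The helper name `gpu_budget` is hypothetical (it is not part of the NAS code), and it assumes that every trial takes the average training time and that trials run in waves of `parallel` jobs.

```python
import math

def gpu_budget(trials, gpus_per_trial, hours_per_trial, parallel):
    """Return (GPU hours, wall-clock hours) for a batch of equally long trials."""
    gpu_hours = trials * gpus_per_trial * hours_per_trial
    # Trials run in waves of `parallel` jobs; each wave lasts one trial duration.
    waves = math.ceil(trials / parallel)
    return gpu_hours, waves * hours_per_trial

# Stage 1: 1000 trials, 2 GPUs each, 2.5 hours each, 10 in parallel.
stage1_gpu_hours, stage1_wall_hours = gpu_budget(1000, 2, 2.5, 10)
# Stage 2: 5 trials, 8 GPUs each, 3 days 17 hours each, all 5 in parallel.
stage2_gpu_hours, stage2_wall_hours = gpu_budget(5, 8, 3 * 24 + 17, 5)

print(f"Stage 1: {stage1_gpu_hours:.0f} GPU hours, {stage1_wall_hours / 24:.1f} days")
print(f"Stage 2: {stage2_gpu_hours:.0f} GPU hours, {stage2_wall_hours / 24:.1f} days")
```

Calling `gpu_budget(5, 8, 89, 3)` models the 24-GPU option for stage 2: the GPU hours stay the same, but the job needs two waves, so the wall-clock time doubles.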


Here are the **prerequisites** for using this notebook:
1. Your GCP project must be (a) allow-listed for the NAS jobs and (b) [allocated a GPU quota](https://cloud.google.com/vertex-ai/docs/neural-architecture-search/environment-setup#device-quota) for them.
2. You have selected a Python 3 kernel to run this notebook.

# Install required libraries

In [None]:
%%sh

pip install tensorflow==2.7.0 --user
pip install tf-models-official==2.7.1
pip install pyglove==0.1.0

**NOTE: Please restart the notebook kernel after the above libraries have been installed successfully.**
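
After the restart, it can be worth confirming that the pinned versions are the ones actually installed. The following is a minimal sketch, assuming a Python 3.8+ kernel (for `importlib.metadata`). The helper `installed_version` is a hypothetical name introduced here for illustration.

```python
from importlib import metadata  # standard library in Python 3.8+

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Versions pinned by the install cell in this notebook.
expected = {"tensorflow": "2.7.0", "tf-models-official": "2.7.1", "pyglove": "0.1.0"}
for name, wanted in expected.items():
    found = installed_version(name)
    status = "OK" if found == wanted else "CHECK"
    print(f"{name}: expected {wanted}, found {found} [{status}]")
```

A `CHECK` line means that the package is missing or that pip resolved a different version than the one requested.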

# Download source code
This needs to be done just once.

**NOTE:** Please delete any previous *nas_codes.zip* file from both the local download folder and this notebook-instance machine.

1. Click this [link](https://storage.cloud.google.com/cloud-ml-nas-private-beta-sharing/nas_codes.zip) to download a fresh copy of the nas_codes.zip file. 
2. Upload the downloaded *nas_codes.zip* file to this notebook-instance machine. By default the file should be uploaded to the */home/jupyter* folder (also referenced as *~/*).


In [None]:
%%sh

# Create the experiment directory. The -p flag makes this a no-op if the directory already exists.
mkdir -p ~/nas_experiment

In [None]:
%%sh
rm -r -f ~/nas_experiment/nas_codes
unzip -q -d ~/nas_experiment/nas_codes ~/nas_codes.zip

# Set code path

In [None]:
import os
os.chdir('/home/jupyter/nas_experiment/nas_codes')

# Set up environment variables
Here we set up the environment variables.
NOTE: These must be set again every time you start a new session because the later code blocks use them.


In [None]:
# Set a unique USER name. This is used for creating a unique job-name identifier.
%env USER=<fill>
# Set a region to launch jobs into.
%env REGION=<fill>
# Set unique Docker ids for this run. The "Build container" section uses these ids to tag the images it builds.
%env TRAINER_DOCKER_ID=<fill>
%env LATENCY_CALCULATOR_DOCKER_ID=<fill>
# The GCP project-id must be the one that has been allow-listed for the NAS jobs.
%env PROJECT_ID=<fill>
# Set an output working directory for the NAS jobs. The GCP project should have write access to 
# this GCS output directory. A simple way to ensure this is to use a bucket inside the same GCP project.
# NOTE: The bucket must be in the same region as the jobs.
%env GCS_ROOT_DIR=<fill>
# Set the accelerator device type.
%env DEVICE=<fill>


# Set the GCS paths to the training and validation datasets. The GCP project should have read access to the data-location.
# Please read the "Data-Preparation" section 
# (https://cloud.google.com/vertex-ai/docs/neural-architecture-search/pre-built-trainer#data-preparation)
# in the documentation to ensure that the data is in an appropriate format
# suitable for the NAS pipeline. You can run the "Validate and Visualize data format" section in this notebook 
# to verify that the data can be loaded properly.
%env STAGE1_TRAINING_DATA_PATH=gs://cloud-nas-public-eu/detection/coco/train-00[0-1]??-of-00256.tfrecord,gs://cloud-nas-public-eu/detection/coco/train-002[0-3]?-of-00256.tfrecord
%env STAGE1_VALIDATION_DATA_PATH=gs://cloud-nas-public-eu/detection/coco/train-002[4-5]?-of-00256.tfrecord
%env STAGE2_TRAINING_DATA_PATH=gs://cloud-nas-public-eu/detection/coco/train*
%env STAGE2_VALIDATION_DATA_PATH=gs://cloud-nas-public-eu/detection/coco/val*
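
Because the later cells silently depend on these variables, a quick check that none of them is missing or still set to the `<fill>` placeholder can save a failed job launch. This is a minimal sketch. The helper `unset_vars` and the list name `REQUIRED_VARS` are hypothetical names introduced for illustration.

```python
import os

# Variables set in the cell above.
REQUIRED_VARS = [
    "USER", "REGION", "TRAINER_DOCKER_ID", "LATENCY_CALCULATOR_DOCKER_ID",
    "PROJECT_ID", "GCS_ROOT_DIR", "DEVICE",
    "STAGE1_TRAINING_DATA_PATH", "STAGE1_VALIDATION_DATA_PATH",
    "STAGE2_TRAINING_DATA_PATH", "STAGE2_VALIDATION_DATA_PATH",
]

def unset_vars(names, env=os.environ):
    """Return the names that are missing, empty, or still the '<fill>' placeholder."""
    return [n for n in names if env.get(n, "").strip() in ("", "<fill>")]

missing = unset_vars(REQUIRED_VARS)
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("All required variables are set.")
```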

**NOTE:** The following set up steps need to be done just once.

In [None]:
%%sh

# Authenticate docker for your artifact registry.
gcloud auth configure-docker ${REGION}-docker.pkg.dev

In [None]:
%%sh

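# NOTE: This needs to be done just once. It is OK for this to fail if the GCS bucket already exists.

# Create the output bucket in the job region. gsutil has no "mkdir" command, so "mb" (make bucket)
# is used here. This assumes that GCS_ROOT_DIR is a bucket URL such as gs://<bucket-name>.
gsutil mb -l ${REGION} ${GCS_ROOT_DIR}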

# Validate and Visualize data format

The following code verifies that the data can be loaded properly before you run the experiments.

In [None]:
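import os

import matplotlib.pyplot as plt
import tensorflow as tf
from official.vision.beta.dataloaders import tf_example_decoder
import cloud_nas_utils

# The path variable may hold several comma-separated glob patterns, so split it into a list.
file_patterns = os.environ['STAGE1_TRAINING_DATA_PATH'].split(',')
dataset = tf.data.Dataset.list_files(file_patterns, shuffle=False).apply(tf.data.TFRecordDataset)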
dataset = dataset.map(tf_example_decoder.TfExampleDecoder().decode).batch(1)

num_examples = 4
for (i, example) in enumerate(dataset.take(num_examples)):
    for k, v in example.items():
        example[k] = v.numpy()[0]
    
    image_with_boxes = cloud_nas_utils.draw_boxes(example['image'].copy(), example['groundtruth_boxes'], example['groundtruth_classes'], [1.0] * len(example['groundtruth_classes']), max_boxes=10, min_score=0.01)

    _, ax = plt.subplots(1, 1, figsize=(100, 64))
    ax.imshow(image_with_boxes)
    ax.grid(False)
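
The stage-1 data paths set in the environment-variable cell split the COCO training shards into a training part and a held-out validation part. The sketch below illustrates that split with the standard-library `fnmatch` module. It assumes that the shards are named `train-00000-of-00256.tfrecord` through `train-00255-of-00256.tfrecord` (inferred from the patterns) and that `fnmatch` treats these simple patterns the same way as the file matching used by the trainer.

```python
import fnmatch

# Assumed shard naming, inferred from the patterns: train-00000-of-00256.tfrecord, ...
shards = [f"train-{i:05d}-of-00256.tfrecord" for i in range(256)]

# File-name parts of STAGE1_TRAINING_DATA_PATH and STAGE1_VALIDATION_DATA_PATH.
train_patterns = ["train-00[0-1]??-of-00256.tfrecord", "train-002[0-3]?-of-00256.tfrecord"]
val_patterns = ["train-002[4-5]?-of-00256.tfrecord"]

def select(names, patterns):
    """Return the names that match at least one glob pattern."""
    return [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in patterns)]

train_shards = select(shards, train_patterns)
val_shards = select(shards, val_patterns)
print(len(train_shards), "training shards,", len(val_shards), "validation shards")
```

Under these assumptions, the stage-1 search trains on shards 0 to 239 and validates on shards 240 to 255, so the two sets do not overlap.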

# Build container
The container must be built the first time and again whenever the source code is modified. Otherwise, you can skip this step. It builds the *Dockerfile* in the source-code directory and pushes the resulting Docker image to Artifact Registry.

In [None]:
%%sh

# NOTE: This step can take several minutes when run for the first time.

python3 vertex_nas_cli.py build \
--project_id=${PROJECT_ID} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--region=${REGION} \
--trainer_docker_file="tf_vision/nas_multi_trial.Dockerfile" \
--latency_calculator_docker_id=${LATENCY_CALCULATOR_DOCKER_ID} \
--latency_calculator_docker_file="tf_vision/latency_computation_using_saved_model.Dockerfile"

# Launch NAS stage 1 job
If you want to customize this notebook for a dataset other than COCO, then you must
read the [Best Practices and Suggested Workflow](https://cloud.google.com/vertex-ai/docs/neural-architecture-search/suggested-workflow)
section in the documentation to ensure that you set up the proxy-task and other settings properly.

In [None]:
%%sh

DATE="$(date '+%Y%m%d_%H%M%S')"
JOB_ID="${USER}_nas_tfvision_iod_${DATE}"

CMD="
python3 vertex_nas_cli.py search \
--project_id=${PROJECT_ID} \
--region=${REGION} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--job_name=${JOB_ID} \
--max_nas_trial=1000 \
--max_parallel_nas_trial=10 \
--max_failed_nas_trial=200 \
--use_prebuilt_trainer=True \
--prebuilt_search_space="spinenet" \
--accelerator_type=${DEVICE} \
--num_gpus=2 \
--root_output_dir=${GCS_ROOT_DIR} \
--search_docker_flags \
params_override="tf_vision/configs/experiments/spinenet_search_gpu.yaml" \
training_data_path=${STAGE1_TRAINING_DATA_PATH} \
validation_data_path=${STAGE1_VALIDATION_DATA_PATH} \
model="retinanet"
"

echo Executing command: ${CMD}
    
${CMD}

# Inspect NAS search progress
Evaluating the search periodically while it runs can help you decide whether the search job has converged. This code block shows how to generate a summary of the top N trials so far.

In [None]:
# Set the stage1 search-job id. It's a numeric value returned by the Vertex service.
%env JOB_ID=<fill>

In [None]:
%%sh

mkdir -p /home/jupyter/nas_experiment/jobs
python3 vertex_nas_cli.py list_trials \
--project_id=${PROJECT_ID} \
--job_id=${JOB_ID} \
--region=${REGION} \
--trials_output_file=/home/jupyter/nas_experiment/jobs/${JOB_ID}.yaml

cat /home/jupyter/nas_experiment/jobs/${JOB_ID}.yaml
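
Once you have the trial summary, picking candidates for stage 2 amounts to ranking trials by their reward. The layout of the YAML file written by `list_trials` is not shown in this notebook, so the sketch below works on plain Python records with the hypothetical field names `trial_id` and `reward`. Adapt the keys to whatever the file actually contains.

```python
def top_trials(trials, n=5):
    """Return the n trials with the highest reward, best first."""
    return sorted(trials, key=lambda t: t["reward"], reverse=True)[:n]

# Placeholder records for illustration only; these are not real search results.
example_trials = [
    {"trial_id": 1, "reward": 0.10},
    {"trial_id": 2, "reward": 0.30},
    {"trial_id": 3, "reward": 0.20},
]

best = top_trials(example_trials, n=2)
print([t["trial_id"] for t in best])
```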

# Launch NAS stage 2 job

In [None]:
%%sh

DATE="$(date '+%Y%m%d_%H%M%S')"

# Please set "JOB_ID" and "TRIAL_IDS", and adjust the finetuning config file, before running.
# JOB_ID is a numeric value that you can find in the job UI in the Google Cloud console.
JOB_ID=<fill>
# TRIAL_IDS are the best-performing trials, which will be trained to convergence.
TRIAL_IDS=<fill>

CMD="

python3 vertex_nas_cli.py train \
--project_id=${PROJECT_ID} \
--region=${REGION} \
--trainer_docker_id=${TRAINER_DOCKER_ID} \
--use_prebuilt_trainer=True \
--prebuilt_search_space="spinenet" \
--train_accelerator_type=${DEVICE} \
--train_num_gpus=8 \
--root_output_dir=${GCS_ROOT_DIR} \
--search_job_id=${JOB_ID} \
--search_job_region=${REGION} \
--train_nas_trial_numbers=${TRIAL_IDS} \
--train_job_suffix="stage2_${DATE}" \
--train_docker_flags \
params_override="tf_vision/configs/experiments/spinenet_search_finetune_gpu.yaml" \
training_data_path=${STAGE2_TRAINING_DATA_PATH} \
validation_data_path=${STAGE2_VALIDATION_DATA_PATH} \
model="retinanet"
"

echo Executing command: ${CMD}
    
${CMD}

# Post Training Export and Analysis

### Create SavedModel

In [None]:
# Set the output path for the saved-model.
%env SAVED_MODEL_DIR=/home/jupyter/nas_experiment/saved_models/
# The directory of the finished job.
%env JOB_DIR=<fill>
# The trial you want to export.
%env TRIAL=<fill>
# The config file you used to launch the job, like tf_vision/configs/experiments/spinenet_search_gpu.yaml.
%env CONFIG=<fill>

In [None]:
import dataclasses
import os

import pyglove as pg
import tensorflow as tf

# Import from nas_codes.
import cloud_nas_utils
import search_spaces
from tf_vision import registry_imports
from tf_vision import config_utils

# Import from tf-vision library.
from official.vision.beta.serving import export_saved_model_lib

@dataclasses.dataclass
class Args():
    model = 'retinanet'
    config_file = os.environ.get('CONFIG')
    params_override = None
    job_dir = os.environ.get('JOB_DIR')
    training_data_path = os.environ.get('STAGE1_TRAINING_DATA_PATH')
    validation_data_path = os.environ.get('STAGE1_VALIDATION_DATA_PATH')
    use_tpu = None

args = Args()
search_space = 'spinenet'
job_dir = os.environ.get('JOB_DIR')
trial_dir = os.path.join(job_dir, os.environ.get('TRIAL'))
nas_params_str = os.path.join(trial_dir, 'nas_params_str.json')

tunable_functor_or_object = cloud_nas_utils.parse_and_save_nas_params_str(
    search_spaces.get_search_space(search_space), nas_params_str, job_dir)
tunable_object = tunable_functor_or_object()
serialized_tunable_object = pg.to_json_str(tunable_object, json_indent=2, hide_default_values=False)
params = config_utils.create_params(
    args,
    search_space,
    serialized_tunable_object,
    None)
latest_checkpoint = tf.train.latest_checkpoint(trial_dir)

export_saved_model_lib.export_inference_graph(
    input_type='image_bytes',
    batch_size=1,
    input_image_size=params.task.model.input_size[0:2],
    params=params,
    checkpoint_path=latest_checkpoint,
    export_dir=os.environ.get('SAVED_MODEL_DIR'))

In [None]:
%%sh

saved_model_cli show --dir=${SAVED_MODEL_DIR} --all

### Run Prediction and Visualize results

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import cloud_nas_utils

model = tf.saved_model.load(os.environ.get('SAVED_MODEL_DIR'))
detect_fn = model.signatures['serving_default']

In [None]:
keys_to_features = {
    'image/encoded': tf.io.FixedLenFeature((), tf.string),
    'image/height': tf.io.FixedLenFeature((), tf.int64),
    'image/width': tf.io.FixedLenFeature((), tf.int64),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    'image/object/area': tf.io.VarLenFeature(tf.float32),
    'image/object/is_crowd': tf.io.VarLenFeature(tf.int64),
}
def _parse_function(serialized_example):
    parsed_tensors = tf.io.parse_single_example(serialized_example, keys_to_features)
    for k in parsed_tensors:
        if isinstance(parsed_tensors[k], tf.SparseTensor):
            if parsed_tensors[k].dtype == tf.string:
                parsed_tensors[k] = tf.sparse.to_dense(parsed_tensors[k], default_value='')
            else:
                parsed_tensors[k] = tf.sparse.to_dense(parsed_tensors[k], default_value=0)

    image = tf.io.decode_image(parsed_tensors['image/encoded'], channels=3)
    image.set_shape([None, None, 3])
    xmin = parsed_tensors['image/object/bbox/xmin']
    xmax = parsed_tensors['image/object/bbox/xmax']
    ymin = parsed_tensors['image/object/bbox/ymin']
    ymax = parsed_tensors['image/object/bbox/ymax']
    boxes = tf.stack([ymin, xmin, ymax, xmax], axis=-1)
    decoded_tensors = {
        'height': parsed_tensors['image/height'],
        'width': parsed_tensors['image/width'],
        'groundtruth_classes': parsed_tensors['image/object/class/label'],
        'groundtruth_boxes': boxes,
        'image': image,
        'image_bytes': parsed_tensors['image/encoded'],
    }
    return decoded_tensors

# Read the stage-2 validation shards set in the environment-variable cell.
dataset = tf.data.Dataset.list_files(os.environ['STAGE2_VALIDATION_DATA_PATH'].split(','), shuffle=False).apply(tf.data.TFRecordDataset)
dataset = dataset.map(_parse_function).batch(1)

num_examples = 1
for (i, example) in enumerate(dataset.take(num_examples)):
    outputs = detect_fn(example['image_bytes'])
    
    for k, v in example.items():
        example[k] = v.numpy()[0]
    for k, v in outputs.items():
        outputs[k] = v.numpy()[0]
    
    print('Got {} predicted bboxes.'.format(outputs['num_detections']))
    image_with_groundtruth_bbox = cloud_nas_utils.draw_boxes(example['image'].copy(), example['groundtruth_boxes'], example['groundtruth_classes'], [1.0] * len(example['groundtruth_classes']), max_boxes=10, min_score=0.01)
    image_with_prediction_bbox = cloud_nas_utils.draw_boxes(example['image'].copy(), outputs['detection_boxes'], outputs['detection_classes'], outputs['detection_scores'], max_boxes=10, min_score=0.01)

    _, ax = plt.subplots(1, 2, figsize=(100, 64))
    ax[0].imshow(image_with_groundtruth_bbox)
    ax[0].grid(False)
    ax[0].set_title('Groundtruth')
    ax[1].imshow(image_with_prediction_bbox)
    ax[1].grid(False)
    ax[1].set_title('Prediction')
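
The `max_boxes` and `min_score` arguments passed to `draw_boxes` suggest the usual way of thinning detections before plotting: drop low-scoring boxes and keep only the best few. The sketch below shows that idea in plain Python. It is an illustration of the technique only, not the implementation in `cloud_nas_utils`, and the helper name `keep_detections` is hypothetical.

```python
def keep_detections(boxes, classes, scores, max_boxes=10, min_score=0.01):
    """Keep at most max_boxes detections that score at least min_score, best first."""
    ranked = sorted(zip(scores, classes, boxes), key=lambda d: d[0], reverse=True)
    return [d for d in ranked if d[0] >= min_score][:max_boxes]

# Placeholder detections for illustration only; boxes are [ymin, xmin, ymax, xmax].
boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.3, 0.6, 0.9], [0.0, 0.0, 0.2, 0.2]]
classes = [1, 18, 3]
scores = [0.90, 0.005, 0.40]

for score, cls, box in keep_detections(boxes, classes, scores):
    print(f"class={cls} score={score:.2f} box={box}")
```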