# Chapter 5. Training a Custom Object Detector

## 5.1 Two options for training

(1) Use a pre-trained model and then use transfer learning to learn a new object.

(2) Learn new objects from scratch.

The benefit of transfer learning is that training can be much quicker and needs much less data. For this reason, we use transfer learning here.

## 5.2 TensorFlow pre-trained models

### 5.2.1 [Configuring jobs documentation](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md)

### 5.2.2 [Sample configurations](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs)

## 5.3 Train our object detector

We choose the SSD MobileNet v1 model because it is fast and we intend to do real-time object detection.

### 5.3.1 Download the MobileNet configuration file and checkpoint.

(1) Configuration file

```bash
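$ mkdir training
$ cd training
$ wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config
# Make one copy per detector; these are the file names the training commands use.
$ cp ssd_mobilenet_v1_pets.config ssd_mobilenet_v1_airplane.config
$ cp ssd_mobilenet_v1_pets.config ssd_mobilenet_v1_macaroni.config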
```

(2) Checkpoint

```bash
$ cd models/research/object_detection
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz
$ tar -xvzf ssd_mobilenet_v1_coco_2017_11_17.tar.gz
```
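
The `fine_tune_checkpoint` value used later, `ssd_mobilenet_v1_coco_2017_11_17/model.ckpt`, is a checkpoint *prefix* rather than a file name: a TensorFlow 1.x checkpoint is stored as `model.ckpt.index` plus one or more `model.ckpt.data-*` shards. Below is a minimal sketch for confirming that the extraction produced those files, assuming that standard layout. `checkpoint_prefix_ok` is a hypothetical helper, not part of the Object Detection API.

```python
from pathlib import Path


def checkpoint_prefix_ok(prefix):
    """Return True if the TF 1.x checkpoint files for `prefix` exist.

    A prefix such as ".../model.ckpt" is not a file itself; the weights live in
    "<prefix>.index" plus one or more "<prefix>.data-XXXXX-of-YYYYY" shards.
    """
    p = Path(prefix)
    has_index = p.with_name(p.name + ".index").is_file()
    has_data = any(p.parent.glob(p.name + ".data-*"))
    return has_index and has_data
```

Run from `models/research/object_detection`, `checkpoint_prefix_ok("ssd_mobilenet_v1_coco_2017_11_17/model.ckpt")` should return `True` after the `tar` step.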

### 5.3.2 Modify the configuration file.

(1) Search for every `PATH_TO_BE_CONFIGURED` placeholder and replace it with the actual path.
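
It is easy to miss a placeholder when editing by hand. The sketch below lists the remaining ones; it skips comment lines because the sample file's header comment itself mentions `PATH_TO_BE_CONFIGURED`. `find_placeholders` is a hypothetical helper.

```python
PLACEHOLDER = "PATH_TO_BE_CONFIGURED"


def find_placeholders(config_text):
    """Return (line_number, line) for each non-comment line that still has the placeholder."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # the sample's header comment mentions the placeholder too
        if PLACEHOLDER in stripped:
            hits.append((lineno, stripped))
    return hits
```

For example, `find_placeholders(open("ssd_mobilenet_v1_airplane.config").read())` should return an empty list once every path has been filled in.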

(2) Modify the batch size.

The downloaded sample configuration sets `batch_size: 24`; the configurations below use 10. Other models may use different batch sizes. If you get an out-of-memory error, decrease the batch size until the model fits in your GPU's memory (VRAM).

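(3) Set `num_classes` to 1, point `fine_tune_checkpoint` at the downloaded checkpoint (`ssd_mobilenet_v1_coco_2017_11_17/model.ckpt`), set `input_path` in `train_input_reader` and `eval_input_reader` to the TFRecord files, set `num_examples` in `eval_config` to 22 (airplane) or 12 (macaroni), and set `label_map_path` to `"training/airplane-detection.pbtxt"` or `"training/macaroni-detection.pbtxt"`.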

For airplane:

```
# SSD with Mobilenet v1, adapted from the Oxford-IIIT Pets sample config for airplane detection.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/airplane_train.record"
  }
  label_map_path: "training/airplane-detection.pbtxt"
}

eval_config: {
  num_examples: 22
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/airplane_test.record"
  }
  label_map_path: "training/airplane-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
```

For macaroni:

```
# SSD with Mobilenet v1, adapted from the Oxford-IIIT Pets sample config for macaroni detection.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/macaroni_train.record"
  }
  label_map_path: "training/macaroni-detection.pbtxt"
}

eval_config: {
  num_examples: 12
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/macaroni_test.record"
  }
  label_map_path: "training/macaroni-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
```

### 5.3.3 Add `airplane-detection.pbtxt` and `macaroni-detection.pbtxt` to the `training` directory.

(1) `airplane-detection.pbtxt`:

```
item {
  id: 1
  name: 'airplane'
}
```

(2) `macaroni-detection.pbtxt`:

```
item {
  id: 1
  name: 'macncheese'
}
```
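
With a single class, writing the label map by hand is easy; with more classes, a small generator keeps the IDs consecutive. IDs start at 1 because the Object Detection API reserves 0 for the background class, and they must match the integer class labels stored in the TFRecord files. `label_map_pbtxt` is a hypothetical helper that reproduces the format above.

```python
def label_map_pbtxt(class_names):
    """Render a label map in the format shown above, with IDs starting at 1."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)
```

For example, `open("training/macaroni-detection.pbtxt", "w").write(label_map_pbtxt(["macncheese"]))` writes the same content as the file above.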

## 5.4 Train our model

Note the following:

* Before training, copy the `data` and `training` directories to `models/research/object_detection`.
* **The training script does not stop by itself until it reaches `num_steps` (200,000 in our configuration), so in practice we stop it manually with `Ctrl+C`.** Stop training once the total loss is around 1, or at least below 2, which usually takes more than 10,000 steps.

```bash
# Add models/research and models/research/slim to the environment variable $PYTHONPATH
$ cd models/research
$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

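$ cd object_detection
# Train one detector at a time. Both commands write checkpoints to training/,
# so move the first model's checkpoints elsewhere before starting the second;
# otherwise train.py resumes from the existing checkpoint.
$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_airplane.config
$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_macaroni.config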
```
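
Because training is stopped by hand, it helps to check the loss rule of thumb above mechanically. The sketch below assumes that `train.py` logs progress lines such as `INFO:tensorflow:global step 10512: loss = 1.2034 (0.341 sec/step)` (the format we expect from the TF 1.x slim training loop; verify it against your own log). It averages the last `window` losses rather than reacting to a single noisy step. The thresholds are the ones suggested above; `parse_losses` and `ready_to_stop` are hypothetical names.

```python
import re
from statistics import mean

# Assumed format of train.py's progress lines, e.g.
# "INFO:tensorflow:global step 10512: loss = 1.2034 (0.341 sec/step)"
STEP_LOSS = re.compile(r"global step (\d+): loss = (\S+)")


def parse_losses(lines):
    """Return (step, loss) pairs for every progress line in the log."""
    pairs = []
    for line in lines:
        m = STEP_LOSS.search(line)
        if m:
            pairs.append((int(m.group(1)), float(m.group(2))))
    return pairs


def ready_to_stop(lines, threshold=2.0, window=100, min_steps=10000):
    """Heuristic: the mean loss of the last `window` logged steps is below
    `threshold` and training has run for at least `min_steps` steps."""
    pairs = parse_losses(lines)
    if len(pairs) < window:
        return False
    recent = pairs[-window:]
    last_step = recent[-1][0]
    return last_step >= min_steps and mean(loss for _, loss in recent) < threshold
```

Since `--logtostderr` writes the log to standard error, capture it with a redirect such as `python train.py ... 2>&1 | tee train.log`, then call `ready_to_stop(open("train.log"))`.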

To view the training progress, use `tensorboard` and expand the `TotalLoss` diagram at http://127.0.0.1:6006.

```bash
$ tensorboard --logdir='training'
```

## 5.5 Use PaperSpace to train our model

### 5.5.1 Register a PaperSpace account

https://www.paperspace.com/&R=1L988BL

At the time of writing, new accounts received a $10 credit after the first login.

### 5.5.2 Log into PaperSpace.

### 5.5.3 Create a machine.

(1) Select region as "WEST COAST (CA1)".

(2) Select OS as the public template "Ubuntu 16.04 ML-in-a-Box Desktop Edition (Beta)".

(3) Select machine as P4000 (hourly).

(4) Deselect "Auto Snapshot".

### 5.5.4 Launch the machine and set up the TensorFlow-GPU environment.

(1) Install CUDA 9.0 and reboot.

CUDA 8.0 is already installed on the machine, but the prebuilt TensorFlow-GPU binaries from 1.5 onward, including the 1.5 release we install below, are built against CUDA 9.0. **The CUDA version must be exactly 9.0: CUDA 9.1 does not work with these prebuilt binaries (up to 1.7.0, the latest release at the time of writing).**

* Remove the CUDA repository package that conflicts with CUDA 9.0.

```bash
$ sudo dpkg --purge cuda-repo-ubuntu1404
```

* Install CUDA 9.0 using the network `.deb` installer (`cuda-repo-ubuntu1604_9.0.176-1_amd64.deb`) downloaded from:

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=debnetwork

```bash
$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt update
$ sudo apt install cuda=9.0.176-1
```

* Check the CUDA version.

```bash
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
```
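
To script this check, we can parse the release number out of `nvcc --version` and compare it with the CUDA version each prebuilt TensorFlow-GPU release expects. The mapping below covers only the versions discussed in this chapter and reflects TensorFlow's tested Linux GPU build configurations as we understand them (1.4: CUDA 8.0; 1.5 to 1.7: CUDA 9.0); treat it as an assumption to verify. `cuda_release` and `cuda_matches_tensorflow` are hypothetical helpers.

```python
import re

# Assumed mapping (TensorFlow's tested Linux GPU builds): TF version -> CUDA (major, minor).
TF_CUDA = {"1.4": (8, 0), "1.5": (9, 0), "1.6": (9, 0), "1.7": (9, 0)}


def cuda_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of `nvcc --version`, or None."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def cuda_matches_tensorflow(nvcc_output, tf_version):
    """True only for an exact match: CUDA 9.1 does not satisfy a CUDA 9.0 build."""
    return cuda_release(nvcc_output) == TF_CUDA[tf_version]
```

The `nvcc` output can be captured with `subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE, universal_newlines=True).stdout`. The comparison is exact on purpose, mirroring the warning that CUDA 9.1 will not work.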

(2) Reboot and check GPU card info.

```bash
$ nvidia-smi
Wed Apr  4 22:45:49 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:00:05.0  On |                  N/A |
| 46%   33C    P8     8W / 105W |    330MiB /  8119MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2430      G   /usr/lib/xorg/Xorg                           152MiB |
|    0      2748      G   /usr/bin/gnome-shell                         120MiB |
|    0      3306      G   ...-token=D5964AC9D49D4EB49AE3F83AAB47DC22    44MiB |
+-----------------------------------------------------------------------------+
```
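
CUDA 9.0 also needs a recent enough NVIDIA driver: NVIDIA's CUDA release notes list 384.81 as the minimum Linux driver for CUDA 9.0, which the 390.48 driver shown above satisfies. Below is a sketch that reads the driver version from the `nvidia-smi` banner; `driver_version` and `driver_supports_cuda_9_0` are hypothetical helpers.

```python
import re

# Minimum Linux driver for CUDA 9.0 according to NVIDIA's CUDA release notes.
MIN_DRIVER_FOR_CUDA_9_0 = (384, 81)


def driver_version(smi_output):
    """Return the driver version from the `nvidia-smi` banner as (major, minor), or None."""
    m = re.search(r"Driver Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def driver_supports_cuda_9_0(smi_output):
    """Compare numerically as tuples, so 384.130 is correctly newer than 384.81."""
    version = driver_version(smi_output)
    return version is not None and version >= MIN_DRIVER_FOR_CUDA_9_0
```

Comparing integer tuples rather than strings avoids the trap where `"384.130" < "384.81"` lexicographically.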

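(3) Install cuDNN for CUDA 9.0.

* Download page: https://developer.nvidia.com/rdp/cudnn-download
* Installation guide: http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installlinux-deb

* Download the cuDNN `.deb` files for CUDA 9.0 (runtime, developer, and code-sample packages). A free NVIDIA Developer account is required.

**Note that to work with TensorFlow 1.5, we install cuDNN 7.0 (the 7.0.5 packages below).**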

* Install the runtime library.

```bash
$ sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
```

* Install the developer library.

```bash
$ sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
```

* Install the code samples and the cuDNN Library User Guide.

```bash
$ sudo dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb
```

(4) Verify the cuDNN installation.

* Copy the cuDNN sample to a writable path.

```bash
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
```

* Go to the writable path.

```bash
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
```

* Compile the mnistCUDNN sample.

```bash
$ make clean && make
```

* Run the mnistCUDNN sample.

```bash
$ ./mnistCUDNN
```

If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:

```bash
Test passed!
```

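(5) Check the cuDNN version.

The output below shows cuDNN 7.1.2. With the 7.0.5 packages installed above, you should see `#define CUDNN_MINOR 0` and `#define CUDNN_PATCHLEVEL 5` instead.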

```bash
$ cat /usr/include/x86_64-linux-gnu/cudnn_v7.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
```
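
The `CUDNN_VERSION` define combines the three numbers into a single integer; for the header above it gives 7 * 1000 + 1 * 100 + 2 = 7102. Below is a small sketch applying the same formula, for example to compare against a minimum version in a setup script. `cudnn_version` is a hypothetical helper.

```python
import re


def cudnn_version(header_text):
    """Compute CUDNN_VERSION = MAJOR * 1000 + MINOR * 100 + PATCHLEVEL from the header's defines."""
    parts = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        m = re.search(r"#define CUDNN_%s\s+(\d+)" % key, header_text)
        if m is None:
            raise ValueError("CUDNN_%s not found" % key)
        parts[key] = int(m.group(1))
    return parts["MAJOR"] * 1000 + parts["MINOR"] * 100 + parts["PATCHLEVEL"]
```

For example, `cudnn_version(open("/usr/include/x86_64-linux-gnu/cudnn_v7.h").read())` should return 7005 with the 7.0.5 packages installed.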

(6) Install TensorFlow-GPU.

```bash
$ pip install --upgrade pip

# Upgrade NumPy so that its C API version goes from 0xb to 0xc.
$ pip install --upgrade numpy

# The preinstalled TensorFlow is 1.4, which is outdated, so remove it.
$ pip uninstall tensorflow

# At the time of writing, the latest TensorFlow is 1.7, but from 1.6 on the
# prebuilt binaries use AVX instructions, which this machine's CPU does not
# support, so they fail with "illegal instruction" on Ubuntu 16.04 LTS.
# For details, see https://github.com/tensorflow/tensorflow/issues/17411
# Install only the GPU package; the CPU package `tensorflow` is not needed.
$ pip install tensorflow-gpu==1.5
```
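
Both `tensorflow` and `tensorflow-gpu` install into the same `tensorflow` Python package, so with both present, whichever was installed last has overwritten the other's files, and a later upgrade or uninstall of one can break the other. Below is a sketch that flags this situation from `pip freeze` output; `tensorflow_packages` and `has_conflicting_tensorflow` are hypothetical helpers.

```python
def tensorflow_packages(freeze_lines):
    """Return {distribution: version} for TensorFlow packages in `pip freeze` output."""
    found = {}
    for line in freeze_lines:
        name, sep, version = line.strip().partition("==")
        normalized = name.lower().replace("_", "-")
        if sep and normalized in ("tensorflow", "tensorflow-gpu"):
            found[normalized] = version
    return found


def has_conflicting_tensorflow(freeze_lines):
    """True if both the CPU and the GPU distributions are installed at once."""
    return len(tensorflow_packages(freeze_lines)) > 1
```

The input can come from `subprocess.run([sys.executable, "-m", "pip", "freeze"], stdout=subprocess.PIPE, universal_newlines=True).stdout.splitlines()`.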

(7) Verify the TensorFlow-GPU installation.

```bash
$ python
>>> import tensorflow as tf
/home/paperspace/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>>> print(tf.__version__)
1.5.0
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-04-04 23:01:57.164099: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-04-04 23:01:57.251426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-04 23:01:57.251656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:00:05.0
totalMemory: 7.93GiB freeMemory: 7.58GiB
2018-04-04 23:01:57.251683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
```
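
The line that confirms the GPU is actually used is the `Creating TensorFlow device (/device:GPU:0) ...` message. If you capture the session log (for example, when running a script instead of the interactive prompt), a sketch like the following extracts the GPU devices from it. The regular expression matches the TensorFlow 1.5 wording in the transcript above; other versions word this message differently, and `tf.test.is_gpu_available()` is a simpler in-process check in TensorFlow 1.x. `gpu_devices` is a hypothetical helper.

```python
import re

# Matches the TensorFlow 1.5 wording shown in the transcript above.
GPU_DEVICE = re.compile(
    r"Creating TensorFlow device \((/device:GPU:\d+)\)"
    r".*?name: ([^,]+),"
    r".*?compute capability: (\d+\.\d+)"
)


def gpu_devices(log_text):
    """Return (device, name, compute_capability) for each GPU device TensorFlow created."""
    return [m.groups() for m in GPU_DEVICE.finditer(log_text)]
```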

### 5.5.5 Train our model

```bash
$ git clone https://github.com/renweizhukov/learning-ml.git

$ cd learning-ml/tensorflow/tensorflow-object-detection-api-tutorial/
$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

$ cd object_detection
# Train one detector at a time; move the first model's checkpoints out of
# training/ before starting the second.
$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_airplane.config
$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_macaroni.config
```
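
Since we stop training with `Ctrl+C`, the artifact we care about is the newest checkpoint in `training/`. TensorFlow 1.x provides `tf.train.latest_checkpoint(train_dir)` for this, which reads the `checkpoint` bookkeeping file; the dependency-free sketch below instead scans the directory, assuming checkpoints are named `model.ckpt-<step>` (verify this against your own `training/` directory). `latest_checkpoint` here is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming: checkpoints written as model.ckpt-<global step>.{index,data-*,meta}.
CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint(train_dir):
    """Return the prefix of the highest-step checkpoint in train_dir, or None."""
    best_step, best_prefix = -1, None
    for path in Path(train_dir).iterdir():
        m = CKPT_INDEX.match(path.name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            # Drop the ".index" suffix to get the prefix TensorFlow expects.
            best_prefix = str(path.with_suffix(""))
    return best_prefix
```

Steps are compared as integers, so `model.ckpt-2500` correctly wins over `model.ckpt-999` even though it sorts earlier as a string.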