# TensorFlow Object Detection API Experiences on Windows

The TensorFlow Object Detection API (ODA) is one of several ways to detect objects in images. It is built on top of TensorFlow and is meant to make it easy to construct, train, and deploy object detection models.

In this article, I do not go into the details of the steps to train a model using the ODA. Instead, I cover some of the challenges and errors I ran into and possible solutions for them. To understand the training steps, the following articles can be helpful.

* https://www.kdnuggets.com/2018/02/building-toy-detector-tensorflow-object-detection-api.html
* https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
* https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_pets.md
* https://pythonprogramming.net/introduction-use-tensorflow-object-detection-api-tutorial/

## Challenge #1: Find a data set

To learn the ODA, the first decision is whether to use a dataset that is already labeled or to create your own. Based on the posts above, **I first decided to find unlabeled images and label them by hand** to gain experience.

So I started looking into several datasets as listed below:

* CVOnline List of Dataset (http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm)
* UCI Data sets (https://archive.ics.uci.edu/ml/datasets/)
* CV Papers (http://www.cvpapers.com/datasets.html)

From CVOnline, I found the MIT CBCL Street Scenes data (http://cbcl.mit.edu/software-datasets/streetscenes/). 
After downloading it, I started labeling the images using [LabelImg](https://tzutalin.github.io/labelImg/) as suggested in the tutorials. 

LabelImg is a very handy tool, and it saves annotations in the Pascal VOC format. This is the format the TensorFlow ODA expects when converting annotations into the TFRecord format, which I describe later.

![Create BBox Using LabelImg](img/Creating_Bbox.PNG)
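
To make the Pascal VOC format more concrete, here is a minimal sketch that reads the bounding boxes out of one annotation. The XML below is a made-up, trimmed-down example (the file name, class, and coordinates are assumptions for illustration); real LabelImg files contain more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>street_001.jpg</filename>
  <size><width>1280</width><height>960</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>400</xmax><ymax>450</ymax></bndbox>
  </object>
</annotation>
""".strip()


def read_boxes(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        coords = [int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text, *coords))
    return boxes


print(read_boxes(SAMPLE_XML))
```

Each `object` element carries one class name and one box, so an image with several objects simply has several `object` elements.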

According to the tutorials above, you need at least 250-300 annotated images for decent object detection. Annotating the images took a lot of time, and since I did not have enough of it, I stopped. **I then decided to get a dataset that already has annotations.**

While searching the internet for images with annotations, I found the following datasets.

* COCO (http://cocodataset.org/)
* Google's Open Images dataset (https://storage.googleapis.com/openimages/web/index.html)
* Pascal VOC (http://host.robots.ox.ac.uk/pascal/VOC/)
* KITTI (http://www.cvlibs.net/datasets/kitti/)

Among these, I **chose Pascal VOC** because it is small compared to the others. It also comes in a labeling format that the ODA can consume. 

## Challenge #2: Install the ODA.

Installing the ODA is straightforward, but I ran into a couple of issues. I installed it on Windows with Anaconda 2 and Python 3.6.5.

Following are the steps at high level ([Official Steps](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md)):

1. Install Tensorflow by following [Tensorflow Install Instruction](https://www.tensorflow.org/install/)
2. Install following packages.
    + pip install --user Cython
    + pip install --user contextlib2
    + pip install --user pillow
    + pip install --user lxml
    + pip install --user jupyter
    + pip install --user matplotlib
3. Check out the [TensorFlow models repo](https://github.com/tensorflow/models)
4. Download the [ProtoBuf Compiler](https://github.com/protocolbuffers/protobuf/releases) for your environment. The Windows release includes a protoc.exe for compiling the Protobufs.
    * Using the downloaded protoc.exe, compile the protos in /models/research/object_detection/protos as below.
    * Run from /models/research/
    > protoc object_detection/protos/*.proto --python_out=.
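5. Add libraries to PYTHONPATH
    > export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

    The line above is the Linux form from the official steps. In a Windows command prompt, the equivalent is `set PYTHONPATH=%PYTHONPATH%;%cd%;%cd%\slim`.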
    
**Issue #1:** In step 4, the protoc command didn't work for all the files at once, most likely because the Windows command prompt does not expand the `*.proto` wildcard. So I ran a separate command for each file in object_detection/protos, as below:

    + protoc-3.6.1-win32\bin\protoc.exe object_detection\protos\argmax_matcher.proto --python_out=.
    + protoc-3.6.1-win32\bin\protoc.exe object_detection\protos\bipartite_matcher.proto --python_out=.
    + protoc-3.6.1-win32\bin\protoc.exe object_detection\protos\box_coder.proto --python_out=.
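
Typing one command per file gets tedious. Below is a minimal sketch of automating it, assuming it is run from `models/research` and that `protoc.exe` sits at the relative path used in the commands above. The helper name `build_protoc_commands` is my own and not part of the ODA.

```python
import glob
import os
import subprocess


def build_protoc_commands(protoc_path, proto_dir):
    """Return one protoc argument list per .proto file found in proto_dir."""
    proto_files = sorted(glob.glob(os.path.join(proto_dir, '*.proto')))
    return [[protoc_path, proto_file, '--python_out=.'] for proto_file in proto_files]


if __name__ == '__main__':
    # Assumed locations, matching the commands shown above.
    protoc = os.path.join('protoc-3.6.1-win32', 'bin', 'protoc.exe')
    proto_dir = os.path.join('object_detection', 'protos')
    for cmd in build_protoc_commands(protoc, proto_dir):
        print(' '.join(cmd))
        subprocess.run(cmd, check=True)  # Stops at the first file that fails to compile
```

Here Python expands the wildcard itself, so the script does not depend on the shell doing it.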

**Issue #2:** Setting the **PYTHONPATH env var is a MUST**, as in step 5. If it is not set, errors will occur when training starts. Adding only /models/research/object_detection is not sufficient, contrary to what the tutorials above suggest.
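
A quick way to check the path before starting a long training run is to ask Python whether it can locate the packages. This is a minimal sketch; it assumes `object_detection` (under `models/research`) and `nets` (under `models/research/slim`) are the packages that need to be found, and the helper name is my own.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that Python cannot locate."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# An empty list means both packages are on the path.
print(missing_modules(['object_detection', 'nets']))
```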

**Issue #3:** The official docs suggest copying the pycocotools dir into /models/research/ to use the COCO evaluation metrics. But on Windows the `make` command won't work, and merely copying pycocotools produces an error like **ImportError: No module named _mask**. Instead, follow the instructions at [COCO API on Win with Py3](https://github.com/philferriere/cocoapi), which compile and install the library into your Python environment.

## Challenge #3: Creating TFR datasets

To train the model on a custom dataset, TensorFlow requires the data in the TensorFlow Record (TFR) format. Creating it is quite easy with the provided scripts. But if we want to detect only a particular type of object, the dataset needs to be filtered first. In my experiment, I chose the **airplane** images from Pascal VOC.


#### Script to Filter Out airplane dataset.

```python
import os
import random

obj_class = 'aeroplane'  # Type of object to detect
data_type = 'train'      # Converting train or val data?
noise_perc = 0.8         # Percentage of noise images out of more than 16000 total images.

img_set_dir = r"C:\my_projects\tensorflow_obj_detection\object_detection\VOCdevkit\VOC2012\ImageSets\Main"
img_set_list = os.path.join(img_set_dir, obj_class + "_" + data_type + ".txt")
img_set_final_list = os.path.join(img_set_dir, obj_class + "_only_" + data_type + ".txt")
print(img_set_final_list)

# Each line looks like "<image_name> <flag>", where flag 1 means the object is present.
with open(img_set_list, 'r') as img_set_file:
    all_imgs = img_set_file.readlines()

total_imgs = len(all_imgs)
total_noise_imgs = (total_imgs * noise_perc) / 100  # Max number of noise images to keep
noise_imgs_cnt = 0
final_imgs = []
print(total_noise_imgs)

for img in all_imgs:
    t = img.split()
    name = t[0]
    is_obj = t[1]  # Is the item related to the class?
    if int(is_obj) == 1:
        print(name)
        final_imgs.append(name + '\n')
    elif noise_imgs_cnt < total_noise_imgs:
        # Keep a small share of images without the object as noise
        print("noise:%s" % name)
        final_imgs.append(name + '\n')
        noise_imgs_cnt += 1

random.shuffle(final_imgs)
with open(img_set_final_list, 'w') as dest_img_set_file:
    dest_img_set_file.write("".join(final_imgs))
```

Now it's time to convert the filtered dataset to TFRecords.


TensorFlow provides several sample scripts in `models\research\object_detection\dataset_tools\create_*_tf_record.py`. They cover well-known datasets such as KITTI, COCO, and Google Open Images.

In my experiment, I used create_pascal_tf_record.py. I set the flags as below for my dataset.

```python
FLAGS.set = 'val'
FLAGS.year = 'VOC2012'
FLAGS.data_dir = r"C:\my_projects\tensorflow_obj_detection\object_detection\VOCdevkit\VOC2012"
FLAGS.output_path = r"C:\my_projects\tensorflow_obj_detection\object_detection\tfrecords_voc2012\plane_val"
```

I ran the command as below for airplane data from Pascal VOC 2012 dataset. 

* For Training:
`C:\my_projects\tensorflow\models\research> python object_detection\dataset_tools\create_pascal_tf_record.py --data_dir=C:\\data\\VOCdevkit --year=VOC2012 --set=train --output_path=C:\\my_prj\\obj_detection\\data\\airplanes_train.record`

* For Val:
`C:\my_projects\tensorflow\models\research> python object_detection\dataset_tools\create_pascal_tf_record.py --data_dir=C:\\data\\VOCdevkit --year=VOC2012 --set=val --output_path=C:\\my_prj\\obj_detection\\data\\airplanes_val.record`

#### Class Info:

TensorFlow requires class label info as below. In my case, I have only 1 class to detect, so it looks like this. I saved it in `C:\\my_prj\\obj_detection\\data\\my_class_label.pbtxt`

```
item {
  id: 1
  name: 'aeroplane'
}
```

For multiple classes, repeat the `item` block once per class, each with its own `id` (starting from 1) and `name`. There are many examples available in `models\research\object_detection\data`
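
For more than a handful of classes, the file can be generated instead of typed. This is a minimal sketch that writes one `item` block per class, following the layout of the example files in that directory. The helper name `build_label_map` and the second class name are my own choices for illustration.

```python
def build_label_map(class_names):
    """Return label-map text with one item block per class, ids starting at 1."""
    blocks = []
    for idx, name in enumerate(class_names, start=1):
        blocks.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(blocks)


# Hypothetical two-class example; the article itself uses only 'aeroplane'.
print(build_label_map(['aeroplane', 'bird']))
```

Whatever number of classes ends up in this file has to match `num_classes` in the training config.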

## Challenge #4: Model Training

Once the TFR datasets are created, you need to decide whether to fine-tune an existing model or build one from scratch. Using a pre-trained model is advisable, as training takes less time and most of the features learnt by CNNs are often object agnostic. 

TensorFlow provides several pre-trained models. The list of models, with their speed and accuracy, can be seen [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md).

For my project, I chose the `ssd_mobilenet_v1_coco` model. To use it, download the zip file, unzip it, and copy the `model.ckpt.*` files to `C:\\my_prj\\obj_detection\\data`.

Model training requires a config file, where the model can be tuned and the paths of all the files above are specified. The config files for all the models are available at `models\research\object_detection\samples\configs`. 

**Caution 1:** Choose the same model config file as the model chosen in the previous step. In my case I chose `ssd_mobilenet_v1_coco_my.config`.

Copy this file to `C:\\my_prj\\obj_detection\\data`, then modify it: replace `PATH_TO_BE_CONFIGURED` everywhere it appears. 

For example, under `train_input_reader`, set `input_path` to the path of `airplanes_train.record` and `label_map_path` to the path of the `.pbtxt` file. Do the same for `eval_input_reader`; its `label_map_path` is the same as for training.

Also, don't forget to change `num_classes`.

**Caution 2**: For `fine_tune_checkpoint`, DO NOT change the extension of the model.ckpt file. Provide the path to the file `model.ckpt.data-00000-of-00001`.

**Caution 3**: Change the `num_steps` under `train_config`. By default it comes with 200000 steps.
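
Since the placeholder appears in several places, it is easy to miss one. A minimal sketch for listing the lines of a config that still contain it is below. The config fragment and its paths are made up for illustration, and `find_placeholders` is my own helper name.

```python
def find_placeholders(config_text, placeholder='PATH_TO_BE_CONFIGURED'):
    """Return (line_number, line) pairs that still contain the placeholder."""
    return [(num, line.strip())
            for num, line in enumerate(config_text.splitlines(), start=1)
            if placeholder in line]


# Hypothetical config fragment: one path filled in, one still unconfigured.
sample = '''train_input_reader {
  tf_record_input_reader {
    input_path: "C:/my_prj/obj_detection/data/airplanes_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/my_class_label.pbtxt"
}'''

print(find_placeholders(sample))
```

To check a real file, read it with `open(path).read()` and pass the text to the same function.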

In my example, the directory for all of the above is `C:\\my_prj\\obj_detection\\data`.

The directory `C:\\my_prj\\obj_detection\\data` should now contain the following files:

+ ssd_mobilenet_v1_coco_my.config
+ airplanes_train.record
+ airplanes_val.record
+ my_class_label.pbtxt
+ model.ckpt.data-00000-of-00001
+ model.ckpt.index
+ model.ckpt.meta
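
A small sketch for confirming that nothing is missing before training starts. It assumes the record files carry the names given in the `--output_path` values of the conversion commands earlier; adjust the list if yours differ. `missing_files` is my own helper name.

```python
import os

# Record names assumed from the --output_path values used when creating the TFRecords.
EXPECTED_FILES = [
    'ssd_mobilenet_v1_coco_my.config',
    'airplanes_train.record',
    'airplanes_val.record',
    'my_class_label.pbtxt',
    'model.ckpt.data-00000-of-00001',
    'model.ckpt.index',
    'model.ckpt.meta',
]


def missing_files(directory, expected_names):
    """Return the expected file names that are not present in directory."""
    return [name for name in expected_names
            if not os.path.isfile(os.path.join(directory, name))]


# An empty list means the directory is ready.
print(missing_files(r'C:\my_prj\obj_detection\data', EXPECTED_FILES))
```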

Now, we can start the training with the following command:

> C:\my_projects\tensorflow\models\research> python object_detection/model_main.py --pipeline_config_path=C:\my_projects\tensorflow_obj_detection\object_detection\data\ssd_mobilenet_v1_coco_my.config --model_dir=C:\my_projects\tensorflow_obj_detection\object_detection\train_output --num_train_steps=100 --sample_1_of_n_eval_examples=1 --alsologtostderr

Here, `model_dir` is the output directory of the training, where the model checkpoints and logs are stored.

**Issue #1:** During training on Python 3.6, you may run into the error below. 

> TypeError: can't pickle dict_values objects [[Node: PyFunc_3 = PyFunc[Tin=[], Tout=[DT_FLOAT], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

To fix it, find the snippet below in `models\research\object_detection\model_lib_orig.py` and make sure `category_index.values()` is wrapped in `list(...)`, as shown:

```python
# Eval metrics on a single example.
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
    eval_config, list(category_index.values()), eval_dict)
```

In Python 3, `category_index.values()` returns a `dict_values` view, which cannot be pickled; converting it to a list avoids the error. [Ref to the Solution](https://github.com/tensorflow/models/issues/4780)

Then restart the training again.
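
The error comes from Python 3, where `dict.values()` returns a view object rather than a list. A small standalone check, using a simplified stand-in for `category_index` (the real one is built by the ODA), shows which of the two forms can be pickled:

```python
import pickle


def can_pickle(value):
    """Return True if value can be pickled, False if pickling raises TypeError."""
    try:
        pickle.dumps(value)
        return True
    except TypeError:
        return False


# Simplified stand-in for the real category index.
category_index = {1: {'id': 1, 'name': 'aeroplane'}}

print(can_pickle(category_index.values()))        # dict view
print(can_pickle(list(category_index.values())))  # plain list
```

On Python 3 the first call reports `False` and the second `True`, which is why the `list(...)` wrapper matters.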

Once the training has started, start TensorBoard to monitor the progress. Run the following command in another command prompt, pointing `--logdir` at the training output directory (`model_dir`):

> tensorboard --logdir=C:\my_projects\tensorflow_obj_detection\object_detection\train_output

Open the printed URL in a browser and check the progress. As the number of iterations increases, precision should go up and the loss should go down, as in the figures below. 

![Precision Progress over Iterations](img/precision_progress.PNG)

![Loss Reduction over Iterations](img/loss_progress.png)

Once the training completes, aeroplanes should be detected as follows:

![Plane 1](img/plane1.png)
![Plane 2](img/plane2.png)