In [4]:
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

import PIL # install using pip install pillow

# Object Detection 

In the previous course, we learned how to build an image classification model using TensorFlow and Keras. Object detection combines two questions: **what** is in the image and **where** is it. In other words, to detect objects we need to perform both classification and localization. The graph below shows the common roadmap of computer vision tasks.

![](https://machinelearningmastery.com/wp-content/uploads/2019/05/Object-Recognition.png)

Because the task consists of two subtasks, you may have guessed that the architecture has **2 objective** functions. The early deep learning architectures for object detection can be traced back to Region-Based CNN (R-CNN), which learns where to draw a box and how to classify what is inside it.
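
To make the idea of two objectives concrete, here is a minimal sketch of a combined loss in the style used by Fast R-CNN: a cross-entropy term for the class plus a smooth L1 term for the box coordinates. The function names, the `loc_weight` parameter, and the sample numbers are illustrative assumptions, not part of this course's code, and other architectures use different loss terms.

```python
import math


def cross_entropy(class_probs, true_class):
    """Classification term: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])


def smooth_l1(pred_box, true_box):
    """Localization term: smooth L1 distance summed over the 4 box values."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5
    return total


def detection_loss(class_probs, true_class, pred_box, true_box, loc_weight=1.0):
    """Sum of the 'what' loss and the weighted 'where' loss (hypothetical helper)."""
    return cross_entropy(class_probs, true_class) + loc_weight * smooth_l1(pred_box, true_box)


# Made-up example: the model is fairly sure about class 1 and the box is slightly off.
loss = detection_loss(
    class_probs=[0.1, 0.8, 0.1],
    true_class=1,
    pred_box=[10.0, 20.0, 110.0, 220.0],
    true_box=[10.5, 20.0, 112.0, 220.0],
)
print(round(loss, 4))
```

Both terms are minimized together, so a prediction is only "good" when the class and the box are both right.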

The R-CNN architecture consists of three main components:
- Region Selector : Selects and extracts regions that might represent an object, called Regions of Interest (ROI)
- CNN : Processes every warped region (why would every ROI need to be warped?)
- Classifier : Consists of a shallow dense network that classifies the processed ROI

![](https://i0.wp.com/bdtechtalks.com/wp-content/uploads/2021/06/R-CNN-architecture.jpg?w=1392&ssl=1)

The architecture then evolved into Fast R-CNN, which is faster than R-CNN because it runs the CNN once over the whole image and shares the resulting feature map across all RoIs, instead of processing every warped region separately. Fast R-CNN receives an image and a set of RoIs and returns a list of bounding boxes and classes of the objects detected in the image, as illustrated below.
![](https://machinelearningmastery.com/wp-content/uploads/2019/03/Summary-of-the-Fast-R-CNN-Model-Architecture.png)

Apart from the R-CNN model family, other model families have been widely used in recent years:
- [YOLO model](https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/)
- [SSD model](https://developers.arcgis.com/python/guide/how-ssd-works/)

References:
- [R-CNN paper](https://arxiv.org/pdf/1311.2524.pdf)
- [Fast R-CNN paper](https://arxiv.org/pdf/1504.08083.pdf)

___

# Making Object Detection Dataset

One difference between object detection and classification lies in the training data. A classification dataset might consist only of images placed in a folder named after their class, whereas an object detection dataset also needs information about which objects exist in each image, where they are, and which class each one falls into. There is no need to organize classes into folders, since a single image may contain multiple objects or classes.

![](https://raw.githubusercontent.com/tzutalin/labelImg/master/demo/demo3.jpg)

## Create Bounding-box and label 
In this section we will take the images inside the `data/` directory and hand-label them to create an object detection dataset. The whole process can be summarized as this simple flow:

- **input** : Raw images
- **process** : Manual hand-labeling with a 3rd-party app
- **output** : XML files containing the box coordinates and labels for each image
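
For orientation, the sketch below parses a hand-written annotation in the Pascal VOC layout that labelImg produces. The file name, image size, label, and coordinates are made-up sample values, and real files usually carry a few extra tags that are omitted here.

```python
import xml.etree.ElementTree as ET

# Made-up sample annotation in Pascal VOC layout (values are illustrative).
SAMPLE_XML = """
<annotation>
    <filename>img1.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>your_label_1</name>
        <bndbox>
            <xmin>48</xmin>
            <ymin>60</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Each <object> holds one labeled box; an image can contain several.
boxes = []
for obj in root.findall("object"):
    bndbox = obj.find("bndbox")
    boxes.append({
        "label": obj.find("name").text,
        "xmin": int(bndbox.find("xmin").text),
        "ymin": int(bndbox.find("ymin").text),
        "xmax": int(bndbox.find("xmax").text),
        "ymax": int(bndbox.find("ymax").text),
    })

print(root.find("filename").text, boxes)
```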


**Task 1: Install labelimg** \
To do the hand-labeling, we need a tool that helps us draw a bounding box and assign a label for each image. You can use any tool, but for convenience we will use the Python app from https://github.com/tzutalin/labelImg. You can go to its GitHub repo and follow the installation guide, or simply install it using `pip install labelimg`.

Install it from your terminal, then run `labelimg` in the terminal; you should see the app start.

**Task 2: Hand labeling**
1. Open the labelimg app, then locate the training or test images using "Open Dir"
2. Make sure to use the "Pascal/VOC" format. We will be using this format in this capstone
3. Create a RectBox around every object of interest and label it with its class name
4. When you finish annotating an image, click save and store the annotation under exactly the same filename as the image, in a folder of your choice. The file should be saved automatically with the .xml extension. The final folder structure should look like this:

        +image_folder
        | +train
        | | -img1.jpg
        | | -img1.xml
        | | -...
        | | -imgn.png
        | | -imgn.xml
        | +test
        | | -img1.jpg
        | | -img1.xml
        | | -...
        | | -imgn.png
        | | -imgn.xml

5. Select the next image and repeat steps 3-4 until all images are annotated
6. For model configuration, create/edit `labelmap.pbtxt` and write down your class mapping
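
As a rough illustration of the class mapping, the sketch below generates label map text in the `item { id ... name ... }` layout commonly used with the TensorFlow Object Detection API. That layout is an assumption about what the modeling notebook expects, `build_labelmap` is a hypothetical helper, and the label names are placeholders. The ids start at 1 so that they line up with the `class_text_to_int` function used later.

```python
def build_labelmap(labels):
    """Return label map text with one item per label, ids starting at 1."""
    items = []
    for idx, name in enumerate(labels, start=1):
        items.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


# Placeholder labels; replace them with the classes you annotated.
labelmap_text = build_labelmap(["your_label_1", "your_label_2"])
print(labelmap_text)

# To save it, uncomment the next two lines:
# with open("labelmap.pbtxt", "w") as f:
#     f.write(labelmap_text)
```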

## Convert XML to CSV file


In [9]:
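def xml_to_csv(path):
    """Collect every bounding box from the Pascal VOC XML files in `path`."""
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for member in root.findall('object'):
            # Look tags up by name instead of by position, so extra or
            # reordered tags in the XML do not silently shift the columns.
            bndbox = member.find('bndbox')
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df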


# Write one CSV per split: data/train_labels.csv and data/test_labels.csv
for folder in ['train', 'test']:
    image_path = os.path.join(os.getcwd(), 'data', folder)
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv(os.path.join('data', folder + '_labels.csv'), index=False)
print('Successfully converted xml to csv.')

Successfully converted xml to csv.
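
Before moving on, it can help to check the converted rows for boxes that are inverted or fall outside the image, since such rows tend to cause trouble later. The sketch below is one possible check, not part of the provided scripts: `find_invalid_boxes` is a hypothetical helper and the sample rows are made up. It works on plain dictionaries with the same column names as the CSV.

```python
def find_invalid_boxes(rows):
    """Return rows whose box is empty, inverted, or outside the image."""
    bad = []
    for row in rows:
        inside = (0 <= row["xmin"] and 0 <= row["ymin"]
                  and row["xmax"] <= row["width"]
                  and row["ymax"] <= row["height"])
        ordered = row["xmin"] < row["xmax"] and row["ymin"] < row["ymax"]
        if not (inside and ordered):
            bad.append(row)
    return bad


# Made-up sample rows using the same columns as the generated CSV.
sample_rows = [
    {"filename": "img1.jpg", "width": 640, "height": 480,
     "class": "your_label_1", "xmin": 48, "ymin": 60, "xmax": 195, "ymax": 371},
    {"filename": "img2.jpg", "width": 640, "height": 480,
     "class": "your_label_2", "xmin": 600, "ymin": 10, "xmax": 700, "ymax": 90},
]

invalid = find_invalid_boxes(sample_rows)
print([row["filename"] for row in invalid])
```

With the DataFrame from the cell above, the same check could be run as `find_invalid_boxes(xml_df.to_dict("records"))`.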


## Create TFRecord

We already provide a Python file to generate the TFRecord, `generate_tfrecord.py`. Before you run it, make sure that the `class_text_to_int` function uses the same labels you defined in `labelmap.pbtxt`.

**Task : Adjust the `class_text_to_int` function inside `generate_tfrecord.py`**. The function looks like this :

In [10]:
def class_text_to_int(row_label):
    if row_label == 'your_label_1':
        return 1
    elif row_label == 'your_label_2':
        return 2
    else:
        return None
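
If you have many classes, a dictionary lookup can be easier to maintain than a long `if`/`elif` chain. The variant below is only a sketch of that idea, with the same behavior for the two placeholder labels. `LABEL_TO_ID` is a hypothetical name and is not defined in the provided `generate_tfrecord.py`.

```python
# Hypothetical mapping; keep it in sync with labelmap.pbtxt.
LABEL_TO_ID = {
    "your_label_1": 1,
    "your_label_2": 2,
}


def class_text_to_int(row_label):
    """Return the integer id for a label, or None if the label is unknown."""
    return LABEL_TO_ID.get(row_label)


print(class_text_to_int("your_label_1"), class_text_to_int("not_a_label"))
```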

Then, in your terminal, move into the directory that contains `generate_tfrecord.py` and run the following command to generate train.record:
```bash
python generate_tfrecord.py --csv_input=<path_to_train_labels.csv> --image_dir=<path_to_train_images> --output_path=train.record
```

And this one to generate test.record :
```bash
python generate_tfrecord.py --csv_input=<path_to_test_labels.csv> --image_dir=<path_to_test_images> --output_path=test.record
```

*Note: don't forget to replace each `<path_...>` placeholder with your own CSV input and image directory paths.*
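
If you prefer to build both commands from Python, the sketch below assembles the argument lists using the three flags shown above. The paths assume the `data/` layout from the CSV conversion step (`data/train_labels.csv`, `data/train`, and so on); that layout and the `tfrecord_command` helper are assumptions, so adjust them to your own folders.

```python
import shlex
import sys


def tfrecord_command(split, data_dir="data"):
    """Build the generate_tfrecord.py argument list for 'train' or 'test'."""
    return [
        sys.executable, "generate_tfrecord.py",
        f"--csv_input={data_dir}/{split}_labels.csv",
        f"--image_dir={data_dir}/{split}",
        f"--output_path={split}.record",
    ]


for split in ("train", "test"):
    cmd = tfrecord_command(split)
    # Print the command so it can be reviewed before running it.
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```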


## Modeling

For the modeling part and beyond, we provide a separate notebook, "object_detection_modeling.ipynb". We suggest running it in a GPU environment, so we also provide a Colab notebook.

A few things to check before proceeding to the modeling step :
- Make sure to have train.record and test.record
- Make sure to have labelmap.pbtxt with correct labels mapping
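
The checklist above can be turned into a small pre-flight check. This is a minimal sketch that assumes all three files sit in the same directory; `missing_files` is a hypothetical helper, not part of the provided notebooks.

```python
from pathlib import Path

REQUIRED_FILES = ("train.record", "test.record", "labelmap.pbtxt")


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in `directory`."""
    base = Path(directory)
    return [name for name in required
            if not (base / name).is_file() or (base / name).stat().st_size == 0]


missing = missing_files(".")
print("Missing:", missing if missing else "nothing, ready for modeling")
```

Note that this only checks that the files exist and are not empty; it does not verify that the labels inside `labelmap.pbtxt` are correct.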