# Example of YOLO version 2 module.

This is an example of how to use the YOLO version 2 module.

**YOLO9000: Better, Faster, Stronger**  
Joseph Redmon, Ali Farhadi  
https://arxiv.org/abs/1612.08242

### Overview of YOLO v2

First, an overview of YOLO v2. YOLO v2 improves on YOLO in detection speed, accuracy, and the number of available classes.

The following figure shows the convolutional neural network used in YOLO v2. This architecture is called Darknet19.

Compared to the Darknet used in YOLO v1, Darknet19 has a batch normalization layer after each convolution layer.
The fully connected (dense) layers are also removed. This allows YOLO v2 to perform object detection on images of multiple scales.
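
To make the multi-scale property concrete, here is a minimal sketch of how the size of the prediction grid follows the input size. It assumes the network downsamples its input by a total factor of 32, as described in the YOLO9000 paper. `output_grid` is a hypothetical helper for illustration and is not part of ReNomIMG.

```python
# Assumption: Darknet19 reduces the spatial resolution by a total factor of 32.
DOWNSAMPLE = 32

def output_grid(imsize):
    """Return (columns, rows) of the prediction grid for an input size (width, height)."""
    w, h = imsize
    if w % DOWNSAMPLE or h % DOWNSAMPLE:
        raise ValueError("image size must be a multiple of {}".format(DOWNSAMPLE))
    return (w // DOWNSAMPLE, h // DOWNSAMPLE)

# The same convolutional weights apply to every size; only the grid changes.
for size in [(320, 320), (416, 416), (608, 608)]:
    print(size, "->", output_grid(size))
```

Under this assumption, a 416x416 input yields a 13x13 grid, which is why input sizes are restricted to multiples of 32.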

In [1]:
import os
import time
import numpy as np
import renom as rm
from tqdm import tqdm
import matplotlib.pyplot as plt

from renom_img.api.detection.yolo_v2 import Yolov2, create_anchor
from renom_img.api.utility.distributor.distributor import ImageDistributor
from renom_img.api.utility.augmentation import Augmentation
from renom_img.api.utility.augmentation.process import *
from renom_img.api.utility.load import parse_xml_detection
from renom_img.api.utility.misc.display import draw_box

from renom.cuda import set_cuda_active
set_cuda_active(True)

## Data preparation

**The PASCAL Visual Object Classes Homepage**  
http://host.robots.ox.ac.uk/pascal/VOC/

In [2]:
if not os.path.exists("VOCdevkit/VOC2007"):
    !wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
    !tar xfv VOCtrainval_06-Nov-2007.tar
    
if not os.path.exists("VOCdevkit/VOC2012"):
    !wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
    !tar xfv VOCtrainval_11-May-2012.tar

## Divide the data into training and validation sets.

In [11]:
image_voc_2007 = "VOCdevkit/VOC2007/JPEGImages/"
label_voc_2007 = "VOCdevkit/VOC2007/Annotations/"
image_voc_2012 = "VOCdevkit/VOC2012/JPEGImages/"
label_voc_2012 = "VOCdevkit/VOC2012/Annotations/"

def read_image_ids(path):
    # Read one image ID per line and close the file afterwards.
    with open(path) as f:
        return [line.strip() for line in f]

train_voc_2007 = read_image_ids("VOCdevkit/VOC2007/ImageSets/Main/train.txt")
train_voc_2007 += read_image_ids("VOCdevkit/VOC2007/ImageSets/Main/val.txt")
train_voc_2012 = read_image_ids("VOCdevkit/VOC2012/ImageSets/Main/train.txt")
valid_voc_2012 = read_image_ids("VOCdevkit/VOC2012/ImageSets/Main/val.txt")

train_image_path_list = []
train_label_path_list = []
valid_image_path_list = []
valid_label_path_list = []

# Use the train and val splits of VOC2007 and the train split of VOC2012 as training data.
for path in train_voc_2007:
    train_image_path_list.append(os.path.join(image_voc_2007, path+'.jpg'))
    train_label_path_list.append(os.path.join(label_voc_2007, path+'.xml'))

for path in train_voc_2012:
    train_image_path_list.append(os.path.join(image_voc_2012, path+'.jpg'))
    train_label_path_list.append(os.path.join(label_voc_2012, path+'.xml'))

# Use validation dataset of VOC2012 as validation data.
for path in valid_voc_2012:
    valid_image_path_list.append(os.path.join(image_voc_2012, path+'.jpg'))
    valid_label_path_list.append(os.path.join(label_voc_2012, path+'.xml'))

train_annot, class_map = parse_xml_detection(train_label_path_list)
valid_annot, _ = parse_xml_detection(valid_label_path_list)

print("Dataset size")
print("  Train:{}".format(len(train_annot)))
print("  Valid:{}\n".format(len(valid_annot)))

print("Class list")
for i, name in enumerate(class_map):
    print("  {:02d} {}".format(i, name))

Dataset size
  Train:10728
  Valid:5823

Class list
  00 aeroplane
  01 bicycle
  02 bird
  03 boat
  04 bottle
  05 bus
  06 car
  07 cat
  08 chair
  09 cow
  10 diningtable
  11 dog
  12 horse
  13 motorbike
  14 person
  15 pottedplant
  16 sheep
  17 sofa
  18 train
  19 tvmonitor


## Initialize Yolo v2 model.

ReNomIMG provides a YOLO v2 model.
The model takes the following arguments.

- class_map (list): List of class names.
- anchor (AnchorYolov2): Anchors. They can be created with the `create_anchor` function.
- imsize (tuple): Image size. This is used for prediction.
- load_pretrained_weight (bool): If True, pretrained weights are downloaded and loaded.
- train_whole_network (bool): If True, backpropagation is performed through the whole network.
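
The `anchor` argument holds prior box shapes. The YOLO9000 paper obtains them by running k-means on the training box sizes with `1 - IoU` as the distance. The sketch below illustrates that idea with NumPy only. It is not the implementation of `create_anchor`, whose internals are not shown here, and `iou_wh` and `kmeans_anchors` are hypothetical names. The box sizes are made-up values.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating all boxes as if they shared one center."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, n_iter=100, seed=0):
    """Cluster (w, h) pairs using 1 - IoU as the distance and return k centroids."""
    rng = np.random.RandomState(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(n_iter):
        # The nearest centroid is the one with the highest IoU.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members) > 0:  # Keep the old centroid if a cluster is empty.
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Toy data: box widths and heights as ratios of the image size (made-up values).
wh = np.array([[0.10, 0.20], [0.12, 0.22], [0.50, 0.40], [0.55, 0.45], [0.90, 0.80]])
anchors = kmeans_anchors(wh, k=2)
print(anchors)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which is the motivation given in the paper.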


In [6]:
model = Yolov2(class_map=class_map,
               anchor=create_anchor(train_annot, base_size=(416, 416)),
               imsize=(32*10, 32*10),
               load_pretrained_weight=True,
               train_whole_network=True)

### Train the YOLO v2 model using the `fit` method.

The model object has a `fit` method. It lets us train YOLO v2 by giving only a list of image paths and a list of annotations.
The following arguments can be given to the `fit` method.

- train_img_path_list (list): Image path list used for training.
- train_annotation_list (list): Annotation list used for training.
- valid_img_path_list (list): Image path list used for validation.
- valid_annotation_list (list): Annotation list used for validation.
- epoch (int): Number of training epochs.
- batch_size (int): Batch size.
- imsize_list (list): List of image sizes. Each image size must be a multiple of 32.
- augmentation (Augmentation): Augmentation object.
- callback_end_epoch (function): The given function is called at the end of each epoch.

Because of its fully convolutional architecture, YOLO v2 can be trained with multiple image sizes. Each image size must be a multiple of 32.
If `imsize_list` is given, an image size is randomly selected from it every 10 batches.
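
The size schedule described above can be sketched as follows. This only illustrates the described behavior and is not the code inside `fit`. `sample_imsizes` is a hypothetical helper.

```python
import random

def sample_imsizes(imsize_list, n_batches, interval=10, seed=None):
    """Yield one image size per batch, drawing a new random size every `interval` batches."""
    rng = random.Random(seed)
    size = None
    for i in range(n_batches):
        if i % interval == 0:
            size = rng.choice(imsize_list)
        yield size

# Candidate sizes from 288x288 to 608x608 in steps of 32.
imsize_list = [(32 * i, 32 * i) for i in range(9, 20)]
sizes = list(sample_imsizes(imsize_list, n_batches=30, seed=0))
print(sizes[0], sizes[10], sizes[20])
```

Every batch inside a 10-batch window shares one size, so images within a batch can still be stacked into a single array.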

**Note**: Running the following code with these parameters requires 11 GB of GPU memory.

In [None]:
model.fit(
    # Feed the image paths and the parsed annotations (not the XML paths).
    train_image_path_list,
    train_annot,
    valid_image_path_list,
    valid_annot,
    epoch=8,
    batch_size=8,
    # Giving 11 variations of image size.
    imsize_list=[(32*i, 32*i) for i in range(9, 20)])

### Prediction

To use the trained model, call the `model.predict` method. This method takes the following argument.

- img_list (list, ndarray): An image path, a list of image paths, or a numpy array.

If a single image path is given, the `predict` method returns the following data.
```python
[
    {  # 1st predicted object for input image path.
        "box": [x(float), y, w, h],
        "score": confidence_score(float),
        "class": class_id(int),
        "name": class_name(str)
    },
    {  # 2nd predicted object for input image path.
        "box": [x(float), y, w, h],
        "score": confidence_score(float),
        "class": class_id(int),
        "name": class_name(str)
    },
    ...
]
```

If a list of image paths or a numpy array is given, the `predict` method returns the following data.
```python
[
    [ # Predictions of 1st image.
        {  # 1st predicted object for 1st image path.
            "box": [x(float), y, w, h],
            "score": confidence_score(float),
            "class": class_id(int),
            "name": class_name(str),
        },
        {  # 2nd predicted object for 1st image path.
            "box": [x(float), y, w, h],
            "score": confidence_score(float),
            "class": class_id(int),
            "name": class_name(str),
        },
    ],
    [ # Predictions of 2nd image.
        {  # 1st predicted object for 2nd image path.
            "box": [x(float), y, w, h],
            "score": confidence_score(float),
            "class": class_id(int),
            "name": class_name(str),
        },
        {  # 2nd predicted object for 2nd image path.
            "box": [x(float), y, w, h],
            "score": confidence_score(float),
            "class": class_id(int),
            "name": class_name(str),
        },
    ],
    ...
]
```
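
Because the result consists of plain lists and dictionaries, it can be post-processed with ordinary Python. The sketch below keeps only detections above a score threshold. The sample predictions are made-up values in the format shown above, and `filter_by_score` is a hypothetical helper, not a ReNomIMG function.

```python
def filter_by_score(predictions, threshold=0.5):
    """Keep only the detections of one image whose score is at least `threshold`."""
    return [p for p in predictions if p["score"] >= threshold]

# Made-up predictions for two images, following the structure shown above.
results = [
    [{"box": [0.5, 0.5, 0.2, 0.3], "score": 0.9, "class": 11, "name": "dog"},
     {"box": [0.3, 0.6, 0.1, 0.1], "score": 0.2, "class": 7, "name": "cat"}],
    [{"box": [0.4, 0.4, 0.6, 0.5], "score": 0.7, "class": 6, "name": "car"}],
]
filtered = [filter_by_score(preds, threshold=0.5) for preds in results]
print([[p["name"] for p in preds] for preds in filtered])
```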

**Note**: The box coordinates are expressed as ratios of the image size.
Therefore the range of the predicted box coordinates is `0 <= x, y, w, h <= 1`.
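
To draw or crop a detection yourself, the ratios have to be scaled back to pixels. The following is a minimal sketch that assumes `(x, y)` is the center of the box. The text above does not state this, so check it against your ReNomIMG version. `to_pixel_corners` is a hypothetical helper.

```python
def to_pixel_corners(box, image_width, image_height):
    """Convert a ratio box [x, y, w, h] to pixel corners (left, top, right, bottom).

    Assumption: (x, y) is the box center. Adjust the arithmetic if your
    version reports the top-left corner instead.
    """
    x, y, w, h = box
    left = (x - w / 2.0) * image_width
    top = (y - h / 2.0) * image_height
    right = (x + w / 2.0) * image_width
    bottom = (y + h / 2.0) * image_height
    return left, top, right, bottom

# A box centered in a 500x375 image, covering 20% of its width and 40% of its height.
print(to_pixel_corners([0.5, 0.5, 0.2, 0.4], image_width=500, image_height=375))
```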

If you want to change the image size used for detection, set the attribute `model.imsize`.

ReNomIMG also provides a function for drawing bounding boxes.
`renom_img.api.utility.misc.display.draw_box` can be used to show prediction results.
The function takes an image path and a prediction result.

In [None]:
# You can change the image size for prediction.
# model.imsize = (32*12, 32*12)
for i in range(40):
    path = valid_image_path_list[i]
    # Output of predict method can be given directly.
    plt.imshow(draw_box(path, model.predict(path)))
    plt.show()