### Task 1.2: Fine-tune YOLOv2 on your data

This notebook illustrates the method we followed to fine-tune one of the best-known object detectors in deep learning: YOLO.
We chose YOLO because it is usually faster than other detectors, which left us more time to deal with development problems.


## YOLOv2

We tried to adapt the code from the following GitHub repository:

https://github.com/experiencor/keras-yolo2

following these steps:


### 1. Data preparation
Organize the dataset into four folders:

- `train_image_folder`: the folder that contains the training images.
- `train_annot_folder`: the folder that contains the training annotations in VOC format.
- `valid_image_folder`: the folder that contains the validation images.
- `valid_annot_folder`: the folder that contains the validation annotations in VOC format.
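
The four folders can be filled with a short script. The following is a minimal sketch under our own assumptions: `organize_dataset` and its arguments are hypothetical, the images are `.jpg` files whose names sort in frame order (e.g. `frame_0001.jpg`), and each image has a VOC file with the same base name.

```python
import os
import shutil


def organize_dataset(image_dir, annot_dir, out_root, valid_fraction=0.2):
    """Copy image/annotation pairs into the four folders used by the config file.

    Assumes every image NAME.jpg has its VOC annotation NAME.xml in annot_dir
    (images without an annotation are skipped) and that file names sort in
    frame order, e.g. frame_0001.jpg, frame_0002.jpg, ...
    """
    stems = sorted(
        os.path.splitext(f)[0]
        for f in os.listdir(image_dir)
        if f.endswith(".jpg")
        and os.path.isfile(os.path.join(annot_dir, os.path.splitext(f)[0] + ".xml"))
    )
    # Temporal split: the last frames go to validation, so that nearly identical
    # consecutive frames do not end up in both sets.
    n_train = len(stems) - int(round(len(stems) * valid_fraction))
    splits = {"train": stems[:n_train], "valid": stems[n_train:]}

    for split, names in splits.items():
        image_out = os.path.join(out_root, f"{split}_image_folder")
        annot_out = os.path.join(out_root, f"{split}_annot_folder")
        os.makedirs(image_out, exist_ok=True)
        os.makedirs(annot_out, exist_ok=True)
        for stem in names:
            shutil.copy(os.path.join(image_dir, stem + ".jpg"), image_out)
            shutil.copy(os.path.join(annot_dir, stem + ".xml"), annot_out)
    # Number of image/annotation pairs copied to each split.
    return {split: len(names) for split, names in splits.items()}
```

For example, `organize_dataset("frames", "xmllabels", "dataset")` (placeholder paths) would create `dataset/train_image_folder`, `dataset/valid_annot_folder`, and so on. We split by time rather than at random because neighbouring video frames are almost identical, and a random split would make the validation loss overly optimistic.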


#### How to convert our annotations to VOC format

In [None]:
import glob
import os
import xml.etree.ElementTree as ET

from PIL import Image
from reading import read_annotations_file  # local helper module of this project

In [None]:
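# Raw strings avoid having to double every backslash in the Windows paths.
groundtruth_xml_path = r"C:\Users\sarac\Desktop\CLASE\M6\yolo_custom\keras-yolo2\m6-full_annotation.xml"
video_path = r"C:\Users\sarac\Desktop\CLASE\M6\yolo_custom\keras-yolo2\vdo.avi"
# Despite its name, this pattern selects the validation images (the 'val' folder).
trainpath = r"C:\Users\sarac\Desktop\CLASE\M6\yolo_custom\keras-yolo2\cars\images\val\*.jpg"
# The conversion cell below expects `groundtruth_list`: the ground-truth boxes parsed from
# groundtruth_xml_path (e.g. with read_annotations_file; its exact signature is not shown here).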

In [None]:
def add_child(parent, tag, text='\n'):
    """Append <tag>text</tag> to parent; the trailing newline keeps the XML readable."""
    element = ET.SubElement(parent, tag)
    element.text = str(text)
    element.tail = '\n'
    return element


os.makedirs("xmllabels", exist_ok=True)

# Sort the paths: glob returns them in arbitrary order, but n must follow the frame order.
for n, file in enumerate(sorted(glob.glob(trainpath))):
    print(n, file)
    annotation = ET.Element('annotation')
    annotation.text = '\n'
    add_child(annotation, 'folder', 'images')
    add_child(annotation, 'filename', os.path.basename(file))
    add_child(annotation, 'path', file)
    source = add_child(annotation, 'source')
    add_child(source, 'database', 'Unknown')

    # Image size, read from the image itself.
    with Image.open(file) as im:
        w, h = im.size
    size = add_child(annotation, 'size')
    add_child(size, 'width', w)
    add_child(size, 'height', h)
    add_child(size, 'depth', 3)
    add_child(annotation, 'segmented', 0)

    # Ground-truth boxes of this image; 535 is the offset between the image index n
    # and the frame number stored in the ground truth.
    objs = [x for x in groundtruth_list if x.frame == n + 535]
    for obj in objs:
        obj_el = add_child(annotation, 'object')  # avoid shadowing the builtin `object`
        add_child(obj_el, 'name', 'car')
        add_child(obj_el, 'pose', 'Unspecified')
        add_child(obj_el, 'truncated', 0)
        add_child(obj_el, 'difficult', 0)
        bndbox = add_child(obj_el, 'bndbox')
        add_child(bndbox, 'xmin', obj.xtl)
        add_child(bndbox, 'ymin', obj.ytl)
        # VOC expects corner coordinates: this assumes obj.width / obj.height hold the
        # bottom-right x / y. If they are box sizes, write obj.xtl + obj.width and
        # obj.ytl + obj.height instead.
        add_child(bndbox, 'xmax', obj.width)
        add_child(bndbox, 'ymax', obj.height)

    # encoding and xml_declaration are arguments of write(), not of str.format().
    out_name = os.path.splitext(os.path.basename(file))[0] + '.xml'
    ET.ElementTree(annotation).write(os.path.join("xmllabels", out_name),
                                     encoding='utf-8', xml_declaration=True)
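
In the conversion cell, `xmax` and `ymax` are filled with `obj.width` and `obj.height`. That is only correct if those fields already hold the bottom-right corner; `reading.py` is not shown, so we cannot tell from this notebook. If they are the box width and height, the VOC corners have to be computed first. A minimal sketch with a hypothetical helper, `to_voc_box`, that also clips the box to the image:

```python
def to_voc_box(xtl, ytl, box_w, box_h, img_w, img_h):
    """Convert (top-left x, top-left y, width, height) into VOC corners.

    The result is rounded to integer pixels and clipped to the image;
    a box that becomes empty after clipping raises ValueError.
    """
    xmin = max(0, int(round(xtl)))
    ymin = max(0, int(round(ytl)))
    xmax = min(img_w, int(round(xtl + box_w)))
    ymax = min(img_h, int(round(ytl + box_h)))
    if xmax <= xmin or ymax <= ymin:
        raise ValueError(f"degenerate box after clipping: {(xmin, ymin, xmax, ymax)}")
    return xmin, ymin, xmax, ymax


# A 105x75 box at (558, 94) in a 1920x1080 frame.
print(to_voc_box(558, 94, 105, 75, 1920, 1080))  # prints (558, 94, 663, 169)
```

Inside the loop this would be called as `to_voc_box(obj.xtl, obj.ytl, obj.width, obj.height, w, h)` and its four values written to `xmin`, `ymin`, `xmax` and `ymax`.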

In [3]:
# Example of VOC format annotations.

<annotation>
    <folder>images</folder>
    <filename>frame_0001.jpg</filename>
    <path>C:\Users\sarac\Desktop\CLASE\M6\yolo_custom\keras-yolo2\cars\images\train\frame_0001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>558</xmin>
            <ymin>94</ymin>
            <xmax>663</xmax>
            <ymax>169</ymax>
        </bndbox>
    </object>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>1285</xmin>
            <ymin>363</ymin>
            <xmax>1516</xmax>
            <ymax>546</ymax>
        </bndbox>
    </object>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>931</xmin>
            <ymin>78</ymin>
            <xmax>1013</xmax>
            <ymax>146</ymax>
        </bndbox>
    </object>
</annotation>

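
Before training, it is worth checking the generated files, for example that `xmax > xmin` and that every box lies inside the image. A minimal sketch using only `xml.etree.ElementTree`; `read_voc_boxes` and `check_voc_boxes` are hypothetical helpers, not part of keras-yolo2:

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_source):
    """Return ((width, height), [(name, xmin, ymin, xmax, ymax), ...]) from a VOC file."""
    root = ET.parse(xml_source).getroot()
    size = root.find("size")
    img_size = (int(size.findtext("width")), int(size.findtext("height")))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return img_size, boxes


def check_voc_boxes(img_size, boxes):
    """Return human-readable problems: inverted/empty boxes or boxes outside the image."""
    img_w, img_h = img_size
    problems = []
    for name, xmin, ymin, xmax, ymax in boxes:
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"{name}: inverted or empty box {(xmin, ymin, xmax, ymax)}")
        if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
            problems.append(f"{name}: box {(xmin, ymin, xmax, ymax)} outside {img_w}x{img_h}")
    return problems
```

Looping over `glob.glob("xmllabels/*.xml")` and printing any non-empty result of `check_voc_boxes(*read_voc_boxes(path))` would flag, for instance, files where box sizes were written in place of corner coordinates.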

### 2. Edit the configuration file
We use the Tiny YOLO backend with weights from Mask R-CNN trained on COCO, and change the input size and the paths to the dataset folders.


{
    "model" : {
        "backend":              "Tiny Yolo",
        "input_size":           480,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,
        "labels":               ["car"]
    },

    "train": {
        "train_image_folder":   "/home/grupo04/m6/yolo_custom/keras-yolo2/dataset/train_image_folder/",
        "train_annot_folder":   "/home/grupo04/m6/yolo_custom/keras-yolo2/dataset/train_annot_folder/",

        "train_times":          5,
        "pretrained_weights":   "",
        "batch_size":           8,
        "learning_rate":        1e-4,
        "nb_epochs":            1,
        "warmup_epochs":        2,

        "object_scale":         3.0 ,
        "no_object_scale":      1.0,

        "coord_scale":          1.0,
        "class_scale":          1.0,

        "saved_weights_name":   "yolo-tiny-small-in.h5",
        "debug":                true
    },

    "valid": {
        "valid_image_folder":    "/home/grupo04/m6/yolo_custom/keras-yolo2/dataset/valid_image_folder/",
        "valid_annot_folder":    "/home/grupo04/m6/yolo_custom/keras-yolo2/dataset/valid_annot_folder/",

        "valid_times":          1
    }
}
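
A quick check of `config.json` before launching a multi-hour training run can catch mistakes such as wrong folder paths. A minimal sketch; `check_config` is a hypothetical helper, and the multiple-of-32 rule assumes the backbone downsamples the input by 32 (so `input_size` 480 gives a 15x15 output grid):

```python
import json
import os


def check_config(path):
    """Return a list of problems found in a keras-yolo2 style config.json (empty if none)."""
    with open(path) as f:
        config = json.load(f)
    model = config["model"]
    problems = []
    if len(model["anchors"]) % 2 != 0:
        problems.append("anchors must be a flat list of (width, height) pairs")
    # Assumption: the backbone downsamples by 32, so the output grid is input_size / 32.
    if model["input_size"] % 32 != 0:
        problems.append(f"input_size {model['input_size']} is not a multiple of 32")
    if not model["labels"]:
        problems.append("labels is empty")
    for section, key in (("train", "train_image_folder"), ("train", "train_annot_folder"),
                         ("valid", "valid_image_folder"), ("valid", "valid_annot_folder")):
        folder = config[section][key]
        if not os.path.isdir(folder):
            problems.append(f"{key} does not exist: {folder}")
    return problems
```

Running `check_config("config.json")` on the server should return an empty list when the four dataset folders exist.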


### 3. Generate anchors for our dataset
Run the following command and copy the anchors it prints on the terminal into the `anchors` setting of the configuration file.


In [4]:
!python gen_anchors.py -c config.json

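
Anchors are (width, height) priors for the predicted boxes. The YOLOv2 paper chooses them by running k-means on the ground-truth box sizes with the distance d = 1 - IoU instead of the Euclidean distance, so that large boxes do not dominate. We have not checked that `gen_anchors.py` follows exactly this procedure; the sketch below only illustrates the idea and shows how the result could be written into `config.json` instead of copying it by hand. All function names are hypothetical, and `to_grid_units` assumes that anchors are expressed in cells of the output grid (`input_size / 32` cells per side).

```python
import json
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), i.e. as if they shared the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with k-means using the distance d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, k)  # k of the boxes as initial centroids
    for _ in range(iters):
        # Assignment step: the nearest centroid is the one with the highest IoU.
        clusters = [[] for _ in range(k)]
        for s in sizes:
            best = max(range(k), key=lambda i: wh_iou(s, centroids[i]))
            clusters[best].append(s)
        # Update step: mean width and height per cluster (empty clusters keep their centroid).
        new_centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


def to_grid_units(box_w, box_h, img_w, img_h, input_size=480, stride=32):
    """Express a box size in pixels as a size in grid cells (assumed anchor units)."""
    grid = input_size / stride
    return box_w / img_w * grid, box_h / img_h * grid


def write_anchors(config_path, anchors):
    """Store the anchors as a flat [w0, h0, w1, h1, ...] list under model.anchors."""
    with open(config_path) as f:
        config = json.load(f)
    config["model"]["anchors"] = [round(v, 5) for wh in anchors for v in wh]
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
```

With the box sizes collected from the VOC files and converted with `to_grid_units`, `write_anchors("config.json", kmeans_anchors(sizes, k=5))` would fill the five anchor pairs used in our configuration.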

### 4. Start the training process


In [None]:
!python train.py -c config.json


### 5. Results
#### Problems
We ran into problems training this network because the input images are very large (1920x1080). We did not find a way to reduce the input size without breaking the framework. As a result, 1) the network was too heavy, and we were forced to use a batch size of 8 (larger batches exceeded the memory limit of the server). Large inputs combined with a small batch size resulted in 2) very slow training: we had to reduce the number of epochs to 3, and training still took more than 6 hours.

#### Results - Improvements
Under these conditions we could not achieve satisfactory results; next time we will try to implement our own code instead of relying on external frameworks.

    Epoch 00000: val_loss improved from inf to 10.00356, saving model to yolo-tiny-small-in.h5
    4308s - loss: 10.0414 - val_loss: 10.0036

    Epoch 00001: val_loss improved from 10.00356 to 10.00269, saving model to yolo-tiny-small-in.h5
    3522s - loss: 10.0029 - val_loss: 10.0027

    Epoch 00002: val_loss improved from 10.00269 to 0.05315, saving model to yolo-tiny-small-in.h5
    4241s - loss: 0.0486 - val_loss: 0.0531
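
The per-epoch summaries above can be extracted programmatically, for example to plot the loss curves or to total the time spent in the epochs. A minimal sketch, assuming log lines in exactly the format shown above; `parse_keras_log` is a hypothetical helper:

```python
import re

# Matches summaries such as "4308s - loss: 10.0414 - val_loss: 10.0036".
EPOCH_RE = re.compile(r"(\d+)s - loss: ([\d.]+) - val_loss: ([\d.]+)")


def parse_keras_log(lines):
    """Return (seconds, loss, val_loss) for every epoch summary line; other lines are ignored."""
    epochs = []
    for line in lines:
        match = EPOCH_RE.search(line)
        if match:
            epochs.append(tuple(float(g) for g in match.groups()))
    return epochs


log = [
    "Epoch 00000: val_loss improved from inf to 10.00356, saving model to yolo-tiny-small-in.h5",
    "4308s - loss: 10.0414 - val_loss: 10.0036",
    "3522s - loss: 10.0029 - val_loss: 10.0027",
    "4241s - loss: 0.0486 - val_loss: 0.0531",
]
epochs = parse_keras_log(log)
total_seconds = sum(seconds for seconds, _, _ in epochs)
print(f"{len(epochs)} epochs, {total_seconds:.0f} s ({total_seconds / 3600:.1f} h)")
# prints: 3 epochs, 12071 s (3.4 h)
```

The total only covers the time Keras reports for the epochs themselves.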

##### Final mAP
mAP: 0.3030
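
mAP is the mean of the per-class average precision (AP); with the single class `car`, it equals the AP of that class. To illustrate what this number measures, here is a sketch of the all-point interpolated AP used by PASCAL VOC since 2010. It assumes detections have already been matched to ground-truth boxes at some IoU threshold (0.5 in VOC), and we have not checked that keras-yolo2's evaluation uses exactly this variant; `average_precision` is a hypothetical helper.

```python
def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection.
    is_tp:  whether each detection was matched to a not-yet-matched ground-truth box.
    n_gt:   number of ground-truth boxes.
    """
    if n_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Sentinels at recall 0 and 1, then make precision non-increasing in recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the resulting step curve (zero-width steps contribute nothing).
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


print(average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2))
```

In this toy example, two of the three detections are correct and both ground-truth boxes are found, giving an AP of about 0.833 (5/6). A perfect detector scores 1.0, so our 0.3030 means that a large fraction of the cars are either missed or only found together with many false positives.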