# Preparing Data for YOLO v3

While building a YOLO v3 model on a custom dataset (road damage), I used [this](https://github.com/AntonMu/TrainYourOwnYOLO) GitHub repository as boilerplate. This notebook only reproduces the directory structure that the referenced repository expects.

In [None]:
# copying YOLOv3 directory from drive to colab environment
!cp -r '/content/drive/MyDrive/RDD/YOLO/YOLOv3' '/content/'

In [None]:
# importing the required libraries
import os
import shutil
import glob
import pandas as pd
import xml.etree.ElementTree as ET

In [None]:
# setting path
%cd /content/YOLOv3/

# creating the new empty folders (makedirs also creates parent folders and is safe to re-run)
for folder in [
    'Training/src',
    'Data/Source_Images/Training_Images/files_to_train',
    'Data/Source_Images/Test_Images',
    'Data/Source_Images/Test_Image_Detection_Results',
]:
    os.makedirs(folder, exist_ok=True)

/content/YOLOv3


In [None]:
# setting path
%cd /content/YOLOv3/Training/src/

# cloning helper github repository
!git clone https://github.com/qqwweee/keras-yolo3

/content/YOLOv3/Training/src
Cloning into 'keras-yolo3'...
remote: Enumerating objects: 144, done.
remote: Total 144 (delta 0), reused 0 (delta 0), pack-reused 144
Receiving objects: 100% (144/144), 151.08 KiB | 7.19 MiB/s, done.
Resolving deltas: 100% (65/65), done.


In [None]:
# setting path
%cd /content/YOLOv3/Training/src/

# renaming the cloned folder to the name we want
os.rename('keras-yolo3', 'keras_yolo3')

/content/YOLOv3/Training/src


In [None]:
# setting path
%cd /content/YOLOv3/Training/src/keras_yolo3/

# removing two files from keras_yolo3 (they are replaced in the next step)
os.remove('yolo3/model.py')
os.remove('yolo.py')

/content/YOLOv3/Training/src/keras_yolo3


Right after this step, make sure to upload the **model.py** and **yolo.py** files to their respective locations (`yolo3/model.py` and `yolo.py` inside `keras_yolo3`).
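
If the replacement files are stored somewhere reachable from Colab, the upload can be scripted instead of done by hand. The following is a minimal sketch under that assumption; `place_custom_files` is a hypothetical helper, and the source paths in the commented usage are placeholders, not paths from this project.

```python
import shutil
from pathlib import Path


def place_custom_files(file_map, repo_root):
    """Copy each source file to its destination inside repo_root.

    file_map maps a source path to a destination path relative to repo_root.
    Returns the list of destination paths that were written.
    """
    repo_root = Path(repo_root)
    written = []
    for src, rel_dst in file_map.items():
        src = Path(src)
        if not src.is_file():
            raise FileNotFoundError(f"Missing replacement file: {src}")
        dst = repo_root / rel_dst
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dst)
        written.append(dst)
    return written


# Example usage (the source paths are placeholders, adjust them to your setup):
# place_custom_files(
#     {
#         "/path/to/custom/model.py": "yolo3/model.py",
#         "/path/to/custom/yolo.py": "yolo.py",
#     },
#     "/content/YOLOv3/Training/src/keras_yolo3",
# )
```

Raising on a missing source file makes a forgotten upload fail here rather than later during training.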

In [None]:
# copying data.tar.gz file to the colab environment
# I created this archive earlier; it contains two folders, train (80%) and test (20%)
# both folders contain the images and their xml annotation files

!cp -r '/content/drive/MyDrive/RDD/data.tar.gz' '/content/'
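
The notebook does not show how the 80/20 split inside `data.tar.gz` was produced. One possible way to build such a split is sketched below, assuming a flat folder of paired `.jpg` and `.xml` files; the helper name `split_pairs`, the seed, and the folder arguments are illustrative assumptions.

```python
import random
import shutil
from pathlib import Path


def split_pairs(src_dir, out_dir, train_frac=0.8, seed=42):
    """Split paired .jpg/.xml files into out_dir/train and out_dir/test.

    Returns a dict with the number of image/annotation pairs per split.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    # Keep only images that have a matching annotation file
    stems = sorted(
        p.stem for p in src_dir.glob("*.jpg")
        if (src_dir / f"{p.stem}.xml").is_file()
    )
    # A fixed seed keeps the split reproducible between runs
    random.Random(seed).shuffle(stems)
    n_train = int(len(stems) * train_frac)
    splits = {"train": stems[:n_train], "test": stems[n_train:]}
    for name, members in splits.items():
        target = out_dir / name
        target.mkdir(parents=True, exist_ok=True)
        for stem in members:
            for ext in (".jpg", ".xml"):
                shutil.copy(src_dir / f"{stem}{ext}", target / f"{stem}{ext}")
    return {name: len(members) for name, members in splits.items()}
```

Splitting by file stem keeps each image together with its annotation, so no image ends up in one split while its XML lands in the other.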

In [None]:
# setting path
%cd /content/

# extracting tar.gz file
!tar -xvf 'data.tar.gz'

Streaming output truncated to the last 5000 lines.
data/images/train/Japan_000829.xml
data/images/train/Japan_008427.xml
data/images/train/Japan_003047.jpg
data/images/train/Japan_012361.jpg
data/images/train/Japan_000291.jpg
data/images/train/India_007122.jpg
data/images/train/Japan_005137.jpg
data/images/train/Japan_001859.jpg
data/images/train/Japan_000577.xml
data/images/train/Japan_010791.jpg
data/images/train/Japan_000484.xml
data/images/train/Japan_000225.jpg
data/images/train/Japan_001364.jpg
data/images/train/Japan_006388.jpg
data/images/train/Japan_012196.xml
data/images/train/Czech_002165.xml
data/images/train/Japan_011783.jpg
data/images/train/Japan_012810.xml
data/images/train/India_006094.jpg
data/images/train/Japan_001065.jpg
data/images/train/Japan_010245.jpg
data/images/train/Czech_000995.xml
data/images/train/Japan_011435.xml
data/images/train/Japan_000889.jpg
data/images/train/Japan_006283.xml
data/images/train/Japan_006561.xml
data/images/train/Czech_0

In [None]:
# converting xml files bounding box data to csv format
def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines them in a single Pandas DataFrame.

    Parameters
    ----------
    path : str
        The path containing the .xml files

    Returns
    -------
    pandas.DataFrame
        The produced dataframe, one row per bounding box
    """
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, "*.xml")):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        filename = root.find("filename").text
        for member in root.findall("object"):
            label = member.find("name").text
            for cord in member.findall("bndbox"):
                # looking tags up by name so the result does not depend on child order
                value = (
                    filename,
                    int(cord.find("xmin").text),
                    int(cord.find("ymin").text),
                    int(cord.find("xmax").text),
                    int(cord.find("ymax").text),
                    label,
                )
                # appending inside the loop so every box is recorded exactly once
                xml_list.append(value)
    column_name = [
        "image",
        "xmin",
        "ymin",
        "xmax",
        "ymax",
        "label",
    ]
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
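
To make the expected input concrete, the sketch below parses a hand-written annotation in the Pascal VOC layout that labelImg produces and prints the rows it yields. The sample XML (file name, boxes, labels) is made up for illustration, and `parse_voc_boxes` is a hypothetical stand-alone helper that works on a string instead of a directory.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout (not taken from the dataset)
SAMPLE_XML = """<annotation>
  <filename>sample_000001.jpg</filename>
  <size><width>600</width><height>600</height><depth>3</depth></size>
  <object>
    <name>D00</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>D20</name>
    <bndbox><xmin>5</xmin><ymin>6</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_boxes(xml_text):
    """Return one (image, xmin, ymin, xmax, ymax, label) tuple per bounding box."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # skip objects without a bounding box
        rows.append((
            filename,
            int(box.findtext("xmin")),
            int(box.findtext("ymin")),
            int(box.findtext("xmax")),
            int(box.findtext("ymax")),
            obj.findtext("name"),
        ))
    return rows


rows = parse_voc_boxes(SAMPLE_XML)
for row in rows:
    print(row)
```

Each `<object>` becomes one row, so an image with several damage instances appears several times in the resulting table.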

In [None]:
# calling xml_to_csv
final_df = xml_to_csv('/content/data/images/train')
final_df.shape

(19999, 6)

In [None]:
final_df.head()

              image  xmin  ymin  xmax  ymax label
0  Japan_011285.jpg   218   478   283   496   D10
1  India_009748.jpg   304   589   389   632   D40
2  India_009748.jpg   338   677   423   720   D40
3  India_009748.jpg   361   637   437   676   D40
4  Japan_012618.jpg   100   347   464   592   D20


In [None]:
final_df.isnull().sum()

image    0
xmin     0
ymin     0
xmax     0
ymax     0
label    0
dtype: int64

In [None]:
# calling xml_to_csv
final_df_test = xml_to_csv('/content/data/images/test')
final_df_test.shape

(5047, 6)

In [None]:
final_df_test.head()

              image  xmin  ymin  xmax  ymax label
0  Japan_003482.jpg   218   392   379   598   D20
1  Japan_003482.jpg    44   379   117   427   D20
2  Japan_004480.jpg   221   329   416   597   D20
3  Japan_004480.jpg   142   346   194   387   D20
4  Czech_002182.jpg   187   453   216   520   D00


In [None]:
final_df_test.isnull().sum()

image    0
xmin     0
ymin     0
xmax     0
ymax     0
label    0
dtype: int64

In [None]:
# saving csv file
final_df.to_csv('Annotations-export.csv', index=False)
final_df_test.to_csv('Annotations-export-test.csv', index=False)
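
The null checks above only confirm that no values are missing. A complementary check on the exported file is to look for degenerate boxes (zero or negative width or height, or negative coordinates). The sketch below is one way to do that with the standard library; `find_invalid_rows` is a hypothetical helper, and the rule for what counts as invalid is an assumption.

```python
import csv

REQUIRED_COLUMNS = ["image", "xmin", "ymin", "xmax", "ymax", "label"]


def find_invalid_rows(csv_path):
    """Return (line_number, row) pairs whose bounding box is degenerate or negative."""
    invalid = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        # Line 1 of the file is the header, so data rows start at line 2
        for line_number, row in enumerate(reader, start=2):
            xmin, ymin, xmax, ymax = (
                int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax")
            )
            if xmin < 0 or ymin < 0 or xmax <= xmin or ymax <= ymin:
                invalid.append((line_number, row))
    return invalid


# Example usage:
# print(find_invalid_rows('Annotations-export.csv'))
```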

In [None]:
# copying the csv files to Training_Images/files_to_train and Test_Images
!cp -r '/content/Annotations-export.csv' '/content/YOLOv3/Data/Source_Images/Training_Images/files_to_train'
!cp -r '/content/Annotations-export-test.csv' '/content/YOLOv3/Data/Source_Images/Test_Images'

In [None]:
# setting path
%cd /content/data/images/

# copying all the training jpg files to Training_Images
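for file in os.listdir('train'):
  # matching on the extension avoids picking up other names that merely contain 'jpg'
  if file.lower().endswith('.jpg'):
    shutil.copy('/content/data/images/train/' + file, '/content/YOLOv3/Data/Source_Images/Training_Images/' + file)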

/content/data/images


In [None]:
# setting path
%cd /content/data/images/

# copying all the training jpg files to Training_Images/files_to_train too
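for file in os.listdir('train'):
  # matching on the extension avoids picking up other names that merely contain 'jpg'
  if file.lower().endswith('.jpg'):
    shutil.copy('/content/data/images/train/' + file, '/content/YOLOv3/Data/Source_Images/Training_Images/files_to_train/' + file)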

/content/data/images


In [None]:
# setting path
%cd /content/data/images/

# copying all the test jpg files to Test_Images
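for file in os.listdir('test'):
  # matching on the extension avoids picking up other names that merely contain 'jpg'
  if file.lower().endswith('.jpg'):
    shutil.copy('/content/data/images/test/' + file, '/content/YOLOv3/Data/Source_Images/Test_Images/' + file)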

/content/data/images
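
The three copy cells above follow the same pattern, so they could be folded into one helper. This is a minimal sketch of that idea; `copy_images` is a hypothetical function name, and the commented usage reuses the paths from the cells above.

```python
import shutil
from pathlib import Path


def copy_images(src_dir, dst_dirs, suffix=".jpg"):
    """Copy every file with the given suffix from src_dir into each folder in dst_dirs.

    Returns the number of images found in src_dir.
    """
    src_dir = Path(src_dir)
    images = sorted(
        p for p in src_dir.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    for dst in dst_dirs:
        dst = Path(dst)
        dst.mkdir(parents=True, exist_ok=True)
        for image in images:
            shutil.copy(image, dst / image.name)
    return len(images)


# Example usage with the folders from this notebook:
# copy_images('/content/data/images/train', [
#     '/content/YOLOv3/Data/Source_Images/Training_Images',
#     '/content/YOLOv3/Data/Source_Images/Training_Images/files_to_train',
# ])
# copy_images('/content/data/images/test', [
#     '/content/YOLOv3/Data/Source_Images/Test_Images',
# ])
```

Comparing the returned count with the number of distinct values in the `image` column is a quick way to spot missing images.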


In [None]:
# creating tar file for YOLOv3 folder
import tarfile

def make_tarfile(output_filename, source_dir):
  # Reference : https://stackoverflow.com/questions/2032403/how-to-create-full-compressed-tar-file-using-python
  with tarfile.open(output_filename, "w:gz") as tar:
    tar.add(source_dir, arcname=os.path.basename(source_dir))

In [None]:
# setting path
%cd /content/

# making tar
make_tarfile('yolo_data.tar.gz', '/content/YOLOv3')

/content
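
Because `make_tarfile` passes `arcname=os.path.basename(source_dir)`, the archive should store everything under a single top-level `YOLOv3` folder rather than under the full `/content/YOLOv3` path. A quick way to confirm this before copying the archive to Drive is sketched below; `list_top_level` is a hypothetical helper.

```python
import tarfile


def list_top_level(archive_path):
    """Return the sorted top-level names stored in a tar archive."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically
    with tarfile.open(archive_path, "r:*") as tar:
        return sorted({name.split("/")[0] for name in tar.getnames()})


# Example usage:
# print(list_top_level('/content/yolo_data.tar.gz'))
```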


In [None]:
# finally copying the tar file to the RDD/YOLO folder on Drive
!cp -r '/content/yolo_data.tar.gz' '/content/drive/MyDrive/RDD/YOLO'

Once **yolo_data.tar.gz** has been created, it's time to train YOLO v3 on the road damage data.