# SIGtor: Supplementary Synthetic Image Generation for Object Detection and Segmentation Datasets

## Introduction
A significant challenge in deep learning tasks, particularly in classification, detection, and segmentation, is the requirement for extensive, well-balanced training datasets. This challenge is especially pronounced in detection and segmentation tasks, where creating large datasets is often a time-consuming, tedious, and error-prone process. Consequently, data augmentation has become essential in training deep learning models, enabling the expansion of small datasets through various morphological or geometrical transformations, applied either on-the-fly during the training process or offline.

This notebook demonstrates a method for artificially generating a theoretically unlimited amount of supplementary data for object detection or segmentation from an existing dataset, regardless of its initial size. The algorithm presented here employs a simple yet robust copy-paste augmentation technique that handles object overlap and dynamic placement on background images, and supports both object-level and image-wide augmentations. The generated synthetic images include instance segmentation masks and tightly fitting bounding boxes. We refer to this system as SIGtor, which stands for Synthetic-Image-Generator.

## Assumptions

<div style="margin-bottom: 20px;">
    <strong>Dataset Availability</strong><br>
    You have a dataset that you intend to extend using SIGtor. <strong>Note</strong>: SIGtor is an offline dataset generator, not an augmentation technique for use during the training of deep learning models. The generated images, with or without the original images, can be used to train a model.
</div>

<div style="margin-bottom: 20px;">
    <strong>Dataset Annotation</strong><br>
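    Your dataset should be annotated in the YOLO-style text format shown below, with one line per image: the image path, followed by one space-separated <code>x1,y1,x2,y2,class</code> entry per object: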
</div>

<div style="border: 2px solid #4CAF50; padding: 10px; background-color: #f9f9f9; font-family: monospace; font-size: 14px; margin-bottom: 20px;">
    ./Datasets/Source/Images/image1.jpg $x_1$,$y_1$,$x_2$,$y_2$,$A$ $x_1$,$y_1$,$x_2$,$y_2$,$B$ $x_1$,$y_1$,$x_2$,$y_2$,$C$ $x_1$,$y_1$,$x_2$,$y_2$,$D$
</div>
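
As a minimal sketch of how such a line could be read, the snippet below splits it into the image path and a list of boxes. The names `Box` and `parse_annotation_line` are hypothetical and are not taken from the SIGtor code; the sketch also assumes integer coordinates and an image path without spaces.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One annotated object: corner coordinates plus integer class index."""
    x1: int
    y1: int
    x2: int
    y2: int
    class_id: int


def parse_annotation_line(line: str) -> Tuple[str, List[Box]]:
    """Split 'path x1,y1,x2,y2,cls ...' into the image path and its boxes."""
    # Assumption: the image path contains no spaces, so whitespace
    # splitting cleanly separates the path from the box entries.
    path, *fields = line.split()
    boxes = [Box(*(int(v) for v in field.split(","))) for field in fields]
    return path, boxes


path, boxes = parse_annotation_line(
    "./Datasets/Source/Images/image1.jpg 48,240,195,371,11 8,12,352,498,14"
)
print(path)
for box in boxes:
    print(box)
```

The coordinates and class indices in the example line are made up for illustration.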

    
For this demo, I will use Pascal VOC or COCO Object Detection and Instance Segmentation Masks, downloaded from Kaggle or the COCO dataset site. Tools for converting either the Pascal VOC or COCO dataset into YOLO format can be found in the `tools` folder of this project.

While the project folder structure is flexible, it is recommended to organize your files as shown below:

<div style="border: 2px solid #007BFF; padding: 15px; background-color: #f0f8ff; font-size: 14px; border-radius: 5px; font-family: monospace; margin-bottom: 20px;">
SIGtor/<br>
├── Dataset/<br>
│   ├── Source/<br>
│   │   ├── images/<br>
│   │   ├── masks/<br>
│   │   └── source_annotations.txt<br>
│   ├── Background/<br>
│   └── Sigtored/<br>
│       ├── augmented_images/<br>
│       ├── augmented_masks/<br>
│       └── sigtored_annotations.txt<br>
├── tools/<br>
├── augmentations.py<br>
├── config.py<br>
├── data_processing.py<br>
├── data_utils.py<br>
├── demo.ipynb<br>
├── expand_annotations.py<br>
├── file_operations.py<br>
├── image_compositions.py<br>
├── image_processing.py<br>
├── index_generator.py<br>
├── License.txt<br>
├── requirements.py<br>
├── readme.md<br>
├── sigtor.py<br>
├── test_sigtor.py<br>
└── utils.py
</div>

<div style="margin-bottom: 20px;">
    <strong>Background Images</strong><br>
    Download some images from the internet to be used as background images and place them in the <code>Background</code> folder as shown above. While not mandatory, using realistic background images instead of plain backgrounds can enhance the quality of the generated synthetic images. You can automate the download process using tools such as <a href="https://github.com/hardikvasa/google-images-download">google-images-download</a> or the approach described in <a href="https://levelup.gitconnected.com/how-to-download-google-images-using-python-2021-82e69c637d59">this article</a>. Once downloaded, manually remove any background images that contain objects from your dataset classes: those objects would be unannotated, so the model could be penalized during training for detecting them.
</div>



## SIGtor: The Steps

The synthetic image generation process involves two main steps:

<ul>
<div style="margin-bottom: 20px;">
<li><strong>Step 1: Expand the Source Annotation.</strong></li>
Consider the example below where an image contains four annotated objects (A, B, C, and D):

<center><img src="./misc/example1.png" width="500" /></center>

The annotations might look like this:

<div style="border: 2px solid #4CAF50; padding: 10px; background-color: #f9f9f9; font-family: monospace; font-size: 14px;">
./Datasets/Source/Images/image1.jpg $x_1$,$y_1$,$x_2$,$y_2$,$A$ $x_1$,$y_1$,$x_2$,$y_2$,$B$ $x_1$,$y_1$,$x_2$,$y_2$,$C$ $x_1$,$y_1$,$x_2$,$y_2$,$D$
</div>

<i>(Note: A, B, C, and D represent the integer indices of the object classes, and the coordinates will differ based on the actual positions of the objects.)</i>

The `expand_annotation.py` script processes these annotations by automatically calculating the Intersection over Union (IoU) for each pair of objects. It then re-annotates the lines based on the following rules:

<div style="border: 2px solid #007BFF; padding: 15px; background-color: #f0f8ff; font-size: 14px; border-radius: 5px;">
    <ul style="margin-left: 20px;">
        <li><strong>Non-overlapping objects:</strong> Each non-overlapping object is assigned its own annotation line, as with object <b>D</b>.</li>
        <li><strong>Completely embedded objects:</strong> Objects that are entirely within the bounds of a larger object (e.g., <b>B</b> within <b>A</b>) also receive their own annotation line.</li>
        <li><strong>Overlapping objects:</strong> Larger objects that partially overlap with others (e.g., the relationship between <b>A</b> and <b>C</b>) or contain smaller objects within their coordinates (e.g., the relationship between <b>A</b> and <b>B</b>) are annotated together.</li>
    </ul>
</div>
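
The three cases above can be told apart with a pairwise IoU and a containment check. The following is a rough sketch under that assumption; the function names `iou`, `contains`, and `relation` and the example coordinates are hypothetical, and the actual script may classify pairs differently.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True if `inner` lies entirely within the bounds of `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def relation(a, b):
    """Classify a pair of boxes into one of the three cases."""
    if iou(a, b) == 0:
        return "non-overlapping"
    if contains(a, b) or contains(b, a):
        return "embedded"
    return "overlapping"


# Illustrative coordinates only, arranged like objects A, B, C, and D.
A = (0, 0, 100, 100)
B = (20, 20, 40, 40)      # entirely inside A
C = (80, 80, 150, 150)    # partially overlaps A
D = (200, 200, 250, 250)  # touches nothing

print(relation(A, B), relation(A, C), relation(A, D))
```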

After expansion, the original annotation line in this example is divided into three lines:

<div style="border: 2px solid #4CAF50; padding: 10px; background-color: #f9f9f9; font-family: monospace; font-size: 14px;">
./Datasets/Source/Images/image1.jpg $x_1$,$y_1$,$x_2$,$y_2$,$D$<br>    
./Datasets/Source/Images/image1.jpg $x_1$,$y_1$,$x_2$,$y_2$,$B$<br>
./Datasets/Source/Images/image1.jpg $x_1$,$y_1$,$x_2$,$y_2$,$A$ $x_1$,$y_1$,$x_2$,$y_2$,$B$ $x_1$,$y_1$,$x_2$,$y_2$,$C$
</div>
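
One way to reproduce this expansion is to group boxes that overlap (directly or through a chain of overlaps) and to emit an extra line for every box that sits entirely inside another. The sketch below is only one reading of the rules above, not the script's actual logic; `expand`, `overlaps`, and `contains` are hypothetical names, and the coordinates are invented.

```python
from itertools import combinations


def overlaps(a, b):
    """True if the two boxes share a region of non-zero area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def contains(outer, inner):
    """True if `inner` lies entirely within the bounds of `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def expand(boxes):
    """Expand a list of (x1, y1, x2, y2, cls) boxes into annotation groups."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    embedded = set()
    for i, j in combinations(range(len(boxes)), 2):
        if overlaps(boxes[i], boxes[j]):
            parent[find(i)] = find(j)  # merge the two overlap groups
            if contains(boxes[i], boxes[j]):
                embedded.add(j)
            elif contains(boxes[j], boxes[i]):
                embedded.add(i)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # Embedded boxes get their own line in addition to their group line.
    return [[boxes[i]] for i in sorted(embedded)] + list(groups.values())


# Illustrative boxes for A, B, C, D with class indices 0..3.
boxes = [(0, 0, 100, 100, 0), (20, 20, 40, 40, 1),
         (80, 80, 150, 150, 2), (200, 200, 250, 250, 3)]
for group in expand(boxes):
    entries = " ".join(",".join(map(str, b)) for b in group)
    print("./Datasets/Source/Images/image1.jpg", entries)
```

For this arrangement the sketch yields three groups: B alone, D alone, and A together with B and C.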

To complete this step, simply run the `expand_annotation.py` script with the appropriate command-line arguments, or without arguments if you have already configured the `sig_argument.txt` file with the correct inputs.
</div>
<br>

<div style="margin-top: 20px;">
<li><strong>Step 2: Generate the Artificial Images.</strong></li>

The next step involves generating synthetic images. The details of this process are illustrated in the GIF below:

<center><img src="./misc/SIGtor.gif" width="900"></center>

Sample SIGtored images and masks can be found in the project's `Datasets/SIGtored` folder. To generate new artificial images, clone this project and run `synthetic_image_generator.py` as is. If you want to work with your own dataset or other public datasets like COCO and VOC for your next object detection or segmentation training, edit the `sig_argument.txt` file accordingly and follow the two steps described above.
</div>
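
The core of a copy-paste step can be sketched with NumPy alone: copy the masked pixels of an object onto the background, record the instance in a mask canvas, and derive each bounding box from the pixels that remain visible. This is a simplified illustration, not SIGtor's implementation; `paste_object` and `tight_bbox` are hypothetical names, and the sketch assumes every object fits entirely inside the background and that later pastes occlude earlier ones.

```python
import numpy as np


def paste_object(background, obj, obj_mask, top, left, canvas_mask, instance_id):
    """Paste `obj` onto `background` in place wherever `obj_mask` is True."""
    h, w = obj_mask.shape
    # Basic slicing returns a view, so the assignments modify the originals.
    background[top:top + h, left:left + w][obj_mask] = obj[obj_mask]
    # A later paste overwrites earlier instance ids, which models occlusion.
    canvas_mask[top:top + h, left:left + w][obj_mask] = instance_id


def tight_bbox(canvas_mask, instance_id):
    """Return (x1, y1, x2, y2) around the visible pixels of one instance."""
    ys, xs = np.nonzero(canvas_mask == instance_id)
    if ys.size == 0:
        return None  # fully occluded
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


background = np.zeros((100, 100, 3), dtype=np.uint8)
canvas_mask = np.zeros((100, 100), dtype=np.int32)

# Two solid rectangular "objects"; the second partly covers the first.
obj1 = np.full((20, 30, 3), 255, dtype=np.uint8)
obj2 = np.full((20, 20, 3), 128, dtype=np.uint8)
paste_object(background, obj1, np.ones((20, 30), bool), 10, 10, canvas_mask, 1)
paste_object(background, obj2, np.ones((20, 20), bool), 10, 30, canvas_mask, 2)

print(tight_bbox(canvas_mask, 1))  # box shrinks to the still-visible part
print(tight_bbox(canvas_mask, 2))
```

Recomputing boxes from the final mask canvas, rather than from the paste position, is what keeps them tight when objects overlap.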


## Conclusion

This experimental project has demonstrated that synthetic image generation can improve the performance of object detection models. By applying this approach, I was able to improve the accuracy of YOLOv3, as well as my own variants, MultiGridDet and DenseYOLO (a lighter implementation of YOLOv2), gaining a few extra percentage points in accuracy compared to the original YOLO models. This experience has reinforced my belief that copy-paste augmentation is a powerful tool for training deep learning models.

However, there are important considerations to keep in mind:

<ol>
<li><strong>Class Balance:</strong> It is crucial to avoid over-representing certain object classes. Although I have incorporated mechanisms to under-sample over-represented classes like "Person" and "Car" to mitigate imbalance, users are encouraged to experiment with these features to suit their specific needs.</li>

<li><strong>Dataset Variability:</strong> Ensuring that the training dataset is not overly repetitive is vital to prevent overfitting. A diverse dataset will lead to more robust model performance.</li>
</ol>
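
To check class balance before generation, one option is to count class occurrences in the annotation file and cap how often each class may appear. The sketch below is a hypothetical illustration and does not reflect SIGtor's built-in under-sampling; `class_counts`, `undersample`, and the `max_per_class` parameter are assumed names, and the annotation lines are invented.

```python
import random
from collections import Counter


def line_classes(line):
    """Class indices of all objects on one 'path x1,y1,x2,y2,cls ...' line."""
    return [int(field.split(",")[-1]) for field in line.split()[1:]]


def class_counts(lines):
    """Count how many objects of each class appear across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line_classes(line))
    return counts


def undersample(lines, max_per_class, seed=0):
    """Keep a line only if no class on it would exceed `max_per_class`."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # avoid always keeping the first lines
    kept, counts = [], Counter()
    for line in shuffled:
        needed = Counter(line_classes(line))
        if all(counts[c] + k <= max_per_class for c, k in needed.items()):
            kept.append(line)
            counts.update(needed)
    return kept


# Illustrative lines: class 14 is heavily over-represented.
lines = [f"img{i}.jpg 0,0,10,10,14" for i in range(5)] + ["img5.jpg 0,0,10,10,7"]
balanced = undersample(lines, max_per_class=2)
print(class_counts(lines))
print(class_counts(balanced))
```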

Finally, I have not observed any significant negative impact on model performance due to potential artifacts from the SIGtored objects or the lack of perfect seamlessness in the pasted objects. While training the model for extended periods might increase the likelihood of the network recognizing these artifacts, this is generally true for any model trained extensively on a fixed dataset.
