# Offline Augmentation - Rotate

This script creates new images "offline" (i.e. as a pre-processing step) by applying rotation transformations. Automatic checks ensure that objects of interest do not overflow the boundaries of the image. Bounding boxes are recalculated after each transformation so that they stay tight around the object of interest. If a transformation introduces artefacts, such as a significant change in the segmentation mask area, the new image is rejected.
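The two checks described above can be illustrated with a minimal, pure-Python sketch. It is not the code used by `augment`: the helper names (`tight_bbox`, `rotate90_clockwise`, `passes_area_check`) and the 10% tolerance are assumptions made for illustration, and a 90-degree rotation stands in for an arbitrary rotation.

```python
def tight_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) around the nonzero pixels, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


def rotate90_clockwise(mask):
    """Rotate a 2D list-of-lists mask by 90 degrees clockwise."""
    return [list(col) for col in zip(*mask[::-1])]


def mask_area(mask):
    """Count the nonzero pixels in the mask."""
    return sum(1 for row in mask for value in row if value)


def passes_area_check(original, transformed, tolerance=0.1):
    """Accept the transform only if the mask area changed by at most `tolerance` (hypothetical threshold)."""
    before, after = mask_area(original), mask_area(transformed)
    if before == 0:
        return after == 0
    return abs(after - before) / before <= tolerance


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
rotated = rotate90_clockwise(mask)
print(tight_bbox(mask))                  # (1, 1, 2, 2)
print(tight_bbox(rotated))               # (0, 1, 1, 2), recomputed from the rotated mask
print(passes_area_check(mask, rotated))  # True, the area is unchanged
```

Recomputing the box from the transformed mask, rather than rotating the old box corners, is what keeps it tight around the object.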

In [1]:
import os
import sys

# Add the directory two levels up to the import path so that `data_processing` can be imported.
sys.path.insert(1, '../..')

from data_processing.data_proc_lib.transforms import augment, rotate, random_rotate_90
from data_processing.data_proc_lib.utilities import combine, convert_to_yolo

## 1 Create a Rotation-Augmented Dataset from the Cleaned Dataset

**Prerequisites:** The dataset should already have been cleaned. See the notebook `clean.ipynb`.

Set the following constants below according to your environment.

* `NUM_AUGMENTATIONS`: The number of times each image in the input dataset is passed through the augmentation process. Intuitively, a larger `NUM_AUGMENTATIONS` means a larger final dataset. Note that every augmented image goes through automated quality checks (see the paper for details). These checks are not guaranteed to find every possible problem, but they can detect some issues with image artefacts. If any problem is detected, the augmented image is not saved. Hence, the final augmented dataset size will be *slightly less than* `NUM_AUGMENTATIONS` times the input dataset size.
* `CLEANED_PATH`: Path to the input data (for example, this will often be the location of the cleaned data output by notebook `clean.ipynb`).
* `OUTPUT_PATH`: Path to the location to store the augmented dataset created by running `augment` below.
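As a rough illustration of the size relationship described for `NUM_AUGMENTATIONS`, the arithmetic can be sketched as below. The function names and the example numbers are hypothetical, and the sketch ignores any extra images produced by class balancing.

```python
def max_augmented_size(num_input_images: int, num_augmentations: int) -> int:
    """Upper bound on the augmented dataset size: every attempt passes the quality checks."""
    if num_input_images < 0 or num_augmentations < 0:
        raise ValueError("counts must be non-negative")
    return num_input_images * num_augmentations


def augmented_size(num_input_images: int, num_augmentations: int, num_rejected: int) -> int:
    """Number of augmented images actually saved once rejected attempts are discounted."""
    upper_bound = max_augmented_size(num_input_images, num_augmentations)
    if not 0 <= num_rejected <= upper_bound:
        raise ValueError("num_rejected must lie between 0 and the upper bound")
    return upper_bound - num_rejected


# Hypothetical numbers: 1000 cleaned images, 3 passes, 12 attempts rejected by the checks.
print(max_augmented_size(1000, 3))  # 3000
print(augmented_size(1000, 3, 12))  # 2988
```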

Set the Albumentations transform required. Here it is `rotate()`, which produces a rotation-augmented dataset. The transform is passed into the call to `augment` below. `random_rotate_90` is also imported above as an alternative transform.

In [2]:
transform = rotate()

If you want to balance the dataset across classes by oversampling the under-represented classes, pass the argument `balance=True` to the call to `augment` below (the example call already does so).
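To make the idea of oversampling concrete, here is a minimal sketch of how many extra samples each class would need to match the largest class. This is only an illustration of the general technique, not the logic behind `balance=True`; the function name and the class labels are hypothetical.

```python
from collections import Counter


def oversampling_plan(labels):
    """Map each class to the number of extra samples needed to match the largest class."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical class labels, one per annotated object.
labels = ["class_a"] * 5 + ["class_b"] * 2 + ["class_c"]
print(oversampling_plan(labels))  # {'class_a': 0, 'class_b': 3, 'class_c': 4}
```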

In [None]:
NUM_AUGMENTATIONS = 3
OUTPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/rotated_x3"  # where the augmented dataset is written; set for your environment
CLEANED_PATH = "/mnt/data/rgn_ijcnn/cleaned"  # input data, e.g. the cleaned data output by clean.ipynb
augment(NUM_AUGMENTATIONS, OUTPUT_PATH, CLEANED_PATH, transform, make_backgrounds=True, balance=True)

Note, you may see a warning when this runs:

`UserWarning: Affine could work incorrectly in ReplayMode for other input data because its' params depend on targets`

This is fine. It's just telling us that the Albumentations `ReplayCompose` class might behave differently if we try to replay the same transform on different inputs. We are not doing that, so this is not a problem for our use case.

When the execution completes successfully, you'll see a message saying `Completed!` along with a count of the number of background images (images containing no galaxies) that were created.

Now combine the cleaned original data with the augmented data:

In [None]:
combine(OUTPUT_PATH, CLEANED_PATH)

The data to use for training will now be in path `OUTPUT_PATH/combined`.

This can be converted to YOLO format. Set the `YOLO_PATH` to be the location to place the YOLO-formatted files. This should be an absolute path (not relative) so set it for your environment.

In [5]:
YOLO_PATH = "/mnt/data/rgn_ijcnn/yolo/yolo_rotated_x3"
OUTPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/rotated_x3/combined"

In [None]:
convert_to_yolo(OUTPUT_PATH, YOLO_PATH)

The YOLO-formatted data will now be in the directory given by `YOLO_PATH`.
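For readers unfamiliar with YOLO labels, the sketch below shows the usual box conversion: each object becomes one line of `class x_center y_center width height`, with coordinates normalised by the image size. It assumes the input boxes are in pixel `[x_min, y_min, width, height]` form (COCO style), which may not match what `convert_to_yolo` reads; the function names are hypothetical.

```python
def to_yolo_box(box, image_width, image_height):
    """Convert a pixel [x_min, y_min, width, height] box to normalised (x_center, y_center, width, height)."""
    x_min, y_min, width, height = box
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


def yolo_label_line(class_id, box, image_width, image_height):
    """Format one object as a YOLO label line: class index followed by the normalised box."""
    x_center, y_center, width, height = to_yolo_box(box, image_width, image_height)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 100x40 pixel box with its top-left corner at (50, 20) in a 200x100 image.
print(yolo_label_line(0, [50, 20, 100, 40], 200, 100))
# 0 0.500000 0.400000 0.500000 0.400000
```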