# Offline Augmentation - Make Background Images

This script creates new images "offline" (i.e. as a pre-processing step) by applying rotate, crop and scale transformations. Automatic checks ensure that objects of interest do not overflow the image boundaries. Bounding boxes are recalculated after each transformation so that they remain tight around the object of interest. If a transformation introduces artefacts, such as a significant change in segmentation mask area, the new image is rejected.
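The two checks described above (a tight bounding box and a stable mask area) can be illustrated with a minimal sketch. It assumes COCO-style polygon segmentations stored as flat `[x1, y1, x2, y2, ...]` lists. The helper names `polygon_bbox`, `polygon_area` and `area_is_preserved`, and the 10% tolerance, are hypothetical and are not taken from `data_proc_lib`, whose actual checks are not shown in this notebook.

```python
def polygon_bbox(polygon):
    """Return a tight COCO-style [x, y, width, height] box for a flat polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def polygon_area(polygon):
    """Return the polygon area using the shoelace formula."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def area_is_preserved(area_before, area_after, tolerance=0.1):
    """Accept a transform only if the relative area change is within tolerance.

    The 10% default is an illustrative assumption, not the project's threshold.
    """
    if area_before <= 0:
        return False
    return abs(area_after - area_before) / area_before <= tolerance


# A 4 x 2 rectangle with its top-left corner at (10, 20).
rectangle = [10, 20, 14, 20, 14, 22, 10, 22]
bbox = polygon_bbox(rectangle)
area = polygon_area(rectangle)
```

Recomputing the box from the transformed polygon, rather than transforming the old box, is what keeps it tight after a rotation.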

In [1]:
import albumentations as A
from PIL import Image
import cv2
import json
import numpy as np
import os
import random
from pycocotools.coco import COCO

import sys

sys.path.insert(1, '../..')

from data_processing.data_proc_lib.transforms import do_background_transform
from data_processing.data_proc_lib.cleaning import find_invalid_annotations
from data_processing.data_proc_lib.utilities import create_directories

In [2]:
def make_backgrounds(output_path, input_path):
    """
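    Make background images from the training split of a COCO-format dataset.
    :param output_path: Path to the directory where the background images are written.
    :param input_path: Path to the input dataset (expects `annotations/train.json` and a `train` image directory).
    :return: None.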
    """

    create_directories(output_path)
    output_image_path = output_path
    train_annos = os.path.join(input_path, "annotations", "train.json")

    coco = COCO(train_annos)
    image_ids = coco.getImgIds()
    # Use a set so that the membership test in the loop below is O(1).
    exclude = set(find_invalid_annotations(coco))

    for image_id in [i for i in image_ids if i not in exclude]:

        image_file_name = coco.loadImgs(image_id)[0]["file_name"]
        image_path = os.path.join(input_path, "train", image_file_name)
        # print(f"image file name: {image_file_name}")

        aug_file_name = f"bg_{image_file_name}"
        output_image_file_path = os.path.join(output_image_path, aug_file_name)

        # Read the image and associated annotations
        image = np.array(Image.open(image_path))
        annotation_ids = coco.getAnnIds(imgIds=image_id)
        annotations = coco.loadAnns(annotation_ids)

        # Skip images with no annotations, e.g. when an earlier transform
        # shifted every galaxy out of the frame.
        if len(annotations) == 0:
            continue

        # Only images with exactly one annotation are used to make backgrounds.
        if len(annotations) > 1:
            print("More than one annotation! Rejected.")
            continue

        image = do_background_transform(annotations, image, coco)
        # Skip the image if the transform returned an all-zero array.
        if not np.any(image):
            continue

        # Save the transformed image; PIL infers the format from the file extension.
        Image.fromarray(image).save(output_image_file_path)

    print(f"Completed! Background images in {output_path}.")

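`make_backgrounds` loads the training annotations from the input dataset, skips any images flagged by `find_invalid_annotations`, and applies `do_background_transform` to each remaining image that has exactly one annotation. Each result is saved to the output directory with a `bg_` prefix on the original file name.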
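To see which images would qualify before running the full pipeline, the single-annotation rule can be sketched on a plain COCO-format dictionary. This is a minimal illustration under stated assumptions: the helper `images_with_single_annotation` and the tiny `example` dataset are hypothetical, and the sketch does not reproduce the extra filtering done by `find_invalid_annotations`.

```python
from collections import Counter


def images_with_single_annotation(coco_dict):
    """Return the ids of images that have exactly one annotation."""
    counts = Counter(ann["image_id"] for ann in coco_dict["annotations"])
    return [img["id"] for img in coco_dict["images"] if counts[img["id"]] == 1]


# Hypothetical dataset: image 1 has one annotation, image 2 has two,
# and image 3 has none.
example = {
    "images": [
        {"id": 1, "file_name": "a.png"},
        {"id": 2, "file_name": "b.png"},
        {"id": 3, "file_name": "c.png"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 2},
        {"id": 12, "image_id": 2},
    ],
}

selected = images_with_single_annotation(example)

# Output names follow the "bg_" prefix convention used in make_backgrounds.
output_names = [
    "bg_" + img["file_name"] for img in example["images"] if img["id"] in selected
]
```

Here only image 1 qualifies, so at most one background image would be written.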

## 1 Create Backgrounds from the Cleaned Dataset

**Prerequisites:** The dataset should already have been cleaned. See the notebook `clean.ipynb`.

Set the following constants below according to your environment.

* `NUM_AUGMENTATIONS`: The number of times each image in the input dataset is passed through the augmentation process. A larger value produces a larger final dataset. Every augmented image goes through automated quality checks (see the paper for details). These checks are not guaranteed to catch every problem, but they can detect some image artefacts. If a problem is detected, the augmented image is not saved, so the final dataset size is at most `NUM_AUGMENTATIONS` times the input dataset size, and usually *slightly less*.
* `CLEANED_PATH`: Full path to the input data (typically the cleaned data written by the notebook `clean.ipynb`).
* `OUTPUT_PATH`: Full path to the location where `make_backgrounds` stores the background images it creates.
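Because `make_backgrounds` reads `annotations/train.json` and the `train` image directory under the input path, it can help to confirm that layout before a long run. The following is a minimal sketch; `check_input_layout` is a hypothetical helper, not part of `data_proc_lib`, and the demonstration uses a temporary directory rather than a real dataset.

```python
import os
import tempfile


def check_input_layout(cleaned_path):
    """Return the expected paths that are missing under cleaned_path."""
    expected = [
        os.path.join(cleaned_path, "annotations", "train.json"),
        os.path.join(cleaned_path, "train"),
    ]
    return [path for path in expected if not os.path.exists(path)]


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "train"))
    # The annotations file does not exist yet, so it is reported as missing.
    missing_before = check_input_layout(root)

    os.makedirs(os.path.join(root, "annotations"))
    with open(os.path.join(root, "annotations", "train.json"), "w") as handle:
        handle.write("{}")
    # Both expected paths now exist.
    missing_after = check_input_layout(root)
```

In the notebook, the same function could be called as `check_input_layout(CLEANED_PATH)` and the run stopped if the returned list is not empty.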

In [None]:
OUTPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/backgrounds"
CLEANED_PATH = "/mnt/data/rgn_ijcnn/cleaned"
make_backgrounds(OUTPUT_PATH, CLEANED_PATH)