# Adding Synthetic/Generated Galaxies to Training Images

In [1]:
import os
import sys

sys.path.insert(1, "../..")

from data_processing.data_proc_lib.transforms import augment
from data_processing.data_proc_lib.cut_and_paste import paste
from data_processing.data_proc_lib.utilities import combine, convert_to_yolo, create_directories

## 1 Adding a Single Galaxy to Training Images

These are the instructions to add a single synthetic galaxy to training images.

**Prerequisites:** Cutouts have been created and synthetic galaxies have been created (see create_synthetic_galaxies.ipynb).

Set `CLEANED_PATH` to be the directory containing the cleaned data.

Set `OUTPUT_PATH` to be the directory to place the new augmented images with a cutout pasted into them.

Set `CUTOUTS_PATH` to be the location of the transformed galaxy cutouts. This will be the *same* as the `CUTOUTS_PATH` used in create_synthetic_galaxies.ipynb. This directory contains separate directories named 0 to 3, each one representing a galaxy category. Each of these directories contains an `annotations` directory holding the annotation files for the cutouts and a directory called `transformed_galaxy_cutouts` holding the transformed galaxy cutout images.

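Set `IMAGE_DIR` to be the name (not a full path) of the directory holding the image files to paste into the images in `INPUT_PATH`. This will usually be `transformed_galaxy_cutouts` or `train`. `INPUT_PATH` is the directory of images to paste into; when adding the first galaxy it is the same as `CLEANED_PATH`.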
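
Before pasting, it can help to confirm that `CUTOUTS_PATH` has the layout described above. The following is a minimal sketch, not part of `data_proc_lib`: the helper name `missing_cutout_dirs` is hypothetical, and it assumes each category directory (0 to 3) holds an `annotations` directory and a directory named after `IMAGE_DIR`.

```python
import os


def missing_cutout_dirs(cutouts_path, image_dir, categories=("0", "1", "2", "3")):
    """Return the expected sub-directories that are absent under cutouts_path.

    Hypothetical helper: assumes the layout <cutouts_path>/<category>/annotations
    and <cutouts_path>/<category>/<image_dir>, as described in the text.
    """
    missing = []
    for category in categories:
        for sub in ("annotations", image_dir):
            path = os.path.join(cutouts_path, category, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```

Once the paths are set, calling `missing_cutout_dirs(CUTOUTS_PATH, IMAGE_DIR)` should return an empty list if the layout matches these assumptions.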

In [2]:
CLEANED_PATH = "/mnt/data/rgn_ijcnn/cleaned"
INPUT_PATH = CLEANED_PATH
#INPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/1_fda_galaxy/combined"
OUTPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/1_vae_galaxy"
CUTOUTS_PATH = "/mnt/data/rgn_ijcnn/augmented/vae_by_class"
IMAGE_DIR = "train"
create_directories(OUTPUT_PATH)

In [None]:
paste(INPUT_PATH, OUTPUT_PATH, CUTOUTS_PATH, IMAGE_DIR)

The `OUTPUT_PATH` will now contain training images and annotations augmented with one additional synthetic galaxy pasted into each image.

## 2 Combine with the Cleaned Data

After adding galaxies, we need to recombine the augmented data (with the added galaxies) with the unaugmented, cleaned training data. The following step does this. The combined output will be placed in the directory `OUTPUT_PATH/combined`.

Note: This step uses the `OUTPUT_PATH` set in the previous parts of the notebook, so if you have created images with +1 galaxy, +2 galaxies, +3 galaxies, etc., the following step will only perform the `combine` operation on the last one used (i.e. the current value of `OUTPUT_PATH`). If you want to repeat this for the others, just change `OUTPUT_PATH` to point to the appropriate directory.
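
If several output directories need combining, the repetition can be written as a loop instead of editing `OUTPUT_PATH` by hand. This is only a sketch: `combine_each` is a hypothetical helper, and it assumes `combine` is called with the same two arguments used in this notebook (the output directory, then the cleaned directory).

```python
def combine_each(output_paths, cleaned_path, combine_fn):
    """Call combine_fn(output_path, cleaned_path) once per output directory.

    Hypothetical helper: combine_fn is expected to be the notebook's `combine`.
    """
    for output_path in output_paths:
        combine_fn(output_path, cleaned_path)


# Example call (directory names taken from the paths used in this notebook):
# combine_each(
#     ["/mnt/data/rgn_ijcnn/augmented/1_fda_galaxy",
#      "/mnt/data/rgn_ijcnn/augmented/1_vae_galaxy"],
#     CLEANED_PATH,
#     combine,
# )
```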

In [None]:
combine(OUTPUT_PATH, CLEANED_PATH)

## 3 Adding a Second Galaxy to Training Images

These are the instructions to add further synthetic galaxies to training images. It works by "chaining" the copy-paste process described in section 1 above.

**Prerequisite:** First add one synthetic galaxy to the training images using the steps in section 1. Then combine the output with the cleaned dataset.

Paths to set:
1. Set `INPUT_PATH` to the path `OUTPUT_PATH/combined` which was the output of step 2. (In effect, the images augmented with one galaxy in step 1, plus the cleaned data it was combined with in step 2 become the input to this step.)
2. Set `OUTPUT_PATH` to a *new* path where you want to put the training images and annotations augmented with two synthetic galaxies.

Here is an example of suitable settings (modify these for the paths you want to use).

In [5]:
INPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/1_vae_galaxy/combined"
OUTPUT_PATH = "/mnt/data/rgn_ijcnn/augmented/2_vae_galaxies"
CUTOUTS_PATH = "/mnt/data/rgn_ijcnn/augmented/vae_by_class"
create_directories(OUTPUT_PATH)

Now run the copy-paste operation. After this, `OUTPUT_PATH` will contain training images and annotations augmented with two galaxies.

In [None]:
paste(INPUT_PATH, OUTPUT_PATH, CUTOUTS_PATH, IMAGE_DIR)

## 4 Combine with the Cleaned Data

After adding galaxies, we need to recombine the augmented data (with multiple galaxies) with the unaugmented, cleaned training data. The following step does this. The combined output will be placed in the directory `OUTPUT_PATH/combined`.

Note: This step uses the `OUTPUT_PATH` set in the previous parts of the notebook, so if you have created images with +1 galaxy, +2 galaxies, +3 galaxies, etc., the following step will only perform the `combine` operation on the last one used (i.e. the current value of `OUTPUT_PATH`). If you want to repeat this for the others, just change `OUTPUT_PATH` to point to the appropriate directory.

In [None]:
combine(OUTPUT_PATH, CLEANED_PATH)

## 5 Adding Even More Galaxies

Of course, it's possible to paste in as many galaxies as you want by repeating sections 3 and 4. Reset the paths (set `INPUT_PATH` to the `combined` directory under the previous `OUTPUT_PATH`, as in section 3 above, and choose a new `OUTPUT_PATH`) and rerun the commands. It probably only makes sense to do this a few times: if too many galaxies are added, the image gets very crowded and it becomes hard to add more galaxies without their masks overlapping. We don't allow masks to overlap, so if this happens the image is automatically excluded from training. Nevertheless, it's perfectly possible to add a few galaxies to each training image.
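
The chaining of paths across rounds can be sketched as follows. This is an illustration only: `chain_paths` is a hypothetical helper, and the directory naming (`1_vae_galaxy`, `2_vae_galaxies`, ...) simply mirrors the example paths used earlier in this notebook.

```python
from pathlib import PurePosixPath


def chain_paths(cleaned_path, augmented_root, n_rounds, tag="vae"):
    """Return (input_path, output_path) pairs for n_rounds of chained pasting.

    Hypothetical helper. Round 1 reads the cleaned data; every later round
    reads the `combined` directory written after the previous round.
    """
    pairs = []
    input_path = str(cleaned_path)
    for k in range(1, n_rounds + 1):
        suffix = "galaxy" if k == 1 else "galaxies"
        output_path = PurePosixPath(augmented_root) / f"{k}_{tag}_{suffix}"
        pairs.append((input_path, str(output_path)))
        # The next round pastes into the combined output of this round.
        input_path = str(output_path / "combined")
    return pairs
```

For each pair, the cells above would then be repeated in order: `create_directories(output_path)`, `paste(input_path, output_path, CUTOUTS_PATH, IMAGE_DIR)`, and `combine(output_path, CLEANED_PATH)`.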

## 6 Add Background Images

Background images are images with no objects of interest (galaxies) and no annotations. These can be made using the steps in notebook generate_backgrounds.ipynb. Having made these, we now copy a random selection of these into the directory containing the training images. We want the number to be about 5% of the total number of training images.
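
As a rough pure-Python equivalent of the count-and-copy step, the sketch below computes about 5% of the training-set size and copies that many randomly chosen backgrounds. Both function names are hypothetical, and the code assumes the backgrounds are `.png` files in a single flat directory.

```python
import os
import random
import shutil


def background_count(n_training_images, fraction=0.05):
    """Number of backgrounds to add: about `fraction` of the training images."""
    return round(n_training_images * fraction)


def copy_random_backgrounds(backgrounds_dir, train_dir, count, seed=None):
    """Copy up to `count` randomly chosen .png files into train_dir.

    Returns the list of file names that were copied.
    """
    candidates = sorted(f for f in os.listdir(backgrounds_dir) if f.endswith(".png"))
    chosen = random.Random(seed).sample(candidates, min(count, len(candidates)))
    for name in chosen:
        shutil.copy(os.path.join(backgrounds_dir, name), train_dir)
    return chosen
```

Passing a fixed `seed` makes the selection repeatable, which the shell-based approach does not offer by default.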

In [None]:
# Count the number of files in the output directory.
!ls /mnt/data/rgn_ijcnn/augmented/2_vae_galaxies/combined/train | wc -l

In [None]:
# Copy in a random selection of backgrounds. In -zn275, -n275 sets the number of images and -z makes the names NUL-terminated for xargs -0; change the 275 to be about 5% of the total number of training images.
!cd /mnt/data/rgn_ijcnn/augmented/backgrounds && shuf -zn275 -e *.png | xargs -0 cp -vt {os.path.join(OUTPUT_PATH, "combined", "train")}

## 7 Convert to YOLO Format

Finally, if you want to use the data to train YOLO, you need to convert it to the YOLO format. Set `YOLO_PATH` below to be the location in which to place the YOLO-format files. This should be an absolute path (not relative).
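
Since the path must be absolute, a small guard can catch a relative path before the conversion runs. This is a minimal sketch; `require_absolute` is a hypothetical helper and is not part of `data_proc_lib`.

```python
import os


def require_absolute(path):
    """Return path unchanged if it is absolute; otherwise raise ValueError."""
    if not os.path.isabs(path):
        raise ValueError(f"Expected an absolute path, got: {path!r}")
    return path
```

For example, `YOLO_PATH = require_absolute(YOLO_PATH)` could be run once the path has been set.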

In [10]:
YOLO_PATH = "/mnt/data/rgn_ijcnn/yolo/yolo_2_vae_galaxies"

In [None]:
convert_to_yolo(os.path.join(OUTPUT_PATH, "combined"), YOLO_PATH)