# Image Segmentation of Spiral Galaxies

This notebook walks through the process of segmenting spiral arms from images of spiral galaxies. It accompanies the final-year research project I completed for my Master's degree in Professional Engineering (Software) at the University of WA. The dataset is adapted from images retrieved from the 2nd public data release of the HSC data archive system, which is operated by the Subaru Telescope and the Astronomy Data Center at the National Astronomical Observatory of Japan. The spiral galaxies in this dataset were classified by [Tadaki et al.](https://arxiv.org/pdf/2006.13544.pdf), and all image content is a product of their work. You can register to access the HSC data [here](https://hsc-release.mtk.nao.ac.jp/doc/index.php/data-access__pdr3/).

## [1] Install Dependencies

The following code cells install the requirements needed by the rest of the notebook and import the packages that are used throughout. Any other package is imported in the code cell that needs it.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

## [2] Data Pre-Processing

This section covers the functions used to process the data before it is used to train the segmentation model. These functions convert the dataset images (`.jpg` and `.tif` files) into a format that the TensorFlow Keras API can use. To keep the pipeline concise, a Keras Sequence class loads and vectorizes the data before training.

### Set Up Directory Paths & Global Variables

In [2]:
import os
from tensorflow.keras.utils import image_dataset_from_directory

images_dir = '/kaggle/input/galaxy-segmentation-set-1/images'
masks_dir = '/kaggle/input/galaxy-segmentation-set-1/masks'
img_size = (100, 100)
num_classes = 2
batch_size = 32

images_paths = sorted(
    [
        os.path.join(images_dir, fname)
        for fname in os.listdir(images_dir)
        if fname.endswith(".jpg")
    ]
)

masks_paths = sorted(
    [
        os.path.join(masks_dir, fname)
        for fname in os.listdir(masks_dir)
        if fname.endswith(".tif") and not fname.startswith(".")
    ]
)

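# Load the images as a batched tf.data.Dataset of grayscale tensors.
images_dataset = image_dataset_from_directory(
    images_dir, labels=None, color_mode='grayscale', image_size=img_size, batch_size=batch_size
)
# TODO: re-upload mask files as PNG (image_dataset_from_directory does not load .tif files)
# masks_dataset = image_dataset_from_directory(masks_dir, labels=None, color_mode='grayscale', image_size=img_size)

# dataset = tf.data.Dataset.zip((images_dataset, masks_dataset))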


Found 100 files belonging to 1 classes.


### Implement Data Augmentation Pipeline

Using in-built Keras preprocessing layers, build a pipeline that can be used to augment the images from the dataset.
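
The following is a minimal sketch of such a pipeline, not the notebook's final implementation. It assumes that an image and its mask must receive the same random transform, which it arranges by stacking them along the channel axis before augmenting. The name `augment_pair`, the chosen layers and their parameters are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

# Assumed augmentation choices (not taken from the notebook): random flips
# and rotations. Nearest-neighbour interpolation keeps mask values binary.
augmenter = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal_and_vertical"),
        keras.layers.RandomRotation(0.25, interpolation="nearest"),
    ]
)


def augment_pair(images, masks):
    """Apply one random transform per sample to both images and masks.

    images: float32 array of shape (batch, height, width, channels)
    masks:  array of shape (batch, height, width, 1)
    """
    n_channels = images.shape[-1]
    # Stack masks behind the image channels so both get the same transform.
    stacked = np.concatenate([images, masks.astype("float32")], axis=-1)
    augmented = np.asarray(augmenter(stacked, training=True))
    return augmented[..., :n_channels], augmented[..., n_channels:].astype("uint8")


# Tiny synthetic batch standing in for real galaxy images and masks.
rng = np.random.default_rng(0)
images = rng.random((4, 100, 100, 1)).astype("float32")
masks = (rng.random((4, 100, 100, 1)) > 0.5).astype("uint8")
aug_images, aug_masks = augment_pair(images, masks)
```

Augmenting the two arrays separately would draw different random transforms for each, so the mask would no longer line up with the spiral arms in its image.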

### Construct Sequence Class

In [3]:
from tensorflow.keras.preprocessing.image import load_img

# TODO: Re-implement for new Dataset structure
class DataSequence(keras.utils.Sequence):
    """Sequence that loads and vectorizes image/mask batches for training."""

    def __init__(self, batch_size, img_size, images_paths, masks_paths):
        super().__init__()
        self.batch_size = batch_size
        self.img_size = img_size
        self.images_paths = images_paths
        self.masks_paths = masks_paths

    def __len__(self):
        # Number of full batches; a trailing partial batch is dropped.
        return len(self.masks_paths) // self.batch_size

    def __getitem__(self, idx):
        """Return the tuple (input, target) corresponding to batch #idx."""
        i = idx * self.batch_size
        batch_images_paths = self.images_paths[i : i + self.batch_size]
        batch_masks_paths = self.masks_paths[i : i + self.batch_size]
        # Inputs: RGB images resized to img_size, shape (batch, H, W, 3).
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
        for j, path in enumerate(batch_images_paths):
            img = load_img(path, target_size=self.img_size)
            x[j] = img
        # Targets: single-channel masks, shape (batch, H, W, 1).
        y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
        for j, path in enumerate(batch_masks_paths):
            img = load_img(path, target_size=self.img_size, color_mode="grayscale")
            y[j] = np.expand_dims(img, 2)
        return x, y
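
Because `__len__` uses floor division, any samples left over after the last full batch are never served. The sketch below illustrates this with the 100 images reported above and `batch_size = 32`, assuming there are also 100 masks. `batch_bounds` is a hypothetical helper written only for this illustration.

```python
def batch_bounds(num_samples, batch_size):
    """Return the (start, stop) slice bounds of each full batch."""
    num_batches = num_samples // batch_size  # same rule as DataSequence.__len__
    return [(i * batch_size, (i + 1) * batch_size) for i in range(num_batches)]


bounds = batch_bounds(num_samples=100, batch_size=32)
dropped = 100 - bounds[-1][1]

print(bounds)   # [(0, 32), (32, 64), (64, 96)]
print(dropped)  # 4
```

With a dataset this small, dropping four samples per epoch is noticeable. Choosing a batch size that divides the sample count evenly, or shuffling the paths between epochs, would avoid always skipping the same files.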

### Instantiate Training and Validation Sequences
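
One possible way to prepare the two sequences is sketched below. It assumes that the sorted image and mask lists line up pair by pair. The helper `split_pairs`, the 20% validation fraction, the seed and the placeholder file names are all illustrative assumptions, not values from the project.

```python
import random


def split_pairs(images_paths, masks_paths, val_fraction=0.2, seed=42):
    """Shuffle image/mask pairs together and split into train and validation."""
    pairs = list(zip(images_paths, masks_paths))
    random.Random(seed).shuffle(pairs)  # seeded so the split is reproducible
    n_val = int(len(pairs) * val_fraction)
    return pairs[n_val:], pairs[:n_val]


# Placeholder file names standing in for the real sorted path lists.
demo_images = [f"img_{i:03d}.jpg" for i in range(100)]
demo_masks = [f"img_{i:03d}.tif" for i in range(100)]
train_pairs, val_pairs = split_pairs(demo_images, demo_masks)

# Unzip back into the separate lists that DataSequence expects, e.g.:
# train_seq = DataSequence(batch_size, img_size, train_images, train_masks)
# val_seq = DataSequence(batch_size, img_size, val_images, val_masks)
train_images, train_masks = map(list, zip(*train_pairs))
val_images, val_masks = map(list, zip(*val_pairs))
```

Note that with `__len__` defined by floor division, a 20-sample validation set and `batch_size = 32` would yield zero validation batches. A smaller validation batch size or a larger validation split would be needed.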

## [3] Build Model Using Keras