# Converting the galaxy images and their labels into arrays and saving them in npy format

We're going to study the Galaxy10 DECaLS dataset (https://astronn.readthedocs.io/en/latest/galaxy10.html), which contains 17736 color galaxy images of 256x256 pixels (g, r and z bands) separated into 10 classes.

Galaxy10 dataset (17736 images):

├── Class 0 (1081 images): Disturbed Galaxies

├── Class 1 (1853 images): Merging Galaxies

├── Class 2 (2645 images): Round Smooth Galaxies

├── Class 3 (2027 images): In-between Round Smooth Galaxies

├── Class 4 ( 334 images): Cigar Shaped Smooth Galaxies

├── Class 5 (2043 images): Barred Spiral Galaxies

├── Class 6 (1829 images): Unbarred Tight Spiral Galaxies

├── Class 7 (2628 images): Unbarred Loose Spiral Galaxies

├── Class 8 (1423 images): Edge-on Galaxies without Bulge

└── Class 9 (1873 images): Edge-on Galaxies with Bulge
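
The class sizes above are quite uneven, which matters later when splitting or weighting the data. As a minimal sketch, the counts from the list can be copied into plain Python lists to check the total and print each class's share. The names `CLASS_NAMES` and `CLASS_COUNTS` are illustrative and not part of the dataset file.

```python
# Class names and image counts, copied by hand from the list above.
CLASS_NAMES = [
    "Disturbed Galaxies",
    "Merging Galaxies",
    "Round Smooth Galaxies",
    "In-between Round Smooth Galaxies",
    "Cigar Shaped Smooth Galaxies",
    "Barred Spiral Galaxies",
    "Unbarred Tight Spiral Galaxies",
    "Unbarred Loose Spiral Galaxies",
    "Edge-on Galaxies without Bulge",
    "Edge-on Galaxies with Bulge",
]
CLASS_COUNTS = [1081, 1853, 2645, 2027, 334, 2043, 1829, 2628, 1423, 1873]

total = sum(CLASS_COUNTS)

# Print each class with its share of the dataset to expose the imbalance.
for index, (name, count) in enumerate(zip(CLASS_NAMES, CLASS_COUNTS)):
    print(f"Class {index}: {count:5d} images ({count / total:6.2%})  {name}")

print("Total:", total)
```

The index of each entry matches the integer class label stored in the dataset, so `CLASS_NAMES[label]` gives a readable name for a prediction.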

We're going to use h5py to read the images and labels from the dataset file, tensorflow to one-hot encode the labels, and numpy to save both arrays in the npy format.

In [1]:
import h5py
import numpy as np
from tensorflow.keras import utils

# Read the images and labels from the HDF5 file
with h5py.File('Galaxy10_DECals.h5', 'r') as F:
    images = np.array(F['images'])
    labels = np.array(F['ans'])

# One-hot encode the integer labels into 10 classes
labels = utils.to_categorical(labels, 10)

# Cast both arrays to uint8 (to_categorical returns float32 by default)
labels = labels.astype(np.uint8)
images = images.astype(np.uint8)

# Save both arrays in the npy format
np.save('G10_DECals_images.npy', images)
np.save('G10_DECals_labels.npy', labels)
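
To check that the saved files can be read back, the same steps can be tried on small synthetic arrays first. The sketch below is an assumption-laden stand-in, not the real dataset: the tiny random "images", the temporary file names and the `np.eye` one-hot encoding (a NumPy-only substitute for `to_categorical`) are all illustrative. It also estimates the memory the full image array needs, assuming its shape is (17736, 256, 256, 3) with one byte per value.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for the real arrays: 6 tiny "images" and integer class labels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 8, 8, 3), dtype=np.uint8)
fake_classes = np.array([0, 3, 9, 4, 4, 1])

# One-hot encode with plain NumPy: row i of the identity matrix encodes class i.
fake_labels = np.eye(10, dtype=np.uint8)[fake_classes]

# Save to a temporary directory and load the arrays back.
with tempfile.TemporaryDirectory() as tmp_dir:
    images_path = os.path.join(tmp_dir, "images.npy")
    labels_path = os.path.join(tmp_dir, "labels.npy")
    np.save(images_path, fake_images)
    np.save(labels_path, fake_labels)

    loaded_images = np.load(images_path)
    loaded_labels = np.load(labels_path)

# argmax along the class axis undoes the one-hot encoding.
recovered_classes = loaded_labels.argmax(axis=1)

print(loaded_images.shape, loaded_images.dtype)
print(loaded_labels.shape, loaded_labels.dtype)
print(recovered_classes)

# Rough in-memory size of the full uint8 image array,
# assuming shape (17736, 256, 256, 3).
full_size_bytes = 17736 * 256 * 256 * 3
print(f"{full_size_bytes / 1e9:.2f} GB")
```

Since the full image array takes several gigabytes, passing `mmap_mode='r'` to `np.load` may be worth considering when reading 'G10_DECals_images.npy', so that the data is read from disk on demand instead of all at once.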