Computer Vision project to recognize galaxies; either spiral or not. Also, image binary classification with TensorFlow & Keras practice.

isi-mube/cosmic-compendium

Cosmic Compendium

Hubble Beholds a Big, Beautiful Blue Galaxy
NGC 2336 is the quintessential galaxy — big, beautiful, and blue — and it is captured here by the NASA/ESA Hubble Space Telescope.

  1. Python script

About the Project

The objective of this project is binary image classification of galaxies: spiral or non-spiral. The images come from the Galaxy Zoo 2 project, an open dataset built on Sloan Digital Sky Survey imaging.

About the Hubble sequence

The Hubble sequence is a morphological classification for galaxies, published by Edwin Hubble in 1926, dividing regular galaxies into three main classes: ellipticals, lenticulars, and spirals.

Spiral galaxies, which are abundant in the universe, display a distinctive disk with spiraling arms, rich in gas and dust, around a central bulge. Studying spiral galaxies gives us a peek into the universe's past and helps us understand galaxy evolution and cosmology.

Galaxy Zoo relied on human volunteers for visual pattern recognition: through a decision-tree process, they answered progressively more detailed questions about each galaxy's structure.

Project development:

  • 01/06/23: Collected the data and defined the problem: binary image classification to detect whether a galaxy is spiral or not.
  • 15/06/23: The script is nearly done.
    • Reduced the number of images used for the model training and testing to a subset of 1000.
  • 15/06/23 to 21/06/23: Fixing errors and cleaning the code.
  • 29/06/23:
    • Further cleaning of the code and bugs.
    • Reached 81% accuracy when predicting unseen galaxies. Total number of epochs: 35.
    • Removed some data directories from GitHub for optimization.
  • 06/07/23: Finally got val_accuracy running (and not frozen) by adapting Sabina's CNN structure from Glaucoma detection. I need to upgrade it further to get better scores. Also:
    • 1,400 unique galaxies for the training subset and 600 unique galaxies for the validation subset.
    • Changed the optimizer from Adam to Adamax.
    • Added ImageDataGenerator parameters: horizontal flips, width and height shifts and zoom range set to 0.2.
    • Increased the image size to 256x256 to get better resolution.
    • Created a cathartic playlist related to val_accuracy obsession to debug it.
  • 07/07/23: Adapted the final CNN structure to:
    • Input layer
    • 4 convolutional layers with 32, 64, 128 and 256 filters, followed by max pooling.
    • Flatten layer, converting 3D outputs to 1D vector.
    • 2 fully connected (dense) layers with 512 and 256 neurons.
    • An output layer with 1 neuron for binary classification
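
The layer list above can be sketched in Keras roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual script: the function name `build_spiral_classifier`, the 3x3 kernels, the ReLU and sigmoid activations, one default 2x2 max pooling after each convolution, three colour channels and the binary cross-entropy loss are all assumptions. Only the filter counts, the dense layer sizes, the single output neuron, the 256x256 input and the Adamax optimizer come from the notes above.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_spiral_classifier(image_size=(256, 256), channels=3):
    """Hypothetical builder for a spiral vs. non-spiral CNN."""
    model = keras.Sequential(name="spiral_cnn")
    # Input layer: pixel values are assumed to be rescaled to [0, 1] upstream.
    model.add(keras.Input(shape=(*image_size, channels)))
    # 4 convolutional layers, each followed by max pooling.
    # Kernel size 3 and ReLU are assumptions, not taken from the project.
    for filters in (32, 64, 128, 256):
        model.add(layers.Conv2D(filters, 3, activation="relu"))
        model.add(layers.MaxPooling2D())
    # Flatten the 3D feature maps into a 1D vector.
    model.add(layers.Flatten())
    # 2 fully connected layers with 512 and 256 neurons.
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(256, activation="relu"))
    # 1 output neuron; sigmoid gives a score between 0 and 1 (assumed).
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer="adamax",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_spiral_classifier()
```

Calling `model.summary()` prints the output shape of every layer, which is a quick way to check that the flatten layer feeds the dense layers as expected.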

Further project development:

  • Develop a Streamlit app for more interactive model visualization.
  • Take a break, keep focusing on Python basics, and move on to image segmentation and multiclassification.

Model Results

Toolkit:

  • JupyterLab: Environment for Python scripts and managing files.

Libraries

  • Pandas: Data manipulation and analysis.
  • Numpy: Arrays and mathematical functions.
  • Os: File management.
  • Warnings: Roses are red, violets are blue --> Warnings are annoying.
  • Matplotlib: Data visualization.
  • Seaborn: Runs on top of matplotlib, HD data visualization.
  • Shutil: File operations (copying, deleting...).
  • TensorFlow: Machine Learning for Computer Vision.
  • Keras: High-level neural networks API for Deep Learning, running on top of TensorFlow.
  • Sklearn: Machine Learning metrics.
  • PIL: Python Imaging Library to manipulate images.
  • Random: To generate random subsets.
  • ImageDataGenerator: To generate random data augmentation (flips, zoom...).
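
As an illustration of how `random` can produce the subsets mentioned in the development notes (1,400 unique training galaxies and 600 unique validation galaxies), here is a minimal sketch. The function name `split_unique`, the fixed seed and the integer IDs are hypothetical; the project's script may select its subsets differently.

```python
import random


def split_unique(ids, n_train=1400, n_val=600, seed=42):
    """Draw disjoint training and validation subsets of unique IDs."""
    # Drop duplicates and sort so the result only depends on the seed.
    unique_ids = sorted(set(ids))
    needed = n_train + n_val
    if needed > len(unique_ids):
        raise ValueError(
            f"need {needed} unique ids, only {len(unique_ids)} available"
        )
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    # sample() draws without replacement, so no galaxy appears twice.
    picked = rng.sample(unique_ids, needed)
    return picked[:n_train], picked[n_train:]


# Hypothetical galaxy IDs; real ones would come from the image file names.
train_ids, val_ids = split_unique(range(5000))
```

Sampling both subsets in a single draw guarantees that no galaxy ends up in both training and validation, which would otherwise inflate val_accuracy.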

Bibliography:

  • Lintott, C. J. et al. (2008). Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 389(3), 1179–1189.
  • Willett, K. W. et al. (2013). Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 435(3), 2835–2860.
  • Chollet, F. (n.d.). Image Classification from Scratch. Keras. Retrieved from https://keras.io/examples/vision/image_classification_from_scratch/#introduction
  • Chollet, F. (n.d.). Keras Metrics. Keras. Retrieved from https://keras.io/api/metrics/
  • Nicholas Renotte. (n.d.). Build a Deep CNN Image Classifier with ANY Images. [Video]. YouTube. Available at: https://www.youtube.com/watch?v=jztwpsIzEGc
