Galaxy Denoising Autoencoder

This project contains a TensorFlow-based Denoising Autoencoder (DAE) that extracts a galaxy model from observed galaxy images and performs galaxy model subtraction.

Clone the Repository

Clone this repository to your local machine or cloud service using the following command:

git clone https://github.com/rongrong00/galaxy-denoising-autoencoder.git
cd galaxy-denoising-autoencoder

Prerequisites

Before running the scripts, ensure you have the following installed:

  • Python 3.7 or later
  • TensorFlow 2.x
  • NumPy
  • Astropy

You can install the necessary Python packages using pip:

pip install -r requirements.txt

File Structure

  • dae_model.py: Contains the DAE class defining the autoencoder model (see the illustrative sketch after this list).
  • fits_data_generator.py: Manages the loading and preprocessing of FITS data in batches for training the model.
  • train.py: The main script used to run the training process with settings specified in train_config.json.
  • test.py: Script to evaluate the trained model on a set of test data, producing the denoised and residual images described in test_config.json.
  • gen_gal_input.py: Script to generate input parameter files for GALFIT, allowing for customized simulation of galaxy images based on user-defined parameters.
  • run_galfit.py: Executes GALFIT using the parameter files generated by gen_gal_input.py, producing simulated galaxy images.
  • add_background.py: Adds a sky background to the simulated galaxy images, preparing them for the denoising task.
  • train_config.json: Holds all configuration settings for the model and the training process, including paths, model parameters, and training settings.
  • test_config.json: Holds all configuration settings for the testing process, including paths to the test data, output directory, and path to the pretrained model.
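
For orientation only, the following is a minimal Keras sketch of the kind of convolutional encoder-decoder that the configuration parameters below (filters, kernel sizes, pooling, activation) describe. The function name, defaults, and layer choices here are illustrative assumptions, not the actual code in dae_model.py.

    # Illustrative sketch only -- the real model lives in dae_model.py and may differ.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_dae(input_shape=(512, 512, 1),
                  encoder_filters=(32, 64),
                  decoder_filters=(64, 32),
                  kernel_size=3,
                  pool_size=2,
                  activation="relu"):
        """Build a simple convolutional denoising autoencoder (assumed architecture)."""
        inputs = layers.Input(shape=input_shape)
        x = inputs
        # Encoder: convolution followed by downsampling.
        for f in encoder_filters:
            x = layers.Conv2D(f, kernel_size, padding="same", activation=activation)(x)
            x = layers.MaxPooling2D(pool_size)(x)
        # Decoder: convolution followed by upsampling back to the input size.
        for f in decoder_filters:
            x = layers.Conv2D(f, kernel_size, padding="same", activation=activation)(x)
            x = layers.UpSampling2D(pool_size)(x)
        outputs = layers.Conv2D(input_shape[-1], kernel_size, padding="same")(x)
        return models.Model(inputs, outputs, name="dae")

    model = build_dae()
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")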


Dataset Generation

This step generates the simulated galaxy images used to train the Denoising Autoencoder. It involves producing simulated galaxy images with GALFIT and then adding a sky background to them. Each pair of a clean simulated galaxy image and its counterpart with a sky background added is used as training data.

Make sure that GALFIT is correctly installed and configured on your system before starting this step. The paths to the parameter files and output directories for simulated galaxy files can be changed within these scripts.

Step 1: Generate Simulated Galaxy Files with GALFIT

To generate simulated galaxy images, we use two scripts: gen_gal_input.py and run_galfit.py.

  • gen_gal_input.py: This script generates input parameter files for GALFIT. It defines the parameters of the galaxies to be simulated, such as size, shape, brightness, and other astrophysical characteristics. Modify this script to change the types of galaxies generated (an illustrative sketch of such a parameter file appears after this list).

    Run the script using:

    python3 gen_gal_input.py
  • run_galfit.py: This script takes the parameter files generated by gen_gal_input.py and runs GALFIT to produce the galaxy images.

    Execute the script with:

    python3 run_galfit.py
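
For orientation, the sketch below shows one way a GALFIT parameter ("feedme") file for a single Sersic component could be written from Python. The layout follows GALFIT's standard input format, but every value, filename, and the helper name are illustrative assumptions rather than the actual contents of gen_gal_input.py.

    # Illustrative sketch only -- gen_gal_input.py defines the real parameters and format.
    def write_galfit_input(path="galfit_input_000.feedme",
                           out_fits="galaxy_000.fits",
                           size=512, mag=18.0, r_e=20.0, n=2.5,
                           axis_ratio=0.7, position_angle=45.0):
        """Write a minimal GALFIT parameter file describing one Sersic component."""
        lines = [
            "A) none                    # Input data image (none: pure model)",
            f"B) {out_fits}              # Output image block",
            f"H) 1 {size} 1 {size}       # Image region",
            "J) 26.0                    # Magnitude photometric zeropoint",
            "P) 1                       # 1 = make model image only",
            "",
            "# Component 1: Sersic profile",
            "0) sersic",
            f"1) {size / 2:.1f} {size / 2:.1f} 1 1   # Position x, y",
            f"3) {mag:.2f} 1             # Integrated magnitude",
            f"4) {r_e:.2f} 1             # Effective radius [pixels]",
            f"5) {n:.2f} 1               # Sersic index",
            f"9) {axis_ratio:.2f} 1      # Axis ratio (b/a)",
            f"10) {position_angle:.1f} 1 # Position angle [degrees]",
            "Z) 0                       # Do not skip this component",
        ]
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    write_galfit_input()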

Step 2: Add Sky Background

After generating the simulated galaxy images, the next step is to add a realistic sky background. This is done using the add_background.py script.

  • add_background.py: This script adds a clean sky background to the simulated galaxy images. The backgrounds should be free of large galaxies, bright stars, or imaging artifacts.

Before running this script, you need to prepare a set of clean sky background images. These images can be obtained from observational data or generated using similar simulation techniques.

Run this script as follows:

python3 add_background.py
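
Conceptually, this step is a pixel-wise sum of a simulated galaxy image and a clean sky cutout. The sketch below illustrates that operation with Astropy; the directory names and the random galaxy-to-background pairing are assumptions, and add_background.py implements the actual pipeline.

    # Illustrative sketch only -- add_background.py implements the real pipeline.
    import glob
    import os
    import random

    import numpy as np
    from astropy.io import fits

    galaxy_files = sorted(glob.glob("sim_galaxies/*.fits"))         # clean simulated galaxies
    background_files = sorted(glob.glob("sky_backgrounds/*.fits"))  # clean sky cutouts
    os.makedirs("noisy_galaxies", exist_ok=True)

    for gal_path in galaxy_files:
        galaxy = fits.getdata(gal_path).astype(np.float32)
        background = fits.getdata(random.choice(background_files)).astype(np.float32)
        # Pixel-wise sum: the simulated galaxy is injected into the sky background.
        noisy = galaxy + background
        out_path = os.path.join("noisy_galaxies", os.path.basename(gal_path))
        fits.writeto(out_path, noisy, overwrite=True)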

Output

The final output of these steps will be a set of images ready for use in training and testing the autoencoder. These images will include both noisy (with sky background) and clean (original simulated galaxies) versions.

Training

Hardware Requirements

For reference, training the default model on Google Colab with an L4 GPU takes approximately 3 hours to complete. If your custom configuration increases the number of parameters, training may require more computing resources. Please make sure you have access to a GPU when training.

Train Configuration

The training behavior can be configured through a JSON file (train_config.json). This configuration file allows you to specify paths, training parameters, model architecture details, and more. Here is a breakdown of the configurable parameters in training:

  • noisy_dir: Directory containing galaxy images injected into a sky background (the noisy inputs).
  • clean_dir: Directory containing clean simulated galaxy images.
  • total_data_count: Total number of images available in the dataset.
  • pretrained_model_path: Path to pre-trained model, leave blank if training from scratch.
  • input_shape: Shape of the input images.
  • batch_size: Number of images per batch.
  • n_epochs: Number of training epochs.
  • initial_learning_rate: Initial learning rate for the optimizer.
  • checkpoint_dir: Directory to save model checkpoints.
  • val_split: Fraction of the data for validation.
  • use_early_stopping: Boolean to enable/disable early stopping.
  • early_stopping_patience: Number of epochs with no improvement after which training will be stopped.
  • save_intermediate_models: Boolean to enable/disable saving model checkpoints.
  • save_freq: Frequency to save the model checkpoints.
  • encoder_filters, decoder_filters: Number of filters for each convolutional layer in the encoder and decoder.
  • encoder_kernel_sizes, decoder_kernel_sizes: Kernel sizes for each convolutional layer in the encoder and decoder.
  • pooling_type: Type of pooling layer ('max' or 'average').
  • pooling_size: Size of the pooling window.
  • activation: Activation function to be used ('relu', 'sigmoid', etc.).
  • alpha: Alpha value for LeakyReLU; if not using LeakyReLU, this can be set to null.

The example train_config.json file contains the parameters needed to train a denoising autoencoder model on 512x512 image data. 'noisy_dir', 'clean_dir', 'total_data_count', and 'input_shape' must be modified to point to your training dataset. The other parameters can either be left as-is or adjusted to tune the model to fit your needs. If you want to start training from a pre-trained model or continue training from a model checkpoint, set its path in the 'pretrained_model_path' field.
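
For orientation, the snippet below loads such a configuration and checks that the dataset-related keys are set before training. The key names follow the list above, but the check itself is only an illustrative sketch and is not part of train.py.

    # Illustrative sketch only -- train.py performs the real configuration handling.
    import json

    with open("train_config.json") as f:
        config = json.load(f)

    # Keys that must point at your own dataset before training (see the list above).
    required = ["noisy_dir", "clean_dir", "total_data_count", "input_shape"]
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError(f"Please set these keys in train_config.json: {missing}")

    print(f"Training on {config['total_data_count']} images of shape "
          f"{config['input_shape']} for {config['n_epochs']} epochs")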

Running the Training Script

To run the training script, use the following command:

python train.py --config path/to/train_config.json

Testing

Test Configuration

The test_config.json file contains settings that are specifically used during model testing. This is where the model can be used to produce results on real galaxy data. Here is a breakdown of configurable parameters in testing:

  • model_path: Specifies the path to the trained model file. (a .h5 file containing the weights of your trained model)

  • test_data_dir: The directory containing the test dataset in FITS format. The script assumes all files in this directory are galaxy images with the sky background.

  • output_dir: Directory where the denoised images produced by the model will be saved.

  • residual_dir: Directory where the residual images (original - denoised) will be saved.

Before testing, update the configuration file with all the paths needed for testing, including the path to the trained model file.
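
For reference, each residual image is simply the input image minus the model's output. The sketch below illustrates that step; the paths are placeholders for the values in test_config.json, and test.py implements the actual testing loop.

    # Illustrative sketch only -- test.py implements the real testing loop.
    import os

    import numpy as np
    import tensorflow as tf
    from astropy.io import fits

    model = tf.keras.models.load_model("checkpoints/dae_model.h5", compile=False)  # model_path
    os.makedirs("denoised", exist_ok=True)   # output_dir
    os.makedirs("residuals", exist_ok=True)  # residual_dir

    for name in sorted(os.listdir("test_data")):  # test_data_dir
        if not name.endswith(".fits"):
            continue
        image = fits.getdata(os.path.join("test_data", name)).astype(np.float32)
        # Add the batch and channel dimensions the model expects: (1, H, W, 1).
        denoised = model.predict(image[np.newaxis, ..., np.newaxis])[0, ..., 0]
        residual = image - denoised  # original minus denoised galaxy model
        fits.writeto(os.path.join("denoised", name), denoised, overwrite=True)
        fits.writeto(os.path.join("residuals", name), residual, overwrite=True)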

Running the Testing Script

To run the testing script, use the following command:

python test.py --config path/to/test_config.json
