mbeaty2/envisioning_exoplanets

Envisioning Distant Worlds: Fine-Tuning a Latent Diffusion Model with NASA's Exoplanet Data

This project fine-tunes Stable Diffusion with NASA's confirmed exoplanet data to generate predicted visualizations of distant worlds.

Description

This repository holds all code necessary to reproduce my thesis for the Master of Science in Data Science and AI at the University of the Arts London. The project sought to fine-tune Stable Diffusion to generate scientifically supported images of distant exoplanets. Stable Diffusion was fine-tuned using prompts generated from NASA's Confirmed Exoplanet dataset, paired with images from NASA's Image and Video Library. The prompts were generated by translating the numeric data in the Exoplanet dataset into textual descriptions that interpret that data. A series of different translations occurred to create one prompt. This process was repeated to create four prompts in total, which were then used to fine-tune Stable Diffusion. The resulting models were evaluated to determine the best model for this type of scientifically backed image generation.
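
As a rough illustration of the translation step described above, the sketch below turns two numeric values into descriptive phrases and joins them into a prompt. The thresholds, the wording, and the names `radius_earth`, `eq_temp_k`, and `build_prompt` are assumptions made for illustration; they are not the translations used in the thesis.

```python
from typing import Optional


def describe_radius(radius_earth: Optional[float]) -> str:
    """Map a planet radius (in Earth radii) to a size phrase.

    The cut-offs are illustrative assumptions, not the thesis values.
    """
    if radius_earth is None:
        return "a planet of unknown size"
    if radius_earth < 1.6:
        return "a small rocky planet"
    if radius_earth < 4.0:
        return "a mid-sized planet"
    return "a giant gaseous planet"


def describe_temperature(eq_temp_k: Optional[float]) -> str:
    """Map an equilibrium temperature (in kelvin) to a climate phrase."""
    if eq_temp_k is None:
        return "with an unknown surface temperature"
    if eq_temp_k < 200:
        return "with a frozen surface"
    if eq_temp_k < 350:
        return "with a temperate climate"
    return "with a scorching surface"


def build_prompt(radius_earth: Optional[float], eq_temp_k: Optional[float]) -> str:
    """Join the individual translations into one text prompt."""
    return f"{describe_radius(radius_earth)} {describe_temperature(eq_temp_k)}"


print(build_prompt(1.0, 288.0))
```

Missing values are handled explicitly here because catalogue rows are often incomplete; how the actual prompt generator treats gaps may differ.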

The results and evaluation can be found in the thesis report. This repository, instead, holds the following documents:

  1. getting_images: A notebook and Python file used to search for and save images from NASA's Image and Video Library
  2. prompt_generator_functions: A notebook and Python file used to clean and process the data, translate values, and generate prompts
  3. training_data_prompts: The training dataset as a .csv file, updated with numeric data and the generated prompts
  4. updated_training_prompts: A simplified training dataset used to preprocess and download images and to create a metadata file for training the Stable Diffusion model
  5. exoplanet_data_prompts: The Exoplanet dataset as a .csv file, updated with the generated prompts
  6. getting_training_datasets: A notebook and Python file used to download the images listed in the training_data_prompts file, save them as 512x512 images, write the metadata file needed to train Stable Diffusion, and push all data to HuggingFace
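
To make the metadata step in item 6 more concrete, here is a minimal sketch that pairs each prompt in a CSV with an image file name and serializes one JSON record per image. The column names `image_url` and `prompt`, the numbered `.png` file names, and the `file_name`/`text` keys are assumptions; the file actually written by getting_training_datasets may be laid out differently.

```python
import csv
import io
import json
from typing import List


def build_metadata_lines(csv_text: str) -> List[str]:
    """Return one JSON string per CSV row, linking an image file to its prompt.

    Assumes a 'prompt' column exists; file names are generated from the row index.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for index, row in enumerate(reader):
        record = {
            "file_name": f"{index:05d}.png",  # hypothetical naming scheme
            "text": row["prompt"].strip(),
        }
        lines.append(json.dumps(record))
    return lines


# A tiny in-memory CSV stands in for the real training file.
sample_csv = (
    "image_url,prompt\n"
    "https://example.com/a.jpg,a small rocky planet\n"
    "https://example.com/b.jpg,a giant gaseous planet\n"
)

for line in build_metadata_lines(sample_csv):
    print(line)
```

Downloading the images and resizing them to 512x512 are left out of this sketch so that it runs with the standard library alone.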

This repository does not hold the Kohya Colab Notebook or the Automatic1111 WebUI Colab Notebook that were adapted to train and visualize the model. Links to both are provided below should you wish to use them yourself.

Getting Started

Dependencies

  • A 30 GB GPU, or access to Google Colab

Installing

  • Download the .csv files
  • Either upload the

Executing program

  • Download the necessary datasets.

If using the Jupyter Notebook:

  • Open the notebook in a code editor and connect it to a Python environment. You can run the Jupyter notebooks either in Google Colab, by uploading the files to your Google Drive, or on your local machine.
  • Run each cell in the notebook in order. If you make adjustments, save the file before continuing. If you run into issues, restart the kernel and re-run each cell.

If using the python files:

  • Download the .py files
  • Open a terminal and navigate to the folder on your local machine that contains the downloaded datasets and files.
  • Create a new environment to run everything in.
  • Run this line in your terminal to get and save images: python getting_images.py -k <your_api_key> --planet-photographs
  • Run this line in your terminal to develop the prompts for each image in the training and exoplanet datasets: python prompt_generator_functions.py
  • Run this line in your terminal to prepare the images for fine-tuning a Stable Diffusion model: python your_script.py --input-csv training_data_prompts.csv --output-csv updated_training_data_prompts.csv --data-folder data_huggingface --metadata-json metadata.json
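
For readers adapting the last command, the sketch below shows one way the four flags could be parsed with `argparse`. The flag names come from the command above; which flags are required, the default values, and the help text are assumptions rather than the script's actual behaviour.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the command above (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Prepare images and metadata for fine-tuning."
    )
    parser.add_argument("--input-csv", required=True,
                        help="CSV of training data with generated prompts")
    parser.add_argument("--output-csv", required=True,
                        help="where to write the simplified training CSV")
    # The two defaults below are assumptions taken from the example command.
    parser.add_argument("--data-folder", default="data_huggingface",
                        help="folder that receives the processed images")
    parser.add_argument("--metadata-json", default="metadata.json",
                        help="name of the metadata file to write")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args([
    "--input-csv", "training_data_prompts.csv",
    "--output-csv", "updated_training_data_prompts.csv",
])
print(args.input_csv, args.output_csv, args.data_folder, args.metadata_json)
```

Note that `argparse` converts the dashes in each flag to underscores, so `--input-csv` is read back as `args.input_csv`.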

Please note that this project adapted a Kohya Notebook to fine-tune Stable Diffusion, available here: https://colab.research.google.com/drive/1ZVukUuUMLxIZ6BgX7loKSMxcoBhfg70B#scrollTo=XhXhQY5Sov-g. It also used an Automatic1111 WebUI notebook made available by The Last Ben, available here: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb#scrollTo=Baw78R-w4T2j.

If you choose to run this project on your local machine, these notebooks will also need to be downloaded and adapted to run from a terminal.

Help

The most common problems I encountered were long run times, heavy GPU usage, and machine incompatibilities. I encourage those interested in pursuing this project to ensure they have a GPU strong enough to handle the large datasets. A full discussion of the difficulties is available in the thesis document within this notebook. I encourage those interested in this subject and area of research to review it before adapting or replicating this work.

Authors

Marissa Beaty, https://github.com/mbeaty2

Version History

  • 0.1
    • Initial Release

Acknowledgments

Inspiration, code snippets, etc.
