Envisioning Distant Worlds: Fine-Tuning a Latent Diffusion Model with NASA's Exoplanet Data

This project fine-tunes Stable Diffusion with NASA's confirmed exoplanet data to generate predicted visualizations of distant worlds.

Description

This repository holds all the code needed to reproduce my thesis for the Master of Science in Data Science and AI at the University of the Arts London. The project sought to fine-tune Stable Diffusion to generate scientifically supported images of distant exoplanets. Stable Diffusion was fine-tuned on images from NASA's Image and Video Library, paired with prompts generated from the data in NASA's Confirmed Exoplanet dataset. The prompts were generated by translating the numeric data in the Exoplanet dataset into textual descriptions that interpret that data. A series of different translations occurred to create one prompt. This process was repeated to create four prompts in total, which were then used to fine-tune Stable Diffusion. The resulting models were evaluated to determine the best model for this type of scientifically backed image generation.
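
To make the translation step concrete, here is a minimal Python sketch of turning numeric values into a text prompt. The column names (pl_rade for radius in Earth radii, pl_eqt for equilibrium temperature in kelvin), the thresholds, and the wording are illustrative assumptions, not the rules used in prompt_generator_functions; the thesis report documents the actual translations.

```python
from typing import Optional


def describe_radius(radius_earth: Optional[float]) -> str:
    """Map a planet radius (Earth radii) to a size phrase. Thresholds are hypothetical."""
    if radius_earth is None:
        return "a planet of unknown size"
    if radius_earth < 1.6:
        return "a small rocky planet"
    if radius_earth < 4.0:
        return "a mid-sized planet with a thick atmosphere"
    return "a giant gas planet"


def describe_temperature(temp_kelvin: Optional[float]) -> str:
    """Map an equilibrium temperature (kelvin) to a phrase. Thresholds are hypothetical."""
    if temp_kelvin is None:
        return "of unknown temperature"
    if temp_kelvin < 200:
        return "with a frozen surface"
    if temp_kelvin < 350:
        return "with a temperate climate"
    return "with a scorching surface"


def build_prompt(row: dict) -> str:
    """Join the individual translations into a single prompt string."""
    parts = [
        describe_radius(row.get("pl_rade")),
        describe_temperature(row.get("pl_eqt")),
    ]
    return ", ".join(parts)


# Hypothetical row; real rows would come from the Exoplanet dataset .csv file.
example_row = {"pl_rade": 1.1, "pl_eqt": 260.0}
print(build_prompt(example_row))
```

Each numeric column gets its own small translation function, so a different prompt variant can be produced by swapping or reordering the functions that build_prompt calls.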

The results and evaluation can be found in the thesis report. This repository, instead, holds the following documents:

getting_images: A notebook and python file used to search and save images from NASA's Image and Video Library
prompt_generator_functions: A notebook and python file used to clean and process the data, translate values, and generate prompts
training_data_prompts: Training Dataset as a .csv file updated with numeric data and the generated prompts
updated_training_data_prompts: A simplified Training Dataset as a .csv file, used to preprocess and download images and to create a metadata file for training the Stable Diffusion model
exoplanet_data_prompts: Exoplanet Dataset as a .csv file updated with the generated prompts
getting_training_datasets: A notebook and python file used to download the images from our training_data_prompts file, save them as 512x512 images, and write the necessary metadata file used to train Stable Diffusion and push all data to HuggingFace.
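
As an illustration of the metadata step described for getting_training_datasets, the sketch below builds one record per image and writes the records as JSON Lines. The file_name field follows the Hugging Face imagefolder convention; the text field, the image_name and prompt column names, the example rows, and both function names are assumptions for illustration and may not match the layout the script actually writes.

```python
import json
import tempfile
from pathlib import Path


def build_metadata(rows):
    """Return one {"file_name", "text"} record per row that has both an image and a prompt."""
    records = []
    for row in rows:
        image_name = row.get("image_name")
        prompt = row.get("prompt")
        if not image_name or not prompt:
            continue  # skip rows that cannot be used for training
        records.append({"file_name": image_name, "text": prompt})
    return records


def write_metadata(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


# Hypothetical rows; real rows would come from the training .csv file.
rows = [
    {"image_name": "planet_001.png", "prompt": "a small rocky planet"},
    {"image_name": "planet_002.png", "prompt": ""},
]
records = build_metadata(rows)
out_path = Path(tempfile.mkdtemp()) / "metadata.jsonl"
write_metadata(records, out_path)
print(out_path.read_text(encoding="utf-8"))
```

Rows with a missing image name or prompt are skipped rather than written with empty fields, so every line in the metadata file pairs an image with a usable caption.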

This repository does not hold the Kohya Colab Notebook or the Automatic1111 WebUI Colab Notebook that were adapted to train and visualize the model. Links to both are below should you want to use them yourself.

Getting Started

Dependencies

A 30GB GPU, or access to Google Colab

Installing

Download the .csv files
Either upload the

Executing program

Download the necessary datasets.

If using the Jupyter Notebook:

Open the file in a code editor and connect to a code environment.
Run each cell in the notebook. If you make adjustments, save the file before continuing to run the remaining cells. If there are issues, restart the kernel and re-run each cell. You can run the Jupyter notebooks either in Google Colab, by uploading the files to your Google Drive, or on your local machine.

If using the python files:

Download the .py files
Open a terminal and navigate to the folder that contains the downloaded datasets and files on your local machine.
Create a new environment to run everything in.
Run this line in your terminal to get and save images: python getting_images.py -k <your_api_key> --planet-photographs
Run this line in your terminal to develop the prompts for each image in the training and exoplanet datasets: python prompt_generator_functions.py
Run this line in your terminal to prepare the images for fine-tuning a Stable Diffusion model: python your_script.py --input-csv training_data_prompts.csv --output-csv updated_training_data_prompts.csv --data-folder data_huggingface --metadata-json metadata.json
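
The flags in the last command suggest a command-line interface along the following lines. This is a hedged sketch using Python's argparse: the flag names are taken from the command above, while the defaults, the help text, and the build_parser function are assumptions rather than the script's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the four flags shown in the command above."""
    parser = argparse.ArgumentParser(
        description="Prepare images and a metadata file for fine-tuning."
    )
    parser.add_argument("--input-csv", required=True,
                        help="CSV file with image links and generated prompts")
    parser.add_argument("--output-csv", required=True,
                        help="Where to write the simplified CSV file")
    # The defaults below are assumptions, chosen to match the example command.
    parser.add_argument("--data-folder", default="data_huggingface",
                        help="Folder that receives the processed images")
    parser.add_argument("--metadata-json", default="metadata.json",
                        help="Name of the metadata file to write")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args([
    "--input-csv", "training_data_prompts.csv",
    "--output-csv", "updated_training_data_prompts.csv",
])
print(args.input_csv, args.output_csv, args.data_folder, args.metadata_json)
```

argparse converts the dashes in each flag name to underscores, so --input-csv is read back as args.input_csv.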

Please note that this project adapted a Kohya Notebook to fine-tune Stable Diffusion, available here: https://colab.research.google.com/drive/1ZVukUuUMLxIZ6BgX7loKSMxcoBhfg70B#scrollTo=XhXhQY5Sov-g. It also adapted an Automatic1111 WebUI notebook made available by TheLastBen, available here: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb#scrollTo=Baw78R-w4T2j.

If you choose to run this project on your local machine, these notebooks will also need to be downloaded and adapted to run from your terminal.

Help

The most common problems I encountered were long run times, GPU usage limits, and machine incompatibilities. I encourage those interested in pursuing this project to ensure they have a GPU strong enough to handle the large datasets. A full discussion of the difficulties is available in the thesis document within this repository. I encourage those interested in this subject and area of research to review it before adapting or replicating this work.

Authors

Marissa Beaty, https://github.com/mbeaty2

Version History

0.1
- Initial Release

Acknowledgments

Inspiration, code snippets, etc.

awesome-readme

