Tiny_Stable_Diffusion.mojo

Some images I generated with Stable Diffusion and Stable Diffusion XL... what if we could generate them with Mojo?

Overview 💡

This is a 100% Mojo implementation of a forward pass of the Tiny Stable Diffusion model available here. Every component of the model was implemented from scratch, from basic integers and floats all the way to matrix multiplications, convolutions, image arrays, and operations that exist in PyTorch, such as linear layers, upsampling, and broadcasting. To view these operations in Mojo and use them in your own project, check out helpers/utils.mojo, which saves you from implementing them yourself. In this project, these basic building blocks are used primarily to construct cross-attention modules, encoders, decoders, a diffusion module, CLIP, and other components of the Stable Diffusion pipeline.

The goal of this project is to provide a basic implementation of the Tiny Stable Diffusion model. My hope is that this implementation can be used by anyone who wants to modify the given code to load their own Stable Diffusion weights into this model. The code is divided as follows:

vae.mojo: A Variational Autoencoder (VAE), including both an encoder and a decoder. The encoder encodes pre-existing images so that they can be used by the Stable Diffusion model. The decoder is used by every image generation forward pass, whether or not an initial image is provided.
sampler.mojo: An implementation of a DDPM sampler.
pipeline.mojo: Creates a Mojo pipeline that takes in a text prompt (and, optionally, an image) and computes a forward pass through the model.
diffusion.mojo: Code for the diffusion part of the Tiny Stable Diffusion model (composed of a UNet, a Time Embedding, and an output layer).
clip.mojo: Made up of CLIP embeddings and a CLIP layer, it implements the structs needed to build the CLIP text encoder used by the inference pipeline.
helpers folder: Contains the low-level functions used throughout the code (convolution, image resizing, matrix multiplication, etc.) in the utils.mojo file. It also contains the attention modules (self-attention, cross-attention) in the attention.mojo file.
demo.mojo: A simple example of how to run a forward pass.
tokenizer_creation.py: A Python file that retrieves the CLIP tokenizer from Hugging Face and stores it as tokenizer_clip.bin. This .bin file is read during image generation to load the CLIP tokenizer values, so run this Python file before any forward pass. Doing so lets you recreate a CLIP tokenizer with real values on your machine.
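
For readers unfamiliar with what a DDPM sampler such as the one in sampler.mojo computes, here is a minimal Python sketch of one generic reverse-diffusion step. It is not taken from this repository: the function names, the scaled-linear beta schedule, and its default values are assumptions chosen for illustration, and the image is flattened to a plain list of floats.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Build betas and cumulative alpha products (assumed schedule, num_steps >= 2)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    betas = [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # alpha_bar_t = product of (1 - beta_s) for s <= t
        alpha_bars.append(running)
    return betas, alpha_bars


def ddpm_step(x_t, eps_pred, t, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}, given the model's noise prediction eps_pred."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    abar_t = alpha_bars[t]
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0

    # Posterior mean: remove the predicted noise, then rescale.
    coef = beta_t / math.sqrt(1.0 - abar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # no noise is added on the final step

    # Posterior variance of q(x_{t-1} | x_t, x_0).
    sigma = math.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

In a full pipeline, eps_pred would come from the UNet and the step would be repeated from the last timestep down to 0 before the VAE decoder turns the result into an image.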

Usage 🔨

First, retrieve the CLIP tokenizer file:

python tokenizer_creation.py

Next, compile the "helpers" package

mojo package helpers -o "helpers.mojopkg"

Next, load the Tiny Stable Diffusion weights available here into the model. To do so, check out the "Tokenizer" struct in helpers/utils.mojo. The init function in this struct shows how to load values from a .bin file into a struct, so the same process can be applied to any other struct (CLIP, Encoder, Diffusion, etc.) for which you would like to load weights. Just copy that code and modify it as needed. This file input / output logic was adapted from the amazing Llama2.mojo project, linked in the "Thanks" section.
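
As a rough illustration of the export side of this step, the Python sketch below writes a list of weights to a .bin file and reads it back. The file layout (a little-endian int32 count followed by that many float32 values) and the function names are hypothetical and are not the format used by tokenizer_clip.bin or by this repository; whatever layout you choose, the Mojo init function must read the bytes in the same order.

```python
import os
import struct
import tempfile


def write_weights(path, values):
    """Write a count header followed by the values as little-endian float32."""
    with open(path, "wb") as f:
        f.write(struct.pack("<i", len(values)))
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_weights(path):
    """Read back a file produced by write_weights."""
    with open(path, "rb") as f:
        (count,) = struct.unpack("<i", f.read(4))
        return list(struct.unpack(f"<{count}f", f.read(4 * count)))


# Round-trip check in a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "linear_weight.bin")
    write_weights(demo_path, [0.5, -1.25, 3.0])
    restored = read_weights(demo_path)
```

The sample values are exactly representable in float32, so the round trip returns them unchanged; real weights stored as float32 may lose precision relative to Python's 64-bit floats.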

Lastly, modify the demo.mojo file to set the parameters you would like to use for the model, and run a forward pass:

mojo demo.mojo

You can also change the "image_size" alias in the pipeline.mojo file to match the image width / height you want to use.

Next Steps for this project (please fork and open a pull request if you would like to implement this!)

Load the Tiny Stable Diffusion weights into the model. To do so, check out the init function of the Tokenizer struct in helpers/utils.mojo. The same method can be used to initialize the weights of every other struct in the model from a .bin file.
Benchmark the speed of this Mojo implementation against the original Python-based one.
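
One possible starting point for the benchmark is a small timing harness like the Python sketch below. The function name and the reported statistics are assumptions for illustration; it simply times any callable with a monotonic clock after optional warm-up runs.

```python
import statistics
import time


def benchmark(fn, repeats=5, warmup=1):
    """Time fn() several times and report summary statistics in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "runs": len(samples),
    }
```

To time the Mojo side, fn could wrap the documented command, for example lambda: subprocess.run(["mojo", "demo.mojo"], check=True), with the Python implementation's forward pass wrapped the same way. Timing through a subprocess also counts process start-up and any compilation, so it is only a coarse comparison.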

Thanks 🙏

Thanks to the extraordinary PyTorch Stable Diffusion implementation available here. This was the primary source of inspiration for this project.
Thanks to Segmind for developing the Tiny SD model!
Thanks to the awesome Llama2 Mojo implementation that helped me set up the tokenizer and taught me how to load binary values into Mojo.
Thanks to this amazing Karpathy tutorial for creating a Llama2 tokenizer.
Thanks to Modular for providing the #mojo-help Discord channel, which clarified many of my questions about the Mojo programming language.

lrmantovani10/Stable-Diffusion.mojo