Implement the stable diffusion algorithm in CUDA. Existing open-source implementations of diffusion models [1][2][3][4] use high-level APIs such as PyTorch or TensorFlow. Their performance is orders of magnitude away from the latency requirements of real-time applications: even on modern desktop-class GPUs, these models take on the order of weeks to train and tens of seconds to generate even relatively low-resolution (512x512) images. [6] has shown that CUDA-based implementations of MLP networks can achieve an approximately 100x improvement in training and inference times over their high-level (TensorFlow-based) counterparts. Moreover, frame-generation latency scales with frame resolution, which makes matters even worse for higher-resolution images. Hence, it makes sense to accelerate the stable diffusion algorithm using GPU programming languages. I propose to implement the state-of-the-art stable diffusion algorithm [4] in a GPU programming language (CUDA or Vulkan), performing "hardware-aware" fusion of certain layers of the diffusion model, which can exploit more on-chip data reuse for intermediate outputs and avoid expensive off-chip DRAM accesses.
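For reference, the core CrossAttention computation targeted here is softmax(Q Kᵀ / √d) V, built from exactly the two primitives listed later (matrix multiplication and softmax). Below is a minimal pure-Python sketch; the function names and shapes are illustrative only, not the project's actual interface:

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # Naive matrix multiply: a is (n x k), b is (k x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def cross_attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V, applied row by row.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v)
```

A fused GPU kernel would keep `scores`/`weights` in shared memory or registers instead of materializing them in DRAM, which is the data-reuse opportunity described above.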
Experiments were conducted on a single GeForce RTX 3050 Mobile GPU. TODO: Add Nsight profiling results.
| Model | Average Forward Time |
|---|---|
| Original CrossAttention | 0.723301888 |
| Our CrossAttention (t=1024) | 0.700392 |
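The averages above were presumably obtained by timing repeated forward passes. A minimal timing-harness sketch follows; the warm-up and iteration counts are assumptions, and on the GPU one would additionally synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock so that asynchronous kernel launches are actually counted:

```python
import time

def average_forward_time(forward, n_iters=100, warmup=10):
    # Warm-up iterations so one-time costs (kernel compilation,
    # allocator growth, caches) do not skew the average.
    for _ in range(warmup):
        forward()
    start = time.perf_counter()
    for _ in range(n_iters):
        forward()
    return (time.perf_counter() - start) / n_iters
```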
- Currently, this project can only be run using `rai`, a tool from UIUC to run on cloud servers:

  ```
  rai -p .
  ```

- If you need to do profiling, you can use the following command:

  ```
  rai -p . --queue rai_amd64_exclusive
  ```
- Run ControlNet with our AttentionBlock implementation by using the following build commands:

  ```yaml
  ...
  commands:
    build:
      - /bin/sh -c 'cd /src/python/ && python3.8 -m pip install ./'
      - /bin/sh -c 'cd /src/ControlNet && python3.8 my_scribble2image.py'
  ```
- Generate the fake data for running the kernel from the CrossAttentionBuild unit test to verify the correctness of the kernel
- Move the model weights into the Docker container and push
- Integrate PyTorch & CUDA C
- Find the necessary modules in the ControlNet code to determine what we need to implement
- Find out how to connect the C++ code to PyTorch
- Implement the CUDA operations in CrossAttention
  - Matrix multiplication
  - Softmax
- Plug our CrossAttention implementation into the ControlNet model and load the weights successfully
- Provide argparse options to switch between the original and our implementation
- Profiling
  - ControlNet with the original CrossAttention
  - ControlNet with our CrossAttention
- Potential code optimizations
  - pointer/ref
  - `torch.zero_grad()`
- Decouple the project from rai
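The argparse switch mentioned above could look like the following sketch; the flag name `--use_custom_attention` is hypothetical, and the repository's actual option may differ:

```python
import argparse

def build_parser():
    # Hypothetical CLI flag for toggling the attention implementation.
    parser = argparse.ArgumentParser(description="Run ControlNet inference")
    parser.add_argument("--use_custom_attention", action="store_true",
                        help="Use our CUDA CrossAttention instead of the original")
    return parser

def select_attention(args, original_cls, custom_cls):
    # Pick the attention class based on the CLI flag.
    return custom_cls if args.use_custom_attention else original_cls
```

Keeping both implementations selectable at run time makes the correctness check and the profiling comparison in the plan straightforward.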
| Info | Description | Contact |
|---|---|---|
| TeamID | Team-10 | |
| Member1 | Cheng-Han Chiang | chc11@illinois.edu |
| Member2 | Shao-Chian Chen | scchen4@illinois.edu |
| Member3 | Po-Wei Wang | poweiww2@illinois.edu |
We tried to run gradio_hough2image.py in ControlNet, but we do not have enough GPUs to run it.