Stable Diffusion optimized for AMD RDNA2/RDNA3 GPUs
Before you start, please be aware that this is beta software that relies on a special AMD driver. Like all Stable Diffusion GUIs published so far, it requires some technical expertise to set up. We apologize in advance if you run into issues. If that happens, please don't hesitate to ask our Discord community for help! Please be assured that we (Nod and AMD) are working hard to improve the user experience in the coming months. If it works well for you, please "star" the following GitHub projects... this is one of the best ways to help and spread the word!
Install these specific AMD drivers (the latest general AMD release may not include all the required fixes).
AMD KB Drivers for RDNA2 and RDNA3:
AMD Software: Adrenalin Edition 22.11.1 for MLIR/IREE Driver Version 22.20.29.09 for Windows® 10 and Windows® 11 (Windows Driver Store Version 31.0.12029.9003)
First, for RDNA2 users, download this special driver to a folder of your choice. We recommend keeping the installation files, since you may need to re-install the driver later if Windows Update overwrites it: https://www.amd.com/en/support/kb/release-notes/rn-rad-win-22-11-1-mlir-iree
For RDNA3, the latest driver 23.1.2 supports MLIR/IREE as well: https://www.amd.com/en/support/kb/release-notes/rn-rad-win-23-1-2-kb
KNOWN ISSUES with this special AMD driver:
- Windows Update may (depending on how it's configured) automatically install a new official AMD driver that overwrites this IREE-specific driver. If Stable Diffusion used to work but a few days later slows down considerably or produces incorrect results (e.g. black images), this may be the cause. To fix this problem, check the installed driver version and re-install the special driver if needed. (TODO: document how to prevent this Windows Update behavior!)
- Some people using this special driver experience mouse pointer accuracy issues, especially with a larger-than-default mouse pointer: the clicked point isn't centered properly. One possible workaround is to reset the pointer size to "1" in "Change pointer size and color".
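If you want to script the driver version check described above, the following Python sketch compares a version string against the Windows Driver Store version listed for the RDNA2 special driver (31.0.12029.9003). It assumes the version can be read from the DriverVersion property of the Win32_VideoController WMI class via PowerShell's Get-CimInstance; the helper names are hypothetical and not part of SHARK. RDNA3 users on 23.1.2 would need to substitute their own expected version, which is not listed here. Device Manager (Display adapters > your GPU > Properties > Driver) shows the same number if you prefer to check by hand.

```python
import subprocess
import sys
from typing import List, Tuple

# Windows Driver Store version of the RDNA2 MLIR/IREE driver (from the release notes).
EXPECTED_RDNA2_DRIVER = "31.0.12029.9003"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '31.0.12029.9003' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def is_special_driver(installed: str, expected: str = EXPECTED_RDNA2_DRIVER) -> bool:
    """True only if the installed driver is exactly the expected build."""
    return parse_version(installed) == parse_version(expected)


def installed_driver_versions() -> List[str]:
    """Windows only (assumption): ask WMI for every display adapter's DriverVersion."""
    if not sys.platform.startswith("win"):
        raise OSError("this check only applies to Windows")
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "(Get-CimInstance Win32_VideoController).DriverVersion"],
        capture_output=True, text=True, check=True,
    ).stdout
    # One line per display adapter; skip blank lines.
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__" and sys.platform.startswith("win"):
    # Systems with an integrated GPU will also list that adapter's (unrelated) driver.
    for version in installed_driver_versions():
        print(version, "matches" if is_special_driver(version) else "differs")
```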
Installation
Download the latest Windows SHARK SD binary 492 here to a folder of your choice. If you want nightly builds, you can find them on the GitHub releases page.
Notes:
- We recommend downloading each new EXE version into a new folder. If you download it into the same folder as a previous install, you must delete the old *.vmfb files. Those contain Vulkan dispatches compiled from MLIR, which can be outdated if you run a new EXE from the same folder. You can use the --clear_all flag once to clean all the old files.
- If you recently updated the driver or this binary (EXE file), we recommend you:
  - clear all the local artifacts with --clear_all, OR
  - clear the Vulkan shader cache: on Windows, this can be done by clearing the contents of C:\Users\%username%\AppData\Local\AMD\VkCache\. On Linux, the same cache is typically located at ~/.cache/AMD/VkCache/.
  - clear the huggingface cache. On Windows, this is C:\Users\%username%\.cache\huggingface.
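The --clear_all flag is the built-in way to clean up old artifacts. If you prefer to do it by hand, here is a minimal Python sketch that collects the stale *.vmfb files next to the EXE and the cache folders named in the notes above. It assumes the Windows Vulkan cache lives under %LOCALAPPDATA% (which normally resolves to C:\Users\%username%\AppData\Local); the function names are hypothetical, not part of SHARK. By default it only lists what it would delete; pass dry_run=False to actually remove files.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def cache_dirs() -> List[Path]:
    """Vulkan shader cache and huggingface cache locations from the notes above."""
    home = Path.home()
    if sys.platform.startswith("win"):
        # %LOCALAPPDATA% normally resolves to C:\Users\<name>\AppData\Local.
        local = Path(os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local")))
        vk_cache = local / "AMD" / "VkCache"
    else:
        vk_cache = home / ".cache" / "AMD" / "VkCache"
    return [vk_cache, home / ".cache" / "huggingface"]


def stale_vmfb_files(exe_folder: Path) -> List[Path]:
    """Compiled Vulkan dispatch files (*.vmfb) left next to the EXE."""
    return sorted(Path(exe_folder).glob("*.vmfb"))


def clean(exe_folder: Path, cache_folders: Optional[Iterable[Path]] = None,
          dry_run: bool = True) -> List[Path]:
    """Return the targets; delete them only when dry_run is False.

    Cache folders are emptied but kept, matching "clear the contents of".
    """
    if cache_folders is None:
        cache_folders = cache_dirs()
    targets = stale_vmfb_files(exe_folder)
    targets += [Path(d) for d in cache_folders if Path(d).is_dir()]
    if dry_run:
        return targets
    for target in targets:
        if target.is_dir():
            for child in target.iterdir():
                if child.is_dir() and not child.is_symlink():
                    shutil.rmtree(child)
                else:
                    child.unlink()
        else:
            target.unlink()
    return targets


if __name__ == "__main__":
    # Dry run: list what would be removed for an EXE in the current folder.
    for path in clean(Path.cwd()):
        print(path)
```

Make sure the EXE is stopped before running it with dry_run=False, for the same reason you stop it before updating.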
Running
- Open a Command Prompt or PowerShell terminal and change folder (cd) to the .exe folder. Then run the EXE from the command prompt. That way, if an error occurs, you'll be able to copy and paste it when asking for help. (If it always works for you without error, you may simply double-click the EXE to start the web browser.)
- The first run may take about 10-15 minutes while the models are downloaded and compiled. Your patience is appreciated. The download could be about 5GB.
- If successful, you will likely see a Windows Defender message asking you to give permission to open a web server port. Accept it.
- Open a browser to access the Stable Diffusion web server. By default, the port is 8080, so you can go to http://localhost:8080/?__theme=dark.
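If the browser page stays blank, one quick way to tell whether the web server is up is to request the default address from a script. This Python sketch uses only the standard library's urllib; it assumes the default port 8080 mentioned above and treats any connection error or non-200 reply as "not running". The helper names are hypothetical.

```python
import urllib.error
import urllib.request


def ui_url(port: int = 8080, dark: bool = True) -> str:
    """Build the local web UI address; dark=True adds the ?__theme=dark query."""
    url = f"http://localhost:{port}/"
    return url + "?__theme=dark" if dark else url


def server_is_up(port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP 200 on localhost:port."""
    try:
        with urllib.request.urlopen(ui_url(port, dark=False), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print(ui_url(), "reachable" if server_is_up() else "not reachable yet")
```

Running this from a second terminal should report "reachable" once the EXE has opened the port and Windows Defender has allowed it.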
Stopping
- Select the command prompt that's running the EXE. Press CTRL-C and wait a moment. The application should stop.
- Please make sure to do the above step before you attempt to update the EXE to a new version.
Results
Here are some samples generated:
The output on a 7900XTX would look like:
Stats for run 0:
Average step time: 47.19188690185547ms/it
Clip Inference time (ms) = 109.531
VAE Inference time (ms): 78.590
Total image generation time: 2.5788655281066895sec
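To compare runs across drivers or GPUs, it can help to pull these numbers out of the console output. The Python sketch below does that with regular expressions written against the exact lines shown above; other SHARK versions may print them differently, so the patterns are an assumption worth checking. It also estimates how many denoising steps the total implies by subtracting the CLIP and VAE times and dividing by the average step time. The step count itself is not printed, so treat that figure as a rough estimate only.

```python
import re
from typing import Dict

# The console output from the 7900XTX run above, copied verbatim.
STATS = """Stats for run 0:
Average step time: 47.19188690185547ms/it
Clip Inference time (ms) = 109.531
VAE Inference time (ms): 78.590
Total image generation time: 2.5788655281066895sec"""

# One pattern per line of interest; note the inconsistent "=" / ":" separators.
PATTERNS = {
    "step_ms": r"Average step time:\s*([\d.]+)ms/it",
    "clip_ms": r"Clip Inference time \(ms\)\s*=\s*([\d.]+)",
    "vae_ms": r"VAE Inference time \(ms\):\s*([\d.]+)",
    "total_s": r"Total image generation time:\s*([\d.]+)sec",
}


def parse_stats(text: str) -> Dict[str, float]:
    """Extract whichever of the known timings appear in the text."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = float(match.group(1))
    return stats


def implied_steps(stats: Dict[str, float]) -> float:
    """Rough estimate: total time minus CLIP and VAE, divided by the step time."""
    remaining_ms = stats["total_s"] * 1000 - stats["clip_ms"] - stats["vae_ms"]
    return remaining_ms / stats["step_ms"]


if __name__ == "__main__":
    stats = parse_stats(STATS)
    print(stats)
    print(f"implied steps: {implied_steps(stats):.1f}")
```

For the 7900XTX run above, the estimate comes out a little above 50; the remainder is time not accounted for by the three reported components.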
Find us on the SHARK Discord server if you have any trouble running it on your hardware.