SD3: initial support (#2124)
TO DO:

- [x] readme
- [x] text
- [x] table of content
- [x] device selection
- [x] quantization
- [x] meta
- [x] gradio
eaidova committed Jun 20, 2024
1 parent 1f11b58 commit 08cb183
Showing 9 changed files with 1,986 additions and 4 deletions.
3 changes: 2 additions & 1 deletion .ci/ignore_convert_execution.txt
Original file line number Diff line number Diff line change
@@ -61,4 +61,5 @@ notebooks/stable-video-diffusion/stable-video-diffusion.ipynb
notebooks/llm-agent-langchain/llm-agent-langchain.ipynb
notebooks/hello-npu/hello-npu.ipynb
notebooks/yolov10-optimization/yolov10-optimization.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb
3 changes: 2 additions & 1 deletion .ci/ignore_pip_conflicts.txt
@@ -29,4 +29,5 @@ notebooks/sketch-to-image-pix2pix-turbo/sketch-to-image-pix2pix-turbo.ipynb
notebooks/yolov10-optimization/yolov10-optimization.ipynb # nncf from git
notebooks/person-counting-webcam/person-counting.ipynb # numpy should be installed first
notebooks/llava-multimodal-chatbot/videollava-multimodal-chatbot.ipynb # torchvision < 0.17.0
notebooks/parler-tts-text-to-speech/parler-tts-text-to-speech.ipynb # torch >= 2.2
notebooks/parler-tts-text-to-speech/parler-tts-text-to-speech.ipynb # torch >= 2.2
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb # diffusers from git
1 change: 1 addition & 0 deletions .ci/ignore_treon_docker.txt
@@ -68,3 +68,4 @@ notebooks/yolov10-optimization/yolov10-optimization.ipynb
notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb
notebooks/speechbrain-emotion-recognition/speechbrain-emotion-recognition.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb
3 changes: 2 additions & 1 deletion .ci/ignore_treon_linux.txt
@@ -67,4 +67,5 @@ notebooks/stable-cascade-image-generation/stable-cascade-image-generation.ipynb
notebooks/dynamicrafter-animating-images/dynamicrafter-animating-images.ipynb
notebooks/yolov10-optimization/yolov10-optimization.ipynb
notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb
3 changes: 2 additions & 1 deletion .ci/ignore_treon_mac.txt
@@ -69,4 +69,5 @@ notebooks/dynamicrafter-animating-images/dynamicrafter-animating-images.ipynb
notebooks/yolov10-optimization/yolov10-optimization.ipynb
notebooks/nano-llava-multimodal-chatbot/nano-llava-multimodal-chatbot.ipynb
notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb
1 change: 1 addition & 0 deletions .ci/ignore_treon_win.txt
@@ -66,3 +66,4 @@ notebooks/dynamicrafter-animating-images/dynamicrafter-animating-images.ipynb
notebooks/yolov10-optimization/yolov10-optimization.ipynb
notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb
notebooks/hunyuan-dit-image-generation/hunyuan-dit-image-generation.ipynb
notebooks/stable-diffusion-v3/stable-diffusion-v3.ipynb
1 change: 1 addition & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -443,6 +443,7 @@ MLLM
MLLMs
MMVLM
MLP
MMDiT
MobileCLIP
MobileLLaMA
mobilenet
42 changes: 42 additions & 0 deletions notebooks/stable-diffusion-v3/README.md
@@ -0,0 +1,42 @@
# Image generation with Stable Diffusion v3 and OpenVINO

Stable Diffusion v3 is the next generation of the Stable Diffusion family of latent diffusion image models. It outperforms state-of-the-art text-to-image generation systems in typography and prompt adherence, based on human preference evaluations. Unlike previous versions, it is based on the Multimodal Diffusion Transformer (MMDiT) text-to-image architecture, which offers greatly improved image quality, typography, complex prompt understanding, and resource efficiency.

![mmdit.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/dd079427-89f2-4d28-a10e-c80792d750bf)

More details about the model can be found in the [model card](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [research paper](https://stability.ai/news/stable-diffusion-3-research-paper) and [Stability.AI blog post](https://stability.ai/news/stable-diffusion-3-medium).
In this tutorial, we will show how to convert and optimize Stable Diffusion v3 for running with OpenVINO.
If you want to run previous Stable Diffusion versions, please check our other notebooks:

* [Stable Diffusion](../stable-diffusion-text-to-image)
* [Stable Diffusion v2](../stable-diffusion-v2)
* [Stable Diffusion XL](../stable-diffusion-xl)
* [LCM Stable Diffusion](../latent-consistency-models-image-generation)
* [Turbo SDXL](../sdxl-turbo)
* [Turbo SD](../sketch-to-image-pix2pix-turbo)


The notebook provides a simple interface for interacting with the model through text instructions. In this demonstration, the user provides a text prompt and the model generates an image. An additional part demonstrates how to optimize the model with [NNCF](https://github.com/openvinotoolkit/nncf/) to speed up the pipeline and reduce memory consumption.

The image below shows an example of a generated image.

![text2img_example.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/ac99098c-66ec-4b7b-9e01-e80625f1dc3f)

>**Note**: Some of the demonstrated models may require at least 32GB of RAM for conversion and running.

### Notebook Contents

The tutorial consists of the following steps:

- Install prerequisites
- Collect PyTorch model pipeline
- Convert model to OpenVINO intermediate representation (IR) format and compress weights using NNCF
- Prepare OpenVINO inference pipeline
- Run text-to-image generation
- Launch interactive demo
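
The conversion and compression steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`ir_paths`, `convert_and_compress`, `compile_for_device`) and the one-file-per-component layout are assumptions made for this example, and the component names simply mirror the attributes of the `diffusers` `StableDiffusion3Pipeline`. The OpenVINO and NNCF imports are deferred so that the path helper can be used without those packages installed.

```python
from pathlib import Path

# Assumed layout: one IR file per pipeline component (hypothetical naming).
COMPONENTS = ("transformer", "text_encoder", "text_encoder_2", "text_encoder_3", "vae")


def ir_paths(model_dir):
    """Map each component name to the .xml path where its IR would be stored."""
    model_dir = Path(model_dir)
    return {name: model_dir / f"{name}.xml" for name in COMPONENTS}


def convert_and_compress(torch_module, example_input, xml_path, compress=True):
    """Convert one PyTorch module to OpenVINO IR, optionally compressing weights."""
    # Deferred imports: only needed when a conversion is actually requested.
    import nncf
    import openvino as ov

    xml_path = Path(xml_path)
    if xml_path.exists():
        return xml_path  # reuse the result of an earlier conversion

    ov_model = ov.convert_model(torch_module, example_input=example_input)
    if compress:
        # Weight-only compression with NNCF's default settings.
        ov_model = nncf.compress_weights(ov_model)
    xml_path.parent.mkdir(parents=True, exist_ok=True)
    ov.save_model(ov_model, xml_path)
    return xml_path


def compile_for_device(xml_path, device="AUTO"):
    """Load a saved IR file and compile it for the selected inference device."""
    import openvino as ov

    return ov.Core().compile_model(str(xml_path), device)
```

Under these assumptions, each component of the PyTorch pipeline would be passed through `convert_and_compress` once, and the resulting files compiled with `compile_for_device` before being wired into an inference pipeline.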

## Installation Instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).