Stable Diffusion quantization example #294

Merged: 10 commits, Apr 25, 2023
64 changes: 64 additions & 0 deletions examples/openvino/stable-diffusion-quantization/README.md
@@ -0,0 +1,64 @@
# Stable Diffusion Quantization
This example demonstrates Quantization-aware Training (QAT) of the Stable Diffusion UNet model, which is the most time-consuming element of the whole pipeline. The quantized model and the pipeline are exported to the OpenVINO format for inference with the `OVStableDiffusionPipeline` helper.

Knowledge distillation and exponential moving average (EMA) techniques can be used to improve the model accuracy.
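
The sketch below shows, at a high level, how a distillation loss against a frozen FP32 teacher UNet and an EMA copy of the student weights could be combined in a single training step. It is illustrative only: the actual `quantize.py` script may implement these techniques differently, and every name in it (the `training_step` function, the batch layout, the `kd_weight` and `ema_decay` values) is a hypothetical placeholder.

```python
# Illustrative sketch only: how a distillation loss and an EMA weight copy
# could fit into one UNet training step. The actual quantize.py script may
# differ; all names below are hypothetical.
import torch
import torch.nn.functional as F

def training_step(student_unet, teacher_unet, ema_unet, batch, kd_weight=0.5, ema_decay=0.999):
    noisy_latents, timesteps, text_embeddings, target_noise = batch

    # Standard diffusion objective on the quantized (student) UNet.
    student_pred = student_unet(noisy_latents, timesteps, text_embeddings).sample
    diffusion_loss = F.mse_loss(student_pred, target_noise)

    # Knowledge distillation: keep the student close to the frozen FP32 teacher.
    with torch.no_grad():
        teacher_pred = teacher_unet(noisy_latents, timesteps, text_embeddings).sample
    kd_loss = F.mse_loss(student_pred, teacher_pred)

    loss = diffusion_loss + kd_weight * kd_loss
    loss.backward()

    # EMA: maintain a slowly moving average of the student weights.
    with torch.no_grad():
        for ema_p, p in zip(ema_unet.parameters(), student_unet.parameters()):
            ema_p.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)

    return loss.item()
```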

This example supports model tuning on two datasets from the Hugging Face Hub:
* [Pokemon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions)
* [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en)

The example can be easily extended to other datasets.
>**Note**: laion2B-en is downloaded on the fly during the fine-tuning process, so there is no need to store it locally.
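
A minimal sketch of how laion2B-en can be consumed in streaming mode with the `datasets` library (the training script may handle this differently, and the `TEXT`/`URL` column names are assumptions):

```python
from datasets import load_dataset

# Stream laion2B-en: samples are fetched lazily during iteration,
# so the dataset never has to be stored locally.
dataset = load_dataset("laion/laion2B-en", split="train", streaming=True)

# Peek at a few caption/URL pairs (column names are assumptions).
for sample in dataset.take(3):
    print(sample["TEXT"], sample["URL"])
```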

## Prerequisites
* Install Optimum Intel for OpenVINO:
```bash
pip install optimum-intel[openvino]
```
* Install example requirements:
```bash
pip install -r requirements.txt
```

## Running pre-optimized models
* General-purpose image generation model:
```python
from optimum.intel.openvino import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained("OpenVINO/stable-diffusion-2-1-quantized", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

prompt = "sailing ship in storm by Rembrandt"
output = pipe(prompt, num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```
* Pokemon generation:
```python
from optimum.intel.openvino import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained("OpenVINO/Stable-Diffusion-Pokemon-en-quantized", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

prompt = "cartoon bird"
output = pipe(prompt, num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```
* You can also run the `pokemon_generation_demo.ipynb` notebook from this folder to compare the FP32 pipeline with the optimized one.

## HW Requirements for QAT
The minimal hardware setup for the run is a GPU with 24 GB of memory.

>**NOTE**: You can set the number of training steps to 0, which turns the run into Post-Training Quantization. A CPU should be enough in this case, but you may need to modify the script.

## Run QAT
* QAT for the pokemon generation model:
```bash
python quantize.py --ema_device="cpu" --use_kd --model_id="svjack/Stable-Diffusion-Pokemon-en" --center_crop --random_flip --gradient_checkpointing --dataloader_num_workers=2 --dataset_name="lambdalabs/pokemon-blip-captions" --max_train_steps=4096 --opt_init_steps=300 --output_dir=sd-quantized-pokemon
```

* QAT on the laion2B-en dataset:
```bash
CUDA_VISIBLE_DEVICES=2 python quantize.py --ema_device="cpu" --use_kd --center_crop --random_flip --dataset_name="laion/laion2B-en" --max_train_steps=10000 --model_id="runwayml/stable-diffusion-v1-5" --max_train_samples=100000 --dataloader_num_workers=8 --opt_init_steps=800 --gradient_checkpointing --output_dir=sd-1-5-quantized-laion
```
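
Once training finishes, the exported OpenVINO pipeline in `--output_dir` can be loaded for inference in the same way as the pre-optimized models above. A minimal sketch, assuming the pokemon run with `--output_dir=sd-quantized-pokemon`:

```python
from optimum.intel.openvino import OVStableDiffusionPipeline

# The path below is the --output_dir passed to quantize.py in the pokemon example.
pipe = OVStableDiffusionPipeline.from_pretrained("sd-quantized-pokemon", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

output = pipe("cartoon bird", num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```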
134 changes: 134 additions & 0 deletions examples/openvino/stable-diffusion-quantization/pokemon_generation_demo.ipynb
@@ -0,0 +1,134 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparison of the results of the stable diffusion quantization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from optimum.intel.openvino import OVStableDiffusionPipeline\n",
"from diffusers import DDPMScheduler\n",
"from IPython.display import display\n",
"\n",
"import torch\n",
"import random\n",
"import numpy as np"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the original pipeline\n",
"This pipeline was fine-tuned on the public [dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) with Pokemon images and the correspoinding captions. You can find the model and the description [here](https://huggingface.co/svjack/Stable-Diffusion-Pokemon-en)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipe = OVStableDiffusionPipeline.from_pretrained(\"svjack/Stable-Diffusion-Pokemon-en\", export=True, compile=False)\n",
"pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)\n",
"\n",
"pipe.compile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's fix the seed for reproducibility.\n",
"np.random.seed(42)\n",
"random.seed(42)\n",
"torch.manual_seed(42)\n",
"\n",
"prompt = \"cartoon bird\"\n",
"output = pipe(prompt, num_inference_steps=50, output_type=\"pil\")\n",
"display(output.images[0])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the quantized pipeline\n",
"Now we run the quantized pipeline that was obtained with Quantization-Aware Training on the same dataset. The original model was used as a baseline for quantization. The resulted model can be found [here](https://huggingface.co/OpenVINO/Stable-Diffusion-Pokemon-en-quantized)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"quantized_pipe = OVStableDiffusionPipeline.from_pretrained(\"OpenVINO/Stable-Diffusion-Pokemon-en-quantized\", compile=False)\n",
"quantized_pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)\n",
"quantized_pipe.compile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the same seed to compare\n",
"np.random.seed(42)\n",
"random.seed(42)\n",
"torch.manual_seed(42)\n",
"prompt = \"cartoon bird\"\n",
"\n",
"output = quantized_pipe(prompt, num_inference_steps=50, output_type=\"pil\")\n",
"display(output.images[0])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you can see the difference of the difference in the results and the time required to generate the image."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 ('stable_diffusion')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "7918409a64d3d4275e0103fc4443d9be5863d1df136c02ed032407c7ae821339"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}