Stable Diffusion quantization example #294

Merged: 10 commits, Apr 25, 2023
64 changes: 64 additions & 0 deletions examples/openvino/stable-diffusion-quantization/README.md
@@ -0,0 +1,64 @@
# Stable Diffusion Quantization
This example demonstrates Quantization-aware Training (QAT) of the Stable Diffusion UNet model, which is the most time-consuming element of the whole pipeline. The quantized model and the pipeline are exported to the OpenVINO format for inference with the `OVStableDiffusionPipeline` helper.

Knowledge distillation and exponential moving average (EMA) techniques can be used to improve the model accuracy.
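
The sketch below shows, at a high level, how a distillation loss against a frozen FP32 teacher UNet and an EMA copy of the student weights could be combined in a single training step. It is illustrative only: the actual `quantize.py` script may implement these techniques differently, and every name in it (the `training_step` function, the batch layout, the `kd_weight` and `ema_decay` values) is a hypothetical placeholder.

```python
# Illustrative sketch only: how a distillation loss and an EMA weight copy
# could fit into one UNet training step. The actual quantize.py script may
# differ; all names below are hypothetical.
import torch
import torch.nn.functional as F

def training_step(student_unet, teacher_unet, ema_unet, batch, kd_weight=0.5, ema_decay=0.999):
    noisy_latents, timesteps, text_embeddings, target_noise = batch

    # Standard diffusion objective on the quantized (student) UNet.
    student_pred = student_unet(noisy_latents, timesteps, text_embeddings).sample
    diffusion_loss = F.mse_loss(student_pred, target_noise)

    # Knowledge distillation: keep the student close to the frozen FP32 teacher.
    with torch.no_grad():
        teacher_pred = teacher_unet(noisy_latents, timesteps, text_embeddings).sample
    kd_loss = F.mse_loss(student_pred, teacher_pred)

    loss = diffusion_loss + kd_weight * kd_loss
    loss.backward()

    # EMA: maintain a slowly moving average of the student weights.
    with torch.no_grad():
        for ema_p, p in zip(ema_unet.parameters(), student_unet.parameters()):
            ema_p.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)

    return loss.item()
```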

This example supports model tuning on two datasets from the Hugging Face Hub:
* [Pokemon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions)
* [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en)

The example can be easily extended to other datasets.
>**Note**: laion2B-en is downloaded on the fly during the fine-tuning process, so there is no need to store it locally.
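
A minimal sketch of how laion2B-en can be consumed in streaming mode with the `datasets` library (the training script may handle this differently, and the `TEXT`/`URL` column names are assumptions):

```python
from datasets import load_dataset

# Stream laion2B-en: samples are fetched lazily during iteration,
# so the dataset never has to be stored locally.
dataset = load_dataset("laion/laion2B-en", split="train", streaming=True)

# Peek at a few caption/URL pairs (column names are assumptions).
for sample in dataset.take(3):
    print(sample["TEXT"], sample["URL"])
```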

## Prerequisites
* Install Optimum Intel for OpenVINO:
```bash
pip install optimum-intel[openvino]
```
* Install example requirements:
```bash
pip install -r requirements.txt
```

## Running pre-optimized models
* General-purpose image generation model:
```python
from optimum.intel.openvino import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained("OpenVINO/stable-diffusion-2-1-quantized", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

prompt = "sailing ship in storm by Rembrandt"
output = pipe(prompt, num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```
* Pokemon generation:
```python
from optimum.intel.openvino import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained("OpenVINO/Stable-Diffusion-Pokemon-en-quantized", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

prompt = "cartoon bird"
output = pipe(prompt, num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```
* You can also run the `pokemon_generation_demo.ipynb` notebook from this folder to compare the FP32 pipeline with the optimized one.

## HW Requirements for QAT
The minimal hardware setup for the run is a GPU with 24 GB of memory.

>**NOTE**: You can set the number of training steps to 0, which turns the run into Post-Training Quantization. A CPU should be enough in this case, but you may need to modify the script.

## Run QAT
* QAT for the pokemon generation model:
```bash
python quantize.py --ema_device="cpu" --use_kd --model_id="svjack/Stable-Diffusion-Pokemon-en" --center_crop --random_flip --gradient_checkpointing --dataloader_num_workers=2 --dataset_name="lambdalabs/pokemon-blip-captions" --max_train_steps=4096 --opt_init_steps=300 --output_dir=sd-quantized-pokemon
```

* QAT on the laion2B-en dataset:
```bash
CUDA_VISIBLE_DEVICES=2 python quantize.py --ema_device="cpu" --use_kd --center_crop --random_flip --dataset_name="laion/laion2B-en" --max_train_steps=10000 --model_id="runwayml/stable-diffusion-v1-5" --max_train_samples=100000 --dataloader_num_workers=8 --opt_init_steps=800 --gradient_checkpointing --output_dir=sd-1-5-quantized-laion
```
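
Once training finishes, the exported OpenVINO pipeline in `--output_dir` can be loaded for inference in the same way as the pre-optimized models above. A minimal sketch, assuming the pokemon run with `--output_dir=sd-quantized-pokemon`:

```python
from optimum.intel.openvino import OVStableDiffusionPipeline

# The path below is the --output_dir passed to quantize.py in the pokemon example.
pipe = OVStableDiffusionPipeline.from_pretrained("sd-quantized-pokemon", compile=False)
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

output = pipe("cartoon bird", num_inference_steps=50, output_type="pil")
output.images[0].save("result.png")
```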
134 changes: 134 additions & 0 deletions examples/openvino/stable-diffusion-quantization/pokemon_generation_demo.ipynb
@@ -0,0 +1,134 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparison of the results of the stable diffusion quantization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from optimum.intel.openvino import OVStableDiffusionPipeline\n",
"from diffusers import DDPMScheduler\n",
"from IPython.display import display\n",
"\n",
"import torch\n",
"import random\n",
"import numpy as np"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the original pipeline\n",
"This pipeline was fine-tuned on the public [dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) with Pokemon images and the correspoinding captions. You can find the model and the description [here](https://huggingface.co/svjack/Stable-Diffusion-Pokemon-en)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipe = OVStableDiffusionPipeline.from_pretrained(\"svjack/Stable-Diffusion-Pokemon-en\", export=True, compile=False)\n",
"pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)\n",
"\n",
"pipe.compile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's fix the seed for reproducibility.\n",
"np.random.seed(42)\n",
"random.seed(42)\n",
"torch.manual_seed(42)\n",
"\n",
"prompt = \"cartoon bird\"\n",
"output = pipe(prompt, num_inference_steps=50, output_type=\"pil\")\n",
"display(output.images[0])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the quantized pipeline\n",
"Now we run the quantized pipeline that was obtained with Quantization-Aware Training on the same dataset. The original model was used as a baseline for quantization. The resulted model can be found [here](https://huggingface.co/OpenVINO/Stable-Diffusion-Pokemon-en-quantized)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"quantized_pipe = OVStableDiffusionPipeline.from_pretrained(\"OpenVINO/Stable-Diffusion-Pokemon-en-quantized\", compile=False)\n",
"quantized_pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)\n",
"quantized_pipe.compile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the same seed to compare\n",
"np.random.seed(42)\n",
"random.seed(42)\n",
"torch.manual_seed(42)\n",
"prompt = \"cartoon bird\"\n",
"\n",
"output = quantized_pipe(prompt, num_inference_steps=50, output_type=\"pil\")\n",
"display(output.images[0])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you can see the difference of the difference in the results and the time required to generate the image."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 ('stable_diffusion')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "7918409a64d3d4275e0103fc4443d9be5863d1df136c02ed032407c7ae821339"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}