TensorRT Text2Image Stable Diffusion Pipeline

The TensorRT pipeline accelerates Stable Diffusion text-to-image inference.

NOTE: The ONNX conversions and TensorRT engine build may take up to 30 minutes. This script was contributed by [Asfiya Baig](https://github.com/asfiyab-nvidia) and the notebook by [Parag Ekbote](https://github.com/ParagEkbote).
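
Because the build is slow, it can help to check whether engine files from an earlier run are already on disk before starting. The following is a minimal sketch, not part of the contributed pipeline: it assumes engines are stored as `<name>.plan` files inside an `engine` directory, which is the layout that appears in this notebook's build log. The helper name `missing_engines` is hypothetical.

```python
from pathlib import Path

# Engine names as they appear in this notebook's build log
# (clip.plan, unet.plan, vae.plan). Assumption: one .plan file per engine.
ENGINE_NAMES = ("clip", "unet", "vae")


def missing_engines(engine_dir, names=ENGINE_NAMES):
    """Return the engine names that have no `<name>.plan` file in engine_dir."""
    engine_dir = Path(engine_dir)
    return [name for name in names if not (engine_dir / f"{name}.plan").is_file()]
```

Calling `missing_engines(path_to_engine_dir)` returns an empty list when every expected file is present. Whether the pipeline then skips the build is an assumption based on the log, where `clip.plan` is loaded without a preceding build step.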

In [1]:
pip install polygraphy onnx cuda-python onnx-graphsurgeon tensorrt onnxruntime-gpu colored

Collecting colored
  Downloading colored-2.3.0-py3-none-any.whl.metadata (3.6 kB)
Downloading colored-2.3.0-py3-none-any.whl (18 kB)
Installing collected packages: colored
Successfully installed colored-2.3.0
Note: you may need to restart the kernel to use updated packages.
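
After restarting the kernel, a quick check can confirm that the packages from the install command are visible to the interpreter. This is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical. It looks packages up by distribution name, so a package installed under a different wheel name would be reported as missing even if it is importable.

```python
from importlib import metadata

# Distribution names copied from the pip install command above.
REQUIRED = [
    "polygraphy", "onnx", "cuda-python", "onnx-graphsurgeon",
    "tensorrt", "onnxruntime-gpu", "colored",
]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:20s} {version or 'NOT INSTALLED'}")
```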


In [1]:
import torch
from diffusers import DDIMScheduler
from diffusers.pipelines import DiffusionPipeline

# Load the DDIMScheduler configuration from the model repository
scheduler = DDIMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="scheduler")

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1",
    custom_pipeline="stable_diffusion_tensorrt_txt2img",
    variant='fp16',
    torch_dtype=torch.float16,
    scheduler=scheduler,)

# Reuse the cached model folder to store the ONNX models and TensorRT engines
pipe.set_cached_folder("stabilityai/stable-diffusion-2-1", variant='fp16')

pipe = pipe.to("cuda")

prompt = "a beautiful photograph of Mt. Fuji during cherry blossom"
image = pipe(prompt).images[0]
image.save('tensorrt_mt_fuji.png')

Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

Fetching 28 files:   0%|          | 0/28 [00:00<?, ?it/s]

Running inference on device: cuda:0
Building Engines...
Engine build can take a while to complete
Building TensorRT engine for /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/onnx/unet.opt.onnx: /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/unet.plan


[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] Configuring with profiles:[
        Profile 0:
            {sample [min=(2, 4, 96, 96), opt=(2, 4, 96, 96), max=(8, 4, 96, 96)],
             encoder_hidden_states [min=(2, 77, 1024), opt=(2, 77, 1024), max=(8, 77, 1024)],
             timestep [min=[1], opt=[1], max=[1]]}
    ]
[I] Loading tactic timing cache from /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/timing_cache
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 22699.88 MiB, TACTIC_DRAM: 22699.88 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | []
    Profiling Verbosity    | Pro

Building TensorRT engine for /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/onnx/vae.opt.onnx: /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/vae.plan


[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] Configuring with profiles:[
        Profile 0:
            {latent [min=(1, 4, 96, 96), opt=(1, 4, 96, 96), max=(4, 4, 96, 96)]}
    ]
[I] Loading tactic timing cache from /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/timing_cache
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 22699.88 MiB, TACTIC_DRAM: 22699.88 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | []
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[I] Finished engine building in 172.855 seconds
[I] Saving tactic timing cache to /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapsho

Loading TensorRT engine: /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/clip.plan


[I] Saving engine to /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/vae.plan
[I] Loading bytes from /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/clip.plan


Loading TensorRT engine: /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/unet.plan


[I] Loading bytes from /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/unet.plan


Loading TensorRT engine: /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/vae.plan


[I] Loading bytes from /home/zeus/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-1/snapshots/5cae40e6a2745ae2b01ad92ae5043f95f23644d6/engine/vae.plan
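
The optimization profile shapes in the build log can be reproduced with simple arithmetic. The sketch below is illustrative and not taken from the pipeline source. It assumes a 768x768 output image, a VAE that downsamples by a factor of 8, 4 latent channels, and a UNet batch that is doubled because the unconditional and conditional inputs for classifier-free guidance are batched together. The minimum and maximum batch sizes (1 and 4) are read off the log, and the function names are hypothetical.

```python
def latent_hw(height, width, vae_scale=8):
    """Spatial size of the latent for a given image size."""
    return height // vae_scale, width // vae_scale


def unet_sample_profile(opt_batch=1, min_batch=1, max_batch=4,
                        height=768, width=768, latent_channels=4):
    """min/opt/max shapes of the UNet `sample` input."""
    h, w = latent_hw(height, width)
    batches = (("min", min_batch), ("opt", opt_batch), ("max", max_batch))
    # Factor 2: unconditional and conditional inputs share one batch.
    return {key: (2 * b, latent_channels, h, w) for key, b in batches}


def vae_latent_profile(opt_batch=1, min_batch=1, max_batch=4,
                       height=768, width=768, latent_channels=4):
    """min/opt/max shapes of the VAE decoder `latent` input."""
    h, w = latent_hw(height, width)
    batches = (("min", min_batch), ("opt", opt_batch), ("max", max_batch))
    # No doubling here: only the guided latents are decoded.
    return {key: (b, latent_channels, h, w) for key, b in batches}


print(unet_sample_profile())
print(vae_latent_profile())
```

With the defaults, `unet_sample_profile()` gives `(2, 4, 96, 96)` for min and opt and `(8, 4, 96, 96)` for max, and `vae_latent_profile()` gives `(1, 4, 96, 96)` and `(4, 4, 96, 96)`, matching the `sample` and `latent` entries in the log.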
