
Conversation

BenjaminBossan

See huggingface/diffusers#11816 (comment)

Description

This PR implements two small improvements to the speed of adapter injection. On a benchmark based on the linked issue, the first change leads to a speedup of 21% and the second change to another 3%. The gains are modest, but since the changes don't make the code any more complicated, there is really no reason not to take them.

The optimizations don't introduce any functional change; they simply avoid recomputing the same values multiple times. Therefore, unless I'm missing something, they should strictly improve runtime.
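
To illustrate the general pattern, here is a minimal sketch only: the function and attribute names (inject_adapters_*, config.target_modules) are hypothetical and not the actual PEFT code touched by this PR. The change amounts to hoisting loop-invariant work out of the hot per-module loop that runs during injection:

def inject_adapters_before(model, config):
    for name, module in model.named_modules():
        # Loop-invariant value recomputed on every iteration of the hot loop
        targets = sorted(set(config.target_modules))
        if any(name.endswith(t) for t in targets):
            ...  # replace `module` with an adapter-wrapped version

def inject_adapters_after(model, config):
    # Same value computed once and reused for every module
    targets = sorted(set(config.target_modules))
    for name, module in model.named_modules():
        if any(name.endswith(t) for t in targets):
            ...  # replace `module` with an adapter-wrapped version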

Note

Be careful when profiling this: each operation is very quick but can be performed millions of times. If the profiler adds overhead, it can completely skew the results. For example, with pyinstrument, just enabling the profiler increases execution time (after optimization) from ~15 s to ~21 s.

Script

This is the script I used to profile (with the profiler commented out for the aforementioned reason):

import time

import torch
from diffusers import StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download
# from pyinstrument import Profiler

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

hf_hub_download("johass/lr", "k1.safetensors")

loras = [
    "k1.safetensors",
] * 6

# profiler = Profiler(interval=0.01)
# profiler.start()

for i, lora in enumerate(loras):
    adapter_name = lora.replace(".", "_")
    adapter_name = f"{i}_{adapter_name}"
    ti = time.time()
    pipeline.load_lora_weights(
        "johass/lr", weight_name=lora, adapter_name=adapter_name, low_cpu_mem_usage=True
    )
    dt = time.time() - ti
    pipeline.set_adapters(adapter_name, adapter_weights=0.7)
    print(f"Loaded adapter {i} in {dt:.3f} s")

# profiler.stop()
# profiler.write_html("profile.html")



@githubnemo left a comment


LGTM, thanks

@BenjaminBossan merged commit f6b0a2d into huggingface:main on Sep 23, 2025
14 checks passed
@BenjaminBossan deleted the enh-small-optimizations-to-injection-speed branch on September 23, 2025 at 11:27