Add Guidance Rescaling to LatentConsistencyModelPipeline #5859
dg845 wants to merge 4 commits into huggingface:main
|
For the conditional noise prediction |
|
cc @patil-suraj feel free to merge if ok for you |
|
Do we have any results for this? As you said, the model has not seen guidance scales below 3 during training, so I'm not sure this makes a difference in the results. We should also support this in the base pipelines, since LCMs can now be used with them. |
|
I haven't tested this implementation of guidance rescaling on a full LCM checkpoint yet. I think people have tried guidance rescaling with the LCM LoRA on pipelines that use CFG instead of a guidance scale embedding (which avoids the problem of what the proper |
|
Here is a script to get some examples:

import torch
from diffusers import LatentConsistencyModelPipeline

seed = 0
device = "cuda"
torch_dtype = torch.float16
model_id_or_path = "SimianLuo/LCM_Dreamshaper_v7"

pipe = LatentConsistencyModelPipeline.from_pretrained(
    model_id_or_path,
    torch_dtype=torch_dtype,
)
pipe.to(torch_device=device, torch_dtype=torch_dtype)

generator = torch.manual_seed(seed)
image = pipe(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=4,
    guidance_scale=8.5,  # 7.5 in the original LCM paper CFG formulation
    generator=generator,
    guidance_rescale=0.7,  # The default suggested in the original guidance rescale paper
).images[0]
image.save(f"samples_seed_{seed}.png")

I ran the inference in mixed precision due to GPU memory constraints. Here are some examples: [images for seeds 0, 2937, 3409, and 49283]

The examples look pretty good to me, but I'm not sure if they represent a noticeable improvement over samples without guidance rescaling. Curious what people think about the sample quality @luosiallen @patil-suraj @patrickvonplaten. |
|
After some further investigation, it seems that images generated with guidance rescale and images generated without guidance rescale tend to be very similar because the CFG noise prediction In particular, I believe the above samples generated using guidance rescale are very similar to those generated without guidance rescale. They're not necessarily visually indistinguishable; my experience so far is that images generated with guidance rescale tend to be a little darker than without (due to the fact that samples generated without CFG [e.g., |
|
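The "very similar" and "a little darker" observations above can be checked numerically rather than by eye. The following is a minimal sketch, not part of the PR: `compare_images` is a hypothetical helper, and the two arrays are synthetic stand-ins for a pair of samples generated from the same seed with and without guidance rescale. It measures difference and brightness only, not sample quality.

```python
import numpy as np


def compare_images(img_a, img_b):
    """Return simple difference statistics for two same-sized images.

    Accepts anything np.asarray can convert (including PIL images).
    `compare_images` is an illustrative helper, not a diffusers API.
    """
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return {
        "mean_abs_diff": float(np.abs(a - b).mean()),  # 0 means identical
        "brightness_a": float(a.mean()),               # mean pixel value
        "brightness_b": float(b.mean()),
    }


# Synthetic stand-ins (assumption): a random image and a darkened copy.
rng = np.random.default_rng(0)
without_rescale = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
with_rescale = (without_rescale * 0.9).astype(np.uint8)

stats = compare_images(with_rescale, without_rescale)
print(stats)
```

With real samples, the two inputs would be the PIL images returned by the pipeline for the same seed, one with `guidance_rescale` set and one without.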
@patil-suraj a gentle ping. |
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
@patil-suraj can you please check here again? |
|
To summarize my testing regarding this PR, it seems that the scales of the CFG estimate In the original guidance rescale paper, the authors observe that as terminal SNR goes to 0 (at timesteps near I guess the advantage of merging the PR would be that guidance rescaling would be available for the LCM pipelines as a feature. (I believe it's already available for other pipelines compatible with The downsides are as follows:
|
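For context on the zero terminal SNR setting discussed above, here is a minimal NumPy sketch of the beta rescaling as I understand it from the guidance rescale paper: shift and scale the square root of the cumulative alpha product so that its last value is exactly zero while its first value is unchanged. The function name and the linear beta schedule are illustrative assumptions, not the scheduler's actual code.

```python
import numpy as np


def rescale_betas_to_zero_terminal_snr(betas):
    """Rescale a beta schedule so the final timestep has zero SNR.

    Illustrative helper (hypothetical name), written for a 1-D array of betas.
    """
    alphas = 1.0 - betas
    alphas_bar_sqrt = np.sqrt(np.cumprod(alphas))

    first = alphas_bar_sqrt[0]
    last = alphas_bar_sqrt[-1]

    # Shift so the last value becomes 0, then scale so the first is unchanged.
    alphas_bar_sqrt = (alphas_bar_sqrt - last) * first / (first - last)

    # Convert the cumulative products back to per-step betas.
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas


# Example schedule (assumption): a plain linear schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
new_betas = rescale_betas_to_zero_terminal_snr(betas)

terminal_alpha_bar_before = np.cumprod(1.0 - betas)[-1]
terminal_alpha_bar_after = np.cumprod(1.0 - new_betas)[-1]
print(terminal_alpha_bar_before, terminal_alpha_bar_after)
```

After rescaling, the last beta equals 1, so the final timestep carries no signal at all.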
|
cc @patil-suraj again |
|
gentle ping @patil-suraj |
|
@patil-suraj could you give this a look? |
|
Interesting experiment! I will leave this PR open so more people can test it out and see if they find this feature helpful. |




What does this PR do?
This PR adds classifier-free guidance rescaling (introduced in this paper) to
LatentConsistencyModelPipeline. Using guidance rescaling may improve the LCM sample quality, in particular when using zero terminal SNR (rescale_betas_zero_snr=True) in LCMScheduler.
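As a rough illustration of what guidance rescaling does, the sketch below follows the general recipe from the guidance rescale paper: scale the CFG noise estimate so its per-sample standard deviation matches that of the conditional estimate, then blend the result with the unscaled estimate using the `guidance_rescale` factor. This is a NumPy sketch under stated assumptions, not the code in this PR: `rescale_cfg_noise` is a hypothetical name, the random arrays stand in for model outputs, and how the LCM pipeline obtains the two estimates is not shown.

```python
import numpy as np


def rescale_cfg_noise(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    """Blend a std-matched CFG estimate with the original CFG estimate.

    Illustrative helper (hypothetical name). Inputs have shape (batch, ...).
    """
    # Per-sample standard deviation over all non-batch axes.
    axes = tuple(range(1, noise_pred_text.ndim))
    std_text = noise_pred_text.std(axis=axes, keepdims=True, ddof=1)
    std_cfg = noise_cfg.std(axis=axes, keepdims=True, ddof=1)

    # Scale the CFG estimate to the conditional estimate's std.
    noise_rescaled = noise_cfg * (std_text / std_cfg)

    # guidance_rescale=0 keeps plain CFG; 1 uses the fully rescaled estimate.
    return guidance_rescale * noise_rescaled + (1.0 - guidance_rescale) * noise_cfg


# Random stand-ins for model outputs (assumption).
rng = np.random.default_rng(0)
noise_uncond = rng.standard_normal((2, 4, 8, 8))
noise_text = rng.standard_normal((2, 4, 8, 8))

guidance_scale = 7.5
noise_cfg = noise_uncond + guidance_scale * (noise_text - noise_uncond)
rescaled = rescale_cfg_noise(noise_cfg, noise_text, guidance_rescale=0.7)

print(noise_cfg.std(), rescaled.std(), noise_text.std())
```

With a large guidance scale the plain CFG estimate has a much larger standard deviation than the conditional one, which is the overexposure effect the rescaling is meant to counteract.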
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@patrickvonplaten
@sayakpaul
@luosiallen