Add Guidance Rescaling to LatentConsistencyModelPipeline #5859

Open
dg845 wants to merge 4 commits into huggingface:main from dg845:lcm-pipeline-rescale-cfg

Conversation

@dg845
Collaborator

@dg845 dg845 commented Nov 18, 2023

What does this PR do?

This PR adds classifier-free guidance rescaling (introduced in this paper) to LatentConsistencyModelPipeline. Using guidance rescaling may improve the LCM sample quality, in particular when using zero terminal SNR (rescale_betas_zero_snr=True) in LCMScheduler.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten
@sayakpaul
@luosiallen

@dg845
Collaborator Author

dg845 commented Nov 18, 2023

For the conditional noise prediction noise_pred_text ($x_{pos}$ in the paper) passed to rescale_noise_cfg, I am currently using the output of the unet on the same latents and prompt_embeds but with a guidance scale embedding corresponding to a guidance_scale of 1 (i.e., no CFG). While this should theoretically remove the unconditional component and leave only the conditional output, it's not obvious that this is the right thing to do, because the LCM may not have seen guidance scale values that low during training/distillation (during training/distillation, a random guidance scale is typically sampled in $[3, 15]$; see Appendix F of the LCM paper).
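For reference, the rescaling step itself follows Eq. (15) of the guidance rescale paper; here is a minimal NumPy sketch of it (the actual rescale_noise_cfg helper in the Stable Diffusion pipelines operates on torch tensors, but the math is the same):

```python
import numpy as np

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    """Rescale noise_cfg toward the std of the conditional prediction (Eq. 15)."""
    # Per-sample standard deviation over all non-batch dimensions.
    axes = tuple(range(1, noise_pred_text.ndim))
    std_text = noise_pred_text.std(axis=axes, keepdims=True)
    std_cfg = noise_cfg.std(axis=axes, keepdims=True)
    # Rescale the CFG prediction so its std matches the conditional prediction's std.
    noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
    # Interpolate between the rescaled and original predictions to avoid
    # overly "plain"-looking images at full rescaling.
    return guidance_rescale * noise_pred_rescaled + (1.0 - guidance_rescale) * noise_cfg
```

With guidance_rescale=0.0 this is a no-op, and with guidance_rescale=1.0 the output's per-sample std exactly matches that of noise_pred_text.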

@patrickvonplaten
Contributor

cc @patil-suraj feel free to merge if ok for you

@patil-suraj
Contributor

Do we have any results for this? And as you said, the model has not seen guidance scales below 3 during training, so I'm not sure if this makes a difference in results.

Also, we should support this in the base pipelines as well, since we can now use LCMs with the base pipelines.

@dg845
Collaborator Author

dg845 commented Nov 20, 2023

I haven't tested this implementation of guidance rescaling on a full LCM checkpoint yet. I think people have tried guidance rescaling with the LCM LoRA on pipelines that use CFG instead of a guidance scale embedding (which avoids the problem of what the proper $x_{pos}$ value should be).

@dg845
Collaborator Author

dg845 commented Nov 28, 2023

Here is a script to get some examples:

import torch
from diffusers import LatentConsistencyModelPipeline

seed = 0
device = "cuda"
torch_dtype = torch.float16
model_id_or_path = "SimianLuo/LCM_Dreamshaper_v7"
pipe = LatentConsistencyModelPipeline.from_pretrained(
    model_id_or_path,
    torch_dtype=torch_dtype,
)
pipe.to(torch_device=device, torch_dtype=torch_dtype)

generator = torch.manual_seed(seed)
image = pipe(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=4,
    guidance_scale=8.5,  # 7.5 in the original LCM paper CFG formulation
    generator=generator,
    guidance_rescale=0.7,  # The default suggested in the original guidance rescale paper
).images[0]

image.save(f"samples_seed_{seed}.png")

I ran the inference in mixed precision due to GPU memory constraints.

Here are some examples:

Seed 0:

samples

Seed 2937:

samples_seed_2937

Seed 3409:

samples_seed_3409

Seed 49283:

samples_seed_49283

The examples look pretty good to me, but I'm not sure whether they represent a noticeable improvement over samples without guidance rescaling. Curious what people think about the sample quality @luosiallen @patil-suraj @patrickvonplaten.

@dg845
Collaborator Author

dg845 commented Nov 29, 2023

After some further investigation, it seems that images generated with and without guidance rescale tend to be very similar. The CFG noise prediction noise_pred_cfg and the non-CFG (conditional) noise prediction noise_pred_cond have very similar standard deviations throughout sampling, so the rescaled noise prediction ends up very close to the original CFG noise prediction (at least for the prompts I've tested so far with the SimianLuo/LCM_Dreamshaper_v7 checkpoint). Note that when this is the case, increasing the guidance_rescale factor doesn't have much effect, because we're interpolating between two very similar noise predictions.

In particular, I believe the above samples generated using guidance rescale are very similar to those generated without guidance rescale. They're not necessarily visually indistinguishable; my experience so far is that images generated with guidance rescale tend to be a little darker than without (because samples generated without CFG [e.g., guidance_scale = 1.0] tend to be darker).
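To illustrate why similar standard deviations make rescaling nearly a no-op, here is a small numeric check on hypothetical tensors (not real UNet outputs): when the per-sample stds nearly match, the rescale factor std_cond / std_cfg is approximately 1, so the rescaled prediction barely differs from the original one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical noise predictions with nearly equal per-sample stds, mimicking
# what was observed with the SimianLuo/LCM_Dreamshaper_v7 checkpoint.
noise_pred_cond = rng.normal(size=(1, 4, 64, 64))
noise_pred_cfg = noise_pred_cond + 0.01 * rng.normal(size=noise_pred_cond.shape)

axes = (1, 2, 3)
ratio = noise_pred_cond.std(axis=axes, keepdims=True) / noise_pred_cfg.std(axis=axes, keepdims=True)
noise_pred_rescaled = noise_pred_cfg * ratio

# Even at guidance_rescale=1.0, the result barely moves, because the
# rescale factor std_cond / std_cfg is already ~1.
max_rel_change = np.abs(noise_pred_rescaled - noise_pred_cfg).max() / np.abs(noise_pred_cfg).max()
print(f"rescale factor: {float(ratio.squeeze()):.5f}, max relative change: {max_rel_change:.2e}")
```

Since any guidance_rescale value only interpolates between noise_pred_cfg and noise_pred_rescaled, it can never move the output further from noise_pred_cfg than this already tiny gap.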

@sayakpaul
Member

@patil-suraj a gentle ping.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Jan 16, 2024
@patrickvonplaten patrickvonplaten removed the stale Issues that haven't received updates label Jan 17, 2024
@patrickvonplaten
Contributor

@patil-suraj can you please check here again?

@dg845
Collaborator Author

dg845 commented Jan 18, 2024

To summarize my testing regarding this PR, it seems that the scales of the CFG estimate noise_pred_cfg and the conditional noise prediction noise_pred_cond are very similar for the SimianLuo/LCM_Dreamshaper_v7 LCM checkpoint. Thus using guidance rescaling typically alters the final sample only a little bit (see #5859 (comment)).

In the original guidance rescale paper, the authors observe that as the terminal SNR goes to 0 (at timesteps near num_train_timesteps $T$) and at high guidance weights, noise_pred_cfg becomes large and can result in saturated images, and they propose guidance rescaling to fix this. I am not sure the same conditions hold in general for LCM models, since they don't normally perform CFG.
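For intuition on the failure mode the paper targets, here is a toy check with made-up standard-normal tensors (not real model outputs): under the standard CFG combination used by non-LCM pipelines, the standard deviation of the combined prediction grows roughly linearly in the guidance weight when the two predictions are weakly correlated, which is what produces over-saturated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up unit-variance conditional / unconditional noise predictions.
noise_pred_uncond = rng.normal(size=(4, 64, 64))
noise_pred_text = rng.normal(size=(4, 64, 64))

for w in (1.0, 7.5, 15.0):
    # Standard CFG combination used by non-LCM pipelines.
    noise_pred_cfg = noise_pred_uncond + w * (noise_pred_text - noise_pred_uncond)
    print(f"w={w:>4}: std={float(noise_pred_cfg.std()):.2f}")
```

An LCM conditions on a guidance scale embedding instead of computing this combination at inference time, so it's unclear whether its predictions blow up in the same way.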

I guess the advantage of merging the PR would be that guidance rescaling would be available for the LCM pipelines as a feature. (I believe it's already available for other pipelines compatible with LCMScheduler such as StableDiffusionPipeline because CFG and guidance rescaling are already implemented for those pipelines.)

The downsides are as follows:

  • With current LCM checkpoints, guidance rescaling does not seem to have a big effect on the samples
  • The LCM pipelines are more complex with guidance rescaling implemented (pipelines which implement CFG basically get guidance rescaling for free, but the LCM pipelines don't use CFG normally so a CFG-like implementation is currently used in order to support guidance rescaling)
  • Guidance rescaling is not currently theoretically well-justified for LCM models

@patrickvonplaten
Contributor

cc @patil-suraj again

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Feb 12, 2024
@github-actions github-actions Bot closed this Feb 21, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Feb 21, 2024
@yiyixuxu yiyixuxu reopened this Feb 21, 2024
@yiyixuxu
Collaborator

gentle ping @patil-suraj

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Mar 17, 2024
@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Mar 17, 2024
@sayakpaul
Member

@patil-suraj could you give this a look?

@yiyixuxu
Collaborator

Interesting experiment!
In general, I think we should not add a feature unless there is a use case for it.

I will leave this PR open so more people can test it out and see whether they find this feature helpful.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Apr 13, 2024

Labels

contributions-welcome, stale (Issues that haven't received updates)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants