
Conversation

@hlky
Contributor

@hlky hlky commented Oct 9, 2024

What does this PR do?

This PR provides a slight performance improvement to `Euler`, `EDMEuler`, and `FlowMatchHeun` by moving creation of the `randn` noise tensor inside the `gamma > 0` condition.

`gamma`, calculated as `min(s_churn / (len(self.sigmas) - 1), 2**0.5 - 1) if s_tmin <= sigma <= s_tmax else 0.0`, is generally `0.0`, since `s_churn`, `s_tmin`, and `s_tmax` are typically left at their default values of `0.0`, `0.0`, and `inf` respectively.
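With those defaults the expression collapses to `0.0` on every step. A standalone sketch of the computation (not the exact scheduler code, just the formula quoted above):

```python
def compute_gamma(s_churn: float, s_tmin: float, s_tmax: float,
                  sigma: float, num_sigmas: int) -> float:
    # Churn is only applied when sigma falls inside [s_tmin, s_tmax];
    # otherwise gamma is 0.0, and with the default s_churn=0.0 the
    # min(...) term is 0.0 as well.
    if s_tmin <= sigma <= s_tmax:
        return min(s_churn / (num_sigmas - 1), 2**0.5 - 1)
    return 0.0

# Default settings: gamma is always 0.0, so any pre-sampled noise is unused.
print(compute_gamma(0.0, 0.0, float("inf"), sigma=1.0, num_sigmas=50))  # 0.0
```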

Similarly, in `KDPM2Ancestral` we move creation of the noise inside the second-order path.
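The change can be sketched as follows; `randn_tensor` here is a stand-in for the diffusers helper, and the update rule is simplified for illustration rather than copied from the scheduler:

```python
import torch

def randn_tensor(shape, generator=None, device=None, dtype=None):
    # Stand-in for diffusers' randn_tensor utility; the real helper has
    # extra handling for devices and lists of generators.
    return torch.randn(shape, generator=generator, device=device, dtype=dtype)

def step_before(sample, sigma, gamma, s_noise=1.0, generator=None):
    # Before: noise is allocated on every step, even when gamma == 0.
    noise = randn_tensor(sample.shape, generator=generator,
                         device=sample.device, dtype=sample.dtype)
    sigma_hat = sigma * (gamma + 1)
    if gamma > 0:
        sample = sample + noise * s_noise * (sigma_hat**2 - sigma**2) ** 0.5
    return sample, sigma_hat

def step_after(sample, sigma, gamma, s_noise=1.0, generator=None):
    # After: the randn_tensor call moves inside the gamma > 0 branch,
    # skipping the allocation on the common gamma == 0 path.
    sigma_hat = sigma * (gamma + 1)
    if gamma > 0:
        noise = randn_tensor(sample.shape, generator=generator,
                             device=sample.device, dtype=sample.dtype)
        sample = sample + noise * s_noise * (sigma_hat**2 - sigma**2) ** 0.5
    return sample, sigma_hat
```

Note that with `gamma == 0` both variants return the sample unchanged, but only the first advances the random generator, which is why seeded test outputs change.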

Who can review?

@yiyixuxu

@vladmandic
Contributor

@hlky love the attention you're giving schedulers.

btw, a bit off-topic, but thought of mentioning it here:
this is executed on each step, and it's definitely causing a torch sync:

sample = sample.to(torch.float32)

@hlky hlky changed the title Slight performance improvement to Euler Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral Oct 9, 2024
@hlky
Contributor Author

hlky commented Oct 9, 2024

Thanks! Looks like that cast and the mentioned precision issues come from `rescale_betas_zero_snr`, so perhaps the cast could be moved under `if self.config.rescale_betas_zero_snr`, or a different value could be used (as in Comfy and Forge) and the cast removed.
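A minimal sketch of that suggestion, assuming a scheduler-like object whose config carries the `rescale_betas_zero_snr` flag (the class and attribute layout here are illustrative, not the actual diffusers code):

```python
import torch

class SchedulerSketch:
    def __init__(self, rescale_betas_zero_snr: bool = False):
        self.rescale_betas_zero_snr = rescale_betas_zero_snr

    def step(self, sample: torch.Tensor) -> torch.Tensor:
        # Upcast only when the zero-SNR rescaling path needs float32
        # precision; the common path keeps the incoming dtype and avoids
        # a per-step dtype conversion.
        if self.rescale_betas_zero_snr:
            sample = sample.to(torch.float32)
        # ... rest of the step computation would go here ...
        return sample
```

With the flag off (the common case), `sample` passes through in its original dtype; with it on, behavior matches the current unconditional cast.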

@vladmandic
Contributor

rescale_betas_zero_snr

that option is so rarely used that moving the cast so it only runs when it's true should be more than ok.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu
Collaborator

yiyixuxu commented Oct 10, 2024

some of the tests failed, as expected, because the random seeds are now used differently
can we:

  1. add a simple slow test that runs a pipeline with each of the schedulers we changed and makes sure the output is as expected (even though it might differ from main even with the same seed)
  2. update the tests!

@hlky
Contributor Author

hlky commented Oct 10, 2024

  1. Do we want new slow tests like

     ```python
     def test_stable_diffusion_lms(self):
         sd_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", safety_checker=None)
         sd_pipe.scheduler = LMSDiscreteScheduler.from_config(sd_pipe.scheduler.config)
         sd_pipe = sd_pipe.to(torch_device)
         sd_pipe.set_progress_bar_config(disable=None)
         inputs = self.get_inputs(torch_device)
         image = sd_pipe(**inputs).images
         image_slice = image[0, -3:, -3:, -1].flatten()
         assert image.shape == (1, 512, 512, 3)
         expected_slice = np.array([0.10542, 0.09620, 0.07332, 0.09015, 0.09382, 0.07597, 0.08496, 0.07806, 0.06455])
         assert np.abs(image_slice - expected_slice).max() < 3e-3
     ```

     or just review the output to check it's still reasonable?
  2. I've updated the values in KDPM2AncestralDiscreteSchedulerTest.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu yiyixuxu merged commit 9d06161 into huggingface:main Oct 15, 2024
15 checks passed
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
…`, `KDPM2Ancestral` (#9616)

* Slight performance improvement to Euler

* Slight performance improvement to EDMEuler

* Slight performance improvement to FlowMatchHeun

* Slight performance improvement to KDPM2Ancestral

* Update KDPM2AncestralDiscreteSchedulerTest

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
