add PAG support for SD Controlnet Img2Img #8810

Closed

Bhavay-2001 wants to merge 10 commits into huggingface:main from Bhavay-2001:StableDiffusionControlNetPAGImg2ImgPipeline

Contributor

Bhavay-2001 commented Jul 8, 2024

What does this PR do?

Part of #8710

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Tagging @yiyixuxu

Bhavay-2001 and others added 7 commits

April 5, 2024 15:35


          Create diffusers.yml

11a4491


          Merge branch 'huggingface:main' into main


          Merge branch 'huggingface:main' into main

eadf7e8


          Merge branch 'huggingface:main' into main

f15e6de


          Added StableDiffusionControlNetPAGImg2ImgPipeline

0d41315


          Added StableDiffusionControlNetPAGImg2ImgPipeline

ba28dcf


          Delete diffusers.yml

2ee7257

Contributor Author

Bhavay-2001 commented Jul 8, 2024

Hi @yiyixuxu, please review this when you get a chance. I am having some difficulty with the tests, so please take a look at those as well.

yiyixuxu reviewed


tests/pipelines/pag/test_pag_controlnet_sd_img2img.py

+                      max_diff = np.abs(image_slice.flatten() - expected_slice).max()
+                      assert max_diff < 1e-3, f"output is different from expected, {image_slice.flatten()}"
+                  # def test_ip_adapter_single(self):

Collaborator

yiyixuxu Jul 8, 2024

we can remove these tests, no?

Contributor Author

Bhavay-2001 Jul 8, 2024

Yes, the commented test will be removed.

Collaborator

yiyixuxu Jul 8, 2024

Can you run make style and make fix-copies so the quality tests pass? They are currently failing.

yiyixuxu requested a review from a-r-r-o-w

July 8, 2024 17:21

HuggingFaceDocBuilderDev commented Jul 8, 2024

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w requested changes


Member

a-r-r-o-w left a comment

Thank you for adding support for this! It seems like a lot of the perturbed attention guidance part is not handled yet and this is a raw copy of ControlNetSDImg2Img, no? Let me try to help you with the changes that need to be made:

Refer to add PAG support #7944 and add PAG support for SD architecture #8725
Try to understand what happens in the pag_utils.py file.
Take a look at one of the existing PAG pipeline implementations. Notice that at various locations in the code, we check the self.do_perturbed_attention_guidance flag and handle things differently from normal CFG. You will have to apply these changes as well. It might be a little tricky to do with ControlNet since you also need to take care of control_model_input. The easiest way to see all the differences would be to view the diff of the non-PAG and PAG variants (for example, StableDiffusionXLPipeline and StableDiffusionXLPAGPipeline) side by side.
Once you're comfortable and have made all the required changes, try and run through all the different scenarios such as guess_mode true and false, with guidance_scale == 1 and guidance_scale > 1, pag_scale == 0 and pag_scale > 0, etc.

If you need any additional help, feel free to ping me any time.
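The scenario sweep suggested above can be written down as a small grid. This is only a sketch: the parameter names (guess_mode, guidance_scale, pag_scale) come from the review comment, while the concrete values and the build_scenarios helper are assumptions chosen for illustration. The actual pipeline call is left as a comment because it needs model weights and input images.

```python
from itertools import product

# Hypothetical values; only the parameter names come from the review comment.
GUESS_MODES = (False, True)
GUIDANCE_SCALES = (1.0, 7.5)  # == 1 is the "no CFG" case, > 1 the CFG case
PAG_SCALES = (0.0, 3.0)       # == 0 is the "no PAG" case, > 0 the PAG case


def build_scenarios():
    """Return one kwargs dict per combination of the three settings."""
    return [
        {"guess_mode": guess_mode, "guidance_scale": guidance_scale, "pag_scale": pag_scale}
        for guess_mode, guidance_scale, pag_scale in product(
            GUESS_MODES, GUIDANCE_SCALES, PAG_SCALES
        )
    ]


scenarios = build_scenarios()
for kwargs in scenarios:
    # In a real check each dict would be passed to the pipeline call,
    # together with the prompt, the input image and the control image.
    print(kwargs)
```

Running every combination once makes it easier to spot a branch that only breaks when, for example, PAG is enabled but CFG is not.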

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py

		raise AttributeError("Could not access latents of provided encoder_output")


		def prepare_image(image):

Member

a-r-r-o-w Jul 8, 2024

Could you add the missing # Copied from here?

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                      self.register_to_config(requires_safety_checker=requires_safety_checker)
+                      self.set_pag_applied_layers(pag_applied_layers)
+                  # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._encode_prompt

Member

a-r-r-o-w Jul 8, 2024

I don't think this method is used anywhere, so it can be removed.

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py

+                          extra_step_kwargs["generator"] = generator
+                      return extra_step_kwargs
+                  def check_inputs(

Member

a-r-r-o-w Jul 8, 2024

Please add the missing # Copied from here as well

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                              "not-safe-for-work" (nsfw) content.
+                      """
+                      callback = kwargs.pop("callback", None)

Member

a-r-r-o-w Jul 8, 2024

These callbacks have been deprecated. You can remove them
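For context, the replacement for the deprecated callback / callback_steps arguments is the callback_on_step_end style already visible in this pipeline's signature (callback_on_step_end_tensor_inputs). The sketch below shows the general shape of such a callback. FakePipeline is a hypothetical stand-in written for this example, not diffusers code, and its loop only imitates how a denoising loop might hand tensors to the callback.

```python
def log_step(pipe, step_index, timestep, callback_kwargs):
    """Callback in the callback_on_step_end style: receive values, return a dict."""
    pipe.seen_steps.append((step_index, timestep))
    return callback_kwargs  # returning the dict unchanged leaves latents untouched


class FakePipeline:
    """Hypothetical stand-in for a pipeline's denoising loop (not diffusers code)."""

    def __init__(self):
        self.seen_steps = []

    def run(self, timesteps, callback_on_step_end=None,
            callback_on_step_end_tensor_inputs=("latents",)):
        latents = 0.0
        for i, t in enumerate(timesteps):
            latents += 1.0  # placeholder for the real scheduler step
            if callback_on_step_end is not None:
                local_values = {"latents": latents}
                callback_kwargs = {k: local_values[k] for k in callback_on_step_end_tensor_inputs}
                outputs = callback_on_step_end(self, i, t, callback_kwargs)
                # The callback may hand back modified latents.
                latents = outputs.pop("latents", latents)
        return latents


pipe = FakePipeline()
result = pipe.run([999, 500, 1], callback_on_step_end=log_step)
print(pipe.seen_steps, result)
```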

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                      callback_on_step_end_tensor_inputs: List[str] = ["latents"],
+                      pag_scale: float = 3.0,
+                      pag_adaptive_scale: float = 0.0,
+                      **kwargs,

Member

a-r-r-o-w Jul 8, 2024

Suggested change

**kwargs,

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                      else:
+                          assert False
+                      # 5. Prepare timesteps

Member

a-r-r-o-w Jul 8, 2024

You might have to fix the step numbering here

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                      with self.progress_bar(total=num_inference_steps) as progress_bar:
+                          for i, t in enumerate(timesteps):
+                              # expand the latents if we are doing classifier free guidance
+                              latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents

Member

a-r-r-o-w Jul 8, 2024

Are you sure the forward pass is working? Shouldn't this be something like

diffusers/src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_xl.py

Line 1492 in 9838867

    
           latent_model_input = torch.cat([latents] * (prompt_embeds.shape[0] // latents.shape[0]))
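The point of the referenced line is that the latents must be repeated once per guidance branch, not always twice. The sketch below illustrates the arithmetic with plain Python lists instead of tensors. It assumes that the prompt embeddings are stacked as one block per active branch (unconditional, conditional, perturbed); the helper names are made up for this example.

```python
def stacked_prompt_batch(batch_size, do_cfg, do_pag):
    """Rows in the stacked prompt embeddings, assuming one block per active branch."""
    branches = 1  # the conditional (text) branch is always present
    if do_pag:
        branches += 1  # perturbed branch
    if do_cfg:
        branches += 1  # unconditional branch
    return batch_size * branches


def expand_latents(latents, prompt_batch):
    """List analogue of torch.cat([latents] * (prompt_batch // latents.shape[0]))."""
    repeats = prompt_batch // len(latents)
    return latents * repeats


latents = ["latent_a", "latent_b"]  # stand-in for a batch of two latents
for do_cfg in (False, True):
    for do_pag in (False, True):
        rows = stacked_prompt_batch(len(latents), do_cfg, do_pag)
        print(do_cfg, do_pag, len(expand_latents(latents, rows)))
```

With both CFG and PAG active the latents are repeated three times, which a hard-coded `[latents] * 2` would not cover.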

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

Comment on lines 1294 to 1296

+                                  if callback is not None and i % callback_steps == 0:
+                                      step_idx = i // getattr(self.scheduler, "order", 1)
+                                      callback(step_idx, t, latents)

Member

a-r-r-o-w Jul 8, 2024

Suggested change

                                  if callback is not None and i % callback_steps == 0:
                                      step_idx = i // getattr(self.scheduler, "order", 1)
                                      callback(step_idx, t, latents)

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                              )[0]
+                              # perform guidance
+                              if self.do_classifier_free_guidance:

Member

a-r-r-o-w Jul 8, 2024

Perturbed guidance part does not seem to have been implemented at the different places where it is supposed to be added.
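As a rough illustration of what the missing guidance step has to compute, the sketch below combines the noise predictions using the usual PAG formulation: the CFG term plus an extra term that pushes away from the perturbed prediction. It uses plain floats in place of tensors, and combine_noise_predictions is a hypothetical helper for this example, not the function in pag_utils.py.

```python
def combine_noise_predictions(noise_preds, guidance_scale, pag_scale, do_cfg):
    """Combine per-branch noise predictions (floats stand in for tensors).

    Assumed ordering: [uncond, text, perturbed] with CFG, [text, perturbed] without.
    """
    if do_cfg:
        uncond, text, perturbed = noise_preds
        return (
            uncond
            + guidance_scale * (text - uncond)  # classifier-free guidance term
            + pag_scale * (text - perturbed)    # perturbed attention guidance term
        )
    text, perturbed = noise_preds
    return text + pag_scale * (text - perturbed)


with_cfg = combine_noise_predictions([1.0, 2.0, 1.5], guidance_scale=7.5, pag_scale=3.0, do_cfg=True)
without_cfg = combine_noise_predictions([2.0, 1.5], guidance_scale=1.0, pag_scale=3.0, do_cfg=False)
print(with_cfg, without_cfg)
```

Setting pag_scale to 0 in the CFG branch reduces the expression to ordinary classifier-free guidance, which is a quick sanity check for an implementation.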

src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py Outdated

+                          prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
+                      if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
+                          image_embeds = self.prepare_ip_adapter_image_embeds(

Member

a-r-r-o-w Jul 8, 2024

IP Adapter perturbed embeddings need to be generated differently. Please refer to one of the linked PRs.

Contributor Author

Bhavay-2001 Jul 11, 2024

Hi @a-r-r-o-w, are you referring to this part here?

Member

a-r-r-o-w Jul 11, 2024

No, I'm referring to this (lines 1156-1177):

diffusers/src/diffusers/pipelines/pag/pipeline_pag_sd_xl.py

Line 1156 in 673eb60

if ip_adapter_image is not None or ip_adapter_image_embeds is not None:

Member

a-r-r-o-w commented Jul 8, 2024

Also, it might help to change the title of this PR to something like "add PAG support for SD Controlnet Img2Img" to reflect the intent correctly when merged

Bhavay-2001 changed the title ~~add PAG support for SD architecture~~ add PAG support for SD Controlnet Img2Img


          Formatted the files

1d9c86c

Contributor Author

Bhavay-2001 commented Jul 12, 2024

Hi @a-r-r-o-w, I am trying to fix the coding mistakes and coding style by running make style and other similar commands, but it says the make command is not found. I know something is wrong but cannot figure out what. Could you please explain how this is done?

Member

a-r-r-o-w commented Jul 12, 2024

You will need to install make. I'm assuming you are on Windows, since make should be available by default on Linux or macOS. If you install Git for Windows, you will easily be able to use it. Otherwise, try https://stackoverflow.com/questions/32127524/how-to-install-and-use-make-in-windows

Bhavay-2001 added 2 commits

July 14, 2024 00:02


          Fixed style

3758ad6


          __init__ files

1a840d6

Contributor Author

Bhavay-2001 commented Jul 14, 2024

Hi @a-r-r-o-w, I am completing the PAG pipeline first, before the associated tests. Please check this whenever you have time. Thanks

Member

a-r-r-o-w commented Jul 14, 2024

Can you show us some examples with and without PAG enabled? Also, please post a minimal reproducible example. And yes, we can work on tests later in a similar fashion to how others have done in their PRs.

Contributor Author

Bhavay-2001 commented Jul 14, 2024

Hi, I am moving this PR here. Further discussion will happen there, so I am closing this one.
Thanks

Bhavay-2001 closed this
