Inpainting for sdxl checkpoints? #9

Closed
adhaesitadimo1 opened this issue May 10, 2024 · 10 comments

@adhaesitadimo1

Hello, has anybody tried inpainting with SDXL checkpoints? It looks like there are some bugs in the implementation. The sizes of add_time_embeds and add_text_embeds don't match, which causes a size-mismatch error in the UNet's add_embedding. I investigated, and the root of the problem is that the prompt_embeds are interpolated when there is a background while the pooled embeds are not.
When I either dropped the first (background) dimension from the pooled embeds or interpolated them the same way, all generations worked, but they have weird squiggly lines and overly shadowy mask borders:
[Attached: three screenshots of generations showing the squiggly lines and shadowy mask borders]
It was the same with both my custom checkpoints and base SDXL. Has anybody encountered this? Any clue how to fix it? There are probably some masking and latent bugs related to the background.
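
For context on the error itself: in diffusers' SDXL UNet, the pooled text embedding and the additional time embedding are concatenated along the last dimension before being fed to the add_embedding module, so their batch sizes must agree. A minimal, self-contained sketch of the mismatch described above (the shapes and prompt counts are illustrative assumptions, not the repo's exact tensors):

import torch

# Illustrative setup: 1 background + 2 foreground prompts. The per-token
# prompt_embeds get interpolated down to 2 rows, but the pooled embeds keep
# all 3 rows, so the batch sizes no longer agree.
prompt_embeds = torch.randn(2, 77, 2048)       # interpolated (background folded in)
pooled_prompt_embeds = torch.randn(3, 1280)    # not interpolated
add_time_embeds = torch.randn(2, 1536)         # 6 add_time_ids x 256-dim embedding (base SDXL), one row per prompt_embeds row

# The SDXL UNet concatenates these before its add_embedding projection;
# with mismatched batch sizes the concat raises the size-mismatch error.
try:
    add_embeds = torch.cat([pooled_prompt_embeds, add_time_embeds], dim=-1)
except RuntimeError as err:
    print(err)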

@ironjr
Owner

ironjr commented May 10, 2024

Can you please describe how you produced the results? Thanks!

@ironjr
Owner

ironjr commented May 10, 2024

Specifically, did you use StableMultiDiffusionSDXLPipeline or StreamMultiDiffusionSDXL? I will check this out.

@adhaesitadimo1
Author

Sure, here is the .ipynb I used
https://drive.google.com/file/d/18MtBdlOohfwgIlnT9AwqCyPySS4lDJux/view?usp=drive_link
StableMultiDiffusionSDXLPipeline was used. I first made a couple of fixes in this class to be able to use a custom SDXL checkpoint:
model_ckpt = 'drive/MyDrive/checkpoints/john_cena_last.ckpt' # Use the correct ckpt for your step setting!
print(model_ckpt)
#model_ckpt = "sdxl_lightning_8step_unet.safetensors"
#unet = UNet2DConditionModel.from_config(model_key, subfolder='unet').to(self.device, self.dtype)
#unet.load_state_dict(load_file(hf_hub_download(lightning_repo, model_ckpt), device=self.device))
#self.pipe = StableDiffusionXLPipeline.from_pretrained(model_key, unet=unet, torch_dtype=self.dtype, variant=variant).to(self.device)
self.pipe = StableDiffusionXLPipeline.from_single_file(model_ckpt, torch_dtype=self.dtype, variant="fp16").to(self.device)
Then the fp16 VAE fix:
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to(self.device)
self.pipe.vae = vae
self.vae = self.pipe.vae
And then the quick fix for the pooled embedding dimensions that I described:
# INTERPOLATION
ba = pooled_prompt_embeds[0]
fa = pooled_prompt_embeds[1]
pooled_prompt_embeds = torch.lerp(ba, fa, s1)
# BACKGROUND OBFUSCATION
#pooled_prompt_embeds = pooled_prompt_embeds[1:,:]
#print(pooled_prompt_embeds.shape)
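
As a side note, the lerp in the quoted fix only mixes rows 0 and 1, i.e. a single foreground prompt. A hedged, self-contained sketch of the same idea generalized to several masked prompts (the stacking order and the mixing strength are illustrative assumptions, not the repo's code):

import torch

# Hypothetical generalization of the quick fix above: mix the background pooled
# embedding into every foreground pooled embedding, mirroring how prompt_embeds
# are interpolated. Assumed stacking order: [background, fg_1, ..., fg_N].
pooled_prompt_embeds = torch.randn(3, 1280)     # [background, fg_1, fg_2]
s1 = 0.8                                        # assumed mixing strength in [0, 1]

bg = pooled_prompt_embeds[:1]                   # (1, 1280)
fg = pooled_prompt_embeds[1:]                   # (N, 1280), one row per masked prompt
pooled_prompt_embeds = torch.lerp(bg, fg, s1)   # (N, 1280), batch now matches prompt_embeds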

@adhaesitadimo1
Author

I think I had better open a pull request with my revision so it's more convenient for you.

@adhaesitadimo1
Author

Forgot to mention: there was a typo in the bootstrapping code that used a never-defined bg_latents variable; I deduced it should be bg_latent from earlier in the code.

@ironjr
Owner

ironjr commented May 10, 2024

Thanks for the detailed update! I will have a look.

@ironjr
Owner

ironjr commented May 11, 2024

Thank you again for the report! I just updated StableMultiDiffusionSDXLPipeline to fix the error.
I also added notebooks/demo_inpaint_sdxl.ipynb as a dedicated usage guide.

@ironjr ironjr closed this as completed May 11, 2024
@adhaesitadimo1
Author

Thanks mate!

@adhaesitadimo1
Author

Hey, I also have one more question. Sometimes, when using multiple masks, one mask is left empty. Is it a seed-instability issue or a problem with centering?
[Attached: example output in which one masked region was left empty]

@ironjr
Owner

ironjr commented May 13, 2024

Fundamentally, the main cause of the problem is the shorter timestep schedule: reducing the number of timesteps from 50 to 5 gives the model 10 times fewer 'chances' to correct the content creation.

The bootstrapping steps are there to alleviate such issues. The recommended solutions for the problem are (a usage sketch follows the list):

  1. Increase bootstrapping_steps from 1 to 3.
  2. If 1 does not work, increase the number of timesteps from 5 to 8 (bootstrapping_steps=3 is recommended for 8 timesteps).
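
A minimal sketch of how these two knobs might be passed when calling the pipeline. The import path, constructor, mask format, and keyword names below are assumptions pieced together from this thread, not the verified signature; notebooks/demo_inpaint_sdxl.ipynb in the repo shows the exact usage.

import torch

# Assumed import path and constructor; adjust to the actual repo layout.
from model import StableMultiDiffusionSDXLPipeline

device = 'cuda:0'
smd = StableMultiDiffusionSDXLPipeline(device)

prompts = ['a red brick castle', 'a knight in shining armor']
# Placeholder masks, one per prompt; the real masks come from your drawing or
# segmentation step, and the expected format may differ from plain tensors.
masks = [torch.zeros(1024, 1024) for _ in prompts]

image = smd(
    prompts,
    masks=masks,
    bootstrap_steps=3,  # recommendation 1; keyword name is an assumption
    # For recommendation 2, also raise the number of denoising timesteps
    # (e.g., 5 -> 8) wherever the pipeline or its scheduler exposes it.
)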

Specifically, each of the bootstrapping stages does the following:

  • Bootstrapping: The Stable Diffusion model is dumb. If you designate two people in a scene, even with separate masks, the diffusion model frequently develops only one person during the intermediate stages and feels happy about it, because each prompt (somewhat) agrees with the generated content (a person). The problem is most critical in the earlier generation steps (~20%), when the overall composition of the image is formed. Bootstrapping basically separates the generation process of each masked prompt. So, the more you bootstrap, the higher the fidelity to the prompt-mask pairs. However, since each object is developed without knowledge of the others, the overall consistency can be broken more easily. Therefore, the recommended bootstrapping steps are the first 20-50% of the timesteps.
  • Centering: Stable Diffusion tends to generate the prompt-related object at the center of the frame. If the mask is at the side of the frame, the output of the initial generation steps (1-2) that sketches the object in the scene is unnaturally cropped by the off-center mask. Centering tries to resolve this by shifting each prompt-designated object to the center of the frame for the initial generation steps, so the objects are not cropped unintentionally (a minimal sketch of this shift follows below).
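
A conceptual, self-contained sketch of the centering shift just described. It is not the repo's implementation; it only illustrates the idea of rolling the latent so the mask's center of mass sits at the frame center for the first steps, then rolling it back.

import torch

def center_shift(mask: torch.Tensor) -> tuple[int, int]:
    # Assumes a 2D (H, W) binary mask; returns the (dy, dx) roll that moves
    # the mask's center of mass to the frame center.
    ys, xs = torch.nonzero(mask, as_tuple=True)
    cy, cx = ys.float().mean(), xs.float().mean()
    return int(mask.shape[-2] / 2 - cy), int(mask.shape[-1] / 2 - cx)

def centered(latent: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Shift the latent so the masked object is developed at the frame center.
    dy, dx = center_shift(mask)
    return torch.roll(latent, shifts=(dy, dx), dims=(-2, -1))

def uncentered(latent: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Undo the shift so the developed object lands back under its mask.
    dy, dx = center_shift(mask)
    return torch.roll(latent, shifts=(-dy, -dx), dims=(-2, -1))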

Hope this helps!
