When I use CLIPGuidedStableDiffusion based on stable-diffusion-2, the generated result is incredibly bad, all mosaic. #1422

Closed
ScottishFold007 opened this issue Nov 25, 2022 · 4 comments
Labels
bug Something isn't working stale Issues that haven't received updates

Comments

@ScottishFold007

Describe the bug

When I use CLIPGuidedStableDiffusion based on stable-diffusion-2, the generated result is incredibly bad, all mosaic.

image

Reproduction

from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel
import torch

model_id = r"C:\Users\admin\Desktop\stable_diffusion模型合辑\stabilityai_stable-diffusion-2"
feature_extractor = CLIPFeatureExtractor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16)

guided_pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    revision="fp16",
    torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing()
guided_pipeline = guided_pipeline.to("cuda")

prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"

generator = torch.Generator(device="cuda").manual_seed(0)
images = []
for i in range(4):
    image = guided_pipeline(
        prompt,
        num_inference_steps=50,
        guidance_scale=7.5,
        clip_guidance_scale=100,
        num_cutouts=4,
        use_cutouts=False,
        generator=generator,
    ).images[0]
    images.append(image)

# save images locally
for i, img in enumerate(images):
    img.save(f"./clip_guided_sd/image_{i}.png")

Logs

No response

System Info

diffusers==0.9.0.dev0
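
A detail that may be worth recording alongside the version is which parameterization the loaded scheduler is configured for. The sketch below is only a diagnostic idea, not something confirmed in this report. `get_prediction_type` is a hypothetical helper, and it assumes the scheduler config behaves like a mapping with an optional `prediction_type` key, falling back to `"epsilon"` when the key is absent.

```python
from typing import Any, Mapping


def get_prediction_type(scheduler_config: Mapping[str, Any]) -> str:
    """Return the configured prediction type, or "epsilon" if none is set.

    Hypothetical helper: assumes a dict-like scheduler config with an
    optional "prediction_type" entry.
    """
    value = scheduler_config.get("prediction_type")
    return "epsilon" if value is None else str(value)


# Stand-in configs for illustration; real values come from the loaded pipeline.
v1_like = {"beta_schedule": "scaled_linear"}
v2_like = {"beta_schedule": "scaled_linear", "prediction_type": "v_prediction"}
print(get_prediction_type(v1_like))
print(get_prediction_type(v2_like))
```

With the pipeline from the reproduction, the call would presumably be `get_prediction_type(guided_pipeline.scheduler.config)`.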

@ScottishFold007 ScottishFold007 added the bug Something isn't working label Nov 25, 2022
@akirchmeyer

akirchmeyer commented Nov 28, 2022

Maybe related: #1429 (comment)
I used to get "mosaic" images similar to what you describe while training a dreambooth model on SD v2, and this solved it for me.

@ScottishFold007
Author

Maybe related: #1429 (comment) I used to get "mosaic" images similar to what you describe while training a dreambooth model on SD v2, and this solved it for me.
I tried this solution, but it did not fix the problem. Maybe the cause is in cond_fn:
@torch.enable_grad()
def cond_fn(
    self,
    latents,
    timestep,
    index,
    text_embeddings,
    noise_pred_original,
    text_embeddings_clip,
    clip_guidance_scale,
    num_cutouts,
    use_cutouts=True,
):
    latents = latents.detach().requires_grad_()

    if isinstance(self.scheduler, (LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler)):
        sigma = self.scheduler.sigmas[index]
        # the model input needs to be scaled to match the continuous ODE formulation in K-LMS
        latent_model_input = latents / ((sigma**2 + 1) ** 0.5)
    else:
        latent_model_input = latents

    # predict the noise residual
    noise_pred = self.unet(latent_model_input, timestep, encoder_hidden_states=text_embeddings).sample

    if isinstance(self.scheduler, (PNDMScheduler, DDIMScheduler)):
        alpha_prod_t = self.scheduler.alphas_cumprod[timestep]
        beta_prod_t = 1 - alpha_prod_t
        # compute predicted original sample from predicted noise also called
        # "predicted x_0" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        pred_original_sample = (latents - beta_prod_t ** (0.5) * noise_pred) / alpha_prod_t ** (0.5)

        fac = torch.sqrt(beta_prod_t)
        sample = pred_original_sample * (fac) + latents * (1 - fac)
    elif isinstance(self.scheduler, (LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler)):
        sigma = self.scheduler.sigmas[index]
        sample = latents - sigma * noise_pred
    else:
        raise ValueError(f"scheduler type {type(self.scheduler)} not supported")

    # undo the latent scaling factor, then decode to image space in [0, 1]
    sample = 1 / 0.18215 * sample
    image = self.vae.decode(sample).sample
    image = (image / 2 + 0.5).clamp(0, 1)

    if use_cutouts:
        image = self.make_cutouts(image, num_cutouts)
    else:
        # use the pipeline's own feature extractor rather than a global variable
        image = transforms.Resize(self.feature_extractor.size["shortest_edge"])(image)
    image = self.normalize(image).to(latents.dtype)

    image_embeddings_clip = self.clip_model.get_image_features(image)
    image_embeddings_clip = image_embeddings_clip / image_embeddings_clip.norm(p=2, dim=-1, keepdim=True)

    if use_cutouts:
        dists = spherical_dist_loss(image_embeddings_clip, text_embeddings_clip)
        dists = dists.view([num_cutouts, sample.shape[0], -1])
        loss = dists.sum(2).mean(0).sum() * clip_guidance_scale
    else:
        loss = spherical_dist_loss(image_embeddings_clip, text_embeddings_clip).mean() * clip_guidance_scale

    grads = -torch.autograd.grad(loss, latents)[0]

    # EulerDiscreteScheduler is listed here as well so that it takes the sigma branch,
    # matching the checks above; otherwise beta_prod_t would be undefined in the else branch
    if isinstance(self.scheduler, (LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler)):
        latents = latents.detach() + grads * (sigma**2)
        noise_pred = noise_pred_original
    else:
        noise_pred = noise_pred_original - torch.sqrt(beta_prod_t) * grads
    return noise_pred, latents
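
The snippet calls `spherical_dist_loss` without showing it. For readers following along, here is a minimal pure-Python sketch of a squared great-circle distance between two embeddings, which is what a helper of that name usually computes. It is an illustration under that assumption, written with the standard library instead of torch, and not necessarily identical to the pipeline's own function. The `normalize` helper here is hypothetical and unrelated to `self.normalize` in the code above.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def spherical_dist_loss(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 2 * arcsin(||x - y|| / 2) ** 2 for the normalized inputs.

    For unit vectors, 2 * arcsin(||x - y|| / 2) is the angle between them,
    so the loss equals angle ** 2 / 2.
    """
    if len(x) != len(y):
        raise ValueError("embeddings must have the same length")
    xn, yn = normalize(x), normalize(y)
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(xn, yn)))
    half_chord = min(1.0, chord / 2.0)  # guard against rounding above 1
    return 2.0 * math.asin(half_chord) ** 2


# Toy 2-D "embeddings": same direction, orthogonal, and opposite.
print(spherical_dist_loss([1.0, 0.0], [2.0, 0.0]))
print(spherical_dist_loss([1.0, 0.0], [0.0, 1.0]))
print(spherical_dist_loss([1.0, 0.0], [-1.0, 0.0]))
```

The loss is zero when image and text embeddings point the same way and grows with the angle between them, so the negated gradient in `cond_fn` pushes the latents toward images whose CLIP embedding is closer to the prompt.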

@patrickvonplaten
Contributor

Hey @ScottishFold007,

Sadly, I don't think we currently have the time to look into this problem. We would love to review a PR that makes the CLIP guided pipeline work with v2, though.
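
As a possible starting point for such a PR, and only as an assumption that nobody in this thread has confirmed: the 768px stable-diffusion-2 checkpoint is trained with v-prediction, while the `cond_fn` posted above treats the UNet output as predicted noise. The sketch below shows the standard conversion from a v-prediction to predicted noise and predicted x_0, on scalar values for readability. All function names are hypothetical, and `alpha_prod_t` mirrors the variable of the same name in `cond_fn`.

```python
import math
from typing import Tuple


def add_noise(x0: float, eps: float, alpha_prod_t: float) -> float:
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return math.sqrt(alpha_prod_t) * x0 + math.sqrt(1.0 - alpha_prod_t) * eps


def v_target(x0: float, eps: float, alpha_prod_t: float) -> float:
    """v-parameterization: v = sqrt(a) * eps - sqrt(1 - a) * x0."""
    return math.sqrt(alpha_prod_t) * eps - math.sqrt(1.0 - alpha_prod_t) * x0


def v_to_eps_and_x0(x_t: float, v: float, alpha_prod_t: float) -> Tuple[float, float]:
    """Recover predicted noise and predicted x_0 from a v-prediction."""
    sqrt_a = math.sqrt(alpha_prod_t)
    sqrt_b = math.sqrt(1.0 - alpha_prod_t)
    eps = sqrt_a * v + sqrt_b * x_t
    x0 = sqrt_a * x_t - sqrt_b * v
    return eps, x0


# Round trip on made-up numbers: noise a sample, form v, convert back.
x0_true, eps_true, a = 0.3, -1.2, 0.7
x_t = add_noise(x0_true, eps_true, a)
v = v_target(x0_true, eps_true, a)
eps_rec, x0_rec = v_to_eps_and_x0(x_t, v, a)
print(eps_rec, x0_rec)
```

If this assumption holds, a fix would need to apply such a conversion to the UNet output before the predicted-x_0 step in `cond_fn`, depending on the scheduler's configured prediction type. Whether that alone removes the mosaic artifacts is untested here.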

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Dec 26, 2022
@github-actions github-actions bot closed this as completed Jan 3, 2023