
Weighted Prompts for Diffusers stable diffusion pipeline #1506

Closed
UglyStupidHonest opened this issue Dec 1, 2022 · 36 comments
Labels
stale Issues that haven't received updates

Comments

@UglyStupidHonest

I could not find anything for diffusers, and unfortunately I'm not at the level yet where I can implement it myself. :)

It would be amazing to be able to weight prompts like "a dog with a hat:0.5"

Thank you for this amazing library !!

@WASasquatch

WASasquatch commented Dec 1, 2022

This has unfortunately only been added as a community pipeline, which, imo, is a very broken system that just adds tons of work for the end user managing all these pipes, and is not very API friendly.

https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py

With community pipelines, you get only what is advertised, and nothing else. It's not like the many other repos out there, like AUTOMATIC1111's, where these things are packaged together for use with all available features, creating a robust and feature-rich system.

@github-actions

github-actions bot commented Jan 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 1, 2023
@patrickvonplaten
Contributor

For future readers:

For a direct use case, we have the following community pipeline: https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py

You can also define your own attention processor that weighs certain prompts differently by making use of this API:
#1639

@Ephil012

Ephil012 commented Jan 8, 2023

@patrickvonplaten Are there any plans to integrate this into the main pipeline? As @WASasquatch said, the community pipeline implementation is not very user friendly. It seems like it would be pretty useful to have it built in as a feature, given how often prompt weighting is used in the community.

@alexisrolland
Contributor

Upvoting this, as I think prompt weighting is indeed an important feature that should be added to diffusers to compete with alternative solutions. All the major alternatives support it (Stable Diffusion WebUI, DreamStudio, Midjourney...).

Thanks for your hard work! <3

@patrickvonplaten
Contributor

cc @patil-suraj what do you think?

@patrickvonplaten
Contributor

My opinion here is that diffusers doesn't aim at being a full-fledged UI, but rather a backend for UIs.

Nevertheless, we could/should try to more actively maintain: https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py and potentially write a documentation page about it.

Also @SkyTNT what do you think maybe :-)

@alexisrolland
Contributor

alexisrolland commented Jan 13, 2023

How does supporting prompt weighting turn diffusers into a UI? I think the kind of usage expected here is to be able to use weights in a way similar to this and let the backend do its magic ;) :

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a photo of an ((astronaut)) riding a horse on mars"
# or
prompt = "a photo of an (astronaut:0.5) riding a horse on mars"
image = pipe(prompt).images[0]
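A minimal sketch of what parsing such a syntax could look like (a hypothetical helper, not part of diffusers; it assumes the A1111-style convention where each pair of parentheses multiplies the weight by 1.1 and `(text:w)` sets it explicitly, and it does not handle arbitrary nesting):

```python
import re

def parse_weighted_prompt(prompt):
    """Split an A1111-style prompt into (text, weight) segments.

    Each enclosing "(...)" multiplies the weight by 1.1;
    "(text:w)" sets the weight explicitly; plain text gets weight 1.0.
    This is a simplified, mostly non-nested sketch.
    """
    segments = []
    # match "(text:weight)", "((text))", "(text)", or a run of plain text
    pattern = re.compile(
        r"\(([^():]+):([\d.]+)\)|\(\(([^()]+)\)\)|\(([^()]+)\)|([^()]+)"
    )
    for m in pattern.finditer(prompt):
        if m.group(1) is not None:        # (text:weight) -> explicit weight
            segments.append((m.group(1), float(m.group(2))))
        elif m.group(3) is not None:      # ((text)) -> 1.1 * 1.1
            segments.append((m.group(3), 1.1 * 1.1))
        elif m.group(4) is not None:      # (text) -> 1.1
            segments.append((m.group(4), 1.1))
        else:                             # plain text -> 1.0
            segments.append((m.group(5), 1.0))
    return segments
```

Segments like these could then be used to scale the corresponding token embeddings before they are passed to the UNet.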

@SkyTNT
Contributor

SkyTNT commented Jan 14, 2023

I agree with @Ephil012. But I've been busy recently, so I may not be able to contribute.

@WASasquatch

WASasquatch commented Jan 14, 2023

What does a user interface have to do with back-end functionality?

@Ephil012

Ephil012 commented Jan 15, 2023

@patrickvonplaten I'd argue that adding this feature does not lead to diffusers becoming a full-fledged UI. This would simply be a feature on the backend when inputting prompts (as @alexisrolland mentioned).

You mentioned that the goal of diffusers is to act as a backend for projects providing a SD UI. However, not implementing this feature arguably makes it harder to use diffusers as a backend. When building a UI, most users expect prompt weighting to be built in. By not having it in diffusers, each project has to build its own implementation. This causes duplicated work between projects and in general makes using diffusers harder. Personally, I started looking for alternatives to diffusers to build my side project on top of simply because it was missing essential features like prompt weighting. I'd also argue other common features should be built in, such as long prompts (this may have already been added, not sure), but that's a discussion for another thread. Yes, there are community pipelines that can be used, but it would make sense to have it in the main pipeline too for maintainability and reliability.

As far as implementation goes, I do think that some projects might not want to follow the A1111 syntax. I think there could be a default syntax that you could customize via code. Or you could take the approach imaginAIry does, where they allow you to create a list of prompts and set weights in code (example below). Either approach would allow for using your own syntax.

ImaginePrompt([
    WeightedPrompt("cat", weight=1),
    WeightedPrompt("dog", weight=1),
])
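For illustration, one plausible way a list of weighted prompts could be reduced to a single conditioning vector is a weight-normalized average of the per-prompt embeddings. This is a toy sketch with plain Python lists standing in for embedding tensors, an assumption about the approach rather than imaginAIry's actual implementation:

```python
def blend_embeddings(embeddings, weights):
    """Combine per-prompt embedding vectors into one conditioning vector
    via a weight-normalized average.

    `embeddings` is a list of equal-length float lists (stand-ins for
    real embedding tensors); `weights` holds the corresponding scalars.
    """
    total = sum(weights)
    dim = len(embeddings[0])
    blended = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        for i, x in enumerate(emb):
            # each prompt contributes in proportion to its weight
            blended[i] += (w / total) * x
    return blended
```

With equal weights this reduces to a plain average; a heavier weight pulls the blended vector toward that prompt's embedding.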

@keturn
Contributor

keturn commented Jan 15, 2023

My opinion here is that diffusers doesn't aim at being a full-fledged UI, but rather a backend for UIs.

If you are going to refer people to the current InvokeAI code as an example of how to use diffusers as a backend, be warned that there are parts that are not pretty. 😆

This is definitely a place where we had to work around the StableDiffusionPipeline rather than with it. I see that _encode_prompt is its own method now, which at least allows the possibility of overriding it, but there are still a couple of reasons why Invoke had to work around it:

  • Under its current architecture, Invoke has already prepared the text embeddings by the time it's ready to do inference, and the pipeline doesn't have any method that takes that form of input.
  • The _encode_prompt method has the tokenization and encoding too entangled with the structure of the batch and the conditioned/unconditioned data.

You've already identified other use cases for exposing an API that takes text embeddings directly, such as #205 and #1869. It's also always easier to pass values to things than it is to subclass and override template methods, so factoring such a method out of the existing StableDiffusionPipeline.__call__ sounds like the way to go.

@damian0815
Contributor

damian0815 commented Jan 15, 2023

I have a work-in-progress project to turn the prompt weighting code I built for InvokeAI into a library called Incite, which would theoretically be able to plug into any transformers-based system that takes a text string, tokenizes it, and then produces an embedding vector.

A simple way of providing painless weighting support would be for the stable diffusion pipeline to support conditioning vectors as alternative input to prompt strings. The process of doing weighted prompting would then look something like this:

pipeline = StableDiffusionPipeline.from_pretrained(...)
incite = Incite(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# weight of 'fluffy' is increased, weight of 'dark' is decreased
positive_conditioning_tensor = incite.build_conditioning_tensor(
    "a fluffy+++ cat playing with a ball in a dark-- forest"
) 
negative_conditioning_tensor = incite.build_conditioning_tensor(
    "ugly, poorly drawn, etc."
)

images = pipeline(positive_conditioning=positive_conditioning_tensor,
    negative_conditioning=negative_conditioning_tensor).images

This in itself is just a first step, however, because being able to alter prompts on the fly unlocks all sorts of other possibilities. Here's a more advanced design:

pipeline = StableDiffusionPipeline.from_pretrained(...)
incite = Incite(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# at 50% of the way through the diffusion process, replace the word "cat" with "dog"
prompt="a cat.swap(dog, start=0.5) playing with a ball in the forest" 
conditioning_scheduler = incite.build_conditioning_scheduler(
    positive_prompt=prompt, 
    negative_prompt=""
)

images = pipeline(conditioning_scheduler=conditioning_scheduler).images
# at the start of every diffusion step the pipeline queries the conditioning_scheduler 
# for positive and negative conditioning tensors to apply for that step

This unlocks the capability for, as one early reviewer, @raefu, put it, "a generalized macro language that ultimately creates conditioning vectors for every step of the image generation".

With such a flexible model it would be possible to do wild things like performing image comparison operations with the latent image vector part-way through the diffusion process and then programmatically altering the conditioning/prompt based on what has been partially diffused already. The possibilities are endless, and really quite exciting.
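The step-dependent swap described above can be sketched as a tiny scheduler that the pipeline queries at each step (hypothetical names; this is not Compel's or Incite's actual API):

```python
class SwapConditioningScheduler:
    """Toy conditioning scheduler: returns the `before` conditioning
    until `start` (a fraction of total steps) has elapsed, then the
    `after` conditioning. Conditioning objects are opaque here; in
    practice they would be text-embedding tensors."""

    def __init__(self, before, after, start=0.5):
        self.before = before
        self.after = after
        self.start = start

    def get_conditioning(self, step_index, total_steps):
        # fraction of the diffusion process completed before this step
        fraction = step_index / total_steps
        return self.before if fraction < self.start else self.after


# e.g. swap the "cat" conditioning for "dog" halfway through 20 steps
sched = SwapConditioningScheduler(before="cat_embeds", after="dog_embeds", start=0.5)
```

The pipeline's denoising loop would call `get_conditioning` once per step instead of holding a single fixed embedding for the whole run.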

@patrickvonplaten
Contributor

Opening a PR that allows text_embeddings to be passed via the __call__ method. This makes a lot of sense to me and is in line with #1869 .

@damian0815
Contributor

damian0815 commented Jan 26, 2023

thanks @patrickvonplaten - with 0.12 and my prompt weighting library Compel (based on the InvokeAI weighting code) I can now do this to apply weights to different parts of the prompt:

from compel import Compel
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
embeds = compel.build_conditioning_tensor(prompt)
image = pipeline(prompt_embeds=embeds).images[0]

works great - thank you!

@patil-suraj
Contributor

Very cool @damian0815 !

@UglyStupidHonest
Author

So cool, I need to try this!! Thank you!!

@alexisrolland
Contributor

@damian0815 very cool!

What would be the syntax if we want to add weight to a group of words rather than just a single word?

Thanks!

@damian0815
Contributor

damian0815 commented Jan 28, 2023

@damian0815 very cool!

What would be the syntax if we want to add weight to a group of words rather than just a single word?

Thanks!

you can put the (words you want to weight)++ in parentheses

this (also (supports)-- nesting)+

speech marks "also work"+ like this

@alexisrolland
Contributor

Thanks @damian0815! Do you have a link to documentation describing the different syntaxes? I am also wondering how to apply different levels of weight to different groups of words... is it just something like:

(this bag is heavy)+++ while (this bag is medium)+ and (this one is really light)---

?

@damian0815
Contributor

That's right @alexisrolland. Docs are linked in the readme, but it's basically adapted from what I wrote for InvokeAI: https://invoke-ai.github.io/InvokeAI/features/PROMPTS/#prompt-syntax-features

@alexisrolland
Contributor

@damian0815 If I may, I think it would be nice if your compel library supported the same syntax as SD WebUI, since it is hugely popular. For example, it could accept () to increase weight and [] to decrease weight. See the doc here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#attentionemphasis

@damian0815
Contributor

nope, not happening. the Auto1111 syntax is rubbish

@alexisrolland
Contributor

nope, not happening. the Auto1111 syntax is rubbish

Ha ha ha, as much as I agree with you, it's becoming the de facto standard 😀

I prefer your syntax too...

@damian0815
Contributor

what i might consider adding is a converter that can convert auto syntax to invoke syntax. pull requests welcome :)
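For the simplest, un-nested cases such a converter could be little more than two regex substitutions, mapping A1111 `(text)` emphasis to `(text)+` and `[text]` de-emphasis to `(text)-` (a hypothetical sketch; explicit `:weight` values and nested groups are not handled):

```python
import re

def a1111_to_compel(prompt):
    """Convert simple A1111 emphasis syntax to Compel-style syntax:
    "(text)" -> "(text)+" and "[text]" -> "(text)-".
    Nested groups and explicit ":weight" values are not handled."""
    # parenthesized emphasis becomes a "+"-suffixed group
    prompt = re.sub(r"\(([^()]+)\)", r"(\1)+", prompt)
    # bracketed de-emphasis becomes a "-"-suffixed group
    prompt = re.sub(r"\[([^\[\]]+)\]", r"(\1)-", prompt)
    return prompt
```

A real converter would also need to handle repeated parentheses, `(text:1.2)` weights, and escaping, which is where most of the actual work would be.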

@alexisrolland
Contributor

That would be fantastic... the best of both worlds ^^

@patrickvonplaten
Contributor

BTW, another use case that should be somewhat easily enabled by this is long-weight prompting: #2136 (comment)

@Ephil012

Ephil012 commented Feb 5, 2023

@patrickvonplaten I saw that the PR added the ability to pass embeddings in now. From my understanding, you still need to either write the prompt weighting code yourself or use a third-party library (like compel). Do you know if there are any plans to add built-in prompt weighting (similar to the LPW community pipeline) to one of the main Stable Diffusion pipelines? That way people don't have to use third-party code for this functionality.

@patil-suraj
Contributor

Prompt weighting won't be included in the main pipeline, in order to keep the pipeline simple so that users can easily follow and modify it on their own. The philosophy behind this is explained in this doc; we encourage users to give it a read :)

@WASasquatch

WASasquatch commented Feb 12, 2023 via email

@alexisrolland
Contributor

alexisrolland commented Feb 15, 2023

Hello @damian0815

I am trying to use your compel library to convert prompts / negative prompts into embeddings. It works like a charm with StableDiffusionPipeline but with StableDiffusionImg2ImgPipeline I get the error message below when calling the pipeline:

[...]
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
prompt_embeds = compel.build_conditioning_tensor(payload.prompt) if payload.prompt else None
negative_prompt_embeds = compel.build_conditioning_tensor(payload.negative_prompt) if payload.negative_prompt else None
[...]
pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    image=init_images,
    strength=payload.init_image_noise,
    num_inference_steps=payload.steps,
    guidance_scale=payload.guidance,
    num_images_per_prompt=payload.num_images,
    generator=generators
)

Returns

ValueError("prompt has to be of type str or list but is <class 'NoneType'>")

I checked my prompt_embeds and it does contain data. Am I doing anything wrong?
Thanks

@damian0815
Contributor

hi @alexisrolland , check that you're on at least diffusers v0.12 . if that doesn't fix it, please post the full stack trace (on the compel github issues rather than here)

@alexisrolland
Contributor

@damian0815 yes, I'm on v0.12.1... actually I think that's more of a problem with StableDiffusionImg2ImgPipeline than Compel... but based on your answer I assume it should work. I will file a separate bug report on diffusers instead of continuing in this thread. Thanks for the prompt answer.

@Ephil012

Ephil012 commented Feb 16, 2023

@patil-suraj I read the doc you sent. It helped clarify a lot of things for me. Thanks!

However, the one concern I have is about community pipeline support. Some of these pipelines provide essential features to devs, but seem to be less well maintained than the main pipeline. As a result, it makes devs hesitant to build on top of them or diffusers in general. The same goes for third party libs.

Would it make sense to keep a simple main pipeline and then make some of the community pipelines part of the official pipelines list, more actively maintained by huggingface? That way the philosophy of keeping things simple is adhered to, but devs also get the features they need without worrying about whether a community pipeline will be abandoned in the future. I know it involves a lot of commitment to support a new pipeline, but I figured I might as well ask in case. I feel like officially supporting this will attract more people to diffusers vs other libraries.

On an unrelated note, should some of the stuff for compel be moved to another thread on that repo? It seems a lot of this thread has become a troubleshooting thread for a separate library. It might make sense to move the discussion to compel's repo so that it's easier for people to find in the future, while also keeping this thread on topic.

@damian0815
Contributor

damian0815 commented Feb 16, 2023

@Ephil012

should some of the stuff for compel be moved to another thread on that repo?

yeah that's probably my bad for not immediately redirecting people there. i'll be sure to do so in the future.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
