
Conversation

pcuenca
Member

@pcuenca pcuenca commented Aug 23, 2022

TODO:

  • Demo Colab.

The idea here is simply to iterate when generating the latents and take note of the seeds that were used. If the user provides any seeds, use them, otherwise generate new ones. Either way, return them.

I think this is a good opportunity to discuss how we want to incorporate these extensions inside the library, depending on how popular we expect them to become. @patil-suraj wrote image-to-image just yesterday, for example. It feels like the natural way to provide an extension is via the pipeline interface. It was very easy to do, but most of the code is boilerplate. I only had to do the following:

  • Add a new argument to __call__ (the previous seeds).
  • Change the way latents are generated (~10 lines of code).
  • Change the return dict.
  • Copy everything else.
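The latent-generation change described in the steps above might look roughly like this. This is a hedged sketch only: the function name `prepare_latents`, the `seeds` argument, and the latent shape are illustrative, not the PR's actual code.

```python
import torch

def prepare_latents(batch_size, latent_shape, seeds=None, device="cpu"):
    """Generate one latent per image, recording the seed used for each.

    If `seeds` is provided, reuse them; otherwise draw fresh random seeds.
    Returns both the stacked latents and the seeds, so callers can
    replicate any individual image later. (Illustrative sketch only.)
    """
    if seeds is None:
        seeds = [torch.randint(0, 2**32 - 1, (1,)).item() for _ in range(batch_size)]
    latents = []
    for seed in seeds:
        generator = torch.Generator(device=device).manual_seed(seed)
        latents.append(torch.randn(latent_shape, generator=generator, device=device))
    return torch.stack(latents), seeds
```

Generating each latent from its own seeded generator (rather than one seed for the whole batch) is what makes individual images in a batch replayable.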

I wonder if we should design Pipelines in a way that they are meant for subclassing, or if we could use mixins or something else. Having to repeat the same steps everywhere (even the deprecated torch_device snippet) feels fragile.

Another limitation I found is that this mechanism could be useful for other diffusion pipelines, not just Stable Diffusion. But if you use DiffusionPipeline as a superclass then you are forced to write __init__ to register the appropriate modules, tying your implementation to some specific combination anyway.

One alternative is to just do nothing and keep this as it is right now. This makes perfect sense if we don't expect many contributions in the form of new pipelines. The code is now easy to read if you focus on a specific pipeline and don't care what others look like. But it might be a problem if we end up having lots of pipelines with similar blocks of code.

Thoughts @patrickvonplaten @anton-l @patil-suraj ?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten
Contributor

Hmm @pcuenca,

I think I'm not a big fan of having another pipeline just for this in the official pypi -> what do you think about adding the latents to the __call__ function as explained here: #208 (comment) ?

Any downsides with this?

@pcuenca
Member Author

pcuenca commented Aug 24, 2022

That's certainly another option. But I think there are two discussions here:

  • Using a custom pipeline vs extending the current one. It's hard for me to assess what's best for a given situation. We'll have in-painting and possibly pure image-to-image (no text) coming soon. Do we want to bake them in, or create them as extensions?
  • Passing the seeds vs the latents to __call__. I think seeds are easier, because if we use the latents then users need to know how to generate them themselves, or we have to provide a method to do it. What's the advantage of sending the latents? In my mind the use-case looks something like this:
    • Give me a set of random images, but let me know what to do to replicate them. I don't care about the initial state in this step.
    • Now repeat the generation with another prompt but from the same random states.
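The two-step use-case above can be sketched end to end. Everything here is hypothetical: `fake_pipe` is a tiny stand-in for the pipeline interface this PR proposes, and the `(images, seeds)` return shape and `seeds` keyword are assumptions, not the real API.

```python
import torch

def fake_pipe(prompt, seeds=None, batch_size=2):
    """Stand-in for the pipeline __call__ sketched in this PR (hypothetical).

    If `seeds` is None, fresh ones are drawn and returned, so the caller
    can replay the same batch later with a different prompt.
    """
    if seeds is None:
        seeds = [torch.randint(0, 2**31, (1,)).item() for _ in range(batch_size)]
    latents = [
        torch.randn(4, 8, 8, generator=torch.Generator().manual_seed(s))
        for s in seeds
    ]
    # A real pipeline would denoise these latents conditioned on `prompt`;
    # we return them directly to show only the seed-replay mechanics.
    images = torch.stack(latents)
    return images, seeds

# Step 1: random batch; take note of the seeds that were used.
first, seeds = fake_pipe("a watercolor fox")
# Step 2: same random states, different prompt.
second, _ = fake_pipe("an oil-painting fox", seeds=seeds)
```

The point is that the user never constructs latents by hand; replication only requires feeding the reported seeds back in.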

I understand, though, that having the generator is messy and we should maybe remove it if we go the seeds route (or ignore it if the user provides the latents).

In addition, but less importantly, this extension would be useful for other pipelines, not just Stable Diffusion. For example, I tested it with regular Latent Diffusion. Is there a good way to generalize it?

I'll write the alternative implementation to see what it looks like.

@patrickvonplaten
Contributor

@pcuenca a couple of higher level thoughts:

  • a) Boilerplate code is not a problem IMO. According to the philosophy: https://github.com/huggingface/diffusers#philosophy we prefer readability over abstracted code, and especially for pipelines we should try to have them be as stand-alone as possible. Ideally I'd like to have all functionality except the saving repeated in every pipeline so that the forward pass is very easy to read; even common helper snippets should IMO have been copy-pasted into every pipeline so that the user can read the forward pass end to end. Long story short, I'm strongly against abstracting away any code of the forward pass.
  • b) For the same model and the same task (text2image) I would stay with the same pipeline class. The use case of continuously generating images from possibly the same random seed, but a different prompt, is no different IMO to our already existing text2image StableDiffusionPipeline. So here I'd be in favor of adapting the existing pipeline. In contrast, image2image or image in-painting are different tasks to me, so they deserve a new class IMO.
  • c) Very much not a fan of passing a list of seeds to the pipeline if other pipelines work with generator. To me, passing the generator instead of the seed has two advantages: 1) For pipelines whose schedulers (such as DDPM) generate noise at every iteration step, it's cleaner to pass the generator forward to each torch.randn(...) than to set one seed at the beginning with torch.manual_seed(...). 2) If a script has more random operations before using the pipeline, it's better to create one generator at the beginning with torch.manual_seed(...) and then pass this generator forward, instead of calling torch.manual_seed(0) twice, e.g.:
generator = torch.manual_seed(0)
# some random generation
pipe(generator=generator)

is IMO better than:

torch.manual_seed(0)
# some random generation
pipe(seed=0)

=> So to me that means we should not have both generator and seed, and I prefer generator for the reasons above. Thus I think the easiest would be to just let the user pass the latents in, so that the pipeline is fully deterministic.

E.g.:

latents = torch.randn((4, 4, 64, 64))
images = pipe(prompt="astronaut", latents=latents)
# analyse images
next_latents = latents[best_image_idx : best_image_idx + 1]  # keep the batch dimension
images = pipe(prompt="astronaut", latents=next_latents)

This saves us a lot of pain with having both seeds and generator, lots of logic inside the pipeline that we could avoid, etc. Note that the pipeline is in src/diffusers, so I'd like to limit the logic we put into pipelines as much as possible in the beginning, to limit possible future breaking changes.
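The scheduler argument in point c) above, that a single generator threaded through every per-step torch.randn call keeps the whole trajectory reproducible, can be sketched with a toy loop. This is illustrative only: `denoise_loop` and its update rule are placeholders, not the actual DDPM scheduler code.

```python
import torch

def denoise_loop(x, num_steps, generator=None):
    """Toy loop in the shape of a DDPM-style sampler: fresh noise is drawn
    at every step, so threading one `generator` through each `torch.randn`
    call makes the whole trajectory reproducible without touching global
    RNG state. (Illustrative placeholder, not diffusers code.)
    """
    for _ in range(num_steps):
        noise = torch.randn(x.shape, generator=generator)
        x = 0.9 * x + 0.1 * noise  # placeholder update, not a real scheduler step
    return x

# Two runs from generators seeded identically produce identical trajectories.
a = denoise_loop(torch.zeros(2, 4), num_steps=5, generator=torch.Generator().manual_seed(0))
b = denoise_loop(torch.zeros(2, 4), num_steps=5, generator=torch.Generator().manual_seed(0))
```

With a single up-front seed instead, any extra random operation before or during the loop would silently shift every subsequent draw; a dedicated generator isolates the pipeline from the rest of the script.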

@patil-suraj what's your opinion here?

@pcuenca
Member Author

pcuenca commented Aug 24, 2022

Hi @patrickvonplaten, thanks a lot for the thoughtful message!

Very much agree that readability is a crucial goal to pursue, and that we should include as little code as we can get away with inside the pipelines. Also agree that it doesn't make sense to pass both seeds and generator. My question was motivated by the consideration that we might need to create (or accept from the community) a few pipelines in these initial stages, and they might become harder to update the more we have (for example, removing the deprecated kwargs for the torch device will require visiting all the pipelines). But we'll cross that bridge when we get there; I'm now satisfied with these design philosophy points, and I do share them :) I'm really happy that we care about readability, as it tends to get forgotten soon.

It also makes total sense to have one pipeline per task, but I wasn't sure whether this feature should be in the core pipeline or be provided as an example.

If everybody agrees I think we can close this PR and I'll try the other approach instead. Thanks!

@patil-suraj
Contributor

patil-suraj commented Aug 24, 2022

Hey @pcuenca, thanks a lot for the PR, I think this is a very interesting use-case.

@patrickvonplaten explained most of the things here and I echo those points.

  • Boilerplate shouldn't be a big problem IMO. Having everything in one place makes things hackable for users: they could just take the file and tweak it however they want according to their needs. But if we add abstractions, users will have to jump between multiple files to understand it, which adds more cognitive load; sequential code is the easiest to digest. With abstraction, users would also have to tweak things in multiple places if they want something custom. IMO it's hard to provide clean abstract code that can handle many use-cases, and I'm sure the community is going to come up with many more tricks, so it's better IMO to let them write code. Also, to better deal with boilerplate code we could add some tooling that generates the initial boilerplate. We have this in transformers for models, called transformers-cli add-new-model-like: https://huggingface.co/docs/transformers/add_new_model#514-port-brandnewbert-to-transformers

  • Agree with Patrick here: passing latents should be fine. When we share examples we could just provide a utility function that helps users create latents from seeds.
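The utility function mentioned above might look something like this. The name `seeds_to_latents`, the default shape, and the signature are all hypothetical, not an actual diffusers API.

```python
import torch

def seeds_to_latents(seeds, shape=(4, 64, 64), device="cpu", dtype=torch.float32):
    """Build a batch of initial latents from a list of seeds, one latent
    per seed, so each image in the batch can be reproduced independently.
    (Hypothetical helper; name, shape, and signature are illustrative.)
    """
    return torch.stack([
        torch.randn(shape,
                    generator=torch.Generator(device=device).manual_seed(s),
                    device=device, dtype=dtype)
        for s in seeds
    ])

latents = seeds_to_latents([0, 1, 2])
# The batch could then be passed straight to a pipeline, e.g.:
# images = pipe(prompt="astronaut", latents=latents)
```

This keeps the seed handling entirely outside the pipeline, which stays deterministic given the latents, exactly the split argued for above.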

This will be a very impactful example, looking forward to it !

@pcuenca
Member Author

pcuenca commented Aug 24, 2022

We have this in transformers for models called transformers-cli add-new-model-like.

Oh, very interesting, I didn't know that, thanks!

Thank you, @patil-suraj, I think I get it now. These conversations are very insightful to absorb the design goals you adopted :)

@pcuenca
Member Author

pcuenca commented Aug 25, 2022

Replaced by #247.

@pcuenca pcuenca closed this Aug 25, 2022
@pcuenca pcuenca deleted the pipeline-sd-seeds branch August 25, 2022 10:11