
Conversation

@thealmightygrant (Contributor) commented Nov 2, 2024

What does this PR do?

This is a partial fix for #3672, covering only the flow match Euler discrete scheduler that the Stable Diffusion 3 pipeline uses by default. I am unsure how to solve this generically, and I suspect a generic fix would conflict with the HF diffusers philosophy doc even if it were possible.

I've tested this with my local FastAPI server, and it generates images concurrently with the StableDiffusion3Pipeline without issues.

Let me know if you like this change, and I can update some other schedulers so they can be used in servers that generate images concurrently.


@sayakpaul (Member)

I guess this is to make our schedulers thread-safe so that image generation can be performed in multi-threaded environments?

sayakpaul requested a review from yiyixuxu on November 3, 2024
@thealmightygrant (Contributor, Author) commented Nov 3, 2024

Yes, but just for the FlowMatchEulerDiscreteScheduler to start. I was using this in a local deployment at the company I work for, and I thought it would be good to push it upstream. Thanks for requesting a review properly for me 😅

thealmightygrant changed the title from "Add concurrent processing for flow match euler discrete scheduler." to "Make flow match euler discrete scheduler thread-safe." on Nov 3, 2024
@vladmandic (Contributor)

i'm guessing you're sharing a model instance between threads, which then causes all threads to access the same scheduler instance?
if that's the case, wouldn't it be cleaner to instantiate a new scheduler instance for each thread instead of patching the scheduler to be thread-aware?

@thealmightygrant (Contributor, Author) commented Nov 3, 2024

> i'm guessing you're sharing a model instance between threads, which then causes all threads to access the same scheduler instance?

Yes, they are sharing a model instance. For a diffusers-backed service, I believe there is no other reasonable way to do this, because you are likely to run out of GPU memory with more than one instance of something like sd-3.5-large.

> wouldn't it be cleaner to instantiate a new scheduler instance for each thread instead of patching the scheduler to be thread-aware?

I'm not exactly sure that it is cleaner, but I was able to get this working without modifying the scheduler by using from_pipe. In the endpoint of my server:

# hand the blocking pipeline call off to a worker thread so the event loop stays free
loop = asyncio.get_event_loop()
# clone the scheduler so each request gets its own mutable step state
scheduler = shared_pipeline.scheduler.from_config(shared_pipeline.scheduler.config)
# from_pipe reuses the shared components instead of loading them again
pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline, scheduler=scheduler)
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=shared_generator()))

with the shared pipeline defined globally to avoid loading the model into memory multiple times:

# loaded once at startup; all requests share these components
shared_pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
).to(device="cuda")

I'm fine if y'all think that is cleaner ¯\_(ツ)_/¯ I kind of like making the pipeline/scheduler more fault-resistant by making the schedulers thread-safe; that seems to result in simpler code for more casual users of the library. Another option would be to make the use of from_pipe clearer with an example server, along the lines of the sketch below. I could add that instead in a new PR if y'all would prefer.
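
A minimal sketch of what such an example server could look like (everything here is illustrative: the ImageInput model, the /generate route, and the model id are placeholders, not code from this PR):

import asyncio

import torch
from diffusers import StableDiffusion3Pipeline
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# hypothetical request body; only a prompt is needed for this sketch
class ImageInput(BaseModel):
    prompt: str

# heavyweight components are loaded once and shared by every request
shared_pipeline = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",  # assumed model id
    torch_dtype=torch.bfloat16,
).to(device="cuda")

@app.post("/generate")
async def generate(image_input: ImageInput):
    # per-request scheduler: no mutable step state is shared across threads
    scheduler = shared_pipeline.scheduler.from_config(shared_pipeline.scheduler.config)
    pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline, scheduler=scheduler)
    # the pipeline call blocks, so run it in a worker thread
    loop = asyncio.get_running_loop()
    output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt))
    return {"num_images": len(output.images)}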

@vladmandic (Contributor)

> Yes, they are sharing a model instance.

you want to share the loaded components, not the full instance.
you don't need to load a model to create an instance; you can create as many pipeline instances as you want from already-loaded components, and each instance can have its own scheduler.

@thealmightygrant (Contributor, Author)

Can you link to an example of that, or show what it would look like in code? I'm really not sure what you are referring to. Is it different from what I did above?

@vladmandic (Contributor) commented Nov 3, 2024

ah, just saw you actually used from_pipe to instantiate from the shared pipeline - that's fine. sorry, i missed it before writing my previous reply.
you can also directly instantiate a pipeline class, passing all the init params as needed,
e.g. pipe = StableDiffusion3Pipeline(vae=shared.vae, ...).
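
fleshed out, that direct construction might look roughly like this (component names are assumed from the sd3 pipeline signature, and "shared" stands for whatever object holds the already-loaded components):

from diffusers import FlowMatchEulerDiscreteScheduler, StableDiffusion3Pipeline

# nothing new is loaded here: every component is reused from the shared
# pipeline, and only the scheduler is a fresh instance
pipe = StableDiffusion3Pipeline(
    transformer=shared.transformer,
    scheduler=FlowMatchEulerDiscreteScheduler.from_config(shared.scheduler.config),
    vae=shared.vae,
    text_encoder=shared.text_encoder,
    tokenizer=shared.tokenizer,
    text_encoder_2=shared.text_encoder_2,
    tokenizer_2=shared.tokenizer_2,
    text_encoder_3=shared.text_encoder_3,
    tokenizer_3=shared.tokenizer_3,
)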

> making the pipeline/scheduler more fault-resistant by making the schedulers thread-safe

my worry is that such an approach can cause unnecessary gpu sync calls in cases where the variable involved is a tensor.
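
for illustration, a hypothetical example of the kind of sync i mean - any pattern that reads a cuda tensor's value back on the host blocks until the device catches up:

# hypothetical: locating the current step by comparing against a cuda
# timesteps tensor; .item() forces a host-device synchronization
step_index = (scheduler.timesteps == timestep).nonzero().item()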

@yiyixuxu (Collaborator) commented Nov 6, 2024

hi @thealmightygrant @vladmandic
thanks for the discussion here! very interesting!
to be honest, our pipelines are not designed to run multi-threaded, and we haven't looked into this! If you have a nice example of how to do this, we'd love to have it in the docs for sure!

@EvanSong77

> [quotes the PR description above]

Thanks for your help, it's very helpful to me!

thealmightygrant mentioned this pull request Nov 13, 2024
@thealmightygrant (Contributor, Author)

Closing in favor of #9918
