
Conversation

@sayakpaul
Member

No description provided.

@sayakpaul sayakpaul requested a review from cbensimon September 5, 2025 08:21
@cbensimon
Contributor

@sayakpaul after re-thinking about regional compilation, I think the current process is still a bit too complex to be included in the blog post. I think simplifying this process at the library level (either in the spaces package or in diffusers) by leveraging ModelMixin._repeated_blocks might be worth it.

@sayakpaul
Member Author

@cbensimon good point.

However, since the post is the only go-to resource out there for devs building on ZeroGPU, I think it's nice to include the regional compilation section. Once we have an API in spaces or anywhere else, we can simply swap it in.

Regarding using ModelMixin._repeated_blocks, I think it will only work for diffusers models. But our solutions are generic. So, exposing an API from spaces in an agnostic manner makes more sense. WDYT?

@Vaibhavs10
Member

Probably okay to go ahead and merge this as-is, and then you can refine it as you abstract away the complexities a bit more.

@sayakpaul
Member Author

Yeah, pretty much. Things are already in progress, so it should be just a few days before we swap things out from here.

So, waiting to hear what Charles thinks.

- [LTX Video](https://huggingface.co/spaces/zerogpu-aoti/ltx-dev-fast)

### Regional compilation
- [Regional compilation recipe](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html)
Contributor

👏

Contributor

I initially thought that it was your recent tutorial on regional AoT. Still nice to include this one though

Member Author

It's about to be merged: pytorch/tutorials#3543

@cbensimon
Contributor

Approved. Only the TODO link is left, @sayakpaul (the link to the push-and-reuse collection).

Co-authored-by: Charles <charles@huggingface.co>
@sayakpaul
Member Author

Will merge after updating the link.

zerogpu-aoti.md Outdated

In our example, we can compile the repeated blocks of the Flux transformer ahead of time like so. The [Flux Transformer](https://github.com/huggingface/diffusers/blob/c2e5ece08bf22d249c62e964f91bc326cf9e3759/src/diffusers/models/transformers/transformer_flux.py) has two kinds of repeated blocks: `FluxTransformerBlock` and `FluxSingleTransformerBlock`.

You can check out [this Space](https://huggingface.co/spaces/zerogpu-aoti/Qwen-Image-Edit-AoT-Regional) for a complete example.
Member

This code was clarifying to me, rather than the demo space itself. Perhaps we could link to both and use the code to illustrate the explanations.

However, I only see pipeline.transformer.transformer_blocks[0] being compiled, whereas we mentioned two different kinds of repeated blocks in the description.

Member Author

The writing demonstrates with Flux, while the demo uses Qwen, which has a single kind of repeated block. I have changed the link to the Flux one from @cbensimon. But just a link to the demo is fine, IMO.

Comment on lines 360 to 370
### Use a compiled graph from the Hub

Once a model (or even a model block) is compiled ahead of time, we can serialize the compiled graph module
as an artifact and reuse it later. In the context of a ZeroGPU-powered demo on Spaces, this significantly
cuts down the demo startup time.

To keep the storage footprint light, we can save just the compiled model graph without including any model
parameters in the artifact.

Check out [this collection](TODO) for the full workflow: obtaining the compiled model graph, pushing it
to the Hub, and then using it to build a demo.
Member

I don't understand this section. What are the benefits of persisting the serialization vs the code demonstrated in the previous example? Also, the collection is missing.

Member Author

> Also, the collection is missing.

#3057 (comment)

> I don't understand this section. What are the benefits of persisting the serialization vs the code demonstrated in the previous example?

We skip the compilation time by reusing a compiled graph.

sayakpaul and others added 4 commits September 11, 2025 07:58
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@sayakpaul sayakpaul merged commit 3085f9f into main Sep 11, 2025
@sayakpaul sayakpaul deleted the regional-aot branch September 11, 2025 15:33