[Performance Issue]: Takes a long time after a change in width and height to previous request #98
Comments
Changing the width or height causes a recompilation; a new feature that addresses this is a work in progress. |
Hey that is great. |
Looking forward! The time on the same dimensions is unreal. Waiting for the same on different dimensions.
On Sat, 21 Jan 2023, lazy-nurd wrote:
Hey that is great. Can we get a bit of information about the new feature and what optimizations it will bring, especially for stable diffusion?
|
@shangguanshiyuan Hi, is this issue fixed in the new update? If so, how do we use it? |
Thanks for your attention. This feature is still in the testing phase and has not been released yet; it will significantly reduce compilation time for multiple shapes. |
@shangguanshiyuan Great, thanks! Any estimate of when it will be released to the public? |
We trace a static computation graph and optimize it to reduce inference time. The static computation graph also assumes a static input shape, so memory can be allocated ahead of time. When an input with a new shape arrives, it triggers a graph compilation, which takes around 7 seconds.

We provide an offline compile mode to avoid online compilation costs when all input shapes are known in advance: compile the graphs for these shapes offline, then load the compiled results online. Loading a graph takes less than 1 second. We also provide a shared graph feature to save more memory and compile time.

Here is the test for offline compile / shared graph: https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py

You need to update oneflow diffusers and oneflow to the most recent version.
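The shape-keyed caching described above can be sketched in plain Python. This is a conceptual toy, not the oneflow API: `ShapeCachedPipeline` and its methods are hypothetical names used only to illustrate why a new input shape is slow once and fast afterwards.

```python
class ShapeCachedPipeline:
    """Toy model of a pipeline whose static graph is keyed by input shape."""

    def __init__(self):
        self.graph_cache = {}   # (height, width) -> "compiled graph"
        self.compile_count = 0  # how many compilations actually happened

    def _compile(self, shape):
        self.compile_count += 1           # stands in for the ~7 s compile
        return f"graph-for-{shape}"

    def precompile(self, shapes):
        """Offline step: compile every expected shape ahead of time."""
        for shape in shapes:
            if shape not in self.graph_cache:
                self.graph_cache[shape] = self._compile(shape)

    def infer(self, shape):
        """Online step: a cache miss triggers a slow compilation."""
        if shape not in self.graph_cache:
            self.graph_cache[shape] = self._compile(shape)
        return self.graph_cache[shape]    # cache hit -> fast path

pipe = ShapeCachedPipeline()
pipe.precompile([(512, 512), (768, 512)])  # done once, offline
pipe.infer((512, 512))
pipe.infer((768, 512))
print(pipe.compile_count)  # 2: no online compilation was needed
```

The real offline mode follows the same shape: compile once per expected shape offline, then every online request is a cache hit.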
How to load the compiled result of a graph:
1. Compile and save the graph. The graph cached in the previous pipe's graph cache is stored under graph_save_path.
2. Load the graph and use it. The previously saved graph is loaded into the pipe's graph cache, so later inference calls on the pipe hit the graph cache and avoid compilation.

Compilation can also be shared between graphs that have different input shapes but the same parameters. Once this option is turned on, multiple graphs with different input shapes but the same parameters share one compilation, which saves memory and compile time. In addition, triggering graph compilation with input shapes sorted from large to small improves the memory sharing of the activation part and further reduces memory use. |
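The large-to-small ordering advice can be illustrated with a toy grow-only activation pool (pure Python; `pool_growth_events` is a hypothetical helper written for this sketch, not part of oneflow):

```python
def pool_growth_events(shapes):
    """Simulate a grow-only activation buffer shared across graphs.

    Compiling a graph for (h, w) needs a buffer proportional to h*w;
    the pool only grows when the current buffer is too small.
    Returns (peak_pool_size, number_of_growth_events).
    """
    pool = 0
    growths = 0
    for h, w in shapes:
        need = h * w
        if need > pool:
            pool = need     # reallocate a larger shared buffer
            growths += 1
    return pool, growths

shapes = [(512, 512), (768, 768), (640, 640)]
# Largest first: one allocation covers every later (smaller) graph.
print(pool_growth_events(sorted(shapes, key=lambda s: s[0] * s[1], reverse=True)))
# Smallest first: the pool must be regrown for each larger shape.
print(pool_growth_events(sorted(shapes, key=lambda s: s[0] * s[1])))
```

Both orders end at the same peak size, but compiling from large to small reallocates the shared buffer only once, which is the memory-sharing effect the comment above describes.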
@strint Thanks a lot for your answer! I am following your instructions, and I get the following error while trying to save the graph for stable diffusion: AttributeError: 'VaeGraph' object has no attribute 'enable_save_runtime_state_dict' |
It's because oneflow has not been updated to the latest version. You can check your installed oneflow version (e.g. with `python -m pip show oneflow`).
To get the latest oneflow, install the nightly build.
Here is the full update list:
- Update oneflow: install the nightly build.
- Update transformers: delete the local folder that contains the oneflow fork of transformers and use the official transformers directly.
- Update diffusers: update to the latest oneflow fork of diffusers.
After updating oneflow/transformers/diffusers, you can run the test linked above.
|
Looks like it has been resolved; feel free to reopen if not. |
Brief Description
I am using oneflow with stable diffusion. If I generate results at 512x512, it produces a result in about 1 second. If I change the width and height, the next result takes ~10 seconds; after that, generation at the same dimensions returns to normal speed. So a change in width and height slows the model down for the first inference at the new dimensions.
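One way to reproduce this cold-vs-warm pattern is to time consecutive calls at each resolution. The sketch below uses a stub `pipe` with a short `time.sleep` standing in for the real recompilation; with the actual pipeline you would time the real call instead (all names here are hypothetical, not the pipeline's API):

```python
import time

def timed(fn, *args):
    """Return the wall-clock duration of one call."""
    t0 = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t0

seen = set()
def pipe(h, w):
    """Stub pipeline: the first request at a new shape is slow."""
    if (h, w) not in seen:
        seen.add((h, w))
        time.sleep(0.05)  # stands in for the ~10 s recompilation

print(f"cold 512x512: {timed(pipe, 512, 512):.3f}s")  # slow: new shape
print(f"warm 512x512: {timed(pipe, 512, 512):.3f}s")  # fast: cached
print(f"cold 768x512: {timed(pipe, 768, 512):.3f}s")  # slow again: new shape
```

Timing the first and second call at each resolution separates the one-time compilation cost from steady-state inference.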
Device and Context
A100 40 GB.
Benchmark
Normal inference: ~1 second
Inference after change in dimensions (for first time): ~10 seconds
Alternatives
No response