[Performance Issue]: Takes a long time after a change in width and height to previous request #98
Comments
Changing the width or height causes a recompilation; a new feature that addresses this is a work in progress. |
Hey that is great. |
Looking forward! The time on the same dimensions is unreal. Waiting for the same on different dimensions.
On Sat, 21 Jan 2023, lazy-nurd wrote:
Hey that is great. Can we get a bit of information about the new feature and what optimizations it will bring, especially for stable diffusion?
|
@shangguanshiyuan Hi, is this issue fixed in the new update? If so, how do we use it? |
Thanks for your attention. This feature is still in the testing phase and has not been released yet; it will significantly reduce compilation time for multiple shapes. |
@shangguanshiyuan Great, thanks! Any estimate of when it will be released to the public? |
We trace a static computation graph and optimize it to reduce inference time. The static computation graph also assumes a static input shape, so memory can be allocated ahead of time. When an input with a new shape arrives, it triggers a graph compilation, which takes around 7 seconds.

We provide an offline compile mode to avoid online compilation costs when all input shapes are known in advance: compile the graphs for these shapes offline, then load the compiled results online. Loading a graph takes less than 1 second. We also provide a shared graph feature to save more memory and compile time.

Here is the test for offline compile / shared graph: https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py

You need to update oneflow diffusers and oneflow to the most recent version.
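The shape-keyed caching described above can be sketched in plain Python. This is a conceptual toy, not the oneflow API: `ShapeCachedPipeline` and its methods are hypothetical names used only to illustrate why a new input shape is slow once and fast afterwards.

```python
class ShapeCachedPipeline:
    """Toy model of a pipeline whose static graph is keyed by input shape."""

    def __init__(self):
        self.graph_cache = {}   # (height, width) -> "compiled graph"
        self.compile_count = 0  # how many compilations actually happened

    def _compile(self, shape):
        self.compile_count += 1           # stands in for the ~7 s compile
        return f"graph-for-{shape}"

    def precompile(self, shapes):
        """Offline step: compile every expected shape ahead of time."""
        for shape in shapes:
            if shape not in self.graph_cache:
                self.graph_cache[shape] = self._compile(shape)

    def infer(self, shape):
        """Online step: a cache miss triggers a slow compilation."""
        if shape not in self.graph_cache:
            self.graph_cache[shape] = self._compile(shape)
        return self.graph_cache[shape]    # cache hit -> fast path

pipe = ShapeCachedPipeline()
pipe.precompile([(512, 512), (768, 512)])  # done once, offline
pipe.infer((512, 512))
pipe.infer((768, 512))
print(pipe.compile_count)  # 2: no online compilation was needed
```

The real offline mode follows the same shape: compile once per expected shape offline, then every online request is a cache hit.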
How to load the compiled result of a graph:
1. Compile and save the graph. The graph cached in the previous pipe's graph cache is stored under graph_save_path.
2. Load the graph and use it. The previously saved graph is loaded into the pipe's graph cache, so later inference calls on the pipe hit the graph cache and avoid compilation.

Compilation can also be shared between graphs that have different input shapes but the same parameters. Once this option is turned on, multiple graphs with different input shapes but the same parameters share one compilation, which saves memory and compile time. In addition, triggering graph compilation with input shapes sorted from large to small improves the memory sharing of the activation part and further reduces memory use. |
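The large-to-small ordering advice can be illustrated with a toy grow-only activation pool (pure Python; `pool_growth_events` is a hypothetical helper written for this sketch, not part of oneflow):

```python
def pool_growth_events(shapes):
    """Simulate a grow-only activation buffer shared across graphs.

    Compiling a graph for (h, w) needs a buffer proportional to h*w;
    the pool only grows when the current buffer is too small.
    Returns (peak_pool_size, number_of_growth_events).
    """
    pool = 0
    growths = 0
    for h, w in shapes:
        need = h * w
        if need > pool:
            pool = need     # reallocate a larger shared buffer
            growths += 1
    return pool, growths

shapes = [(512, 512), (768, 768), (640, 640)]
# Largest first: one allocation covers every later (smaller) graph.
print(pool_growth_events(sorted(shapes, key=lambda s: s[0] * s[1], reverse=True)))
# Smallest first: the pool must be regrown for each larger shape.
print(pool_growth_events(sorted(shapes, key=lambda s: s[0] * s[1])))
```

Both orders end at the same peak size, but compiling from large to small reallocates the shared buffer only once, which is the memory-sharing effect the comment above describes.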
@strint Thanks a lot for your answer! I am following your instructions, and I get the following error while trying to save the graph for stable diffusion: AttributeError: 'VaeGraph' object has no attribute 'enable_save_runtime_state_dict' |
It's because oneflow has not been updated to the latest version. You can check your installed oneflow version (e.g. with `python -m pip show oneflow`).
To get the latest oneflow, install the nightly build.
Here is the full update list:
- Update oneflow: install the nightly build.
- Update transformers: delete the local folder that contains the oneflow fork of transformers and use the official transformers directly.
- Update diffusers: update to the latest oneflow fork of diffusers.
After updating oneflow/transformers/diffusers, you can run the test linked above.
|
Looks like it has been resolved; feel free to reopen if not. |
Brief Description
I am using oneflow with stable diffusion. If I generate results at 512x512, it produces a result in about 1 second. If I change the width and height, the next result takes ~10 seconds; after that, generation at the same dimensions returns to normal speed. So a change in width and height slows the model down for the first inference at the new dimensions.
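One way to reproduce this cold-vs-warm pattern is to time consecutive calls at each resolution. The sketch below uses a stub `pipe` with a short `time.sleep` standing in for the real recompilation; with the actual pipeline you would time the real call instead (all names here are hypothetical, not the pipeline's API):

```python
import time

def timed(fn, *args):
    """Return the wall-clock duration of one call."""
    t0 = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t0

seen = set()
def pipe(h, w):
    """Stub pipeline: the first request at a new shape is slow."""
    if (h, w) not in seen:
        seen.add((h, w))
        time.sleep(0.05)  # stands in for the ~10 s recompilation

print(f"cold 512x512: {timed(pipe, 512, 512):.3f}s")  # slow: new shape
print(f"warm 512x512: {timed(pipe, 512, 512):.3f}s")  # fast: cached
print(f"cold 768x512: {timed(pipe, 768, 512):.3f}s")  # slow again: new shape
```

Timing the first and second call at each resolution separates the one-time compilation cost from steady-state inference.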
Device and Context
A100 40 GB.
Benchmark
Normal inference: ~1 second
Inference after change in dimensions (for first time): ~10 seconds
Alternatives
No response