
[Performance Issue]: Takes a long time after a change in width and height to previous request #98

Closed
diagonalge opened this issue Jan 19, 2023 · 10 comments


@diagonalge

Brief Description

I am using oneflow with stable diffusion. If I generate the results in 512x512, it can generate the result in 1 second. If I change the width and height, it will generate the next result in ~10 seconds. Then it will generate normally afterwards on the same dimensions. So, a change in width and height causes the model to slow down for the first inference on the new dimensions.

Device and Context

A100 40 Gb.

Benchmark

Normal inference: ~1 second
Inference after change in dimensions (for first time): ~10 seconds

Alternatives

No response

@shangguanshiyuan

Changing the width or height triggers a recompilation. A new feature that addresses this is in progress.

@lazy-nurd

Hey, that is great.
Can we get a bit more information about the new feature and what optimizations it will bring, especially for stable diffusion?

@diagonalge
Author

diagonalge commented Jan 21, 2023 via email

@diagonalge
Author

@shangguanshiyuan Hi, is this issue fixed in the new update? If so, how do I enable it?

@shangguanshiyuan

Thanks for your attention. This feature is still in the testing phase and has not been released yet; it will significantly reduce compilation time when using multiple shapes.

@diagonalge
Author

@shangguanshiyuan Great, thanks! Any estimate of when it will be released to the public?

@strint
Collaborator

strint commented Feb 5, 2023

a change in width and height causes the model to slow down for the first inference on the new dimensions.
it can generate the result in 1 second. If I change the width and height, it will generate the next result in ~10 seconds.

We trace a static computation graph and optimize it to reduce inference time. The static computation graph also assumes a static input shape so that memory can be allocated ahead of time. So when an input with a new shape arrives, it triggers a graph compilation, which takes around 7 seconds.
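A rough way to observe this behavior (a sketch; `pipe` stands for an already-constructed OneFlow stable diffusion pipeline, and the prompt and shapes are illustrative):

```python
import time

def timed_run(pipe, h, w):
    """Run one inference call and return the wall-clock time in seconds."""
    t0 = time.time()
    pipe("a photo of a cat", height=h, width=w)
    return time.time() - t0

# The first call on a new shape includes the ~7 s graph compilation;
# repeated calls on the same shape hit the compiled graph and stay fast.
# timed_run(pipe, 512, 512)  # slow the first time (compilation)
# timed_run(pipe, 512, 512)  # fast afterwards (cached graph)
# timed_run(pipe, 768, 768)  # slow again: new shape, new compilation
```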

We provide an offline compile mode to avoid the online compilation cost when all possible input shapes are known in advance. We can compile graphs for these shapes offline and then load the compiled results online. Loading a graph takes less than 1 second.

We also provide a shared-graph feature to save additional memory and compile time.

Here is the test for offline compile / shared graph: https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py

You need to update oneflow and oneflow diffusers to the most recent versions.

python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/[PLATFORM]
cd diffusers

git checkout oneflow-fork

git pull origin oneflow-fork

How to load the compiled result of graph

Compile and save graph

  • Turn on graph saving with pipe.enable_save_graph();
  • Call pipe to generate images, which triggers compilation and caches the compilation results;
  • Call pipe.save_graph(graph_save_path) to save the graph; the cached compilation results are written out at this point. Note that the graph_save_path folder must already exist.

The graphs cached by the pipe are now stored under graph_save_path.

Load the graph and use

  • Execute pipe.load_graph(graph_save_path, compile_unet=True, compile_vae=True); the previously saved cache will be restored.

This loads the previously saved graphs into the pipe's graph cache, so later inference calls hit the cache and avoid compilation.

Compile and share between graphs with different input shapes but the same parameters

Just turn on pipe.enable_graph_share_mem();

Once enabled, multiple graphs with different input shapes but the same parameters can share:

  • compile-pass optimization results;
  • constant-folded parameters.

This saves memory and compile time.

In addition, triggering graph compilation with input shapes sorted from largest to smallest improves memory sharing of the activation part and further reduces memory usage.

@diagonalge
Author

@strint Thanks a lot for your answer! I am trying your instructions, and it gives me the following error while trying to save the graph for stable diffusion:

AttributeError: 'VaeGraph' object has no attribute 'enable_save_runtime_state_dict'

@strint
Collaborator

strint commented Feb 5, 2023

AttributeError: 'VaeGraph' object has no attribute 'enable_save_runtime_state_dict'

It's because oneflow has not been updated to the latest version.

You can use this to get the oneflow version:

python3 -m oneflow --doctor

To install the latest oneflow, use the nightly build:

  • Nightly
    python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/[PLATFORM]
    
  • All available [PLATFORM] values:

    Platform | CUDA Driver Version | Supported GPUs
    cu117    | >= 450.80.02        | GTX 10xx, RTX 20xx, A100, RTX 30xx
    cu102    | >= 440.33           | GTX 10xx, RTX 20xx
    cpu      | N/A                 | N/A

Here is the full update list

Update oneflow:

  • Install the nightly build as shown above (the same command and [PLATFORM] table apply).

Update transformers

Delete the local folder containing the oneflow fork of transformers, and use the official transformers instead (quote the requirement so the shell does not treat >= as a redirection):

python3 -m pip install "transformers>=4.26"

Update diffusers

cd diffusers

git checkout oneflow-fork

git pull origin oneflow-fork

python3 -m pip install -e ".[oneflow]"

After updating oneflow/transformers/diffusers, you can run the test:

python3 diffusers/tests/test_pipelines_oneflow_graph_load.py

@diagonalge

@strint strint transferred this issue from Oneflow-Inc/oneflow Feb 10, 2023
@jackalcooper
Collaborator

Looks like this has been resolved; feel free to reopen if not.
