
Support of fp16 of LCM dreamshaper #9

Closed · Amin456789 opened this issue Nov 8, 2023 · 10 comments

Amin456789 commented Nov 8, 2023

Hi!

I just downloaded this fp16 model from here:
https://huggingface.co/aislamov/lcm-dreamshaper-v7-onnx/tree/main

It loads very fast, but when I press Generate it stops immediately: the model stays loaded, but it won't generate anything. Could you take a look at it, @saddam213 @dakenf? I am using the CPU, so I don't know whether this is a GPU-optimized model or not.

jdluzen (Contributor) commented Nov 9, 2023

Looks like it doesn't like the UNet's timestep input: the fp16 model's is a float, while the original's is a long.

jdluzen (Contributor) commented Nov 9, 2023

Yeah that was it: LatentConsistencyDiffuser.cs:198.
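
For illustration, a rough Python sketch of the mismatch against the raw onnxruntime API (the actual fix lives in the C# diffuser code; the input name `timestep` and the model path here are assumptions based on the standard Stable Diffusion UNet export):

```python
import numpy as np
import onnxruntime as ort

# Placeholder path to the UNet ONNX export.
session = ort.InferenceSession("unet/model.onnx", providers=["CPUExecutionProvider"])

# The fp32 UNet declares its "timestep" input as tensor(int64),
# while the fp16 LCM export declares it as a float tensor.
timestep_meta = next(i for i in session.get_inputs() if i.name == "timestep")
print(timestep_meta.type)  # e.g. "tensor(int64)" or "tensor(float)"

t = 999
timestep = (np.array([t], dtype=np.int64)
            if timestep_meta.type == "tensor(int64)"
            else np.array([t], dtype=np.float32))
```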

saddam213 (Owner) commented:

That should be easy enough to support; let me see if I can squeeze it into tomorrow's release.

jdluzen (Contributor) commented Nov 9, 2023

@saddam213 I've been trying to get a PR going, but I don't have access to the IOnnxModel in DiffuseAsync for _onnxModelService.GetInputMetadata. Is it available and I'm just not seeing it, or will I have to edit OnnxModelService?

saddam213 (Owner) commented Nov 9, 2023

Sorry, I missed your PR and had already committed a fix: 38f60b6

GetInputMetadata is accessible and worked perfectly; our implementations were pretty much the same.

Thanks for the PR

saddam213 (Owner) commented Nov 9, 2023

The latest commit fixes the immediate issue for both pipelines. I added the functionality to both diffuser base classes, but I think the implementation should be moved to a shared place, as I assume new pipelines will need this as well.

Perhaps we need a static helper class for methods like these, as DecodeLatents is the same across both too.

Amin456789 (Author):

Oh nice, thanks guys! Can't wait for the update to test it out.

dakenf commented Nov 9, 2023

Hello everyone! If you don't mind, I'll give you some tips on model conversion, based on this doc.

Long story short, if you run the fusion optimizer on the model, it will combine many ops into one, going from 3k+ ops down to 1k+. That leads to lower VRAM/RAM usage (fewer GPU buffers allocated for each node input/output) and better performance, since CUDA and DML have fused attention kernels.

I've been using this script, which already has optimized settings for DML, https://github.com/Amblyopius/Stable-Diffusion-ONNX-FP16/blob/main/conv_sd_to_onnx.py, but with some changes.
[screenshot: the modified optimization settings]
The last 4 lines (disabling BiasAdd, BiasSplitGelu, and packed KV and QKV) are required if you want the model to work on CPU: the Bias* kernels are not implemented for CPU in ONNX Runtime, and packed KV/QKV for MultiHeadAttention is not supported on CPU either.
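
For reference, a minimal sketch of what those settings look like when calling onnxruntime's transformers optimizer directly (the same machinery the conversion script wraps); the paths are placeholders and the exact flag names can vary between onnxruntime versions:

```python
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.fusion_options import FusionOptions

fusion_options = FusionOptions("unet")
# The four switches mentioned above: needed for CPU, since the Bias* kernels
# and packed KV/QKV MultiHeadAttention are only implemented for GPU.
fusion_options.enable_bias_add = False
fusion_options.enable_bias_splitgelu = False
fusion_options.enable_packed_kv = False
fusion_options.enable_packed_qkv = False

opt_model = optimizer.optimize_model(
    "unet/model.onnx",          # placeholder input path
    model_type="unet",
    opt_level=0,
    optimization_options=fusion_options,
    use_gpu=False,
)
opt_model.convert_float_to_float16(keep_io_types=True)
opt_model.save_model_to_file("unet_fp16/model.onnx")  # placeholder output path
```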

With these optimizations and fp16 you should be able to run the UNet with less than 5 GB of VRAM. You can check the results with this model I've converted for WebGPU: https://huggingface.co/aislamov/stable-diffusion-2-1-base-onnx/tree/main

But if you want maximum performance, you can create two revisions of the model on Hugging Face: one with max GPU optimizations and another for CPU.
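
(As a rough example of how someone would then pull one of those revisions with huggingface_hub; the repo id and revision name below are hypothetical:)

```python
from huggingface_hub import snapshot_download

# Hypothetical repo with separate "gpu" and "cpu" revisions (branches).
local_dir = snapshot_download(
    repo_id="someuser/lcm-dreamshaper-v7-onnx",  # placeholder repo id
    revision="cpu",                              # pick the CPU-optimized revision
)
print(local_dir)
```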

Feel free to ask if you have any questions!

Amin456789 (Author) commented Nov 9, 2023

> Hello everyone! If you don't mind, I'll give you some tips on model conversion, based on this doc […]

Hi! Thank you so much for sharing this. Sadly I have no idea how to code, so I can't do it myself. Could you please make some fp16 models for CPU too? Lyriel v16, Deliberate v2 or v3, and epiCRealism are a few good ones; any of them would be great. I would like to use and test them out in OnnxStack if possible, thanks.
https://huggingface.co/nyxia/lyriel16/tree/main
or
https://civitai.com/models/22922/lyriel
https://civitai.com/models/25694/epicrealism
https://huggingface.co/stablediffusionapi/deliberate-v3/tree/main

Also, I assume this LCM model is GPU-only? Could you please make a CPU-optimized version too? Either way, I will test this one on CPU tomorrow to see how it goes!

Amin456789 (Author) commented Nov 10, 2023

The LCM fp16 model now works very well and it is so fast! But I'm not entirely sure what is going on: I used DirectML and set the device to 0 for the UNet and the rest to 1, so I think it is using my AMD and Intel GPUs this time (in Task Manager my Intel graphics hits 99% usage, so it is mostly that GPU), not the CPU.

I'll close this topic if that's OK now.
