Hey, finally stable diffusion for ggml 😄
Did a test run:

```
$ ./sd -t 8 -m ../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin -p "alps, distant alms, small church, (cinematic:1.3), intricate details, (ArtStation:1.2), nikon dlsr, masterpiece, hyperreal"
[INFO] stable-diffusion.cpp:2189 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin'
[INFO] stable-diffusion.cpp:2214 - ftype: q8_0
[INFO] stable-diffusion.cpp:2259 - params ctx size = 1618.72 MB
[INFO] stable-diffusion.cpp:2399 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin' completed, taking 0.46s
[INFO] stable-diffusion.cpp:2477 - condition graph use 4.34MB of memory: static 1.41MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2477 - condition graph use 4.34MB of memory: static 1.41MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2822 - get_learned_condition completed, taking 0.16s
[INFO] stable-diffusion.cpp:2830 - start sampling
[INFO] stable-diffusion.cpp:2674 - step 1 sampling completed, taking 18.34s
[INFO] stable-diffusion.cpp:2674 - step 2 sampling completed, taking 18.24s
[INFO] stable-diffusion.cpp:2674 - step 3 sampling completed, taking 18.65s
[INFO] stable-diffusion.cpp:2674 - step 4 sampling completed, taking 18.41s
[INFO] stable-diffusion.cpp:2674 - step 5 sampling completed, taking 18.31s
[INFO] stable-diffusion.cpp:2674 - step 6 sampling completed, taking 18.18s
[INFO] stable-diffusion.cpp:2674 - step 7 sampling completed, taking 18.21s
[INFO] stable-diffusion.cpp:2674 - step 8 sampling completed, taking 18.29s
[INFO] stable-diffusion.cpp:2674 - step 9 sampling completed, taking 18.21s
[INFO] stable-diffusion.cpp:2674 - step 10 sampling completed, taking 18.28s
[INFO] stable-diffusion.cpp:2674 - step 11 sampling completed, taking 18.19s
[INFO] stable-diffusion.cpp:2674 - step 12 sampling completed, taking 18.00s
[INFO] stable-diffusion.cpp:2674 - step 13 sampling completed, taking 18.03s
[INFO] stable-diffusion.cpp:2674 - step 14 sampling completed, taking 18.54s
[INFO] stable-diffusion.cpp:2674 - step 15 sampling completed, taking 18.32s
[INFO] stable-diffusion.cpp:2674 - step 16 sampling completed, taking 18.41s
[INFO] stable-diffusion.cpp:2674 - step 17 sampling completed, taking 18.29s
[INFO] stable-diffusion.cpp:2674 - step 18 sampling completed, taking 18.51s
[INFO] stable-diffusion.cpp:2674 - step 19 sampling completed, taking 18.62s
[INFO] stable-diffusion.cpp:2674 - step 20 sampling completed, taking 18.11s
[INFO] stable-diffusion.cpp:2686 - diffusion graph use 623.74MB of memory: static 69.53MB, dynamic = 554.21MB
[INFO] stable-diffusion.cpp:2835 - sampling completed, taking 366.14s
[INFO] stable-diffusion.cpp:2766 - vae graph use 2177.12MB of memory: static 1153.12MB, dynamic = 1024.00MB
[INFO] stable-diffusion.cpp:2842 - decode_first_stage completed, taking 57.66s
[INFO] stable-diffusion.cpp:2843 - txt2img completed in 423.96s, with a runtime memory usage of 2177.12MB and parameter memory usage of 1618.58MB
save result image to 'output.png'
```
Pain point: the extra Python libs needed for conversion. I got a pip install error because I already had an incompatible version of something installed; convert.py worked anyway, though. :)
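For anyone else hitting that, a throwaway virtual environment should sidestep the version conflict; a minimal sketch, assuming the repo ships a requirements.txt:

```sh
# hypothetical workaround: isolate the conversion dependencies in a venv
# so pip cannot clash with packages already installed system-wide
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt  # assuming a requirements.txt is provided
```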
Timings: I ran different quantizations with different thread counts on a 12-core (24-thread) CPU, measuring the duration of a single sampling step (roughly as in the sweep sketch after the table):
| threads | q8_0 | q4_0 | f16 |
|---|---|---|---|
| -t 1 | 75.31s | 75.20s | 82.92s |
| -t 2 | 42.44s | | |
| -t 4 | 28.65s | 29.23s | 30.00s |
| -t 6 | 21.68s | | |
| -t 8 | 18.34s | 18.89s | 19.05s |
| -t 10 | 16.38s | 16.78s | 17.61s |
| -t 12 | 16.26s | 16.98s | 18.11s |
| -t 14 | 17.93s | | |
| -t 16 | 16.80s | | |
| -t 18 | 16.70s | | |
| -t 20 | 16.20s | | |
| -t 22 | 16.96s | | |
| -t 24 | 18.93s | | |
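For reference, a minimal sketch of how such a sweep can be scripted; the prompt is shortened here, and the grep pattern just matches the log lines shown above:

```sh
# hypothetical sweep over thread counts; reuses the -t/-m/-p flags from the
# run above and pulls the per-step sampling times out of the log
for t in 1 2 4 6 8 10 12 14 16 18 20 22 24; do
    echo "== -t $t =="
    ./sd -t "$t" -m ../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin \
        -p "alps, distant alms, small church" 2>&1 |
        grep "sampling completed"
done
```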
Additional questions:
- do you have/plan to support token weighting? (e.g. `(cinematic:1.3)`)
- are you looking into supporting the CUDA/OpenCL backends from ggml?
- are you looking into k-quants (like llama.cpp) and some form of quality measurement for quantizations? (since k-quants use different quant types for different parts of the model)
- it would be nice if the tool printed the "system line" (see https://github.com/ggerganov/llama.cpp/blob/f64d44a9b9581cd58f7ec40f4fa1c3ca5ca18e1e/llama.cpp#L4267)
- I didn't see it mentioned: does it support SD 2.x / do you plan to add support for that?
- my little benchmark suggests the bottleneck is not the model file but the dynamic data. What number type do you use for it? llama.cpp has shown little to no quality degradation when using f16 instead of f32 for the kv-cache. (rough estimate below)
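Back-of-the-envelope for that last point, assuming the dynamic buffers are currently f32: f16 stores 2 bytes per element instead of 4, so the dynamic figures from the log above would roughly halve.

```sh
# rough estimate only: halving the dynamic buffer sizes reported in the log
awk 'BEGIN { printf "diffusion dynamic: %.2f MB -> %.2f MB\n", 554.21, 554.21/2
             printf "vae dynamic: %.2f MB -> %.2f MB\n", 1024.00, 1024.00/2 }'
```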
edit: added f16 timings