Closed
Description
I'm getting errors when trying to run with the HipBLAS/ROCm backend; I'm on Arch Linux.
Executing the same command with the Vulkan backend works without any issues.
Logs:
markus@bernard ~/code/stable-diffusion.cpp/build_rocm (master) $ ./bin/sd \
--diffusion-model ../models/unet/flux/flux1-schnell-Q5_K_S.gguf \
--vae ../models/vae/FluxVAE.safetensors \
--clip_l ../models/clip/clip_l.safetensors \
--t5xxl ../models/clip/t5xxl_fp8_e4m3fn.safetensors \
-p "a lovely cat holding a sign says 'flux.cpp'" \
--cfg-scale 1.0 \
--sampling-method euler \
-v --steps 4 --width 1024 --height 1024 --seed -1 --vae-tiling
Option:
n_threads: 12
mode: img_gen
model_path:
wtype: unspecified
clip_l_path: ../models/clip/clip_l.safetensors
clip_g_path:
t5xxl_path: ../models/clip/t5xxl_fp8_e4m3fn.safetensors
diffusion_model_path: ../models/unet/flux/flux1-schnell-Q5_K_S.gguf
vae_path: ../models/vae/FluxVAE.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
mask_img:
control_image:
ref_images_paths:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
diffusion flash attention:false
diffusion Conv2d direct:false
vae Conv2d direct:false
strength(control): 0.90
prompt: a lovely cat holding a sign says 'flux.cpp'
negative_prompt:
min_cfg: 1.00
cfg_scale: 1.00
img_cfg_scale: 1.00
slg_scale: 0.00
guidance: 3.50
eta: 0.00
clip_skip: -1
width: 1024
height: 1024
sample_method: euler
schedule: default
sample_steps: 4
strength(img2img): 0.75
rng: cuda
seed: 10791925
batch_count: 1
vae_tiling: true
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:136 - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7800 XT, gfx1101 (0x1101), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:199 - loading diffusion model from '../models/unet/flux/flux1-schnell-Q5_K_S.gguf'
[INFO ] model.cpp:1010 - load ../models/unet/flux/flux1-schnell-Q5_K_S.gguf using gguf format
[DEBUG] model.cpp:1027 - init from '../models/unet/flux/flux1-schnell-Q5_K_S.gguf'
[INFO ] stable-diffusion.cpp:208 - loading clip_l from '../models/clip/clip_l.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:224 - loading t5xxl from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:231 - loading vae from '../models/vae/FluxVAE.safetensors'
[INFO ] model.cpp:1013 - load ../models/vae/FluxVAE.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/vae/FluxVAE.safetensors'
[INFO ] stable-diffusion.cpp:243 - Version: Flux
[INFO ] stable-diffusion.cpp:277 - Weight type: q5_K
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: q5_K
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:323 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:326 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] flux.hpp:1094 - Flux blocks: 19 double, 38 single
[INFO ] flux.hpp:1098 - Flux guidance is disabled (Schnell mode)
[DEBUG] ggml_extend.hpp:1241 - clip params backend buffer size = 307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1241 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1241 - flux params backend buffer size = 7880.37 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1241 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:475 - loading weights
[DEBUG] model.cpp:1891 - loading tensors from ../models/unet/flux/flux1-schnell-Q5_K_S.gguf
|==================================================| 1435/1435 - 1088.77it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/clip_l.safetensors
|==================================================| 1435/1435 - 29285.71it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/t5xxl_fp8_e4m3fn.safetensors
|==================================================| 1435/1435 - 82.81it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/vae/FluxVAE.safetensors
|==================================================| 1435/1435 - 30531.91it/s
[INFO ] stable-diffusion.cpp:574 - total params memory size = 17366.15MB (VRAM 7974.94MB, RAM 9391.21MB): clip 9391.21MB(RAM), unet 7880.37MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:578 - loading model from '' completed, taking 18.76s
[INFO ] stable-diffusion.cpp:604 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:664 - finished loaded file
[DEBUG] stable-diffusion.cpp:1903 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2033 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1573 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:754 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1578 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1060 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:398 - token length: 256
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1192 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1192 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1175 - computing condition graph completed, taking 4049 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 4050 ms
[INFO ] stable-diffusion.cpp:1735 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1784 - generating image: 1/1 - seed 10791925
[DEBUG] stable-diffusion.cpp:881 - Sample
[DEBUG] ggml_extend.hpp:1192 - flux compute buffer size: 2577.25 MB(VRAM)
ROCm error: invalid device function
current device: 0, in function ggml_cuda_op_mul_mat at /home/markus/code/stable-diffusion.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1706
hipGetLastError()
/home/markus/code/stable-diffusion.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:82: ROCm error
[New LWP 98019]
[New LWP 98018]
[New LWP 98017]
[New LWP 98016]
[New LWP 98015]
[New LWP 98014]
[New LWP 98013]
[New LWP 98012]
[New LWP 98011]
[New LWP 98010]
[New LWP 98009]
[New LWP 97961]
[New LWP 97929]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007fe13849f042 in ?? () from /usr/lib/libc.so.6
#0 0x00007fe13849f042 in ?? () from /usr/lib/libc.so.6
#1 0x00007fe1384931ac in ?? () from /usr/lib/libc.so.6
#2 0x00007fe1384931f4 in ?? () from /usr/lib/libc.so.6
#3 0x00007fe138503dcf in wait4 () from /usr/lib/libc.so.6
#4 0x000055788c44e1c7 in ggml_print_backtrace ()
#5 0x000055788bf3d419 in ggml_abort ()
#6 0x000055788c197f52 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#7 0x000055788c1a506e in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, ihipStream_t*)) ()
#8 0x000055788c19f9da in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#9 0x000055788c19da28 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#10 0x000055788c465567 in ggml_backend_graph_compute ()
#11 0x000055788c00d8ca in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#12 0x000055788c067810 in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) ()
#13 0x000055788c05f0bf in FluxModel::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, int, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, float, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) ()
#14 0x000055788c0ac443 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, SDCondition, ggml_tensor*, float, sd_guidance_params_t, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const ()
#15 0x000055788bfee7e6 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, SDCondition, ggml_tensor*, float, sd_guidance_params_t, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*) ()
#16 0x000055788bfc996b in generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, sd_guidance_params_t, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*, ggml_tensor*) ()
#17 0x000055788bfcd897 in generate_image ()
#18 0x000055788bf54aa0 in main ()
[Inferior 1 (process 97928) detached]
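A ROCm "invalid device function" error at kernel-launch time usually means the HIP kernels in the binary were not compiled for the architecture of the running GPU (here gfx1101). A hedged sketch of a rebuild with the device target set explicitly, assuming the `SD_HIPBLAS` and `AMDGPU_TARGETS` CMake options commonly used for stable-diffusion.cpp/ggml HIP builds (verify the exact option names against the project's README for your checkout):

```shell
# Confirm the architecture the ROCm runtime reports for the GPU
# (should print a line containing gfx1101 on an RX 7800 XT).
rocminfo | grep -m1 'gfx'

# Reconfigure and rebuild with the matching gfx target.
# SD_HIPBLAS and AMDGPU_TARGETS are assumptions based on common
# HIP build conventions in this project family, not taken from this log.
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
         -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release \
         -DAMDGPU_TARGETS=gfx1101
cmake --build . --config Release -j
```

If the build defaulted to a different gfx target (or to none), the resulting binary can still initialize ROCm and allocate buffers, and only fails when the first kernel is launched, which matches the crash location in ggml_cuda_op_mul_mat above.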