
ROCm broken in recent versions #823

@Stefan-Olt


I tried to compile a more recent version of stable-diffusion.cpp with the ROCm backend (gfx1030). It compiles just fine (it needs PIC enabled, otherwise it won't link for me), but I get this error at runtime. The latest ROCm (6.4.3) is installed:

ggml_cuda_compute_forward: GET_ROWS failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /[sourcedir]/stable-diffusion.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2522
  err
/[sourcedir]/stable-diffusion.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:87: ROCm error
[New LWP 110749]
[New LWP 110743]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fb7477107e3 in __GI___wait4 (pid=110751, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0  0x00007fb7477107e3 in __GI___wait4 (pid=110751, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000055e8452d8cd6 in ggml_print_backtrace ()
#2  0x000055e8452d8f29 in ggml_abort ()
#3  0x000055e844f48252 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#4  0x000055e844f4f229 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#5  0x000055e8452f08ac in ggml_backend_graph_compute ()
#6  0x000055e844daf44c in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#7  0x000055e844daf0e3 in CLIPTextModelRunner::compute(int, ggml_tensor*, int, void*, unsigned long, bool, ggml_tensor**, ggml_context*) ()
#8  0x000055e844dec8cb in FrozenCLIPEmbedderWithCustomWords::get_learned_condition_common(ggml_context*, int, std::vector<int, std::allocator<int> >&, std::vector<float, std::allocator<float> >&, int, int, int, int, bool) ()
#9  0x000055e844deb7b1 in FrozenCLIPEmbedderWithCustomWords::get_learned_condition(ggml_context*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, int, bool) ()
#10 0x000055e844d53d20 in generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, sd_guidance_params_t, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, bool, ggml_tensor*, ggml_tensor*) ()
#11 0x000055e844d58df0 in generate_image ()
#12 0x000055e844ce2ed2 in main ()
[Inferior 1 (process 110742) detached]
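
For reference, the kind of configure step described above can be sketched as follows. This is a minimal sketch, not the exact command that was run: `SD_HIPBLAS`, `AMDGPU_TARGETS` and `GPU_TARGETS` are assumed option names and may differ between versions of stable-diffusion.cpp and ggml, and the `build` directory name is arbitrary. `CMAKE_POSITION_INDEPENDENT_CODE` is the standard CMake variable for enabling PIC. Since `invalid device function` is typically reported when a binary contains no kernels built for the running GPU's architecture, the target variables are probably the part worth double-checking.

```shell
# Sketch only: the project-specific option names below are assumptions.
# SD_HIPBLAS                      - assumed switch for the ROCm/HIP backend
# AMDGPU_TARGETS / GPU_TARGETS    - assumed variables selecting the GPU architecture
# CMAKE_POSITION_INDEPENDENT_CODE - standard CMake variable that enables PIC

# Confirm which architecture the installed GPU reports (expected here: gfx1030).
rocminfo | grep -i gfx

# Configure and build for that architecture.
cmake -S . -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DSD_HIPBLAS=ON \
  -DAMDGPU_TARGETS=gfx1030 \
  -DGPU_TARGETS=gfx1030 \
  -DCMAKE_POSITION_INDEPENDENT_CODE=ON
cmake --build build --config Release
```

Both target variables are set because which one the build reads may have changed between the March build and the current one; that is a guess, not something confirmed here.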

My previous build from March this year works just fine. I also tried building for Vulkan, which works as well, but it is significantly slower (6.3s a round vs. 3.2s with ROCm) and needs more VRAM (it needs --vae-on-cpu or it crashes).

Is it possible to restore ROCm support?

Best regards
Stefan
