Eval bug: Granite 4 template detection fails #16415

@pwilkin

Description

Name and Version

ilintar@LinuksowaJaskinia:/mnt/win/k/models/unsloth/granite-4.0-h-small-GGUF$ llama-cli --version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3080)
register_device: registered device CUDA1 (NVIDIA GeForce RTX 5060 Ti)
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-icelake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-skylakex.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so score: 64
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sse42.so score: 5
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sandybridge.so score: 21
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-x64.so score: 1
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sapphirerapids.so score: 0
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz)
version: 6539 (7ec2df6)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

i7-9700K + RTX 3080 + RTX 5060 Ti

Models

granite-4.0-h-small-GGUF-Q8_0

Problem description & steps to reproduce

llama.cpp doesn't recognize the Granite 4 Hybrid model's chat template, which later leads to a crash when the model tries to use tools. The template detection string in llama-chat.cpp is too specific: it appears to match only one of the old templates for the Tiny model.
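
For context, template detection in llama-chat.cpp works by substring-matching the Jinja template embedded in the GGUF metadata against known markers. The sketch below only illustrates that pattern; the needle string is a placeholder, not the actual one in the repository. It shows how a needle copied from an older Granite Tiny template can miss the newer Granite 4 Hybrid template, so detection falls through to a generic fallback.

```cpp
// Illustrative sketch only (not the actual code from llama-chat.cpp).
// The needle below is a hypothetical placeholder for an over-specific
// detection string taken from an older Granite Tiny template.
#include <iostream>
#include <string>

enum class chat_template_id { UNKNOWN, GRANITE };

static bool tmpl_contains(const std::string & tmpl, const char * needle) {
    return tmpl.find(needle) != std::string::npos;
}

static chat_template_id detect_template(const std::string & tmpl) {
    // Hypothetical over-specific needle, present only in the old Tiny template.
    if (tmpl_contains(tmpl, "<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date:")) {
        return chat_template_id::GRANITE;
    }
    return chat_template_id::UNKNOWN;
}

int main() {
    // A newer template that still uses the Granite role markers but no longer
    // contains the exact phrase above will not be detected as Granite.
    const std::string new_tmpl =
        "{%- for message in messages %}<|start_of_role|>{{ message.role }}<|end_of_role|>...";
    std::cout << (detect_template(new_tmpl) == chat_template_id::GRANITE
                      ? "granite template detected\n"
                      : "fell through to fallback template\n");
}
```

When detection falls through like this, a mismatched tool-call parser ends up handling the output, which appears consistent with the backtrace below, where common_chat_parse_hermes_2_pro throws a JSON parse error on a partial tool call.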

First Bad Commit

No response

Relevant log output

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56	../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56	in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x00007b8da2e9eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
warning: 49	./nptl/cancellation.c: No such file or directory
#2  __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75	in ./nptl/cancellation.c
#3  0x00007b8da2f1ae9f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x00007b8da3546b71 in ggml_print_backtrace () at /devel/tools/llama.cpp/ggml/src/ggml.c:196
196	        waitpid(child_pid, NULL, 0);
#5  0x00007b8da355c393 in ggml_uncaught_exception () at /devel/tools/llama.cpp/ggml/src/ggml.cpp:9
9	    ggml_print_backtrace();
#6  0x00007b8da32c10aa in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007b8da32aaa9e in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007b8da32c1361 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005886af1bde71 in nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse_error<nlohmann::json_abi_v3_12_0::detail::parse_error> (this=0x7ffd12a16260, ex=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:8983
8983	            JSON_THROW(ex);
#10 0x00005886af1bd899 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::sax_parse_internal<nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (this=0x7ffd12a16460, sax=0x7ffd12a16260) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:13324
13324	            return sax->parse_error(m_lexer.get_position(),
#11 0x00005886af19a2c9 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse (this=0x7ffd12a16460, strict=true, result=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:12984
12984	            sax_parse_internal(&sdp);
#12 0x00005886af277ded in nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::parse<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (i="\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core Three.js mo"..., cb=..., allow_exceptions=true, ignore_comments=false) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:24063
24063	        parser(detail::input_adapter(std::forward<InputType>(i)), std::move(cb), allow_exceptions, ignore_comments).parse(true, result); // cppcheck-suppress[accessMoved,accessForwarded]
#13 0x00005886af3d7be6 in common_json_parse (it=10 '\n', end=0 '\000', healing_marker="1910210050", out=...) at /devel/tools/llama.cpp/common/json-partial.cpp:245
245	            out.json = json::parse(str);
#14 0x00005886af3d19ff in common_chat_msg_parser::try_consume_json (this=0x7ffd12a16e80) at /devel/tools/llama.cpp/common/chat-parser.cpp:237
237	    if (!common_json_parse(it, end, healing_marker_, result)) {
#15 0x00005886af3d2ce3 in common_chat_msg_parser::try_consume_json_with_dumped_args (this=0x7ffd12a16e80, args_paths=std::vector of length 1, capacity 1 = {...}, content_paths=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/common/chat-parser.cpp:272
272	    auto partial = try_consume_json();
#16 0x00005886af2d8674 in common_chat_parse_hermes_2_pro (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2109
2109	            if (auto tool_call = builder.try_consume_json_with_dumped_args({{"arguments"}})) {
#17 0x00005886af2e05fc in common_chat_parse (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2686
2686	            common_chat_parse_hermes_2_pro(builder);
#18 0x00005886af2e083b in common_chat_parse (input="<tool_call>\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core "..., is_partial=true, syntax=...) at /devel/tools/llama.cpp/common/chat.cpp:2715
2715	        common_chat_parse(builder);
#19 0x00005886af13567c in server_slot::update_chat_msg (this=0x5886d39e9220, diffs=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/tools/server/server.cpp:1620
1620	            params.oaicompat_chat_syntax);
#20 0x00005886af142443 in server_context::send_partial_response (this=0x7ffd12a1a230, slot=..., tkn=..., is_progress=false) at /devel/tools/llama.cpp/tools/server/server.cpp:2776
2776	            slot.update_chat_msg(res->oaicompat_msg_diffs);
#21 0x00005886af140ff6 in server_context::process_token (this=0x7ffd12a1a230, result=..., slot=...) at /devel/tools/llama.cpp/tools/server/server.cpp:2568
2568	                send_partial_response(slot, result, false);
#22 0x00005886af1499e1 in server_context::update_slots (this=0x7ffd12a1a230) at /devel/tools/llama.cpp/tools/server/server.cpp:3922
3922	                if (!process_token(result, slot)) {
#23 0x00005886af0e5b13 in operator() (__closure=0x7ffd12a1b9a0) at /devel/tools/llama.cpp/tools/server/server.cpp:5384
5384	        ctx_server.update_slots();
#24 0x00005886af0f492a in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/14/bits/invoke.h:61
61	    { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
#25 0x00005886af0f26f0 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/14/bits/invoke.h:111
111	        std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
#26 0x00005886af0ee373 in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
290	        return std::__invoke_r<_Res>(*_Base::_M_get_pointer(__functor),
#27 0x00005886af14fdc4 in std::function<void()>::operator() (this=0x7ffd12a1b9a0) at /usr/include/c++/14/bits/std_function.h:591
591	        return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#28 0x00005886af138f8d in server_queue::start_loop (this=0x7ffd12a1b880) at /devel/tools/llama.cpp/tools/server/server.cpp:1918
1918	            callback_update_slots();
#29 0x00005886af0e835e in main (argc=10, argv=0x7ffd12a1bc88) at /devel/tools/llama.cpp/tools/server/server.cpp:5411
5411	    ctx_server.queue_tasks.start_loop();
