Tokio runtime panicking due to `llama_cpp::LlamaSession::context_size` using `block_on` #88
Labels: bug
Hi maintainers, I am not entirely sure whether this is edgen's, llama_cpp-rs's, or my own problem, so I apologize in advance if I am missing something obvious here.
I cloned from `main` (936a45afedbb208a177038c7379341a52b911786) to build from source, then did a release `run` to `serve`. Axum started listening correctly, and everything looks good:

However, if I ping the `v1/chat/completions` endpoint as per the example, the following panic occurs:

The relevant stack backtraces are:
This occurs with or without `--features llama_cuda`.

This does not occur, however, if I check out `v0.1.2` instead, which produces token-by-token output:

And on it goes, generating the sentence "Hello! How can I assist you today?", which I assume is the expected behaviour.
Looking at the panic, it seems to stem from `llama_cpp::LlamaSession::context_size` having internally started calling `block_on` last week, which the existing `tokio::Runtime` did not appreciate.

Is this a bug, and if not, can anyone point me in the right direction? Many thanks in advance.
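For what it's worth, here is a minimal sketch of the failure mode I suspect, using plain tokio rather than any actual edgen or llama_cpp-rs code: calling `block_on` on a thread that is already driving a tokio runtime panics, while `tokio::task::block_in_place` is the usual escape hatch for sync code that must wait on a future:

```rust
use tokio::runtime::Handle;

#[tokio::main]
async fn main() {
    // #[tokio::main] puts us inside a multi-threaded runtime already.
    let handle = Handle::current();

    // Works: block_in_place temporarily takes this worker thread off
    // async duty, so blocking on a future inside it is allowed
    // (multi-threaded runtime flavour only).
    let ok = tokio::task::block_in_place(|| handle.block_on(async { 1 + 1 }));
    println!("block_in_place result: {ok}");

    // Panics: tokio refuses to block a thread that is currently
    // driving async tasks, aborting with a message along the lines of
    // "Cannot block the current thread from within a runtime".
    let _ = handle.block_on(async { 1 + 1 });
}
```

If `context_size` now does something like that bare `block_on` internally, any async caller, such as an axum request handler, would trigger exactly this panic.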
System

Debian GNU/Linux 11 (bullseye) x86_64
rustc 1.75.0-beta.3 (b66b7951b 2023-11-20) as per `rust-toolchain.toml` (the same happens on 1.76 stable anyway)