
Tokio runtime panicking due to llama_cpp::LlamaSession::context_size using block_on #88

Closed
denwong47 opened this issue Feb 25, 2024 · 2 comments · Fixed by #89
Labels: bug (Something isn't working)

Comments

@denwong47

Hi maintainers, I am not entirely sure whether this is edgen's, llama_cpp-rs's, or my own problem, so I apologize in advance if I am missing something obvious here.

I cloned main (936a45afedbb208a177038c7379341a52b911786) to build from source, then did a release run to serve. Axum started listening correctly and everything looked good:

RUST_BACKTRACE=full CUDA_ROOT=/usr/include/_remapped cargo run --features llama_cuda --release -- serve -g -b http://my_host:54321

However, if I ping the v1/chat/completions endpoint as per the example, the following panic occurs:

thread 'tokio-runtime-worker' panicked at /home/my_username/.cargo/git/checkouts/llama_cpp-rs-b9d51cabb4b43824/1141010/crates/llama_cpp/src/lib.rs:887:27:
Cannot block the current thread from within a runtime. This happens because a function attempted to block the current thread while the thread is being used to drive asynchronous tasks.

The relevant parts of the stack backtrace are:

  ...
  18:     0x5593475ee443 - core::option::expect_failed::h0d6627132effeebe
                               at /rustc/b66b7951b9b4258fc433f2919e72598fbcc1816e/library/core/src/option.rs:1985:5
  19:     0x559347ff727d - tokio::future::block_on::block_on::h57b11492fc0329f5
  20:     0x559347ff2d21 - llama_cpp::LlamaSession::context_size::h7a5ce26c608b13f9
  21:     0x559347a3d8b4 - llama_cpp::LlamaSession::start_completing_with::h9d25e453a5e19b0a
  22:     0x5593479b3576 - <core::pin::Pin<P> as core::future::future::Future>::poll::hf7d715e681ad5cb3
  23:     0x559347a0585f - <F as axum::handler::Handler<(M,T1),S>>::call::{{closure}}::h650dac4864c27520
  ...
  39:     0x559347a463af - tokio::runtime::task::harness::Harness<T,S>::poll::h3ae60ec675d41951
  40:     0x5593481ecd93 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h5f54c3255e052ad4
  41:     0x5593481e6c14 - tokio::runtime::context::scoped::Scoped<T>::set::h4261ff6b5d63c680
  42:     0x5593481af564 - tokio::runtime::context::runtime::enter_runtime::hb3189b7b6d405fc5
  43:     0x5593481ecaac - tokio::runtime::scheduler::multi_thread::worker::run::h11ac423a0e22051e
  ...

This occurs with or without --features llama_cuda.

This does not occur, however, if I check out v0.1.2 instead, which produces token-by-token output:

data: {"id":"856034ed-0e9e-4a56-94de-c9cb0be6cc90","choices":[{"delta":{"content":"Hello","role":null},"finish_reason":null,"index":0}],"created":1708805524,"model":"main","system_fingerprint":"edgen-0.1.2","object":"text_completion"}

data: {"id":"95a36b4b-4c7b-42fd-aa90-9fbfca770057","choices":[{"delta":{"content":"!","role":null},"finish_reason":null,"index":0}],"created":1708805524,"model":"main","system_fingerprint":"edgen-0.1.2","object":"text_completion"}

data: {"id":"24483190-b48d-48cc-ad0b-7459be014740","choices":[{"delta":{"content":" How","role":null},"finish_reason":null,"index":0}],"created":1708805524,"model":"main","system_fingerprint":"edgen-0.1.2","object":"text_completion"}

...

It goes on to generate the sentence "Hello! How can I assist you today?", which I assume is the expected behaviour.

Looking at the panic, it seems to come from the fact that llama_cpp::LlamaSession::context_size started calling block_on internally last week, which the existing tokio runtime did not appreciate.
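
For context, here is a minimal sketch of this class of panic (hypothetical code, not edgen's or llama_cpp's actual implementation): a synchronous accessor does a blocking wait on a tokio primitive while running on a thread that is already driving async tasks.

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Hypothetical stand-in for a synchronous accessor like `context_size`:
// it blocks on a tokio RwLock from whatever thread happens to call it.
fn context_size_like(lock: &RwLock<u32>) -> u32 {
    // `blocking_read` goes through tokio's internal `block_on`, which panics
    // if the current thread is already driving asynchronous tasks.
    *lock.blocking_read()
}

#[tokio::main]
async fn main() {
    let lock = Arc::new(RwLock::new(4096));
    // Called from inside the async context, this panics with
    // "Cannot block the current thread from within a runtime."
    let size = context_size_like(&lock);
    println!("context size: {size}");
}
```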

Is this a bug, and if not, can anyone point me in the right direction here? Many thanks in advance.

System

Debian GNU/Linux 11 (bullseye) x86_64
rustc 1.75.0-beta.3 (b66b7951b 2023-11-20) as per rust-toolchain.toml
(the same happens on 1.76 stable)

@francis2tm added the bug label on Feb 25, 2024
@pedro-devv
Contributor

It's actually the other way around: the problem is the lack of a block_on. main is using an old version of llama_cpp which does a blocking_read in an async environment. While I push changes to main, running cargo update should solve the issue.
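
For illustration only (this is not necessarily what the updated llama_cpp does internally), two common ways to avoid this class of panic are to stay on the async path and await the lock, or to explicitly move the blocking wait off the async worker:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Option 1: keep the accessor async and await the read guard.
async fn context_size_async(lock: &RwLock<u32>) -> u32 {
    *lock.read().await
}

// Option 2: keep a synchronous signature, but tell the runtime the thread is
// about to block so it can hand its async work to another worker
// (multi-threaded runtime only).
fn context_size_sync(lock: &RwLock<u32>) -> u32 {
    tokio::task::block_in_place(|| *lock.blocking_read())
}

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    let lock = Arc::new(RwLock::new(4096));
    println!("async: {}", context_size_async(&lock).await);
    println!("sync via block_in_place: {}", context_size_sync(&lock));
}
```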

@pedro-devv linked a pull request on Feb 26, 2024 that will close this issue
@denwong47
Author

Thank you, @pedro-devv!
