perf(llm): warm up connection before correction by hyspace · Pull Request #25 · missuo/koe

hyspace · 2026-03-29T08:19:52Z

This PR reduces cold LLM latency by warming the existing HTTP connection as soon as a recording session starts, so network setup can overlap with user speech instead of waiting until ASR has already finished.

In the current Azure OpenAI-compatible environment, direct cold-vs-warm measurements on the LLM host show an expected cold-path improvement of about 300ms. Across 7 paired runs, cold requests reached first response bytes in about 368-415ms, while warm reused connections did so in about 92-102ms, for an observed savings range of roughly 274-323ms.

This benefit mainly applies to:

the first correction after app launch
the first correction after the pooled connection has expired due to idleness

It is not expected to materially improve back-to-back dictation sessions when the LLM connection is already warm.

Implementation

Start a best-effort LLM warmup from sp_core_session_begin
Reuse the existing shared reqwest::Client
Use GET /models/{model} as a lightweight same-origin warmup request
URL-encode model IDs
Skip warmup when:
- LLM is disabled or not configured
- another warmup is already in flight
- the connection was used recently
Never block the real correction request on warmup
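The skip rules above can be sketched with `std` atomics. This is an illustration under assumptions, not the PR's code: `WarmupGate`, `WarmupGuard`, and `try_begin` are hypothetical names, and the "not configured" and "used recently" checks are passed in as booleans. The in-flight flag is claimed with a single `compare_exchange`, so two sessions starting at once cannot both launch a warmup. The guard clears the flag on drop, so a failed warmup cannot leave the gate stuck.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical gate that allows at most one warmup in flight.
pub struct WarmupGate {
    in_flight: AtomicBool,
}

/// Clears the in-flight flag when dropped, including during a panic unwind.
pub struct WarmupGuard<'a> {
    gate: &'a WarmupGate,
}

impl Drop for WarmupGuard<'_> {
    fn drop(&mut self) {
        self.gate.in_flight.store(false, Ordering::Release);
    }
}

impl WarmupGate {
    pub const fn new() -> Self {
        Self { in_flight: AtomicBool::new(false) }
    }

    /// Returns a guard if a warmup should start, or `None` if it should be skipped.
    pub fn try_begin(&self, llm_ready: bool, used_recently: bool) -> Option<WarmupGuard<'_>> {
        // Skip: LLM disabled/unconfigured, or the pooled connection is likely still warm.
        if !llm_ready || used_recently {
            return None;
        }
        // Skip: another warmup already holds the flag (atomic false -> true claim).
        self.in_flight
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| WarmupGuard { gate: self })
    }
}
```

In the flow described above, the guard would presumably be held by the spawned warmup task, so `sp_core_session_begin` returns immediately and the real correction request never waits on it.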

The reuse window is tied to the real HTTP client settings:

pool idle timeout: 90s
safety margin: 20s
warmup reuse window: 70s
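Deriving the window from the client's idle timeout, rather than hard-coding 70s, keeps the two values from drifting apart. A minimal sketch: only the 90s/20s/70s values come from this PR; the constant and function names are hypothetical. Times are compared with `Instant`, which is monotonic, so wall-clock adjustments cannot widen or shrink the window.

```rust
use std::time::{Duration, Instant};

// Values from the PR description; the names are hypothetical.
const POOL_IDLE_TIMEOUT_SECS: u64 = 90;
const SAFETY_MARGIN_SECS: u64 = 20;

/// The pool idle timeout, i.e. the value that would be passed to
/// `reqwest::ClientBuilder::pool_idle_timeout`.
pub const POOL_IDLE_TIMEOUT: Duration = Duration::from_secs(POOL_IDLE_TIMEOUT_SECS);

/// 90s - 20s = 70s: a connection touched within this window is assumed to still be pooled.
pub const WARMUP_REUSE_WINDOW: Duration =
    Duration::from_secs(POOL_IDLE_TIMEOUT_SECS - SAFETY_MARGIN_SECS);

/// True if the last successful request is recent enough that warmup can be skipped.
pub fn used_recently(last_touched: Option<Instant>, now: Instant) -> bool {
    match last_touched {
        Some(t) => now.saturating_duration_since(t) < WARMUP_REUSE_WINDOW,
        None => false, // never touched: the connection is cold, so warm it up
    }
}
```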

For the current Rust reqwest stack, a lightweight same-origin request is the most practical warmup mechanism; unlike browsers, there is no built-in higher-level preconnect primitive that safely warms the pooled connection we actually want to reuse.
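URL-encoding matters when building the warmup URL because model IDs can contain characters such as `/` or spaces that would otherwise change the request path. The PR uses the `urlencoding` crate; the sketch below swaps in a small std-only percent-encoder so it runs without dependencies. `encode_path_segment` and `warmup_url` are hypothetical helper names, and the sketch assumes `base_url` already includes any API prefix. The resulting URL would then be sent with the shared client (`client.get(url).send().await`) and the response body read to the end, which is what allows the connection to go back to the pool.

```rust
/// Percent-encodes every byte except RFC 3986 unreserved characters
/// (ALPHA / DIGIT / "-" / "." / "_" / "~"). Std-only stand-in for `urlencoding::encode`.
fn encode_path_segment(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Multi-byte UTF-8 sequences are encoded byte by byte, e.g. "é" -> "%C3%A9".
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Builds `{base_url}/models/{model}` with the model ID encoded as a single path segment.
fn warmup_url(base_url: &str, model: &str) -> String {
    format!(
        "{}/models/{}",
        base_url.trim_end_matches('/'),
        encode_path_segment(model)
    )
}
```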

Alternative explored but not adopted

We also tested HTTP/3 as a separate optimization idea.

Result:

OpenAI's public endpoint could use HTTP/3 on a compatible network path
the current Azure OpenAI-compatible endpoint did not demonstrate a usable HTTP/3 path in our testing
reqwest's HTTP/3 support is still unstable and does not provide the fallback behavior we want

Because of that, HTTP/3 is not part of this PR. This change only warms the existing stable HTTP/2/TLS path.

Validation

Passed:

cargo build -q -p koe-core
cargo test -q -p koe-core

Copilot

Pull request overview

This PR reduces first-request LLM latency by initiating a best-effort “warmup” HTTP request at the start of a recording session so TLS/HTTP connection setup can overlap with user speech, reusing the existing shared reqwest::Client.

Changes:

Added OpenAiCompatibleProvider::warmup() and centralized the HTTP pool idle timeout constant.
Introduced global warmup state and started warmup from sp_core_session_begin, gated by TTL and in-flight checks.
Added urlencoding dependency to safely URL-encode model IDs for the warmup endpoint.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `koe-core/src/llm/openai_compatible.rs` | Adds a warmup request method and reuses a shared pool idle timeout constant for the HTTP client. |
| `koe-core/src/lib.rs` | Adds warmup state tracking and triggers warmup at session start; updates last-touched on successful corrections. |
| `koe-core/Cargo.toml` | Adds `urlencoding` dependency used by the warmup URL construction. |

Copilot · 2026-03-29T08:24:18Z

+        let _ = response.bytes().await;
+        if !status.is_success() {
+            log::debug!("LLM warmup completed with HTTP {status}");
+        }
+
+        Ok(())


warmup() ignores the result of response.bytes().await. If reading or draining the body fails, the function still returns Ok(()), so the caller may mark the connection as “touched” even though the request did not fully complete (and the connection may not be reusable). Handle and propagate or log the body-read error, and consider treating it as a warmup failure.

Suggested change

-        let _ = response.bytes().await;
-        if !status.is_success() {
-            log::debug!("LLM warmup completed with HTTP {status}");
-        }
-
-        Ok(())
+        match response.bytes().await {
+            Ok(_) => {
+                if !status.is_success() {
+                    log::debug!("LLM warmup completed with HTTP {status}");
+                }
+                Ok(())
+            }
+            Err(e) => {
+                log::debug!(
+                    "LLM warmup request to {url} failed while reading response body: {e}"
+                );
+                Err(KoeError::LlmFailed(e.to_string()))
+            }
+        }
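The core of this suggestion is that "touched" should only be recorded once the body has been fully drained. A std-only sketch of that rule: `drain` stands in for the result of `response.bytes().await`, `WarmupError` stands in for `KoeError::LlmFailed`, and both names, as well as passing `last_touched` by `&mut`, are assumptions for illustration. As in the PR, a non-2xx status is only logged and still counts as a completed round trip.

```rust
use std::fmt::Display;
use std::time::Instant;

/// Hypothetical stand-in for `KoeError::LlmFailed`.
#[derive(Debug, PartialEq)]
pub enum WarmupError {
    BodyRead(String),
}

/// Finishes a warmup: records `last_touched` only if the response body was fully drained.
/// `drain` stands in for the result of `response.bytes().await`.
pub fn finish_warmup<B, E: Display>(
    status: u16,
    drain: Result<B, E>,
    last_touched: &mut Option<Instant>,
) -> Result<(), WarmupError> {
    match drain {
        Ok(_) => {
            if !(200..300).contains(&status) {
                // Mirrors the PR: log non-2xx, but the round trip itself completed.
                eprintln!("LLM warmup completed with HTTP {status}");
            }
            *last_touched = Some(Instant::now());
            Ok(())
        }
        // Body read failed: the connection may not be reusable, so do not mark it touched.
        Err(e) => Err(WarmupError::BodyRead(e.to_string())),
    }
}
```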

Copilot · 2026-03-29T08:24:18Z

+fn llm_is_ready(cfg: &config::LlmSection) -> bool {
+    cfg.enabled && !cfg.base_url.is_empty() && !cfg.api_key.is_empty() && !cfg.model.is_empty()
+}


llm_is_ready() introduces a stricter “configured” check (it includes model), but the correction path still uses a different condition (llm_enabled earlier in run_session only checks enabled/base_url/api_key). This divergence can make warmup behavior inconsistent with whether correction actually runs, and makes future config validation harder. Consider reusing a single readiness predicate for both warmup and correction (or rename/split the function to make the difference explicit).
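One way to follow this suggestion is to route both paths through the same predicate. A sketch with a hypothetical `LlmSection` that mirrors only the fields `llm_is_ready` reads in the snippet above; `should_warm_up` and `should_correct` are hypothetical call sites, not functions from koe-core.

```rust
/// Hypothetical mirror of the config fields referenced by `llm_is_ready`.
pub struct LlmSection {
    pub enabled: bool,
    pub base_url: String,
    pub api_key: String,
    pub model: String,
}

/// Single readiness predicate shared by warmup and correction.
pub fn llm_is_ready(cfg: &LlmSection) -> bool {
    cfg.enabled && !cfg.base_url.is_empty() && !cfg.api_key.is_empty() && !cfg.model.is_empty()
}

/// Hypothetical warmup call site.
pub fn should_warm_up(cfg: &LlmSection) -> bool {
    llm_is_ready(cfg)
}

/// Hypothetical correction call site: same predicate, so the two cannot disagree.
pub fn should_correct(cfg: &LlmSection) -> bool {
    llm_is_ready(cfg)
}
```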

perf(llm): warm up connection before correction (commit 31da9b8)

missuo requested a review from Copilot March 29, 2026 08:21

Copilot started reviewing on behalf of missuo March 29, 2026 08:21

missuo merged commit 183ae9c into missuo:main Mar 29, 2026
3 checks passed

Copilot AI reviewed Mar 29, 2026

hyspace mentioned this pull request Mar 29, 2026

fix(llm): tighten warmup readiness and drain handling #26

Merged

missuo merged 1 commit into missuo:main from hyspace:codex/perf-llm-warmup
