perf(llm): reuse HTTP client and enable HTTP/2 by hyspace · Pull Request #17 · missuo/koe

hyspace · 2026-03-27T08:36:47Z

Summary

This PR improves LLM correction latency by reusing a shared reqwest::Client across sessions and enabling HTTP/2 support in koe-core.

The main issue in the current implementation is that the OpenAI-compatible provider builds a new HTTP client for every voice session. That prevents connection pooling from being effective and adds unnecessary transport setup cost on a latency-sensitive path.

This change keeps the existing config hot-read behavior for request-level settings, but moves the HTTP client lifecycle to the core level:

- create one shared LLM HTTP client in sp_core_create()
- reuse it across sessions
- rebuild it on explicit config reload
- prefer HTTP/2 when the upstream supports it, while still allowing automatic fallback to HTTP/1.1
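The lifecycle above can be sketched as a small, std-only model. This is not Koe's actual code. `HttpClient` is a hypothetical stand-in for `reqwest::Client`, which likewise keeps its connection pool behind an `Arc`, so cloning is cheap and clones share connections. `Core`, `client_for_session`, and `reload` are illustrative names for the roles played by the global Core state, per-session setup, and `sp_core_reload_config()`.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

// Hypothetical stand-in for reqwest::Client: state lives behind an Arc,
// so clone() is cheap and every clone shares the same "pool".
#[derive(Clone)]
struct HttpClient {
    inner: Arc<ClientInner>,
}

struct ClientInner {
    timeout: Duration, // transport setting fixed at build time
}

impl HttpClient {
    fn build(timeout: Duration) -> Self {
        HttpClient { inner: Arc::new(ClientInner { timeout }) }
    }

    // True when both handles point at the same underlying client/pool.
    fn shares_pool_with(&self, other: &HttpClient) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}

// Hypothetical core state holding one client for the whole process.
struct Core {
    llm_client: Mutex<HttpClient>,
}

impl Core {
    // Plays the role of sp_core_create(): build the shared client once.
    fn create(timeout: Duration) -> Self {
        Core { llm_client: Mutex::new(HttpClient::build(timeout)) }
    }

    // Each session clones the shared client instead of building a new one.
    fn client_for_session(&self) -> HttpClient {
        self.llm_client.lock().unwrap().clone()
    }

    // Plays the role of sp_core_reload_config(): swap in a rebuilt client.
    fn reload(&self, timeout: Duration) {
        *self.llm_client.lock().unwrap() = HttpClient::build(timeout);
    }
}

fn main() {
    let core = Core::create(Duration::from_secs(10));
    let s1 = core.client_for_session();
    let s2 = core.client_for_session();
    assert!(s1.shares_pool_with(&s2)); // sessions share one pool

    core.reload(Duration::from_secs(20));
    let s3 = core.client_for_session();
    assert!(!s3.shares_pool_with(&s1)); // reload produced a fresh client
    assert_eq!(s3.inner.timeout, Duration::from_secs(20));
    assert_eq!(s1.inner.timeout, Duration::from_secs(10)); // earlier session untouched
    println!("ok");
}
```

In this model, a session that started before a reload keeps the client it cloned, and only sessions started afterwards pick up the rebuilt one.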

This is a koe-core change, but it was implemented and validated on top of the current windows-support branch.

What Changed

- enabled the http2 feature for reqwest
- moved HTTP client construction out of OpenAiCompatibleProvider::new()
- added a shared LLM HTTP client to the global Core state
- cloned the shared client into each session instead of rebuilding it
- rebuilt the client during sp_core_reload_config()
- documented that changing llm.timeout_ms requires restarting Koe to fully apply
- tuned the transport settings for the voice-input use case:
  - pool_max_idle_per_host(2)
  - tcp_keepalive(30s)
  - http2_keep_alive_interval(30s)
  - http2_keep_alive_timeout(30s)
  - http2_keep_alive_while_idle(true)
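For reference, the transport settings listed above can be gathered into one value. The `TransportSettings` struct, `voice_input()`, and `pings_idle_connections()` are hypothetical; each field is named after the `reqwest::ClientBuilder` method the PR's value would be passed to, and the comments paraphrase what those methods control.

```rust
use std::time::Duration;

// Hypothetical grouping of the PR's transport settings. Field names match
// the reqwest::ClientBuilder methods the values would be passed to.
#[derive(Debug, Clone, PartialEq)]
struct TransportSettings {
    // Upper bound on idle connections kept in the pool per host.
    pool_max_idle_per_host: usize,
    // TCP keepalive duration applied to each socket (SO_KEEPALIVE).
    tcp_keepalive: Option<Duration>,
    // Interval between HTTP/2 PING frames; None disables keep-alive pings.
    http2_keep_alive_interval: Option<Duration>,
    // Close the connection if a PING is not acknowledged within this time.
    http2_keep_alive_timeout: Duration,
    // Also send PINGs when no request/response stream is open.
    http2_keep_alive_while_idle: bool,
}

impl TransportSettings {
    // The values chosen in this PR for the voice-input use case.
    fn voice_input() -> Self {
        TransportSettings {
            pool_max_idle_per_host: 2,
            tcp_keepalive: Some(Duration::from_secs(30)),
            http2_keep_alive_interval: Some(Duration::from_secs(30)),
            http2_keep_alive_timeout: Duration::from_secs(30),
            http2_keep_alive_while_idle: true,
        }
    }

    // Idle HTTP/2 connections are pinged only when an interval is set AND
    // while-idle pings are enabled; otherwise pings stop between sessions.
    fn pings_idle_connections(&self) -> bool {
        self.http2_keep_alive_interval.is_some() && self.http2_keep_alive_while_idle
    }
}

fn main() {
    let s = TransportSettings::voice_input();
    assert!(s.pings_idle_connections());
    println!("{:?}", s);
}
```

With these values, an idle HTTP/2 connection is pinged roughly every 30 seconds between voice sessions, which presumably is what keeps the pooled connection warm for the next correction request.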

Why This Approach

reqwest::Client is designed to be reused and already contains an internal connection pool. Reusing the same client lets Koe keep transport state warm across voice sessions instead of paying the setup cost every time.

I intentionally did not add per-field config diffing logic for client rebuilds. The simpler policy here is:

- request-level settings like base_url, api_key, model, temperature, top_p, and token settings still apply on the next session
- transport-level client settings are refreshed on explicit reload / restart

That keeps the implementation small and predictable.
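The split between request-level and transport-level settings can be illustrated with a small model. `LlmConfig`, `SessionRequest`, and `SharedClient` are hypothetical names, and only a subset of the request-level fields is shown. The point is that a session snapshot re-reads the current config every time, while the client's timeout stays fixed until the client is rebuilt.

```rust
use std::time::Duration;

// Hypothetical config shape; field names follow settings named in the PR.
#[derive(Clone)]
struct LlmConfig {
    base_url: String,
    model: String,
    temperature: f32,
    timeout_ms: u64,
}

// Request-level settings: snapshotted from the *current* config at the start
// of every session, so edits take effect on the next session.
struct SessionRequest {
    base_url: String,
    model: String,
    temperature: f32,
}

impl SessionRequest {
    fn from_config(cfg: &LlmConfig) -> Self {
        SessionRequest {
            base_url: cfg.base_url.clone(),
            model: cfg.model.clone(),
            temperature: cfg.temperature,
        }
    }
}

// Transport-level settings: captured once, when the shared client is built.
struct SharedClient {
    timeout: Duration,
}

impl SharedClient {
    fn build(cfg: &LlmConfig) -> Self {
        SharedClient { timeout: Duration::from_millis(cfg.timeout_ms) }
    }
}

fn main() {
    let mut cfg = LlmConfig {
        base_url: "https://example.invalid/v1".into(),
        model: "model-a".into(),
        temperature: 0.2,
        timeout_ms: 8_000,
    };
    let client = SharedClient::build(&cfg);

    // The config is edited between sessions (hot-read).
    cfg.model = "model-b".into();
    cfg.timeout_ms = 15_000;

    let next = SessionRequest::from_config(&cfg);
    assert_eq!(next.model, "model-b"); // request-level change applied
    assert_eq!(client.timeout, Duration::from_millis(8_000)); // old timeout kept

    // Only an explicit reload (or restart) rebuilds the client.
    let client = SharedClient::build(&cfg);
    assert_eq!(client.timeout, Duration::from_millis(15_000));
    println!("{} {} {}", next.base_url, next.temperature, client.timeout.as_millis());
}
```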

Benchmark Notes

I ran a real-endpoint benchmark against the current Azure Foundry OpenAI-compatible endpoint used by this setup.

Most relevant comparison:

- current-like path: HTTP/1.1 + fresh client per request
- optimized path: HTTP/2 + reused client

30-request result:

- current-like path
  - average: 1.390s
  - P50: 1.319s
  - P90: 1.780s
  - P95: 1.789s
- optimized path
  - average: 1.113s
  - P50: 1.079s
  - P90: 1.470s
  - P95: 1.584s

Observed improvement:

- average latency reduced by 0.277s (from 1.390s to 1.113s)
- about 19.9% lower average latency in this benchmark
- tail latency also improved: P90 dropped by 0.310s and P95 by 0.205s
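The improvement figures can be checked directly from the reported averages and percentiles. The snippet below only does arithmetic on those numbers; it does not reproduce the benchmark itself.

```rust
// Reported 30-request summary statistics, in seconds.
struct Summary {
    avg: f64,
    p50: f64,
    p90: f64,
    p95: f64,
}

// Absolute reduction (seconds) and relative reduction (% of the "before" value).
fn reduction(before: f64, after: f64) -> (f64, f64) {
    let delta = before - after;
    (delta, delta / before * 100.0)
}

fn main() {
    let current = Summary { avg: 1.390, p50: 1.319, p90: 1.780, p95: 1.789 };
    let optimized = Summary { avg: 1.113, p50: 1.079, p90: 1.470, p95: 1.584 };

    let rows = [
        ("average", current.avg, optimized.avg),
        ("P50", current.p50, optimized.p50),
        ("P90", current.p90, optimized.p90),
        ("P95", current.p95, optimized.p95),
    ];
    for (name, before, after) in rows {
        let (delta, pct) = reduction(before, after);
        println!("{name}: -{delta:.3}s ({pct:.1}%)");
    }
    // Prints:
    // average: -0.277s (19.9%)
    // P50: -0.240s (18.2%)
    // P90: -0.310s (17.4%)
    // P95: -0.205s (11.5%)
}
```

The relative gain shrinks toward the tail (about 11.5% at P95), so the 19.9% figure describes the average rather than every percentile.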

I also verified that the endpoint negotiates HTTP/2 when the client has HTTP/2 enabled. This is not a forced-HTTP/2 change: if an upstream only supports HTTP/1.1, the client can still fall back automatically.

Validation

Verified locally with:

cargo test -p koe-core
cargo build --manifest-path koe-core/Cargo.toml --release --target x86_64-pc-windows-msvc
cmake -B KoeWin/build-x64 -S KoeWin -G "Visual Studio 18 2026" -A x64
cmake --build KoeWin/build-x64 --config Release

Also validated end-to-end on both Windows and macOS after rebasing onto the latest upstream main.

User-Facing Behavior

No workflow changes.

The only user-facing documentation change is that llm.timeout_ms now explicitly notes:

restart Koe after changing this value

because the shared HTTP client is long-lived and the timeout is applied when that client is built.

missuo · 2026-03-27T08:41:02Z

For now, I’d like to avoid merging Win into the main branch. macOS is still in its early stages of development, and since it uses completely different frameworks, aligning all features between macOS and Win is challenging. Therefore, I believe it’s better to focus on polishing the Win version after the macOS version is essentially complete.

hyspace · 2026-03-27T08:45:04Z

I was switching between Mac and Windows for testing and accidentally included the Windows changes. I've rebased and fixed it.

missuo · 2026-03-27T08:46:06Z

Thx!

perf(llm): reuse HTTP client and enable HTTP/2 (11157c5)

hyspace force-pushed the codex/llm-latency-optimization-http2 branch from d58f2f6 to 11157c5 on March 27, 2026 08:39

missuo merged commit 051dc6d into missuo:main Mar 27, 2026

hyspace deleted the codex/llm-latency-optimization-http2 branch March 29, 2026 04:21

missuo merged 1 commit into missuo:main from hyspace:codex/llm-latency-optimization-http2
