fix(ollama): restore catalog-driven num_ctx for native /api/chat #76181
openperf merged 6 commits into openclaw:main
Conversation
Codex review: needs changes before merge.

Review details:
- Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main drops catalog windows from native Ollama.
- Is this the best way to solve the issue? No, not complete as-is. The runtime …
- Best possible solution: Keep the runtime direction, update the Ollama docs and release note to match the new native precedence and local watchdog semantics, then land after maintainer approval and normal checks.
- Overall correctness: patch is incorrect.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a92e2b13b8b8.
3f27684 to 59eb92d (force-push)
martingarramon left a comment
Static review (no fresh checkout / `pnpm test`). The strict IPv4-literal regex stops the obvious `10.0.0.5evil` mis-parse; this is a clean defensive shape. A few notes:

RFC coverage: loopback (127/8), RFC 1918, and RFC 6598 (100.64/10 CGNAT) are correct. Two gaps are worth a comment or follow-up (a sketch of the covered ranges follows the list):

- IPv6 link-local (`fe80::/10`) and ULA (`fc00::/7`) are not covered, so a Tailscale-over-IPv6-only rig would still hit the watchdog.
- DNS-resolved local aliases (an `/etc/hosts` entry like `ollama-rig → 192.168.1.20`) skip the disable because resolution happens after `URL.hostname`. `.local` mDNS works because the suffix is matched textually. Probably fine — users can use the IP — but worth being explicit for the next person.
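To make the ranges concrete, here is a rough sketch of the kind of check being described (the helper name and exact structure are hypothetical, not the PR's actual code):

```ts
// Rough sketch of the covered ranges (hypothetical helper name, simplified
// octet handling; not the actual code in llm-idle-timeout.ts).
const IPV4_LITERAL = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

function isLocalHostname(hostname: string): boolean {
  if (hostname === "localhost" || hostname.endsWith(".local")) return true;
  const m = IPV4_LITERAL.exec(hostname);
  if (!m) return false; // "10.0.0.5evil" and "1.2.3.4.5" fail the anchored match
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127) return true;                        // 127.0.0.0/8 loopback
  if (a === 10) return true;                         // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return true;           // 192.168.0.0/16 (RFC 1918)
  if (a === 100 && b >= 64 && b <= 127) return true; // 100.64.0.0/10 (RFC 6598 CGNAT)
  return false;                                      // no fe80::/10 or fc00::/7 branch yet
}
```

An additional branch for bracketed IPv6 literals (`fe80::/10`, `fc00::/7`) would close the Tailscale-over-IPv6 gap noted above.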
Pre-e899b32e1d baseline: the PR body cites the 4.27 collapse-knobs commit as the regression source. Worth confirming whether the pre-refactor path also returned 0 for local providers (= this is a restoration) or whether local-skip never existed before (= behavior shift, not regression fix). The framing in the changelog/PR body matters.
Test coverage: the strict-IPv4-vs-attacker-suffix cases (10.0.0.5evil, 127.0.0.1foo, 1.2.3.4.5) are the load-bearing assertions; 100.127.255.254 correctly bounds CGNAT. Good defensive test design.
Type cast: `model: params.model as { baseUrl?: string }` at `src/agents/pi-embedded-runner/run/attempt.ts:2098` is a structural narrowing. If `params.model` already declares `baseUrl?: string`, the cast is redundant; otherwise, widen the consumer signature or pull a shared type.
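For reference, the two alternatives look roughly like this (sketch only; the real `resolveLlmIdleTimeoutMs` signature is not reproduced here):

```ts
// Sketch: a shared structural type instead of an inline cast at the call site.
type ModelWithBaseUrl = { baseUrl?: string };

// Option A: widen the consumer so callers can pass params.model directly.
declare function resolveLlmIdleTimeoutMs(model?: ModelWithBaseUrl): number | undefined;

// Option B: keep the narrowing, but go through the named type once:
//   resolveLlmIdleTimeoutMs(params.model as ModelWithBaseUrl)
```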
CI green.
59eb92d to a5c89c7 (force-push)
@martingarramon — thanks for the careful read.

Pre-e899b32 baseline framing: you were right. The pre-refactor …

RFC coverage and DNS aliases: IPv6 ULA (…

Thanks again.
08b1035 to d802d80 (force-push)
d802d80 to d314225 (force-push)
Merged via squash. Thanks @openperf!

Merged as c1b9af2.
Summary
Problem: Issue [Bug]: The local ollama llm was invalid after 4.26 version #76117 reports that after upgrading from v2026.4.26 to v2026.4.29, local Ollama runs against models such as `qwen3.6:35B:a3b` produce broken output: the model "doesn't know my commands", uses wrong tools, and "says nonsense" on agent turns. Critically, the reporter explicitly says: "So I fixed it [the timeout], and the local LLM still act like a moron" and "I don't care about the latency" — proving the symptom is not caused by the 120s LLM idle watchdog (`src/agents/pi-embedded-runner/run/llm-idle-timeout.ts`). Bumping the timeout does not help; the request never reaches a timeout — the model finishes streaming, but its output is garbage because it lost context.
7559845597 "fix(ollama): avoid implicit native num_ctx override"(2026-04-27, shipped in v2026.4.27 / 4.28 / 4.29) changedresolveOllamaModelOptionsinextensions/ollama/src/stream.ts:293fromto
The catalog-based fallback (
model.contextWindow ?? model.maxTokens) was lost. For users with a defaultopenclaw.json(no explicitmodels.providers.ollama.timeoutSecondsand noparams.num_ctx), every native/api/chatrequest now ships withoutoptions.num_ctx. Ollama silently falls back to the model's Modelfile default — typically 2048 tokens. A typical OpenClaw agent turn carries a system prompt (~3-5K tokens) plus tool definitions (~3-8K tokens) plus history; the request is silently truncated to the last ~2K, the model loses the tool catalog, and the stream completes with "wrong tools / nonsense" output. This explains every line of the reporter's symptom description and why bumping the timeout did nothing.The author's stated intent ("avoid implicit override") was reasonable in spirit — don't second-guess Ollama's Modelfile when the catalog has no opinion — but the implementation also dropped catalog-known windows (qwen3.6 → 32K/128K, llama3 → 128K, gemma3 → 128K) which OpenClaw's catalog already records. Those values are not an implicit override; they are explicit information Ollama has no way to recover.
Fix: Add a narrow `resolveOllamaNativeNumCtx(model)` helper that resolves in priority order: (1) explicit `params.num_ctx`; (2) catalog `contextWindow` / `maxTokens` if present; (3) `undefined` (let the Modelfile decide for unknown models). Wire it into `resolveOllamaModelOptions`. This restores 4.26 behavior for all known models without re-introducing the `DEFAULT_CONTEXT_TOKENS` fallback that the original commit deliberately removed for unknown models — which preserves the "avoid implicit override" intent for genuinely catalog-less models.
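A minimal sketch of that resolution order (the model shape here is illustrative; the actual types and wiring in `extensions/ollama/src/stream.ts` may differ):

```ts
// Sketch of the priority order described above, not the literal patch.
interface CatalogModel {
  contextWindow?: number;
  maxTokens?: number;
  params?: { num_ctx?: number };
}

function resolveOllamaNativeNumCtx(model: CatalogModel): number | undefined {
  // (1) An explicit params.num_ctx always wins.
  if (model.params?.num_ctx !== undefined) return model.params.num_ctx;
  // (2) Otherwise forward the catalog's known window, as v2026.4.26 did.
  const catalogWindow = model.contextWindow ?? model.maxTokens;
  if (catalogWindow !== undefined) return catalogWindow;
  // (3) Unknown model: leave num_ctx unset so the Modelfile decides.
  return undefined;
}
```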
In the same PR, this change is bundled with a related but distinct fix on the LLM idle watchdog (the original scope of this PR): the 120s default watchdog encodes a network-silence-as-hang assumption that does not hold for local providers (loopback / RFC 1918 / RFC 6598 CGNAT / `.local`). When no explicit timeout is configured, the watchdog is now skipped for local provider URLs. This is a real bug — local sockets do not stall — but it is adjacent to, not the cause of, the user-reported symptom in [Bug]: The local ollama llm was invalid after 4.26 version #76117. We keep the two changes together because the watchdog change had already passed bot review on this branch and removing it would be churn.
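Sketched out, the watchdog precedence reads like this (names and the malformed-URL handling are assumptions; `isLocal` stands in for the loopback / RFC 1918 / RFC 6598 / `.local` check):

```ts
const DEFAULT_LLM_IDLE_TIMEOUT_MS = 120_000; // the existing 120s default

// Sketch: explicit configuration is untouched; only the default fallback is
// skipped when the provider URL is local.
function resolveIdleTimeoutSketch(
  explicitTimeoutMs: number | undefined,
  baseUrl: string | undefined,
  isLocal: (hostname: string) => boolean,
): number | undefined {
  if (explicitTimeoutMs !== undefined) return explicitTimeoutMs; // explicit config always wins
  if (baseUrl !== undefined) {
    try {
      if (isLocal(new URL(baseUrl).hostname)) return undefined; // local socket: no watchdog
    } catch {
      // malformed baseUrl: assumed here to fall through to the default watchdog
    }
  }
  return DEFAULT_LLM_IDLE_TIMEOUT_MS; // cloud providers keep the 120s default
}
```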
What changed:

- `extensions/ollama/src/stream.ts` — added `resolveOllamaNativeNumCtx`; `resolveOllamaModelOptions` now uses it instead of `resolveOllamaConfiguredNumCtx`. Catalog windows survive the trip to Ollama.
- `extensions/ollama/src/stream-runtime.test.ts` — updated four assertions that previously locked in the broken `num_ctx === undefined` behavior on models that explicitly set `contextWindow`; added two new assertions: catalog window is forwarded as `num_ctx`, and an unknown catalog still leaves `num_ctx` absent (preserving the original commit's intent for that case; see the sketch after this list).
- `src/agents/pi-embedded-runner/run/llm-idle-timeout.ts` — provider-aware watchdog: skip the default fallback when `model.baseUrl` is loopback / RFC 1918 / RFC 6598 / `.local` and no explicit timeout is configured. A strict IPv4-literal regex guards against numeric-looking hostnames such as `http://10.0.0.5evil:11434`.
- `src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts` — coverage for local/non-local IPv4 boundaries (RFC 1918 ranges, RFC 6598 CGNAT, full 127/8 range), numeric-hostname injection cases, malformed `baseUrl`, explicit timeout interaction.
- `src/agents/pi-embedded-runner/run/attempt.ts` — passes `params.model` (typed as `{ baseUrl?: string }`, mirroring the existing `requestTimeoutMs` cast on the same call) into `resolveLlmIdleTimeoutMs`.
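A sketch of what the two new assertions might look like, assuming a vitest-style suite and that the helper is exported (import path and test structure are assumptions):

```ts
import { describe, expect, it } from "vitest";
// Assumed export and path; the real module layout may differ.
import { resolveOllamaNativeNumCtx } from "./stream";

describe("resolveOllamaNativeNumCtx", () => {
  it("forwards the catalog window as num_ctx", () => {
    expect(resolveOllamaNativeNumCtx({ contextWindow: 131072 })).toBe(131072);
  });

  it("leaves num_ctx unset when the catalog has no window", () => {
    expect(resolveOllamaNativeNumCtx({})).toBeUndefined();
  });
});
```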
What did NOT change (scope boundary):

- `extensions/ollama/src/stream.ts`: the `createOllamaStreamFn` request flow, `/api/chat` body shape, header construction, SSRF policy — unchanged. Only the value carried in `options.num_ctx` changes, and only for models that have catalog data.
- `resolveOllamaNumCtx` (used by the OpenAI-compatibility wrapper path) — unchanged. The compat path still has its `DEFAULT_CONTEXT_TOKENS` fallback because that path wraps a request that already shipped without `num_ctx`; it is the right place to backstop with a constant.
- `streamWithIdleTimeout` itself, abort plumbing, hook signatures — unchanged.
- `DEFAULT_LLM_IDLE_TIMEOUT_MS`, `DEFAULT_LLM_IDLE_TIMEOUT_SECONDS`, `DEFAULT_CONTEXT_TOKENS` — same values, same semantics, same defaults for cloud/wrapper paths.
- Docs (`docs/providers/ollama.md`) and `CHANGELOG.md` — unchanged. (CHANGELOG is intentionally left for the maintainer to slot under the right release on merge.)
- `runTimeoutMs`, explicit `agents.defaults.timeoutSeconds`, and explicit `models.providers.<id>.timeoutSeconds` paths — bit-for-bit identical.
- Cloud providers (non-local `baseUrl`) — bit-for-bit identical.
Reproduction

1. Run a build at or after commit `7559845597` (2026-04-27) with a default `openclaw.json` — no `models.providers.ollama.timeoutSeconds`, no `models.providers.ollama.params.num_ctx`.
2. With Ollama serving at `http://127.0.0.1:11434`, run:

        openclaw infer model run \
          --model ollama/qwen3.6:35B \
          --prompt "Plan three sequential bash commands and call them via tools."

3. Observe the broken output. Raising `agents.defaults.timeoutSeconds` does not help — confirmed by the issue reporter: "I fixed it, and the local LLM still act like a moron."
4. With this patch, `num_ctx = 131072` (or whatever the catalog records for that model) flows through to Ollama. Tool selection and answers behave as they did on v2026.4.26.
5. Control: a cloud provider at `https://api.openai.com/v1` still receives the existing 120s idle watchdog; an Ollama model with no catalog `contextWindow` still ships without `num_ctx` so the Modelfile decides — preserving the `7559845597` author's "avoid implicit override" intent for that case.
Risk / Mitigation

- Forwarding the catalog `contextWindow` as `num_ctx` will cause Ollama to allocate the corresponding KV cache. On memory-constrained hosts, a model whose catalog says 131072 will use noticeably more RAM/VRAM than the Modelfile's 2048 default. This matches v2026.4.26 behavior exactly, but represents a change vs. current main for users who upgraded between 4.27 and now and silently adapted to the truncated context.
- Models with no catalog `contextWindow` get the new "no num_ctx" behavior — the same as today, no regression.
- Users who prefer the `7559845597` "trust Modelfile" behavior can opt out by removing `contextWindow` from their custom catalog entries, or by setting `models.providers.<provider>.params.num_ctx` to the Modelfile value explicitly.
- The new `resolveOllamaNativeNumCtx` helper is internal; the public `resolveOllamaNumCtx` (used by the compat wrapper) is unchanged, so the wrapper-side fallback semantics are untouched.
- The `extensions/ollama/src/stream-runtime.test.ts` tests for native `/api/chat` are updated; two new tests assert the catalog-fallback and the no-catalog-no-num_ctx contracts. Coverage on the watchdog side asserts both local and non-local ranges, numeric-hostname injection, and explicit-timeout interaction.
- No `any`. The new helper signature is `(model: ProviderRuntimeModel) => number | undefined`. The watchdog parameter is structurally typed as `{ baseUrl?: string }`.
Why this is the root cause and not a symptom patch

The reporter's exact words ("I fixed it [the timeout], and the local LLM still act like a moron") are the load-bearing evidence. A fix that targets the 120s watchdog leaves the reporter's symptom intact, because the request was never being aborted; it was being truncated before it even left OpenClaw. The smallest change that restores 4.26-equivalent behavior for the reporter's repro is to put `num_ctx` back into the request body. The PR does exactly that, and bundles the orthogonal watchdog cleanup that was already accepted on this branch.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Fixes #76117