
fix(ollama): restore catalog-driven num_ctx for native /api/chat #76181

Merged
openperf merged 6 commits into openclaw:main from openperf:fix/ollama-local-idle-timeout
May 3, 2026

Conversation

@openperf (Member) commented on May 2, 2026

Summary

  • Problem: Issue [Bug]: The local ollama llm was invalid after 4.26 version #76117 reports that after upgrading from v2026.4.26 to v2026.4.29, local Ollama runs against models such as qwen3.6:35B:a3b produce broken output: the model "doesn't know my commands", uses wrong tools, and "says nonsense" on agent turns. Critically, the reporter explicitly says: "So I fixed it [the timeout], and the local LLM still act like a moron" and "I don't care about the latency" — proving the symptom is not caused by the 120s LLM idle watchdog (src/agents/pi-embedded-runner/run/llm-idle-timeout.ts). Bumping the timeout does not help; the request never reaches a timeout — the model finishes streaming, but its output is garbage because it lost context.

  • Root Cause: Commit 7559845597 "fix(ollama): avoid implicit native num_ctx override" (2026-04-27, shipped in v2026.4.27 / 4.28 / 4.29) changed resolveOllamaModelOptions in extensions/ollama/src/stream.ts:293 from

    options.num_ctx = resolveOllamaNumCtx(model);  // user-config OR catalog fallback

    to

    const numCtx = resolveOllamaConfiguredNumCtx(model);
    if (numCtx !== undefined) {
      options.num_ctx = numCtx;
    }

    The catalog-based fallback (model.contextWindow ?? model.maxTokens) was lost. For users with a default openclaw.json (no explicit models.providers.ollama.timeoutSeconds and no params.num_ctx), every native /api/chat request now ships without options.num_ctx. Ollama silently falls back to the model's Modelfile default — typically 2048 tokens. A typical OpenClaw agent turn carries a system prompt (~3-5K tokens) plus tool definitions (~3-8K tokens) plus history; the request is silently truncated to the last ~2K, the model loses the tool catalog, and the stream completes with "wrong tools / nonsense" output. This explains every line of the reporter's symptom description and why bumping the timeout did nothing.

    The author's stated intent ("avoid implicit override") was reasonable in spirit — don't second-guess Ollama's Modelfile when the catalog has no opinion — but the implementation also dropped catalog-known windows (qwen3.6 → 32K/128K, llama3 → 128K, gemma3 → 128K) which OpenClaw's catalog already records. Those values are not an implicit override; they are explicit information Ollama has no way to recover.

  • Fix: Add a narrow resolveOllamaNativeNumCtx(model) helper that resolves in priority order: (1) explicit params.num_ctx; (2) catalog contextWindow / maxTokens if present; (3) undefined (let the Modelfile decide for unknown models). Wire it into resolveOllamaModelOptions; a sketch of the helper follows this list. This restores 4.26 behavior for all known models without re-introducing the DEFAULT_CONTEXT_TOKENS fallback that the original commit deliberately removed for unknown models, which preserves the "avoid implicit override" intent for genuinely catalog-less models.

    In the same PR, this change is bundled with a related but distinct fix on the LLM idle watchdog (the original scope of this PR): the 120s default watchdog encodes a network-silence-as-hang assumption that does not hold for local providers (loopback / RFC 1918 / RFC 6598 CGNAT / .local). When no explicit timeout is configured, the watchdog is now skipped for local provider URLs. This is a real bug — local sockets do not stall — but it is adjacent to, not the cause of, the user-reported symptom in [Bug]: The local ollama llm was invalid after 4.26 version #76117. We keep the two changes together because the watchdog change had already passed bot review on this branch and removing it would be churn.

  • What changed:

    • extensions/ollama/src/stream.ts — added resolveOllamaNativeNumCtx; resolveOllamaModelOptions now uses it instead of resolveOllamaConfiguredNumCtx. Catalog windows survive the trip to Ollama.
    • extensions/ollama/src/stream-runtime.test.ts — updated four assertions that previously locked in the broken num_ctx === undefined behavior on models that explicitly set contextWindow; added two new assertions: catalog window is forwarded as num_ctx, and an unknown catalog still leaves num_ctx absent (preserving the original commit's intent for that case).
    • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts — provider-aware watchdog: skip the default fallback when model.baseUrl is loopback / RFC 1918 / RFC 6598 / .local and no explicit timeout is configured. Strict IPv4-literal regex guards against numeric-looking hostnames such as http://10.0.0.5evil:11434.
    • src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts — coverage for local/non-local IPv4 boundaries (RFC 1918 ranges, RFC 6598 CGNAT, 127/8 full range), numeric-hostname injection cases, malformed baseUrl, explicit timeout interaction.
    • src/agents/pi-embedded-runner/run/attempt.ts — passes params.model (typed as { baseUrl?: string }, mirroring the existing requestTimeoutMs cast on the same call) into resolveLlmIdleTimeoutMs.
  • What did NOT change (scope boundary):

    • extensions/ollama/src/stream.ts:createOllamaStreamFn request flow, /api/chat body shape, header construction, SSRF policy — unchanged. Only the value carried in options.num_ctx changes, and only for models that have catalog data.
    • resolveOllamaNumCtx (used by the OpenAI-compatibility wrapper path) — unchanged. The compat path still has its DEFAULT_CONTEXT_TOKENS fallback because that path wraps a request that already shipped without num_ctx; it is the right place to backstop with a constant.
    • streamWithIdleTimeout itself, abort plumbing, hook signatures — unchanged.
    • DEFAULT_LLM_IDLE_TIMEOUT_MS, DEFAULT_LLM_IDLE_TIMEOUT_SECONDS, DEFAULT_CONTEXT_TOKENS — same values, same semantics, same defaults for cloud/wrapper paths.
    • Provider plugins, models.json schema, config schema, docs (docs/providers/ollama.md), CHANGELOG.md — unchanged. (CHANGELOG is intentionally left for the maintainer to slot under the right release on merge.)
    • Cron trigger, explicit runTimeoutMs, explicit agents.defaults.timeoutSeconds, explicit models.providers.<id>.timeoutSeconds paths — bit-for-bit identical.
    • Remote provider behavior (any non-local baseUrl) — bit-for-bit identical.
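
A minimal sketch of the new helper, for reviewers reading along (the ProviderRuntimeModel shape below is assumed; only the name, signature, and priority order come from this PR):

    interface ProviderRuntimeModel {
      params?: { num_ctx?: number };
      contextWindow?: number;
      maxTokens?: number;
    }

    function resolveOllamaNativeNumCtx(model: ProviderRuntimeModel): number | undefined {
      // (1) Explicit user configuration always wins.
      if (model.params?.num_ctx !== undefined) return model.params.num_ctx;
      // (2) Fall back to the catalog's known window, if any.
      const catalogWindow = model.contextWindow ?? model.maxTokens;
      if (catalogWindow !== undefined) return catalogWindow;
      // (3) Unknown model: leave num_ctx unset so the Modelfile decides.
      return undefined;
    }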

Reproduction

  1. Install OpenClaw v2026.4.29 (or any version since 7559845597, 2026-04-27) with a default openclaw.json — no models.providers.ollama.timeoutSeconds, no models.providers.ollama.params.num_ctx.
  2. Run an Ollama daemon on http://127.0.0.1:11434 and pull the model:
    ollama pull qwen3.6:35B
  3. Issue any agent-style request that requires tool selection:
    openclaw infer model run \
      --model ollama/qwen3.6:35B \
      --prompt "Plan three sequential bash commands and call them via tools."
  4. Before this PR: The model loses the tool catalog (truncated to ~2048 tokens), invents tool names that do not exist, picks the wrong tool, or returns plain prose where structured tool calls were required. Bumping agents.defaults.timeoutSeconds does not help — confirmed by the issue reporter: "I fixed it, and the local LLM still act like a moron."
  5. After this PR: The full system prompt and tool definitions reach the model because num_ctx = 131072 (or whatever the catalog records for that model) flows through to Ollama; a sketch of the resulting request body follows this list. Tool selection and answers behave as they did on v2026.4.26.
  6. Regression check (must keep working): a remote provider such as https://api.openai.com/v1 still receives the existing 120s idle watchdog; an Ollama model with no catalog contextWindow still ships without num_ctx so the Modelfile decides — preserving the 7559845597 author's "avoid implicit override" intent for that case.
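
For concreteness, after this PR the native request carries the window roughly as below (illustrative body; field names per Ollama's /api/chat API). Before this PR, the options.num_ctx field was simply absent:

    curl http://127.0.0.1:11434/api/chat -d '{
      "model": "qwen3.6:35B",
      "messages": [{ "role": "user", "content": "..." }],
      "options": { "num_ctx": 131072 }
    }'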

Risk / Mitigation

  • Risk: Forwarding catalog contextWindow as num_ctx will cause Ollama to allocate the corresponding KV cache. On memory-constrained hosts, a model whose catalog says 131072 will use noticeably more RAM/VRAM than the Modelfile's 2048 default. This matches v2026.4.26 behavior exactly, but represents a change vs. current main for users who upgraded between 4.27 and now and silently adapted to the truncated context.
  • Risk: A user who manually created a custom Ollama provider entry without populating contextWindow will get the new "no num_ctx" behavior — the same as today, no regression.
  • Risk: The watchdog change disables a guard for local providers that was incorrectly applied to begin with. A genuinely hung local Ollama daemon will no longer self-abort at 120s; agent / run / explicit provider timeouts still bound the request.
  • Mitigation:
    • Users who want the post-7559845597 "trust Modelfile" behavior can opt out by removing contextWindow from their custom catalog entries, or by setting models.providers.<provider>.params.num_ctx to the Modelfile value explicitly (see the config example after this list).
    • The new resolveOllamaNativeNumCtx helper is internal; the public resolveOllamaNumCtx (used by the compat wrapper) is unchanged so the wrapper-side fallback semantics are untouched.
    • Coverage: existing extensions/ollama/src/stream-runtime.test.ts tests for native /api/chat are updated; two new tests assert the catalog-fallback and the no-catalog-no-num_ctx contracts. Coverage on the watchdog side asserts both local and non-local ranges, numeric-hostname injection, and explicit-timeout interaction.
    • No use of any. The new helper signature is (model: ProviderRuntimeModel) => number | undefined. The watchdog parameter is structurally typed as { baseUrl?: string }.
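
The opt-out mentioned above would look roughly like this in openclaw.json (key path from this PR; the surrounding schema is assumed):

    {
      "models": {
        "providers": {
          "ollama": {
            "params": { "num_ctx": 2048 }
          }
        }
      }
    }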

Why this is the root cause and not a symptom patch

The reporter's exact words ("I fixed it [the timeout], and the local LLM still act like a moron") are the load-bearing evidence. A fix that targets the 120s watchdog leaves the reporter's symptom intact, because the request was never being aborted; it was being truncated before it even left OpenClaw. The smallest change that restores 4.26-equivalent behavior for the reporter's repro is to put num_ctx back into the request body. The PR does exactly that, and bundles the orthogonal watchdog cleanup that was already accepted on this branch.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Ollama provider extension
  • Agents / pi-embedded-runner

Linked Issue/PR

Fixes #76117

@openclaw-barnacle (Bot) added labels agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) on May 2, 2026
@clawsweeper (Bot, Contributor) commented on May 2, 2026

Codex review: needs changes before merge.

Summary
The PR restores catalog-derived num_ctx for native Ollama /api/chat, skips the default idle watchdog for local provider URLs, threads model baseUrl into timeout resolution, and updates related tests plus a changelog entry.

Reproducibility: yes, at source level. Current main drops catalog windows from native Ollama options.num_ctx, and the linked regression report after v2026.4.29 is consistent with context truncation rather than the idle watchdog alone; no live Ollama run was performed in this read-only review.

Next step before merge
A narrow automated repair can update docs and changelog without changing the runtime patch; final merge still needs maintainer approval.

Security
Cleared: No concrete security or supply-chain regression found; the diff changes provider request options, URL classification for timeout policy, tests, and changelog without touching dependencies, workflows, secrets, or artifact execution.

Review findings

  • [P2] Update Ollama docs for native num_ctx fallback — extensions/ollama/src/stream.ts:331
  • [P3] Mention local watchdog behavior in the changelog — CHANGELOG.md:143
Review details

Best possible solution:

Keep the runtime direction, update the Ollama docs and release note to match the new native precedence and local watchdog semantics, then land after maintainer approval and normal checks.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main drops catalog windows from native Ollama options.num_ctx, and the linked regression report after v2026.4.29 is consistent with context truncation rather than the idle watchdog alone; no live Ollama run was performed in this read-only review.

Is this the best way to solve the issue?

No, not complete as-is. The runtime num_ctx fix is narrow and well targeted, but the docs and release note need to be aligned before this is the best maintainable fix.

Full review comments:

  • [P2] Update Ollama docs for native num_ctx fallback — extensions/ollama/src/stream.ts:331
    This makes native /api/chat send catalog contextWindow/maxTokens as options.num_ctx when params.num_ctx is absent, but docs/providers/ollama.md still says native requests leave num_ctx unset unless it is explicit and repeats that guidance in large-context troubleshooting. After merge, users would get the wrong memory and opt-out guidance, so the docs need the new explicit params -> catalog window -> unset order.
    Confidence: 0.93
  • [P3] Mention local watchdog behavior in the changelog — CHANGELOG.md:143
    The added release note covers the Ollama context regression, but this PR also disables the default 120s idle watchdog for loopback/private/.local provider URLs when no explicit timeout is set. That is user-visible for local provider runs, so the changelog should mention it if the watchdog change stays bundled.
    Confidence: 0.82

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm exec oxfmt --check --threads=1 docs/providers/ollama.md CHANGELOG.md
  • pnpm test extensions/ollama/src/stream-runtime.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
  • pnpm check:changed


Likely related people:

  • steipete: Authored the explicit-only native Ollama num_ctx change and matching docs/tests, plus recent Ollama stream and timeout-related commits. (role: introduced current behavior and recent maintainer; confidence: high; commits: 755984559723, 18b76e399579, e899b32e1d79; files: extensions/ollama/src/stream.ts, extensions/ollama/src/stream-runtime.test.ts, docs/providers/ollama.md)
  • liuy: Added the LLM idle timeout helper, tests, and runner wrapping that this PR now changes for local provider URLs. (role: introduced idle watchdog; confidence: medium; commits: 84b72e66b918; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • obviyus: Recently unified the idle-timeout behavior with the runner abort path, which is directly adjacent to the watchdog portion of this PR. (role: adjacent maintainer; confidence: medium; commits: 179f713c88c6; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • ImLukeF: Recently adjusted explicit run-timeout handling for the same LLM idle watchdog helper. (role: adjacent maintainer; confidence: medium; commits: 7f2814fc4a76; files: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts)

Remaining risk / open question:

  • One head check run was still in progress during this read-only review.
  • The bundled watchdog behavior is user-visible and currently under-documented compared with the runtime diff.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a92e2b13b8b8.

@openperf force-pushed the fix/ollama-local-idle-timeout branch from 3f27684 to 59eb92d on May 2, 2026, 18:15
@martingarramon (Contributor) left a comment

Static review (no fresh checkout / pnpm test). The strict IPv4 literal regex stops the obvious 10.0.0.5evil mis-parse; this is a clean defensive shape. A few notes:

RFC coverage: loopback (127/8), RFC 1918, and RFC 6598 (100.64/10 CGNAT) are correct. Two gaps are worth a comment or follow-up:

  • IPv6 link-local (fe80::/10) and ULA (fc00::/7) are not covered, so a Tailscale-over-IPv6-only rig would still hit the watchdog.
  • DNS-resolved local aliases (an /etc/hosts entry mapping ollama-rig to 192.168.1.20) skip the disable, because classification runs on URL.hostname before any DNS resolution. .local mDNS works because the suffix is matched textually. Probably fine (users can use the IP), but worth being explicit for the next person.

Pre-e899b32e1d baseline: the PR body cites the 4.27 collapse-knobs commit as the regression source. Worth confirming whether the pre-refactor path also returned 0 for local providers (= this is a restoration) or whether local-skip never existed before (= behavior shift, not regression fix). The framing in the changelog/PR body matters.

Test coverage: the strict-IPv4-vs-attacker-suffix cases (10.0.0.5evil, 127.0.0.1foo, 1.2.3.4.5) are the load-bearing assertions; 100.127.255.254 correctly bounds CGNAT. Good defensive test design.
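
For the record, the shape of those assertions as I read them (helper name and test runner are my assumption; the actual file may structure this differently):

    import { describe, expect, it } from "vitest";
    import { isLocalProviderUrl } from "./llm-idle-timeout"; // assumed export

    describe("strict IPv4 literals", () => {
      it("rejects attacker-suffixed numeric hostnames", () => {
        expect(isLocalProviderUrl("http://10.0.0.5evil:11434")).toBe(false);
        expect(isLocalProviderUrl("http://127.0.0.1foo:11434")).toBe(false);
        expect(isLocalProviderUrl("http://1.2.3.4.5:11434")).toBe(false);
      });
      it("bounds RFC 6598 CGNAT at 100.64.0.0/10", () => {
        expect(isLocalProviderUrl("http://100.127.255.254:11434")).toBe(true);
        expect(isLocalProviderUrl("http://100.128.0.1:11434")).toBe(false);
      });
    });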

Type cast: model: params.model as { baseUrl?: string } at src/agents/pi-embedded-runner/run/attempt.ts:2098 is a structural narrowing. If params.model already declares baseUrl?: string, the cast is redundant; otherwise, widen the consumer signature or pull a shared type.
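
For the next person, the two options look roughly like this (signatures illustrative, not the actual attempt.ts code):

    type ModelRef = { baseUrl?: string };
    declare function resolveLlmIdleTimeoutMs(args: { model?: ModelRef }): number;

    // Option A (today): structurally narrow at the call site.
    declare const params: { model: unknown };
    const a = resolveLlmIdleTimeoutMs({ model: params.model as ModelRef });

    // Option B (follow-up): widen params.model so the cast disappears.
    declare const typedParams: { model?: ModelRef };
    const b = resolveLlmIdleTimeoutMs({ model: typedParams.model });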

CI green.

@openperf force-pushed the fix/ollama-local-idle-timeout branch from 59eb92d to a5c89c7 on May 3, 2026, 01:24
@openperf changed the title from "fix(agents): skip default idle watchdog for local provider streams" to "fix(ollama): restore catalog-driven num_ctx for native /api/chat and skip idle watchdog for local streams" on May 3, 2026
@openperf changed the title from "fix(ollama): restore catalog-driven num_ctx for native /api/chat and skip idle watchdog for local streams" to "fix(ollama): restore catalog-driven num_ctx for native /api/chat" on May 3, 2026
@openperf (Member, Author) commented on May 3, 2026

@martingarramon — thanks for the careful read.

Pre-e899b32 baseline framing. You were right. The pre-refactor resolveLlmIdleTimeoutMs also fell through to the 120s default for users who hadn't set idleTimeoutSeconds, so the watchdog change isn't really a regression fix for default-config users. Following that thread led me to bisect more carefully and the actual root cause of #76117 sits in 7559845597: resolveOllamaModelOptions no longer forwards catalog contextWindow as num_ctx, so native /api/chat ships without it, Ollama uses the Modelfile's ~2048-token default, and the system prompt plus tool definitions are silently truncated. That matches the reporter's "I fixed it, and the local LLM still act like a moron" — they were referring to the 120s timeout in the preceding paragraph, but the stream was completing; the output was just garbage. PR title and body now lead with the num_ctx fix and treat the watchdog change as adjacent.

RFC coverage and DNS aliases. IPv6 ULA (fc00::/7) and link-local (fe80::/10) are in, with boundary tests (fbff::1, fec0::1, and the abbreviated fc::1, which expands to 00fc::1 and is not ULA; all correctly stay remote). Tailscale's fd7a:115c:a1e0::/48 is the load-bearing case for ULA. I deliberately didn't extend IPv4-mapped IPv6 detection beyond loopback: the SSRF-policy helper in model-preflight.runtime.ts keeps the same scope and I'd rather not have the two helpers diverge on what counts as private. DNS-resolved aliases are documented in the JSDoc as a known limit rather than fixed; sync DNS in the watchdog hot path felt disproportionate when the user can use the IP directly or set models.providers.<id>.timeoutSeconds.
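
For reviewers skimming this thread, the classification logic is roughly the following (isLocalProviderUrl is a stand-in name; the real helper in llm-idle-timeout.ts may differ in structure):

    const IPV4_LITERAL = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

    function isLocalProviderUrl(baseUrl: string): boolean {
      let host: string;
      try {
        host = new URL(baseUrl).hostname.toLowerCase();
      } catch {
        return false; // malformed baseUrl: keep the watchdog
      }
      // WHATWG URL keeps brackets around IPv6 literals; strip them.
      if (host.startsWith("[") && host.endsWith("]")) host = host.slice(1, -1);

      if (host === "localhost" || host.endsWith(".local")) return true;

      const m = IPV4_LITERAL.exec(host);
      if (m) {
        const octets = m.slice(1).map(Number);
        if (octets.some((o) => o > 255)) return false;
        const [a, b] = octets;
        if (a === 127) return true;                        // loopback 127/8
        if (a === 10) return true;                         // RFC 1918
        if (a === 172 && b >= 16 && b <= 31) return true;  // RFC 1918
        if (a === 192 && b === 168) return true;           // RFC 1918
        if (a === 100 && b >= 64 && b <= 127) return true; // RFC 6598 CGNAT
        return false;
      }

      if (host.includes(":")) { // IPv6 literal (IPv4-mapped handling omitted here)
        if (host === "::1") return true;                   // loopback
        // Expand the leading group so abbreviated forms like fc::1
        // (really 00fc::1) are not mistaken for ULA.
        const lead = parseInt(host.split(":")[0].padStart(4, "0"), 16);
        if (lead >= 0xfc00 && lead <= 0xfdff) return true; // ULA fc00::/7
        if (lead >= 0xfe80 && lead <= 0xfebf) return true; // link-local fe80::/10
      }
      return false; // everything else, incl. DNS aliases, stays remote
    }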

The as { baseUrl?: string } cast. I'll send a follow-up PR to widen params.model to ProviderRuntimeModel, which collapses both this and the requestTimeoutMs cast above.

Thanks again.

@openperf force-pushed the fix/ollama-local-idle-timeout branch 4 times, most recently from 08b1035 to d802d80 on May 3, 2026, 11:34
@openperf force-pushed the fix/ollama-local-idle-timeout branch from d802d80 to d314225 on May 3, 2026, 11:46
@openperf merged commit c1b9af2 into openclaw:main on May 3, 2026
92 of 93 checks passed
@openperf (Member, Author) commented on May 3, 2026

Merged via squash.

Thanks @openperf!

@openperf deleted the fix/ollama-local-idle-timeout branch on May 3, 2026, 11:47
@openperf (Member, Author) commented on May 3, 2026

Merged as c1b9af2.
