
fix: tolerate missing streamed response content type #90205

Merged
jalehman merged 3 commits into main from fix/codex-missing-content-type-transport on Jun 4, 2026


@jalehman commented Jun 4, 2026

Summary

What problem does this PR solve?

  • Fixes OpenAI/Codex transport failures where a streamed HTTP 200 response with a missing content-type header was rejected before the OpenAI SDK could parse the body.
  • Preserves the existing protection for successful OpenAI-compatible responses that are actually HTML or otherwise invalid.

Why does this matter now?

  • Live openai/gpt-5.5 plugin llm.complete calls can route through openclaw-openai-responses-transport and hit https://chatgpt.com/backend-api/codex/responses.
  • ChatGPT Codex can return a valid SSE stream without a content-type header, causing background lossless/context-engine summarization to fail with invalid_provider_content_type.

What is the intended outcome?

  • Missing-header SSE and JSON bodies continue through the existing SDK/sanitizer path.
  • Explicit HTML and missing-header HTML still fail fast with the helpful provider base URL guidance.

What is intentionally out of scope?

What does success look like?

  • llm.complete Codex calls no longer fail solely because the upstream omitted content-type on a valid stream.
  • Custom OpenAI-compatible providers still get clear errors when they return HTML for streamed SDK requests.

What should reviewers focus on?

  • Whether the missing-header sniff keeps the original response body unconsumed.
  • Whether the behavior stays gated to streamed SDK requests.
  • Whether explicit bad content types still reject as intended.

Linked context

Which issue does this close?

Closes #

Which issues, PRs, or discussions are related?

Related #90193

Was this requested by a maintainer or owner?

Yes, requested during live Phaedrus/OpenClaw incident triage.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: streamed OpenAI/Codex transport responses with missing content-type no longer block or fail before the SDK sees valid SSE/JSON bodies.
  • Real environment tested: local OpenClaw source checkout using the source OpenAI extension, a local OAuth profile resolved without printing credentials, openai/gpt-5.5, and the ChatGPT Codex responses endpoint.
  • Exact steps or command run after this patch: one-off node --import tsx probe that called completeSimple through openclaw-openai-responses-transport against https://chatgpt.com/backend-api/codex/responses; then node scripts/run-vitest.mjs src/agents/provider-transport-fetch.test.ts -- --reporter=verbose.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): redacted live log showed status=200, empty contentType=, first_event ... type=response.created, stream_done ... events=15, and final probe output PATCHED_CODEX_TRANSPORT_OK with usagePresent=true. Focused Vitest passed 73 tests including missing-header SSE, still-open SSE, split-prefix SSE, JSON fallback, and HTML rejection cases.
  • Observed result after fix: the patched transport accepted a real missing-header ChatGPT Codex SSE response, preserved the original stream for the SDK, completed normally, and returned the requested marker text.
  • What was not tested: full provider matrix and full suite. The broad live gateway profile suite was attempted but skipped the OpenAI candidate because isolated test HOME could not see a portable API key/auth profile.
  • Proof limitations or environment constraints: the live probe used an existing local OAuth profile but did not print credentials; it is a narrow transport proof, not a complete gateway/profile matrix proof.
  • Before evidence (optional but encouraged): the same live probe on the previous PR commit 40880bca88 reached status=200 with empty contentType= from ChatGPT Codex and then timed out after 180s before first_event or completion. The new still-open SSE regression also failed against 40880bca88 with expected Symbol(timeout) not to be Symbol(timeout).

Tests and validation

Which commands did you run?

  • Red focused test on previous PR commit 40880bca88 with only the still-open SSE regression applied: failed with expected Symbol(timeout) not to be Symbol(timeout).
  • Live red probe on previous PR commit 40880bca88: reached ChatGPT Codex HTTP 200 with missing content-type, then timed out after 180s before first SSE event/completion.
  • Live green probe on patched commit 0f621713b0: reached ChatGPT Codex HTTP 200 with missing content-type, saw response.created, completed 15 SSE events, and returned PATCHED_CODEX_TRANSPORT_OK.
  • node scripts/run-vitest.mjs src/agents/provider-transport-fetch.test.ts -- --reporter=verbose: passed 73 tests.
  • git diff --check: passed.
  • Earlier focused validation on the original missing-header fix passed missing-header SSE, split-prefix SSE, JSON fallback, and HTML rejection coverage.

What regression coverage was added or updated?

  • Missing content-type with valid SSE is accepted.
  • Missing content-type with a still-open SSE stream returns promptly instead of waiting for stream close.
  • Missing content-type with split SSE prefix is accepted.
  • Missing content-type with JSON is normalized into the existing JSON-to-SSE path.
  • Missing content-type with HTML is rejected.
  • Existing explicit text/html rejection remains covered.

What failed before this fix, if known?

  • The still-open SSE regression failed against 40880bca88 by timing out before the response was returned.
  • The live old-code probe against ChatGPT Codex reached HTTP 200 with empty contentType= and then timed out before first event/completion.
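Both failures come down to how the sniff reads the body. The following is a minimal sketch, not the code merged in this PR. It assumes the sniff reads a cloned body chunk by chunk and stops as soon as one of two things happens: a caller-supplied predicate says the prefix is decidable, or a byte cap is reached. It then cancels its reader instead of draining the stream. `peekPrefix`, `isDecidable`, and `maxBytes` are hypothetical names.

```typescript
// Hypothetical helper (not the PR's code): read a body only as far as needed
// to classify it, without waiting for the stream to close.
async function peekPrefix(
  stream: ReadableStream<Uint8Array>,
  isDecidable: (prefix: string) => boolean,
  maxBytes = 512,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let prefix = "";
  let bytesRead = 0;
  try {
    while (bytesRead < maxBytes) {
      const { done, value } = await reader.read();
      if (done) break;
      bytesRead += value.byteLength;
      // `stream: true` keeps multi-byte characters split across chunks intact.
      prefix += decoder.decode(value, { stream: true });
      // A split prefix such as "da" + "ta:" just loops for another chunk.
      if (isDecidable(prefix)) break;
    }
  } finally {
    // Cancel rather than drain: on a still-open SSE stream, reading to the end
    // would block until the server closes the connection.
    await reader.cancel().catch(() => undefined);
  }
  return prefix;
}
```

By contrast, awaiting `clone.text()` resolves only when the stream ends. That is consistent with the still-open SSE case timing out before the response was returned.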

If no test was added, why not?

  • Tests were added.

Risk checklist

Did user-visible behavior change? (Yes/No)

Yes. Valid missing-header streamed OpenAI/Codex transport responses are no longer rejected.

Did config, environment, or migration behavior change? (Yes/No)

No.

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

Yes, narrowly in provider transport response validation for streamed SDK requests.

What is the highest-risk area?

  • Accidentally weakening the guard that catches wrong OpenAI-compatible base URLs returning HTML with HTTP 200.

How is that risk mitigated?

  • Explicit non-SSE/non-JSON content types still reject.
  • Missing-header HTML still rejects after sniffing a cloned response body.
  • The sniff result only synthesizes text/event-stream or application/json for bodies that look like SSE or JSON.
  • The original body is passed to the SDK unconsumed.
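The mitigations above can be combined into a single guard. This is an illustration under stated assumptions, not the merged implementation. `guardStreamedResponse`, `InvalidContentTypeError`, and the pluggable `sniff` callback are hypothetical. The explicit-header check is simplified to a substring test for `text/event-stream` or `json`, and a real sniff would read only a bounded prefix of the clone.

```typescript
// Hypothetical sketch of the mitigations above; names are illustrative.
type SniffResult = "sse" | "json" | "other";

class InvalidContentTypeError extends Error {
  readonly code = "invalid_provider_content_type";
}

function isAcceptedContentType(contentType: string): boolean {
  const value = contentType.toLowerCase();
  return value.includes("text/event-stream") || value.includes("json");
}

async function guardStreamedResponse(
  response: Response,
  sniff: (clone: Response) => Promise<SniffResult>,
): Promise<Response> {
  const contentType = response.headers.get("content-type");
  if (contentType) {
    // Explicit header: keep failing fast on text/html and other bad types.
    if (!isAcceptedContentType(contentType)) {
      throw new InvalidContentTypeError(`unexpected content-type: ${contentType}`);
    }
    return response;
  }
  // Missing header: sniff a clone so the original body stays unconsumed.
  const kind = await sniff(response.clone());
  if (kind === "other") {
    throw new InvalidContentTypeError("missing content-type and body is not SSE or JSON");
  }
  // Only SSE- or JSON-looking bodies get a synthesized header.
  const headers = new Headers(response.headers);
  headers.set("content-type", kind === "sse" ? "text/event-stream" : "application/json");
  // Re-wrap the untouched original body so the SDK still reads the full stream.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```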

Current review state

What is the next action?

  • Maintainer review and CI on patched commit 0f621713b0.

What is still waiting on author, maintainer, CI, or external proof?

  • CI on the updated head and any maintainer follow-up. Live provider proof is now included above.

Which bot or reviewer comments were addressed?

  • None yet. Local autoreview passed clean before PR creation.

Agent Transcript

Redacted codex session transcript
source: [LOCAL_SESSION]
redaction: local paths, emails, phone-shaped strings, token-shaped strings, auth headers, auth query params
omitted: raw tool outputs, system/developer prompts, local paths, secrets, browser/session/auth details
stats: {"agent":"codex","entries":39,"user":19,"assistant":19,"toolCalls":344,"toolOutputsDropped":343,"web":0,"redactions":5,"omittedUnsafe":0,"rawEntries":151}

[user]
Investigate a live Phaedrus/OpenClaw unresponsiveness issue in the OpenClaw source.

Context: Phaedrus is running OpenClaw 2026.6.1 with default model openai/gpt-5.5, OAuth auth profiles OK, Telegram healthy, contextEngine=lossless-claw, heartbeat every 30m.

During live operation, gateway logs showed repeated OpenAI/Codex transport failures:
[provider-transport-fetch] response provider=openai api=openclaw-openai-responses-transport model=gpt-5.5 status=200 contentType=
followed by:
[openai-transport] [responses] error ... causeCode=invalid_provider_content_type message=Connection error.

This happened while context-engine deferred turn maintenance / heartbeat runs were active.

Please inspect the OpenClaw source and identify:
- exact call path mapping openai/gpt-5.5 OAuth to https://chatgpt.com/backend-api/codex/responses
- where content-type is validated
- whether empty content-type with HTTP 200 can come from upstream versus OpenClaw response wrapping
- whether context-engine/lossless-claw background maintenance can amplify or incorrectly trigger these requests
- likely fix locations and targeted tests

Prefer an investigation report with file/line references before making code changes.

[assistant]
I’ll use the `openclaw-debugging` skill here because this is a live OpenClaw runtime/provider failure investigation. I’ll read source first and keep this as an investigation report unless the evidence points to a small obvious fix you want me to make next.

The worktree is already dirty on branch `clawdbot-d02.1.9.1.28/31b-embedded-run-session-target`; I’m only reading source for this investigation and won’t touch those files. The skill path from the session list points inside this repo, so I’m opening that copy now.

The relevant code is concentrated in the provider transport layer, model runtime resolution, and context-engine maintenance hooks. I’m narrowing to those modules now, plus the sibling Codex source for the ChatGPT/Codex endpoint behavior required by the repo policy.

So far, the log’s `api=openclaw-openai-responses-transport` is the synthetic simple-completion API alias, not the original model API. The URL normalization for ChatGPT Codex also lives in a narrow helper, which makes the call path traceable.

The fetch wrapper logs the upstream `Response` headers before it wraps the body, and the managed wrapper preserves the same `Headers` object. That strongly points away from OpenClaw inventing an empty content type in that layer.

For the model mapping, the important owner is the bundled `extensions/codex` provider/catalog. I’m reading its scoped guide before using those files in the report.

The catalog owner exposes `gpt-5.5` as a Codex app-server model with `api: "openai-chatgpt-responses"` and `baseUrl: https://chatgpt.com/backend-api`. I’m checking the provider/auth alias layer next because your configured ref is `openai/gpt-5.5`, not `codex/gpt-5.5`.

The `openai/gpt-5.5` part is not a simple provider-auth alias from `openai` to `codex`; the auth alias code only rewrites explicit manifest aliases/deprecated choice IDs. I’m following model selection/runtime planning now to see where `openai` plus Codex runtime ends up with the ChatGPT base URL.

This looks like a simple-completion path, not the main Codex harness run path: the synthetic API alias is produced by `prepareModelForSimpleCompletion()`, and the context-engine host advertises `runtime-llm-complete`. I’m checking which maintenance paths call that helper.

The background maintenance scheduler coalesces by session key and reruns once when another turn completes during an active maintenance run. That can amplify a failing LLM-backed maintenance operation into repeated sequential calls, but it is not an unbounded parallel fan-out for the same session.
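Here is a minimal sketch of that coalescing rule. It assumes a simple in-memory map keyed by session. A request that arrives while a run is active only sets a `rerunRequested` flag, so any number of turns completing mid-run collapse into one follow-up run. `MaintenanceCoalescer`, `schedule`, and `drain` are hypothetical names, not the API of `context-engine-maintenance.ts`.

```typescript
// Hypothetical sketch of per-session coalescing; not the real scheduler's API.
type MaintenanceFn = (sessionKey: string) => Promise<void>;

interface ActiveRun {
  rerunRequested: boolean;
  done: Promise<void>;
}

class MaintenanceCoalescer {
  private readonly active = new Map<string, ActiveRun>();
  private readonly run: MaintenanceFn;

  constructor(run: MaintenanceFn) {
    this.run = run;
  }

  schedule(sessionKey: string): Promise<void> {
    const existing = this.active.get(sessionKey);
    if (existing) {
      // A turn completed mid-run: request one follow-up run instead of a parallel one.
      existing.rerunRequested = true;
      return existing.done;
    }
    const entry: ActiveRun = { rerunRequested: false, done: Promise.resolve() };
    this.active.set(sessionKey, entry);
    entry.done = this.drain(sessionKey, entry);
    return entry.done;
  }

  private async drain(sessionKey: string, entry: ActiveRun): Promise<void> {
    try {
      do {
        entry.rerunRequested = false;
        // Failures are swallowed here; retry pacing is left to a separate backoff.
        await this.run(sessionKey).catch(() => undefined);
      } while (entry.rerunRequested);
    } finally {
      this.active.delete(sessionKey);
    }
  }
}
```

The loop is sequential per session, which is why a failing provider produces repeated calls over time rather than a parallel fan-out.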

One broad `find` command walked generated output and worktrees longer than useful; I’m not using that path for evidence. I’ll keep searches scoped to source directories from here.

`lossless-claw` itself does not appear as a bundled source directory in this checkout; it is referenced as an installed/configured context-engine plugin. I’ll still identify the host paths that give it LLM access and schedule its background maintenance, and note that plugin internals would need the installed plugin source for exact per-engine trigger rules.

I found the installed plugin source at `../lossless-claw`, which is needed to answer the maintenance-trigger part accurately. I’ll read only its relevant context-engine implementation and use repo-relative OpenClaw references in the final as requested.

The external `lossless-claw` engine sets `turnMaintenanceMode: "background"` and only consumes deferred compaction debt when OpenClaw passes `allowDeferredCompactionExecution`. Its compaction core can call the runtime LLM capability, so provider failures in that path are plausible during maintenance.

I have the OpenClaw and lossless-claw paths mapped. I’m checking the upstream Codex and OpenAI SDK sources now so the report distinguishes OpenClaw behavior from dependency validation instead of guessing.

The dependency scan found that OpenClaw’s logged `contentType=` is taken before the response is wrapped. I’m narrowing the SDK search now because the exact `invalid_provider_content_type` string does not appear in the visible `openai` package source.

One correction to the earlier assumption: this installed `openai` SDK does not appear to validate SSE content-type before parsing. The `invalid_provider_content_type` code is likely produced by a different layer or by the provider response body; I’m checking OpenClaw’s error normalization and package store before I lock that down.

The exact error code is not in this checkout’s OpenClaw source, installed `openai` SDK, or sibling Codex tree. I’m still citing the SDK stream parser location because that is where streamed responses are consumed here; the missing code means the `causeCode` is probably coming from the thrown error object’s nested cause rather than a literal OpenClaw branch.

The lossless-claw side has the expected coalescing/backoff hooks, but there is still a path where repeated foreground turns or heartbeats can keep offering background maintenance another chance. I’m checking the finish/backoff block so the report can say whether it is unbounded or just repeated over time.

The maintenance store does back off generic failures, so this does not look like an unlimited tight loop in lossless-claw. The remaining risk is repeated scheduled opportunities plus retry debt, especially if the provider failure is fast and many turns/heartbeat runs are queued.

I’m doing one last version sanity check against local tags. The current checkout has unrelated dirty work, so I have not changed files or switched branches.

The local `v2026.6.1` tag exists. Three transport files have changed since that tag, so I’m using current-line references for investigation and noting the tag drift where it matters.

That last tag check found the important release-specific piece: `v2026.6.1` does validate the streaming content-type in `provider-transport-fetch.ts` before wrapping. The current checkout has drifted there, so the report should cite the release-tag line numbers for the observed `invalid_provider_content_type` log.

Used `$openclaw-debugging`. No code changes made.

**Investigation Report**
The live `v2026.6.1` path is:

`context-engine maintenance/lossless-claw summarizer`
→ `src/agents/embedded-agent-runner/context-engine-capabilities.ts:18` injects `runtimeContext.llm.complete`
→ `src/plugins/runtime/runtime-llm.runtime.ts:419` resolves the agent simple-completion model, then `:443` calls `completeWithPreparedSimpleCompletionModel`
→ `src/agents/simple-completion-runtime.ts:79` resolves `openai/gpt-5.5`; `:121` maps Codex-runtime OpenAI policy back to runtime provider `openai`; `:343` prepares the model for simple completion
→ `v2026.6.1:src/agents/simple-completion-transport.ts:20` normalizes ChatGPT base URL to `https://chatgpt.com/backend-api/codex`; `:52` rewrites `openai-chatgpt-responses` to the synthetic simple API
→ `src/agents/provider-transport-stream.ts:23` maps both `openai-responses` and `openai-chatgpt-responses` to `openclaw-openai-responses-transport`; `:76` routes that to `createOpenAIResponsesTransportStreamFn`
→ `v2026.6.1:src/agents/openai-transport-stream.ts:1847` creates the OpenAI SDK client with `baseURL: model.baseUrl`; `:951` calls `client.responses.create(...)`, so the SDK appends `/responses`
→ final URL: `https://chatgpt.com/backend-api/codex/responses`.
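The URL arithmetic at the end of that path is easy to check in isolation. The following is a minimal sketch, not the code in `simple-completion-transport.ts` or the SDK. It assumes normalization only appends `/codex` to a bare `https://chatgpt.com/backend-api` base. It also assumes both Responses APIs map to the synthetic alias and the SDK joins `/responses` onto `baseURL`. `normalizeChatGptBaseUrl`, `simpleCompletionApi`, and `responsesEndpoint` are hypothetical helpers.

```typescript
// Hypothetical helpers illustrating the URL arithmetic; not OpenClaw's code.
const CHATGPT_BACKEND_API = "https://chatgpt.com/backend-api";
const SIMPLE_TRANSPORT_API = "openclaw-openai-responses-transport";

function stripTrailingSlashes(url: string): string {
  return url.replace(/\/+$/, "");
}

// Assumption: a bare ChatGPT backend base URL gets the `/codex` suffix.
function normalizeChatGptBaseUrl(baseUrl: string): string {
  const trimmed = stripTrailingSlashes(baseUrl);
  return trimmed === CHATGPT_BACKEND_API ? `${trimmed}/codex` : trimmed;
}

// Both Responses APIs share the synthetic simple-completion transport alias.
function simpleCompletionApi(api: string): string {
  return api === "openai-responses" || api === "openai-chatgpt-responses" ? SIMPLE_TRANSPORT_API : api;
}

// Simplified stand-in for the SDK joining `/responses` onto `baseURL`.
function responsesEndpoint(baseURL: string): string {
  return `${stripTrailingSlashes(baseURL)}/responses`;
}
```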

**Content Type**
In `v2026.6.1`, validation is in `src/agents/provider-transport-fetch.ts`, not the SDK:

- `v2026.6.1:src/agents/provider-transport-fetch.ts:227` accepts only `text/event-stream` or JSON for streamed SDK responses.
- `v2026.6.1:src/agents/provider-transport-fetch.ts:231` reads `response.headers.get("content-type")`.
- `v2026.6.1:src/agents/provider-transport-fetch.ts:248` throws `ProviderHttpError` with `code: "invalid_provider_content_type"`.
- `v2026.6.1:src/agents/openai-transport-stream.ts:751` logs nested `causeCode`, which explains the observed `causeCode=invalid_provider_content_type message=Connection error`.
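Why did an empty header trip the `v2026.6.1` guard? The following is a simplified reconstruction based only on the behavior listed above, not the release source: accept `text/event-stream` or JSON, otherwise throw with `code: "invalid_provider_content_type"`. `ProviderHttpErrorSketch`, `assertStreamContentTypeSketch`, and `formatTransportLog` are hypothetical. The wrapping step is also an assumption: the SDK is taken to surface a fetch failure as a generic `Connection error.` with the original error kept as `cause`.

```typescript
// Simplified reconstruction of the release guard's described behavior, not its source.
class ProviderHttpErrorSketch extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
  }
}

function assertStreamContentTypeSketch(response: Response): void {
  const contentType = (response.headers.get("content-type") ?? "").toLowerCase();
  const accepted = contentType.includes("text/event-stream") || contentType.includes("json");
  if (!accepted) {
    // A missing header becomes "" here and fails exactly like text/html.
    throw new ProviderHttpErrorSketch(
      `unexpected content-type "${contentType}"`,
      "invalid_provider_content_type",
    );
  }
}

// Assumption: the SDK rethrows as a generic "Connection error." and keeps the
// original error as `cause`, which the transport logger reads as causeCode.
function formatTransportLog(err: unknown): string {
  const wrapped = { message: "Connection error.", cause: err };
  const causeCode = (wrapped.cause as { code?: string }).code;
  return `causeCode=${causeCode} message=${wrapped.message}`;
}
```

A perfectly valid SSE body with no header is rejected without the body ever being inspected. That is the overbroad part of the guard.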

The empty `contentType=` log is from `v2026.6.1:src/agents/provider-transport-fetch.ts:741`, before `assertOpenAISdkStreamContentType` at `:757` and before `buildManagedResponse` at `:765`. That means the missing header came from the raw `fetchWithSsrFGuard` response path, not from OpenClaw’s later response wrapping. The wrapper preserves headers, and the SSE sanitizer only rewrites JSON to SSE; it does not blank content-type.

**Maintenance Impact**
Yes, background maintenance can trigger these requests. `src/agents/embedded-agent-runner/context-engine-maintenance.ts:370` calls `contextEngine.maintain`; for background mode it passes `allowDeferredCompactionExecution` at `:383`. Lossless-claw advertises `turnMaintenanceMode: "background"` at `../lossless-claw/src/engine.ts:3435`, consumes deferred compaction debt in `maintain()` at `../lossless-claw/src/engine.ts:7916`, and calls `runtime.llm.complete` for summarization at `../lossless-claw/src/plugin/index.ts:1173`.

This does not look like an unbounded parallel storm: OpenClaw coalesces active deferred maintenance per session at `src/agents/embedded-agent-runner/context-engine-maintenance.ts:576`, and lossless-claw backs off generic compaction failures for 5-30 minutes at `../lossless-claw/src/store/compaction-maintenance-store.ts:41` and `:266`. But it can still amplify over time: heartbeat/turn maintenance keeps offering background maintenance opportunities while compaction debt remains pending. Heartbeat-only ack turns are pruned early at `../lossless-claw/src/engine.ts:8449`, but pending debt can still be retried by host-approved maintenance.
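The report only gives the 5-30 minute bounds for generic compaction failures. Here is a minimal sketch of such a backoff, assuming a doubling schedule clamped to that window. The doubling, `backoffMs`, and `mayAttemptCompaction` are hypothetical and not taken from `compaction-maintenance-store.ts`.

```typescript
// Hypothetical backoff schedule; only the 5-30 minute bounds come from the text.
const MIN_BACKOFF_MS = 5 * 60_000;
const MAX_BACKOFF_MS = 30 * 60_000;

// Delay after the Nth consecutive generic failure (1-based), doubling and clamped.
function backoffMs(consecutiveFailures: number): number {
  const exponent = Math.max(0, consecutiveFailures - 1);
  return Math.min(MAX_BACKOFF_MS, MIN_BACKOFF_MS * 2 ** exponent);
}

// Maintenance opportunities before `retryAtMs` skip the summary-producing step.
function mayAttemptCompaction(nowMs: number, retryAtMs: number | undefined): boolean {
  return retryAtMs === undefined || nowMs >= retryAtMs;
}
```

Under this assumed schedule, a saturated backoff equals the 30-minute heartbeat interval from the incident. Each heartbeat-driven opportunity could therefore still reach the retry time, which is one way the "repeated over time" amplification could show up.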

**Likely Fixes**
Primary fix surface: `src/agents/provider-transport-fetch.ts`. The current `v2026.6.1` error hint assumes a bad base URL, but for known ChatGPT Codex OAuth traffic an HTTP 200 with missing content-type is more likely upstream/CDN/proxy behavior or a malformed provider response. A targeted fix should improve classification/diagnostics and decide whether to sniff/accept body-shaped SSE/JSON for known Codex Responses, or fail with a clearer provider response error.

Secondary fix surfaces: `src/agents/openai-transport-stream.ts` for preserving/reporting `ProviderHttpError.errorBody`, and `src/agents/embedded-agent-runner/context-engine-maintenance.ts` / `../lossless-claw/src/store/compaction-maintenance-store.ts` if repeated provider transport failures should suppress maintenance reruns more aggressively.

Targeted tests:
- `src/agents/provider-transport-fetch.test.ts`: streamed `openai-chatgpt-responses`, HTTP 200, empty `content-type`.
- `src/agents/simple-completion-transport.test.ts`: `https://chatgpt.com/backend-api` normalizes to `/backend-api/codex` and aliases to `openclaw-openai-responses-transport`.
- `src/agents/embedded-agent-runner/context-engine-maintenance.test.ts`: active deferred run plus provider transport failure does not spin.
- `../lossless-claw` maintain/heartbeat tests: pending compaction debt + `runtime.llm.complete` transport failure enters backoff and heartbeat-only turns do not immediately retry.

**Commands Run**
```bash
git status -sb; git branch --show-current
# Found dirty existing work on branch clawdbot-d02.1.9.1.28/31b-embedded-run-session-target; made no edits.

# rg / nl inspections across:
#   src/agents/simple-completion-transport.ts
#   src/agents/provider-transport-stream.ts
#   src/agents/simple-completion-runtime.ts
#   src/plugins/runtime/runtime-llm.runtime.ts
#   src/agents/provider-transport-fetch.ts
#   src/agents/openai-transport-stream.ts
#   src/agents/embedded-agent-runner/context-engine-*.ts
#   ../lossless-claw/src/engine.ts
#   ../lossless-claw/src/plugin/index.ts
#   ../lossless-claw/src/store/compaction-maintenance-store.ts
#   ../codex/codex-rs/codex-api/src/endpoint/responses.rs
# Mapped the call path, validation, and maintenance behavior.

git tag -l '*2026.6.1*'; git diff --name-only v2026.6.1 -- <transport files>
# Confirmed v2026.6.1 exists; key validation is release-tag specific in provider-transport-fetch.ts.
```

[user]
Is this issue preventing lossless’ compaction from running?

[assistant]
I’m checking the exact lossless-claw compaction return path before answering, because the important distinction is whether the provider failure blocks all maintenance or specifically model-backed compaction.

The compaction path confirms the distinction: the failure can stop summary-producing compaction, while other maintenance can still run. I’m going to answer in those terms.

Yes, for the compaction that needs model-backed summarization.

Lossless gets into compaction at `../lossless-claw/src/engine.ts:4300`, then `executeCompactionCore()` resolves a summarizer at `../lossless-claw/src/engine.ts:4454`, and passes that summarizer into `this.compaction.compact(...)` at `../lossless-claw/src/engine.ts:4561`. That summarizer calls OpenClaw `runtime.llm.complete` at `../lossless-claw/src/summarize.ts:1540`, which is the path hitting `openai/gpt-5.5` and then failing on `invalid_provider_content_type`.

So the observed transport failure can prevent deferred threshold compaction from successfully producing summaries. When that happens, lossless keeps the compaction debt pending and applies retry backoff: `../lossless-claw/src/engine.ts:4310` through `:4326`, plus the maintenance store backoff at `../lossless-claw/src/store/compaction-maintenance-store.ts:41` and `:266`.

It does not prevent all lossless maintenance. Ingest, heartbeat pruning, transcript GC, and “below threshold” checks can still run. The blocked part is summary-producing compaction, especially deferred proactive compaction that needs `runtime.llm.complete`.

[user]
Can you find the change that broke this? It didn’t use to be broken like this.

[assistant]
Using `$openclaw-debugging` again. I’m going to trace the release-tag blame around the `invalid_provider_content_type` branch and then compare the parent commit behavior so we can identify the actual regression, not just the file that surfaced it.

The first blame command had the revision/range order wrong; I’m rerunning with the correct form and then I’ll inspect the candidate commit patch.

The release validation branch blames to `9ead0ae921` (`fix: repair live model inference edge cases`). I’m reading that patch and its parent behavior now to decide whether this is the breaking change or just the first visible symptom.

The candidate is a broad live-inference patch, not a small isolated change. I’m pulling just the transport diff now and checking whether parent behavior would have allowed a missing header through.

The diff confirms the specific branch was added in that commit. I’m checking the added test to see what behavior it was intended to protect and whether ChatGPT Codex was accidentally swept into a custom-provider guard.

The added test was aimed at custom OpenAI-compatible endpoints returning HTML from a bad base URL. The regression is that the guard also applied to the ChatGPT Codex OAuth transport, where a 200 with missing `content-type` is a live upstream edge case rather than a misconfigured `/v1` URL.

Yes. The breaking change is commit `9ead0ae9219e87ba2c223ada385f380fea9415cc`:

`fix: repair live model inference edge cases`
Author date: `2026-06-01T23:03:27-04:00`

It added `assertOpenAISdkStreamContentType()` in `src/agents/provider-transport-fetch.ts`. That function rejects any successful streamed OpenAI-compatible response unless `content-type` is SSE or JSON, and throws `code: "invalid_provider_content_type"` at `src/agents/provider-transport-fetch.ts:248` in `v2026.6.1`.

The intent was reasonable but overbroad: the added test only covers a custom OpenAI-compatible endpoint returning HTML from a bad base URL. It did not cover ChatGPT Codex OAuth, where the known base URL is `https://chatgpt.com/backend-api/codex/responses` and the observed failure is HTTP 200 with missing `content-type`. Before this commit, the parent path logged the response and passed it through to the managed/sanitized stream path; after this commit, it fails before wrapping.

Release placement:
- Not in `v2026.6.1-alpha.1`, `alpha.2`, `alpha.3`, or `beta.1`
- Present in `v2026.6.1-beta.2`, `beta.3`, and `v2026.6.1`

So the likely regression window is between `v2026.6.1-beta.1` and `v2026.6.1-beta.2`.

```bash
git blame -L 223,253 v2026.6.1 -- src/agents/provider-transport-fetch.ts
# All invalid_provider_content_type validation lines blame to 9ead0ae921.

git show 9ead0ae921 -- src/agents/provider-transport-fetch.ts
# Confirmed the commit added assertOpenAISdkStreamContentType() and the call before buildManagedResponse().

git show 9ead0ae921^:src/agents/provider-transport-fetch.ts | nl -ba | sed -n '705,750p'
# Confirmed the parent did not have the content-type assertion before wrapping/sanitizing.

for tag in v2026.6.1-alpha.1 ... v2026.6.1; do git merge-base --is-ancestor 9ead0ae921 "$tag"; done
# Confirmed the commit first appears in v2026.6.1-beta.2 and remains in v2026.6.1.
```

[user]
Who committed that?

[assistant]
Commit `9ead0ae9219e87ba2c223ada385f380fea9415cc` was authored by:

`Peter Steinberger <[REDACTED_EMAIL]>`

It was committed by GitHub:

`GitHub <[REDACTED_EMAIL]>`

[user]
Ok. What’s the right recommended fix for this? And are there any other similar code paths affected by this same change?

[assistant]
I’ll check the post-`v2026.6.1` history and the other OpenAI-compatible transport paths before recommending a fix, because the right shape depends on whether this guard should be narrowed, removed, or replaced with body sniffing for known providers.

The current checkout no longer contains the broad `invalid_provider_content_type` guard, which suggests there has already been follow-up churn. I’m checking whether that follow-up fully addresses the release bug or only fixes abandoned stream cleanup.

The validation guard lives in `buildGuardedModelFetch`, so it can affect any provider stream using that fetcher with `stream: true`, not just ChatGPT Codex. I’m enumerating those callers and the APIs they cover now.

Recommended fix: do not keep the `9ead0ae921` content-type guard as a blanket `buildGuardedModelFetch()` rule.

The guard was trying to catch bad custom OpenAI-compatible base URLs that return HTML with HTTP 200. That is valid, but the implementation rejects known ChatGPT Codex OAuth traffic when upstream returns a valid-looking 200 stream without a `content-type`.

Best code shape:

1. Keep rejecting obvious invalid bodies, especially `text/html`.
2. For missing `content-type`, peek a small prefix of the response body before rejecting.
3. If the body prefix is valid SSE (`data:`, `event:`, comment line) or JSON (`{` / `[`), synthesize the expected header and continue.
4. If the prefix is HTML or unknown text, throw the same provider error.
5. Add a ChatGPT Codex-specific test so this does not regress again.
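Steps 2 through 4 reduce to a small prefix classifier. The following is a minimal sketch, not the merged implementation. It assumes the decision is made on the first non-whitespace characters, using only the markers named above (`data:`, `event:`, a `:` comment line, `{` / `[`). `classifyBodyPrefix` is hypothetical. So is its `"incomplete"` result, which covers a prefix too short to decide, such as a split `da` + `ta:` chunk.

```typescript
// Hypothetical classifier for steps 2-4; not the merged implementation.
type BodyKind = "sse" | "json" | "html" | "unknown" | "incomplete";

const SSE_FIELDS = ["data:", "event:"];

function classifyBodyPrefix(prefix: string): BodyKind {
  // Ignore leading whitespace (trimStart also strips a U+FEFF byte order mark).
  const text = prefix.trimStart();
  if (text.length === 0) return "incomplete";
  const first = text[0];
  if (first === "{" || first === "[") return "json";
  if (first === ":") return "sse"; // SSE comment line
  if (first === "<") return "html"; // markup such as <!doctype html> or <html>
  for (const field of SSE_FIELDS) {
    if (text.startsWith(field)) return "sse";
    // "da" or "eve" may still become an SSE field once more bytes arrive.
    if (field.startsWith(text)) return "incomplete";
  }
  return "unknown";
}
```

In a guard, `html` and `unknown` would reject, while `sse` and `json` pick the synthesized header. `incomplete` would mean reading another chunk, up to a byte cap. A prefix still `incomplete` when the cap or end of stream is reached could reasonably be treated as a rejection.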

Likely fix location: `src/agents/provider-transport-fetch.ts`, replacing the broad `assertOpenAISdkStreamContentType()` behavior added by `9ead0ae921`.

Targeted tests:
- `openai-chatgpt-responses` / `chatgpt.com` returns HTTP 200, missing `content-type`, SSE body: accepted and header normalized.
- Same path returns HTTP 200, missing `content-type`, JSON body: accepted or JSON-to-SSE sanitizer still works.
- Custom OpenAI-compatible endpoint returns HTTP 200, `text/html`: rejected.
- Custom endpoint returns HTTP 200, missing `content-type`, HTML body: rejected.

Other affected paths from the same shared guard:
- `openai-responses`
- `openai-chatgpt-responses`
- `azure-openai-responses`
- `openai-completions`
- `anthropic-messages` when the request body has `stream: true` and `sanitizeSse` is not disabled

Probably not affected by this exact guard:
- The direct `src/llm/providers/openai-chatgpt-responses.ts` path, because it uses raw `fetch`, not `buildGuardedModelFetch()`.
- Google transport in practice, because its SSE path does not appear to rely on a request body field `stream: true`, which is what triggers the guard.
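Whether a path is affected depends on the trigger described above: a `stream: true` field in the request body. Here is a minimal sketch of that gate, assuming the body is a JSON string. It also assumes, as the `anthropic-messages` entry suggests, that a disabled `sanitizeSse` skips the check. `shouldCheckStreamContentType` and `GuardedFetchRequest` are hypothetical names.

```typescript
// Hypothetical gate; mirrors the trigger described above, not the real code.
interface GuardedFetchRequest {
  body: unknown; // request body as passed to fetch
  sanitizeSse?: boolean;
}

function bodyRequestsStreaming(body: unknown): boolean {
  // Only string bodies are inspected in this sketch.
  if (typeof body !== "string") return false;
  try {
    const parsed: unknown = JSON.parse(body);
    return typeof parsed === "object" && parsed !== null && (parsed as { stream?: unknown }).stream === true;
  } catch {
    return false; // not JSON, so there is no `stream` field to inspect
  }
}

function shouldCheckStreamContentType(request: GuardedFetchRequest): boolean {
  return request.sanitizeSse !== false && bodyRequestsStreaming(request.body);
}
```

A transport that selects SSE some other way (for example via the URL) never sets `stream: true` in the body, so it would pass through this gate unchecked. This is consistent with the Google transport being listed as probably unaffected.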

One important note: the current checkout I inspected no longer contains the broad `invalid_provider_content_type` guard, but `v2026.6.1` does. So the fix may already exist on another branch/main as a removal, but for the release bug I would still prefer the narrower body-sniffing fix over a pure deletion.

[user]
Can you pull upstream main and find out if this was already fixed, and if so, by what and whom?

[assistant]
I can’t safely `git pull` into this worktree because it has existing uncommitted changes. I’ll do the non-mutating equivalent for this investigation: fetch `upstream/main`, inspect `upstream/main` directly, and leave the worktree untouched.

[user]
What worktree are you in and what are the changes?

[assistant]
I’ll check the current checkout identity and summarize only the working-tree changes without modifying anything.

The checkout is the main `clawdbot` worktree, not one of the `.worktrees/*` checkouts. I’m going one level deeper now so I can distinguish “which files are dirty” from “what behavior those edits appear to implement.”

You’re in the primary `clawdbot` checkout, on branch `clawdbot-d02.1.9.1.28/31b-embedded-run-session-target`, not one of the `.worktrees/*` checkouts.

The dirty changes are unrelated to the OpenAI content-type investigation. They appear to be an in-progress session target refactor: making `sessionFile` optional at public run-param boundaries, adding `sessionTarget`, resolving it early in CLI/embedded runner paths, and adding a new helper at `src/agents/run-session-target.ts`.

Dirty files:

```text
M  src/agents/cli-runner.ts
M  src/agents/cli-runner/prepare.ts
M  src/agents/cli-runner/types.ts
M  src/agents/embedded-agent-runner/run.ts
M  src/agents/embedded-agent-runner/run/params.ts
M  src/agents/embedded-agent-runner/run/types.ts
?? src/agents/run-session-target.ts
```

Commands run and results:

```text
pwd
# current checkout is the repo root checkout

git branch --show-current
# clawdbot-d02.1.9.1.28/31b-embedded-run-session-target

git status -sb
# branch is dirty with the six modified files above plus untracked src/agents/run-session-target.ts

git worktree list
# current repo root entry is on clawdbot-d02.1.9.1.28/31b-embedded-run-session-target
# many sibling worktrees exist under .worktrees/ and codex worktree dirs

git diff --name-status
# six modified tracked files; untracked helper is not included by this command

git diff --stat
# 6 tracked files changed, 67 insertions, 15 deletions

git diff -- src/agents/...
# shows sessionTarget/sessionFile type and runner resolution changes

sed -n '1,240p' src/agents/run-session-target.ts
# new helper resolves storage-neutral AgentRunSessionTarget into current file-backed session target
```

[user]
Ok. Let’s commit those changes as a work in progress, and then switch branches to main.

[assistant]
I’ll use the repo’s git workflow rules here: stage only the session-target WIP files, commit with `scripts/committer`, confirm the tree is clean, then switch to `main`.

The branch and dirty set match what I reported. I’m staging those seven files and committing them as a WIP save point now.

WIP commit succeeded. I’m checking that nothing remains dirty before switching branches, because this checkout is the primary worktree.

The tree is clean at commit `4e0fde2627`. I’m switching to `main`; I won’t pull or rebase unless you ask.

Committed the WIP changes and switched the primary checkout to `main`.

```text
git status -sb
# dirty branch before commit:
# clawdbot-d02.1.9.1.28/31b-embedded-run-session-target
# 6 modified files + 1 untracked helper

git add src/agents/cli-runner.ts ...
git diff --staged --stat
# 7 files staged, 141 insertions, 15 deletions

scripts/committer "wip: save embedded run session target work" ...
# committed 4e0fde2627
# wip: save embedded run session target work

git status -sb
# clean on clawdbot-d02.1.9.1.28/31b-embedded-run-session-target

git switch main
# switched to main

git status -sb
# ## main...upstream/main [behind 543]
```

`main` is clean but behind `upstream/main` by 543 commits.

[user]
Let’s pull from upstream. Then let’s do that check to figure out if the bug we identified is on main still or has been fixed. I’m guessing that last branch actually had working code because it was behind main.

[assistant]
I’ll pull `main` fast-forward from `upstream/main`, then inspect the current transport path and the fix history. Since this is provider/Codex runtime debugging, I’ll also use the OpenClaw debugging workflow and check the sibling Codex source before giving a verdict.

`main` is clean and behind `upstream/main`, so a fast-forward pull is safe. After that I’ll compare current `main` against the shipped `v2026.6.1` regression commit.

The fast-forward completed to `7e5ea598c5`. I’m checking whether current `main` still contains the strict content-type assertion and then tracing the commit that removed or changed it if it’s gone.

Current pulled `main` still has the `invalid_provider_content_type` path. I’m reading the function and its tests now to verify whether it still rejects an HTTP 200 stream with no `content-type`, rather than just matching a stale symbol.

The pulled `main` still rejects missing content type before response wrapping, while the older WIP branch only logs `contentType=` and has no assertion. I’m doing two final checks: the OpenClaw test coverage around this guard, and the Codex SDK/source contract for the `/codex/responses` stream handling.

The regression commit is still the only commit that introduced that guard; no later commit on pulled `main` removes it. I’m checking Codex source now for how the ChatGPT `/backend-api/codex/responses` endpoint is selected and consumed, so the conclusion isn’t based only on OpenClaw wrapper assumptions.

The Codex source confirms ChatGPT auth defaults to `https://chatgpt.com/backend-api/codex` and posts `responses` with `Accept: text/event-stream`. I’m checking the local OpenAI SDK stream parser too, because OpenClaw’s guard sits in front of that SDK.

The OpenAI SDK parser enters SSE streaming based on the request’s stream option, not a preliminary content-type whitelist in OpenClaw’s fetch wrapper. I’m grabbing the exact SDK lines now and then I’ll summarize the answer with the commands/results you asked for.

Pulled `main` from `upstream`. The bug is **still present on current `main`** at `7e5ea598c582b5422aea01afab48485e2df65c4c`.

Your guess about the WIP branch is right for this bug: `clawdbot-d02.1.9.1.28/31b-embedded-run-session-target` does **not** contain the regression commit. It forked before `9ead0ae9219e87ba2c223ada385f380fea9415cc`, while current `main` does contain it.

Evidence:

- Current `main` still has `assertOpenAISdkStreamContentType` in `src/agents/provider-transport-fetch.ts:231`.
- For `HTTP 200` with body and missing `content-type`, `contentType` becomes `""` at `src/agents/provider-transport-fetch.ts:237`, fails the SSE/JSON check at `src/agents/provider-transport-fetch.ts:238`, then throws `invalid_provider_content_type` at `src/agents/provider-transport-fetch.ts:248`.
- That happens before OpenClaw wraps the response at `src/agents/provider-transport-fetch.ts:765`.
- The log line showing `contentType=` comes from the raw fetched response at `src/agents/provider-transport-fetch.ts:742`.
- The only matching history on `main` since the WIP branch base is still:
  `9ead0ae9219e87ba2c223ada385f380fea9415cc Peter Steinberger <[REDACTED_EMAIL]> fix: repair live model inference edge cases`
- The current test only covers rejecting explicit HTML at `src/agents/provider-transport-fetch.test.ts:167`; it does not cover a valid SSE stream with missing content-type.
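Here is a minimal TypeScript sketch of a header-only gate in the shape described above. It shows why the failure is deterministic. The names `isSseOrJsonContentType` and `assertHeaderOnly` are hypothetical stand-ins, not the real helpers in `src/agents/provider-transport-fetch.ts`. The accepted media types are assumed to be exactly `text/event-stream` and `application/json`.

```typescript
// Hypothetical sketch of a header-only content-type gate (not the real helper).
function isSseOrJsonContentType(contentType: string): boolean {
  // Compare only the media type, ignoring parameters such as charset.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/event-stream" || mediaType === "application/json";
}

function assertHeaderOnly(response: Response): void {
  // Non-OK and body-less responses are left to other error handling.
  if (!response.ok || response.body === null) return;
  // A missing header collapses to "", which matches neither accepted type.
  const contentType = response.headers.get("content-type") ?? "";
  if (!isSseOrJsonContentType(contentType)) {
    throw new Error(`invalid_provider_content_type: got "${contentType}"`);
  }
}
```

With this shape, an HTTP 200 whose body is a valid SSE stream still throws when the header is absent. That matches what was observed on the Codex endpoint.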

Codex/source contract checked:

- `../codex/codex-rs/model-provider-info/src/lib.rs:231` defaults ChatGPT auth providers to `https://chatgpt.com/backend-api/codex`.
- `../codex/codex-rs/codex-api/src/endpoint/responses.rs:127` posts to `responses` and sets `Accept: text/event-stream`.
- Local OpenAI SDK streaming path uses `stream: true` to enter SSE parsing at `node_modules/openai/src/internal/parse.ts:24` and `Stream.fromSSEResponse` at `node_modules/openai/src/core/streaming.ts:34`; it does not require OpenClaw’s pre-parse content-type assertion.
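A missing header is survivable because SSE framing lives entirely in the body bytes. The sketch below parses a subset of the WHATWG `text/event-stream` format: `event:` and `data:` fields and `:` comment lines, while `id:` and `retry:` are ignored. At no point does it consult a header. It is illustrative only and is not the OpenAI SDK's parser.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Minimal SSE framing sketch: events are dispatched on blank lines,
// and multiple data lines are joined with "\n".
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = "message";
  let dataLines: string[] = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line ends the current event, if it carried any data.
      if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
      event = "message";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  return events;
}
```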

Commands run:

```text
git status -sb
# clean on main, behind upstream/main

git pull --ff-only upstream main
# fast-forwarded main to 7e5ea598c5

rg -n "invalid_provider_content_type|assertOpenAISdkStreamContentType|isOpenAISdkStreamContentType|contentType=" ...
# current main still contains the guard and error path

git log --all -G 'invalid_provider_content_type|assertOpenAISdkStreamContentType|isOpenAISdkStreamContentType' -- src/agents/provider-transport-fetch.ts
# only commit found: 9ead0ae921... Peter Steinberger

git merge-base --is-ancestor 9ead0ae... clawdbot-d02.1.9.1.28/31b-embedded-run-session-target
# branch-does-not-contain-regression

git merge-base --is-ancestor 9ead0ae... main
# main-contains-regression

git diff --stat && git status -sb
# no local changes after investigation
```

Conclusion: upstream `main` has **not** already fixed this. The WIP branch looked working because it predates the regression, not because `main` has a fix.

[user]
Ok. Now let’s rearticulate the plan for the solution such that it preserves Peter’s intent and fixes this issue. Make sure to educate me on why it is the correct solution. Walk me through the problem and the flow so I can really understand this.

[assistant]
Here’s the solution I’d recommend: keep Peter’s guard, but change it from a **header-only gate** into an **evidence-based invalid-response gate**.

The bug is not that Peter added a guard. The bug is that the guard treats **missing `content-type`** as proof that the upstream response is invalid. That is too strong for the ChatGPT Codex endpoint.

**The Flow**
For `openai/gpt-5.5` with OAuth, OpenClaw maps the model into the Codex Responses transport:

1. `src/agents/simple-completion-transport.ts` normalizes ChatGPT base URLs to `https://chatgpt.com/backend-api/codex`.
2. `src/agents/provider-transport-stream.ts` maps `openai-chatgpt-responses` to `openclaw-openai-responses-transport`.
3. `src/agents/openai-transport-stream.ts` creates an OpenAI SDK client with `baseURL: model.baseUrl`.
4. The SDK calls `responses.create(...)`, producing:
   `https://chatgpt.com/backend-api/codex/responses`
5. OpenClaw’s guarded fetch sees the raw `Response`.
6. Current `main` logs `contentType=` from the raw upstream response at `src/agents/provider-transport-fetch.ts:742`.
7. Then it calls `assertOpenAISdkStreamContentType` at `src/agents/provider-transport-fetch.ts:758`.
8. That helper rejects anything that is not `text/event-stream` or JSON, including an empty header.

So a valid HTTP 200 stream from ChatGPT with no `content-type` gets rejected before the OpenAI SDK ever sees the body.

**Peter’s Intent**
Peter’s change was trying to catch a real failure mode: custom OpenAI-compatible providers sometimes return an HTML page with HTTP 200 when the base URL is wrong, for example missing `/v1`. Without a guard, the SDK tries to parse HTML as SSE and reports a vague connection/parse error. The new guard makes that operator error obvious:

```text
OpenAI-compatible streamed responses must be text/event-stream or JSON; got text/html...
Check the provider baseUrl...
```

That intent is correct. We should preserve it.

**What’s Wrong**
The guard currently assumes:

```text
missing content-type == invalid provider response
```

That is false for the ChatGPT Codex Responses endpoint in live operation. The upstream can return HTTP 200 with an empty `content-type` while still sending a valid SSE stream. OpenClaw then turns a usable response into `invalid_provider_content_type`.

Also, this is not OpenClaw response wrapping causing the empty header. The debug log happens before `buildManagedResponse`, and the guard also runs before wrapping. So `contentType=` is upstream/raw fetch state.

**Recommended Fix**
Change the guard to this policy:

1. If response is non-OK or has no body: do nothing here.
2. If `content-type` is `text/event-stream` or JSON: allow.
3. If `content-type` is explicitly incompatible, especially `text/html`: reject with Peter’s helpful baseUrl error.
4. If `content-type` is missing:
   - Peek a small prefix from a cloned response body.
   - If it looks like SSE or JSON, allow the original response through unchanged.
   - If it looks like HTML or another clearly wrong body, reject with the same helpful error.
   - If inconclusive, probably allow for native ChatGPT Codex and reject for custom OpenAI-compatible providers, or make that split explicit in the helper.

That preserves the original protection while fixing the live Codex path. The key is that **missing metadata is not proof of invalid content**. The body gives better evidence.
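The four-step policy can be sketched as pure functions. This is a sketch under stated assumptions, not the committed code. All names are hypothetical. The SSE markers are assumed to be the standard field names, and HTML detection is assumed to key on `<!doctype` or `<html`. Inconclusive bodies are rejected, which matches the variant that was eventually committed rather than the native-Codex split floated in step 4.

```typescript
type HeaderDecision = "allow" | "reject" | "sniff";
type BodyKind = "sse" | "json" | "html" | "unknown";

// Steps 2-4: decide from the header alone when it is present.
function decideByHeader(contentType: string | null): HeaderDecision {
  if (contentType === null || contentType.trim() === "") return "sniff";
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  if (mediaType === "text/event-stream" || mediaType === "application/json") {
    return "allow";
  }
  // An explicit, different type (e.g. text/html) is treated as proven bad.
  return "reject";
}

// Step 4: classify a small body prefix when the header is missing.
// trimStart() also strips a leading byte-order mark (U+FEFF).
function classifyBodyPrefix(prefix: string): BodyKind {
  const head = prefix.trimStart();
  if (/^(data|event|id|retry):/.test(head) || head.startsWith(":")) return "sse";
  if (head.startsWith("{") || head.startsWith("[")) return "json";
  if (/^<(!doctype|html)/i.test(head)) return "html";
  return "unknown";
}

// Step 1 (non-OK or body-less responses) is assumed to be handled by the caller.
function acceptStreamedResponse(contentType: string | null, prefix: string): boolean {
  const decision = decideByHeader(contentType);
  if (decision !== "sniff") return decision === "allow";
  const kind = classifyBodyPrefix(prefix);
  // "html" and "unknown" both fall through to rejection.
  return kind === "sse" || kind === "json";
}
```

The split between `decideByHeader` and `classifyBodyPrefix` keeps the original header-based rejection untouched. The body is only consulted when the header gives no evidence at all.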

**Why This Is Correct**
The SDK streaming path is already designed around `stream: true`. In `node_modules/openai/src/internal/parse.ts`, the SDK enters `Stream.fromSSEResponse(...)` based on the request option, not based on a preliminary content-type whitelist. OpenClaw added an extra guard in front of the SDK. That guard should only reject when OpenClaw has strong evidence that the body is not stream/JSON-compatible.

So the corrected invariant becomes:

```text
Reject proven bad 200 responses.
Do not reject merely under-labeled 200 responses.
```

That is the right boundary.

**Tests I’d Add**
Target `src/agents/provider-transport-fetch.test.ts`:

- Keep existing test: streamed custom OpenAI response with `text/html` rejects.
- Add regression: streamed `openai-chatgpt-responses` / transport response with missing `content-type` but SSE body is accepted.
- Add regression: missing `content-type` but JSON body is accepted or synthesized consistently with existing JSON-to-SSE behavior.
- Add protection: missing `content-type` but HTML body still rejects, at least for custom OpenAI-compatible providers.
- Add one assertion that the original response body remains readable after the sniff.
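The sniff behind that last test has to inspect the body without consuming the stream the SDK will read. The sketch below assumes a WHATWG-compliant `Response` (for example Node 18+). In that setting, `Response.clone()` tees the body, so reading and then cancelling the clone's branch leaves the original branch readable. `peekPrefix`, `maxBytes`, and `isConclusive` are hypothetical names. Stopping once the prefix is conclusive matters for streams that stay open, because waiting for the body to finish would stall the request.

```typescript
// Reads from a clone of `response` until `isConclusive` accepts the prefix,
// the stream ends, or at least `maxBytes` have been read. The original body
// is left unread.
async function peekPrefix(
  response: Response,
  maxBytes: number,
  isConclusive: (prefix: string) => boolean,
): Promise<string> {
  const probe = response.clone();
  if (probe.body === null) return "";
  const reader = probe.body.getReader();
  const decoder = new TextDecoder();
  let prefix = "";
  let bytesRead = 0;
  try {
    while (bytesRead < maxBytes) {
      const chunk = await reader.read();
      if (chunk.done) break;
      bytesRead += chunk.value.byteLength;
      prefix += decoder.decode(chunk.value, { stream: true });
      if (isConclusive(prefix)) break;
    }
  } finally {
    // Cancelling one tee branch does not cancel the shared source, so the
    // original response keeps every chunk read here.
    await reader.cancel().catch(() => {});
  }
  return prefix;
}
```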

This is a small fix in `src/agents/provider-transport-fetch.ts`, not a context-engine fix. Lossless/heartbeat only amplify the symptom because they create background LLM calls; they are not the root cause.

[user]
I think this is probably a good fix, but I have one question: during normal turns with the agent using the same Codex provider, I get responses just fine. Why does it only manifest via the `llm.complete` plugin API?

[assistant]
I’ll verify the two call paths before answering, because this distinction is exactly where the behavior diverges: normal agent streaming versus plugin `llm.complete` simple completion.

The split is confirmed: plugin `llm.complete` deliberately routes Codex through the OpenAI SDK transport wrapper, while normal Codex provider streaming uses the provider’s own ChatGPT Codex implementation. That explains why the guard only hits the plugin path.

Because “same provider/model” is not the same runtime path.

For normal agent turns, `openai/gpt-5.5` uses the Codex provider implementation directly: `src/llm/providers/openai-chatgpt-responses.ts`. That code builds the ChatGPT Codex URL itself, sends the request with `fetch`, and parses the body with its own SSE parser. It does **not** call `buildGuardedModelFetch`, so it never hits Peter’s content-type guard.

For plugin `llm.complete`, OpenClaw goes through the “simple completion” path. That path rewrites Codex models into the transport-aware OpenAI SDK bridge:

- `src/plugins/runtime/runtime-llm.runtime.ts:419` prepares the plugin completion model.
- `src/agents/simple-completion-runtime.ts:343` calls `prepareModelForSimpleCompletion`.
- `src/agents/simple-completion-transport.ts:52` detects `openai` + `openai-chatgpt-responses`.
- `src/agents/simple-completion-transport.ts:60` says why: static Codex catalogs omit credentials, so simple completion must use OpenClaw’s transport to apply resolved auth.
- That maps to `openclaw-openai-responses-transport`.
- `src/agents/openai-transport-stream.ts:1866` creates an OpenAI SDK client using `fetch: buildGuardedModelFetch(model)`.
- Then `src/agents/provider-transport-fetch.ts:758` runs the content-type assertion.

So the live failure is specific to this bridge:

```text
plugin llm.complete
→ simple completion
→ Codex model rewritten to OpenClaw transport alias
→ OpenAI SDK Responses client
→ buildGuardedModelFetch
→ content-type guard
→ rejects missing content-type
```

Normal agent turns are more like:

```text
agent turn
→ openai-chatgpt-responses provider
→ direct ChatGPT Codex fetch or websocket path
→ parse body as Codex SSE/websocket events
→ no content-type guard
```

That’s why normal turns can succeed while lossless/context-engine maintenance fails. Lossless compaction/summarization is using the runtime/plugin `llm.complete` capability, so it exercises the broken simple-completion transport path, not the normal agent-turn Codex path.

This also reinforces the right fix: do not “fix Codex” broadly. The direct Codex provider already tolerates this. The broken code is the generic OpenAI SDK transport guard in `src/agents/provider-transport-fetch.ts`. We should make that guard reject proven HTML/wrong-body responses while allowing missing-header responses that are valid SSE/JSON.

Commands run:

```text
rg -n "prepareModelForSimpleCompletion|llm.complete|openclaw-openai-responses-transport" ...
# confirmed plugin llm.complete routes through simple completion and the transport alias

nl -ba src/plugins/runtime/runtime-llm.runtime.ts
# llm.complete prepares and executes simple completion

nl -ba src/agents/simple-completion-runtime.ts
# simple completion calls prepareModelForSimpleCompletion

nl -ba src/agents/simple-completion-transport.ts
# Codex simple completion is intentionally rewritten to OpenClaw transport

nl -ba src/llm/providers/openai-chatgpt-responses.ts
# normal Codex provider uses direct fetch/parser or websocket path

rg -n "buildGuardedModelFetch" ...
# only the OpenAI SDK transport path uses the guarded fetch
```

[user]
Is there a good reason for these two different paths that do effectively the same thing?

[assistant]
Yes and no.

There is a good reason for two **abstractions**:

- Normal agent turns need the full provider runtime: tools, streaming events, Codex-specific headers, ChatGPT account id, reasoning, usage handling, websocket/cache behavior, provider-specific errors.
- Plugin `llm.complete` needs a small “give me text from this model” API that works across providers and applies resolved auth/policy without exposing the full agent runtime.

But there is not a great reason for two **Codex HTTP/SSE implementations** with different invariants.

They look like they do the same thing because, at the wire level, both can end up talking to:

```text
https://chatgpt.com/backend-api/codex/responses
```

But internally they are different:

- `src/llm/providers/openai-chatgpt-responses.ts` is a Codex-specific implementation. It knows ChatGPT Codex details directly and parses the stream itself.
- `src/agents/openai-transport-stream.ts` uses the generic OpenAI SDK transport and guarded fetch. It was reused by simple completion so OpenClaw could apply auth and transport behavior consistently.

The “good reason” was probably pragmatic: `llm.complete` needed a transport-aware path that could inject the resolved OAuth credential, and the generic OpenAI SDK transport already had that machinery. But the cost is exactly what we’re seeing: Codex has two paths with subtly different assumptions. One tolerates missing content-type. The other rejects it.

So my take:

- Short-term fix: correct `src/agents/provider-transport-fetch.ts` so the generic transport guard does not reject valid missing-header Codex streams.
- Better architectural follow-up: reduce the split. Either make simple completion use the provider-owned Codex simple stream path, or factor the shared Codex request/response handling so both normal turns and `llm.complete` share the same validation/parsing contract.

I would not do the larger unification as the emergency fix. It has more blast radius. But this bug is a useful signal that the duplicate Codex path should be cleaned up later.

[user]
All right, I think we should do two things. First, open an issue flagging the refactoring opportunity here, for future reference: there are two paths in the codebase with duplicate logic that works in one case and not in the other, and with generally divergent implementations of the same thing. Second, we should do the narrow, low-blast-radius fix for this issue now. I agree it's not the time to do a refactor; we need to fix this. I don't know if it's worth looking into how the other path for normal agent turns works, but I'm curious which approach is more robust, because that is the primary way we talk to models in Codex through OpenClaw. So before we implement the new solution, take a look at that and make sure we shouldn't just be doing things the way that path does. Maybe Peter's change is behavior that should be ported back to the normal path. If that's not the case, it's fine to proceed with the plan you outlined previously.

[assistant]
I’ll treat the refactor item as a local pebbles issue unless you explicitly want a public GitHub issue. For the fix, I’ll branch off updated `main`, compare the direct Codex provider’s response handling against the transport guard, then patch only the low-blast-radius transport validation.

[user]
No, I meant the GitHub issue, not a Pebbles issue.

[assistant]
Got it. I’ll switch that from local planning to a public GitHub issue workflow, and I’ll follow the repo’s GitHub-create rules before posting anything.

I have GitHub auth and the checkout is clean on updated `main`. I’m checking for duplicate/refactor issues first, then I’ll create the GitHub issue without adding transcript logs unless there’s a safe transcript and you approve it.

No duplicate issue showed up from GitHub search; gitcrawl only surfaced an unrelated broad app issue. A local transcript candidate exists, but I’m not including transcript logs in the GitHub issue because you have not approved adding sanitized session logs.

Created the GitHub issue: https://github.com/openclaw/openclaw/issues/90193. I’m moving on to the narrow incident fix now: branch from current `main`, add a failing regression test for missing-header valid SSE, then patch the transport guard.

This is a normal source checkout, so I’ll use the repo’s targeted `pnpm test` wrapper for `src/agents/provider-transport-fetch.test.ts`. Before editing under `src/agents`, I’m reading the scoped guide and creating a fix branch.

I’m adding the regression test first. The test will prove that a streamed OpenAI/Codex transport response with no `content-type` but valid SSE is currently rejected.

I’m editing only the transport fetch test file first. This should fail on current code because the existing guard rejects the missing header before the SDK stream parser can read the SSE body.

The regression test is in place. I’m running just that test file now to confirm the failure before touching production code.

The targeted test command triggered pnpm’s dependency reconciliation before running tests, which is noisy but not a source change by itself. I’m waiting for the test result and will check git status immediately after.

pnpm is still fetching optional platform packages before the test starts. I’m going to let this command finish rather than leave a package-manager process running in the background.

No new output yet; the same test command is still active. I’m continuing to poll it and will stop broadening commands until it exits.

The local wrapper is still stuck in package fetch retries, not in Vitest. I’m waiting for the final retry outcome; after this I’ll switch to the repo’s `node scripts/run-vitest.mjs` harness to avoid more pnpm reconciliation.

The red test exposed a test-detail problem: `new Response(string)` adds `text/plain` automatically, while the live failure had an empty header. I’m correcting the regression test to use a stream body with no headers, then I’ll patch the guard.
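The fixture pitfall is standard Fetch API behavior. When the body is a string, the `Response` constructor adds a default `content-type`. A `ReadableStream` body has no inherent type, so no header is added. The snippet below assumes a runtime with a WHATWG-compliant `Response`, such as Node 18+.

```typescript
const encoder = new TextEncoder();

// A string body makes the Response constructor add "text/plain;charset=UTF-8".
const fromString = new Response("data: {}\n\n");

// A ReadableStream body has no inherent type, so no content-type is added.
const fromStream = new Response(
  new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(encoder.encode("data: {}\n\n"));
      controller.close();
    },
  }),
);

console.log(fromString.headers.get("content-type")); // text/plain;charset=UTF-8
console.log(fromStream.headers.get("content-type")); // null
```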

I’m making the guard return a normalized response instead of only throwing. For missing headers it will sniff a small prefix from a clone, synthesize the right header for valid SSE/JSON, and still reject HTML or unknown bodies.
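One way to return a normalized response instead of only throwing is to wrap the same body stream in a new `Response` that carries the synthesized header. The sketch assumes that status, status text, and the remaining headers should carry over unchanged. `withContentType` is a hypothetical name. Note that the `Response` constructor does not preserve fetch-only properties such as `url` and `redirected`.

```typescript
// Returns a Response that reuses the original body stream and status but
// carries an explicit content-type (hypothetical helper).
function withContentType(response: Response, contentType: string): Response {
  // Copy the headers so the original response is not mutated.
  const headers = new Headers(response.headers);
  headers.set("content-type", contentType);
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

Reusing `response.body` rather than buffering it keeps the SDK's streaming behavior intact. The body is only read once, by whoever consumes the new response.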

I also need to fix the test fixture itself so it really has no `content-type`. I’m using stream-backed responses for the missing-header cases because `Response(string)` auto-adds `text/plain`.

Production code and regression tests are patched. I’m running the same file through the node Vitest wrapper now to avoid another package reconciliation pass.

The targeted test file now passes: 71/71. I’m checking the diff and working tree for any side effects from the earlier pnpm reconciliation before deciding if further cleanup is needed.

I spotted one robustness gap in my first pass: a valid SSE prefix could be split across chunks, so the sniffer should keep reading up to its small cap instead of rejecting after a one-character partial prefix. I’m tightening that before broader validation.
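The split-prefix problem is that a first chunk like `d` or `<!DOC` cannot be classified yet. A sniff loop can use a predicate that treats a prefix as inconclusive while it is still a strict prefix of a known marker, and rely on the byte cap to bound the wait for bodies that never resolve. The predicate name and the marker list below are assumptions for illustration.

```typescript
// Markers that a body prefix might still be growing into (assumed list).
const MARKERS = ["data:", "event:", "id:", "retry:", "<!doctype", "<html"];

// True when `prefix` is long enough to classify; false while it could still
// become one of the markers (e.g. "d", "dat", "<!DOC"). Lowercasing only
// makes the check more cautious, since it can only add "keep reading" cases.
function isConclusive(prefix: string): boolean {
  const head = prefix.trimStart().toLowerCase();
  if (head.length === 0) return false;
  return !MARKERS.some((m) => m.length > head.length && m.startsWith(head));
}
```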

The split-prefix case is now covered. I’m rerunning the focused file.

The focused regression suite is green after the split-prefix adjustment. I’m doing a final whitespace/status check and reviewing the exact diff before I either commit or hand it off.

The prod diff grew by 86 lines because the fix has to sniff a body clone without consuming the SDK’s stream. Before committing or handing off, repo policy requires a fresh autoreview for code changes, so I’m running that review now.

Autoreview is running; no output yet, which is normal for the helper in the first stretch. I’ll keep polling rather than starting more validation in parallel.

Autoreview is healthy and still running at the 60s heartbeat.

Autoreview returned clean with no actionable findings. I’m committing the two-file fix now using the repo committer, staging only the transport fetch source and test.

The fix is committed at `40880bca88`. I’m doing one final status check so the handoff is accurate.

Created the refactor follow-up GitHub issue and landed the narrow fix on a branch.

GitHub issue:
https://github.com/openclaw/openclaw/issues/90193

Fix branch:
`fix/codex-missing-content-type-transport`

Commit:
`40880bca88 fix: tolerate missing streamed response content type`

What changed:
- `src/agents/provider-transport-fetch.ts`
  - Keeps Peter’s explicit bad-content guard.
  - Still rejects explicit HTML/wrong content types.
  - For missing `content-type`, sniffs a small prefix from a cloned body.
  - If the body is valid SSE, normalizes the header to `text/event-stream`.
  - If the body is JSON, normalizes to `application/json` so the existing JSON-to-SSE path works.
  - If it is HTML or unknown, still rejects with `invalid_provider_content_type`.

- `src/agents/provider-transport-fetch.test.ts`
  - Covers missing-header valid SSE.
  - Covers split SSE prefix across chunks.
  - Covers missing-header JSON fallback.
  - Covers missing-header HTML rejection.
  - Preserves the existing explicit `text/html` rejection test.

Validation:

```text
gitcrawl search ...
#
```

...[transcript truncated to 50000 chars]

openclaw-barnacle (bot) added the labels agents (Agent runtime and tooling), size: M, and maintainer (Maintainer-authored PR) on Jun 4, 2026

clawsweeper Bot commented Jun 4, 2026

Codex review: needs maintainer review before merge. Reviewed June 4, 2026, 2:53 PM ET / 18:53 UTC.

Summary
The branch changes guarded provider fetch to sniff missing-content-type streamed OpenAI-compatible responses, synthesize SSE or JSON content types for valid-looking bodies, and adds regression tests.

PR surface: Source +79, Tests +193. Total +272 across 2 files.

Reproducibility: yes. Source inspection shows current main and v2026.6.1 reject successful streamed SDK responses with a missing content-type, and the PR body includes red/green live Codex proof plus focused regression tests.

Review metrics: 1 noteworthy metric.

  • Stream validation gate: 1 hot-path response-validation gate changed. The diff changes the guarded fetch gate shared by streamed OpenAI-compatible SDK requests, so maintainer review should focus on availability behavior as well as unit coverage.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Wait for relevant current-head CI on the reviewed head.
  • Get maintainer sign-off on the provider transport availability risk.

Risk before merge

  • [P1] This changes a hot OpenAI-compatible streamed SDK response-validation path; an incorrect clone/sniff interaction could make provider calls hang or fail before SDK parsing.
  • [P1] The real behavior proof is strong for the live ChatGPT Codex missing-header path, but the full provider matrix was not covered in the PR body.
  • [P1] The protected maintainer label means merge should wait for explicit maintainer review and relevant current-head CI.

Maintainer options:

  1. Land after hot-path sign-off (recommended)
    If current-head CI is green, maintainers can accept the remaining provider-matrix risk because the focused tests and live Codex proof cover the incident shape.
  2. Require broader provider proof
    Ask for a broader OpenAI-compatible provider transport matrix or Testbox lane before merge if maintainers want evidence beyond the incident path.
  3. Keep the refactor separate
    Do not expand this PR into the duplicate Codex path refactor; leave that work in #90193 (Refactor duplicate Codex Responses paths for agent turns and llm.complete) if the narrow fix is otherwise acceptable.

Next step before merge

  • No automated repair is needed; the remaining action is maintainer review and CI for a protected hot provider transport path.

Security
Cleared: No concrete security or supply-chain concern was found; the diff only changes source/tests for response validation and keeps explicit and sniffed HTML rejection covered.

Review details

Best possible solution:

Land the narrow transport fix after maintainer sign-off and current-head CI, while keeping the broader duplicate Codex path refactor in #90193.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main and v2026.6.1 reject successful streamed SDK responses with a missing content-type, and the PR body includes red/green live Codex proof plus focused regression tests.

Is this the best way to solve the issue?

Yes. This is the best narrow incident fix because it changes the shared OpenAI SDK transport guard while preserving HTML rejection, and leaves the broader duplicate Codex-path refactor to the existing follow-up issue.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9dcf42472b8a.

Label changes

Label justifications:

  • P1: The PR addresses a live OpenAI/Codex transport failure that can break background agent maintenance and provider calls for real users.
  • merge-risk: 🚨 availability: A mistake in this streamed response sniffing path could make provider requests hang, reject valid streams, or pass invalid bodies before SDK parsing.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body includes redacted live before/after Codex logs showing missing content-type failure before the patch and successful SSE completion after it, plus focused regression test output.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted live before/after Codex logs showing missing content-type failure before the patch and successful SSE completion after it, plus focused regression test output.
Evidence reviewed

PR surface:

Source +79, Tests +193. Total +272 across 2 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 1 | 83 | 4 | +79 |
| Tests | 1 | 193 | 0 | +193 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 2 | 276 | 4 | +272 |

What I checked:

  • Repository policy read: Root and scoped agent policy were read fully; the Codex/dependency inspection and hot-path review requirements were applied to this provider transport PR. (AGENTS.md:7, 9dcf42472b8a)
  • Current main still has the rejecting path: Current main calls assertOpenAISdkStreamContentType for streamed SDK requests; a successful response with a body and a missing content-type is not accepted as SSE or JSON and falls into invalid_provider_content_type handling. (src/agents/provider-transport-fetch.ts:236, 9dcf42472b8a)
  • Latest release still has the same behavior: v2026.6.1 contains the same streamed content-type assertion and call site, so the PR is not superseded by the latest shipped release. (src/agents/provider-transport-fetch.ts:236, 2e08f0f4221f)
  • PR branch normalizes only missing-header valid-looking bodies: The PR branch sniffs a cloned response body for missing content-type responses and synthesizes text/event-stream or application/json only when the prefix looks like SSE or JSON; otherwise it preserves the existing error path. (src/agents/provider-transport-fetch.ts:299, 8335c78a89cb)
  • Regression coverage on the PR branch: The branch adds focused tests for missing-header SSE, still-open SSE, split-prefix SSE, missing-header JSON, and missing-header HTML rejection. (src/agents/provider-transport-fetch.test.ts:241, 8335c78a89cb)
  • Affected runtime entry path: Simple completions for openai-chatgpt-responses are rewritten to the internal openclaw-openai-responses-transport API and the OpenAI SDK client uses buildGuardedModelFetch, so this change is on the reported llm.complete path. (src/agents/simple-completion-transport.ts:57, 9dcf42472b8a)
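The missing-header handling described in the PR-branch bullet can be sketched roughly as follows. This is a hedged illustration, not the code in src/agents/provider-transport-fetch.ts. The names normalizeMissingContentType, sniffPrefix, classifyPrefix, and isUndecided, the 64-byte sniff limit, and the exact SSE/JSON prefix rules are all assumptions. The caller is assumed to apply it only to streamed SDK requests, before the existing content-type assertion. It relies on Response.clone() teeing the body, so the original stream stays unconsumed.

```typescript
// Illustrative sketch only: all names below are hypothetical, not the PR's identifiers.
// Assumes WHATWG fetch/streams globals (Response, Headers, ReadableStream, TextDecoder), e.g. Node 18+.

type SniffedType = "text/event-stream" | "application/json";

const SNIFF_LIMIT_BYTES = 64; // assumption: how much of the body prefix to inspect
const SSE_FIELDS = ["data:", "event:", "id:", "retry:"];

// Classify a body prefix; undefined means "not recognizably SSE or JSON".
function classifyPrefix(prefix: string): SniffedType | undefined {
  const t = prefix.trimStart();
  if (t.startsWith(":") || SSE_FIELDS.some((field) => t.startsWith(field))) {
    return "text/event-stream";
  }
  if (t.startsWith("{") || t.startsWith("[")) return "application/json";
  return undefined;
}

// True while the prefix could still grow into an SSE field name (e.g. "da" -> "data:").
function isUndecided(prefix: string): boolean {
  const t = prefix.trimStart();
  return t === "" || SSE_FIELDS.some((field) => field.startsWith(t));
}

// Read just enough of a cloned body to classify it; the original body stays unread.
async function sniffPrefix(response: Response): Promise<string> {
  const probe = response.clone(); // clone() tees the body stream
  if (!probe.body) return "";
  const reader = probe.body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  let bytes = 0;
  try {
    // Stop as soon as the prefix is classified, so a still-open SSE stream does not block here.
    while (bytes < SNIFF_LIMIT_BYTES && classifyPrefix(text) === undefined && isUndecided(text)) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
      text += decoder.decode(value, { stream: true });
    }
  } finally {
    // Cancel only the probe branch, without awaiting: a tee branch's cancel promise
    // settles only after the other branch is also cancelled or the source ends,
    // which does not happen while the caller is still streaming an open SSE response.
    reader.cancel().catch(() => {});
  }
  return text;
}

// For an OK response with a body but no content-type, synthesize the header only when
// the prefix looks like SSE or JSON; otherwise return the response untouched so the
// existing invalid-content-type error path still runs.
async function normalizeMissingContentType(response: Response): Promise<Response> {
  if (!response.ok || !response.body || response.headers.has("content-type")) {
    return response;
  }
  const detected = classifyPrefix(await sniffPrefix(response));
  if (detected === undefined) return response;
  const headers = new Headers(response.headers);
  headers.set("content-type", detected);
  // After clone(), response.body is the unread tee branch, so the SDK still sees the full stream.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

Two design points in this sketch: HTML bodies (and explicit bad content types) are returned unchanged, so they still hit the existing rejection; and the rebuilt Response drops fields such as url and redirected, which a real implementation may need to preserve. A stream that stalls mid-prefix (for example, after "da") would still block the sniff, so a production version might also want a timeout.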

Likely related people:

  • vincentkoc: Introduced and refined the provider transport adapter seam and adjacent managed transport runtime paths used by this PR. (role: feature-history owner; confidence: high; commits: 4798e125f40c, ea4265a82063, 1a13c34f5bc8; files: src/agents/provider-transport-stream.ts, src/agents/simple-completion-transport.ts, src/agents/openai-transport-stream.ts)
  • steipete: Current main blame for the streamed content-type guard points to a recent provider-transport-fetch maintenance commit, and recent history also shows adjacent provider fetch maintenance. (role: recent area contributor; confidence: medium; commits: fff04af46db4, 7740c4d53050; files: src/agents/provider-transport-fetch.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
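The command rules above can be summarized as a small permission table. The sketch below is a hypothetical model of those rules, not ClawSweeper's implementation: the Role and Action names, the resolveCommand function, and the assumption that maintainers also count as having write access are all illustrative, and the "open PR or issue" check is omitted.

```typescript
// Hypothetical model of the comment-command rules listed above; not ClawSweeper's code.
type Role = "author" | "writer" | "maintainer" | "other";
type Action = "fresh-review" | "repair-or-merge" | "explain" | "stop";

interface Rule {
  action: Action;
  allowed: (role: Role) => boolean;
}

// Assumption: maintainers also count as having repository write access.
const hasWriteAccess = (role: Role): boolean => role === "writer" || role === "maintainer";
const isMaintainer = (role: Role): boolean => role === "maintainer";
const authorOrWriter = (role: Role): boolean => role === "author" || hasWriteAccess(role);

// A Map avoids accidental matches on Object.prototype keys such as "constructor".
const RULES = new Map<string, Rule>([
  // Fresh review only: these never start repair, autofix, rebase, CI repair, or automerge.
  ["re-review", { action: "fresh-review", allowed: authorOrWriter }],
  ["re-run", { action: "fresh-review", allowed: authorOrWriter }],
  ["review", { action: "fresh-review", allowed: isMaintainer }],
  // Maintainer-only repair and merge flows.
  ["autofix", { action: "repair-or-merge", allowed: isMaintainer }],
  ["automerge", { action: "repair-or-merge", allowed: isMaintainer }],
  ["fix ci", { action: "repair-or-merge", allowed: isMaintainer }],
  ["address review", { action: "repair-or-merge", allowed: isMaintainer }],
  ["explain", { action: "explain", allowed: isMaintainer }],
  ["stop", { action: "stop", allowed: isMaintainer }],
]);

// Returns the action a comment would trigger, or null if it is not a permitted command.
function resolveCommand(comment: string, role: Role): Action | null {
  const match = /^@clawsweeper\s+(.+?)\s*$/i.exec(comment.trim());
  if (!match) return null;
  const rule = RULES.get(match[1].toLowerCase().replace(/\s+/g, " "));
  return rule !== undefined && rule.allowed(role) ? rule.action : null;
}
```

Keeping fresh-review commands and repair/merge commands as distinct actions mirrors the stated guarantee that a re-review request alone never triggers repair or merge automation.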

clawsweeper (bot) updated labels on Jun 4, 2026:
  • added: rating: 🧂 unranked krab, status: 📣 needs proof, P1, merge-risk: 🚨 availability 🚨, proof: sufficient, rating: 🐚 platinum hermit, status: 👀 ready for maintainer look
  • removed: rating: 🧂 unranked krab, status: 📣 needs proof
jalehman force-pushed the fix/codex-missing-content-type-transport branch from 8335c78 to 0d7f8ab on June 4, 2026 19:11
@jalehman
Contributor Author

jalehman commented Jun 4, 2026

Land-ready verification for head 0d7f8abb17b43d95838792e1bd5c37da074969a1:

  • Local targeted proof before PR refresh: node scripts/run-vitest.mjs src/agents/provider-transport-fetch.test.ts -- --reporter=verbose passed.
  • Local rebase proof: git diff --check origin/main...HEAD passed after rebasing; changed files remain src/agents/provider-transport-fetch.ts and src/agents/provider-transport-fetch.test.ts.
  • Local prepare gates completed through pnpm build and pnpm check; full local pnpm test was intentionally stopped because CI is the test proof for the pinned head.
  • GitHub CI: gh pr checks 90205 --watch=false exited 0 for the pinned head.
  • OpenGrep PR-diff: https://github.com/openclaw/openclaw/actions/runs/26973663514/job/79595070927 passed.
  • Agent transcript was uploaded to the PR body.

Known proof gap: full pnpm test was not completed locally for the rebased head; GitHub CI is green and is being used as the full test proof.

jalehman merged commit ab0a633 into main on Jun 4, 2026
159 checks passed
jalehman deleted the fix/codex-missing-content-type-transport branch on June 4, 2026 19:17
@jalehman
Contributor Author

jalehman commented Jun 4, 2026

Merged via squash.

Thanks @jalehman!


Labels

  • agents: Agent runtime and tooling.
  • maintainer: Maintainer-authored PR.
  • merge-risk: 🚨 availability 🚨: May cause crashes, hangs, restart loops, stalls, or process outages.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: M
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
