
feat(cli): add codex doctor diagnostics #22336

Merged
fcoury-oai merged 28 commits into main from fcoury/doctor
May 13, 2026

Conversation

Contributor

@fcoury-oai fcoury-oai commented May 12, 2026

Why

Users and support need a single command that captures the local Codex runtime, configuration, auth, terminal, network, and state shape, without asking the user to know which diagnostic depth to choose first. codex doctor now runs the full set of useful checks by default and defaults to the detailed human-readable output, since the command is usually run when someone already needs context.

The command also targets concrete support failure modes we have seen while iterating on the design:

  • update-target mismatches like “fix(tui): avoid update loops for mismatched npm installs” (#21956), where the installed package manager target can differ from the running executable
  • terminal and multiplexer issues that depend on TERM, tmux/zellij state, color handling, and TTY metadata
  • provider-specific HTTP/WebSocket connectivity, including ChatGPT WebSocket handshakes and API-key/provider endpoint reachability
  • local state/log SQLite integrity problems and large rollout directories
  • feedback reports that need an attached, redacted diagnostic snapshot without asking the user to run a second command

What Changed

  • Adds codex doctor as a grouped CLI diagnostic report with default detailed output and --summary for the compact view.
  • Adds stable report sections for Environment, Configuration, Updates, Connectivity, and Background Server, plus a top Notes block that promotes anomalies such as available updates, large rollout directories, optional MCP issues, and mixed auth signals.
  • Adds runtime provenance, install consistency, bundled/system search readiness, terminal/multiplexer metadata, config.toml parse status, auth mode details, sandbox details, feature flag summaries, update cache/latest-version state, app-server daemon state, SQLite integrity checks, rollout statistics, and provider-aware network diagnostics.
  • Adds ChatGPT WebSocket diagnostics that report the negotiated HTTP upgrade as HTTP 101 Switching Protocols and include timeout, DNS, auth, and provider context in detailed output.
  • Makes reachability provider-aware: API-key OpenAI setups check the API endpoint, ChatGPT auth checks the ChatGPT path, and custom/AWS/local providers check configured HTTP endpoints when available.
  • Adds structured, redacted JSON output where checks is keyed by check id and details is a key/value object for support tooling.
  • Integrates doctor with feedback uploads by attaching a best-effort codex-doctor-report.json report and adding derived Sentry tags for overall status and failing/warning checks.
  • Updates the TUI feedback consent copy so users can see that the doctor report is included when logs/diagnostics are uploaded.
  • Updates the CLI bug issue template to ask reporters for codex doctor --json and render pasted reports as JSON.
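
The bullets above imply a small per-check status model (ok / idle / note / warn / fail) with the report's overall status being the worst status across checks. A minimal sketch of that roll-up in Python — the severity ordering and names are illustrative assumptions, not the PR's actual Rust types:

```python
# Hypothetical sketch of the status roll-up a doctor-style report implies:
# the overall status is the worst status among individual checks.
# The severity ordering here is an assumption, not taken from the PR.
SEVERITY = {"ok": 0, "idle": 1, "note": 2, "warn": 3, "fail": 4}

def overall_status(check_statuses):
    """Return the worst status among the checks, or 'ok' for an empty list."""
    if not check_statuses:
        return "ok"
    return max(check_statuses, key=lambda s: SEVERITY[s])

print(overall_status(["ok", "warn", "ok", "fail", "note"]))  # fail
```

This is why a single unreachable required endpoint is enough to turn the whole report's overall_status to "fail" in the examples below.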

Example Output

The examples below are sanitized from local smoke runs with --no-color so the structure is reviewable in plain text.

codex doctor

Codex Doctor v0.0.0 · macos-aarch64

Notes
   ↑ updates      0.130.0 available (current 0.0.0, dismissed 0.128.0)
   ⚠ rollouts     1,526 active files · 2.53 GB on disk
   ⚠ mcp          MCP configuration has optional issues
   ⚠ auth         mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────

Environment
  ✓ runtime      local debug build
      version                  0.0.0
      install method           other
      commit                   unknown
      executable               ~/code/codex.fcoury-doct…x-rs/target/debug/codex
  ✓ install      consistent
      context                  other
      managed by               npm: no · bun: no · package root —
      PATH entries (2)         ~/.local/share/mise/installs/node/24/bin/codex
                               ~/.local/share/mise/shims/codex
  ✓ search       ripgrep 15.1.0 (system, `rg`)
  ✓ terminal     Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
      terminal                 Ghostty
      TERM_PROGRAM             ghostty
      terminal version         1.3.2-main-+b0f827665
      TERM                     xterm-256color
      multiplexer              tmux 3.6a
      tmux extended-keys       on
      tmux allow-passthrough   on
      tmux set-clipboard       on
  ✓ state        databases healthy
      CODEX_HOME               ~/.codex (dir)
      state DB                 ~/.codex/state_5.sqlite (file) · integrity ok
      log DB                   ~/.codex/logs_2.sqlite (file) · integrity ok
      active rollouts          1,526 files · 2.53 GB (avg 1.70 MB)
      archived rollouts        8 files · 3.84 MB (avg 491.11 KB)

Configuration
  ✓ config       loaded
      model                    gpt-5.5 · openai
      cwd                      ~/code/codex.fcoury-doctor/codex-rs
      config.toml              ~/.codex/config.toml
      config.toml parse        ok
      MCP servers              1
      feature flags            36 enabled · 7 overridden (full list with --all)
      overrides                code_mode, code_mode_only, memories, chronicle, goals, remote_control, prevent_idle_sleep
  ✓ auth         auth is configured
      auth storage mode        File
      auth file                ~/.codex/auth.json
      auth env vars present    OPENAI_API_KEY
      stored auth mode         chatgpt
      stored API key           false
      stored ChatGPT tokens    true
      stored agent identity    false
  ⚠ mcp          MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
      configured servers       1
      disabled servers         0
      streamable_http servers  1
      optional reachability    openaiDeveloperDocs: https://developers.openai.com/mcp (HEAD connect failed; GET connect failed)
  ✓ sandbox      restricted fs + restricted network · approval OnRequest
      approval policy          OnRequest
      filesystem sandbox       restricted
      network sandbox          restricted

Connectivity
  ✓ network      network-related environment looks readable
  ✓ websocket    connected (HTTP 101 Switching Protocols) · 15s timeout
      model provider           openai
      provider name            OpenAI
      wire API                 responses
      supports websockets      true
      connect timeout          15000 ms
      auth mode                chatgpt
      endpoint                 wss://chatgpt.com/backend-api/<redacted>
      DNS                      2 IPv4, 2 IPv6, first IPv6
      handshake result         HTTP 101 Switching Protocols
  ✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.
      reachability mode        API key auth
      openai API               https://api.openai.com/v1 connect failed (required)

Background Server
  ○ app-server   not running (ephemeral mode)

─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail · failed

--summary compact output           --all expand truncated lists
--json redacted report

codex doctor --summary

Codex Doctor v0.0.0 · macos-aarch64

Notes
   ↑ updates      0.130.0 available (current 0.0.0, dismissed 0.128.0)
   ⚠ rollouts     1,526 active files · 2.53 GB on disk
   ⚠ mcp          MCP configuration has optional issues
   ⚠ auth         mixed auth signals: ChatGPT login plus API key env var; HTTP reachability uses API-key mode
─────────────────────────────────────────────────────────────

Environment
  ✓ runtime      local debug build
  ✓ install      consistent
  ✓ search       ripgrep 15.1.0 (system, `rg`)
  ✓ terminal     Ghostty 1.3.2-main-+b0f827665 · tmux 3.6a · TERM=xterm-256color
  ✓ state        databases healthy

Configuration
  ✓ config       loaded
  ✓ auth         auth is configured
  ⚠ mcp          MCP configuration has optional issues — Set the missing MCP env vars or disable the affected server.
  ✓ sandbox      restricted fs + restricted network · approval OnRequest

Updates
  ✓ updates      update configuration is locally consistent

Connectivity
  ✓ network      network-related environment looks readable
  ✓ websocket    connected (HTTP 101 Switching Protocols) · 15s timeout
  ✗ reachability one or more required provider endpoints are unreachable over HTTP — Check proxy, VPN, firewall, DNS, and custom CA configuration.

Background Server
  ○ app-server   not running (ephemeral mode)

─────────────────────────────────────────────────────────────
11 ok · 1 idle · 4 notes · 1 warn · 1 fail · failed

Run codex doctor without --summary for detailed diagnostics.
--all expand truncated lists       --json redacted report

codex doctor --json shape

{
  "schema_version": 1,
  "overall_status": "fail",
  "checks": {
    "runtime.provenance": {
      "id": "runtime.provenance",
      "category": "Environment",
      "status": "ok",
      "summary": "local debug build",
      "details": {
        "version": "0.0.0",
        "install method": "other",
        "commit": "unknown"
      }
    },
    "sandbox.helpers": {
      "id": "sandbox.helpers",
      "category": "Configuration",
      "status": "ok",
      "summary": "restricted fs + restricted network · approval OnRequest",
      "details": {
        "approval policy": "OnRequest",
        "filesystem sandbox": "restricted",
        "network sandbox": "restricted"
      }
    }
  }
}
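
Because checks is keyed by check id and details is a flat key/value object, support tooling can consume the report without positional assumptions. A hedged sketch of how such tooling might derive the Sentry-style tags described above (the function and tag names are hypothetical, not from the PR):

```python
import json

def doctor_tags(report_json):
    """Derive summary tags from a doctor --json report: overall status,
    per-status counts, and the ids of failing or warning checks."""
    report = json.loads(report_json)
    counts = {}
    flagged = []
    for check_id, check in report.get("checks", {}).items():
        status = check.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
        if status in ("fail", "warn"):
            flagged.append(check_id)
    return {
        "doctor.overall_status": report.get("overall_status"),
        "doctor.status_counts": counts,
        "doctor.flagged_checks": sorted(flagged),
    }

sample = """{"schema_version": 1, "overall_status": "fail", "checks": {
  "runtime.provenance": {"status": "ok"},
  "net.reachability": {"status": "fail"}}}"""
print(doctor_tags(sample))
```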

/feedback new sentry attachment

(screenshot: CleanShot 2026-05-13 at 15 36 14)

New section in CLI issue template

(screenshot: CleanShot 2026-05-13 at 15 47 24)

How to Test

  1. Run cargo run --bin codex -- doctor --no-color.
  2. Confirm the detailed report is the default and includes promoted Notes, grouped sections, terminal details, state DB integrity, rollout stats, provider reachability, WebSocket diagnostics, and app-server status.
  3. Run cargo run --bin codex -- doctor --summary --no-color.
  4. Confirm the compact view keeps the same sections and summary counts but omits detailed key/value rows.
  5. Run cargo run --bin codex -- doctor --json.
  6. Confirm the output is redacted JSON, checks is an object keyed by check id, and each check's details is a key/value object.
  7. Preview the CLI bug issue template and confirm the Codex doctor report field appears after the terminal field, asks for codex doctor --json, and renders pasted output as JSON.
  8. Start a feedback flow that includes logs.
  9. Confirm the upload consent copy lists codex-doctor-report.json alongside the log attachments.

Targeted tests:

  • cargo test -p codex-cli doctor
  • cargo test -p codex-app-server doctor_report_tags_summarize_status_counts
  • cargo test -p codex-feedback
  • cargo test -p codex-tui feedback_view
  • just argument-comment-lint
  • git diff --check

@fcoury-oai fcoury-oai changed the title Add richer codex doctor diagnostics Run full codex doctor diagnostics by default May 12, 2026
@fcoury-oai fcoury-oai changed the title Run full codex doctor diagnostics by default feat(doctor): add codex doctor diagnostics May 12, 2026
@fcoury-oai fcoury-oai changed the title feat(doctor): add codex doctor diagnostics feat(cli): add codex doctor diagnostics May 12, 2026
Collaborator

@etraut-openai etraut-openai left a comment


This is a great feature! I've been thinking about doing something like this for a while but never got around to it. Thanks for doing this!

Here are some thoughts & questions:

  • What does "running local" mean? Does that mean it's not installed using npm, bun, or homebrew? Or does it mean that it's not connected to a remote app server?
  • I noticed that you're checking the connectivity of the "OpenAI endpoints" via HTTP. Do you check for websocket connectivity? That's a common complaint from customers.
  • Do you have any checks for Azure connectivity? We receive a lot of bugs about Azure endpoint connectivity.
  • I wonder if there's an opportunity to integrate this with the existing /feedback mechanism. For example, would it make sense to run this diagnostic report and upload it via sentry? Could be done as a follow-on feature.
  • Another common error that we're seeing lately has to do with the integrity of the state db and the log db (both SQLite). Is there an integrity check that we could run on these and include it in the report?
  • Another thing that might be useful to include in the report is some stats about local rollout files - both non-archived and archived: counts, total disk space consumed, average rollout size.
  • It might be useful in the configuration section to output which feature flags are enabled in the config. Users sometimes get into trouble by enabling features that are not yet ready for use.
  • I find the verbose output much more useful than the non-verbose output. I'm wondering if we should always output verbose for this feature. If you're running this, you probably want as much information as possible. What do you think?
  • It runs locally, so it won't help diagnose problems when connecting remotely. For example, if a remote app-server is having problems connecting to the responses endpoint, this won't help. I think that's OK. Just pointing it out.

@fcoury-oai
Contributor Author

@etraut-openai thanks for the detailed pass. Here’s what changed in response to each item:

What does "running local" mean? Does that mean it's not installed using npm, bun, or homebrew? Or does it mean that it's not connected to a remote app server?

Updated the wording to avoid the ambiguous “local” label. Doctor now reports runtime provenance more explicitly, e.g. local debug build, and keeps install/package-manager context in the install section.

I noticed that you're checking the connectivity of the "OpenAI endpoints" via HTTP. Do you check for websocket connectivity? That's a common complaint from customers.

Added a WebSocket diagnostic. It checks the active provider/auth path when WebSockets are supported and reports the DNS shape, timeout, endpoint, auth mode, and the handshake result, shown as connected (HTTP 101 Switching Protocols) when the upgrade succeeds.
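
The HTTP 101 result comes from the Upgrade handshake that opens every WebSocket: per RFC 6455, the server must answer the upgrade request with status 101 or the connection never becomes a WebSocket. A minimal, hypothetical sketch of classifying the server's status line (the buckets are illustrative, not the PR's exact logic):

```python
def classify_handshake(status_line):
    """Classify the first line of a WebSocket upgrade response.
    Per RFC 6455, only a 101 status completes the upgrade."""
    parts = status_line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return "malformed response"
    code = parts[1]
    if code == "101":
        return "connected (HTTP 101 Switching Protocols)"
    if code in ("401", "403"):
        return f"auth rejected (HTTP {code})"
    return f"upgrade refused (HTTP {code})"

print(classify_handshake("HTTP/1.1 101 Switching Protocols"))
```

Distinguishing auth rejections from other refusals is what lets the diagnostic include auth context alongside the handshake result.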

Do you have any checks for Azure connectivity? We receive a lot of bugs about Azure endpoint connectivity.

Added provider-aware reachability. Doctor no longer hard-codes OpenAI/ChatGPT as the only meaningful endpoints. It now uses the active provider configuration and probes the configured provider endpoint when present, including custom/Azure-style endpoints. Deeper Azure-specific validation for deployment names, API versions, and Azure auth conventions is still a follow-up.
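
The selection logic described above can be sketched roughly as follows — the mapping and endpoint values are reconstructed from this description for illustration, not taken from the code:

```python
def reachability_targets(auth_mode, provider):
    """Pick which endpoints a reachability probe should hit, based on the
    active auth mode and provider config. Endpoint values are illustrative."""
    if provider.get("base_url"):
        # Custom/Azure-style/local providers: probe the configured endpoint.
        return [provider["base_url"]]
    if auth_mode == "chatgpt":
        return ["https://chatgpt.com"]
    if auth_mode == "api_key":
        return ["https://api.openai.com/v1"]
    return []

print(reachability_targets("api_key", {}))
```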

I wonder if there's an opportunity to integrate this with the existing /feedback mechanism. For example, would it make sense to run this diagnostic report and upload it via sentry? Could be done as a follow-on feature.

Implemented this integration. When users consent to upload logs through /feedback, the app-server runs codex doctor --json best-effort, attaches it as codex-doctor-report.json, and adds doctor-derived Sentry tags such as overall status, warning/fail counts, and failing check ids. The upload consent UI also lists the doctor report explicitly.

Another common error that we're seeing lately has to do with the integrity of the state db and the log db (both SQLite). Is there an integrity check that we could run on these and include it in the report?

Added SQLite integrity checks for both state and log databases using PRAGMA integrity_check. Existing DBs that are corrupt or unreadable now fail doctor.
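
PRAGMA integrity_check returns the single row ok for a healthy database and a list of problem descriptions otherwise. The doctor itself is Rust; this Python sketch just demonstrates the pragma's behavior via the stdlib bindings:

```python
import sqlite3

def sqlite_integrity(path):
    """Run PRAGMA integrity_check and report whether the database is healthy.
    Returns (ok, messages): ok is True when the pragma reports 'ok'."""
    try:
        conn = sqlite3.connect(path)
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        conn.close()
    except sqlite3.DatabaseError as exc:
        return False, [str(exc)]
    messages = [row[0] for row in rows]
    return messages == ["ok"], messages

# An in-memory database is trivially healthy.
print(sqlite_integrity(":memory:"))  # (True, ['ok'])
```

Catching DatabaseError matters because a file that is not SQLite at all raises before the pragma ever runs.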

Another thing that might be useful to include in the report is some stats about local rollout files - both non-archived and archived: counts, total disk space consumed, average rollout size.

Added rollout stats for active and archived rollouts, including file count, total disk usage, and average size. Large active rollout usage is promoted into the Notes block.
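
The reported stats (count, total disk usage, average size) fall out of a single pass over a rollout directory. A hedged sketch, with the directory layout assumed rather than taken from the PR:

```python
import os
import tempfile

def rollout_stats(dir_path):
    """Count files under dir_path and report total and average size in bytes."""
    sizes = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            sizes.append(os.path.getsize(os.path.join(root, name)))
    count = len(sizes)
    total = sum(sizes)
    avg = total / count if count else 0.0
    return {"files": count, "total_bytes": total, "avg_bytes": avg}

# A fresh empty dir: {'files': 0, 'total_bytes': 0, 'avg_bytes': 0.0}
print(rollout_stats(tempfile.mkdtemp()))
```

The same totals can then be compared against a threshold to decide whether to promote a "large rollout directory" warning into the Notes block.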

It might be useful in the configuration section to output which feature flags are enabled in the config. Users sometimes get into trouble by enabling features that are not yet ready for use.

Added feature flag details to the config section: enabled count, overridden count, explicit overrides, and legacy alias mappings. The default human output keeps long lists truncated, with --all to expand.

I find the verbose output much more useful than the non-verbose output. I'm wondering if we should always output verbose for this feature. If you're running this, you probably want as much information as possible. What do you think?

Changed doctor to be detailed by default and added --summary for compact output. The default output now uses hierarchy, Notes, grouped sections, and two-column details so the extra information is still scannable.

It runs locally, so it won't help diagnose problems when connecting remotely. For example, if a remote app-server is having problems connecting to the responses endpoint, this won't help. I think that's OK. Just pointing it out.

Kept this PR scoped to local diagnostics. The report now makes local runtime/app-server status clearer, but remote app-server diagnostics remain a follow-up rather than being mixed into this local command.

Additional improvements added while addressing the feedback:

  • Added progress output while doctor runs, with JSON/summary modes staying quiet.
  • Added redacted structured JSON output with checks keyed by stable check id and details represented as structured fields.
  • Added safer redaction for URLs, including credentials in userinfo, query strings, fragments, and secret-looking path segments.
  • Added MCP diagnostics for disabled servers, optional vs required reachability, stdio command resolution, executable permissions, and invalid remote-sourced env vars in local stdio configs.
  • Added provider/auth correctness fixes so API-key users do not require ChatGPT reachability, provider-specific auth is respected, and malformed stored auth is detected.
  • Added clearer warning/failure output: non-ok checks now carry a cause, measured/expected values, offending detail fields, and a concrete remedy.
  • Added terminal diagnostics for TERM, locale, terminal size, TERMINFO, tmux, zellij, and related terminal environment signals.
  • Added CLI issue template guidance asking users to paste codex doctor --json output.
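
The URL redaction described above can be sketched with stdlib URL parsing. The secret-spotting heuristics here are assumptions for illustration, not the PR's actual rules:

```python
from urllib.parse import urlsplit, urlunsplit

SECRET_HINTS = ("key", "token", "secret", "auth", "session")

def redact_url(url):
    """Strip credentials, query strings, fragments, and secret-looking
    path segments from a URL before it lands in a diagnostic report."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"  # drop userinfo, keep host:port
    segments = []
    for seg in parts.path.split("/"):
        if any(h in seg.lower() for h in SECRET_HINTS) or len(seg) > 32:
            segments.append("<redacted>")
        else:
            segments.append(seg)
    query = "<redacted>" if parts.query else ""
    fragment = "<redacted>" if parts.fragment else ""
    return urlunsplit((parts.scheme, host, "/".join(segments), query, fragment))

print(redact_url("https://user:pass@api.example.com/v1/tokens/abc?key=123"))
# https://api.example.com/v1/<redacted>/abc?<redacted>
```

Redacting the whole query and fragment, rather than trying to whitelist parameters, is the conservative choice for a report that gets attached to uploads.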

@etraut-openai
Collaborator

The upload consent UI also lists the doctor report explicitly

I think the consent dialog is a per-client UI. Let's make sure that this change to the /feedback flow doesn't break the IDE extension or app.

@fcoury-oai fcoury-oai enabled auto-merge (squash) May 13, 2026 21:21
@fcoury-oai fcoury-oai merged commit 9798eb3 into main May 13, 2026
31 checks passed
@fcoury-oai fcoury-oai deleted the fcoury/doctor branch May 13, 2026 21:23
@github-actions github-actions Bot locked and limited conversation to collaborators May 13, 2026