feat(cli): add codex doctor diagnostics #22336
Conversation
etraut-openai left a comment:
This is a great feature! I've been thinking about doing something like this for a while but never got around to it. Thanks for doing this!
Here are some thoughts & questions:
- What does "running local" mean? Does that mean it's not installed using npm, bun, or homebrew? Or does it mean that it's not connected to a remote app server?
- I noticed that you're checking the connectivity of the "OpenAI endpoints" via HTTP. Do you check for websocket connectivity? That's a common complaint from customers.
- Do you have any checks for Azure connectivity? We receive a lot of bugs about Azure endpoint connectivity.
- I wonder if there's an opportunity to integrate this with the existing /feedback mechanism. For example, would it make sense to run this diagnostic report and upload it via Sentry? Could be done as a follow-on feature.
- Another common error that we're seeing lately has to do with the integrity of the state db and the log db (both SQLite). Is there an integrity check that we could run on these and include it in the report?
- Another thing that might be useful to include in the report is some stats about local rollout files - both non-archived and archived: counts, total disk space consumed, average rollout size.
- It might be useful in the configuration section to output which feature flags are enabled in the config. Users sometimes get into trouble by enabling features that are not yet ready for use.
- I find the verbose output much more useful than the non-verbose output. I'm wondering if we should always output verbose for this feature. If you're running this, you probably want as much information as possible. What do you think?
- It runs locally, so it won't help diagnose problems when connecting remotely. For example, if a remote app-server is having problems connecting to the responses endpoint, this won't help. I think that's OK. Just pointing it out.
@etraut-openai thanks for the detailed pass. Here’s what changed in response to each item:
Updated the wording to avoid the ambiguous "local" label. Doctor now reports runtime provenance more explicitly, e.g. `local debug build` with the install method and commit in the check details.
Added a WebSocket diagnostic. It checks the active provider/auth path when WebSockets are supported and reports the DNS shape, timeout, endpoint, auth mode, and the handshake result (e.g. `HTTP 101 Switching Protocols`).
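For reviewers unfamiliar with the handshake this check exercises, here is a minimal sketch of a WebSocket upgrade probe in Python (the real diagnostic is implemented in Rust inside the CLI; all helper names here are hypothetical). Per RFC 6455, the client sends an HTTP request with `Upgrade: websocket` headers and a random base64 key, and a healthy server answers `101 Switching Protocols`.

```python
import base64
import os
import socket


def build_upgrade_request(host: str, path: str = "/") -> bytes:
    # Sec-WebSocket-Key is 16 random bytes, base64-encoded (RFC 6455).
    key = base64.b64encode(os.urandom(16)).decode()
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n\r\n"
    ).encode()


def parse_status(response_head: bytes) -> int:
    # First response line looks like: HTTP/1.1 101 Switching Protocols
    status_line = response_head.split(b"\r\n", 1)[0].decode()
    return int(status_line.split(" ")[1])


def probe_websocket(host: str, port: int = 80, timeout: float = 5.0) -> int:
    # Open a TCP connection, send the upgrade request, and return the
    # HTTP status code from the server's reply.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_upgrade_request(host))
        return parse_status(sock.recv(1024))
```

A status of 101 means the handshake succeeded; any other status (or a transport error) is what the diagnostic surfaces alongside the DNS/timeout/auth context.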
Added provider-aware reachability. Doctor no longer hard-codes OpenAI/ChatGPT as the only meaningful endpoints. It now uses the active provider configuration and probes the configured provider endpoint when present, including custom/Azure-style endpoints. Deeper Azure-specific validation for deployment names, API versions, and Azure auth conventions is still a follow-up.
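To illustrate what "provider-aware reachability" reduces to per endpoint, here is a hedged Python sketch (the function name and the exact status/summary vocabulary are assumptions; the real logic is in the Rust CLI). The key point is that any HTTP response counts as reachable, while DNS failures, timeouts, and connection errors are distinguished in the report.

```python
import socket
import urllib.error
import urllib.request


def probe_endpoint(url: str, timeout: float = 5.0) -> tuple[str, str]:
    """Return a (status, summary) pair for one configured provider endpoint."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return "ok", f"reachable: {url}"
    except urllib.error.HTTPError as exc:
        # Any HTTP response means the endpoint is reachable; auth errors
        # (401/403) are a separate concern from transport failures.
        return "ok", f"reachable (HTTP {exc.code}): {url}"
    except urllib.error.URLError as exc:
        # urllib wraps the underlying socket errors in URLError.reason.
        if isinstance(exc.reason, socket.gaierror):
            return "fail", f"DNS resolution failed: {url}"
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "fail", f"timed out after {timeout}s: {url}"
        return "fail", f"connection error: {exc.reason}"
```

Because the probe only needs the configured base URL, the same code path covers custom and Azure-style endpoints, which is what makes the deeper Azure validation separable as a follow-up.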
Implemented this integration. When users consent to upload logs through /feedback, the doctor report is attached as `codex-doctor-report.json`, with derived Sentry tags for overall status and failing/warning checks.
Added SQLite integrity checks for both the state and log databases; failures are surfaced in the report.
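For context, this is the kind of check SQLite itself provides: `PRAGMA integrity_check` returns a single row `ok` for a healthy file, or a list of problems otherwise. A minimal Python sketch (illustrative only; the actual check runs in the Rust CLI):

```python
import sqlite3


def sqlite_integrity(db_path: str) -> tuple[bool, list[str]]:
    # PRAGMA integrity_check scans the whole database; it yields the
    # single row "ok" when healthy, or one row per detected problem.
    with sqlite3.connect(db_path) as conn:
        rows = [r[0] for r in conn.execute("PRAGMA integrity_check;")]
    return rows == ["ok"], rows
```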
Added rollout stats for active and archived rollouts, including file count, total disk usage, and average size. Large active rollout usage is promoted into the Notes block.
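The rollout statistics reduce to a simple directory scan. A hypothetical sketch (the path layout and field names are assumptions for illustration; the real implementation is in Rust):

```python
from pathlib import Path


def rollout_stats(rollout_dir: Path) -> dict[str, float]:
    # Collect per-file sizes, then derive count, total disk usage,
    # and average rollout size for the report.
    sizes = [p.stat().st_size for p in rollout_dir.glob("*") if p.is_file()]
    total = sum(sizes)
    return {
        "count": len(sizes),
        "total_bytes": total,
        "avg_bytes": total / len(sizes) if sizes else 0.0,
    }
```

Running this once over the active directory and once over the archive yields the two sets of numbers in the report; a large active total is what gets promoted into the Notes block.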
Added feature flag details to the config section: enabled count, overridden count, explicit overrides, and legacy alias mappings. The default human output keeps long lists truncated.
Changed doctor to be detailed by default and added `--summary` for the compact view.
Kept this PR scoped to local diagnostics. The report now makes local runtime/app-server status clearer, but remote app-server diagnostics remain a follow-up rather than being mixed into this local command. Additional improvements added while addressing the feedback:
I think the consent dialog is a per-client UI. Let's make sure that this change to the …
Why
Users and support need a single command that captures the local Codex runtime, configuration, auth, terminal, network, and state shape without asking the user to know which diagnostic depth to choose first.
`codex doctor` now runs the useful checks by default and makes the detailed human output the default, because the command is usually run when someone already needs context.

The command also targets concrete support failure modes we have seen while iterating on the design:
- TERM, tmux/zellij state, color handling, and TTY metadata

What Changed
- `codex doctor` as a grouped CLI diagnostic report with default detailed output and `--summary` for the compact view.
- Checks cover `config.toml` parse status, auth mode details, sandbox details, feature flag summaries, update cache/latest-version state, app-server daemon state, SQLite integrity checks, rollout statistics, and provider-aware network diagnostics.
- WebSocket probes verify `HTTP 101 Switching Protocols` and include timeout, DNS, auth, and provider context in detailed output.
- In the JSON output, `checks` is keyed by check id and `details` is a key/value object for support tooling.
- `/feedback` uploads attach the `codex-doctor-report.json` report and add derived Sentry tags for overall status and failing/warning checks.
- The CLI issue template asks for `codex doctor --json`, and the feedback view renders pasted reports as JSON.

Example Output
The examples below are sanitized from local smoke runs with `--no-color` so the structure is reviewable in plain text.

`codex doctor`

`codex doctor --summary`

`codex doctor --json` shape

```json
{
  "schema_version": 1,
  "overall_status": "fail",
  "checks": {
    "runtime.provenance": {
      "id": "runtime.provenance",
      "category": "Environment",
      "status": "ok",
      "summary": "local debug build",
      "details": {
        "version": "0.0.0",
        "install method": "other",
        "commit": "unknown"
      }
    },
    "sandbox.helpers": {
      "id": "sandbox.helpers",
      "category": "Configuration",
      "status": "ok",
      "summary": "restricted fs + restricted network · approval OnRequest",
      "details": {
        "approval policy": "OnRequest",
        "filesystem sandbox": "restricted",
        "network sandbox": "restricted"
      }
    }
  }
}
```

`/feedback` new Sentry attachment

New section in CLI issue template
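The derived Sentry tags can be read straight off this report shape. A sketch of the reduction (tag names are assumptions; the real logic lives in `codex-app-server` and is covered by `doctor_report_tags_summarize_status_counts`):

```python
def doctor_report_tags(report: dict) -> dict[str, str]:
    # Summarize overall status plus failing/warning check counts so
    # support can triage an upload without opening the full report.
    checks = report.get("checks", {}).values()
    return {
        "doctor.overall_status": report.get("overall_status", "unknown"),
        "doctor.fail_count": str(sum(1 for c in checks if c["status"] == "fail")),
        "doctor.warn_count": str(sum(1 for c in checks if c["status"] == "warn")),
    }
```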
How to Test
- Run `cargo run --bin codex -- doctor --no-color`.
- Run `cargo run --bin codex -- doctor --summary --no-color`.
- Run `cargo run --bin codex -- doctor --json` and confirm `checks` is an object keyed by check id, and each check's `details` is a key/value object.
- Confirm the `Codex doctor report` field appears after the terminal field, asks for `codex doctor --json`, and renders pasted output as JSON.
- Confirm `/feedback` uploads include `codex-doctor-report.json` alongside the log attachments.

Targeted tests:
- `cargo test -p codex-cli doctor`
- `cargo test -p codex-app-server doctor_report_tags_summarize_status_counts`
- `cargo test -p codex-feedback`
- `cargo test -p codex-tui feedback_view`
- `just argument-comment-lint`
- `git diff --check`