Codex Voice Agent is a local desktop app that lets you talk to Codex through OpenAI
Realtime voice. Realtime handles the voice/control layer; codex app-server
owns local execution, approvals, questions, project state, and tool work.
- Compact voice window for speaking requests to Codex.
- Project creation, resume, summarize, interrupt, and steer flows for codex app-server turns.
- Per-project workspaces stored under ~/Documents/Codex Voice Agent Projects/.
- Approval and tool-question forwarding between Codex, the UI, and the voice layer.
- Debug window for project state, chats, runtime status, events, pending approvals, and manual send/steer controls.

Requirements:
- Node.js and npm
- Codex CLI on PATH with codex app-server support
- OpenAI API key for Realtime voice

Install dependencies.

    npm install

Configure an OpenAI API key in one of two ways.
- Set OPENAI_API_KEY in the environment before launching the app.
- Add the key from the app menu after launch. The app can store it through the local key store.
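
The two key sources above could be resolved with a small helper like the sketch below. The names `resolveApiKey`, `KeySource`, and `ResolvedKey` are hypothetical, and the precedence (environment first, then the stored key) is an assumption; this README does not say which source wins when both are present.

```typescript
// Hypothetical helper, not the app's actual code.
// Assumption: OPENAI_API_KEY from the environment takes precedence over the stored key.
type KeySource = "environment" | "keystore";

interface ResolvedKey {
  key: string;
  source: KeySource;
}

function resolveApiKey(
  env: Record<string, string | undefined>,
  storedKey: string | undefined,
): ResolvedKey | undefined {
  // Treat empty or whitespace-only values as "not set".
  const fromEnv = env["OPENAI_API_KEY"]?.trim();
  if (fromEnv) return { key: fromEnv, source: "environment" };

  const fromStore = storedKey?.trim();
  if (fromStore) return { key: fromStore, source: "keystore" };

  // No key available: the caller would prompt the user to add one from the app menu.
  return undefined;
}
```

Passing `env` in as a parameter, rather than reading a global, keeps the helper easy to test; in a Node.js main process the caller would likely pass `process.env`.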
Optional Realtime settings:

    export OPENAI_REALTIME_MODEL=gpt-realtime-2
    export OPENAI_REALTIME_VOICE=marin
    export OPENAI_REALTIME_REASONING_EFFORT=low

The app also exposes a Realtime model selector in Settings, with
gpt-realtime-2 and gpt-realtime-1.5 available. GPT Realtime 2 is the default
and supports minimal, low, medium, high, or xhigh reasoning effort for voice
sessions.
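
As a rough illustration of how these settings might be read, the sketch below parses the three environment variables and validates the reasoning effort against the values listed above. `readRealtimeSettings` and `RealtimeSettings` are hypothetical names. The only default applied is the model (gpt-realtime-2, as stated above); leaving voice and effort unset when the variable is missing, and throwing on an unknown effort, are assumptions rather than documented behavior.

```typescript
// Reasoning-effort values this README lists for GPT Realtime 2.
const REASONING_EFFORTS = ["minimal", "low", "medium", "high", "xhigh"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

interface RealtimeSettings {
  model: string;
  voice: string | undefined;
  reasoningEffort: ReasoningEffort | undefined;
}

function isReasoningEffort(value: string): value is ReasoningEffort {
  return REASONING_EFFORTS.some((effort) => effort === value);
}

// Hypothetical reader; `env` is a parameter so the function stays pure.
function readRealtimeSettings(
  env: Record<string, string | undefined>,
): RealtimeSettings {
  const rawEffort = env["OPENAI_REALTIME_REASONING_EFFORT"];
  let reasoningEffort: ReasoningEffort | undefined;
  if (rawEffort !== undefined) {
    // Assumption: an unrecognized value is rejected instead of silently ignored.
    if (!isReasoningEffort(rawEffort)) {
      throw new Error(`Unsupported reasoning effort: ${rawEffort}`);
    }
    reasoningEffort = rawEffort;
  }
  return {
    // gpt-realtime-2 is the documented default model.
    model: env["OPENAI_REALTIME_MODEL"] ?? "gpt-realtime-2",
    voice: env["OPENAI_REALTIME_VOICE"],
    reasoningEffort,
  };
}
```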
The app uses OpenAI's ephemeral-token WebRTC path: the desktop main process
creates a Realtime client secret with the saved or environment API key, and the
renderer posts browser SDP to /v1/realtime/calls with that short-lived secret.
It does not use the unified server-side multipart /v1/realtime/calls sample.
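
The split between the two processes can be sketched as two request builders: one for the main process, which holds the long-lived API key, and one for the renderer, which only ever sees the short-lived secret. This is an illustration, not the app's code. `HttpRequest`, `buildClientSecretRequest`, and `buildSdpOfferRequest` are hypothetical names, and the `/v1/realtime/client_secrets` path and the `session` body shape reflect my reading of OpenAI's Realtime documentation, not anything stated in this README.

```typescript
// Plain description of an HTTP request, so the sketch runs without network access.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const API_BASE = "https://api.openai.com";

// Main process: exchanges the long-lived API key for a short-lived client secret.
// Assumption: endpoint path and JSON body follow OpenAI's Realtime docs.
function buildClientSecretRequest(apiKey: string, model: string): HttpRequest {
  return {
    url: `${API_BASE}/v1/realtime/client_secrets`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ session: { type: "realtime", model } }),
  };
}

// Renderer: posts the browser's SDP offer, authorized only by the ephemeral secret.
function buildSdpOfferRequest(clientSecret: string, offerSdp: string): HttpRequest {
  return {
    url: `${API_BASE}/v1/realtime/calls`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${clientSecret}`,
      "Content-Type": "application/sdp",
    },
    body: offerSdp,
  };
}
```

Each descriptor could be handed to `fetch(req.url, req)`. The point of the design is that the long-lived key never reaches renderer code; only the expiring secret crosses the preload bridge.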
Run the app in development mode.

    npm run dev

Typecheck.

    npm run typecheck

Build.

    npm run build

Preview the built desktop app.

    npm run preview

src/main/      Desktop main process, Codex bridge, Realtime secret creation,
               project store, and orchestration.
src/preload/   Context-isolated renderer bridge.
src/renderer/  React UI and browser-side Realtime client.
src/shared/    Shared TypeScript types.
The voice layer should stay narrow. It passes spoken intent, status requests, approval answers, and steering instructions to Codex; it should not inspect the computer, infer local state, or perform the task itself.
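
One way to keep the voice layer this narrow is to give it a closed set of message kinds and reject everything else before it reaches Codex. The sketch below is hypothetical: the `VoiceIntent` union, its field names, and `parseVoiceIntent` are illustrative and do not describe the app's actual types in src/shared/.

```typescript
// The only things the voice layer may hand to Codex, mirroring the four
// categories named above: spoken intent, status requests, approval answers,
// and steering instructions. All names here are illustrative assumptions.
type VoiceIntent =
  | { kind: "request"; text: string }
  | { kind: "status" }
  | { kind: "approval"; approved: boolean }
  | { kind: "steer"; text: string };

// Validates an untrusted payload (for example, a tool call from the voice model).
// Anything outside the closed set returns undefined and is never forwarded.
function parseVoiceIntent(raw: unknown): VoiceIntent | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const record = raw as Record<string, unknown>;
  const kind = record["kind"];
  const text = record["text"];
  const approved = record["approved"];

  switch (kind) {
    case "request":
      return typeof text === "string" ? { kind: "request", text } : undefined;
    case "steer":
      return typeof text === "string" ? { kind: "steer", text } : undefined;
    case "status":
      return { kind: "status" };
    case "approval":
      return typeof approved === "boolean" ? { kind: "approval", approved } : undefined;
    default:
      // No "run command" or "read file" kinds exist: local work stays with Codex.
      return undefined;
  }
}
```

Because the union has no variant for inspecting files or running commands, a payload such as `{ kind: "run_shell", command: "ls" }` simply fails to parse.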
MIT. See LICENSE.
