
Codex Desktop dictation uses ASCII commas instead of Chinese full-width punctuation for Chinese speech #21892

@Charles-1234567890

Description

What issue are you seeing?

Codex Desktop voice dictation/transcription for Chinese speech inserts ASCII punctuation, especially the ASCII comma "," between Chinese clauses, instead of Chinese full-width punctuation such as the full-width comma "，".

For Chinese users, this makes dictated prompts look unnatural and requires manual cleanup after nearly every dictation.

Environment

  • App: Codex Desktop for macOS
  • Bundle identifier: com.openai.codex
  • Codex app version: 26.506.31421
  • macOS app bundle version: 2620
  • Platform: macOS / Darwin arm64
  • User input language: Simplified Chinese
  • macOS selected input source: Simplified Chinese ITABC
  • macOS Dictation preferred language includes zh_CN

Steps to reproduce

  1. Open Codex Desktop on macOS.

  2. Start Codex's built-in dictation (global dictation) from the composer.

  3. Speak a normal Chinese sentence with natural clause pauses, for example:

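    帮我看看为什么 Codex 的语音输入中文之间的逗号为什么是英文格式的,不是中文格式的逗号。
    (English: "Help me figure out why the commas between Chinese text in Codex voice input are in English format rather than Chinese-format commas.")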

  4. Submit the dictation result into the composer.

Expected behavior

When the detected/transcribed language is Chinese, punctuation inserted between Chinese characters should use Chinese full-width punctuation, for example:

帮我看看为什么 Codex 的语音输入中文之间的逗号为什么是英文格式的，不是中文格式的逗号。

Actual behavior

Codex dictation may produce ASCII punctuation between Chinese clauses, for example:

帮我看看为什么 Codex 的语音输入中文之间的逗号为什么是英文格式的,不是中文格式的逗号。

Local investigation

This appears to originate in the Codex dictation pipeline rather than in the user's macOS Chinese input method.

Local inspection of the app bundle shows a Codex-specific global dictation route:

  • webview/assets/global-dictation-page-*.js records audio with MediaRecorder and posts the result through the Codex transcription flow.
  • webview/assets/use-recording-waveform-*.js posts audio to /transcribe.
  • The multipart builder supports an optional language field, but the global dictation call path I inspected did not appear to pass zh-CN / zh_CN into the transcription request.
  • The cleanup prompt reads "Clean up dictation transcripts. Fix likely speech recognition mistakes, punctuation, capitalization, and formatting...", but it does not appear to instruct the cleanup step to use locale-appropriate full-width Chinese punctuation when the transcript is Chinese.
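A minimal TypeScript sketch of how the optional language field could be populated, assuming the webview builds the /transcribe request with FormData. The helper names toTranscribeLanguage and buildTranscribeForm, the "file" field name, and the "dictation.webm" filename are hypothetical; only the /transcribe path and the existence of an optional language field come from the inspection above. The sketch turns a macOS-style locale such as zh_CN into the hyphenated form zh-CN and attaches it only when a value is known.

```typescript
// Hypothetical sketch: toTranscribeLanguage, buildTranscribeForm, the "file"
// field name, and the "dictation.webm" filename are assumptions, not Codex source.

/**
 * Converts a simple locale identifier ("zh_CN", "zh-cn", "zh") into a
 * hyphenated tag ("zh-CN", "zh"). Returns undefined when the input is
 * missing or does not look like language[-region].
 */
function toTranscribeLanguage(locale: string | undefined): string | undefined {
  if (!locale) return undefined;
  const match = /^([a-zA-Z]{2,3})(?:[-_]([a-zA-Z]{2}|\d{3}))?$/.exec(locale.trim());
  if (!match) return undefined;
  const language = match[1].toLowerCase();
  // Region subtags are conventionally upper-case ("CN"); numeric regions stay as-is.
  return match[2] ? `${language}-${match[2].toUpperCase()}` : language;
}

/** Builds the multipart body, attaching "language" only when it is known. */
function buildTranscribeForm(audio: Blob, language?: string): FormData {
  const form = new FormData();
  form.append("file", audio, "dictation.webm");
  if (language) form.append("language", language);
  return form;
}
```

The resulting form could then be sent with fetch("/transcribe", { method: "POST", body: form }). Whether /transcribe expects a tag like zh-CN or a bare code like zh is not visible from the bundle; for comparison, OpenAI's public audio transcription API documents ISO-639-1 codes (e.g. zh), so only the primary subtag may be appropriate if the endpoint forwards to it.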

So a likely fix would be one or both of:

  • Pass the detected/user language such as zh-CN to /transcribe for Chinese dictation.
  • Add locale-aware punctuation guidance to the post-transcription cleanup step; for example, Chinese transcripts should prefer ，。？！；： between Chinese text rather than ASCII ,.?!;:.
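A sketch of the second fix, with the same caveat that none of these names exist in the Codex bundle. withPunctuationGuidance appends a Chinese-punctuation instruction to whatever base cleanup prompt is in use when the language is Chinese (the base prompt is passed in, since its full text is not reproduced here). normalizeChinesePunctuation is a swapped-in technique, not something proposed above: a deterministic pass that rewrites ASCII ,.?!;: to ，。？！；： only when the mark follows a Han character and is followed by another Han character, a line break, or the end of the text.

```typescript
// Hypothetical helpers for the post-transcription cleanup step; not Codex source.

/** Returns true for language tags such as "zh", "zh-CN", or "zh_CN". */
function isChinese(language: string | undefined): boolean {
  return language !== undefined && /^zh(?:[-_]|$)/i.test(language);
}

/** Appends Chinese punctuation guidance to the base cleanup prompt when relevant. */
function withPunctuationGuidance(basePrompt: string, language: string | undefined): string {
  if (!isChinese(language)) return basePrompt;
  return (
    basePrompt +
    "\nThe transcript is Chinese. Between Chinese text, use full-width punctuation " +
    "(，。？！；：) instead of ASCII punctuation (,.?!;:)."
  );
}

// ASCII mark -> full-width counterpart.
const FULL_WIDTH: Record<string, string> = {
  ",": "，",
  ".": "。",
  "?": "？",
  "!": "！",
  ";": "；",
  ":": "：",
};

// A Han character, optional spaces/tabs, one ASCII mark, optional spaces/tabs,
// then (without consuming it) a Han character, a line break, or end of text.
// The spaces/tabs around the mark are dropped, since full-width marks carry
// their own spacing; line breaks are preserved.
const ASCII_MARK_AFTER_HAN = /(\p{Script=Han})[ \t]*([,.?!;:])[ \t]*(?=\p{Script=Han}|[\r\n]|$)/gu;

/** Deterministically rewrites ASCII punctuation that sits between Chinese text. */
function normalizeChinesePunctuation(text: string): string {
  return text.replace(
    ASCII_MARK_AFTER_HAN,
    (_match: string, han: string, mark: string) => han + (FULL_WIDTH[mark] ?? mark),
  );
}
```

The rule is deliberately conservative: file names (config.js), times (10:30), English sentences, and a comma between a Chinese clause and a following Latin word all keep their ASCII marks, so mixed Chinese/English prompts are not corrupted; those cases are left to the cleanup model with the added guidance.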

Additional context

This issue is distinct from transcription failure. The transcription succeeds, but punctuation style is wrong for Chinese text.

Metadata

  • Assignees: No one assigned
  • Labels: app (Issues related to the Codex desktop app), bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
