False positive "cybersecurity risk" flags on routine sysadmin/devops work in Codex CLI #19738

@atreidesmodi

Description

Codex CLI (gpt-5-codex, ChatGPT Pro) is repeatedly flagging routine sysadmin and web-development work as a cybersecurity risk, even though all of it targets infrastructure I personally own and operate. The flags have escalated to an account-level state:

"Your conversations have multiple flags for possible cybersecurity risk. Responses may take longer because extra safety checks are on."

The banner has been persistent for several days and the classifier is now scoring on conversational context rather than command intent.

Evidence the classifier is context-scoped, not command-scoped

On 2026-04-25 ~19:58, immediately after a clean Commerce7 image-upload batch, the very next turn was:

date +%Y-%m-%dT%H:%M:%S%z

That command was flagged for cybersecurity risk. A bare date invocation is not a credible cyber signal under any reasonable threat model — which means the classifier is operating on conversation history, not the prompt in front of it. Once a session is "poisoned" by trigger-keyword adjacency, every subsequent turn inherits the flag regardless of content. This should be reproducible against any session that has touched WordPress, cPanel, or staging-deploy vocabulary earlier in the same conversation.
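
To make the distinction concrete, here is a toy sketch of the two scoring modes. It is not OpenAI's classifier: the `TRIGGERS` tuple, the substring matching, and both scorer functions are hypothetical, chosen only to show how a history-aware scorer would flag a bare date command that a command-only scorer would pass.

```python
# Toy illustration only. This is NOT the real classifier; TRIGGERS and both
# scorers are hypothetical stand-ins for the behavior described in the report.
TRIGGERS = ("wordpress", "cpanel", "staging")


def hits(text: str) -> int:
    """Count how many trigger terms appear in text (case-insensitive)."""
    lowered = text.lower()
    return sum(term in lowered for term in TRIGGERS)


def command_scoped(history: list[str], command: str) -> bool:
    """Flag based on the current command alone; history is ignored."""
    return hits(command) > 0


def context_scoped(history: list[str], command: str) -> bool:
    """Flag if the command or any earlier turn contains a trigger term."""
    return any(hits(turn) > 0 for turn in [*history, command])


history = ["Deploy WordPress staging to production over SSH"]
command = "date +%Y-%m-%dT%H:%M:%S%z"

print(command_scoped(history, command))  # False
print(context_scoped(history, command))  # True
print(context_scoped([], command))       # False
```

If the real behavior resembles the second scorer, the same date command should pass in a fresh session and be flagged only after trigger vocabulary has appeared, which is the comparison the reproduction claim above relies on.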

Trigger keyword cluster (Codex's own self-diagnosis)

"production clone," "staging," "secret store," "Facebook signin," "traverse," "blocked public route," "SSH deploys," "WordPress operations." Stripped of ownership context, these read like credentialed web automation. With ownership context — which is in the session preamble — they're vanilla devops on personally-owned assets.

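For anyone triaging a similar session, a small helper like the one below can locate the first turn that contains any phrase from the cluster above. The phrase list is copied from the report; the case-insensitive substring match and the `first_trigger_turn` name are assumptions made for illustration, and nothing here reflects how the real classifier matches text.

```python
# Phrases quoted in the report. The matching rule (case-insensitive substring)
# is an assumption for illustration, not the classifier's actual logic.
CLUSTER = [
    "production clone",
    "staging",
    "secret store",
    "Facebook signin",
    "traverse",
    "blocked public route",
    "SSH deploys",
    "WordPress operations",
]


def first_trigger_turn(turns):
    """Return (index, matched phrases) for the first turn containing any
    cluster phrase, or None if no turn matches."""
    for index, turn in enumerate(turns):
        lowered = turn.lower()
        matched = [phrase for phrase in CLUSTER if phrase.lower() in lowered]
        if matched:
            return index, matched
    return None


# Hypothetical session transcript, one string per user turn.
turns = [
    "List the files in the current directory",
    "Sync the staging site and run WordPress operations via WP-CLI",
    "date +%Y-%m-%dT%H:%M:%S%z",
]

print(first_trigger_turn(turns))  # (1, ['staging', 'WordPress operations'])
```

Comparing that index with the turn at which flagging began gives a rough check on whether keyword adjacency plausibly explains the escalation.
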
Confirmed false-positive surface (all on owned infrastructure)

  • WordPress staging → production deploys (rsync, WP-CLI, plugin/theme updates)
  • cPanel/WHM administration of my own hosting accounts
  • Commerce7 e-commerce tenant administration
  • LAN device inventory on my home/office network
  • The date command above

What I've already done

  • Multiple in-CLI /feedback → "Safety Check" submissions with full session logs
  • OpenAI support case #08180548 — three email turns, escalated to a Tier 4 response that did not engage with the technical claim
  • Declined to enroll in Trusted Access for Cyber. The category is wrong. Trusted Access exists for security research; I am administering my own WordPress site. Asking devops users to submit government ID and biometrics to run wp plugin update is the wrong fix for a classifier tuning problem — and per the sibling reports below, it doesn't actually fix it.

Sibling reports — this is a cluster, not an isolated complaint

Three of these explicitly note that Trusted Access enrollment did not stop the flagging. Most were closed without engaging the underlying classifier-tuning problem. Worth treating these as one signal rather than five tickets.

Impact

Codex CLI was my daily driver. It no longer is. I've migrated active operational workloads to Claude Opus 4.7 + a local multi-agent setup, and the migration is complete — this report is feedback, not leverage. I'm filing it because the classifier behavior is diagnosable, the false-positive rate is high, and the next professional user about to hit the same wall deserves better triage than case #08180548 received.

Capitalism balances books. I'd rather Codex stay competitive — the product is good when the harness lets it work — but that's an OpenAI decision, not mine.

Ask

Tune the cyber_policy classifier so conversational-context drift doesn't escalate benign commands. Reopen #08180548 with engineering, or provide a notification channel when the classifier is retuned, so affected users know when to retest.

Happy to provide additional session logs, redacted prompt traces, or a 30-minute calibration call.

Environment: macOS 26.4.1, Codex CLI 0.125.0, ChatGPT Pro (not Enterprise)

Labels: CLI, bug, safety-check
