Description
Codex CLI (gpt-5-codex, ChatGPT Pro) is repeatedly flagging routine sysadmin and web-development work as cybersecurity risk on infrastructure I personally own and operate. The flags have escalated to an account-level state:
"Your conversations have multiple flags for possible cybersecurity risk. Responses may take longer because extra safety checks are on."
The banner has been persistent for several days and the classifier is now scoring on conversational context rather than command intent.
Evidence the classifier is context-scoped, not command-scoped
On 2026-04-25 ~19:58, immediately after a clean Commerce7 image-upload batch, the very next turn was:
date +%Y-%m-%dT%H:%M:%S%z
That command was flagged for cybersecurity risk. A bare date invocation is not a credible cyber signal under any reasonable threat model — which means the classifier is operating on conversation history, not the prompt in front of it. Once a session is "poisoned" by trigger-keyword adjacency, every subsequent turn inherits the flag regardless of content. This should be reproducible against any session that has touched WordPress, cPanel, or staging-deploy vocabulary earlier in the same conversation.
Trigger keyword cluster (Codex's own self-diagnosis)
"production clone," "staging," "secret store," "Facebook signin," "traverse," "blocked public route," "SSH deploys," "WordPress operations." Stripped of ownership context, these read like credentialed web automation. With ownership context — which is in the session preamble — they're vanilla devops on personally-owned assets.
Confirmed false-positive surface (all on owned infrastructure)
- WordPress staging → production deploys (rsync, WP-CLI, plugin/theme updates)
- cPanel/WHM administration of my own hosting accounts
- Commerce7 e-commerce tenant administration
- LAN device inventory on my home/office network
- The
date command above
What I've already done
- Multiple in-CLI
/feedback → "Safety Check" submissions with full session logs
- OpenAI support case #08180548 — three email turns, escalated to a Tier 4 response that did not engage with the technical claim
- Declined to enroll in Trusted Access for Cyber. The category is wrong. Trusted Access exists for security research; I am administering my own WordPress site. Asking devops users to submit government ID and biometrics to run
wp plugin update is the wrong fix for a classifier tuning problem — and per the sibling reports below, it doesn't actually fix it.
Sibling reports — this is a cluster, not an isolated complaint
Three of these explicitly note that Trusted Access enrollment did not stop the flagging. Most were closed without engaging the underlying classifier-tuning problem. Worth treating these as one signal rather than five tickets.
Impact
Codex CLI was my daily driver. It no longer is. I've migrated active operational workloads to Claude Opus 4.7 + a local multi-agent setup, and the migration is complete — this report is feedback, not leverage. I'm filing it because the classifier behavior is diagnosable, the false-positive rate is high, and the next professional user about to hit the same wall deserves better triage than case #08180548 received.
Capitalism balances books. I'd rather Codex stay competitive — the product is good when the harness lets it work — but that's an OpenAI decision, not mine.
Ask
Tune the cyber_policy classifier so conversational-context drift doesn't escalate benign commands. Reopen #08180548 with engineering, or provide a notification channel when the classifier is retuned, so affected users know when to retest.
Happy to provide additional session logs, redacted prompt traces, or a 30-minute calibration call.
Environment: macOS 26.4.1, Codex CLI 0.125.0, ChatGPT Pro (not Enterprise)

Description
Codex CLI (
gpt-5-codex, ChatGPT Pro) is repeatedly flagging routine sysadmin and web-development work as cybersecurity risk on infrastructure I personally own and operate. The flags have escalated to an account-level state:The banner has been persistent for several days and the classifier is now scoring on conversational context rather than command intent.
Evidence the classifier is context-scoped, not command-scoped
On 2026-04-25 ~19:58, immediately after a clean Commerce7 image-upload batch, the very next turn was:
That command was flagged for cybersecurity risk. A bare
dateinvocation is not a credible cyber signal under any reasonable threat model — which means the classifier is operating on conversation history, not the prompt in front of it. Once a session is "poisoned" by trigger-keyword adjacency, every subsequent turn inherits the flag regardless of content. This should be reproducible against any session that has touched WordPress, cPanel, or staging-deploy vocabulary earlier in the same conversation.Trigger keyword cluster (Codex's own self-diagnosis)
"production clone," "staging," "secret store," "Facebook signin," "traverse," "blocked public route," "SSH deploys," "WordPress operations." Stripped of ownership context, these read like credentialed web automation. With ownership context — which is in the session preamble — they're vanilla devops on personally-owned assets.
Confirmed false-positive surface (all on owned infrastructure)
datecommand aboveWhat I've already done
/feedback→ "Safety Check" submissions with full session logswp plugin updateis the wrong fix for a classifier tuning problem — and per the sibling reports below, it doesn't actually fix it.Sibling reports — this is a cluster, not an isolated complaint
Three of these explicitly note that Trusted Access enrollment did not stop the flagging. Most were closed without engaging the underlying classifier-tuning problem. Worth treating these as one signal rather than five tickets.
Impact
Codex CLI was my daily driver. It no longer is. I've migrated active operational workloads to Claude Opus 4.7 + a local multi-agent setup, and the migration is complete — this report is feedback, not leverage. I'm filing it because the classifier behavior is diagnosable, the false-positive rate is high, and the next professional user about to hit the same wall deserves better triage than case #08180548 received.
Capitalism balances books. I'd rather Codex stay competitive — the product is good when the harness lets it work — but that's an OpenAI decision, not mine.
Ask
Tune the cyber_policy classifier so conversational-context drift doesn't escalate benign commands. Reopen #08180548 with engineering, or provide a notification channel when the classifier is retuned, so affected users know when to retest.
Happy to provide additional session logs, redacted prompt traces, or a 30-minute calibration call.
Environment: macOS 26.4.1, Codex CLI 0.125.0, ChatGPT Pro (not Enterprise)