Warn on invalid UTF-8 in AGENTS.md files #23232

Merged
etraut-openai merged 5 commits into main from etraut/warn-invalid-global-agents-md on May 20, 2026

Collaborator

@etraut-openai etraut-openai commented May 18, 2026

Fixes #23223.

Why

Malformed AGENTS instructions should not fail silently. The reported issue had invalid UTF-8 in a global AGENTS.md; before this change, Codex treated that decode failure like a missing file, so the personal instructions disappeared without a user-visible explanation and the rollout had no # AGENTS.md instructions block.

Project-level AGENTS files already used lossy decoding, so their instructions still appeared, but invalid bytes were replaced without telling the user. Global and project AGENTS files should behave consistently: keep usable instruction text when possible, and surface a diagnostic when bytes had to be replaced.

What changed

Global AGENTS.override.md and AGENTS.md loading now reads bytes and decodes with replacement characters on invalid UTF-8, matching project-level AGENTS behavior. Both global and project AGENTS loading now emit a startup warning when invalid UTF-8 is found, and both keep the instruction text with invalid byte sequences replaced.
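
The decode-and-warn behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `decode_agents_bytes`, its parameters, the `Vec<String>` warning sink, and the warning text are all hypothetical. It relies on the documented behavior of `String::from_utf8_lossy`, which returns `Cow::Borrowed` for valid input and `Cow::Owned` only when replacement characters had to be inserted.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the PR's function): decode AGENTS bytes lossily
/// and record a warning when any invalid UTF-8 had to be replaced.
fn decode_agents_bytes(data: &[u8], label: &str, warnings: &mut Vec<String>) -> String {
    let decoded = String::from_utf8_lossy(data);
    // `Cow::Owned` means at least one invalid sequence became U+FFFD.
    if matches!(decoded, Cow::Owned(_)) {
        warnings.push(format!(
            "{label} AGENTS file contains invalid UTF-8; invalid bytes were replaced with U+FFFD"
        ));
    }
    // Keep the usable instruction text in both cases.
    decoded.into_owned()
}
```

With this shape, valid input passes through unchanged with no warning, while input such as `b"a\xFFb"` is kept as `"a\u{FFFD}b"` and produces exactly one warning.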

Missing files, non-file candidates, empty files, and the existing AGENTS.override.md before AGENTS.md precedence keep their current behavior.

How users see it

The warnings flow through the existing startup warning surface. App-server clients receive config-time startup warnings as configWarning notifications during initialization, and thread startup emits startup warnings as thread-scoped warning notifications.

Global AGENTS invalid UTF-8 warnings can appear on both surfaces. Project-level AGENTS invalid UTF-8 warnings are discovered while building thread instructions, so they appear as thread-scoped warning notifications. Clients that render warning notifications in the conversation surface show the message as a visible diagnostic instead of silently hiding or altering instructions.

@etraut-openai etraut-openai requested a review from a team as a code owner May 18, 2026 03:09
@etraut-openai etraut-openai changed the title from "Warn on invalid global AGENTS.md" to "Warn when global AGENTS.md cannot be decoded" May 18, 2026
@etraut-openai etraut-openai changed the title from "Warn when global AGENTS.md cannot be decoded" to "Warn on invalid UTF-8 in AGENTS.md files" May 18, 2026
@bolinfest bolinfest self-requested a review May 18, 2026 16:45
Comment thread codex-rs/core/src/agents_md.rs Outdated
contents: trimmed.to_string(),
path,
});
if !path.as_path().is_file() {
Collaborator

It's slightly better to just std::fs::read() and then check to see whether err is ENOENT (and also EISDIR to be extra conservative?) rather than the separate is_file() check to avoid TOCTOU, though I concede the risk is minor.
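
A rough sketch of the read-then-inspect-the-error approach suggested here, assuming a hypothetical helper named `read_candidate` (it is not part of the PR). Reading directly and classifying the error avoids the window between an `is_file()` check and the read. Only `ENOENT` is mapped to "skip"; how other errors such as `EISDIR` should be treated is left to the caller in this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: read a candidate file in one step, mapping a
/// missing file to `Ok(None)` instead of checking `is_file()` first.
fn read_candidate(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(data) => Ok(Some(data)),
        // ENOENT: the candidate does not exist, so skip it quietly.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        // Any other failure (for example EISDIR when the path is a
        // directory) is returned so the caller can skip or report it.
        Err(err) => Err(err),
    }
}
```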

Comment on lines +84 to +85
warn_invalid_utf8(&path, &data, "Global", startup_warnings);
let contents = String::from_utf8_lossy(&data);
Collaborator

If warn_invalid_utf8() returned Option<String>, returning Some when std::str::from_utf8() succeeds, you would only do the pass once in the common case:

Suggested change
- warn_invalid_utf8(&path, &data, "Global", startup_warnings);
- let contents = String::from_utf8_lossy(&data);
+ let contents = warn_invalid_utf8(&path, &data, "Global", startup_warnings).unwrap_or_else(|| String::from_utf8_lossy(&data));
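
One way the single-pass idea could look, as a sketch only. The signature of `warn_invalid_utf8`, the `Vec<String>` warning sink, the wrapper `decode_contents`, and the warning wording are assumptions for illustration and may not match the PR. Since `String::from_utf8_lossy` returns a `Cow<str>`, the fallback branch calls `.into_owned()` so that both branches yield a `String`.

```rust
use std::path::Path;

/// Hypothetical variant of `warn_invalid_utf8`: returns `Some(text)` when the
/// bytes are valid UTF-8, or pushes a warning and returns `None` otherwise.
fn warn_invalid_utf8(
    path: &Path,
    data: &[u8],
    scope: &str,
    startup_warnings: &mut Vec<String>,
) -> Option<String> {
    match std::str::from_utf8(data) {
        // Common case: a single validation pass, no lossy decode needed.
        Ok(text) => Some(text.to_owned()),
        Err(err) => {
            startup_warnings.push(format!(
                "{scope} AGENTS file {} contains invalid UTF-8 (first invalid byte at offset {}); invalid bytes were replaced.",
                path.display(),
                err.valid_up_to()
            ));
            None
        }
    }
}

/// Hypothetical call site: fall back to lossy decoding only on failure.
fn decode_contents(path: &Path, data: &[u8], startup_warnings: &mut Vec<String>) -> String {
    warn_invalid_utf8(path, data, "Global", startup_warnings)
        .unwrap_or_else(|| String::from_utf8_lossy(data).into_owned())
}
```

The trade-off is that the invalid case scans the bytes twice, which seems acceptable because it should be rare.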

Comment thread codex-rs/core/src/agents_md.rs
Comment thread codex-rs/core/src/agents_md.rs
@etraut-openai etraut-openai force-pushed the etraut/warn-invalid-global-agents-md branch from 9f4a2c6 to a0142a6 May 20, 2026 04:19
@etraut-openai etraut-openai merged commit 9dda71d into main May 20, 2026
46 of 47 checks passed
@etraut-openai etraut-openai deleted the etraut/warn-invalid-global-agents-md branch May 20, 2026 04:56
@github-actions github-actions Bot locked and limited conversation to collaborators May 20, 2026
Development

Successfully merging this pull request may close these issues.

AGENTS.md is silently ignored when invalid utf8 characters are present

2 participants