Warn on invalid UTF-8 in AGENTS.md files #23232
Merged
bolinfest approved these changes on May 18, 2026
    contents: trimmed.to_string(),
    path,
    });
    if !path.as_path().is_file() {
Collaborator
It's slightly better to just `std::fs::read()` and then check to see whether `err` is `ENOENT` (and also `EISDIR` to be extra conservative?) rather than the separate `is_file()` check to avoid TOCTOU, though I concede the risk is minor.
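A minimal sketch of the read-then-classify approach described in this comment, not the code that landed in the PR. The `Candidate` enum and `read_candidate` function are hypothetical names introduced for illustration. `ErrorKind::NotFound` corresponds to `ENOENT`, and `ErrorKind::IsADirectory` (stable since Rust 1.83) corresponds to `EISDIR` on Unix-like systems; other platforms may report a directory differently.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical outcome of trying to load one candidate instructions file.
#[derive(Debug, PartialEq)]
enum Candidate {
    /// The path was readable and these are its raw bytes.
    Bytes(Vec<u8>),
    /// Nothing usable at this path (missing, or a directory).
    Skip,
    /// Any other I/O error, kept as text so the caller can report it.
    Failed(String),
}

/// Reads `path` in a single step and classifies the result from the error
/// kind, instead of calling `is_file()` first and reading afterwards.
/// There is no window between a check and the read for the path to change.
fn read_candidate(path: &Path) -> Candidate {
    match fs::read(path) {
        Ok(bytes) => Candidate::Bytes(bytes),
        // ENOENT: the file does not exist.
        Err(err) if err.kind() == ErrorKind::NotFound => Candidate::Skip,
        // EISDIR: the path names a directory (Unix-like systems).
        Err(err) if err.kind() == ErrorKind::IsADirectory => Candidate::Skip,
        Err(err) => Candidate::Failed(err.to_string()),
    }
}
```

Whether other errors (for example permission failures) should be skipped or surfaced is a separate design choice; the sketch keeps them distinct so the caller can decide.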
Comment on lines +84 to +85
    warn_invalid_utf8(&path, &data, "Global", startup_warnings);
    let contents = String::from_utf8_lossy(&data);
Collaborator
If `warn_invalid_utf8()` returned `Option<String>`, returning `Some` when `std::str::from_utf8()` succeeds, you would only do the pass once in the common case:
Suggested change
    - warn_invalid_utf8(&path, &data, "Global", startup_warnings);
    - let contents = String::from_utf8_lossy(&data);
    + let contents = warn_invalid_utf8(&path, &data, "Global", startup_warnings).unwrap_or_else(|| String::from_utf8_lossy(&data));
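A sketch of how the suggested `Option<String>` shape could look, under stated assumptions: the parameter types, the `Vec<String>` warning sink, the warning wording, and the `decode_instructions` wrapper are all hypothetical and not taken from the PR. Because `String::from_utf8_lossy` returns a `Cow<str>`, the fallback branch here calls `.into_owned()` so that both branches yield a `String`.

```rust
use std::path::Path;

/// Hypothetical variant of `warn_invalid_utf8`: returns `Some(text)` when
/// `data` is valid UTF-8, otherwise records a warning and returns `None`.
fn warn_invalid_utf8(
    path: &Path,
    data: &[u8],
    scope: &str,
    startup_warnings: &mut Vec<String>, // assumed sink type
) -> Option<String> {
    match std::str::from_utf8(data) {
        // Common case: one validation pass, then a copy into a String.
        Ok(text) => Some(text.to_owned()),
        Err(err) => {
            startup_warnings.push(format!(
                "{scope} AGENTS file {} has invalid UTF-8 at byte {}; invalid bytes were replaced",
                path.display(),
                err.valid_up_to()
            ));
            None
        }
    }
}

/// Decodes instruction bytes, falling back to lossy decoding only when the
/// strict pass failed.
fn decode_instructions(
    path: &Path,
    data: &[u8],
    scope: &str,
    startup_warnings: &mut Vec<String>,
) -> String {
    warn_invalid_utf8(path, data, scope, startup_warnings)
        .unwrap_or_else(|| String::from_utf8_lossy(data).into_owned())
}
```

With valid input the lossy pass never runs; with invalid input the bytes are scanned a second time, which is the uncommon path.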
9f4a2c6 to a0142a6
Fixes #23223.
Why
Malformed AGENTS instructions should not fail silently. The reported issue had invalid UTF-8 in a global `AGENTS.md`; before this change, Codex treated that decode failure like a missing file, so the personal instructions disappeared without a user-visible explanation and the rollout had no `# AGENTS.md instructions` block.
Project-level AGENTS files already used lossy decoding, so their instructions still appeared, but invalid bytes were replaced without telling the user. Global and project AGENTS files should behave consistently: keep usable instruction text when possible, and surface a diagnostic when bytes had to be replaced.
What changed
Global `AGENTS.override.md` and `AGENTS.md` loading now reads bytes and decodes with replacement characters on invalid UTF-8, matching project-level AGENTS behavior. Both global and project AGENTS loading now emit a startup warning when invalid UTF-8 is found, and both keep the instruction text with invalid byte sequences replaced.
Missing files, non-file candidates, empty files, and the existing `AGENTS.override.md` before `AGENTS.md` precedence keep their current behavior.
How users see it
The warnings flow through the existing startup warning surface. App-server clients receive config-time startup warnings as `configWarning` notifications during initialization, and thread startup emits startup warnings as thread-scoped `warning` notifications.
Global AGENTS invalid UTF-8 warnings can appear on both surfaces. Project-level AGENTS invalid UTF-8 warnings are discovered while building thread instructions, so they appear as thread-scoped `warning` notifications. Clients that render warning notifications in the conversation surface show the message as a visible diagnostic instead of silently hiding or altering instructions.
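The loading rules described under "What changed" can be illustrated with a small sketch. This is not the Codex implementation: `load_agents`, `LoadedInstructions`, the `Vec<String>` warning sink, and the warning text are hypothetical, and the sketch assumes that an empty (whitespace-only) candidate is skipped in favor of the next one and that any read error is treated as a skip. It relies on `String::from_utf8_lossy` returning `Cow::Owned` only when it had to insert replacement characters.

```rust
use std::borrow::Cow;
use std::fs;
use std::path::{Path, PathBuf};

/// Candidate file names in precedence order: the override file wins.
const CANDIDATES: [&str; 2] = ["AGENTS.override.md", "AGENTS.md"];

/// Hypothetical result type: where the text came from and what it says.
struct LoadedInstructions {
    path: PathBuf,
    contents: String,
}

/// Returns the first non-empty candidate in `dir`, decoding lossily and
/// recording a warning whenever invalid UTF-8 had to be replaced.
fn load_agents(
    dir: &Path,
    scope: &str,
    startup_warnings: &mut Vec<String>, // assumed sink type
) -> Option<LoadedInstructions> {
    for name in CANDIDATES {
        let path = dir.join(name);
        // Missing files and non-file candidates fail to read and are skipped
        // (this sketch skips every read error, which is an assumption).
        let Ok(data) = fs::read(&path) else { continue };
        let contents = String::from_utf8_lossy(&data);
        // `Cow::Owned` means at least one invalid sequence was replaced.
        if matches!(contents, Cow::Owned(_)) {
            startup_warnings.push(format!(
                "{scope} AGENTS file {} contains invalid UTF-8; invalid bytes were replaced",
                path.display()
            ));
        }
        let trimmed = contents.trim();
        if trimmed.is_empty() {
            continue; // assumed: empty files fall through to the next candidate
        }
        return Some(LoadedInstructions {
            contents: trimmed.to_string(),
            path,
        });
    }
    None
}
```

Checking the `Cow` variant avoids a separate validation pass, at the cost of tying the warning to the decode step.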