Skip to content

fix: robustly parse THREAT_DETECTION_RESULT with literal newlines in reasons (#issue)#22982

Merged
davidslater merged 3 commits intomainfrom
copilot/improve-threat-detection-parsing
Mar 25, 2026
Merged

fix: robustly parse THREAT_DETECTION_RESULT with literal newlines in reasons (#issue)#22982
davidslater merged 3 commits intomainfrom
copilot/improve-threat-detection-parsing

Conversation

Copy link
Contributor

Copilot AI commented Mar 25, 2026

Summary

Fixes a bug in parse_threat_detection_results.cjs where a reasons array containing values with literal newlines (encoded as \n escapes in the outer stream-json envelope) would cause the verdict JSON to be silently truncated and fail to parse.

Root cause: After JSON.parse decodes the outer stream-json envelope, \n escape sequences in the result field become actual newline characters (U+000A) in the JS string. Calling .split("\n") on this string would break the verdict JSON mid-value at those embedded newlines, returning a truncated line that fails JSON.parse.

Changes

parse_threat_detection_results.cjs

  • extractResultFromText (new helper): Character-by-character brace-counting (no regex) that extracts the complete JSON object from a string starting with THREAT_DETECTION_RESULT:. Tracks string context and escape sequences so braces inside string values are ignored. Stops at the first matching closing brace, discarding any trailing content.

  • extractFromStreamJson (fixed): Instead of returning the first line starting with the prefix (which may be truncated at an embedded newline), it now finds the prefix line by index, rejoins all subsequent lines with "\n", then delegates to extractResultFromText to extract the complete JSON object.

  • parseDetectionLog raw mode (fixed): Applies the same join+brace-count approach for raw (non-stream-json) log entries. Falls back to the trimmed line when extractResultFromText returns null (e.g. null, [], truncated JSON) so that existing error messages are preserved.

  • Literal-newline normalization: Before calling JSON.parse, replaces any remaining actual newline characters with \n escape sequences (split("\n").join("\\n")). This restores valid JSON for the edge case where embedded newlines survive the extraction step.

  • Improved debug logging in main(): Replaces the first-5/last-5 lines preview with the total line count (already in the stats line) and a focused listing of only lines containing THREAT_DETECTION_RESULT.

parse_threat_detection_results.test.cjs

  • New describe("extractResultFromText"): 9 tests covering simple objects, trailing content truncation, nested objects, braces inside string values, escaped quotes, literal newlines in strings, non-object JSON (null/array/number/empty), and truncated JSON.

  • New stream-json newline test: Verifies that a stream-json log line whose result field contains a reason with a literal newline is parsed correctly end-to-end, covering both extractFromStreamJson output and the full parseDetectionLog flow.

Testing

All 44 tests pass (make fmt-cjs + make lint-cjs clean).

/cc @davidslater

Copilot AI and others added 2 commits March 25, 2026 21:17
…reasons

- Add extractResultFromText helper using brace-counting (no regex) to extract
  the complete JSON object even when reasons contain literal newlines
- extractFromStreamJson now rejoins split lines and uses brace-counting instead
  of returning the first truncated line
- parseDetectionLog raw mode uses same join+brace-count approach; falls back to
  trimmed line for non-object JSON (null, [], truncated) to preserve error messages
- Normalize literal newlines to \n escape before JSON.parse to handle unescaped
  newlines introduced by outer stream-json decoder
- Replace first-5/last-5 debug lines with line count + THREAT_DETECTION_RESULT lines
- Add test for literal-newline-in-reasons case covering both extractFromStreamJson
  and full parseDetectionLog flow

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c9f11d1-b986-441a-a4db-8bfbf6723a0f

Co-authored-by: davidslater <12449447+davidslater@users.noreply.github.com>
…nd test extractResultFromText

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c9f11d1-b986-441a-a4db-8bfbf6723a0f

Co-authored-by: davidslater <12449447+davidslater@users.noreply.github.com>
@davidslater davidslater marked this pull request as ready for review March 25, 2026 22:29
Copilot AI review requested due to automatic review settings March 25, 2026 22:29
@davidslater davidslater enabled auto-merge (squash) March 25, 2026 22:29
@davidslater davidslater merged commit 0a9ae39 into main Mar 25, 2026
57 checks passed
@davidslater davidslater deleted the copilot/improve-threat-detection-parsing branch March 25, 2026 22:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes parsing of THREAT_DETECTION_RESULT entries when verdict JSON can be split/truncated due to literal newline characters appearing inside reasons strings (after decoding stream-json envelopes).

Changes:

  • Added extractResultFromText() brace-counting extractor to reliably capture the full verdict JSON object.
  • Updated stream-json and raw parsing paths to rejoin lines from the prefix onward and extract the complete JSON object.
  • Added targeted tests for the new extractor and the stream-json literal-newline scenario; updated debug logging to focus on matching lines.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
actions/setup/js/parse_threat_detection_results.cjs Adds brace-count extraction and updates extraction/parsing logic to handle embedded newlines; adjusts debug logging and exports new helper.
actions/setup/js/parse_threat_detection_results.test.cjs Adds unit tests for extractResultFromText() and an end-to-end stream-json test for literal newline handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +126 to +127
// Extract the complete JSON object using brace-counting.
return extractResultFromText(joined);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractFromStreamJson() now always delegates to extractResultFromText(joined) and returns null when a prefix line exists but the value is not a {...} object (e.g. THREAT_DETECTION_RESULT:null, [], or truncated JSON). In stream-json mode this means parseDetectionLog() will report "No THREAT_DETECTION_RESULT found" instead of surfacing a parse/type error with the raw value, which is a regression in diagnostics.

Consider preserving the old fallback behavior here: if extractResultFromText(joined) returns null, return the trimmed prefix line (or the joined string trimmed) so the downstream parser can emit the existing, more informative error messages.

Suggested change
// Extract the complete JSON object using brace-counting.
return extractResultFromText(joined);
// Extract the complete JSON object using brace-counting. If extraction
// fails (e.g. the value is "null", "[]", or truncated JSON), fall back to
// returning the raw trimmed line(s) so downstream parsers can report a
// precise parse/type error using the original value.
const extracted = extractResultFromText(joined);
return extracted !== null ? extracted : joined;

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +219
// Normalize literal newline characters to JSON escape sequences before parsing.
// When the outer stream-json decoder unescapes \n sequences, actual newline
// characters may end up inside JSON string values (e.g. in reasons entries).
// Replacing them with the two-character sequence \n restores valid JSON so
// that JSON.parse can handle them correctly.
const normalizedJson = jsonPart.split("\n").join("\\n");
const parsed = JSON.parse(normalizedJson);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newline normalization before JSON.parse currently replaces all \n characters in jsonPart with the two-character sequence \\n. That will break otherwise-valid JSON that uses newlines as whitespace between tokens (pretty-printed / multi-line JSON), because \\n is not valid outside of a string.

A safer approach is to only escape literal newline characters that occur inside JSON string literals (you can reuse the same inString/escaped state machine used by extractResultFromText), or attempt JSON.parse(jsonPart) first and only apply targeted normalization when parsing fails due to an unescaped control character in a string.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants