
fix: reorder removeXmlComments before mention neutralization to prevent HTML comment bypass #25462

Merged: pelikhan merged 2 commits into main from copilot/fix-xml-comments-sanitization on Apr 9, 2026

Copilot AI commented Apr 9, 2026

HTML comments containing @mentions survived sanitization because neutralizeAllMentions ran first, wrapping the mention in backticks. applyFnOutsideInlineCode then split the string at backtick boundaries, so neither segment contained a complete <!--...--> pattern and removeXmlComments never matched.

Input:  <!-- @exploituser injected payload -->
After neutralizeAllMentions:  <!-- `@exploituser` injected payload -->
Segments seen by removeXmlComments:  "<!-- "  and  " injected payload -->"
Result: comment survives unchanged  ← bug
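
The fragmentation can be reproduced in isolation. The snippet below is a minimal sketch, not the project's code: `neutralizeAllMentions` and `segmentsOutsideInlineCode` are simplified, hypothetical stand-ins (the real helpers are not shown in this PR), and the mention and comment regexes are assumptions.

```javascript
// Hypothetical stand-in: wrap each @mention in backticks.
function neutralizeAllMentions(text) {
  return text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, "`@$1`");
}

// Hypothetical stand-in: return only the segments that lie outside inline code spans.
function segmentsOutsideInlineCode(text) {
  // Splitting on a capture group keeps the code spans at odd indices.
  return text.split(/(`[^`]*`)/).filter((_, i) => i % 2 === 0);
}

// Assumed shape of the comment matcher: a complete <!-- ... --> span.
const commentPattern = /<!--[\s\S]*?-->/;

const neutralized = neutralizeAllMentions("<!-- @exploituser injected payload -->");
const segments = segmentsOutsideInlineCode(neutralized);

console.log(neutralized);
console.log(segments);
// No single segment holds both "<!--" and "-->", so the pattern cannot match.
console.log(segments.some(s => commentPattern.test(s)));
```

The opening and closing delimiters end up in different segments, which is why a per-segment comment stripper has nothing to remove.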

Changes

  • sanitize_content_core.cjs / sanitize_content.cjs: Move applyToNonCodeRegions(removeXmlComments) to run before neutralizeAllMentions/neutralizeMentions. HTML comments are now stripped from raw input before any backtick-wrapping can fragment the <!--...--> delimiters.

  • sanitize_content.test.cjs: Add regression tests covering the exact bypass vector — <!-- @exploituser injected payload --> must produce an empty string after a full sanitization pass.
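
To illustrate why the order matters, here is a rough sketch comparing the two pipeline orders. All three helpers are simplified, hypothetical stand-ins that reuse the names from this PR; the real `applyToNonCodeRegions` likely handles more cases (fenced code blocks, for instance), and the real pipelines contain many more steps.

```javascript
// Simplified, hypothetical stand-ins for the helpers named in this PR.
const neutralizeAllMentions = text =>
  text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, "`@$1`");
const removeXmlComments = text => text.replace(/<!--[\s\S]*?-->/g, "");

// Run fn on everything except inline code spans, then stitch the pieces back together.
function applyToNonCodeRegions(text, fn) {
  return text
    .split(/(`[^`]*`)/)
    .map((part, i) => (i % 2 === 1 ? part : fn(part)))
    .join("");
}

// Order before this PR: mentions first, comments second.
function sanitizeOldOrder(text) {
  return applyToNonCodeRegions(neutralizeAllMentions(text), removeXmlComments);
}

// Order after this PR: comments first, mentions second.
function sanitizeNewOrder(text) {
  return neutralizeAllMentions(applyToNonCodeRegions(text, removeXmlComments));
}

const input = "before <!-- @exploituser payload --> after @someone";
console.log(JSON.stringify(sanitizeOldOrder(input))); // comment survives
console.log(JSON.stringify(sanitizeNewOrder(input))); // comment removed, @someone still neutralized
```

In this sketch the new order strips the comment from the raw input and still neutralizes the mention that sits outside it.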

…nt bypass

HTML comments containing @mentions could survive sanitization because
neutralizeAllMentions wrapped @mentions in backticks before removeXmlComments
ran. The backtick-wrapped mention split the <!--...--> pattern across
inline-code boundaries, causing applyFnOutsideInlineCode to never pass a
complete comment pattern to removeXmlComments.

Fix: move applyToNonCodeRegions(removeXmlComments) before
neutralizeAllMentions/neutralizeMentions in both sanitize_content_core.cjs
and sanitize_content.cjs.

Adds regression tests for <!-- @mention payload --> patterns.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/ac4b8c22-07a3-4df6-b7ae-991bf386ec7b

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix XML comments sanitization bypass in neutralizeAllMentions" to "fix: reorder removeXmlComments before mention neutralization to prevent HTML comment bypass" Apr 9, 2026
Copilot AI requested a review from szabta89 April 9, 2026 12:26
@pelikhan pelikhan marked this pull request as ready for review April 9, 2026 12:55
Copilot AI review requested due to automatic review settings April 9, 2026 12:55
@pelikhan pelikhan merged commit 4947797 into main Apr 9, 2026
67 of 68 checks passed
@pelikhan pelikhan deleted the copilot/fix-xml-comments-sanitization branch April 9, 2026 12:56

Copilot AI left a comment

Pull request overview

This PR fixes a sanitization-order bug where HTML/XML comments containing @mentions could survive sanitization after mention neutralization inserted backticks and fragmented the <!--...--> delimiters.

Changes:

  • Reorders XML/HTML comment stripping to run before mention neutralization in both sanitize_content_core.cjs and the filtered wrapper sanitize_content.cjs.
  • Adds regression tests to ensure <!-- @user ... --> style comments are fully removed.

Summary per file:

  • actions/setup/js/sanitize_content.cjs: Moves removeXmlComments earlier in the pipeline (before neutralizeMentions) to prevent delimiter fragmentation.
  • actions/setup/js/sanitize_content_core.cjs: Moves removeXmlComments earlier in the core pipeline (before neutralizeAllMentions) to close the reported bypass vector.
  • actions/setup/js/sanitize_content.test.cjs: Adds regression tests covering the bypass scenario and a few variants with multiple mentions / surrounding text.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment on lines +293 to +308
it("should remove XML comments containing @mentions (regression: bypass via backtick wrapping)", () => {
  // If removeXmlComments ran after neutralizeMentions, the @mention would be wrapped in
  // backticks first, splitting the <!--...--> pattern and causing it to survive sanitization.
  const result = sanitizeContent("<!-- @exploituser injected payload -->");
  expect(result).toBe("");
});

it("should remove XML comments containing multiple @mentions", () => {
  const result = sanitizeContent("<!-- @attacker1 and @attacker2 payload -->");
  expect(result).toBe("");
});

it("should remove XML comments with @mentions mixed with surrounding text", () => {
  const result = sanitizeContent("before <!-- @exploituser payload --> after");
  expect(result).toBe("before after");
});

Copilot AI Apr 9, 2026

The new regression tests cover the default/core path (no options), but this PR also changes the allowedAliases branch in sanitize_content.cjs (which uses neutralizeMentions instead of neutralizeAllMentions). Add at least one test that calls sanitizeContent with an allowedAliases option and verifies that an XML/HTML comment containing a disallowed @mention is still fully removed (i.e., the bypass is fixed in both pipelines).
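
A rough sketch of what such a check could assert is below. Everything here is a hypothetical stand-in: the real `sanitizeContent` and `neutralizeMentions` are not shown in this PR, and the option shape `{ allowedAliases: string[] }` is an assumption. In the actual suite the same inputs and expectations would go into `it(...)` blocks against the real function.

```javascript
// Hypothetical stand-in for the comment stripper.
const removeXmlComments = text => text.replace(/<!--[\s\S]*?-->/g, "");

// Hypothetical stand-in: wrap @mentions in backticks unless the name is allow-listed.
function neutralizeMentions(text, allowedAliases) {
  const allowed = new Set(allowedAliases.map(a => a.toLowerCase()));
  return text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, (match, name) =>
    allowed.has(name.toLowerCase()) ? match : "`" + match + "`"
  );
}

// Assumed option shape: { allowedAliases: string[] }. Comments are stripped first,
// mirroring the order this PR introduces.
function sanitizeContent(text, options = {}) {
  const withoutComments = removeXmlComments(text);
  return neutralizeMentions(withoutComments, options.allowedAliases || []);
}

const options = { allowedAliases: ["trusteduser"] };

// Disallowed mention inside a comment: the whole comment should disappear.
const commentOnly = sanitizeContent("<!-- @exploituser injected payload -->", options);

// Allowed mention outside the comment is kept; the comment is still stripped.
const mixed = sanitizeContent("hi @trusteduser<!-- @exploituser payload -->", options);

console.log(JSON.stringify(commentOnly));
console.log(JSON.stringify(mixed));
```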

Comment on lines +1038 to +1042
// Remove XML comments before mention neutralization to prevent bypass: if removeXmlComments
// ran after neutralizeAllMentions, a comment like <!-- @user payload --> would first become
// <!-- `@user` payload --> and applyFnOutsideInlineCode would split at the backtick boundary,
// preventing the full <!--...--> pattern from being matched.
sanitized = applyToNonCodeRegions(sanitized, removeXmlComments);

Copilot AI Apr 9, 2026

Even with this reorder, removeXmlComments can still be bypassed when the input itself contains balanced backticks inside the comment (e.g., "<!-- `@user` payload -->"). Because applyToNonCodeRegions protects inline code spans, removeXmlComments is applied to "<!-- " and " payload -->" separately and never sees a complete <!--...--> to remove. Consider removing XML/HTML comments before splitting on inline-code backticks (or have a comment-stripper that only skips fenced code blocks, not inline code within a comment).

Suggested change
- // Remove XML comments before mention neutralization to prevent bypass: if removeXmlComments
- // ran after neutralizeAllMentions, a comment like <!-- @user payload --> would first become
- // <!-- `@user` payload --> and applyFnOutsideInlineCode would split at the backtick boundary,
- // preventing the full <!--...--> pattern from being matched.
- sanitized = applyToNonCodeRegions(sanitized, removeXmlComments);
+ // Remove XML comments on the full string before mention neutralization to prevent bypasses.
+ // If comment stripping is routed through applyToNonCodeRegions, balanced backticks inside
+ // the comment body (for example <!-- `@user` payload -->) can cause inline-code splitting,
+ // so removeXmlComments never sees a complete <!--...--> span to remove.
+ sanitized = removeXmlComments(sanitized);
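
The remaining bypass can be sketched as follows. Both helpers are simplified, hypothetical stand-ins named after the ones in this PR; the stand-in `applyToNonCodeRegions` only models inline code spans, which is the behavior the comment above describes.

```javascript
// Hypothetical stand-in for the comment stripper.
const removeXmlComments = text => text.replace(/<!--[\s\S]*?-->/g, "");

// Hypothetical stand-in: apply fn to every segment outside inline code spans.
function applyToNonCodeRegions(text, fn) {
  return text
    .split(/(`[^`]*`)/)
    .map((part, i) => (i % 2 === 1 ? part : fn(part)))
    .join("");
}

// The attacker supplies the backticks directly; no mention neutralization is involved.
const input = "<!-- `@user` payload -->";

// Routed through the inline-code splitter: the comment survives.
const viaCodeRegions = applyToNonCodeRegions(input, removeXmlComments);

// Applied to the full string, as the suggestion proposes: the comment is removed.
const onFullString = removeXmlComments(input);

console.log(JSON.stringify(viaCodeRegions));
console.log(JSON.stringify(onFullString));
```

One trade-off to keep in mind: under these assumptions, stripping on the full string also removes comment-like text that appears inside legitimate inline code or fenced blocks, which is presumably why the review offers the alternative of a stripper that skips only fenced code blocks.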

Comment on lines +91 to +96
// Remove XML comments before mention neutralization to prevent bypass: if removeXmlComments
// ran after neutralizeMentions, a comment like <!-- @user payload --> would first become
// <!-- `@user` payload --> and applyFnOutsideInlineCode would split at the backtick boundary,
// preventing the full <!--...--> pattern from being matched.
sanitized = applyToNonCodeRegions(sanitized, removeXmlComments);

Copilot AI Apr 9, 2026

Even with this reorder, removeXmlComments can still be bypassed when the input itself contains balanced backticks inside the comment (e.g., "<!-- `@user` payload -->"). Because applyToNonCodeRegions protects inline code spans, removeXmlComments is applied to partial segments and never matches a full <!--...--> to remove. Consider stripping comments before inline-code splitting (or using a comment-stripper that only skips fenced code blocks).
