fix: decode named invisible-char HTML entities in decodeHtmlEntities to close @mention bypass #31703

Merged
pelikhan merged 6 commits into main from copilot/fix-decode-html-entities
May 12, 2026

Contributor

Copilot AI commented May 12, 2026

Bug Fix

What was the bug?

decodeHtmlEntities decoded numeric invisible-character entities (&#xAD;, &#173;) but not their named equivalents (&shy;, &zwnj;, &zwj;, &lrm;, &rlm;, etc.). Because hardenUnicodeText Step 3 strips only actual Unicode code points, named entities survived the pipeline as literal &...; text. neutralizeAllMentions then failed to match @&shy;victim, because & is not in [A-Za-z0-9], so the mention passed through unsanitized. GitHub decodes the entity on display, resulting in a live @victim notification trigger.
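To make the failure mode concrete, here is a minimal sketch of a mention neutralizer. The regex and the function name `neutralizeMentionsSketch` are hypothetical stand-ins; the real pattern inside neutralizeAllMentions is not shown in this PR and may differ. The only property taken from the description is that the character after `@` must be in `[A-Za-z0-9]`.

```javascript
// Hypothetical stand-in for the mention pattern. The only detail taken from
// the PR description is that the first character after "@" must be in
// [A-Za-z0-9]; the rest of the pattern is an assumption.
const MENTION_SKETCH = /@([A-Za-z0-9][A-Za-z0-9-]*)/g;

function neutralizeMentionsSketch(text) {
  // Wrap each matched mention in backticks so it renders as inline code.
  return text.replace(MENTION_SKETCH, (match, name) => "`@" + name + "`");
}

// A plain mention is matched and wrapped.
const plain = neutralizeMentionsSketch("@victim say hi");

// With a named entity right after "@", the next character is "&", so the
// pattern never matches and the text is returned unchanged.
const named = neutralizeMentionsSketch("@&shy;victim say hi");

console.log(plain);
console.log(named);
```

In this sketch the second input comes back untouched, which is the shape of the bypass described above: the sanitizer sees `&`, while the rendered page shows `@victim`.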

How did you fix it?

Extended decodeHtmlEntities (after the &amp; block) with replace calls for every named entity whose code point is stripped by Steps 3–4 of hardenUnicodeText, including double-encoded variants (&amp;shy; etc.):

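| Named entity | Aliases | Code point |
| --- | --- | --- |
| &shy; | | U+00AD soft hyphen |
| &zwnj; | | U+200C zero-width non-joiner |
| &zwj; | | U+200D zero-width joiner |
| &lrm; | | U+200E left-to-right mark |
| &rlm; | | U+200F right-to-left mark |
| &ZeroWidthSpace; | | U+200B zero-width space |
| &NoBreak; | | U+2060 word joiner |
| &af; | &ApplyFunction; | U+2061 invisible function application |
| &it; | &InvisibleTimes; | U+2062 invisible times |
| &ic; | &InvisibleComma; | U+2063 invisible separator |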

After decoding, the real code points hit Step 3's strip regex and the resulting bare @victim is neutralized normally:

// Before fix
sanitizeContentCore('@&shy;victim say hi')
// → '@&shy;victim say hi'  ← bypassed

// After fix
sanitizeContentCore('@&shy;victim say hi')
// → '`@victim` say hi'  ← neutralized
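The decode-then-strip flow described above can be sketched as follows. The replace patterns follow the `/&(?:amp;)?name;/gi` shape visible in the review excerpt for `&it;` and `&ic;`; the table here is only a subset, and the helper names and the exact strip character class are assumptions (the PR only states that Step 3 covers `\u2060-\u2064`).

```javascript
// Illustrative subset of the named-entity table. Each pattern also accepts
// the double-encoded form (&amp;name;) and is case-insensitive.
const NAMED_INVISIBLE_SKETCH = [
  [/&(?:amp;)?shy;/gi, "\u00AD"],
  [/&(?:amp;)?zwnj;/gi, "\u200C"],
  [/&(?:amp;)?zwj;/gi, "\u200D"],
  [/&(?:amp;)?(?:af|ApplyFunction);/gi, "\u2061"],
];

// Hypothetical helper: turn named entities into their real code points.
function decodeNamedInvisibleSketch(text) {
  return NAMED_INVISIBLE_SKETCH.reduce(
    (acc, [pattern, codePoint]) => acc.replace(pattern, codePoint),
    text
  );
}

// Assumed strip class standing in for hardenUnicodeText Step 3. Only the
// \u2060-\u2064 range is stated in the PR; the other members are guesses
// based on the code points listed in the table.
const STRIP_SKETCH = /[\u00AD\u200B-\u200F\u2060-\u2064]/g;

function stripInvisibleSketch(text) {
  return text.replace(STRIP_SKETCH, "");
}

const decoded = decodeNamedInvisibleSketch("@&shy;victim and @&amp;ZWJ;other");
const cleaned = stripInvisibleSketch(decoded);

console.log(JSON.stringify(decoded));
console.log(cleaned);
```

Once the entities are real code points, the strip step removes them and leaves bare `@victim` and `@other`, which a mention neutralizer can then match normally.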

Testing

Added 15 regression tests in sanitize_content.test.cjs covering each new named entity form (including case-insensitive variants and multi-entity combinations).

Copilot AI and others added 3 commits May 12, 2026 14:44
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
…to prevent @mention bypass

Extend `decodeHtmlEntities` in `sanitize_content_core.cjs` to handle the
named entity forms of invisible/formatting characters (&shy;, &zwnj;, &zwj;,
&lrm;, &rlm;, &ZeroWidthSpace;, &NoBreak;, &af;/&ApplyFunction;,
&it;/&InvisibleTimes;, &ic;/&InvisibleComma;) including double-encoded
variants (&amp;shy; etc.).

Without this fix, `@&shy;victim` (using the named entity form) survived
`decodeHtmlEntities` and defeated `neutralizeAllMentions` because `&`
is not matched by `[A-Za-z0-9]` in the mention regex. After GitHub
renders the output, the entity decodes to an invisible char and the
result appears as `@victim` to readers — a partial bypass of the fix
from gh-aw#24154 / gh-aw-security#2086.

Each named entity is now decoded to its actual Unicode code point so
that hardenUnicodeText Step 3 strips it before mention neutralization.

Regression tests added in sanitize_content.test.cjs.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
…omment

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix decodeHtmlEntities to handle named entities for invisible characters" to "fix: decode named invisible-char HTML entities in decodeHtmlEntities to close @mention bypass" May 12, 2026
Copilot AI requested a review from szabta89 May 12, 2026 14:50
@pelikhan pelikhan marked this pull request as ready for review May 12, 2026 14:57
Copilot AI review requested due to automatic review settings May 12, 2026 14:57
Contributor

Copilot AI left a comment

Pull request overview

This PR closes an @mention neutralization bypass where named HTML entities for invisible/formatting characters (e.g., &shy;, &zwnj;, &lrm;) could survive sanitization and render as live mentions on GitHub.

Changes:

  • Extend decodeHtmlEntities to decode named invisible/formatting entities (and their &amp;...; forms) into real Unicode code points so hardenUnicodeText can strip them.
  • Add regression tests ensuring @&name;user inputs are neutralized after sanitization.
  • Update dependabot locked workflows to derive the Docker socket path from DOCKER_HOST for improved compatibility.
| File | Description |
| --- | --- |
| actions/setup/js/sanitize_content_core.cjs | Adds decoding for named invisible-character entities to prevent @mention bypass. |
| actions/setup/js/sanitize_content.test.cjs | Adds regression coverage for named-entity @mention bypass cases. |
| .github/workflows/dependabot-worker.lock.yml | Uses DOCKER_HOST to choose the Docker socket path when starting the MCP gateway container. |
| .github/workflows/dependabot-campaign.lock.yml | Same Docker socket path handling update as the worker workflow. |

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 3

// &it; / &InvisibleTimes; and double-encoded variants → U+2062 (invisible times)
result = result.replace(/&(?:amp;)?(?:it|InvisibleTimes);/gi, "\u2062");
// &ic; / &InvisibleComma; and double-encoded variants → U+2063 (invisible separator)
result = result.replace(/&(?:amp;)?(?:ic|InvisibleComma);/gi, "\u2063");
/* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
* ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
esac
DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')
/* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
* ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
esac
DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')
@pelikhan
Collaborator

@copilot review all comments

…idation

- Add &ip;/&InvisiblePlus; named entity → U+2064 (INVISIBLE PLUS) to
  decodeHtmlEntities, closing the gap identified by the reviewer: Step 3's
  strip regex covers \u2060-\u2064 but the named entity block only reached
  U+2063. Add two matching regression tests.
- Add test -S socket validation in dependabot-campaign.lock.yml and
  dependabot-worker.lock.yml: if DOCKER_SOCK_PATH derived from DOCKER_HOST
  is not a socket, fall back to /var/run/docker.sock before stat/mount.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 12, 2026

@copilot review all comments

All three review comments have been addressed in the latest commit:

  • sanitize_content_core.cjs: Added &ip;/&InvisiblePlus; → U+2064 (INVISIBLE PLUS) to decodeHtmlEntities, closing the gap between U+2063 and the Step 3 strip range's upper bound \u2064. Two matching regression tests added.

  • dependabot-campaign.lock.yml and dependabot-worker.lock.yml: Added if ! test -S "$DOCKER_SOCK_PATH"; then DOCKER_SOCK_PATH=/var/run/docker.sock; fi after the case block, so a non-existent or non-socket path derived from DOCKER_HOST falls back to the standard socket path before stat and the -v mount.
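The U+2064 gap in the first bullet is the kind of mismatch a small consistency check can surface: compare the code points the decoder produces against the ranges the strip step removes. The sketch below is hypothetical; the range list is an assumption built from the code points named in this PR (only `\u2060-\u2064` is stated explicitly), and `uncoveredStripPoints` is not a function from the repository.

```javascript
// Code points the named-entity block decoded to before the follow-up commit,
// as listed in the PR description.
const DECODED_BEFORE = [
  0x00ad, 0x200b, 0x200c, 0x200d, 0x200e, 0x200f,
  0x2060, 0x2061, 0x2062, 0x2063,
];

// Assumed strip ranges. Only 0x2060-0x2064 is stated in the PR; the other
// two ranges are inferred from the entity table and may not match Step 3.
const STRIP_RANGES_SKETCH = [
  [0x00ad, 0x00ad],
  [0x200b, 0x200f],
  [0x2060, 0x2064],
];

// Hypothetical helper: list stripped code points that no named-entity
// replace call decodes to, formatted as U+XXXX.
function uncoveredStripPoints(ranges, decoded) {
  const have = new Set(decoded);
  const missing = [];
  for (const [lo, hi] of ranges) {
    for (let cp = lo; cp <= hi; cp++) {
      if (!have.has(cp)) {
        missing.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
      }
    }
  }
  return missing;
}

const missingBefore = uncoveredStripPoints(STRIP_RANGES_SKETCH, DECODED_BEFORE);
const missingAfter = uncoveredStripPoints(STRIP_RANGES_SKETCH, [...DECODED_BEFORE, 0x2064]);

console.log(missingBefore);
console.log(missingAfter);
```

Under these assumed ranges, the check reports U+2064 before the follow-up commit and nothing after it. Not every stripped code point necessarily has a named entity, so a real check would need an allowlist for those.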

Copilot AI requested a review from pelikhan May 12, 2026 15:45
@pelikhan
Collaborator

@copilot merge main and recompile

…ml-entities

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 12, 2026

@copilot merge main and recompile

Done. Merged origin/main (merge commit fd0510b) and ran make recompile — the lock files were already up-to-date, so no additional changes were generated.

@pelikhan pelikhan merged commit ea33dfc into main May 12, 2026
@pelikhan pelikhan deleted the copilot/fix-decode-html-entities branch May 12, 2026 16:19