fix: decode named invisible-char HTML entities in decodeHtmlEntities to close @mention bypass by Copilot · Pull Request #31703 · github/gh-aw

Copilot · 2026-05-12T14:38:45Z

Bug Fix

What was the bug?

decodeHtmlEntities decoded numeric invisible-character entities (such as &#xAD; and &#173;) but not their named equivalents (&shy;, &zwnj;, &zwj;, &lrm;, &rlm;, etc.). Because hardenUnicodeText Step 3 strips only actual Unicode code points, named entities survived the pipeline as literal &...; text. neutralizeAllMentions then failed to match @&shy;victim, because & is not in [A-Za-z0-9], so the mention passed through unsanitized. GitHub decodes the entity when rendering, so the output displays as @victim and triggers a live notification for that user.
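A minimal sketch of why the entity form slips past a mention matcher. The pattern below is a hypothetical stand-in built from the [A-Za-z0-9] character class mentioned above; the actual regex inside neutralizeAllMentions is not shown on this page.

```javascript
// Hypothetical stand-in for the mention pattern (assumption, not the real one).
const MENTION = /@([A-Za-z0-9][A-Za-z0-9-]*)/g;

// Returns the user names that the pattern recognizes in `text`.
function findMentions(text) {
  return Array.from(text.matchAll(MENTION), (m) => m[1]);
}

console.log(findMentions("@victim say hi"));      // [ 'victim' ]
console.log(findMentions("@&shy;victim say hi")); // []  ('&' follows '@', so nothing matches)
```

Because the second call finds nothing, there is nothing to wrap in backticks, and the text is emitted unchanged.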

How did you fix it?

Extended decodeHtmlEntities (after the &amp; block) with replace calls for every named entity whose code point is stripped by Steps 3–4 of hardenUnicodeText, including double-encoded variants (&amp;shy; etc.):

| Named entity | Aliases | Code point |
| --- | --- | --- |
| `&shy;` | none | U+00AD soft hyphen |
| `&zwnj;` | none | U+200C zero-width non-joiner |
| `&zwj;` | none | U+200D zero-width joiner |
| `&lrm;` | none | U+200E left-to-right mark |
| `&rlm;` | none | U+200F right-to-left mark |
| `&ZeroWidthSpace;` | none | U+200B zero-width space |
| `&NoBreak;` | none | U+2060 word joiner |
| `&af;` | `&ApplyFunction;` | U+2061 invisible function application |
| `&it;` | `&InvisibleTimes;` | U+2062 invisible times |
| `&ic;` | `&InvisibleComma;` | U+2063 invisible separator |

After decoding, the real code points hit Step 3's strip regex and the resulting bare @victim is neutralized normally:

// Before fix
sanitizeContentCore('@&shy;victim say hi')
// → '@&shy;victim say hi'  ← bypassed

// After fix
sanitizeContentCore('@&shy;victim say hi')
// → '`@victim` say hi'  ← neutralized
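The decode, strip, neutralize order described above can be sketched as follows. This is an illustration under stated assumptions, not the code from sanitize_content_core.cjs: the helper names, the entity subset, the strip range, and the mention pattern are all hypothetical.

```javascript
// Assumed subset of the named entities listed in the PR description.
const NAMED_INVISIBLE = new Map([
  ["shy", "\u00AD"], ["zwnj", "\u200C"], ["zwj", "\u200D"],
  ["lrm", "\u200E"], ["rlm", "\u200F"],
]);

// Step A: turn named entities (plain or double-encoded) into real code points.
function decodeNamedInvisible(text) {
  return text.replace(/&(?:amp;)?([A-Za-z]+);/g, (match, name) => {
    const decoded = NAMED_INVISIBLE.get(name.toLowerCase());
    return decoded === undefined ? match : decoded; // unknown entities stay as-is
  });
}

// Step B: strip invisible code points (assumed range, standing in for Step 3).
function stripInvisible(text) {
  return text.replace(/[\u00AD\u200B-\u200F\u2060-\u2064]/g, "");
}

// Step C: wrap bare mentions in backticks (hypothetical pattern).
function neutralizeMentions(text) {
  return text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, "`@$1`");
}

function sanitizeSketch(text) {
  return neutralizeMentions(stripInvisible(decodeNamedInvisible(text)));
}

console.log(sanitizeSketch("@&shy;victim say hi")); // `@victim` say hi
```

The ordering is the point: if decoding ran after stripping, the decoded soft hyphen would survive and the mention pattern would still fail to match.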

Testing

Added 15 regression tests in sanitize_content.test.cjs covering each new named entity form (including case-insensitive variants and multi-entity combinations).

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

…to prevent @mention bypass

Extend `decodeHtmlEntities` in `sanitize_content_core.cjs` to handle the named entity forms of invisible/formatting characters (&shy;, &zwnj;, &zwj;, &lrm;, &rlm;, &ZeroWidthSpace;, &NoBreak;, &af;/&ApplyFunction;, &it;/&InvisibleTimes;, &ic;/&InvisibleComma;), including double-encoded variants (&amp;shy; etc.).

Without this fix, `@&shy;victim` (using the named entity form) survived `decodeHtmlEntities` and defeated `neutralizeAllMentions` because `&` is not matched by `[A-Za-z0-9]` in the mention regex. After GitHub renders the output, the entity decodes to an invisible character and the result appears as `@victim` to readers. This was a partial bypass of the fix from gh-aw#24154 / gh-aw-security#2086.

Each named entity is now decoded to its actual Unicode code point so that hardenUnicodeText Step 3 strips it before mention neutralization. Regression tests added in sanitize_content.test.cjs.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

…omment Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

Copilot

Pull request overview

This PR closes an @mention neutralization bypass where named HTML entities for invisible/formatting characters (e.g., &shy;, &zwnj;, &lrm;) could survive sanitization and render as live mentions on GitHub.

Changes:

Extend decodeHtmlEntities to decode named invisible/formatting entities (and their double-encoded &amp;...; forms) into real Unicode code points so hardenUnicodeText can strip them.
Add regression tests ensuring @&name;user inputs are neutralized after sanitization.
Update dependabot locked workflows to derive the Docker socket path from DOCKER_HOST for improved compatibility.

Show a summary per file

| File | Description |
| --- | --- |
| `actions/setup/js/sanitize_content_core.cjs` | Adds decoding for named invisible-character entities to prevent `@mention` bypass. |
| `actions/setup/js/sanitize_content.test.cjs` | Adds regression coverage for named-entity `@mention` bypass cases. |
| `.github/workflows/dependabot-worker.lock.yml` | Uses `DOCKER_HOST` to choose the Docker socket path when starting the MCP gateway container. |
| `.github/workflows/dependabot-campaign.lock.yml` | Same Docker socket path handling update as the worker workflow. |

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 3

+  // &it; / &InvisibleTimes; and double-encoded variants → U+2062 (invisible times)
+  result = result.replace(/&(?:amp;)?(?:it|InvisibleTimes);/gi, "\u2062");
+  // &ic; / &InvisibleComma; and double-encoded variants → U+2063 (invisible separator)
+  result = result.replace(/&(?:amp;)?(?:ic|InvisibleComma);/gi, "\u2063");


+            /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
+            * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
+          esac
+          DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')


+            /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
+            * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
+          esac
+          DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0')


pelikhan · 2026-05-12T15:39:22Z

@copilot review all comments

…idation - Add &ip;/⁤ named entity → U+2064 (INVISIBLE PLUS) to decodeHtmlEntities, closing the gap identified by the reviewer: Step 3's strip regex covers \u2060-\u2064 but the named entity block only reached U+2063. Add two matching regression tests. - Add test -S socket validation in dependabot-campaign.lock.yml and dependabot-worker.lock.yml: if DOCKER_SOCK_PATH derived from DOCKER_HOST is not a socket, fall back to /var/run/docker.sock before stat/mount. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
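The gap this commit closes can be found mechanically by comparing the strip range with the code points that have a decoder. The sketch below is illustrative only: the set is taken from the entity table in the PR description, and only the \u2060-\u2064 portion of the Step 3 range is quoted in this thread, so only that portion is checked.

```javascript
// Code points that had a named-entity decoder before this commit (from the PR table).
const decodedBefore = new Set([
  0x00ad, 0x200b, 0x200c, 0x200d, 0x200e, 0x200f,
  0x2060, 0x2061, 0x2062, 0x2063,
]);

// Lists every code point in [lo, hi] that has no decoder.
function missingFromRange(lo, hi, decodedSet) {
  const missing = [];
  for (let cp = lo; cp <= hi; cp++) {
    if (!decodedSet.has(cp)) missing.push(cp);
  }
  return missing;
}

const gaps = missingFromRange(0x2060, 0x2064, decodedBefore)
  .map((cp) => "U+" + cp.toString(16).toUpperCase());
console.log(gaps); // [ 'U+2064' ]
```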

Copilot · 2026-05-12T15:44:49Z

@copilot review all comments

All three review comments have been addressed in the latest commit:

sanitize_content_core.cjs: Added &ip;/⁤ → U+2064 (INVISIBLE PLUS) to decodeHtmlEntities, closing the gap between U+2063 and the Step 3 strip range's upper bound \u2064. Two matching regression tests added.
dependabot-campaign.lock.yml and dependabot-worker.lock.yml: Added if ! test -S "$DOCKER_SOCK_PATH"; then DOCKER_SOCK_PATH=/var/run/docker.sock; fi after the case block, so a non-existent or non-socket path derived from DOCKER_HOST falls back to the standard socket path before stat and the -v mount.
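The socket fallback is implemented in shell in the lock files. For readers who prefer to reason about it in JavaScript, here is a rough equivalent of the logic as described above. The function name and the injectable check are hypothetical, and only the two case arms visible in the review excerpt are mirrored.

```javascript
const fs = require("node:fs");

// Rough equivalent of `test -S`: true only if the path exists and is a socket.
function isSocket(path) {
  try {
    return fs.statSync(path).isSocket();
  } catch {
    return false; // a missing path counts as "not a socket"
  }
}

// Hypothetical helper mirroring the described case block plus the fallback.
function resolveDockerSockPath(dockerHost, socketCheck = isSocket) {
  const fallback = "/var/run/docker.sock";
  // `/*` arm: an absolute path is used as-is; anything else takes the default.
  const isAbsolute = typeof dockerHost === "string" && dockerHost.startsWith("/");
  const candidate = isAbsolute ? dockerHost : fallback;
  // Added validation: a path that is not a socket falls back to the default.
  return socketCheck(candidate) ? candidate : fallback;
}

console.log(resolveDockerSockPath(process.env.DOCKER_HOST));
```

Passing the check as a parameter keeps the decision logic testable without a real socket on disk.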

pelikhan · 2026-05-12T15:47:29Z

@copilot merge main and recompile

…ml-entities Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-12T15:55:15Z

@copilot merge main and recompile

Done. Merged origin/main (merge commit fd0510b) and ran make recompile — the lock files were already up-to-date, so no additional changes were generated.

Initial plan

e7ff9ce

Copilot AI assigned Copilot and szabta89 May 12, 2026

Copilot started work on behalf of szabta89 May 12, 2026 14:38

Copilot AI linked an issue May 12, 2026 that may be closed by this pull request

decodeHtmlEntities must decode invisible-char named entities (&shy;, &zwj;, &lrm;, etc.) before hardenUnicodeText Step 3 strippi [Content truncated due to length] #31702

Closed

Copilot AI and others added 3 commits May 12, 2026 14:44

Initial plan

79fc2bb

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

fix comment: use &shy; entity notation instead of literal U+00AD in c…

dbc1ed6

…omment Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix decodeHtmlEntities to handle named entities for invisible characters~~ fix: decode named invisible-char HTML entities in decodeHtmlEntities to close @mention bypass May 12, 2026

Copilot finished work on behalf of szabta89 May 12, 2026 14:50

Copilot AI requested a review from szabta89 May 12, 2026 14:50

pelikhan marked this pull request as ready for review May 12, 2026 14:57

Copilot AI review requested due to automatic review settings May 12, 2026 14:57

Copilot started reviewing on behalf of pelikhan May 12, 2026 14:58

Copilot AI reviewed May 12, 2026

Copilot started work on behalf of pelikhan May 12, 2026 15:39

Copilot finished work on behalf of pelikhan May 12, 2026 15:45

Copilot AI requested a review from pelikhan May 12, 2026 15:45

Copilot started work on behalf of pelikhan May 12, 2026 15:47

Merge remote-tracking branch 'origin/main' into copilot/fix-decode-ht…

fd0510b

…ml-entities Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 12, 2026 15:56

pelikhan merged commit ea33dfc into main May 12, 2026

pelikhan deleted the copilot/fix-decode-html-entities branch May 12, 2026 16:19

pelikhan merged 6 commits into main from copilot/fix-decode-html-entities

4 participants
