fix: decode named invisible-char HTML entities in decodeHtmlEntities to close @mention bypass#31703
Conversation
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
…to prevent @mention bypass Extend `decodeHtmlEntities` in `sanitize_content_core.cjs` to handle the named entity forms of invisible/formatting characters (­, ‌, ‍, ‎, ‏, ​, ⁠, ⁡/⁡, ⁢/⁢, ⁣/⁣) including double-encoded variants (&shy; etc.). Without this fix, `@victim` (using the named entity form) survived `decodeHtmlEntities` and defeated `neutralizeAllMentions` because `&` is not matched by `[A-Za-z0-9]` in the mention regex. After GitHub renders the output, the entity decodes to an invisible char and the result appears as `@victim` to readers — a partial bypass of the fix from gh-aw#24154 / gh-aw-security#2086. Each named entity is now decoded to its actual Unicode code point so that hardenUnicodeText Step 3 strips it before mention neutralization. Regression tests added in sanitize_content.test.cjs. Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
…omment Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR closes an @mention neutralization bypass where named HTML entities for invisible/formatting characters (e.g., ­, ‌, ‎) could survive sanitization and render as live mentions on GitHub.
Changes:
- Extend
decodeHtmlEntitiesto decode named invisible/formatting entities (and their&...;forms) into real Unicode code points sohardenUnicodeTextcan strip them. - Add regression tests ensuring
@&name;userinputs are neutralized after sanitization. - Update dependabot locked workflows to derive the Docker socket path from
DOCKER_HOSTfor improved compatibility.
Show a summary per file
| File | Description |
|---|---|
actions/setup/js/sanitize_content_core.cjs |
Adds decoding for named invisible-character entities to prevent @mention bypass. |
actions/setup/js/sanitize_content.test.cjs |
Adds regression coverage for named-entity @mention bypass cases. |
.github/workflows/dependabot-worker.lock.yml |
Uses DOCKER_HOST to choose the Docker socket path when starting the MCP gateway container. |
.github/workflows/dependabot-campaign.lock.yml |
Same Docker socket path handling update as the worker workflow. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 4/4 changed files
- Comments generated: 3
| // ⁢ / ⁢ and double-encoded variants → U+2062 (invisible times) | ||
| result = result.replace(/&(?:amp;)?(?:it|InvisibleTimes);/gi, "\u2062"); | ||
| // ⁣ / ⁣ and double-encoded variants → U+2063 (invisible separator) | ||
| result = result.replace(/&(?:amp;)?(?:ic|InvisibleComma);/gi, "\u2063"); |
| /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;; | ||
| * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;; | ||
| esac | ||
| DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0') |
| /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;; | ||
| * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;; | ||
| esac | ||
| DOCKER_SOCK_GID=$(stat -c '%g' "$DOCKER_SOCK_PATH" 2>/dev/null || echo '0') |
|
@copilot review all comments |
…idation - Add &ip;/&InvisiblePlus; named entity → U+2064 (INVISIBLE PLUS) to decodeHtmlEntities, closing the gap identified by the reviewer: Step 3's strip regex covers \u2060-\u2064 but the named entity block only reached U+2063. Add two matching regression tests. - Add test -S socket validation in dependabot-campaign.lock.yml and dependabot-worker.lock.yml: if DOCKER_SOCK_PATH derived from DOCKER_HOST is not a socket, fall back to /var/run/docker.sock before stat/mount. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
All three review comments have been addressed in the latest commit:
|
|
@copilot merge main and recompile |
…ml-entities Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Merged |
Bug Fix
What was the bug?
decodeHtmlEntitiesdecoded numeric invisible-character entities (­,­) but not their named equivalents (­,‌,‍,‎,‏, etc.). BecausehardenUnicodeTextStep 3 strips actual Unicode code points, named entities survived the pipeline as literal&...;text.neutralizeAllMentionsthen failed to match@­victim—&is not in[A-Za-z0-9]— so the mention passed through unsanitized. GitHub renders the entity on display, resulting in a live@victimnotification trigger.How did you fix it?
Extended
decodeHtmlEntities(after the&block) withreplacecalls for every named entity whose code point is stripped by Steps 3–4 ofhardenUnicodeText, including double-encoded variants (&shy;etc.):­‌‍‎‏​⁠⁡⁡⁢⁢⁣⁣After decoding, the real code points hit Step 3's strip regex and the resulting bare
@victimis neutralized normally:Testing
Added 15 regression tests in
sanitize_content.test.cjscovering each new named entity form (including case-insensitive variants and multi-entity combinations).