Skip to content

Strip data-* attributes in stripDangerousAttributes#31714

Closed
Copilot wants to merge 3 commits into
mainfrom
copilot/fix-data-attributes-stripping
Closed

Strip data-* attributes in stripDangerousAttributes#31714
Copilot wants to merge 3 commits into
mainfrom
copilot/fix-data-attributes-stripping

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 12, 2026

Bug Fix

What was the bug?

data-* attributes on GFM-allowed tags (e.g. <span data-x="...">) silently passed through stripDangerousAttributes unchanged. They produce zero visible output in rendered Markdown, so adversarial content embedded in them is undetectable by human reviewers but delivered verbatim to the agent — a structurally equivalent hidden channel to HTML comments and zero-width-space splits, both of which are already neutralized.

How did you fix it?

  • sanitize_content_core.cjs — Extended the stripDangerousAttributes regex to include data-[a-z0-9_-]+ alongside the existing on\w+ and style clauses. The existing /gi flag handles case-insensitivity, making the A-Z range in the character class redundant. Updated JSDoc and call-site comment to document the rationale.
// Before
/[\s/]+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi

// After
/[\s/]+(?:on\w+|style|data-[a-z0-9_-]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi

Testing

  • sanitize_content.test.cjs — Added 8 regression tests covering: double-quoted, single-quoted, unquoted, and bare (valueless) data-* attributes; multiple data-* attributes on one tag; data-* combined with on*/style; data-* alongside preserved safe attributes; and case-insensitive matching (DATA-X).

Copilot AI and others added 2 commits May 12, 2026 14:56
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
data-* attributes on GFM-allowed tags (e.g. <span data-x="...">)
are invisible in rendered GitHub Markdown but passed through the
sanitizer verbatim, making them a hidden injection channel.

Add data-[a-z0-9_-]+ (with /gi flag for case-insensitivity) to
the stripDangerousAttributes regex alongside the existing on* and
style clauses. Also update JSDoc and inline comments to document
the rationale, and add 8 regression tests.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix stripDangerousAttributes to strip data-* custom attributes Strip data-* attributes in stripDangerousAttributes May 12, 2026
Copilot AI requested a review from szabta89 May 12, 2026 15:02
@pelikhan pelikhan marked this pull request as ready for review May 12, 2026 15:42
Copilot AI review requested due to automatic review settings May 12, 2026 15:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a hidden prompt-injection channel by removing data-* HTML attributes from otherwise-allowed GFM tags during sanitization, and adds regression tests to prevent reintroduction.

Changes:

  • Extend stripDangerousAttributes to strip data-* attributes in addition to on* and style.
  • Add regression tests for data-* stripping across quoting forms, valueless attributes, and mixed safe/dangerous attributes.
  • Update Dependabot workflows to derive the Docker socket path from DOCKER_HOST when building the MCP gateway docker run command.
Show a summary per file
File Description
actions/setup/js/sanitize_content_core.cjs Extends dangerous-attribute stripping logic to remove data-* attributes and updates inline documentation/comments.
actions/setup/js/sanitize_content.test.cjs Adds regression tests ensuring data-* attributes are removed from allowed tags while preserving safe attributes.
.github/workflows/dependabot-worker.lock.yml Uses DOCKER_HOST (when it’s a unix socket/path) to select the socket file mounted into the MCP gateway container.
.github/workflows/dependabot-campaign.lock.yml Same DOCKER_HOST-aware socket-path logic for the campaign workflow’s MCP gateway container.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

actions/setup/js/sanitize_content_core.cjs:689

  • This note says the pattern intentionally uses \s+ to require whitespace before attribute names, but the actual regex uses [\s/]+, which can also match a bare / (as described a few lines below). To avoid confusion for future maintainers, consider updating this note to refer to [\s/]+ / “whitespace-or-slash” instead of just \s+.
   * Note: `\s+` (requiring at least one whitespace before the attribute name) is
   * intentional — HTML attributes are always separated from the tag name and from
   * each other by at least one whitespace character. Using `\s*` would risk false
   * matches inside tag names (e.g. matching "ong" inside "strong").
  • Files reviewed: 4/4 changed files
  • Comments generated: 2

// Using [\s/]+ (instead of \s+) also strips dangerous attributes that are immediately
// preceded by a "/" with no space — e.g. the malformed <img/onerror=alert(1) src=x>.
return tagContent.replace(/[\s/]+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
return tagContent.replace(/[\s/]+(?:on\w+|style|data-[a-z0-9_-]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
const result = sanitizeContent('<span data-x="INJECT">text</span>');
expect(result).toBe("<span>text</span>");
});

@pelikhan pelikhan closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants