Skip to content

fix(sanitize): switch sanitizeUrlProtocols to allowlist for ://-scheme URLs#31715

Merged
pelikhan merged 7 commits into
mainfrom
copilot/fix-sanitize-url-protocols
May 12, 2026
Merged

fix(sanitize): switch sanitizeUrlProtocols to allowlist for ://-scheme URLs#31715
pelikhan merged 7 commits into
mainfrom
copilot/fix-sanitize-url-protocols

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 12, 2026

Bug Fix

What was the bug?

sanitizeUrlProtocols() used a hardcoded blocklist of :// schemes (http, ftp, file, ssh, git). Any unlisted scheme — ws://, wss://, smb://, irc://, ldap://, ldaps://, rtsp://, feed://, etc. — passed through the full sanitizeContentCore pipeline unchanged and was written verbatim to GitHub. In a prompt-injection scenario this enables active markdown links to attacker-controlled endpoints (e.g. smb:// triggers NTLM credential leaks on Windows via Responder).

How did you fix it?

Step 1 — allowlist for :// schemes (sanitize_content_core.cjs)

Replace the explicit blocklist with an allowlist: redact every protocol:// URL that is not https://. Two anchors prevent corrupting valid https:// URLs:

  • (?<![a-z0-9]) (negative lookbehind) — stops the regex from matching a suffix of https:// (e.g. ttps://) mid-string.
  • (?!https://) (negative lookahead) — explicitly skips the https:// scheme, which is forwarded to sanitizeUrlDomains for domain-level filtering.
// Before — explicit blocklist, incomplete:
normalized.replace(/((?:http|ftp|file|ssh|git):\/\/([\w.-]*)(?:[^\s]*)|...)/gi, ...)

// After — allowlist, catches everything except https://:
normalized.replace(/(?<![a-z0-9])(?!https:\/\/)([a-z][a-z0-9+.-]*)(:\/\/)([\w.-]*)([^\s]*)/gi, ...)

Step 2 — extend the single-colon blocklist

Add magnet to the existing data|javascript|vbscript|about|mailto|tel list to cover magnet:?xt=… URLs. A fully general single-colon pattern is intentionally avoided to prevent false positives on YAML key:value, C++ std::, and Windows C:\ paths.

The two steps are implemented as separate replace calls for clarity.

Testing

10 new regression tests added covering ws://, wss://, smb://, irc://, ldap://, ldaps://, rtsp://, magnet:, the https:// pass-through, and a guard against the "ttps:// suffix" regression.

Copilot AI and others added 3 commits May 12, 2026 15:02
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
… single-colon blocklist

The previous `sanitizeUrlProtocols` regex matched only a fixed set of
`://` protocols (http, ftp, file, ssh, git), leaving ws://, wss://,
smb://, irc://, ldap://, ldaps://, rtsp://, feed://, and any future
schemes able to pass through unredacted.

Switch the `://` branch to an allowlist: redact **every** protocol://
URL except https:// (which is handled downstream by sanitizeUrlDomains
for domain-level filtering).  Two anchors keep https:// safe:

* Negative lookbehind `(?<![a-z0-9])` – prevents the regex from
  matching a suffix of "https://" (e.g. "ttps://") when scanning
  inside an existing HTTPS URL.
* Negative lookahead `(?!https://)` – explicitly excludes the https://
  scheme at position 0.

Also add `magnet` to the single-colon scheme blocklist to cover
`magnet:?xt=…` URLs.

Adds 10 new regression tests covering ws://, wss://, smb://, irc://,
ldap://, ldaps://, rtsp://, magnet:, the https:// pass-through, and the
"ttps://" suffix regression guard.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
…ce calls

Addresses code-review feedback: the combined regex was hard to read.
Step 1 (allowlist) handles protocol:// URLs; step 2 (blocklist) handles
single-colon schemes. Logic is unchanged; all 467 tests still pass.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix incomplete URL sanitization for additional protocols fix(sanitize): switch sanitizeUrlProtocols to allowlist for ://-scheme URLs May 12, 2026
Copilot AI requested a review from szabta89 May 12, 2026 15:12
@pelikhan pelikhan marked this pull request as ready for review May 12, 2026 15:42
Copilot AI review requested due to automatic review settings May 12, 2026 15:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens URL protocol sanitization to prevent non-HTTPS scheme:// links from passing through unredacted, and adds regression tests for newly covered schemes.

Changes:

  • Switch sanitizeUrlProtocols() from a :// scheme blocklist to an allowlist approach that only permits https://.
  • Extend single-colon scheme blocking to include magnet:.
  • Add protocol sanitization regression tests for several scheme:// URLs and magnet:; update two Dependabot workflow lockfiles to derive the Docker socket path from DOCKER_HOST.
Show a summary per file
File Description
actions/setup/js/sanitize_content_core.cjs Changes protocol sanitization to redact all non-HTTPS scheme:// URLs and adds magnet: to the single-colon blocklist.
actions/setup/js/sanitize_content.test.cjs Adds regression tests for new protocol sanitization behavior.
.github/workflows/dependabot-worker.lock.yml Updates MCP gateway Docker socket handling to respect DOCKER_HOST.
.github/workflows/dependabot-campaign.lock.yml Updates MCP gateway Docker socket handling to respect DOCKER_HOST.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment on lines +214 to +218
const sanitized = sanitizeDomainName(domainLower);
const truncated = domainLower.length > 12 ? domainLower.substring(0, 12) + "..." : domainLower;
core.info(`Redacted URL: ${truncated}`);
core.debug(`Redacted URL (full): ${fullMatch}`);
addRedactedDomain(domainLower);
Comment on lines +783 to +787
case "${DOCKER_HOST:-}" in
unix://* ) DOCKER_SOCK_PATH="${DOCKER_HOST#unix://}" ;;
/* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
* ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
esac
Comment on lines +674 to +678
case "${DOCKER_HOST:-}" in
unix://* ) DOCKER_SOCK_PATH="${DOCKER_HOST#unix://}" ;;
/* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
* ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
esac
@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot merge main, recompile, review all comments

Copilot AI and others added 2 commits May 12, 2026 15:56
…url-protocols

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…edaction

When the domain capture group is empty (e.g. `file:///etc/passwd` or
bare `http://`), the previous code called `addRedactedDomain("")` and
logged an empty truncated string.  Now the empty-domain branch uses the
scheme name (e.g. `file://`) as the redacted-domain token and returns
`(redacted)` directly.

Adds a regression test for `file:///etc/passwd`.

Also merges origin/main.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 12, 2026

@copilot merge main, recompile, review all comments

Done — merged main, typechecked, and addressed all review comments:

  • Empty-domain guard (file:///path, bare http://): the Step 1 callback now detects an empty domain capture, uses scheme + "://" as the addRedactedDomain token, and returns (redacted) directly — no empty-string entry is recorded. Regression test added. (commit fix(sanitize): guard empty-domain case)
  • Workflow file comments: the dependabot-*.lock.yml changes were present on the branch before my commits (they are part of the upstream branch state, not introduced by this fix). They are unrelated to the sanitization change.

…Protocols

The comment said "skip addRedactedDomain" but the code called it.
Updated the comment to accurately describe that the scheme token
(e.g. "file://") is used as the redacted-domain entry.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan May 12, 2026 16:00
@pelikhan pelikhan merged commit 8eb7e09 into main May 12, 2026
@pelikhan pelikhan deleted the copilot/fix-sanitize-url-protocols branch May 12, 2026 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sanitizeUrlProtocols incomplete — ws://, wss://, smb://, irc://, ldap:// bypass safe-output URL sanitization

4 participants