fix(sanitize): switch sanitizeUrlProtocols to allowlist for ://-scheme URLs #31715
Merged
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
… single-colon blocklist

The previous `sanitizeUrlProtocols` regex matched only a fixed set of `://` protocols (http, ftp, file, ssh, git), leaving ws://, wss://, smb://, irc://, ldap://, ldaps://, rtsp://, feed://, and any future schemes able to pass through unredacted.

Switch the `://` branch to an allowlist: redact **every** protocol:// URL except https:// (which is handled downstream by sanitizeUrlDomains for domain-level filtering).

Two anchors keep https:// safe:
* Negative lookbehind `(?<![a-z0-9])` – prevents the regex from matching a suffix of "https://" (e.g. "ttps://") when scanning inside an existing HTTPS URL.
* Negative lookahead `(?!https://)` – explicitly excludes the https:// scheme at position 0.

Also add `magnet` to the single-colon scheme blocklist to cover `magnet:?xt=…` URLs.

Adds 10 new regression tests covering ws://, wss://, smb://, irc://, ldap://, ldaps://, rtsp://, magnet:, the https:// pass-through, and the "ttps://" suffix regression guard.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
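To illustrate why both anchors are needed, here is a minimal sketch that compares a pattern with only the lookahead against one that also has the lookbehind. The `SCHEME` fragment, the `[^\s]*` tail, and the variable names are assumptions made for this example; the actual regex in `sanitize_content_core.cjs` may be written differently.

```javascript
// Hypothetical patterns for illustration only; not copied from the PR.
// Scheme syntax follows the usual "letter, then letters/digits/+/./-" shape.
const SCHEME = "[a-z][a-z0-9+.-]*";

// Lookahead only: the engine skips position 0 of "https://", but it can still
// start a match one character later, where the remaining text is "ttps://...".
const naive = new RegExp(`(?!https://)${SCHEME}://[^\\s]*`, "gi");

// Lookbehind + lookahead: a match may not start right after a letter or digit,
// so no suffix of "https" can be picked up as a scheme.
const anchored = new RegExp(`(?<![a-z0-9])(?!https://)${SCHEME}://[^\\s]*`, "gi");

const input = "see https://example.com and ws://evil.test/socket";

const naiveOut = input.replace(naive, "(redacted)");
const anchoredOut = input.replace(anchored, "(redacted)");

console.log(naiveOut);    // see h(redacted) and (redacted)
console.log(anchoredOut); // see https://example.com and (redacted)
```

The first output shows the corruption the "ttps://" regression test guards against; the second leaves the HTTPS URL intact for later domain-level filtering.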
…ce calls

Addresses code-review feedback: the combined regex was hard to read. Step 1 (allowlist) handles protocol:// URLs; step 2 (blocklist) handles single-colon schemes. Logic is unchanged; all 467 tests still pass.

Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix incomplete URL sanitization for additional protocols" to "fix(sanitize): switch sanitizeUrlProtocols to allowlist for ://-scheme URLs" on May 12, 2026
Contributor
Pull request overview
This PR tightens URL protocol sanitization to prevent non-HTTPS scheme:// links from passing through unredacted, and adds regression tests for newly covered schemes.
Changes:
- Switch `sanitizeUrlProtocols()` from a `://` scheme blocklist to an allowlist approach that only permits `https://`.
- Extend single-colon scheme blocking to include `magnet:`.
- Add protocol sanitization regression tests for several `scheme://` URLs and `magnet:`; update two Dependabot workflow lockfiles to derive the Docker socket path from `DOCKER_HOST`.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/sanitize_content_core.cjs | Changes protocol sanitization to redact all non-HTTPS scheme:// URLs and adds magnet: to the single-colon blocklist. |
| actions/setup/js/sanitize_content.test.cjs | Adds regression tests for new protocol sanitization behavior. |
| .github/workflows/dependabot-worker.lock.yml | Updates MCP gateway Docker socket handling to respect DOCKER_HOST. |
| .github/workflows/dependabot-campaign.lock.yml | Updates MCP gateway Docker socket handling to respect DOCKER_HOST. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 3
Comment on lines +214 to +218

    const sanitized = sanitizeDomainName(domainLower);
    const truncated = domainLower.length > 12 ? domainLower.substring(0, 12) + "..." : domainLower;
    core.info(`Redacted URL: ${truncated}`);
    core.debug(`Redacted URL (full): ${fullMatch}`);
    addRedactedDomain(domainLower);
Comment on lines +783 to +787

    case "${DOCKER_HOST:-}" in
      unix://* ) DOCKER_SOCK_PATH="${DOCKER_HOST#unix://}" ;;
      /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
      * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
    esac
Comment on lines +674 to +678

    case "${DOCKER_HOST:-}" in
      unix://* ) DOCKER_SOCK_PATH="${DOCKER_HOST#unix://}" ;;
      /* ) DOCKER_SOCK_PATH="$DOCKER_HOST" ;;
      * ) DOCKER_SOCK_PATH=/var/run/docker.sock ;;
    esac
Collaborator
@copilot merge main, recompile, review all comments
…url-protocols Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…edaction
When the domain capture group is empty (e.g. `file:///etc/passwd` or
bare `http://`), the previous code called `addRedactedDomain("")` and
logged an empty truncated string. Now the empty-domain branch uses the
scheme name (e.g. `file://`) as the redacted-domain token and returns
`(redacted)` directly.
Adds a regression test for `file:///etc/passwd`.
Also merges origin/main.
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
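The empty-domain branch described in this commit could look roughly like the sketch below. It is a simplified illustration, not the code from the PR: the `PROTOCOL_URL` pattern, the `redactProtocolUrls` function name, and the array-backed `addRedactedDomain` stand-in are all assumptions, and the real helper's logging and truncation are omitted.

```javascript
// Stand-in for the real addRedactedDomain helper (assumption: it records a token).
const redactedDomains = [];
function addRedactedDomain(token) {
  redactedDomains.push(token);
}

// Hypothetical pattern: group 1 is the scheme, group 2 is the (possibly empty) host.
const PROTOCOL_URL =
  /(?<![a-z0-9])(?!https:\/\/)([a-z][a-z0-9+.-]*):\/\/([^\s\/?#]*)[^\s]*/gi;

function redactProtocolUrls(text) {
  return text.replace(PROTOCOL_URL, (fullMatch, scheme, domain) => {
    const domainLower = domain.toLowerCase();
    if (domainLower === "") {
      // Empty host (file:///etc/passwd, bare http://): record the scheme
      // token such as "file://" instead of an empty string.
      addRedactedDomain(`${scheme.toLowerCase()}://`);
      return "(redacted)";
    }
    // Non-empty host: record the lowercased domain as before.
    addRedactedDomain(domainLower);
    return "(redacted)";
  });
}

const fileOut = redactProtocolUrls("open file:///etc/passwd now");
const wsOut = redactProtocolUrls("join ws://Evil.Test/chat");
console.log(fileOut);         // open (redacted) now
console.log(wsOut);           // join (redacted)
console.log(redactedDomains); // [ 'file://', 'evil.test' ]
```

Recording the scheme token keeps the redaction log meaningful for host-less URLs, which is the behavior the `file:///etc/passwd` regression test checks.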
Contributor
Author
Done — merged main, typechecked, and addressed all review comments:
…Protocols

The comment said "skip addRedactedDomain" but the code called it. Updated the comment to accurately describe that the scheme token (e.g. "file://") is used as the redacted-domain entry.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Bug Fix
What was the bug?
`sanitizeUrlProtocols()` used a hardcoded blocklist of `://` schemes (`http`, `ftp`, `file`, `ssh`, `git`). Any unlisted scheme (`ws://`, `wss://`, `smb://`, `irc://`, `ldap://`, `ldaps://`, `rtsp://`, `feed://`, etc.) passed through the full `sanitizeContentCore` pipeline unchanged and was written verbatim to GitHub. In a prompt-injection scenario this enables active markdown links to attacker-controlled endpoints (e.g. `smb://` triggers NTLM credential leaks on Windows via Responder).
How did you fix it?
Step 1: allowlist for `://` schemes (`sanitize_content_core.cjs`)
Replace the explicit blocklist with an allowlist: redact every `protocol://` URL that is not `https://`. Two anchors prevent corrupting valid `https://` URLs:
- `(?<![a-z0-9])` (negative lookbehind): stops the regex from matching a suffix of `https://` (e.g. `ttps://`) mid-string.
- `(?!https://)` (negative lookahead): explicitly skips the `https://` scheme, which is forwarded to `sanitizeUrlDomains` for domain-level filtering.
Step 2: extend the single-colon blocklist
Add `magnet` to the existing `data|javascript|vbscript|about|mailto|tel` list to cover `magnet:?xt=…` URLs. A fully general single-colon pattern is intentionally avoided to prevent false positives on YAML `key:value`, C++ `std::`, and Windows `C:\` paths.
The two steps are implemented as separate `replace` calls for clarity.
Testing
10 new regression tests added covering `ws://`, `wss://`, `smb://`, `irc://`, `ldap://`, `ldaps://`, `rtsp://`, `magnet:`, the `https://` pass-through, and a guard against the "`ttps://` suffix" regression.
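As a rough sketch of the two-step shape described above, the function below chains an allowlist pass and a blocklist pass as separate `replace` calls. The function name `sanitizeUrlProtocolsSketch`, both regex bodies, and the lookbehind on the single-colon pattern are assumptions for illustration; only the scheme list and the two anchors come from the description.

```javascript
// Step 1 (allowlist): redact every scheme:// URL except https://.
// Hypothetical pattern; the real regex may capture groups or differ in its tail.
const PROTOCOL_URL = /(?<![a-z0-9])(?!https:\/\/)[a-z][a-z0-9+.-]*:\/\/[^\s]*/gi;

// Step 2 (blocklist): redact a fixed set of single-colon schemes.
// The leading lookbehind is an assumption, added so "metadata:x" is not hit.
const SINGLE_COLON =
  /(?<![a-z0-9])(?:data|javascript|vbscript|about|mailto|tel|magnet):[^\s]+/gi;

function sanitizeUrlProtocolsSketch(text) {
  return text
    .replace(PROTOCOL_URL, "(redacted)")  // step 1
    .replace(SINGLE_COLON, "(redacted)"); // step 2
}

const cases = [
  "smb://attacker/share",
  "visit https://github.com/a",
  "magnet:?xt=urn:btih:abc",
  "key:value",
  "std::vector",
  "C:\\Users",
];
const results = cases.map(sanitizeUrlProtocolsSketch);
for (let i = 0; i < cases.length; i++) {
  console.log(`${cases[i]} -> ${results[i]}`);
}
// smb://attacker/share -> (redacted)
// visit https://github.com/a -> visit https://github.com/a
// magnet:?xt=urn:btih:abc -> (redacted)
// key:value -> key:value
// std::vector -> std::vector
// C:\Users -> C:\Users
```

Keeping step 2 as an explicit list is what leaves the last three inputs untouched; a general `word:` pattern would redact all of them.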