-
Notifications
You must be signed in to change notification settings - Fork 9
Enhance URL allow list matching #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR significantly enhances the URL filtering guardrail's robustness and usability by improving URL validation, normalization, and matching logic. The changes address JS issue #39 with better handling of edge cases and more flexible URL matching.
Key improvements include:
- Scheme normalization: Handles various scheme formats (
HTTPS://,http:,https) uniformly - Enhanced security: Now blocks URLs containing passwords in addition to usernames when
block_userinfo=true - Scheme-less URL support: Enables more natural allow list entries (e.g.,
example.com) that match URLs regardless of scheme, while still enforcing explicit scheme restrictions when specified - Improved validation: Better port validation, path segment boundary checking, and fallback host extraction for malformed URLs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/checks/urls.ts |
Core implementation updates including new helper functions (normalizeAllowedSchemes, extractHostCandidate, safeGetPort, isIpv4Address), refactored validateUrlSecurity to track scheme presence, and completely reworked isUrlAllowed with comprehensive matching logic for schemes, ports, paths, queries, and fragments |
src/__tests__/unit/checks/keywords-urls.test.ts |
Extensive test additions covering scheme normalization, path boundaries, scheme-less matching, CIDR ranges, port matching, query/fragment requirements, password-only userinfo blocking, malformed port handling, and trailing slash edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Only check allowed_schemes if the URL explicitly had a scheme | ||
| const normalizedScheme = originalScheme.toLowerCase(); | ||
|
|
||
| if (hadScheme && !config.allowed_schemes.has(normalizedScheme)) { | ||
| return { parsedUrl: null, reason: `Blocked scheme: ${normalizedScheme}`, hadScheme }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforce allowed_schemes for scheme-less URLs
Scheme validation is skipped whenever the input omits an explicit scheme (hadScheme is false), so a plain example.com now bypasses allowed_schemes. For configs that deliberately disallow http (e.g., allowed_schemes: ['https']) but still have an HTTP entry in the allow list, users can simply drop the http:// prefix and the URL will be allowed because both the security check here and the later allow-list scheme check are gated on hadScheme. This reintroduces disallowed schemes via scheme-less inputs and weakens the scheme restriction.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAI. Implemented this way for usability. A typical user will type example.com and not https://example.com. A scheme not allowed will still be caught (http://example.com)
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Codex Review: Didn't find any major issues. Delightful! ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gabor-openai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY
Making URL detection more robust in response to #39
allowed_schemesentries for matching_is_url_allowedto handle host only entries by using theallowed_schemesblock_userinfo=trueexample.comdoesn't get blocked when https://example.com is in the allow list. But http://example.com does)