-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Two new queries for URL and hostname sanitization (CWE-020). #820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
taus-semmle
merged 4 commits into
github:master
from
markshannon:python-incomplete-url-sanitize
Jan 29, 2019
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
88d8cb5
Python: Two new queries for URL and hostname sanitization (CWE-020).
markshannon 6ddbed7
Python: Minor tweaks to qldoc and release note.
markshannon 3850f87
Make qhelp for 'Incomplete URL substring sanitization' consistent acr…
markshannon 9adb19f
Merge branch 'master' into python-incomplete-url-sanitize
taus-semmle File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
69 changes: 69 additions & 0 deletions
69
python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| <!DOCTYPE qhelp PUBLIC | ||
| "-//Semmle//qhelp//EN" | ||
| "qhelp.dtd"> | ||
| <qhelp> | ||
|
|
||
| <overview> | ||
| <p> | ||
|
|
||
| Sanitizing untrusted URLs is an important technique for | ||
| preventing attacks such as request forgeries and malicious | ||
| redirections. Often, this is done by checking that the host of a URL | ||
| is in a set of allowed hosts. | ||
|
|
||
| </p> | ||
|
|
||
| <p> | ||
|
|
||
| If a regular expression implements such a check, it is | ||
| easy to accidentally make the check too permissive by not escaping the | ||
| <code>.</code> meta-characters appropriately. | ||
|
|
||
| Even if the check is not used in a security-critical | ||
| context, the incomplete check may still cause undesirable behaviors | ||
| when it accidentally succeeds. | ||
|
|
||
| </p> | ||
| </overview> | ||
|
|
||
| <recommendation> | ||
| <p> | ||
|
|
||
| Escape all meta-characters appropriately when constructing | ||
| regular expressions for security checks, pay special attention to the | ||
| <code>.</code> meta-character. | ||
|
|
||
| </p> | ||
| </recommendation> | ||
|
|
||
| <example> | ||
|
|
||
| <p> | ||
|
|
||
| The following example code checks that a URL redirection | ||
| will reach the <code>example.com</code> domain, or one of its | ||
| subdomains. | ||
|
|
||
| </p> | ||
|
|
||
| <sample src="examples/IncompleteHostnameRegExp.py"/> | ||
|
|
||
| <p> | ||
| The <code>unsafe</code> check is easy to bypass because the unescaped | ||
| <code>.</code> allows for any character before | ||
| <code>example.com</code>, effectively allowing the redirect to go to | ||
| an attacker-controlled domain such as <code>wwwXexample.com</code>. | ||
|
|
||
| </p> | ||
| <p> | ||
| The <code>safe</code> check closes this vulnerability by escaping the <code>.</code> | ||
| so that URLs of the form <code>wwwXexample.com</code> are rejected. | ||
| </p> | ||
|
|
||
| </example> | ||
|
|
||
| <references> | ||
| <li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li> | ||
| <li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li> | ||
| </references> | ||
| </qhelp> |
44 changes: 44 additions & 0 deletions
44
python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.ql
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| /** | ||
| * @name Incomplete regular expression for hostnames | ||
| * @description Matching a URL or hostname against a regular expression that contains an unescaped dot as part of the hostname might match more hostnames than expected. | ||
| * @kind problem | ||
| * @problem.severity warning | ||
| * @precision high | ||
| * @id py/incomplete-hostname-regexp | ||
| * @tags correctness | ||
| * security | ||
| * external/cwe/cwe-20 | ||
| */ | ||
|
|
||
| import python | ||
| import semmle.python.regex | ||
|
|
||
| private string commonTopLevelDomainRegex() { | ||
| result = "com|org|edu|gov|uk|net|io" | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `pattern` is a regular expression pattern for URLs with a host matched by `hostPart`, | ||
| * and `pattern` contains a subtle mistake that allows it to match unexpected hosts. | ||
| */ | ||
| bindingset[pattern] | ||
| predicate isIncompleteHostNameRegExpPattern(string pattern, string hostPart) { | ||
| hostPart = pattern | ||
| .regexpCapture("(?i).*" + | ||
| // an unescaped single `.` | ||
| "(?<!\\\\)[.]" + | ||
| // immediately followed by a sequence of subdomains, perhaps with some regex characters mixed in, followed by a known TLD | ||
| "([():|?a-z0-9-]+(\\\\)?[.](" + commonTopLevelDomainRegex() + "))" + ".*", 1) | ||
| } | ||
|
|
||
| from Regex r, string pattern, string hostPart | ||
| where | ||
| ( | ||
| r.getText() = pattern | ||
| ) and | ||
| isIncompleteHostNameRegExpPattern(pattern, hostPart) and | ||
| // ignore patterns with capture groups after the TLD | ||
| not pattern.regexpMatch("(?i).*[.](" + commonTopLevelDomainRegex() + ").*[(][?]:.*[)].*") | ||
| select r, | ||
| "This regular expression has an unescaped '.' before '" + hostPart + | ||
| "', so it might match more hosts than expected." |
85 changes: 85 additions & 0 deletions
85
python/ql/src/Security/CWE-020/IncompleteUrlSubstringSanitization.qhelp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| <!DOCTYPE qhelp PUBLIC | ||
| "-//Semmle//qhelp//EN" | ||
| "qhelp.dtd"> | ||
| <qhelp> | ||
|
|
||
| <overview> | ||
| <p> | ||
|
|
||
| Sanitizing untrusted URLs is an important technique for | ||
| preventing attacks such as request forgeries and malicious | ||
| redirections. Usually, this is done by checking that the host of a URL | ||
| is in a set of allowed hosts. | ||
|
|
||
| </p> | ||
|
|
||
| <p> | ||
| However, treating the URL as a string and checking if one of the | ||
| allowed hosts is a substring of the URL is very prone to errors. | ||
| Malicious URLs can bypass such security checks by embedding one | ||
| of the allowed hosts in an unexpected location. | ||
|
|
||
| </p> | ||
|
|
||
| <p> | ||
|
|
||
| Even if the substring check is not used in a | ||
| security-critical context, the incomplete check may still cause | ||
| undesirable behaviors when the check succeeds accidentally. | ||
|
|
||
| </p> | ||
| </overview> | ||
|
|
||
| <recommendation> | ||
| <p> | ||
|
|
||
| Parse a URL before performing a check on its host value, | ||
| and ensure that the check handles arbitrary subdomain sequences | ||
| correctly. | ||
|
|
||
| </p> | ||
| </recommendation> | ||
|
|
||
| <example> | ||
|
|
||
| <p> | ||
|
|
||
| The following example code checks that a URL redirection | ||
| will reach the <code>example.com</code> domain. | ||
|
|
||
| </p> | ||
|
|
||
| <sample src="examples/IncompleteUrlSubstringSanitization.py"/> | ||
|
|
||
| <p> | ||
|
|
||
| The first two examples show unsafe checks that are easily bypassed. | ||
| In <code>unsafe1</code> the attacker can simply add | ||
| <code>example.com</code> anywhere in the url. For example, | ||
| <code>http://evil-example.net/example.com</code>. | ||
| </p> | ||
| <p> | ||
| In <code>unsafe2</code> the attacker must use a hostname ending in | ||
| <code>example.com</code>, but that is easy to do. For example, | ||
| <code>http://benign-looking-prefix-example.com</code>. | ||
|
|
||
| </p> | ||
|
|
||
| <p> | ||
|
|
||
| The second two examples show safe checks. | ||
| In <code>safe1</code>, a white-list is used. Although fairly inflexible, | ||
| this is easy to get right and is most likely to be safe. | ||
| </p> | ||
| <p> | ||
| In <code>safe2</code>, <code>urlparse</code> is used to parse the URL, | ||
| then the hostname is checked to make sure it ends with <code>.example.com</code>. | ||
| </p> | ||
|
|
||
| </example> | ||
|
|
||
| <references> | ||
| <li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li> | ||
| <li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li> | ||
| </references> | ||
| </qhelp> |
58 changes: 58 additions & 0 deletions
58
python/ql/src/Security/CWE-020/IncompleteUrlSubstringSanitization.ql
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| /** | ||
| * @name Incomplete URL substring sanitization | ||
| * @description Security checks on the substrings of an unparsed URL are often vulnerable to bypassing. | ||
| * @kind problem | ||
| * @problem.severity warning | ||
| * @precision high | ||
| * @id py/incomplete-url-substring-sanitization | ||
| * @tags correctness | ||
| * security | ||
| * external/cwe/cwe-20 | ||
| */ | ||
|
|
||
|
|
||
| import python | ||
| import semmle.python.regex | ||
|
|
||
| private string commonTopLevelDomainRegex() { | ||
| result = "com|org|edu|gov|uk|net|io" | ||
| } | ||
|
|
||
| predicate looksLikeUrl(StrConst s) { | ||
| exists(string text | | ||
| text = s.getText() | ||
| | | ||
| text.regexpMatch("(?i)([a-z]*:?//)?\\.?([a-z0-9-]+\\.)+(" + | ||
| commonTopLevelDomainRegex() +")(:[0-9]+)?/?") | ||
| or | ||
| // target is a HTTP URL to a domain on any TLD | ||
| text.regexpMatch("(?i)https?://([a-z0-9-]+\\.)+([a-z]+)(:[0-9]+)?/?") | ||
| ) | ||
| } | ||
|
|
||
| predicate incomplete_sanitization(Expr sanitizer, StrConst url) { | ||
| looksLikeUrl(url) and | ||
| ( | ||
| sanitizer.(Compare).compares(url, any(In i), _) | ||
| or | ||
| call_to_startswith(sanitizer, url) | ||
| or | ||
| unsafe_call_to_endswith(sanitizer, url) | ||
| ) | ||
| } | ||
|
|
||
| predicate call_to_startswith(Call sanitizer, StrConst url) { | ||
| sanitizer.getFunc().(Attribute).getName() = "startswith" | ||
| and | ||
| sanitizer.getArg(0) = url | ||
| } | ||
|
|
||
| predicate unsafe_call_to_endswith(Call sanitizer, StrConst url) { | ||
| sanitizer.getFunc().(Attribute).getName() = "endswith" and | ||
| sanitizer.getArg(0) = url and | ||
| not url.getText().regexpMatch("(?i)\\.([a-z0-9-]+)(\\.[a-z0-9-]+)+") | ||
| } | ||
|
|
||
| from Expr sanitizer, StrConst url | ||
| where incomplete_sanitization(sanitizer, url) | ||
| select sanitizer, "'$@' may be at an arbitrary position in the sanitized URL.", url, url.getText() |
19 changes: 19 additions & 0 deletions
19
python/ql/src/Security/CWE-020/examples/IncompleteHostnameRegExp.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| from flask import Flask, request, redirect | ||
| import re | ||
|
|
||
| app = Flask(__name__) | ||
|
|
||
| UNSAFE_REGEX = re.compile("(www|beta).example.com/") | ||
| SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/") | ||
|
|
||
| @app.route('/some/path/bad') | ||
| def unsafe(request): | ||
| target = request.args.get('target', '') | ||
| if UNSAFE_REGEX.match(target): | ||
| return redirect(target) | ||
|
|
||
| @app.route('/some/path/good') | ||
| def safe(request): | ||
| target = request.args.get('target', '') | ||
| if SAFE_REGEX.match(target): | ||
| return redirect(target) |
45 changes: 45 additions & 0 deletions
45
python/ql/src/Security/CWE-020/examples/IncompleteUrlSubstringSanitization.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| from flask import Flask, request, redirect | ||
| from urllib.parse import urlparse | ||
|
|
||
| app = Flask(__name__) | ||
|
|
||
felicitymay marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| # Not safe, as "evil-example.net/example.com" would be accepted | ||
|
|
||
| @app.route('/some/path/bad1') | ||
| def unsafe1(request): | ||
| target = request.args.get('target', '') | ||
| if "example.com" in target: | ||
| return redirect(target) | ||
|
|
||
| # Not safe, as "benign-looking-prefix-example.com" would be accepted | ||
|
|
||
| @app.route('/some/path/bad2') | ||
| def unsafe2(request): | ||
| target = request.args.get('target', '') | ||
| if target.endswith("example.com"): | ||
| return redirect(target) | ||
|
|
||
|
|
||
|
|
||
| #Simplest and safest approach is to use a white-list | ||
|
|
||
| @app.route('/some/path/good1') | ||
| def safe1(request): | ||
| whitelist = [ | ||
| "example.com/home", | ||
| "example.com/login", | ||
| ] | ||
| target = request.args.get('target', '') | ||
| if target in whitelist: | ||
| return redirect(target) | ||
|
|
||
| #More complex example allowing sub-domains. | ||
|
|
||
| @app.route('/some/path/good2') | ||
| def safe2(request): | ||
| target = request.args.get('target', '') | ||
| host = urlparse(target).hostname | ||
| #Note the '.' preceding example.com | ||
| if host and host.endswith(".example.com"): | ||
| return redirect(target) | ||
|
|
||
1 change: 1 addition & 0 deletions
1
python/ql/test/query-tests/Security/CWE-020/IncompleteHostnameRegExp.expected
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| | hosttest.py:6:27:6:51 | Str | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. | |
1 change: 1 addition & 0 deletions
1
python/ql/test/query-tests/Security/CWE-020/IncompleteHostnameRegExp.qlref
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Security/CWE-020/IncompleteHostnameRegExp.ql |
2 changes: 2 additions & 0 deletions
2
python/ql/test/query-tests/Security/CWE-020/IncompleteUrlSubstringSanitization.expected
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| | urltest.py:9:8:9:30 | Compare | '$@' may be at an arbitrary position in the sanitized URL. | urltest.py:9:8:9:20 | Str | example.com | | ||
| | urltest.py:15:8:15:37 | Attribute() | '$@' may be at an arbitrary position in the sanitized URL. | urltest.py:15:24:15:36 | Str | example.com | |
1 change: 1 addition & 0 deletions
1
python/ql/test/query-tests/Security/CWE-020/IncompleteUrlSubstringSanitization.qlref
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Security/CWE-020/IncompleteUrlSubstringSanitization.ql |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| from flask import Flask, request, redirect | ||
| import re | ||
|
|
||
| app = Flask(__name__) | ||
|
|
||
| UNSAFE_REGEX = re.compile("(www|beta).example.com/") | ||
| SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/") | ||
|
|
||
| @app.route('/some/path/bad') | ||
| def unsafe(request): | ||
| target = request.args.get('target', '') | ||
| if UNSAFE_REGEX.match(target): | ||
| return redirect(target) | ||
|
|
||
| @app.route('/some/path/good') | ||
| def safe(request): | ||
| target = request.args.get('target', '') | ||
| if SAFE_REGEX.match(target): | ||
| return redirect(target) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.