Ruby: IncompleteHostnameRegExp.ql #7917

aibaars · 2022-02-09T18:02:37Z

This pull request ports JavaScript's IncompleteHostnameRegExp.ql query to Ruby. Most of the query and documentation is a shameless copy of the original.

The pull request is split into:

a commit that copies the JavaScript implementation: 1ad6e96
a commit that adapts the implementation for Ruby: a221e28

github-actions · 2022-02-09T18:46:55Z

QHelp previews:

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

UNSAFE_REGEX = /(www|beta).example.com\//
SAFE_REGEX = /(www|beta)\.example\.com\//

def unsafe
    target = params[:target]
    if UNSAFE_REGEX.match(target)
        redirect_to target
    end
end

def safe
    target = params[:target]
    if SAFE_REGEX.match(target)
        redirect_to target
    end
end

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.
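
The bypass described above can be reproduced in a few lines. This is a minimal Python sketch, not part of the query or its documentation: the two patterns mirror the Ruby example, while the helper `is_allowed` and the sample URLs are made up for illustration. `search` is used because Ruby's `Regexp#match` also looks for a match anywhere in the string.

```python
import re

# Patterns copied from the Ruby example; everything else here is illustrative.
UNSAFE_REGEX = re.compile(r"(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")


def is_allowed(regex, target):
    """Return True if `regex` matches anywhere in `target`."""
    return regex.search(target) is not None


legit = "https://www.example.com/home"
attack = "https://wwwXexample.com/home"  # attacker-registered look-alike domain

print(is_allowed(UNSAFE_REGEX, legit))   # True
print(is_allowed(UNSAFE_REGEX, attack))  # True: the unescaped '.' matches 'X'
print(is_allowed(SAFE_REGEX, legit))     # True
print(is_allowed(SAFE_REGEX, attack))    # False: '\.' only matches a literal dot
```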

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

hmac · 2022-02-10T05:04:00Z

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.ql

+            "(?<!\\\\)[.]" +
+            // immediately followed by a sequence of subdomains, perhaps with some regex characters mixed in, followed by a known TLD
+            "([():|?a-z0-9-]+(\\\\)?[.](" + commonTopLevelDomainRegex() + "))" + ".*", 1)
+}


I wonder, is it worth adding a shared HostnameRegExpUtils.qll file to share this with the Python implementation? I notice the JS version has a different implementation. I don't know whether we (the dynamic teams) want to consider consolidating them?

A bit of history: The one you copied originated in JS, written before we parsed string literals as RegExps. When we started extracting strings as RegExps, it was rewritten to its current form.

I'd say the new version is better, if you have ASTs for regexps.
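
For readers who want to see what the string-based heuristic in the quoted diff amounts to, here is a rough Python sketch. It is only a loose adaptation of the fragment shown above, not the query itself: the TLD list is a short hypothetical stand-in for `commonTopLevelDomainRegex()`, and the function name `suspicious_hostname_part` is invented for this example.

```python
import re

# Hypothetical, deliberately short TLD list; the real helper is not reproduced here.
COMMON_TLDS = "com|org|net|edu|gov|io"

# An unescaped '.', followed by hostname-like characters (possibly mixed with
# regex syntax), an optionally escaped '.', and then a known TLD.
UNESCAPED_DOT_BEFORE_HOST = re.compile(
    r"(?<!\\)[.]([():|?a-z0-9-]+(?:\\)?[.](?:" + COMMON_TLDS + r"))"
)


def suspicious_hostname_part(pattern_source):
    """Return the hostname-like text that follows an unescaped '.', or None."""
    m = UNESCAPED_DOT_BEFORE_HOST.search(pattern_source)
    return m.group(1) if m else None


print(suspicious_hostname_part(r"(www|beta).example.com/"))    # example.com
print(suspicious_hostname_part(r"(www|beta)\.example\.com/"))  # None
```

A purely textual check like this works on the regex source as a string, which is why an AST-based version is preferable when the regex has already been parsed.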

github-actions · 2022-02-11T09:37:42Z

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is, however, easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately while keeping the ^ anchor: let regex = /^((www|beta)\.)?example\.com/.
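
As a rough Python counterpart to the host check in the example above, the sketch below parses the URL, extracts the host, and tests it against the escaped pattern. The function name `is_allowed_redirect` is hypothetical, and using `fullmatch` (which anchors both ends) is an assumption that goes slightly beyond the example, which only anchors the start.

```python
import re
from urllib.parse import urlsplit

# Escaped version of the host pattern from the example.
ALLOWED_HOST = re.compile(r"((www|beta)\.)?example\.com")


def is_allowed_redirect(url):
    """Return True if the host of `url` is example.com, www.example.com or beta.example.com."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    # fullmatch requires the whole host to match, so no extra prefix or suffix is accepted.
    return host is not None and ALLOWED_HOST.fullmatch(host) is not None


print(is_allowed_redirect("https://www.example.com/a"))       # True
print(is_allowed_redirect("https://wwwXexample.com/a"))       # False
print(is_allowed_redirect("https://example.com.evil.test/"))  # False
```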

References

MDN: Regular Expressions
OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.
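
One way to follow this recommendation without escaping by hand is to build the pattern from literal host names. The following is a minimal sketch under that assumption; `host_allowlist_regex` and the host list are invented for illustration and are not part of the query documentation. It relies on the standard library's `re.escape`.

```python
import re


def host_allowlist_regex(hosts):
    """Build a compiled regex whose alternatives are the literal `hosts`."""
    # re.escape turns '.' into '\.', so each host is matched literally.
    alternatives = "|".join(re.escape(h) for h in hosts)
    return re.compile("(?:" + alternatives + ")")


ALLOWED = host_allowlist_regex(["www.example.com", "beta.example.com"])

print(ALLOWED.fullmatch("www.example.com") is not None)  # True
print(ALLOWED.fullmatch("wwwXexample.com") is not None)  # False
```

Using `fullmatch` on an already extracted host keeps the check from succeeding on a longer string that merely contains an allowed name.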

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

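@app.route('/some/path/bad')
def unsafe():
    # Flask passes no arguments to this view; `request` is the imported global
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)
    # A Flask view must return a response, so reject anything that does not match
    return 'Invalid redirect target', 400

@app.route('/some/path/good')
def safe():
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)
    return 'Invalid redirect target', 400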

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

UNSAFE_REGEX = /(www|beta).example.com\//
SAFE_REGEX = /(www|beta)\.example\.com\//

def unsafe
    target = params[:target]
    if UNSAFE_REGEX.match(target)
        redirect_to target
    end
end

def safe
    target = params[:target]
    if SAFE_REGEX.match(target)
        redirect_to target
    end
end

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

tausbn

👍 for the Python docs changes.

github-actions · 2022-02-28T18:08:56Z

github-actions · 2022-03-01T11:42:37Z

QHelp previews:

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is, however, easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately while keeping the ^ anchor: regex = /^((www|beta)\.)?example\.com/.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

github-actions · 2022-03-01T12:18:39Z

github-actions · 2022-03-01T15:42:39Z

github-actions · 2022-03-07T15:13:34Z

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

MDN: Regular Expressions
OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.
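One way to follow this recommendation without escaping by hand is to build the pattern from literal hostnames. The sketch below is only an illustration: `ALLOWED_HOSTS` and `is_allowed` are hypothetical names, and it assumes the check should accept exact hostnames only.

```python
import re

# Hypothetical allowlist; the hostnames are illustrative only.
ALLOWED_HOSTS = ["www.example.com", "beta.example.com"]

# re.escape escapes each ".", so the dots only match literal dots.
HOST_PATTERN = re.compile("|".join(re.escape(h) for h in ALLOWED_HOSTS))


def is_allowed(host):
    """Return True only if the whole host equals one of the allowed names."""
    return HOST_PATTERN.fullmatch(host) is not None


print(is_allowed("www.example.com"))
print(is_allowed("wwwXexample.com"))
```

Using `fullmatch` rather than `match` also rejects hosts that merely start with an allowed name.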

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe():
    # `request` is the object imported from flask, not a view argument
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe():
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

However, the check is easy to bypass: the unescaped . matches any character, so the redirect can go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

tausbn

A few comments and suggestions. I did not review the query in detail. I have no doubt that the original JavaScript query is excellent. 🙂
My only lingering concern here is the fact that the string "\." in JavaScript gets interpreted as just a string containing a period (the backslash disappears) whereas in Python, it is interpreted as the string "\\." (i.e. the backslash is interpreted as just a literal backslash, since \. is not a valid escape). However, I'm guessing that this detail is abstracted away by relying on the regex library anyway, and so it should be fine.
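The Python side of this concern can be checked directly. The sketch below is illustrative only. It compares a raw string with an explicitly escaped one and compiles the result. A non-raw "\." currently produces the same two characters, but recent Python versions emit a warning for the unrecognized escape, so it is left out of the code.

```python
import re

raw = r"\."       # raw string: a backslash followed by a period
escaped = "\\."   # explicit escape: the same two characters

# Both spellings give the same two-character string.
assert raw == escaped
assert len(raw) == 2

# Compiled as a regular expression, it matches only a literal period.
dot = re.compile(raw)
print(bool(dot.fullmatch(".")))
print(bool(dot.fullmatch("X")))
```

In other words, the regex engine sees an escaped period in Python, which is what the escaped patterns in the examples rely on.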

tausbn · 2022-03-11T11:58:56Z

python/ql/lib/semmle/python/RegexTreeView.qll

@@ -48,6 +49,19 @@ newtype TRegExpParent =
  /** A back reference */
  TRegExpBackRef(Regex re, int start, int end) { re.backreference(start, end) }

+/**
+ * Provides regular expression patterns.


This is a very general description. So much so that I'm not actually sure what it means.

I agree, I mindlessly copied it from

codeql/javascript/ql/lib/semmle/javascript/Regexp.qll

Lines 992 to 1003 in df9533f

/**
 * Provides regular expression patterns.
 */
module RegExpPatterns {
  /**
   * Gets a pattern that matches common top-level domain names in lower case.
   */
  string commonTLD() {
    // according to ranking by http://google.com/search?q=site:.<<TLD>>
    result = "(?:com|org|edu|gov|uk|net|io)(?![a-z0-9])"
  }
}

Would "Provides utility predicates related to regular expressions." be any better?
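For readers unfamiliar with the pattern returned by commonTLD(), here is a rough Python sketch of what it matches. The pattern string is copied from the quoted code. The helper `has_common_tld` and the leading `\.` are assumptions added for illustration and are not how the query uses the pattern.

```python
import re

# Pattern string copied from commonTLD() in the quoted library code.
COMMON_TLD = r"(?:com|org|edu|gov|uk|net|io)(?![a-z0-9])"


def has_common_tld(text):
    """Hypothetical helper: is there a "." followed by a common TLD?"""
    return re.search(r"\." + COMMON_TLD, text) is not None


for candidate in ["example.com", "example.com/path", "example.community"]:
    print(candidate, has_common_tld(candidate))
```

The negative lookahead is what keeps "com" from matching inside a longer label such as "community".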

tausbn · 2022-03-11T12:00:13Z

python/ql/lib/semmle/python/RegexTreeView.qll

+  /**
+   * Gets a pattern that matches common top-level domain names in lower case.
+   */
+  string commonTLD() {


This name is not really in line with our style guide. At the very least, it should be Tld not TLD. Also, since this function returns a result, it should perhaps start with get.


it should perhaps start with get.

And since it has multiple results it should probably start with getA. So getACommonTld().

tausbn · 2022-03-11T12:02:51Z

python/ql/lib/semmle/python/RegexTreeView.qll

@@ -751,6 +767,9 @@ class RegExpGroup extends RegExpTerm, TRegExpGroup {
   */
  int getNumber() { result = re.getGroupNumber(start, end) }

+  /** Holds if this is a capture group. */
+  predicate isCapture() { not exists(this.getNumber()) }


I may be completely misunderstanding this predicate, but is that not correct? Is this not expressing "not an unnamed capture group" instead?

I believe all capture groups have a number, but only some have a name. Is that assumption incorrect?

Oh wait, the not makes no sense at all. You're right.
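The assumption that every capture group has a number while only some have a name can be checked against Python's own `re` module. This is a sketch of the regex semantics only. It says nothing about how `getNumber()` is implemented in the QL library, and the pattern is made up for illustration.

```python
import re

# One unnamed capture group, one named capture group, one non-capture group.
pattern = re.compile(r"(www)\.(?P<domain>example)\.(?:com|org)")

# Number of capture groups; the (?:...) group is not counted.
print(pattern.groups)
# Mapping from group name to group number.
print(dict(pattern.groupindex))

m = pattern.fullmatch("www.example.org")
# Only the two capture groups contribute values.
print(m.groups())
```

The named group still receives a number, and the non-capture group receives neither a number nor a name.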

tausbn · 2022-03-11T12:04:11Z

python/ql/lib/semmle/python/RegexTreeView.qll

+ * A node whose value may flow to a position where it is interpreted
+ * as a part of a regular expression.
+ */
+class RegExpPatternSource extends DataFlow::CfgNode {


I don't think this class should be implemented in this file. I would rather that it live somewhere closer to the rest of the data-flow implementation.

For JavaScript it is in javascript/ql/lib/semmle/javascript/Regexp.qll, and this file looks like the Python variant of that. I don't mind moving it to another place, but could you tell me where you'd like me to move it to?

tausbn · 2022-03-11T12:07:30Z

python/ql/src/Security/CWE-020/HostnameRegexpShared.qll

+predicate isConstantInvalidInsideOrigin(RegExpConstant term) {
+  // Look for any of these cases:
+  // - A character that can't occur in the origin
+  // - Two dashes in a row


This one surprises me. What about URLs with Punycode such as https://xn--wrdle-vua.dk?

@erik-krogh , do you know ^ ?

I don't know...
My best guess is that it's sufficiently uncommon that we don't need to consider it.

@asgerf introduced the JS implementation as part of this PR.
Maybe he knows something?

I know double dashes are valid in the hostname, but in practice it's an indicator that this regexp is not used to check a hostname string, at least not in a security-relevant context.
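To make the trade-off concrete, here is a small Python sketch. Both helpers are hypothetical and are not the query's actual logic. They show that a "two dashes in a row" heuristic also fires on hosts whose labels carry the IDNA "xn--" prefix, such as the one mentioned above.

```python
def has_double_dash(host):
    """Hypothetical version of the "two dashes in a row" heuristic."""
    return "--" in host


def has_punycode_label(host):
    """Return True if any dot-separated label starts with the "xn--" prefix."""
    return any(label.lower().startswith("xn--") for label in host.split("."))


for host in ["xn--wrdle-vua.dk", "www.example.com"]:
    print(host, has_double_dash(host), has_punycode_label(host))
```

So the heuristic treats Punycode hostnames as "not a hostname", which is the accepted imprecision discussed in this thread.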

erik-krogh

JS changes 👍

github-actions · 2022-03-11T13:26:38Z


javascript/ql/lib/semmle/javascript/Regexp.qll

github-actions · 2022-03-16T11:28:01Z


github-actions · 2022-03-16T13:33:38Z


non-capture groups should not have a group number

…RegExp.ql" This reverts commit ce50f35.

github-actions · 2022-03-18T12:06:35Z


nickrolfe

Otherwise, LGTM.

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

github-actions · 2022-03-18T13:01:10Z

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

However, the check is easy to bypass: the unescaped . matches any character, so the redirect can go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /^((www|beta)\.)?example\.com/.

References

MDN: Regular Expressions
OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe():
    # `request` is the object imported from flask, not a view argument
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe():
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.
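
One way to follow this recommendation in Ruby is to build the pattern from literal host names with `Regexp.escape`, which backslash-escapes `.` and the other meta-characters. The following is a minimal sketch, not part of the query help: the `ALLOWED_HOSTS` list, the `HOST_REGEX` constant and the `allowed_host?` helper are hypothetical names chosen for illustration.

```ruby
# Hypothetical allowlist; these names are illustrative only.
ALLOWED_HOSTS = ["example.com", "www.example.com", "beta.example.com"].freeze

# Regexp.escape escapes every meta-character in each host name,
# so each dot in the resulting pattern only matches a literal dot.
# \A and \z anchor the match to the whole string.
HOST_REGEX = /\A(?:#{ALLOWED_HOSTS.map { |h| Regexp.escape(h) }.join("|")})\z/

def allowed_host?(host)
  !host.nil? && HOST_REGEX.match?(host)
end

puts allowed_host?("www.example.com")   # true
puts allowed_host?("wwwXexample.com")   # false
```

Generating the pattern from plain strings avoids having to remember to escape each `.` by hand.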

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is, however, easy to bypass because the unescaped . matches any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /^((www|beta)\.)?example\.com/.
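
The difference between the two patterns can be checked directly. The sketch below is illustrative only: the constant names and the sample host names are assumptions, and it simply runs the unescaped and escaped patterns from the example against a legitimate host and a look-alike host.

```ruby
UNSAFE_HOST_REGEX = /^((www|beta).)?example.com/
SAFE_HOST_REGEX   = /^((www|beta)\.)?example\.com/

# Hypothetical hosts: one legitimate, one attacker-registered look-alike.
hosts = ["www.example.com", "wwwXexample.com"]

hosts.each do |host|
  unsafe = UNSAFE_HOST_REGEX.match?(host)
  safe   = SAFE_HOST_REGEX.match?(host)
  puts "#{host}: unsafe=#{unsafe} safe=#{safe}"
end
```

The unescaped pattern accepts both hosts, while the escaped pattern accepts only www.example.com.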

References

OWASP: SSRF
OWASP: XSS Unvalidated Redirects and Forwards Cheat Sheet.
Common Weakness Enumeration: CWE-20.

I reverted the Python-related changes and will make a separate PR for them.

github-actions bot added documentation Ruby labels Feb 9, 2022

aibaars force-pushed the incomplete-hostname branch from 272c173 to daf0b6a Compare February 9, 2022 18:24

github deleted a comment from github-actions bot Feb 9, 2022

aibaars marked this pull request as ready for review February 9, 2022 18:33

aibaars requested a review from a team as a code owner February 9, 2022 18:33

aibaars force-pushed the incomplete-hostname branch from daf0b6a to 156c943 Compare February 9, 2022 18:45

github deleted a comment from github-actions bot Feb 9, 2022

hmac reviewed Feb 10, 2022


nickrolfe reviewed Feb 10, 2022


hvitved requested changes Feb 11, 2022


ruby/ql/src/change-notes/2022-02-10-incomplete-hostname-regexp.md (outdated, resolved)

aibaars requested review from a team as code owners February 11, 2022 09:36

github-actions bot added JS Python labels Feb 11, 2022

tausbn previously approved these changes Feb 11, 2022


aibaars marked this pull request as draft February 17, 2022 09:09

aibaars dismissed tausbn’s stale review via 605a2e0 February 24, 2022 16:33

aibaars force-pushed the incomplete-hostname branch 2 times, most recently from 605a2e0 to 6d3d2d5 Compare February 28, 2022 18:07

aibaars force-pushed the incomplete-hostname branch from 6d3d2d5 to e735678 Compare March 1, 2022 11:41

aibaars force-pushed the incomplete-hostname branch from e735678 to 1df3f19 Compare March 1, 2022 12:17

aibaars marked this pull request as ready for review March 1, 2022 12:23

aibaars force-pushed the incomplete-hostname branch from 1df3f19 to 3bd081f Compare March 1, 2022 15:40

tausbn previously requested changes Mar 11, 2022


erik-krogh previously approved these changes Mar 11, 2022


Address comments

cf4b834

aibaars dismissed erik-krogh’s stale review via cf4b834 March 11, 2022 13:25

erik-krogh reviewed Mar 14, 2022


javascript/ql/lib/semmle/javascript/Regexp.qll (resolved)

Address comment

852f05b

aibaars added 2 commits March 16, 2022 12:31

Merge remote-tracking branch 'upstream/main' into incomplete-hostname

ab93b37

Update expected output

6b323ee

aibaars force-pushed the incomplete-hostname branch from 7a03951 to 752f5c6 Compare March 16, 2022 17:28

Ruby: regex: fix getGroupNumber

1a51f0c

non-capture groups should not have a group number
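
As a rough illustration of the rule stated in this commit message, using Ruby's own regex engine rather than the CodeQL library (whose implementation is not shown here), a non-capturing group `(?:...)` does not consume a group number:

```ruby
# (?:www|beta) is non-capturing, so (example) is group 1, not group 2.
m = /(?:www|beta)\.(example)\.(com)/.match("www.example.com")

puts m.captures.inspect  # ["example", "com"]
puts m[1]                # example
puts m[2]                # com
```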

aibaars force-pushed the incomplete-hostname branch from 752f5c6 to 1a51f0c Compare March 16, 2022 17:50

aibaars mentioned this pull request Mar 18, 2022

Ruby: Use taint tracking instead of type tracking to define regExpSource #8332

Merged

aibaars added 2 commits March 18, 2022 13:02

Revert "Python: switch to shared implementation of IncompleteHostnameRegExp.ql"

6d24591

This reverts commit ce50f35.

Merge remote-tracking branch 'upstream/main' into incomplete-hostname

431b605

nickrolfe reviewed Mar 18, 2022


ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp (outdated, resolved)

Ruby/JS add missing ^ in qhelp

4a27928

nickrolfe approved these changes Mar 18, 2022


aibaars requested a review from hvitved March 18, 2022 13:19

erik-krogh approved these changes Mar 18, 2022


hvitved approved these changes Mar 18, 2022


alexrford approved these changes Mar 18, 2022


aibaars merged commit 117fb5b into github:main Mar 18, 2022

/**
 * Provides regular expression patterns.
 */
module RegExpPatterns {
  /**
   * Gets a pattern that matches common top-level domain names in lower case.
   */
  string commonTLD() {
    // according to ranking by http://google.com/search?q=site:.<<TLD>>
    result = "(?:com|org|edu|gov|uk|net|io)(?![a-z0-9])"
  }
}
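
To see what the negative lookahead `(?![a-z0-9])` in the `commonTLD()` pattern does, the pattern text can be tried in an ordinary regex engine. The snippet below is a sketch in Ruby, not CodeQL; `COMMON_TLD`, `DOT_TLD` and the sample hosts are names chosen for illustration, and the pattern is written with plain `|` alternation.

```ruby
# Pattern text from commonTLD(), compiled with Ruby's regex engine.
COMMON_TLD = "(?:com|org|edu|gov|uk|net|io)(?![a-z0-9])"
DOT_TLD = /\.#{COMMON_TLD}/

# "example.com" contains ".com" followed by the end of the string.
puts DOT_TLD.match?("example.com")        # true
# In "example.community", "com" is followed by "m", so the lookahead rejects it.
puts DOT_TLD.match?("example.community")  # false
# A following "/" is neither a lower-case letter nor a digit, so it is allowed.
puts DOT_TLD.match?("example.org/path")   # true
```

The lookahead keeps the pattern from treating a longer label that merely starts with a common TLD as that TLD.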


github-actions bot commented Feb 11, 2022


tausbn left a comment


github-actions bot commented Feb 28, 2022


github-actions bot commented Mar 1, 2022


github-actions bot commented Mar 1, 2022


github-actions bot commented Mar 1, 2022


github-actions bot commented Mar 7, 2022
