Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby: IncompleteHostnameRegExp.ql #7917

Merged
merged 15 commits into from
Mar 18, 2022
Merged

Conversation

aibaars
Copy link
Contributor

@aibaars aibaars commented Feb 9, 2022

This pull request is a port of JavaScripts IncompleteHostnameRegExp.ql query. Most of the query and documentation are a shameless copy of the original.

The pull request is split into:

  • a commit that copies the Javascript implementation: 1ad6e96
  • a commit that adapts the implementation for Ruby: a221e28

@github github deleted a comment from github-actions bot Feb 9, 2022
@aibaars aibaars marked this pull request as ready for review February 9, 2022 18:33
@aibaars aibaars requested a review from a team as a code owner February 9, 2022 18:33
@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2022

QHelp previews:

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

UNSAFE_REGEX = /(www|beta).example.com\//
SAFE_REGEX = /(www|beta)\.example\.com\//

def unsafe
    target = params[:target]
    if UNSAFE_REGEX.match(target)
        redirect_to target
    end
end

def safe
    target = params[:target]
    if SAFE_REGEX.match(target)
        redirect_to target
    end
end

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

@github github deleted a comment from github-actions bot Feb 9, 2022
"(?<!\\\\)[.]" +
// immediately followed by a sequence of subdomains, perhaps with some regex characters mixed in, followed by a known TLD
"([():|?a-z0-9-]+(\\\\)?[.](" + commonTopLevelDomainRegex() + "))" + ".*", 1)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, is it worth adding a shared HostnameRegExpUtils.qll file to share this with the python implementation? I notice the JS version has a different implementation. I don't know whether we (the dynamic teams) want to consider consolidating them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of history: The one you copied originated in JS, written before we parsed string literals as RegExps. When we started extracting strings as RegExps, it was rewritten to its current form.

I'd say the new version is better, if you have ASTs for regexps.

@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

UNSAFE_REGEX = /(www|beta).example.com\//
SAFE_REGEX = /(www|beta)\.example\.com\//

def unsafe
    target = params[:target]
    if UNSAFE_REGEX.match(target)
        redirect_to target
    end
end

def safe
    target = params[:target]
    if SAFE_REGEX.match(target)
        redirect_to target
    end
end

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

tausbn
tausbn previously approved these changes Feb 11, 2022
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the Python docs changes.

@aibaars aibaars marked this pull request as draft February 17, 2022 09:09
@aibaars aibaars force-pushed the incomplete-hostname branch 2 times, most recently from 605a2e0 to 6d3d2d5 Compare February 28, 2022 18:07
@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

UNSAFE_REGEX = /(www|beta).example.com\//
SAFE_REGEX = /(www|beta)\.example\.com\//

def unsafe
    target = params[:target]
    if UNSAFE_REGEX.match(target)
        redirect_to target
    end
end

def safe
    target = params[:target]
    if SAFE_REGEX.match(target)
        redirect_to target
    end
end

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2022

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host;
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2022

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host;
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

@aibaars aibaars marked this pull request as ready for review March 1, 2022 12:23
@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2022

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host;
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2022

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

tausbn
tausbn previously requested changes Mar 11, 2022
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and suggestions. I did not review the query in detail. I have no doubt that the original JavaScript query is excellent. 🙂
My only lingering concern here is the fact that the string "\." in JavaScript gets interpreted as just a string containing a period (the backslash disappears) whereas in Python, it is interpreted as the string "\\." (i.e. the backslash is interpreted as just a literal backslash, since \. is not a valid escape). However, I'm guessing that this detail is abstracted away by relying on the regex library anyway, and so it should be fine.

@@ -48,6 +49,19 @@ newtype TRegExpParent =
/** A back reference */
TRegExpBackRef(Regex re, int start, int end) { re.backreference(start, end) }

/**
* Provides regular expression patterns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very general description. So much so that I'm not actually sure what it means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I mindlessly copied it from

/**
* Provides regular expression patterns.
*/
module RegExpPatterns {
/**
* Gets a pattern that matches common top-level domain names in lower case.
*/
string commonTLD() {
// according to ranking by http://google.com/search?q=site:.<<TLD>>
result = "(?:com|org|edu|gov|uk|net|io)(?![a-z0-9])"
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Provides utility predicates related to regular expressions. be any better?

/**
* Gets a pattern that matches common top-level domain names in lower case.
*/
string commonTLD() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is not really in line with our style guide. At the very least, it should be Tld not TLD. Also, since this function returns a result, it should perhaps start with get.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ 👍

it should perhaps start with get.

And since it has multiple results it should probably start with getA. So getACommonTld().

@@ -751,6 +767,9 @@ class RegExpGroup extends RegExpTerm, TRegExpGroup {
*/
int getNumber() { result = re.getGroupNumber(start, end) }

/** Holds if this is a capture group. */
predicate isCapture() { not exists(this.getNumber()) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be completely misunderstanding this predicate, but is that not correct? Is this not expressing "not an unnamed capture group" instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all capture groups have a number, but only some have a name. Is that assumption incorrect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, the not makes no sense at all. You're right.

* A node whose value may flow to a position where it is interpreted
* as a part of a regular expression.
*/
class RegExpPatternSource extends DataFlow::CfgNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this class should be implemented in this file. I would rather that it live somewhere closer to the rest of the data-flow implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Javascript it is in javascript/ql/lib/semmle/javascript/Regexp.qll and this file looks like the Python variant of that. I don't mind moving it to another place, but could you tell me where you'd like me to move it to?

predicate isConstantInvalidInsideOrigin(RegExpConstant term) {
// Look for any of these cases:
// - A character that can't occur in the origin
// - Two dashes in a row
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one surprises me. What about URLs with Punycode such as https://xn--wrdle-vua.dk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erik-krogh , do you know ^ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know...
My best guess is that's it's sufficiently uncommon that we don't need to consider it.

@asgerf introduced the JS implementation as part of this PR.
Maybe he knows something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know double dashes are valid in the hostname, but in practice it's an indicator that this regexp is not used to check a hostname string, at least not in a security-relevant context.

erik-krogh
erik-krogh previously approved these changes Mar 11, 2022
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS changes 👍

@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

non-capture groups should not have a group number
@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /((www|beta)\.)?example\.com/.

References

Copy link
Contributor

@nickrolfe nickrolfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

@github-actions
Copy link
Contributor

QHelp previews:

javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

app.get('/some/path', function(req, res) {
    let url = req.param('url'),
        host = urlLib.parse(url).host;
    // BAD: the host of `url` may be controlled by an attacker
    let regex = /^((www|beta).)?example.com/;
    if (host.match(regex)) {
        res.redirect(url);
    }
});

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: let regex = /^((www|beta)\.)?example\.com/.

References

python/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

from flask import Flask, request, redirect
import re

app = Flask(__name__)

UNSAFE_REGEX = re.compile("(www|beta).example.com/")
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route('/some/path/bad')
def unsafe(request):
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)

@app.route('/some/path/good')
def safe(request):
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)

The unsafe check is easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.

References

ruby/ql/src/queries/security/cwe-020/IncompleteHostnameRegExp.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.

class AppController < ApplicationController

    def index
        url = params[:url]
        host = URI(url).host
        # BAD: the host of `url` may be controlled by an attacker
        regex = /^((www|beta).)?example.com/
        if host.match(regex)
            redirect_to url
        end
    end

end

The check is however easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . appropriately: regex = /^((www|beta)\.)?example\.com/.

References

@aibaars aibaars dismissed tausbn’s stale review March 18, 2022 13:15

I reverted the Python related changes and make a separate PR for them.

@aibaars aibaars requested a review from hvitved March 18, 2022 13:19
@aibaars aibaars merged commit 117fb5b into github:main Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants