-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Queries to detect ReDoS and Regex injections in Java #2743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6c3f1a1
176b786
471795f
8e2f899
502d128
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
public class ReDoS { | ||
private static String EXPONENTIAL_REGEX = "(a+)+"; | ||
|
||
public void regexMatchBad(HttpServletRequest request) { | ||
// BAD: User provided input is matched against exponential regex using standard | ||
// Java regular expression engine | ||
java.util.regex.Pattern p = java.util.regex.Pattern.compile(EXPONENTIAL_REGEX); | ||
java.util.regex.Matcher m = p.matcher(request.getParameter("input")); | ||
boolean matches = m.matches(); | ||
} | ||
|
||
public void regexMatchGood(HttpServletRequest request) { | ||
// GOOD: RE2/J regular expression engine is being used | ||
com.google.re2j.Pattern p = com.google.re2j.Pattern.compile(EXPONENTIAL_REGEX); | ||
com.google.re2j.Matcher m = p.matcher(request.getParameter("input")); | ||
boolean matches = m.matches(); | ||
} | ||
} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
<qhelp> | ||
|
||
<overview> | ||
<p>Matching user input against a regular expression which takes exponential time in the worst case | ||
can allow a malicious user to perform a Denial of Service ("DoS") attack by crafting input that | ||
takes a long time to execute.</p> | ||
|
||
<p>Most regular expression engines, including the Java standard library implementation, are | ||
designed to work with an extended regular expression syntax. Although this provides flexibility for | ||
the user, it can prevent the engine from constructing an efficient implementation of the matcher in | ||
all circumstances. In particular, the "worst case time complexity" (see the references) of certain | ||
regular expressions may be "exponential". This would allow a malicious user to provide some input | ||
which causes the regular expression to take a very long time to execute.</p> | ||
|
||
<p>Typically, a regular expression is vulnerable to this attack if it applies repetition to a | ||
sub-expression which itself is repeated, or contains overlapping options. For example, | ||
<code>(a+)+</code> is vulnerable to a string such as <code>aaaaaaaaaaaaaaaaaaaaaaaaaaab</code>. | ||
More information about the precise circumstances can be found in the references.</p> | ||
</overview> | ||
|
||
<recommendation> | ||
<p>Modify the regular expression to avoid the exponential worst case time. If this is not possible, | ||
then kill the regular expression matching if it takes too long. Alternatively, use a different | ||
regular expression engine, for example RE2/J which runs in time linear in the size of the input. | ||
</p> | ||
</recommendation> | ||
|
||
<example> | ||
<p>The following example shows a HTTP request parameter that is matched against a regular | ||
expression which has exponential worst case performance. In the first case, standard Java library | ||
is being used, which can lead to a Denial of Service. In the second case, RE2/J regular expression | ||
engine is being used. It runs in time linear in the size of the input and won't lead to a DoS.</p> | ||
|
||
<sample src="ReDoS.java" /> | ||
</example> | ||
|
||
<references> | ||
<li> | ||
OWASP: | ||
<a href="https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>. | ||
</li> | ||
<li> | ||
Regular-Expressions.info: <a href="https://www.regular-expressions.info/redos.html">Preventing Regular Expression Denial of Service (ReDoS)</a>. | ||
</li> | ||
<li> | ||
<a href="https://github.com/google/re2j">RE2/J: linear time regular expression matching in Java</a>. | ||
</li> | ||
</references> | ||
</qhelp> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
/** | ||
* @name Denial of Service from comparison of user input against expensive regex | ||
* @description User input should not be matched against a regular expression that could require | ||
* exponential time on certain input. | ||
* @kind path-problem | ||
* @problem.severity error | ||
* @precision high | ||
* @id java/redos | ||
* @tags security | ||
* external/cwe/cwe-400 | ||
*/ | ||
|
||
import java | ||
import semmle.code.java.dataflow.DataFlow | ||
import Regex | ||
|
||
/** A data flow source for exponential regex. */ | ||
class ExponentialRegexLiteral extends RegexPatternSource { | ||
ExponentialRegexLiteral() { isExponentialRegex(this.asExpr()) } | ||
} | ||
|
||
from DataFlow::PathNode source, DataFlow::PathNode sink, RegexInputFlowConfig conf | ||
where conf.hasFlowPath(source, sink) | ||
select sink.getNode(), source, sink, | ||
"$@ flows to regular expression operation with dangerous regex.", source.getNode(), | ||
"User-provided value" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
import java | ||
import semmle.code.java.dataflow.FlowSources | ||
import semmle.code.java.dataflow.TaintTracking | ||
import semmle.code.java.dataflow.TaintTracking2 | ||
|
||
/** | ||
* A taint-tracking configuration for unvalidated user input that is used to match against | ||
* malicious regex. | ||
*/ | ||
class RegexInputFlowConfig extends TaintTracking::Configuration { | ||
RegexInputFlowConfig() { this = "RegexInputFlowConfig" } | ||
|
||
override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource } | ||
|
||
override predicate isSink(DataFlow::Node sink) { sink instanceof RegexInputSink } | ||
|
||
override predicate isSanitizer(DataFlow::Node node) { | ||
node.getType() instanceof PrimitiveType or node.getType() instanceof BoxedType | ||
} | ||
} | ||
|
||
/** | ||
* Holds if method call `ma` matches against the regex `regexExpr`. For example, method `matches` | ||
* matches against regex `regex` in `Pattern.matches(regex, input)`. | ||
* The following cases are handled: | ||
* * `Pattern.matches(regex, input)` | ||
* * `input.matches(pattern)` (where `input` is `String`) | ||
* * pattern is created first (`Pattern.compile(regex)`) and then matcher (`p.matcher(input)`) | ||
* | ||
* | ||
* Detecting the last case has some limitations: | ||
* * it doesn't work if pattern is created in one method and matcher in another one, e.g.: | ||
* `void match(Pattern p) { p.matcher(input); } void foo() { match(Pattern.compile(regex)); }`, | ||
* but it works correctly if pattern is a class field and matcher is created in method, e.g.: | ||
* `private Pattern p = Pattern.compile(regex); void foo() { p.matcher(input); }` | ||
* * the issue is reported if `Matcher` object is created with user supplied input on malicious | ||
* pattern. To actually trigger the DoS condition, actual matching (e.g. `matcher.matches()`) is | ||
* required, but usually if matcher is created it will eventually be used. | ||
*/ | ||
predicate matchesAgainstRegex(MethodAccess ma, Expr regexExpr) { | ||
// `Pattern.matches(regex, input)` | ||
ma.getMethod() instanceof MethodPatternMatches and | ||
regexExpr.getParent() = ma | ||
or | ||
// `Pattern.compile(regex).matcher(input)` or | ||
// `Pattern p = Pattern.compile(regex); p.matcher(input);` or | ||
// `private Pattern p = Pattern.compile(regex); public void foo() { p.matcher(input); }` | ||
ma.getMethod() instanceof MethodPatternMatcher and | ||
( | ||
DataFlow::localFlow(DataFlow::exprNode(regexExpr.getParent()), | ||
DataFlow::exprNode(ma.getQualifier())) or | ||
ma.getQualifier().(FieldAccess).getField().getInitializer() = regexExpr.getParent() | ||
) | ||
or | ||
// `input.matches(pattern)` | ||
ma.getMethod() instanceof MethodStringMatches and | ||
regexExpr = ma.getAnArgument() | ||
} | ||
|
||
/** A data flow sink for unvalidated user input that is used to match against malicious regex. */ | ||
class RegexInputSink extends DataFlow::ExprNode { | ||
RegexInputSink() { | ||
exists(MethodAccess ma, RegexPatternFlowConfig cfg, DataFlow::ExprNode regexSink | | ||
ma.getAnArgument() = this.getExpr() or ma.getQualifier() = this.getExpr() | ||
| | ||
cfg.hasFlow(_, regexSink) and | ||
matchesAgainstRegex(ma, regexSink.getExpr()) and | ||
this != regexSink | ||
) | ||
} | ||
} | ||
|
||
/** A data flow source for malicious regex. */ | ||
abstract class RegexPatternSource extends DataFlow::Node { } | ||
|
||
/** | ||
* A taint-tracking configuration for regex pattern that is used to create `Pattern` or is used | ||
* as an argument to `matches` method. | ||
*/ | ||
class RegexPatternFlowConfig extends TaintTracking2::Configuration { | ||
RegexPatternFlowConfig() { this = "RegexPatternFlowConfig" } | ||
|
||
override predicate isSource(DataFlow::Node source) { source instanceof RegexPatternSource } | ||
|
||
override predicate isSink(DataFlow::Node sink) { | ||
exists(MethodAccess ma, int index | ma.getArgument(index) = sink.asExpr() | | ||
ma.getMethod() instanceof MethodPatternCompile | ||
or | ||
ma.getMethod() instanceof MethodPatternMatches and index = 0 | ||
or | ||
ma.getMethod() instanceof MethodStringMatches | ||
) | ||
} | ||
|
||
override predicate isSanitizer(DataFlow::Node node) { | ||
node.getType() instanceof PrimitiveType or node.getType() instanceof BoxedType | ||
} | ||
} | ||
|
||
/** | ||
* An expression that represents a regular expression with potential exponential behavior. | ||
* Couple of variants of a common pattern that leads to exponential blow-up are detected. | ||
*/ | ||
predicate isExponentialRegex(StringLiteral s) { | ||
// Example: ([a-z]+)+ | ||
s.getValue().regexpMatch(".*\\([^()*+\\]]+\\]?(\\*|\\+)\\)(\\*|\\+).*") | ||
or | ||
// Example: (([a-z])?([a-z]+))+ | ||
s.getValue().regexpMatch(".*\\((\\([^()]+\\)\\?)?\\([^()*+\\]]+\\]?(\\*|\\+)\\)\\)(\\*|\\+).*") | ||
or | ||
// Example: (([a-z])+)+ | ||
s.getValue().regexpMatch(".*\\(\\([^()*+\\]]+\\]?\\)(\\*|\\+)\\)(\\*|\\+).*") | ||
or | ||
// Example: (a|aa)+ | ||
s.getValue().regexpMatch(".*\\(([^()*+\\]]+\\]?)\\|\\1+\\??\\)(\\*|\\+).*") | ||
or | ||
// Example: (.*[a-z]){n} n >= 10 | ||
s.getValue().regexpMatch(".*\\(\\.\\*[^()*+\\]]+\\]?\\)\\{[1-9][0-9]+,?[0-9]*\\}.*") | ||
} |
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,15 @@ | ||||||||
public boolean isUsernameInPassword(HttpServletRequest request) { | ||||||||
String username = request.getParameter("username"); | ||||||||
String pasword = request.getParameter("password"); | ||||||||
|
||||||||
// BAD: User provided input is being used to create regex without escaping | ||||||||
Pattern p = Pattern.compile(username); | ||||||||
Matcher m = p.matcher(password); | ||||||||
return m.matches(); | ||||||||
|
||||||||
// GOOD: User provided input escaped before being used to create regex | ||||||||
String safeUsername = Pattern.quote(username); | ||||||||
Pattern p = Pattern.compile(safeUsername); | ||||||||
Matcher m = p.matcher(password); | ||||||||
return m.matches(); | ||||||||
} | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
EOL at EOF |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
<qhelp> | ||
|
||
<overview> | ||
<p>Constructing a regular expression with unsanitized user input is dangerous as a malicious user | ||
may be able to modify the meaning of the expression. In particular, such a user may be able to | ||
provide a regular expression fragment that takes exponential time in the worst case, and use that | ||
to perform a Denial of Service attack.</p> | ||
</overview> | ||
|
||
<recommendation> | ||
<p>For user input that is intended to be referenced as a string literal in a regular expression, | ||
use the <code>Pattern.quote</code> method to escape any special characters. If the regular | ||
expression is intended to be configurable by the user, then a timeout should be used to avoid | ||
Denial of Service attacks. Alternatively, a different regular expression engine, for example RE2/J | ||
which runs in time linear in the size of the input can be used.</p> | ||
</recommendation> | ||
|
||
<example> | ||
<p>The following example uses regular expression engine to check if the provided username is part | ||
of the provided password.</p> | ||
|
||
<p>In the first case, the username which is being used to construct the regex is not escaped. If | ||
a malicious user provides a regex that has exponential worst case performance, then this could | ||
lead to a Denial of Service.</p> | ||
|
||
<p>In the second case, the user input is escaped using <code>Pattern.quote</code> before being | ||
included in the regular expression. This ensures that the user cannot insert characters which have | ||
a special meaning in regular expressions.</p> | ||
|
||
<sample src="RegexInjection.cs" /> | ||
</example> | ||
|
||
<references> | ||
<li> | ||
OWASP: | ||
<a href="https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>. | ||
</li> | ||
<li> | ||
Regular-Expressions.info: <a href="https://www.regular-expressions.info/redos.html">Preventing Regular Expression Denial of Service (ReDoS)</a>. | ||
</li> | ||
<li> | ||
<a href="https://github.com/google/re2j">RE2/J: linear time regular expression matching in Java</a>. | ||
</li> | ||
</references> | ||
</qhelp> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
/** | ||
* @name Denial of Service from comparison of user input against regex built from user-controlled | ||
* source | ||
* @description User input should not be matched against a regular expression built from | ||
* user-controlled source, because that could require exponential time on certain | ||
* input. | ||
* @kind path-problem | ||
* @problem.severity error | ||
* @precision high | ||
* @id java/regex-injection | ||
* @tags security | ||
* external/cwe/cwe-400 | ||
*/ | ||
|
||
import java | ||
import semmle.code.java.dataflow.DataFlow | ||
import Regex | ||
|
||
/** A data flow source for user-controlled regex. */ | ||
class RemoteRegexPatternSource extends RegexPatternSource { | ||
RemoteRegexPatternSource() { this instanceof RemoteFlowSource } | ||
} | ||
|
||
from DataFlow::PathNode source, DataFlow::PathNode sink, RegexInputFlowConfig conf | ||
where conf.hasFlowPath(source, sink) | ||
select sink.getNode(), source, sink, | ||
"$@ flows to regular expression operation with user-provided regex.", source.getNode(), | ||
"User-provided value" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -170,6 +170,11 @@ class TypeFile extends Class { | |
TypeFile() { this.hasQualifiedName("java.io", "File") } | ||
} | ||
|
||
/** The class `java.util.regex.Pattern`. */ | ||
class TypePattern extends Class { | ||
TypePattern() { this.hasQualifiedName("java.util.regex", "Pattern") } | ||
} | ||
|
||
// --- Standard methods --- | ||
/** | ||
* Any of the methods named `command` on class `java.lang.ProcessBuilder`. | ||
|
@@ -291,6 +296,46 @@ class MethodMathMax extends Method { | |
} | ||
} | ||
|
||
/** | ||
* Any method named `compile` on class `java.util.regex.Pattern`. | ||
*/ | ||
class MethodPatternCompile extends Method { | ||
MethodPatternCompile() { | ||
this.hasName("compile") and | ||
this.getDeclaringType() instanceof TypePattern | ||
} | ||
} | ||
|
||
/** | ||
* Any method named `matches` on class `java.util.regex.Pattern`. | ||
*/ | ||
class MethodPatternMatches extends Method { | ||
MethodPatternMatches() { | ||
this.hasName("matches") and | ||
this.getDeclaringType() instanceof TypePattern | ||
} | ||
} | ||
|
||
/** | ||
* Any method named `matcher` on class `java.util.regex.Pattern`. | ||
*/ | ||
class MethodPatternMatcher extends Method { | ||
MethodPatternMatcher() { | ||
this.hasName("matcher") and | ||
this.getDeclaringType() instanceof TypePattern | ||
} | ||
} | ||
|
||
/** | ||
* Any method named `matches` on class `java.lang.String`. | ||
*/ | ||
class MethodStringMatches extends Method { | ||
MethodStringMatches() { | ||
this.hasName("matches") and | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add |
||
this.getDeclaringType() instanceof TypeString | ||
} | ||
} | ||
|
||
// --- Standard fields --- | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be nice to cover common utility libraries such as Apache Commons. e.g.
Also, any Spring related classes if any |
||
/** The field `System.in`. */ | ||
class SystemIn extends Field { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EOL at EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this.
This sample is included in the qhelp. Will this newline at the end look good in the rendered help? Other samples don't have newline at EOF.
@felicitymay, what is your view on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to be considered a valid file on some operating systems, you need a EOL at EOF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine with me to change this. But I would like to wait for @felicitymay opinion, because other examples don't have EOL at EOF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My apologies for the delay in replying. I hadn't responded because I was hoping that someone more knowledgeable would answer you. (I'm a content writer, rather than a developer, and work in Windows where EOL and EOF don't really come up.)
While @JLLeitschuh is right to point out that it's normal behavior in linux to end files with an extra newline, I suspect that @ggolawski is right to suggest that for these examples the extra blank line will look strange when they're rendered. @Semmle/java does that sound right?