-
Notifications
You must be signed in to change notification settings - Fork 1.8k
JS: add query js/exploitable-polynomial-redos #2884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
erik-krogh
merged 1 commit into
github:master
from
esbena:js/practically-exploitable-redos
Feb 27, 2020
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
|
||
<qhelp> | ||
|
||
<include src="ReDoSIntroduction.qhelp" /> | ||
|
||
<example> | ||
<p> | ||
|
||
Consider this use of a regular expression, which removes | ||
all leading and trailing whitespace in a string: | ||
|
||
</p> | ||
|
||
<sample language="javascript"> | ||
text.replace(/^\s+|\s+$/g, ''); // BAD | ||
</sample> | ||
|
||
<p> | ||
|
||
The sub-expression <code>"\s+$"</code> will match the | ||
whitespace characters in <code>text</code> from left to right, but it | ||
can start matching anywhere within a whitespace sequence. This is | ||
problematic for strings that do <strong>not</strong> end with a whitespace | ||
character. Such a string will force the regular expression engine to | ||
process each whitespace sequence once per whitespace character in the | ||
sequence. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
This ultimately means that the time cost of trimming a | ||
string is quadratic in the length of the string. So a string like | ||
<code>"a b"</code> will take milliseconds to process, but a similar | ||
string with a million spaces instead of just one will take several | ||
minutes. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
Avoid this problem by rewriting the regular expression to | ||
not contain the ambiguity about when to start matching whitespace | ||
sequences. For instance, by using a negative look-behind | ||
(<code>/^\s+|(?<!\s)\s+$/g</code>), or just by using the built-in trim | ||
method (<code>text.trim()</code>). | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
Note that the sub-expression <code>"^\s+"</code> is | ||
<strong>not</strong> problematic as the <code>^</code> anchor restricts | ||
when that sub-expression can start matching, and as the regular | ||
expression engine matches from left to right. | ||
|
||
</p> | ||
|
||
</example> | ||
|
||
<example> | ||
|
||
<p> | ||
|
||
As a similar, but slightly subtler problem, consider the | ||
regular expression that matches lines with numbers, possibly written | ||
using scientific notation: | ||
</p> | ||
|
||
<sample language="javascript"> | ||
^0\.\d+E?\d+$ // BAD | ||
</sample> | ||
|
||
<p> | ||
|
||
The problem with this regular expression is in the | ||
sub-expression <code>\d+E?\d+</code> because the second | ||
<code>\d+</code> can start matching digits anywhere after the first | ||
match of the first <code>\d+</code> if there is no <code>E</code> in | ||
the input string. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
This is problematic for strings that do <strong>not</strong> | ||
end with a digit. Such a string will force the regular expression | ||
engine to process each digit sequence once per digit in the sequence, | ||
again leading to a quadratic time complexity. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
To make the processing faster, the regular expression | ||
should be rewritten such that the two <code>\d+</code> sub-expressions | ||
do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>. | ||
|
||
</p> | ||
|
||
</example> | ||
|
||
<include src="ReDoSReferences.qhelp"/> | ||
|
||
</qhelp> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
/** | ||
* @name Polynomial regular expression used on uncontrolled data | ||
* @description A regular expression that can require polynomial time | ||
* to match user-provided values may be | ||
* vulnerable to denial-of-service attacks. | ||
* @kind path-problem | ||
* @problem.severity warning | ||
* @precision high | ||
* @id js/polynomial-redos | ||
* @tags security | ||
* external/cwe/cwe-730 | ||
* external/cwe/cwe-400 | ||
*/ | ||
|
||
import javascript | ||
import semmle.javascript.security.performance.PolynomialReDoS::PolynomialReDoS | ||
import DataFlow::PathGraph | ||
|
||
|
||
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink | ||
where cfg.hasFlowPath(source, sink) | ||
select sink.getNode(), source, sink, "This expensive $@ use depends on $@.", | ||
sink.getNode().(Sink).getRegExp(), "regular expression", source.getNode(), "a user-provided value" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,70 +1,34 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
|
||
<qhelp> | ||
|
||
<overview> | ||
<p> | ||
Some regular expressions take a very long time to match certain input strings to the point where | ||
the time it takes to match a string of length <i>n</i> is proportional to <i>2<sup>n</sup></i>. | ||
Such regular expressions can negatively affect performance, or even allow a malicious user to | ||
perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular | ||
expression to match. | ||
</p> | ||
<p> | ||
The regular expression engines provided by many popular JavaScript platforms use backtracking | ||
non-deterministic finite automata to implement regular expression matching. While this approach | ||
is space-efficient and allows supporting advanced features like capture groups, it is not | ||
time-efficient in general. The worst-case time complexity of such an automaton can be exponential, | ||
meaning that for strings of a certain shape, increasing the input length by ten characters may | ||
make the automaton about 1000 times slower. | ||
</p> | ||
<p> | ||
Typically, a regular expression is affected by this problem if it contains a repetition of the | ||
form <code>r*</code> or <code>r+</code> where the sub-expression <code>r</code> is ambiguous in | ||
the sense that it can match some string in multiple ways. More information about the precise | ||
circumstances can be found in the references. | ||
</p> | ||
</overview> | ||
<include src="ReDoSIntroduction.qhelp" /> | ||
|
||
<recommendation> | ||
<p> | ||
Modify the regular expression to remove the ambiguity. | ||
</p> | ||
</recommendation> | ||
<example> | ||
<p> | ||
Consider this regular expression: | ||
</p> | ||
<sample language="javascript"> | ||
/^_(__|.)+_$/ | ||
</sample> | ||
<p> | ||
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the | ||
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two | ||
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting | ||
of an odd number of underscores followed by some other character will cause the regular | ||
expression engine to run for an exponential amount of time before rejecting the input. | ||
</p> | ||
<p> | ||
This problem can be avoided by rewriting the regular expression to remove the ambiguity between | ||
the two branches of the alternative inside the repetition: | ||
</p> | ||
<sample language="javascript"> | ||
/^_(__|[^_])+_$/ | ||
</sample> | ||
</example> | ||
|
||
<example> | ||
<p> | ||
Consider this regular expression: | ||
</p> | ||
<sample language="javascript"> | ||
/^_(__|.)+_$/ | ||
</sample> | ||
<p> | ||
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the | ||
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two | ||
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting | ||
of an odd number of underscores followed by some other character will cause the regular | ||
expression engine to run for an exponential amount of time before rejecting the input. | ||
</p> | ||
<p> | ||
This problem can be avoided by rewriting the regular expression to remove the ambiguity between | ||
the two branches of the alternative inside the repetition: | ||
</p> | ||
<sample language="javascript"> | ||
/^_(__|[^_])+_$/ | ||
</sample> | ||
</example> | ||
<include src="ReDoSReferences.qhelp"/> | ||
|
||
<references> | ||
<li> | ||
OWASP: | ||
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>. | ||
</li> | ||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li> | ||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li> | ||
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke: | ||
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>. | ||
</li> | ||
</references> | ||
</qhelp> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
<qhelp> | ||
<overview> | ||
<p> | ||
|
||
Some regular expressions take a long time to match certain | ||
input strings to the point where the time it takes to match a string | ||
of length <i>n</i> is proportional to <i>n<sup>k</sup></i> or even | ||
<i>2<sup>n</sup></i>. Such regular expressions can negatively affect | ||
performance, or even allow a malicious user to perform a Denial of | ||
Service ("DoS") attack by crafting an expensive input string for the | ||
regular expression to match. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
The regular expression engines provided by many popular | ||
JavaScript platforms use backtracking non-deterministic finite | ||
automata to implement regular expression matching. While this approach | ||
is space-efficient and allows supporting advanced features like | ||
capture groups, it is not time-efficient in general. The worst-case | ||
time complexity of such an automaton can be polynomial or even | ||
exponential, meaning that for strings of a certain shape, increasing | ||
the input length by ten characters may make the automaton about 1000 | ||
times slower. | ||
|
||
</p> | ||
|
||
<p> | ||
|
||
Typically, a regular expression is affected by this | ||
problem if it contains a repetition of the form <code>r*</code> or | ||
<code>r+</code> where the sub-expression <code>r</code> is ambiguous | ||
in the sense that it can match some string in multiple ways. More | ||
information about the precise circumstances can be found in the | ||
references. | ||
|
||
</p> | ||
</overview> | ||
|
||
<recommendation> | ||
|
||
<p> | ||
|
||
Modify the regular expression to remove the ambiguity, or | ||
ensure that the strings matched with the regular expression are short | ||
enough that the time-complexity does not matter. | ||
|
||
</p> | ||
|
||
</recommendation> | ||
</qhelp> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
<qhelp> | ||
<references> | ||
<li> | ||
OWASP: | ||
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>. | ||
</li> | ||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li> | ||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li> | ||
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke: | ||
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>. | ||
</li> | ||
</references> | ||
</qhelp> |
31 changes: 31 additions & 0 deletions
31
javascript/ql/src/semmle/javascript/security/performance/PolynomialReDoS.qll
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
/** | ||
* Provides a taint tracking configuration for reasoning about | ||
* polynomial regular expression denial-of-service attacks. | ||
* | ||
* Note, for performance reasons: only import this file if | ||
* `PolynomialReDoS::Configuration` is needed, otherwise | ||
* `PolynomialReDoSCustomizations` should be imported instead. | ||
*/ | ||
import javascript | ||
|
||
module PolynomialReDoS { | ||
import PolynomialReDoSCustomizations::PolynomialReDoS | ||
|
||
class Configuration extends TaintTracking::Configuration { | ||
Configuration() { this = "PolynomialReDoS" } | ||
|
||
override predicate isSource(DataFlow::Node source) { source instanceof Source } | ||
|
||
override predicate isSink(DataFlow::Node sink) { sink instanceof Sink } | ||
|
||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode node) { | ||
super.isSanitizerGuard(node) or | ||
node instanceof LengthGuard | ||
} | ||
|
||
override predicate isSanitizer(DataFlow::Node node) { | ||
super.isSanitizer(node) or | ||
node instanceof Sanitizer | ||
} | ||
} | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.