-
Notifications
You must be signed in to change notification settings - Fork 1.8k
JavaScript: Improve handling of regular expressions in taint tracking. #910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…pts. As its first application, this library makes it possible for `StoredXss` to reuse the `Source` classes of `DomBasedXss` and `ReflectedXss` without having to pull in their libraries (which contain their `Configuration` classes, causing `StoredXss` to recompute all flow information for the other two queries).
…izers for XSS queries.
Ping @Semmle/js. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM.
I think MetacharEscapeSanitizer
restricts too much flow though (see comment), could we make it more permissive initially and make it more restrictive if we find the need?
@@ -0,0 +1,225 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit confusing that Xss.ql
and Xss.qll
does not correspond directly to each other, but I guess renaming Xss.ql
to DomBasedXss.ql
is out of the question for compatibility reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be able to rename it soon, but I'm not sure whether it's safe to do just yet.
abstract class Sanitizer extends DataFlow::Node { } | ||
} | ||
|
||
/** Provides classes and predicates for the DOM-based XSS query. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the three modules below are mostly cut-pasted, and have not looked into their details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
getMethodName() = "replace" and | ||
exists(RegExpConstant c | | ||
c.getLiteral() = getArgument(0).asExpr() and | ||
c.getValue().regexpMatch("['\"&<>]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid matching on the single and double quotes? I fear they will make too many false XSS sanitizers since I think it is common to strip or replace one quote type with the other. This change motivates a slight renaming of the class to reflect that it does not identify all meta-characters.
Besides, the single quote is not a meta character, unless the browser supports it as an attribute value delimiter for compatibility reasons. I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll take out the quotes.
the single quote is not a meta character
https://www.w3.org/TR/2012/WD-html-markup-20120320/syntax.html#syntax-attr-single-quoted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, no, wait, I can't do that, otherwise it'll start flagging uses of verifyURL
from #205 again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then.
class MetacharEscapeSanitizer extends Sanitizer, DataFlow::MethodCallNode { | ||
MetacharEscapeSanitizer() { | ||
getMethodName() = "replace" and | ||
exists(RegExpConstant c | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sanitizer does not recognize the string target variant: tainted.replace('<', ...)
(which is safe in a loop). Perhaps we should make a library for identifying these sanitizing calls to replace
and friends (other PR), that could be useful for other security queries as well (".."
for js/path-injection
, for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but since we're aiming to make the sanitiser relatively permissive anyway I don't think there is an urgent need to cover that case.
Previously, we were a little inconsistent in our handling of regular expressions: while we treated any guard involving a regular expression test as a sanitiser, we would always track taint through regular expression replacements, even reasonably complete sanitisers.
This PR proposes to treat regexp replaces involving HTML metacharacters as sanitisers for the XSS queries (but not for anything else). In particular, this fixes #205.
Of course, we might lose true positives from incomplete manual sanitisation, but as previously discussed that should be flagged by a separate query.
Since I was looking at regexps anyway, I also added a taint step through
RegExp.prototype.exec
(a conceptually unrelated change).The evaluation (internal link) shows some performance gains from the XSS library reorganisation in the first commit, and the results confirm that the FP reported in #205 has been fixed. The additional taint tracking through
exec
gains us a few more results. The one onace
is a false positive, but that's a problem with the tainted-path query, not the new taint tracking.