JS: remove FP from js/regexpinjection where no regexp was constructed by erik-krogh · Pull Request #2556 · github/codeql

erik-krogh · 2019-12-19T10:03:58Z

Fixes this FP: https://gist.github.com/esbena/9906d698fe8b8e0bc53128af36e01c5f#file-changes-results-unclassified_added-md-L5076

The FP happened because our heuristic for isInterpretedAsRegExp was a little to broad.
Search.search(..) does not interpret its arguments as RegExp.
isInterpretedAsRegExp flagged Search.search(..) because the method was called search, and Search could maybe be a string because Search is a global variable.

My proposed fix is to check whether there exists a callee that is a non-extern function.
Because if such a function exists, then it's likely not a call to a native API, which is what the heuristic is looking for.

An evaluation on our defaults.slugs show no change in results, except for the FP mentioned.
And performance also seems OK.

I did a re-run of the slowest benchmarks (gecko-dev in particular). And it seems the performance slowdowns in the first run was just a fluke.

max-schaefer · 2019-12-19T10:07:24Z

What's the rationale for going with this criterion rather than the simpler one suggested in the issue (ignore .search where the result is not used)?

erik-krogh · 2019-12-19T12:29:15Z

What's the rationale for going with this criterion rather than the simpler one suggested in the issue (ignore .search where the result is not used)?

Because I don't it as the problem that the result from .search is unused.
E.g. str.search(someUserControlledRegExp) has the potential to take exponential time independently of whether the result from .search is used or not.

The particular FP was flagged as a function that interpreted its input as a RegExp, but looking into the source code revealed that the function does not interpret its input as a RegExp.

And this criterion can also fix other instances where a .search method does not interpret its inputs as RegExp, even if the result from .search is used.

max-schaefer · 2019-12-19T12:46:54Z

Because I don't it as the problem that the result from .search is unused.

Sure, it's a heuristic. Your proposed heuristic is less syntactic, but it risks false positives in cases where we cannot resolve a call to a non-regexp .search method, doesn't it? My worry is that our call graph is still so incomplete that this will happen rather frequently.

It's nice that it fixes this particular false positive, but I'd like to get some more confidence that it also helps in the other cases we have previously observed. Your experimental results provide some evidence for this, but of course mis-classifying a call to .search only gets you part of the way towards a false positive.

One possible experiment we could do would be to look for .search calls where the result is unused (suggesting that they are probably not regular-expression searches), but where we cannot resolve a callee (and hence would trivially, but wrongly conclude that they are). Here is what I mean: https://lgtm.com/query/5557573254743615646

max-schaefer · 2019-12-19T16:56:02Z

Interestingly, most (all?) results from the above query seem to be about Angular $location service instances. I doubt they'll be causing false-positives, so I am reasonably convinced that your proposed heuristic should work in practice.

I think it would be good to rephrase it in terms of getACallee(), though, to take full advantage of our call graph builder.

Also, you may want to be a little more precise about the allowed call targets. Rather than saying "anything in an externs file", you should be able to use the classes in Externs.qll to express the precise criterion we are after, namely "the replace/search method of class String."

erik-krogh · 2019-12-20T09:58:02Z

I ran a query testing what sinks a call-target heuristic excludes compared to a void-context heuristic: https://lgtm.com/query/2775226574270607788/

And I think I still prefer a call-target based heuristic.

Interestingly, most (all?) results from the above query seem to be about Angular $location service instances.

There are exceptions, but it looks like those exceptions are unlikely to receive tainted data-flow.

max-schaefer · 2020-01-02T10:01:53Z

javascript/ql/src/semmle/javascript/Regexp.qll

      mce.getMethodName() = methodName and
-      not exists(DataFlow::FunctionNode func | func = DataFlow::valueNode(mce.getCallee()).getAFunctionValue() | not func.getFunction().inExternsFile())
+      not exists(Function func |
+        func = any(DataFlow::MethodCallNode call | call.getEnclosingExpr() = mce).getACallee()


Can we avoid the getEnclosingExpr dance by making mce a MethodCallNode straight away and rephrasing the code relating it to source below?

remove FP from js/regexpinjection where no regexp was constructed

15d74b7

erik-krogh requested a review from a team as a code owner December 19, 2019 10:03

erik-krogh changed the title ~~FP: remove FP from js/regexpinjection where no regexp was constructed~~ JS: remove FP from js/regexpinjection where no regexp was constructed Dec 19, 2019

erik-krogh added the JS label Dec 19, 2019

more precise heuristic to identify allowed call targets

a0b5aa5

max-schaefer reviewed Jan 2, 2020

View reviewed changes

refactor isInterpretedAsRegExp to directly work on a DataFlow node

d1a77d6

max-schaefer approved these changes Jan 2, 2020

View reviewed changes

semmle-qlci merged commit 06d812a into github:master Jan 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

JS: remove FP from js/regexpinjection where no regexp was constructed#2556

JS: remove FP from js/regexpinjection where no regexp was constructed#2556
semmle-qlci merged 3 commits intogithub:masterfrom
erik-krogh:RegexpVoidCxt

erik-krogh commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

erik-krogh commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

erik-krogh commented Dec 20, 2019

Uh oh!

max-schaefer Jan 2, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

erik-krogh commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

erik-krogh commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

max-schaefer commented Dec 19, 2019

Uh oh!

erik-krogh commented Dec 20, 2019

Uh oh!

max-schaefer Jan 2, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants