Skip to content

Comments

JS: remove FP from js/regexpinjection where no regexp was constructed#2556

Merged
semmle-qlci merged 3 commits intogithub:masterfrom
erik-krogh:RegexpVoidCxt
Jan 3, 2020
Merged

JS: remove FP from js/regexpinjection where no regexp was constructed#2556
semmle-qlci merged 3 commits intogithub:masterfrom
erik-krogh:RegexpVoidCxt

Conversation

@erik-krogh
Copy link
Contributor

Fixes this FP: https://gist.github.com/esbena/9906d698fe8b8e0bc53128af36e01c5f#file-changes-results-unclassified_added-md-L5076

The FP happened because our heuristic for isInterpretedAsRegExp was a little to broad.
Search.search(..) does not interpret its arguments as RegExp.
isInterpretedAsRegExp flagged Search.search(..) because the method was called search, and Search could maybe be a string because Search is a global variable.

My proposed fix is to check whether there exists a callee that is a non-extern function.
Because if such a function exists, then it's likely not a call to a native API, which is what the heuristic is looking for.

An evaluation on our defaults.slugs show no change in results, except for the FP mentioned.
And performance also seems OK.

I did a re-run of the slowest benchmarks (gecko-dev in particular). And it seems the performance slowdowns in the first run was just a fluke.

@erik-krogh erik-krogh requested a review from a team as a code owner December 19, 2019 10:03
@erik-krogh erik-krogh changed the title FP: remove FP from js/regexpinjection where no regexp was constructed JS: remove FP from js/regexpinjection where no regexp was constructed Dec 19, 2019
@erik-krogh erik-krogh added the JS label Dec 19, 2019
@max-schaefer
Copy link
Contributor

What's the rationale for going with this criterion rather than the simpler one suggested in the issue (ignore .search where the result is not used)?

@erik-krogh
Copy link
Contributor Author

What's the rationale for going with this criterion rather than the simpler one suggested in the issue (ignore .search where the result is not used)?

Because I don't it as the problem that the result from .search is unused.
E.g. str.search(someUserControlledRegExp) has the potential to take exponential time independently of whether the result from .search is used or not.

The particular FP was flagged as a function that interpreted its input as a RegExp, but looking into the source code revealed that the function does not interpret its input as a RegExp.

And this criterion can also fix other instances where a .search method does not interpret its inputs as RegExp, even if the result from .search is used.

@max-schaefer
Copy link
Contributor

Because I don't it as the problem that the result from .search is unused.

Sure, it's a heuristic. Your proposed heuristic is less syntactic, but it risks false positives in cases where we cannot resolve a call to a non-regexp .search method, doesn't it? My worry is that our call graph is still so incomplete that this will happen rather frequently.

It's nice that it fixes this particular false positive, but I'd like to get some more confidence that it also helps in the other cases we have previously observed. Your experimental results provide some evidence for this, but of course mis-classifying a call to .search only gets you part of the way towards a false positive.

One possible experiment we could do would be to look for .search calls where the result is unused (suggesting that they are probably not regular-expression searches), but where we cannot resolve a callee (and hence would trivially, but wrongly conclude that they are). Here is what I mean: https://lgtm.com/query/5557573254743615646

@max-schaefer
Copy link
Contributor

Interestingly, most (all?) results from the above query seem to be about Angular $location service instances. I doubt they'll be causing false-positives, so I am reasonably convinced that your proposed heuristic should work in practice.

I think it would be good to rephrase it in terms of getACallee(), though, to take full advantage of our call graph builder.

Also, you may want to be a little more precise about the allowed call targets. Rather than saying "anything in an externs file", you should be able to use the classes in Externs.qll to express the precise criterion we are after, namely "the replace/search method of class String."

@erik-krogh
Copy link
Contributor Author

I ran a query testing what sinks a call-target heuristic excludes compared to a void-context heuristic: https://lgtm.com/query/2775226574270607788/

And I think I still prefer a call-target based heuristic.

Interestingly, most (all?) results from the above query seem to be about Angular $location service instances.

There are exceptions, but it looks like those exceptions are unlikely to receive tainted data-flow.

mce.getMethodName() = methodName and
not exists(DataFlow::FunctionNode func | func = DataFlow::valueNode(mce.getCallee()).getAFunctionValue() | not func.getFunction().inExternsFile())
not exists(Function func |
func = any(DataFlow::MethodCallNode call | call.getEnclosingExpr() = mce).getACallee()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the getEnclosingExpr dance by making mce a MethodCallNode straight away and rephrasing the code relating it to source below?

@semmle-qlci semmle-qlci merged commit 06d812a into github:master Jan 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants