Skip to content

JS: Add array callbacks js/use-of-returnless-function query#2116

Merged
semmle-qlci merged 9 commits intogithub:masterfrom
erik-krogh:arrayCBRet
Nov 5, 2019
Merged

JS: Add array callbacks js/use-of-returnless-function query#2116
semmle-qlci merged 9 commits intogithub:masterfrom
erik-krogh:arrayCBRet

Conversation

@erik-krogh
Copy link
Copy Markdown
Contributor

Using a callback function that does not return a value in e.g. Array.prototype.filter is generally an error.

Example:

const changePackages = packages.filter(f => {
 danger.git.modified_files.includes(f)
})
if (changePackages.length) {...}

It started out as a separate query, but after copy-pasting enough from js/use-of-returnless-function I integrated it into that query.

Just looking for methods named filter, map etc. is too imprecise and gives rise to way too many false positives. The query therefore uses type tracking to see if the receiver is an array.

It does not find many new errors, but the ones it does find are good: https://lgtm.com/query/1772403078964278178/
(The benchmarks for the above were found using the Chris API on a version of the query that did not include type tracking)

While the query on its own is slower now, it seems that performance is still good when combined with another dataflow query.
Performance evaluation is on the way.

Lots of changed lines from moving most of ExprHasNoEffect.ql into a .qll file.

@erik-krogh erik-krogh requested a review from a team as a code owner October 11, 2019 14:26
@erik-krogh erik-krogh added the JS label Oct 11, 2019
(
call.getReceiver().getALocalSource() = array()
or
call.getCalleeNode() instanceof LodashUnderscore::Member
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will miss some cases due to not having getALocalSource:

import { filter } from 'lodash'
filter(xs, x => { ... } )

predicate voidArrayCallback(DataFlow::MethodCallNode call, Function func) {
hasNonVoidCallbackMethod(call.getMethodName()) and
func = call.getAnArgument().getALocalSource().asExpr() and
1 = count(DataFlow::Node arg | arg = call.getAnArgument() and arg.getALocalSource().asExpr() instanceof Function) and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary with the array() check? As far as I can tell, none of the methods we flag here could reasonably take two functions as arguments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right.
I misunderstood some instances I found where multiple functions were used.
In those there is only 1 callback function, and the second function argument is the initial value for the reduce operation.

The check should be replaced with a check that ensures that the callback is the first function argument given.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still want to address this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I somehow overlooked this when looking back at this issue.

import javascript
import DOMProperties
import semmle.javascript.frameworks.xUnit
import semmle.javascript.RestrictedLocations
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this in the .qll? I think it's only used by the .ql.

from DataFlow::CallNode call
where
predicate callToVoidFunction(DataFlow::CallNode call, Function func) {
not call.isIndefinite(_) and
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this line wasn't touched by the present PR, but what's the rationale for using isIndefinite instead of isIncomplete or isUncertain?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no good rationale for specifically isIndefinite. I needed something that made sure i could reasonably trust that I didn't miss anything.

I see now that isIncomplete seems like the better fit.

Comment thread javascript/ql/src/Statements/UseOfReturnlessFunction.ql
(
callToVoidFunction(call, func) and
msg = "the $@ does not return anything, yet the return value is used." and
name = "function " + call.getCalleeName()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCalleeName isn't always defined, so by using it here and below you are implicitly excluding results. Is that intentional?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, you can use Function.describe to get a user-friendly description that generalises getCalleeName.

name = "function " + call.getCalleeName()
or
voidArrayCallback(call, func) and
msg = "the $@ does not return anything, yet the return value from the call to " + call.getCalleeName() + "() is used." and
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You add () after the function name here, but not above. I'd suggest dropping it.

not benignContext(call.asExpr()) and

// Avoid double reporting from js/useless-expression
not hasNoEffect(func.getBodyStmt(func.getNumBodyStmt() - 1).(ExprStmt).getExpr()) and
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this getBodyStmt business? Is it some sort of cheap AST-based approximation for a final statement? If so, would it be very difficult to replace it with a better one?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, please add a test case that exercises this whitelisting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it some sort of cheap AST-based approximation for a final statement?

Yes.

If so, would it be very difficult to replace it with a better one?

No. It quite close to the existing alwaysThrows predicate.
I basically duplicated the alwaysThrows predicate, but that way I'm sure about performance.

In any case, please add a test case that exercises this whitelisting.

Done.

exists(ReachableBasicBlock entry, DataFlow::Node noEffect |
entry = f.getEntryBB() and
hasNoEffect(noEffect.asExpr()) and
entry.dominates(noEffect.getBasicBlock())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this condition. Doesn't the entry node dominate every basic block?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right.
I was a little quick in copy-pasting from alwaysThrows without thinking about what the predicate actually did.

exists(DataFlow::Node noEffect |
noEffect.getContainer() = f and
hasNoEffect(noEffect.asExpr()) and
not exists(noEffect.getASuccessor())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you want? I don't think it matters whether noEffect has a data-flow successor, it's rather the control-flow successors we are interested in. So you probably want to look for control-flow predecessors of the exit node, or something like that. (This criterion isn't quite correct in the presence of finally blocks, but I think it's OK for whitelisting purposes.)

*/
predicate voidArrayCallback(DataFlow::CallNode call, Function func) {
hasNonVoidCallbackMethod(call.getCalleeName()) and
func = call.getAnArgument().getALocalSource().asExpr() and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using asExpr here you are excluding the case where the callback is a declared function. Perhaps use

Suggested change
func = call.getAnArgument().getALocalSource().asExpr() and
func = call.getCallback(_).getFunction() and

instead.

predicate voidArrayCallback(DataFlow::MethodCallNode call, Function func) {
hasNonVoidCallbackMethod(call.getMethodName()) and
func = call.getAnArgument().getALocalSource().asExpr() and
1 = count(DataFlow::Node arg | arg = call.getAnArgument() and arg.getALocalSource().asExpr() instanceof Function) and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still want to address this?

Comment thread javascript/ql/src/Statements/UseOfReturnlessFunction.ql
Comment thread javascript/ql/src/Statements/UseOfReturnlessFunction.ql Outdated
Comment thread javascript/ql/src/Statements/UseOfReturnlessFunction.ql
Co-Authored-By: Max Schaefer <54907921+max-schaefer@users.noreply.github.com>
@max-schaefer
Copy link
Copy Markdown
Contributor

@asger-semmle, anything else you'd like to see addressed?

@asger-semmle
Copy link
Copy Markdown
Contributor

@erik-krogh can you post a link to the evaluation? (based on an earlier commit is fine)

Otherwise LGTM

@erik-krogh
Copy link
Copy Markdown
Contributor Author

erik-krogh commented Nov 5, 2019

@erik-krogh can you post a link to the evaluation? (based on an earlier commit is fine)

Here is an old one.
https://git.semmle.com/erik/dist-compare-reports/tree/domitian.ti.semmle.com_1570435565541

Nope, that was wrong. My evaluation seems to have gotten lost.

@asger-semmle
Copy link
Copy Markdown
Contributor

Hm, yeah that used to happen to me a lot. Setting the branch name is crucial.

Anyways, I'm fine with landing it if you remember the evaluation as being good. @max-schaefer?

@max-schaefer
Copy link
Copy Markdown
Contributor

Yeah, should be fine.

@semmle-qlci semmle-qlci merged commit 794d5bd into github:master Nov 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants