JS: Precise data-flow for returns from async functions #4019
codeql-ci merged 15 commits into github:main
asgerf left a comment
A few comments so far. I gave up on commit-by-commit review after I noticed my comments becoming obsolete by later commits. Let me know when you want another round of reviews.
        function.getLocation().hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn)
      }

      override BasicBlock getBasicBlock() { result = function.(ExprOrStmt).getBasicBlock() }
This basic block belongs to the enclosing function. How about function.getExit().getBasicBlock()?
Getting the basic-block of the enclosing function is consistent with how ExceptionalFunctionReturnNode works.
If we change it, then we should change both.
I think the current behavior is fine.
Oof. I still think the current behavior is completely wrong, but I understand if you don't want to fix it in this PR. Opened https://github.com/github/codeql-javascript-team/issues/214.
javascript/ql/src/semmle/javascript/dataflow/internal/FlowSteps.qll
     */
    cached
    predicate returnStep(DataFlow::Node pred, DataFlow::Node succ) {
      // Note: FlowSteps::CachedSteps::returnStep/2 has copy-paste children
This comment was quite confusing when doing commit-by-commit review since the function doesn't appear similar to this at all until later in history 😕
I think we can eliminate the duplication by introducing a shared predicate:

    pragma[inline]
    predicate returnStepWithAsyncFlag(DataFlow::Node pred, DataFlow::Node succ, boolean isAsync) {
      // ... bind isAsync to the async-ness of the function being returned from ...
    }

and then call that from returnStep/2:

    returnStepWithAsyncFlag(pred, succ, false)

and from the Promises.qll library:

    returnStepWithAsyncFlag(pred, succ, true)

In the end the copy-paste was with the exception part of the predicate.
I've shared the implementation of that part.
There is still a slight bit of almost copy-pasted code between the predicate and DataFlow::localFlowStep, but I think that is OK.
          await throwAsync(source());
        } catch (e) {
    -     sink(e); // NOT OK
    +     sink(e); // NOT OK - but not flagged
Why do we miss this now? Are we missing a load step from the await operand to the catch?
I think it is because of the mix of calls and property store/read.
The flow is: flow into throwAsync() -> store in pseudo-property -> return from throwAsync() -> read pseudo-property -> local flow to e.
And the new code in loadStep/storeStep doesn't catch it, because they only handle ordinary returns and not exceptional returns.
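For reference, the run-time behaviour that this test case exercises can be sketched as below. This is a minimal, self-contained illustration and not the actual test file: `source`, `sink`, and `throwAsync` are hypothetical stand-ins defined here, and the `sunk` array exists only to make the result observable.

```javascript
// Hypothetical stand-ins for the test helpers; not the real test file.
const sunk = [];
function source() { return "tainted"; }
function sink(value) { sunk.push(value); }

// An async function never throws synchronously: the thrown value
// becomes the rejection value of the promise returned by the call.
async function throwAsync(x) {
  throw x;
}

async function caller() {
  try {
    // `await` unwraps the rejected promise and rethrows its value here.
    await throwAsync(source());
  } catch (e) {
    sink(e); // the tainted value arrives here at run time
  }
}

// Resolves to the list of values that reached `sink`.
const done = caller().then(() => sunk);
```

At run time the value therefore travels from the argument of `throwAsync`, into the rejection of the returned promise, through the `await`, and into the catch parameter `e`, which matches the chain of steps described in this thread.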
I thought I had opened a draft PR. Maybe the PR got un-drafted when I accepted the review comments. I had meant to do a rebase before undrafting the PR, sorry about that. Thanks for the comments.
A quick smoke test looks good in terms of performance. @asgerf ready for next round.
The new results look like TPs. The missed result is for the same reason as this.
asgerf left a comment
Thanks, mostly LGTM now but we'll obviously need to fix gecko.
Also, do you see a way to recover the lost result? That was a really good result IMO and I'd be sad to lose it.
FWIW, gecko-dev has been hovering at the edge of timing out for me for a while now. In particular I've seen it time out on master on a few occasions (cf #3980).
I managed to recover both the missed result from the evaluation and our test-case by adding a summary step for exceptions in immediately awaited async function calls.
An evaluation still looks good.
asgerf left a comment
Sorry, this fell off my radar for a bit. LGTM 👍
Previously, async functions were modeled imprecisely, such that there was flow from foo to bar in the example below. With this PR the precision is better.
The pseudo-properties from the Promise model are used to achieve this.
A new FunctionReturnNode is introduced, and this node has to be a SourceNode because we need to store properties on it (the pseudo-properties).
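As a hedged illustration of why returns from async functions need separate treatment: a `return` inside an async function does not hand the value to the caller directly, it resolves the promise that the call returns. The names `getValue` and `demo` below are hypothetical and not taken from the PR.

```javascript
// Hypothetical illustration; names are not taken from the PR.
async function getValue() {
  return "tainted";
}

async function demo() {
  const direct = getValue();        // a Promise, not the string itself
  const awaited = await getValue(); // the returned value, unwrapped by await
  return {
    directIsPromise: direct instanceof Promise,
    directIsValue: direct === "tainted",
    awaited,
  };
}

// Resolves to an object summarising what the caller observed.
const result = demo();
```

Only the awaited call yields the returned value itself. In terms of the description above, this roughly corresponds to storing the returned value in a promise pseudo-property on the function's return node and reading it back at the `await`.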