JavaScript: Avoid duplicate source/sink nodes due to flow labels. #2201
or
MkSinkNode(DataFlow::Node nd, DataFlow::Configuration cfg) { mkSinkNode(nd, cfg, _) }

private predicate mkSourceNode(DataFlow::Node nd, DataFlow::Configuration cfg, PathSummary summary) {
QLDoc please; it's not really obvious what this does.
It's also badly named; will fix.
)
}

private predicate mkSinkNode(DataFlow::Node nd, DataFlow::Configuration cfg, PathSummary summary) {
nd = any(AddExpr add).getAnOperand().flow()
or
// Skip mid node immediately following a source node
exists(MkSourceNode(nd, cfg))
This will be really confusing when a path from one source steps through another source, as the path will skip over the intermediate source. This could happen when using all parameters of a certain name as taint sources.
A similar thing can occur with the sink node.
Can't we have a separate mechanism for contracting the edges going out of source nodes, and going into sink nodes?
That is a very good point. I was going to hide the edges (not the nodes) in the follow-up PR to do with path exploration; I'll just add that commit to this PR instead.
Hmm. Will I now be able to write a query like the one below?
/**
 * @kind problem
 */
import javascript
import semmle.javascript.security.dataflow.CommandInjection::CommandInjection
import DataFlow::PathGraph
from SinkPathNode n
select n, "There is some flow to this node from a source"
result = getASuccessor(nd)
}

/**
Is the diff bad, or did you include two qldoc starters here?
/**
/**
The latter; will fix.
Force-pushed from e26024e to 776aa61.
The new commits LGTM. It's a bit unfortunate that it wasn't part of the evaluation, though. Could you make sure it's exercised in at least one evaluation at some point (possibly together with the next PR as per your original plan)?

I'll redo the whole evaluation. I'd like to get as much validation as possible, considering how many mistakes I made trying to get the right edges suppressed in that last commit 😞
This makes it more obvious to the evaluator that it is a good predicate to pick as a sentinel, and in practice we mostly just have one configuration in scope anyway.
…AHiddenSuccessor` into top-level predicates.
They should really only be hidden for display purposes.
Instead of skipping over initial and final nodes, we now introduce edges from source and to sink nodes that circumvent these nodes entirely.
Force-pushed from 776aa61 to b42026a.
A re-run (internal link) on nightly.slugs shows no impact on results or performance.

Same story for default.slugs, so let's give this a try.
Flow labels form part of the identity of a `PathNode`, leading to duplicate sources and/or sinks in queries that use them (though some of our tools have settings to collapse them).

This PR proposes to fix this by introducing synthetic source and sink nodes that do not incorporate information about flow labels, and have the real source/sink as their successor/predecessor. The real source and sink are hidden in the path viewer to avoid having confusing edges between identical-looking nodes.
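To make the duplication concrete, here is a minimal Python sketch of the idea. It is a toy model, not the QL implementation: the `PathNode` tuple, the helper names, and the example source and labels are all made up for illustration. It only shows that keying path nodes on (node, label) yields one source per label, while a label-free synthetic node yields one source with an edge to each real one.

```python
from typing import NamedTuple, Optional


class PathNode(NamedTuple):
    """Toy stand-in for a path node; the flow label is part of its identity."""
    node: str
    label: Optional[str]  # None marks a synthetic, label-free node (assumption)


def real_sources(flows):
    """One path node per (source, label) pair, as before the change."""
    return {PathNode(src, label) for src, label in flows}


def synthetic_sources(flows):
    """One label-free path node per source, with an edge to each real source."""
    edges = {(PathNode(src, None), PathNode(src, label)) for src, label in flows}
    return {synthetic for synthetic, _ in edges}, edges


# Hypothetical example: the same data-flow node is a source for two labels.
flows = [("req.query.x", "taint"), ("req.query.x", "data")]
sources, edges = synthetic_sources(flows)
```

In this model a query that selects sources reports `req.query.x` twice when it uses the real nodes and once when it uses the synthetic ones.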
A preliminary dist-compare on nightly.slugs shows no effects on performance or results, but I'm running a full dist-compare just in case.
Another thing I'd like to point out is that this PR turns paths of length zero into paths of length one: if a source node is also a sink node, we still introduce both a synthetic source and a synthetic sink with an edge between them (skipping over the original source-cum-sink node). This is less nice than before, but I don't think it happens that much in practice, and avoiding it is a bit fiddly. I'm happy to be convinced otherwise, though.
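The length-zero case can be sketched in the same spirit. This is again a hedged toy model with hypothetical names (`displayed_path`, `edge_count`), assuming a path is a list of real nodes whose first element is the source and whose last element is the sink, and that the real endpoints are replaced by synthetic ones in the displayed path.

```python
def displayed_path(real_path):
    """Replace the real source and sink by synthetic endpoints (toy model)."""
    if not real_path:
        raise ValueError("a path needs at least one node")
    source, sink = real_path[0], real_path[-1]
    interior = real_path[1:-1]  # empty when the path has one or two nodes
    return [("source", source)] + [("mid", n) for n in interior] + [("sink", sink)]


def edge_count(path):
    """Number of edges in a linear path."""
    return len(path) - 1


# A node that is both source and sink: zero real edges, one displayed edge.
zero_length = ["x"]
# A longer path keeps its edge count.
longer = ["a", "b", "c"]
```

Only the single-node path changes length under this model: paths with two or more nodes keep the same number of edges, because the synthetic endpoints stand in for the hidden real ones.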
Commit-by-commit review is strongly encouraged.