-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Data flow: Refactoring + performance improvements #2903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7a3f125
to
bdeebc0
Compare
6ebea81
to
17d491d
Compare
Differences jobs (all internal links): Analysis times on Mono and JDK have both been reduced by around 7 %. |
- Introduce `ReadTaintNode` and `TaintStoreNode` to simplify logic for taint getters and taint setters, respectively. - `nodeCandFwd2`: Restrict `stored` column after a read, based on what it might be before a store of the same field. - `nodeCand2`: Restrict `read` column (renamed from `stored`) after a store, based on what it might be after a read of the same field. - Move big step predicates into a `LocalFlowBigStep` module. - Define predicates by dispatch in `AccessPath[Front]` class. - `flowCandFwd0`: Restrict `apf` column after a read, as it should be able to match a Boolean `read` column from `nodeCand2`. - `flowFwd0`: Restrict columns `ap` and `apf` after a read, by introducing a `flowConsCandFwd` predicate (similar to what is done in the previous pruning steps). - `flowFwd0`: Restrict columns `ap` and `apf` after a store, by introducing a `flowConsCand` predicate (similar to what is done in the previous pruning steps).
17d491d
to
f935f5e
Compare
The C++ dist builds appear to be within the noise threshold, so no objections from me. |
A few removed results on JDK.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments; I haven't read all the way through yet.
@@ -702,22 +711,36 @@ private predicate argumentFlowsThrough( | |||
) | |||
} | |||
|
|||
pragma[noinline] | |||
private predicate readStoreNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use some qldoc.
nodeCand1(arg, config) and | ||
readStoreNode(call, arg, f1, config) and | ||
readStoreCand1(f1, unbind(config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to fold the remaining nodeCand1
and readStoreCand1
lines into readStoreNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that would give a bad join-order. readStoreNode
is only there to give a proper join-order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also using unbind
? Ignoring configs (which unbind
allows us to do) then all the joins are unary filters, so I don't see how that could be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using unbind
, i.e.
TReadStoreNode(DataFlowCall call, ArgumentNode arg, Content f1, Configuration config) {
exists(Content f2, Node out |
nodeCand1(arg, config) and
argumentValueFlowsThrough(call, arg, TContentSome(f1), TContentSome(f2), out) and
nodeCand1(out, unbind(config)) and
readStoreCand1(f1, unbind(config)) and
readStoreCand1(f2, unbind(config))
)
}
yields this pipeline:
[2020-03-18 13:01:44] (121s) Tuple counts for dom#DataFlowImpl::TReadStoreNode#ffff:
20715 ~0% {3} r1 = JOIN DataFlowImpl::readStoreCand1#ff AS L WITH DataFlowImplCommon::TContentSome#ff AS R ON FIRST 1 OUTPUT L.<1>, L.<0>, R.<1>
429111225 ~0% {4} r2 = JOIN r1 WITH DataFlowImpl::readStoreCand1#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>, r1.<2>
429111225 ~0% {4} r3 = JOIN r2 WITH DataFlowImplCommon::TContentSome#ff AS R ON FIRST 1 OUTPUT r2.<3>, R.<1>, r2.<1>, r2.<2>
235878 ~45% {5} r4 = JOIN r3 WITH DataFlowImplCommon::Cached::FlowThrough::Final::argumentValueFlowsThrough#fffff#origOrder_23014#join_rhs AS R ON FIRST 2 OUTPUT R.<3>, r3.<2>, r3.<3>, R.<2>, R.<4>
198355 ~40% {6} r5 = JOIN r4 WITH DataFlowImpl::nodeCand1#ff AS R ON FIRST 1 OUTPUT r4.<1>, r4.<2>, r4.<3>, r4.<0>, r4.<4>, R.<1>
198355 ~40% {6} r6 = SELECT r5 ON r5.<1> >= r5.<5>
198355 ~40% {6} r7 = SELECT r6 ON r6.<1> <= r6.<5>
198355 ~46% {5} r8 = SCAN r7 OUTPUT r7.<4>, r7.<0>, r7.<2>, r7.<3>, r7.<5>
160468 ~57% {6} r9 = JOIN r8 WITH DataFlowImpl::nodeCand1#ff AS R ON FIRST 1 OUTPUT r8.<1>, r8.<2>, r8.<3>, r8.<0>, r8.<4>, R.<1>
160468 ~57% {6} r10 = SELECT r9 ON r9.<5> >= r9.<4>
160468 ~57% {6} r11 = SELECT r10 ON r10.<5> <= r10.<4>
160468 ~52% {4} r12 = SCAN r11 OUTPUT r11.<1>, r11.<2>, r11.<0>, r11.<4>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can't be right, that RA snippet cannot match that QL snippet. It's clearly joining on configuration and there's only two unbind
in the RA vs 3 in the QL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it looks weird, but I just double checked, and the RA above is indeed what I get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that RA cannot have come from that QL, so something is mismatched somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it again, same result. Would you mind trying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments, still not finished reading through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
Here are the updated |
The removed ZipSlip result looks like a FP, which is now successfully pruned by the additional cons candidate tracking. |
Let's leave the remaining join-order weirdness as-is, then we can deal with it once the cause is known. |
Highlights:
ReadTaintNode
andTaintStoreNode
to simplify logic for taintgetters and taint setters, respectively.
nodeCandFwd2
: Restrictstored
column after a read, based on what it mightbe before a store of the same field.
nodeCand2
: Restrictread
column (renamed fromstored
) after a store, basedon what it might be after a read of the same field.
LocalFlowBigStep
module.AccessPath[Front]
class.flowCandFwd0
: Restrictapf
column after a read, as it should be able to matcha Boolean
read
column fromnodeCand2
.flowFwd0
: Restrict columnsap
andapf
after a read, by introducing aflowConsCandFwd
predicate (similar to what is done in the previous pruning steps).flowFwd0
: Restrict columnsap
andapf
after a store, by introducing aflowConsCand
predicate (similar to what is done in the previous pruning steps).