C++: Simplify buffer.qll repair#11042
C++: Simplify buffer.qll repair#11042MathiasVP merged 3 commits intogithub:mathiasvp/replace-ast-with-ir-use-usedataflowfrom
buffer.qll repair#11042Conversation
… the flow from 'x' to 'x++', which makes the whole library a lot simpler.
large TC in 'localFlowToExpr':
```
Evaluated relational algebra for predicate Buffer#61e3d199::localFlowStepToExpr#2#ff@0a49913i with tuple counts:
4713946 ~0% {2} r1 = SCAN DataFlowUtil#47741e1f::simpleLocalFlowStep#2#ff OUTPUT In.1, In.0
40897385 ~46% {2} r2 = JOIN boundedFastTC:Buffer#61e3d199::localFlowToExprStep#2#ff_10#higher_order_body:DataFlowUtil#47741e1f::simpleLocalFlowStep#2#ff_0#higher_order_body WITH DataFlowUtil#47741e1f::simpleLocalFlowStep#2#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
45611331 ~43% {2} r3 = r1 UNION r2
3376553 ~14% {2} r4 = JOIN r3 WITH DataFlowUtil#47741e1f::ExprNode::getExpr#0#dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1
return r4
```
After this commit the tuple counts looks like:
```
Evaluated relational algebra for predicate Buffer#61e3d199::localFlowStepToExpr#2#ff@8cc38x5k on iteration 2 running pipeline standard with tuple counts:
51367 ~3% {2} r1 = JOIN Buffer#61e3d199::getBufferSize0#1#f#prev_delta WITH DataFlowUtil#47741e1f::ExprNode::getExpr#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.0
124933 ~18% {2} r2 = JOIN r1 WITH #Buffer#61e3d199::localFlowToExprStep#2Plus#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
176300 ~17% {2} r3 = r1 UNION r2
184685 ~22% {2} r4 = JOIN r3 WITH DataFlowUtil#47741e1f::simpleLocalFlowStep#2#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
56646 ~47% {2} r5 = JOIN r4 WITH DataFlowUtil#47741e1f::ExprNode::getExpr#0#dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1
44635 ~16% {2} r6 = r5 AND NOT Buffer#61e3d199::localFlowStepToExpr#2#ff#prev(Lhs.0, Lhs.1)
return r6
```
2a05855 to
1b50168
Compare
|
Because only the delta is looked at in the void test15(bool b)
{
char in_buffer1[8000];
char in_buffer2[9000];
char out_buffer[9000];
char *in_buffer = b ? in_buffer1 : in_buffer2;
bcopy(in_buffer, out_buffer, sizeof(out_buffer)); // GOOD: whether we're reading too much depends on `b`, whose value we do not know.
}I think |
|
Thanks for the testcase, @jketema! Indeed, we don't want to report anything in that one. After I noticed this testcase: union myUnion {
char small[10];
char large[100];
};
void myUnionTest()
{
myUnion mu;
// ...
memset(&(mu.small), 0, 200); // BADI realized that what we're after isn't actually 30f1547 removes the |
|
I would have expected that the union case was already handled by: exists(Union bufferType |
// buffer is the address of a union member; in this case, we
// take the size of the union itself rather the union member, since
// it's usually OK to access that amount (e.g. clearing with memset).
why = bufferExpr.(AddressOfExpr).getAddressable() and
bufferType.getAMemberVariable() = why and
result = bufferType.getSize()
)Why isn't this sufficient? |
18802a2
into
github:mathiasvp/replace-ast-with-ir-use-usedataflow
#10833 prepared the
buffer.qlllibrary for use-use flow. In the end, the PR ended up being a lot more complicated than I had anticipated.After thinking a bit about this, I reached the conclusion that this is all because of the added use of
asIndirectExprto convert an expression to aDataFlow::Nodein the buffer library. The complexity comes from the fact that, ifnis the dataflow node tracking*p, then there's flow from that node to the indirect node tracking*(++p). So by usingnode.asIndirectExprin the buffer library we would have dataflow through operations that incremented a pointer.This required a bunch of complexity: It required the need for
localFlowIncrStep(to track how much the pointer was incremented), and because this predicate had an integer column to count how much the pointer was incremented, the library could suddenly diverge on loops like:because we'd have dataflow from the indirect node of
pto the indirect node of++p, andlocalFlowIncrStepwould loop forever incrementing the integer column. To fix this, I disabled flow through back-edges inlocalFlowNotIncrStep.All in all, this was kinda messy. @jketema also reported that he was seeing multiple possible buffer sizes returned by the
getBufferSizepredicate, which is certainly not intended.I think this PR is a much safer and simpler way to approach this. By not using
asIndirectExprwe avoid this extra flow, and we don't need to block flow through back-edges since we don't have thelocalFlowIncrSteppredicate anymore. The end result is a much simpler implementation 🎉. It might also solve multiple sizes being reported. @jketema let's discuss this on Monday.The PR is split into two commits. The first commit contains all of the fixes explained above, and the second commit is purely a performance optimization (see the commit message for tuple counts. Note that the number of tuples in the commit message changes because the predicate is now part of the recursion. The number of tuples in the main predicate at the end is unchanged, though).