Skip to content

Conversation

MathiasVP
Copy link
Contributor

Fixes https://github.com/github/codeql-c-analysis-team/issues/257.

Two unfortunate join-order problems were introduced in #5431. The first one was observed by Jonas in https://github.com/github/codeql-c-analysis-team/issues/257. It arises due to this disjunction in exprReleases:

  exists(Function f, int arg |
    // `e` is a call to a function that releases one of it's parameters,
    // and `released` is the corresponding argument
    (
      e.(FunctionCall).getTarget() = f or
      e.(FunctionCall).getTarget().(MemberFunction).getAnOverridingFunction+() = f
    ) and
    e.(FunctionCall).getArgument(arg) = released and
    exprReleases(_,
      exprOrDereference(globalValueNumber(f.getParameter(arg).getAnAccess()).getAnExpr()), kind)
  )

(see https://github.com/github/codeql-c-analysis-team/issues/257 for the tuple counts from this disjunction).

The second one is of a similar nature, but apparently not as evil (as this disjunction contributes fewer tuples overall I guess):

exists(Function f, ThisExpr innerThis |
  // `e` is a call to a method that releases `this`, and `released`
  // is the object that is called
  (
    e.(FunctionCall).getTarget() = f or
    e.(FunctionCall).getTarget().(MemberFunction).getAnOverridingFunction+() = f
  ) and
  e.(FunctionCall).getQualifier() = exprOrDereference(released) and
  innerThis.getEnclosingFunction() = f and
  exprReleases(_, globalValueNumber(innerThis).getAnExpr(), kind)
)

Nevertheless it has a similar bad effect on the tuple counts:

Tuple counts for AV Rule 79::exprReleases#fff#join_rhs#1/3@fe1d51:
  1        ~0%         {1} r1 = CONSTANT(unique int)[85]
  
  116286   ~0%         {1} r2 = JOIN r1 WITH exprs_10#join_rhs ON FIRST 1 OUTPUT Rhs.1
  
  116281   ~3%         {2} r3 = JOIN r2 WITH Expr::Expr::getEnclosingFunction_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.0
  
  401360   ~0%         {2} r4 = JOIN r3 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.1
  
  46723    ~0%         {2} r5 = JOIN r3 WITH #MemberFunction::MemberFunction::getAnOverridingFunction_dispred#ffPlus#swapped ON FIRST 1 OUTPUT Rhs.1, Lhs.1
  61134    ~0%         {2} r6 = JOIN r5 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.1
  
  462494   ~0%         {2} r7 = r4 UNION r6
  380343   ~0%         {3} r8 = JOIN r7 WITH Call::Call::getQualifier_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0 'arg1'
  382135   ~1%         {3} r9 = JOIN r8 WITH AV Rule 79::exprOrDereference#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.2 'arg1', Rhs.1 'arg2'
  460767   ~194%       {3} r10 = JOIN r9 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg1', Lhs.2 'arg2'
  46583784 ~12977%     {3} r11 = JOIN r10 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Lhs.2 'arg2'
                        return r11

The fix is to delay the joins involving GVN as much as possible. Apparently, this prevents these joins from materializing altogether.

Before:

rec AV Rule 79::exprReleases#fff(int e, int released, string kind) :-     [[[ BASE CASE ]]]
    {3} r1 = JOIN Call::Call::getTarget_dispred#ff_10#join_rhs WITH Declaration::Declaration::hasGlobalOrStdName_dispred#bf ON FIRST 1 OUTPUT Lhs.1 'e', Lhs.0, Rhs.1

    {3} r2 = SELECT r1 ON In.2 = "fclose"
    {3} r3 = SCAN r2 OUTPUT In.0 'e', 0, "file"
    {3} r4 = JOIN r3 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.0 'e', Rhs.2 'released', "file"

    {3} r5 = SELECT r1 ON In.2 = "close"
    {3} r6 = SCAN r5 OUTPUT In.0 'e', 0, "file descriptor"
    {3} r7 = JOIN r6 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.0 'e', Rhs.2 'released', "file descriptor"

    {2} r8 = CONSTANT(string, unique string)["delete","new"]

    {2} r9 = CONSTANT(string, unique string)["free","malloc"]

    {2} r10 = CONSTANT(string, unique string)["delete[]","new[]"]

    {2} r11 = r9 UNION r10
    {2} r12 = r8 UNION r11
    {3} r13 = JOIN r12 WITH NewDelete::freeExpr#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Rhs.2 'released', Lhs.1 'kind'

    {3} r14 = r7 UNION r13
    {3} r15 = r4 UNION r14
    return r15

    [[[ SEMINAIVE VARIANT - STANDARD ORDER]]]
    {2} r16 = SCAN AV Rule 79::exprReleases#fff#prev_delta OUTPUT In.1, In.2 'kind'
    // This predicate was evil
    {3} r17 = JOIN r16 WITH AV Rule 79::exprReleases#fff#join_rhs#1 ON FIRST 1 OUTPUT Rhs.1 'e', Rhs.2 'released', Lhs.1 'kind'

    // This predicate was evil
    {3} r18 = JOIN r16 WITH AV Rule 79::exprReleases#fff#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Rhs.2 'released', Lhs.1 'kind'

    {3} r19 = r17 UNION r18
    {3} r20 = r19 AND NOT AV Rule 79::exprReleases#fff#prev(Lhs.0 'e', Lhs.1 'released', Lhs.2 'kind')
    return r20

Now the two evil predicates are gone and the join order is better.

After:

rec AV Rule 79::exprReleases#fff(int e, int released, string kind) :-     [[[ BASE CASE ]]]
    {3} r1 = JOIN Call::Call::getTarget_dispred#ff_10#join_rhs WITH Declaration::Declaration::hasGlobalOrStdName_dispred#bf ON FIRST 1 OUTPUT Lhs.1 'e', Lhs.0, Rhs.1

    {3} r2 = SELECT r1 ON In.2 = "fclose"
    {3} r3 = SCAN r2 OUTPUT In.0 'e', 0, "file"
    {3} r4 = JOIN r3 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.0 'e', Rhs.2 'released', "file"

    {3} r5 = SELECT r1 ON In.2 = "close"
    {3} r6 = SCAN r5 OUTPUT In.0 'e', 0, "file descriptor"
    {3} r7 = JOIN r6 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.0 'e', Rhs.2 'released', "file descriptor"

    {2} r8 = CONSTANT(string, unique string)["delete","new"]

    {2} r9 = CONSTANT(string, unique string)["free","malloc"]

    {2} r10 = CONSTANT(string, unique string)["delete[]","new[]"]

    {2} r11 = r9 UNION r10
    {2} r12 = r8 UNION r11
    {3} r13 = JOIN r12 WITH NewDelete::freeExpr#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Rhs.2 'released', Lhs.1 'kind'

    {3} r14 = r7 UNION r13
    {3} r15 = r4 UNION r14
    return r15

    [[[ SEMINAIVE VARIANT - STANDARD ORDER]]]
    {2} r16 = SCAN AV Rule 79::exprReleases#fff#prev_delta OUTPUT In.1, In.2 'kind'

    {2} r17 = JOIN r16 WITH AV Rule 79::exprOrDereference#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'released', Lhs.1 'kind'

    // Now GVN is joined with after the recursive call
    {2} r18 = JOIN r17 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'

    {2} r19 = JOIN r18 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'

    {2} r20 = JOIN r19 WITH Access::Access::getTarget_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'

    {3} r21 = JOIN r20 WITH Function::Function::getParameter_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Rhs.2

    {3} r22 = JOIN r21 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.2, Lhs.1 'kind'

    {3} r23 = JOIN r21 WITH #MemberFunction::MemberFunction::getAnOverridingFunction_dispred#ffPlus#swapped ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Lhs.2
    {3} r24 = JOIN r23 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.2, Lhs.1 'kind'

    {3} r25 = r22 UNION r24
    {3} r26 = JOIN r25 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.2 'kind', Lhs.0 'e', Rhs.2 'released'

    // Similarly here
    {2} r27 = JOIN r16 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'

    {3} r28 = JOIN r27 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff ON FIRST 1 OUTPUT Rhs.1, 85, Lhs.1 'kind'

    {2} r29 = JOIN r28 WITH exprs ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'kind'

    {2} r30 = JOIN r29 WITH Expr::Expr::getEnclosingFunction_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'

    {2} r31 = JOIN r30 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.1 'kind'

    {2} r32 = JOIN r30 WITH #MemberFunction::MemberFunction::getAnOverridingFunction_dispred#ffPlus#swapped ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    {2} r33 = JOIN r32 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.1 'kind'

    {2} r34 = r31 UNION r33
    {3} r35 = JOIN r34 WITH Call::Call::getQualifier_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Lhs.0 'e'
    {3} r36 = JOIN r35 WITH AV Rule 79::exprOrDe.3 'f', Rhs.2 'acquireLine'
    return r6

@MathiasVP MathiasVP added the C++ label Mar 25, 2021
@MathiasVP MathiasVP requested a review from jbj March 25, 2021 11:14
@MathiasVP MathiasVP requested a review from a team as a code owner March 25, 2021 11:14
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Mar 25, 2021
@MathiasVP
Copy link
Contributor Author

The CI failure on this PR isn't related to the changes here. See https://github.slack.com/archives/CP0LHP150/p1616610290033000.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, and if you're happy with the tuple counts, then I'm happy too.

@jbj jbj merged commit bc9682c into github:main Mar 25, 2021
@MathiasVP
Copy link
Contributor Author

The code LGTM, and if you're happy with the tuple counts, then I'm happy too.

Oh, right. For posterity, here's the actual tuple counts on the offending project (on an arbitrarily picked iteration):

Starting to evaluate predicate AV Rule 79::exprReleases#fff/3@i3#6f35bw (iteration 3)
Tuple counts for AV Rule 79::exprReleases#fff/3@i3#6f35bw:
    688     ~0%       {2} r1 = SCAN AV Rule 79::exprReleases#fff#prev_delta OUTPUT In.1, In.2 'kind'
    
    688     ~0%       {2} r2 = JOIN r1 WITH AV Rule 79::exprOrDereference#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'released', Lhs.1 'kind'
    
    806     ~4%       {2} r3 = JOIN r2 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    
    3419    ~71%      {2} r4 = JOIN r3 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    
    2838    ~415%     {2} r5 = JOIN r4 WITH Access::Access::getTarget_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    
    316     ~45%      {3} r6 = JOIN r5 WITH Function::Function::getParameter_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Rhs.2
    
    449     ~55%      {3} r7 = JOIN r6 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.2, Lhs.1 'kind'
    
    0       ~0%       {3} r8 = JOIN r6 WITH #MemberFunction::MemberFunction::getAnOverridingFunction_dispred#ffPlus#swapped ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Lhs.2
    0       ~0%       {3} r9 = JOIN r8 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.2, Lhs.1 'kind'
    
    449     ~55%      {3} r10 = r7 UNION r9
    447     ~57%      {3} r11 = JOIN r10 WITH Call::Call::getArgument_dispred#fff ON FIRST 2 OUTPUT Lhs.2 'kind', Lhs.0 'e', Rhs.2 'released'
    
    806     ~4%       {2} r12 = JOIN r1 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    
    3419    ~84%      {3} r13 = JOIN r12 WITH ASTValueNumbering::GVN::getAnExpr_dispred#ff ON FIRST 1 OUTPUT Rhs.1, 85, Lhs.1 'kind'
    
    12      ~0%       {2} r14 = JOIN r13 WITH exprs ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'kind'
    
    12      ~199%     {2} r15 = JOIN r14 WITH Expr::Expr::getEnclosingFunction_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    
    12      ~199%     {2} r16 = JOIN r15 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.1 'kind'
    
    0       ~0%       {2} r17 = JOIN r15 WITH #MemberFunction::MemberFunction::getAnOverridingFunction_dispred#ffPlus#swapped ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind'
    0       ~0%       {2} r18 = JOIN r17 WITH Call::Call::getTarget_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'e', Lhs.1 'kind'
    
    12      ~199%     {2} r19 = r16 UNION r18
    12      ~199%     {3} r20 = JOIN r19 WITH Call::Call::getQualifier_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'kind', Lhs.0 'e'
    12      ~199%     {3} r21 = JOIN r20 WITH AV Rule 79::exprOrDereference#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'kind', Lhs.2 'e', Rhs.1 'released'
    
    459     ~58%      {3} r22 = r11 UNION r21
    440     ~52%      {3} r23 = r22 AND NOT AV Rule 79::exprReleases#fff#prev(Lhs.1 'e', Lhs.2 'released', Lhs.0 'kind')
    440     ~55%      {3} r24 = SCAN r23 OUTPUT In.1 'e', In.2 'released', In.0 'kind'
                      return r24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants