Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneous CFA generation vol 2 #46

Closed
as3810t opened this issue Sep 17, 2020 · 13 comments · Fixed by #48
Closed

Erroneous CFA generation vol 2 #46

as3810t opened this issue Sep 17, 2020 · 13 comments · Fixed by #48
Assignees
Labels
bug Something isn't working

Comments

@as3810t
Copy link

as3810t commented Sep 17, 2020

For input problem:

extern void abort(void); 
void reach_error(){}

extern char __VERIFIER_nondet_char(void);
void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: {reach_error();abort();}
  }
  return;
}
signed char gcd_test(signed char a, signed char b)
{
    signed char t;

    if (a < 0) a = -a;
    if (b < 0) b = -b;

    while (b != 0) {
        t = b;
        b = a % b;
        a = t;
    }
    return a;
}


int main()
{
    signed char x = __VERIFIER_nondet_char();
    signed char y = __VERIFIER_nondet_char();
    signed char g;

    if (y > 0 && x % y == 0) {
        g = gcd_test(x, y);

        __VERIFIER_assert(g == y);
    }

    return 0;
}

GAZER generates an erroneous CFA with command (that uses GAZER version in #39 ):

gazer-theta --domain PRED_CART --refinement NWT_IT_WP <file>

The slip-up happens during the translation of the variable swapping

t = b;
b = a % b;
a = t;

The corresponding generated CFA is:

loc24 -> loc18 {
        main_main_bb13__12_i_inlined0 := (((main_main_bb13__1_i_inlined0 bv_sign_extend bv[32]) bvsrem (main_main_bb13__12_i_inlined0 bv_sign_extend bv[32])))[8:0]
        main_main_bb13__1_i_inlined0 := main_main_bb13__12_i_inlined0
}

Which corresponds to:

b = a % b;
a = b (!!!!!!!!!!);
@as3810t as3810t added the bug Something isn't working label Sep 17, 2020
@as3810t
Copy link
Author

as3810t commented Sep 17, 2020

I think this is a serious issue, so @sallaigy @radl97 can one of you take a look?

@sallaigy
Copy link
Member

sallaigy commented Sep 17, 2020

What is the output without optimizations (-no-optimize) and without variable elimination (-elim-vars=off)? What is the output if the source file only contains the code snippet which is translated erroneously?

@as3810t
Copy link
Author

as3810t commented Sep 17, 2020

With passing both -no-optimize and -elim-vars=off, the result is the same:

    loc22 -> loc23 {
        main_main_bb20_tmp24_inlined0 := (main_main_bb20__1_i_inlined0 bv_sign_extend bv[32])
        main_main_bb20_tmp25_inlined0 := (main_main_bb20__12_i_inlined0 bv_sign_extend bv[32])
        main_main_bb20_tmp26_inlined0 := (main_main_bb20_tmp24_inlined0 bvsrem main_main_bb20_tmp25_inlined0)
        main_main_bb20_tmp27_inlined0 := (main_main_bb20_tmp26_inlined0)[8:0]
    }

    loc23 -> loc24 {
    }

    loc24 -> loc18 {
        main_main_bb20__12_i_inlined0 := main_main_bb20_tmp27_inlined0
        main_main_bb20__1_i_inlined0 := main_main_bb20__12_i_inlined0
    }

@as3810t
Copy link
Author

as3810t commented Sep 17, 2020

For a smaller example, the generated code is correct (also when using if instead of while). I tried it with -no-optimize and -elim-vars=off and without them as well.

signed char x = __VERIFIER_nondet_char();
signed char y = __VERIFIER_nondet_char();
signed char g;

while(x != y) {
      g = y;
      y = x % y;
      x = g;

      __VERIFIER_assert(x != y);
}

The generated excerpt:

loc18 -> loc10 {
        main_main_bb3__01_inlined0 := main_main_bb3__0_inlined0
        main_main_bb3__0_inlined0 := main_main_bb3_tmp11_inlined0
}

@sallaigy
Copy link
Member

This is a PHI assignment. Are you confident that the assignment on the other incoming branch does not make this code correct? Does the verification produce an incorrect result (either with theta or the BMC algorithm)?

@as3810t
Copy link
Author

as3810t commented Sep 17, 2020

The verification produces an incorrect result for the problem in the first comment with gazer-theta, but a correct result with gazer-bmc.

In the example the error location should not be reachable, however, gazer-theta produces a trace, which claims that the return value of the gcd function is 0, which can be traced back to that incorrectly handled swap:

Error trace:
------------
#0 in function main():
  call __VERIFIER_nondet_char() returned #5Dbv8          at 29:21
  x := 93       (0b01011101)     at 29:21
  call __VERIFIER_nondet_char() returned #3bv8           at 30:21
  y := 3        (0b00000011)     at 30:21
#1 in function gcd_test(#5Dbv8,#3bv8):
  a := 93       (0b01011101)     at 29:21
  b := 3        (0b00000011)     at 30:21
  a := 93       (0b01011101)     at 15:9
  b := 3        (0b00000011)     at 16:9
  a := 93       (0b01011101)
  b := 3        (0b00000011)
  t := 3        (0b00000011)
  b := 0        (0b00000000)     at 20:13
  a := 3        (0b00000011)
  a := 0        (0b00000000)
  b := 0        (0b00000000)
  a := 0        (0b00000000)
  a := 0        (0b00000000)
  return #0bv8
#2 in function main():
  g := 0        (0b00000000)
#3 in function __VERIFIER_assert(#0bv32):
  cond := 0     (0b00000000000000000000000000000000)     at 36:29

@radl97
Copy link
Contributor

radl97 commented Sep 17, 2020

Can you run theta-cfa -pipeline test.c?
This should create a digraph for all the different loops and functions. I think only gcd and it's loop is important here. (it uses the CFA representation by Gazer, not the Theta one)

@as3810t
Copy link
Author

as3810t commented Sep 17, 2020

@radl97 The digraph can be found here

@sallaigy
Copy link
Member

sallaigy commented Sep 18, 2020

I looked at the generated automata and the original LLVM module. The generated code seems to be a faithful representation of the code coming from LLVM, which also looks correct (and BMC provides a correct result on it).

%.12.i contains the srem result from the previous iteration and the eventual return value %.1.lcssa is set to contain this value.

bb13:                                             ; preds = %bb15, %bb8
  %.12.i = phi i8 [ %.01.i, %bb8 ], [ %tmp19, %bb15 ], !dbg !28
  %.1.i = phi i8 [ %spec.select.i, %bb8 ], [ %.12.i, %bb15 ], !dbg !28
  call void @llvm.dbg.value(metadata i8 %.1.i, metadata !33, metadata !DIExpression()) #4, !dbg !28
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !34, metadata !DIExpression()) #4, !dbg !28
  %tmp14 = icmp eq i8 %.12.i, 0, !dbg !46                                                    ; b != 0
  br i1 %tmp14, label %gcd_test.exit, label %bb15, !dbg !45

bb15:                                             ; preds = %bb13
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !35, metadata !DIExpression()) #4, !dbg !28 ; t = b
  %tmp16 = sext i8 %.1.i to i32, !dbg !47
  %tmp17 = sext i8 %.12.i to i32, !dbg !49
  %tmp18 = srem i32 %tmp16, %tmp17, !dbg !50
  %tmp19 = trunc i32 %tmp18 to i8, !dbg !47
  call void @llvm.dbg.value(metadata i8 %tmp19, metadata !34, metadata !DIExpression()) #4, !dbg !28
  call void @llvm.dbg.value(metadata i8 %.12.i, metadata !33, metadata !DIExpression()) #4, !dbg !28
  br label %bb13, !dbg !45, !llvm.loop !51

gcd_test.exit:                                    ; preds = %bb13
  %.1.i.lcssa = phi i8 [ %.1.i, %bb13 ], !dbg !28

@as3810t Unless you are able to pinpoint the semantic difference yourself, I would recommend to look into the backend algorithm executed by theta.

@radl97
Copy link
Contributor

radl97 commented Sep 20, 2020

@as3810t I think you sent the wrong file.
Note that there are multiple generated files, one for each loop and each function.

The interesting files should be .gcd.dot and .gcd.loop.dot in this context.

Oh, I think gazer-cfa does not exactly work like that... :/ Sorry

@as3810t
Copy link
Author

as3810t commented Sep 20, 2020

I found the following cases, where I think the outputted CFA is semantically different than the input C program. In each case, I generated the CFA with gazer-theta <file> --model-only -o <outfile>, without any additional parameters. The textual CFA is the exact output of gazer-cfa, while on the image I did some modification to make it more readable:

  • I text replaced (some of) the variable names to match their C source counterpart
  • I deleted nodes that had only one skip statement on their only outgoing edge.
Filename Source CFA (text) CFA (img) Notes
gcd_1.c .c .cfa .png The swap in lines 19-21 in gcd_1.c does not match the operations on transition between loc24 -> loc21. To verify let x = 93, y = 3. In this scenario, the assertion passes in the C file (g = 3), but fails in the CFA (g = 0).
num_conversion_2.c .c .cfa .png The cycles (lines 21-27 and loc15-loc18) do not match, as in the CFA, the order of the assignment on y and on c is swapped. To verify let x = 9. In this scenario, the assertion passes in the C file (y = 9), but fails in the CFA (y = 8).
sum02-2.c .c .cfa .png The cycles (lines 21-23 and loc12-loc18) do not match, as in the CFA, the order of the assignment on i and on sn is swapped. To verify let n = 0. In this scenario, the assertion passes in the C file (sn = 0), but fails in the CFA (sn = 1).

All cases have in common:

  • gazer-theta verifies incorrectly (with PRED_CART and NWT_IT_WP) and produces a counterexample that fits the input CFA (manually checked every one of them)
  • gazer-bmc verifies correctly (or bound reached)

@radl97
Copy link
Contributor

radl97 commented Sep 21, 2020

Thanks for the detailed report.

Gazer's CFA assumes some kind of SSA-form, but uses more assumptions right now, I think. (A strict SSA would need Phi nodes. Now the only assumption is (should be) that one loop or one function assigns only one value per path for a given variable)

This is handled by "PhiInputs". At the start of the loop, when a variable is changed in the loop. There could be two assignments where the value could come from: from before the loop or inside the loop in a previous iteration.
The loop is transformed to a function with multiple inputs or outputs.

So gcd_loop(a,b) returns gcd_loop(b,a%b). I think the problem lies in the phi input assignments affecting each other (when building up these in Theta).

Input assignments are part of the inlining process when instead of "calling", one assigns the input variables with the given value.

This could be in some inlining mechanism or in RecursiveToCyclicCfa.cpp lines 192- and 257-

@radl97
Copy link
Contributor

radl97 commented Sep 21, 2020

The solution could be a bit hard. Assigning in two stages (a' := b; b' := a%b; a:=a'; b:=b';) could be a solution.

@radl97 radl97 self-assigned this Sep 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants