Reachability benchmarks with overflow #1166

hernanponcedeleon · 2020-10-09T10:44:03Z

I recently found many benchmarks from the reachability category which contain overflow and thus undefined behaviours. These include nla-digbench/hard.c as mentioned in #1105 and the ones I mentioned in #1159. Below are some more.

nla-digbench/egcd.c
nla-digbench/egcd2.c
nla-digbench/egcd3.c
nla-digbench/lcm1.c
nla-digbench/prod4br.c

The problem also propagates to nla-digbench-scaling but I think the best would be to figure it out what to do with the base case and then using the script from @MartinSpiessl to generate the derived versions.

The question is how to fix these benchmarks from the reachability perspective (we should additionally keep them as they are for the no-overflow category).

I spent some time trying to find out a general solution, but nothing came to my mind. In issue #1105, it is said that using unsigned would solve the problem, but I don't think this apply to the general case (not event to this particular case, see this): the extra bit might not be enough if e.g. we multiply two non deterministic integers.

I thought adding something like assume_abort_if_not(X >= 1) <= INT_MAX and assume_abort_if_not(X >= 1) >= INT_MIN would do the trick even if we would need the check for every sub-expression, e.g. assert(x*y + a*b == 0) would require to add

assume_abort_if_not(x*y >= 1) <= INT_MAX
assume_abort_if_not(x*y >= 1) >= INT_MIN
assume_abort_if_not(a*b >= 1) <= INT_MAX
assume_abort_if_not(a*b >= 1) >= INT_MIN
assume_abort_if_not(x*y + a*b >= 1) <= INT_MAX
assume_abort_if_not(x*y + a*b >= 1) >= INT_MIN

however I'm not sure this works. I'm performing the verification on optimised code (using -O3 in clang) and I still get a wrong result. While this might be a bug on my verifier, I suspect it is actually that clang performs some aggressive optimisation due to undefined behaviour (this was the way I discover all other overflows).

The text was updated successfully, but these errors were encountered:

hernanponcedeleon · 2020-10-11T10:09:34Z

I already understood why the "solution" using assume_abort_if_not does not work: the check would already overflow.

@divyeshunadkat proposed in #1155 to use long long as the type or certain variables. However I still don't think the proposed solution would work in general case. It will work when we use only use + or -. Even if we only use on * (the extra 4 bytes should accommodate the result), however as soon as we have something like s*y*y (e.g. egcd2.c), we have problems again.

I don't think there are worse cases than those (using twice *) so one possibility would be to use long long (8 bytes) for the result variables and restrict the nondeterministic values to shorts (2 bytes). I think this we keep the complexity of the benchmarks without resulting in overflow.

hernanponcedeleon · 2020-10-26T08:58:58Z

Now that #1155 has been merged, we can proceed to fix the inherited bugs in nla-digbench-scaling. @MartinSpiessl can you please re-run your script in the new fixed benchmarks from nla-digbench? You can assign me as a reviewer for the PR.

hernanponcedeleon · 2020-11-02T17:17:45Z

I think there are still problems overflows with egad2-ll and egad3-ll. I tried checking overflows with CPAchecker and got violations:

scripts/cpa.sh -svcomp20 -heap 10000M -benchmark -timelimit '900 s' -64 -stats -spec ~/git/sv-benchmarks/c/properties/no-overflow.prp ~/git/sv-benchmarks/c/nla-digbench/egcd2.c -setprop solver.solver=SMTInterpol -setprop cpa.predicate.encodeBitvectorAs=INTEGER -setprop cpa.predicate.encodeFloatAs=RATIONAL

@MartinSpiessl told me the analysis might be unsound so I'm not sure. However, I get errors using Dartagnan in the scaled versions (nla-digbench-scaling) which might suggest there is undefined behaviour.

MartinSpiessl · 2020-11-10T18:43:53Z

please re-run your script in the new fixed benchmarks from nla-digbench?

I need to do so anyway to fix #1200 after the coverage property verdicts got merged first. I will probably open a new PR and close #1200 because the merge conflicts are not worth fixing if I can just regenerate everything from the generation script

@MartinSpiessl told me the analysis might be unsound so I'm not sure. However, I get errors using Dartagnan in the scaled versions (nla-digbench-scaling) which might suggest there is undefined behaviour.

Yes, linear integer arithmetic might be unsound, I will have a look.

hernanponcedeleon added C Task in language C issue with benchmark labels Oct 9, 2020

This was referenced Oct 10, 2020

Fixing benchmarks with overflows from nla-digibench #1155

Merged

PR #920 needs some fixes (undefined behavior was overlooked, fixed task simplified the program too much) #1105

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reachability benchmarks with overflow #1166

Reachability benchmarks with overflow #1166

hernanponcedeleon commented Oct 9, 2020

hernanponcedeleon commented Oct 11, 2020

hernanponcedeleon commented Oct 26, 2020

hernanponcedeleon commented Nov 2, 2020

MartinSpiessl commented Nov 10, 2020

Reachability benchmarks with overflow #1166

Reachability benchmarks with overflow #1166

Comments

hernanponcedeleon commented Oct 9, 2020

hernanponcedeleon commented Oct 11, 2020

hernanponcedeleon commented Oct 26, 2020

hernanponcedeleon commented Nov 2, 2020

MartinSpiessl commented Nov 10, 2020