Skip to content
This repository has been archived by the owner on Oct 3, 2021. It is now read-only.

nla-digbench tasks freire{1,2}.c are flawed #1118

Open
MartinSpiessl opened this issue Sep 20, 2020 · 3 comments
Open

nla-digbench tasks freire{1,2}.c are flawed #1118

MartinSpiessl opened this issue Sep 20, 2020 · 3 comments
Labels
C Task in language C issue with benchmark

Comments

@MartinSpiessl
Copy link
Contributor

The programs freire{1,2}.c in c/nla-digbench are clearly flawed since they neglect the limited floating point precision.

As an example, let's look at this concretized version of freire1.c:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    int r;
    double a, x;
    a = 12345678901234567;
    x = a / 2.0;
    r = 0;
    long long i = 0;

    while (1) {
        i++;
        printf("i:\t\t\t%lld\n",i);
        printf("a:\t\t\t%f\n",a);
        printf("x:\t\t\t%f\n",x);
        printf("r:\t\t\t%d\n",r);
        printf("r*r - a - r + 2*x:\t%f\n",r*r - a - r + 2*x);
        printf("(2*x-a)+(r*r-r):\t%f\n\n",(2*x-a)+(r*r-r));
        assert((int)(r*r - a - r + 2*x) == 0);
        //assert((int)((2*x-a)+(r*r-r))==0);

        if (!(x > r))
            break;
        x = x - r;
        r = r + 1;
    }

    assert((int)(r*r - a - r + 2*x) == 0);
    return 0;
}

After compilation this will show us that the assert inside the loop can be violated after only 4 rounds (I introduced i to track numbers of rounds and also added some printf-calls for debugging:

i:                      1
a:                      12345678901234568.000000
x:                      6172839450617284.000000
r:                      0
r*r - a - r + 2*x:      0.000000
(2*x-a)+(r*r-r):        0.000000

i:                      2
a:                      12345678901234568.000000
x:                      6172839450617284.000000
r:                      1
r*r - a - r + 2*x:      0.000000
(2*x-a)+(r*r-r):        0.000000

i:                      3
a:                      12345678901234568.000000
x:                      6172839450617283.000000
r:                      2
r*r - a - r + 2*x:      0.000000
(2*x-a)+(r*r-r):        0.000000

i:                      4
a:                      12345678901234568.000000
x:                      6172839450617281.000000
r:                      3
r*r - a - r + 2*x:      -2.000000
(2*x-a)+(r*r-r):        0.000000

a.out: main.c:28: main: Assertion `(int)(r*r - a - r + 2*x) == 0' failed.
Aborted (core dumped)

The initial value for a that I chose here leads to the assertion violation in just 4 steps. I was not able to get any less than 4 steps, sometimes it goes into the thousands. I also showed that by reordering of the terms in the assumption (grouping by type) we get a slightly better numeric stability, but even then the assertion would eventually be violated. This is inevitable because of the limited precision of IEEE floating-point values.

@nguyenthanhvuh was aware that CPAchecker complained here since both files contain the following comment in the beginning:

Note: for some reason using cpa was able to disprove these
cpa.sh -kInduction -setprop solver.solver=z3 freire1.c`

Indeed, these counterexamples are probably valid as shown via my concrete example above. The problem here is that this specific algorithm of Freire was never stated for use with floating-point variables or even signed integers:

http://www.pedrofreire.com/sqrt/sqrt1.en.html

register unsigned int x, r;

/* … */
x = (x+1) >> 1;    /* init x */
for( r=0; x>r; x-=r++ );
  1. In order to fix these tasks, we would need to change the types to integers and make sure that none of the operations on the way overflow. Since the algorithm in freire1.c will calculate the next bigger square, i.e. (floor(sqrt(a))+1)^2, we would have to limit the maximum value of a such that this value is still below INT_MAX. Or just some value that is sufficiently lower than INT_MAX if we are lazy 😉

  2. For negative values of a the algorithm will immediately break and the assertions actually hold, so we would not need to change the integer types to be unsigned instead of signed in order for the program to be correct (w.r.t. the assertions).

  3. We could probably also find constraints on the floating-point values of a that would allow us to fix these tasks while retaining the floating-point variables. It is always nice to have tasks that actually use floating-point variables, though they are intrinsically extremely hard to verify.

@ndkimhao
Copy link

ndkimhao commented Sep 27, 2020

Hi, @MartinSpiessl, the double type can't represent exactly the number 12345678901234567, so all calculations that followed the statement a = 12345678901234567 would not be accurate.

I think we can limit the value of the a variable to resolve this problem. I did a quick binary search and found the following bounds for Freire1 and 2:

Freire1:

  • current code: a <= 2147441940
  • replace int r by double r: a <= 9007199254740993 = 2^53 + 1

Freire2:
a <= 538359860

@MartinSpiessl
Copy link
Contributor Author

Hi, @MartinSpiessl, the double type can't represent exactly the number 12345678901234567, so all calculations that followed the statement a = 12345678901234567 would not be accurate.

Yeah, I didn't care to much whether this particular value can be represented. It will just lead to a specific value in a that can be represented. Even if we set a to a double that can be represented, there is no guarantee that intermediate calculation results can still be represented accurately if that double value is high enough.

Thanks for checking out the bounds! I tried the first solution for freire1.c with CPAchecker but I still found a violation, even if I set a<=1. I looked at the model and found that -inf was assigned to a. So I guess we would have to post a lower bound for a as well to exclude this extreme value (which is of course a double value, so I do not think this is an error in CPAchecker, __VERIFIER_nondet_double should be allowed to output (any) NaN and inf value).

With a lower bound of 0 or -2147441940 CPAchecker runs into a timeout instead, which is good news, since this means the solver cannot easily detect a problem anymore. There might still be one, but that would be the other verifiers job to prove or disprove.

Having this as a verification task is therefore very interesting. Can you make a PR for this? Maybe also add a variant that only uses integers, the comparison would be especially nice.
I will look over the PR, test the fixes with CPAchecker again and would approve if the violations cannot be found anymore.

@ndkimhao
Copy link

We have created the integer version of freire1.c and freire2.c recently. I will also check the bounds for them and make a PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C Task in language C issue with benchmark
Development

No branches or pull requests

2 participants