Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum.c is erroneous in the presence of compiler optimization #2

Open
comex opened this issue Oct 25, 2015 · 4 comments
Open

sum.c is erroneous in the presence of compiler optimization #2

comex opened this issue Oct 25, 2015 · 4 comments

Comments

@comex
Copy link

comex commented Oct 25, 2015

There are two problems with sum.c as written: since the sum, s, is not used, any optimizing compiler will remove the loop entirely; and even if it is used, at least GCC and Clang are able to determine that the loop will result in s == NUMBER and remove it anyway.

It's hard to come up with a simple alternate solution because compilers are pretty tricky with these kinds of simple loops. One possibility is to make s volatile, but this will somewhat artificially constrain performance by making the CPU store each iteration to memory rather than keeping it in a register. Another is to add

asm("" :: "r"(s));

in the loop, to force the compiler to put the sum in some (single) register before executing an empty block of assembly.

@auscompgeek
Copy link
Contributor

I see two possible clean solutions:

  • Move the summation into a separate function. This may or may not work, depending on whether gcc and clang are intelligent enough to determine whether a function is deterministic. I haven't investigated this.
  • Print out the end value of s. This is done in fill_array.c and fill_array_out_of_order.c as well, so the same should probably be done here.

@jvns
Copy link
Collaborator

jvns commented Oct 25, 2015

Thanks for reporting this! This was originally tested with gcc -O2 on OS X and it didn't optimize out the loop, but it certainly doesn't work on my Linux machine.

@radarek
Copy link

radarek commented Nov 2, 2015

On my machine (OSX 10.11) -O2 actually optimized it away:

$ gcc -O2 sum.c
$ time ./a.out 500000000

real    0m0.004s
user    0m0.001s
sys 0m0.002s

-O1 gives the same result. Only -O0 version isn't optimized in this case:

$ gcc -O0 sum.c
$ time ./a.out 500000000

real    0m1.079s
user    0m1.054s
sys 0m0.008s
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

I would suggest to use -O0 optimization level. We measure how fast is real loop, not optimized out one (which means no code at all).

@edgar-bonet
Copy link

radarek wrote:

I would suggest to use -O0 optimization level.

I disagree. At -O0, gcc tends to generate hugely inefficient code. Benchmarking that tells you nothing about the speed you can expect from a C program.

I suggest the following:

#include <stdlib.h>
#include <stdio.h>

// Number to guess: How many iterations of
// this loop can we go through in a second?

int main(int argc, char **argv) {
    unsigned int NUMBER, i, s;
    NUMBER = atoi(argv[1]);

    for (s = i = 0; i < NUMBER; ++i) {
        s += i;
    }
    printf("%u\n", s);

    return 0;
}

Notice three changes:

  • The value of s is printed in order to ensure it will be computed
  • s += 1 is replaced by s += i because with the former gcc is able to figure out the result at compile time, which it doesn't with the later
  • the numbers are unsigned in order to avoid undefined behavior when s overflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants