Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

undefined behavior warning on arm with -mfpu=neon #88

Closed
mvduin opened this issue Jul 29, 2021 · 12 comments
Closed

undefined behavior warning on arm with -mfpu=neon #88

mvduin opened this issue Jul 29, 2021 · 12 comments

Comments

@mvduin
Copy link
Contributor

mvduin commented Jul 29, 2021

When building on armhf with -mfpu=neon using gcc 10.2, I get the following failure:

In file included from minimp3_ex.h:9,
                 from minimp3_test.c:40:
minimp3.h: In function ‘mp3dec_decode_frame’:
minimp3.h:894:23: error: iteration 1073741823 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
minimp3.h:892:5: note: within this loop

This seems really puzzling... the loop it's referring to can never do more than 576 iterations. But perhaps it's talking about the loop after it's done structural transformations and there's an actual issue somewhere higher up? Either way, regardless of whether this is a minimp3 bug or a gcc bug, it's making me nervous.

@lieff
Copy link
Owner

lieff commented Jul 30, 2021

Hmm, looks like it tries to vectorize the loop and run into aliasing issue. Can you try -fno-strict-aliasing option?

@mvduin
Copy link
Contributor Author

mvduin commented Jul 30, 2021

No effect. Lowering optimization to -O1 does eliminate the warning, so I'm now trying a binary search of the optimization settings that differ between O1 and O2, perhaps the specific flag that triggers it might give a clue

@mvduin
Copy link
Contributor Author

mvduin commented Jul 30, 2021

-O1 -mfpu=neon -finline-functions -finline-small-functions -ftree-vrp triggers the warning, and no subset of these flags does.
Conversely, -O3 -mfpu=neon -fno-tree-vrp does not trigger the warning.

@mvduin
Copy link
Contributor Author

mvduin commented Jul 30, 2021

and of course -faggressive-loop-optimizations (enabled by default) is required to trigger the warning

@lieff
Copy link
Owner

lieff commented Jul 30, 2021

Hmmm.

-ftree-vrp
Perform Value Range Propagation on trees. This is similar to the constant propagation pass, but instead of values, ranges of values are propagated. This allows the optimizers to remove unnecessary range checks like array bound checks and null pointer checks. This is enabled by default at -O2 and higher. Null pointer check elimination is only done if -fdelete-null-pointer-checks is enabled.

Nothing to do with aliasing. Looks like gcc bug after all.

@mvduin
Copy link
Contributor Author

mvduin commented Jul 31, 2021

Yup. Further analysis (all tests done with -O1 -mfpu=neon -finline-functions -finline-small-functions -ftree-vrp):

The "offending" call is this one in L3_decode: L3_midside_stereo(s->grbuf[0], 576); Commenting it out or changing the second argument to 575 makes the warning go away. Changing the second argument to 572 does however not make the warning go away, i.e. the warning is triggered by L3_midside_stereo being called with n being a compile-time constant that's a multiple of 4.

The warning can also be suppressed by any of the following changes to L3_midside_stereo:

  • disabling the entire #if HAVE_SIMD block (as expected from the fact that -mfpu=neon is required to trigger the warning).
  • changing the SIMD loop bound to i < n - 4, thus pointlessly making the non-SIMD loop deal with the last four elements when n is a multiple of 4.
  • adding if (n % 4 == 0) return; between the SIMD loop and the non-SIMD loop, even though this test is obviously redundant.
  • adding if (__builtin_constant_p(n % 4 == 0) && n % 4 == 0) return; between the SIMD loop and the non-SIMD loop, which is a zero-cost workaround for this warning since the test will be eliminated at compile-time.
  • adding i = n & -4; between the SIMD loop and the non-SIMD loop, even though i obviously already has that value at that point.
  • replacing the body of the SIMD loop by equivalent non-SIMD code:
        float vl0 = left [i], vl1 = left [i+1], vl2 = left [i+2], vl3 = left [i+3];
        float vr0 = right[i], vr1 = right[i+1], vr2 = right[i+2], vr3 = right[i+3];
        left [i] = vl0 + vr0; left [i+1] = vl1 + vr1; left [i+2] = vl2 + vr2; left [i+3] = vl3 + vr3;
        right[i] = vl0 - vr0; right[i+1] = vl1 - vr1; right[i+2] = vl2 - vr2; right[i+3] = vl3 - vr3;
  • replacing the body of the SIMD loop by non-SIMD code that merely swaps the channels.
  • replacing the body of the SIMD loop by non-SIMD code that merely copies the left channel to the right channel:
        right[i] = left[i]; right[i+1] = left[i+1]; right[i+2] = left[i+2]; right[i+3] = left[i+3];

However, none of the following changes to L3_midside_stereo get rid of the warning:

  • adding if (i < 0 || i >= 576 || i >= n) return; between the SIMD loop and the non-SIMD loop
  • adding if (i < 0 || i >= 576 || i >= n) return; inside the non-SIMD loop at the top of the loop body
  • replacing the body of the SIMD loop by SIMD code that merely swaps the channels
  • replacing the body of the SIMD loop by SIMD code that merely stores the values back from where they were loaded
  • replacing the body of the SIMD loop by non-SIMD code that merely stores the values back from where they were loaded
  • replacing the body of the SIMD loop by right[i] = left[i];
  • replacing the body of the SIMD loop by right[i] = left[i]; right[i+3] = left[i+3];
  • replacing the body of the SIMD loop by the body of the non-SIMD loop
  • removing the body of the SIMD loop entirely
  • removing the body of the SIMD loop entirely, making the SIMD loop unconditional (i.e. not depend on HAVE_SIMD nor have_simd()), and removing -mfpu=neon from the compiler flags.

In retrospect there was no need to try all these variations of the SIMD loop body but the inconsistent behaviour from gcc puzzled me greatly.

mvduin added a commit to dutchanddutch/minimp3 that referenced this issue Jul 31, 2021
@mvduin
Copy link
Contributor Author

mvduin commented Jul 31, 2021

Issue is known and being worked on: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100801

@lieff
Copy link
Owner

lieff commented Jul 31, 2021

Thanks for such detailed research)
Can you also try potentially more simple workaround using __attribute__ ((noinline)) on L3_midside_stereo?

@mvduin
Copy link
Contributor Author

mvduin commented Jul 31, 2021

That works too of course but my suggested patch is pretty simple too and has the benefit that it doesn't affect optimization, avoiding any need to check performance impact. This seemed desirable considering this isn't a case of misoptimization, it's merely a spurious warning.

@lieff
Copy link
Owner

lieff commented Aug 1, 2021

I've checked code sizes for x64 and looks like your workaround is better:
No modification:

   text    data     bss     dec     hex filename
  65029     928      32   65989   101c5 ./minimp3

__attribute__ ((noinline)):

   text    data     bss     dec     hex filename
  65973     928      32   66933   10575 ./minimp3

__builtin_constant_p:

   text    data     bss     dec     hex filename
  65077     928      32   66037   101f5 ./minimp3

That`s strange that noinline increases code size, but ok, picking the best one)

lieff pushed a commit that referenced this issue Aug 1, 2021
@mvduin
Copy link
Contributor Author

mvduin commented Aug 1, 2021

Agreed. I'm also surprised my change makes a noticable difference, that doesn't really make sense either. Compilers are weird.

@lieff
Copy link
Owner

lieff commented Aug 6, 2021

Resolved, so closing.

@lieff lieff closed this as completed Aug 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants