BUG: optimizing compilers can reorder call to npy_get_floatstatus #11036

mattip · 2018-05-03T04:25:17Z

Fixes #10370. We should find a more generic and explicit way to prevent optimizing compilers from reordering the call to npy_get_floatstatus.

Reproducing the problem requires clang or gcc-8.1. I confirmed that this fixes the problem using clang-6.0.

eric-wieser · 2018-05-03T04:35:47Z

Does putting the volatile in npy_get_floatstatus instead work as well?

mattip · 2018-05-03T05:20:37Z

No, but you may be on to something. Adding c1 as a parameter to npy_get_floatstatus does seem to prevent the compiler from reordering the call. See the compiler-to-assembler side-by-side comparison at https://godbolt.org/g/Zoc5xr and then modify the outside function to accept an argument.

eric-wieser · 2018-05-03T05:47:01Z

I don't understand that test - what is outside?

mattip · 2018-05-03T05:49:12Z

outside is a dummy function, the equivalent of npy_get_floatstatus. Its implementation is irrelevant to the compilation of this unit

Edit - it's -> its

eric-wieser · 2018-05-03T05:51:33Z

Its implementation is irrelevant to the compilation of this unit

But it's not - in the numpy case, the compiler is (probably?) able to inline the function, whereas here it isn't able to. Your test is just showing that GCC won't make the optimization if it doesn't know what the function does with its input.

tzickel · 2018-05-03T06:13:48Z

A. I don't know if people usually build numpy with LTO, but it's worth checking: with LTO the compiler might see the internals of that function and decide that it is still safe to reorder it.

B. The rest of the npy_*floatstatus* usage in the code needs to be audited on clang and gcc to check that the compiler does not reorder those calls in a harmful way as well.

C. It should be tested on MSVC to see that it is still OK (the MSVC documentation says that using FENV_ACCESS is the way to go; maybe detect the compiler and do something different).

mattip · 2018-05-03T06:20:20Z

Here is a version where outside is inlined. Note that on the assembler side, lines 4-5 are outside, and they occur before 6-8 which are c1 = _mm_min_pd(c1, c2)

If I add c1 as a parameter to outside, the compiler generates the correct assembler https://godbolt.org/g/za8vNV , even with inlining and without volatile

eric-wieser · 2018-05-03T06:28:00Z

If I add c1 as a parameter to outside, the compiler generates the correct assembler

That's not what I see when I click that link:

The first thing that happens is your outside is called (clang 6.0.0)

eric-wieser · 2018-05-03T06:31:38Z

What is working for me is adding a volatile char in npy_get_floatstatus, which then forces the argument to be passed:

https://godbolt.org/g/gEqR6P

juliantaylor · 2018-05-03T15:38:33Z

npy_get_floatstatus is not a const function; the compiler is not allowed to reorder it, as it can (and does) depend on global state.
This looks like a compiler bug we should investigate. Or clang has changed some options so the floating point state is not considered anymore by default. Or automatic const function detection is going haywire.

Is it reproducible with gcc 8? I tried with 8.0.1 and the code looks correct.
Edit: never mind, reproduced it.

mattip · 2018-05-03T15:50:47Z

@juliantaylor according to https://godbolt.org/g/gEqR6P the issue exists in old versions of clang, and now reproduces in gcc 8.1

mattip · 2018-05-03T15:51:57Z

is my fix considered a C-API change? Neither scipy nor cython use the npy_.*floatstatus.* functions

charris · 2018-05-03T15:57:42Z

numpy/core/src/npymath/ieee754.c.src

@@ -759,13 +780,18 @@ int npy_clear_floatstatus(void)

 #else

-int npy_get_floatstatus(void)
+int npy_get_floatstatus(void *param)


Why pass as void * instead of char *?

No particular reason. This way any pointer argument will suffice and be cast silently. If it is char*, I will need to cast everywhere it is called, no?

changed to char *

charris · 2018-05-03T16:25:13Z

is my fix considered a C-API change?

Yes, and I think you should not use the same function names if you change the signatures because that will break current code, not just require a recompile. You could make it feature version dependent, but then folks would be stuck with using at least that version of numpy for everything, which may be more work than they want to deal with. See npy_config.h and ndarraytypes.h. AFAICT, we have never used the NPY_FEATURE_VERSION by itself. I suppose one way to avoid problems would be to make the new functions private, but they may be needed downstream.

Any particular reason to pass the arguments as void * ?

juliantaylor · 2018-05-03T16:29:21Z

GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85633

mattip · 2018-05-03T16:35:51Z

So if this approach is acceptable, I will change the function names.

juliantaylor · 2018-05-03T17:19:51Z

FWIW, this approach breaks the API but not the ABI, as the argument is not actually used.
But a new function would be nicer for downstreams, maybe with a _barrier suffix.

charris · 2018-05-03T17:21:39Z

A good explanatory note in the 1.15.0 release notes listing the new functions and their usage would be helpful. @ahaldane, if you want to do a 1.14.4 release, this should be part of it.

juliantaylor · 2018-05-03T18:06:09Z

doc/source/reference/c-api.coremath.rst

@@ -150,31 +150,31 @@ Those can be useful for precise floating point comparison.

    .. versionadded:: 1.4.0

-.. c:function:: void npy_set_floatstatus_divbyzero()
+.. c:function:: void npy_set_floatstatus_divbyzero(void*)


Do we need barriers on the set functions?
Reordering becomes interesting when reading the status, but setting doesn't really matter as it has no influence on future instructions.

removed changes to npy_set_floatstatus*

ahaldane · 2018-05-03T18:06:39Z

I'm not sure yet about 1.14.4, but let's tag things to see what would go in it.

tzickel · 2018-05-03T19:15:16Z

Nice throwback from 7 years ago:
https://github.com/numpy/numpy/blame/b946795bd64d34147f3486e97246ed1f6a4a0937/numpy/core/src/umath/scalarmath.c.src#L809

:)

juliantaylor · 2018-05-06T12:14:32Z

the scalarmath code could also be updated to use the functions with the barriers.

charris · 2018-05-06T19:20:59Z

I don't know if people usually build numpy with LTO

There have been some fixes of problems exposed by LTO, on Windows IIRC.

charris · 2018-05-06T19:46:46Z

Segmentation fault.

charris · 2018-05-06T20:47:07Z

  numpy\core\src\npymath\ieee754.c.src(754) : error C2143: syntax error : missing ';' before 'type'
  numpy\core\src\npymath\ieee754.c.src(761) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(762) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(763) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(764) : error C2065: 'fpstatus' : undeclared identifier

Also unused variable warnings.

[ci skip]

eric-wieser · 2018-05-10T04:31:42Z

doc/source/reference/c-api.coremath.rst

 .. c:function:: int npy_clear_floatstatus()

    Clears the floating point status. Returns the previous status mask.

    .. versionadded:: 1.9.0

+.. c:function:: int npy_clear_floatstatus(char*)


Should this be npy_clear_floatstatus_barrier?

yes it should. This has been merged, so I will fix it somewhere else

eric-wieser · 2018-05-10T04:32:22Z

numpy/core/src/npymath/ieee754.c.src

+    /*
+     * By using a volatile, the compiler cannot reorder this call
+     */
+    if (param != NULL) {


Why do this check, yet pass x in from npy_get_floatstatus?

The correct thing to do is call npy_get_floatstatus_barrier directly with a local variable; this currently prevents the call from being reordered. See for instance _check_ufunc_fperr in numpy/core/src/umath/extobj.c (where extobj may be NULL), or line 866 in scalarmath.c.src, which was the original place the reordering was noticed.

When I fix the documentation from the comment above I will expand why the _barrier form is preferable.

I think @eric-wieser was asking why the check was needed at all.

charris · 2018-05-30T02:48:33Z

@mattip This is a horror to backport because it touches stuff all over the place. Could you give it a shot? Leave out the documentation stuff and just get the fix and tests backported.

charris · 2018-05-30T13:15:54Z

I'll give a backport another shot also.

charris · 2018-05-30T13:38:00Z

OK, I squashed the commits and did a backport, looks OK so far.

The volatile statement was designed to prevent reordering of floating point error checks. However, this was fixed more generally in NumPy gh-11036, which removes the need for the volatile declaration (and brings the code in line with the rest of the file).

BUG: optimizing compilers can reorder call to npy_get_floatstatus

a51f86b

mattip added the 00 - Bug and component: numpy._core labels May 3, 2018

mattip added this to the 1.15.0 release milestone May 3, 2018

juliantaylor self-assigned this May 3, 2018

mattip force-pushed the fix-10370 branch from 196d458 to d16c8e5 Compare May 3, 2018 15:20

charris reviewed May 3, 2018


juliantaylor reviewed May 3, 2018


ahaldane modified the milestones: 1.15.0 release, 1.14.4 May 3, 2018

mattip mentioned this pull request May 4, 2018

BUG: reduce using SSE only warns if inside SSE loop #11043

Merged

alternative fix for npy_get_floatstatus, npy_clear_floatstatus

f21ad36

mattip force-pushed the fix-10370 branch from d16c8e5 to f21ad36 Compare May 5, 2018 20:32

unify test with pr numpy#11043

6eefa6d

use barrier form of functions in place of PyUFunc_{get,clear}fperr

5a835fb

mattip force-pushed the fix-10370 branch from 46baad1 to 165c1b6 Compare May 7, 2018 05:04

update doc, prevent segfault

b91becf

mattip force-pushed the fix-10370 branch from 165c1b6 to b91becf Compare May 7, 2018 05:46

MAINT: Do some rewrite on the 1.15.0 release notes.

305ca24

[ci skip]

charris merged commit f5758d6 into numpy:master May 10, 2018

eric-wieser reviewed May 10, 2018


mattip deleted the fix-10370 branch May 10, 2018 06:45

mattip mentioned this pull request May 10, 2018

DOC: expand reasoning behind npy_*floatstatus_barrer() #11073

Merged

charris mentioned this pull request May 11, 2018

Revisit compiler statement reordering fix when we require C99 #11079

Open

charris added the 09 - Backport-Candidate label May 30, 2018

charris mentioned this pull request May 30, 2018

BUG: optimizing compilers can reorder call to npy_get_floatstatus #11198

Merged

charris removed the 09 - Backport-Candidate label May 30, 2018

charris removed this from the 1.14.4 release milestone May 30, 2018

astrofrog mentioned this pull request Oct 20, 2018

malloc issue with polyfit #12230

Closed

mattip mentioned this pull request May 13, 2021

GCC 11.1.1 with -O3 causes a lot of `RuntimeWarning: invalid value . . .` errors #18949

Closed
