BUG: optimizing compilers can reorder call to npy_get_floatstatus #11036
Does putting the |
no, but you may be on to something. Adding c1 as a parameter to npy_get_floatstatus does seem to prevent the compiler reorder - see the compiler-to-assembler side-by-side comparison at https://godbolt.org/g/Zoc5xr and then modify the |
I don't understand that test - what is |
Edit - it's -> its |
But it's not - in the numpy case, the compiler is (probably?) able to inline the function, whereas here it isn't able to. Your test is just showing that GCC won't make the optimization if it doesn't know what the function does with its input. |
A. I don't know if people usually build numpy with LTO, but it's worth checking whether, with LTO, the compiler can see the internals of that function and decide that it is still safe to reorder it. B. The rest of the npy_*floatstatus* usage in the code needs to be audited on clang / gcc to check that the compiler does not reorder it in a bad way as well. C. Test it on MSVC to see that it's still OK (the documentation there says that using FENV_ACCESS is the way to go; maybe detect the compiler and do something differently).
Here is a version where If I add c1 as a parameter to |
What is working for me is adding a volatile |
Is it reproducible with gcc 8? I tried with 8.0.1 and the code looks correct.
@juliantaylor according to https://godbolt.org/g/gEqR6P the issue exists in old versions of clang, and now reproduces in gcc 8.1
is my fix considered a C-API change? Neither scipy nor cython use the |
numpy/core/src/npymath/ieee754.c.src
Outdated
@@ -759,13 +780,18 @@ int npy_clear_floatstatus(void)

 #else

-int npy_get_floatstatus(void)
+int npy_get_floatstatus(void *param)
Why pass as void * instead of char *?
No particular reason. This way any pointer argument will suffice and be converted silently; if it is char*, I will need to cast everywhere it is called, no?
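The trade-off raised here can be sketched in a few lines of C. The two functions below are hypothetical stand-ins (they are not NumPy functions): a `void *` parameter accepts any object pointer implicitly, while a `char *` parameter needs an explicit cast at every call site.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */
static int takes_void(void *param) { return param != NULL; }
static int takes_char(char *param) { return param != NULL; }

int call_both(void)
{
    double out = 0.0;
    /* Any object pointer converts to void * without a cast. */
    int a = takes_void(&out);
    /* double * does not convert to char * implicitly; without the cast the
     * compiler diagnoses incompatible pointer types. */
    int b = takes_char((char *)&out);
    return a + b;
}
```

The cast is extra typing, but it also makes each call site state explicitly which object is being used as the barrier argument.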
changed to char *
Yes, and I think you should not use the same function names if you change the signatures because that will break current code, not just require a recompile. You could make it feature version dependent, but then folks would be stuck with using at least that version of numpy for everything, which may be more work than they want to deal with. See Any particular reason to pass the arguments as |
So if this approach is acceptable, I will change the function names.
FWIW this approach breaks the API but not the ABI, as the argument is not actually used.
A good explanatory note in the 1.15.0 release notes listing the new functions and their usage would be helpful. @ahaldane If you want to do a 1.14.4 release this should be part of it.
@@ -150,31 +150,31 @@ Those can be useful for precise floating point comparison.

 .. versionadded:: 1.4.0

-.. c:function:: void npy_set_floatstatus_divbyzero()
+.. c:function:: void npy_set_floatstatus_divbyzero(void*)
Do we need barriers on the set functions?
Reordering becomes interesting when reading the status, but setting doesn't really matter as it has no influence on future instructions.
removed changes to npy_set_floatstatus*
I'm not sure yet about 1.14.4, but let's tag things to see what would go in it. |
nice throw back from 7 years ago: :) |
The scalarmath code could also be updated to use the functions with the barriers.
There have been some fixes of problems exposed by LTO, on Windows IIRC.
Segmentation fault.
Also unused variable warnings.
 .. c:function:: int npy_clear_floatstatus()

 Clears the floating point status. Returns the previous status mask.

 .. versionadded:: 1.9.0

+.. c:function:: int npy_clear_floatstatus(char*)
Should this be npy_clear_floatstatus_barrier?
Yes, it should. This has been merged, so I will fix it somewhere else.
/*
 * By using a volatile, the compiler cannot reorder this call
 */
if (param != NULL) {
Why do this check, yet pass x in from npy_get_floatstatus?
The correct thing to do is to call npy_get_floatstatus_barrier directly with a local variable; this currently prevents the call from being reordered. See for instance _check_ufunc_fperr in numpy/core/src/umath/extobj.c (where extobj may be NULL), or line 866 in scalarmath.c.src, which was the original place the reordering was noticed.
When I fix the documentation from the comment above, I will expand on why the _barrier form is preferable.
I think @eric-wieser was asking why the check was needed at all.
@mattip This is a horror to backport because it touches stuff all over the place. Could you give it a shot? Leave out the documentation stuff and just get the fix and tests backported.
I'll give a backport another shot also.
OK, I squashed the commits and did a backport, looks OK so far.
The volatile statement was designed to prevent reordering of floating point error checks. However, this was fixed more generally in NumPy gh-11036, which removes the need for the volatile declaration (and brings the code in line with the rest of the file).
Fixes #10370. We should find a more generic and explicit way to prevent optimizing compilers from reordering the call to npy_get_floatstatus. To reproduce the problem, clang or gcc-8.1 is required. Confirmed that this fixes the problem using clang-6.0.