Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: optimizing compilers can reorder call to npy_get_floatstatus #11036

Merged
merged 6 commits into from May 10, 2018

Conversation

mattip
Copy link
Member

@mattip mattip commented May 3, 2018

Fixes #10370. We should find a more generic and explicit way to prevent optimizing compilers from reordering the call to npy_get_floatstatus.

To reproduce the problem, clang or gcc-8.1 are required. Confirmed that this fixes the problem using clang-6.0

@eric-wieser
Copy link
Member

Does putting the volatile in npy_get_floatstatus instead work as well?

@mattip
Copy link
Member Author

mattip commented May 3, 2018

no, but you may be on to something. Adding c1 as a parameter to npy_get_floatstatus does seem to prevent the compiler reorder - see the compiler-to-assembler side-by-side comparison at https://godbolt.org/g/Zoc5xr and then modify the outside function to accept an argument

@eric-wieser
Copy link
Member

I don't understand that test - what is outside?

@mattip
Copy link
Member Author

mattip commented May 3, 2018

outside is a dummy function, the equivalent of npy_get_floatstatus. Its implementation is irrelevant to the compilation of this unit

Edit - it's -> its

@eric-wieser
Copy link
Member

eric-wieser commented May 3, 2018

Its implementation is irrelevant to the compilation of this unit

But it's not - in the numpy case, the compiler is (probably?) able to inline the function, whereas here it isn't able to. Your test is just showing that GCC won't make the optimization if it doesn't know what the function does with its input.

@tzickel
Copy link

tzickel commented May 3, 2018

A. I don't know if people usually build numpy wth LTO, but it's worth checking that if using LTO, the compiler might see the internals of that function and decide that it's the safe to still reorder it.

B. The rest of the npy_*floatstatus* usage in the code needs to be audited on clang / gcc and see that the compiler does not reorder it in a bad way as well.

C. Testing it on MSVC seeing it's still ok (there in the documentation it says that using FENV_ACCESS is the way to go, maybe detect compiler and do something differently).

@mattip
Copy link
Member Author

mattip commented May 3, 2018

Here is a version where outside is inlined. Note that on the assembler side, lines 4-5 are outside, and they occur before 6-8 which are c1 = _mm_min_pd(c1, c2)

If I add c1 as a parameter to outside, the compiler generates the correct assembler https://godbolt.org/g/za8vNV , even with inlining and without volatile

@eric-wieser
Copy link
Member

eric-wieser commented May 3, 2018

If I add c1 as a parameter to outside, the compiler generates the correct assembler

That's not what I see when I click that link:

image

The first thing that happens is your outside is called (clang 6.0.0)

@eric-wieser
Copy link
Member

What is working for me is adding a volatile char in npy_get_floatstatus, which then forces the argument to be passed:

https://godbolt.org/g/gEqR6P

@juliantaylor
Copy link
Contributor

juliantaylor commented May 3, 2018

npy_get_float_status is not const function, the compiler is not allowed to reorder it as it can (and does) depends on the global state.
This looks like a compiler bug we should investigate. Or clang has changed some options so the floating point state is not considered anymore by default. Or automatic const function detection is going haywire.

Is it reproducable with gcc 8? I tried with 8.0.1 and the code looks correct.
edit nevermind, reproduced it.

@mattip
Copy link
Member Author

mattip commented May 3, 2018

@juliantaylor according to https://godbolt.org/g/gEqR6P the issue exists in old versions of clang, and now reproduces in gcc 8.1

@mattip
Copy link
Member Author

mattip commented May 3, 2018

is my fix considered a C-API change? Neither scipy nor cython use the npy_.*floatstatus.*functions

@@ -759,13 +780,18 @@ int npy_clear_floatstatus(void)

#else

int npy_get_floatstatus(void)
int npy_get_floatstatus(void *param)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass as void * instead of char *?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular reason. This way any pointer argument will suffice and silently cast, if it is char* I will need to cast everywhere it is called, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to char *

@charris
Copy link
Member

charris commented May 3, 2018

is my fix considered a C-API change?

Yes, and I think you should not use the same function names if you change the signatures because that will break current code, not just require a recompile. You could make it feature version dependent, but then folks would be stuck with using at least that version of numpy for everything, which may be more work than they want to deal with. See npy_config.h and ndarraytypes.h. AFAICT, we have never used the NPY_FEATURE_VERSION by itself. I suppose one way to avoid problems would be to make the new functions private, but they may be needed downstream.

Any particular reason to pass the arguments as void * ?

@juliantaylor
Copy link
Contributor

@mattip
Copy link
Member Author

mattip commented May 3, 2018

So if this approach is acceptable, I will change the function names.

@juliantaylor
Copy link
Contributor

fwiw this approach breaks the api but not the abi as the argument is not actually used
but a new function would be nicer for downstreams, maybe with _barrier suffix

@charris
Copy link
Member

charris commented May 3, 2018

A good explanatory note in the 1.15.0 release notes listing the new functions and their usage would be helpful, @ahaldane If you want to do a 1.14.4 release this should be part of it.

@@ -150,31 +150,31 @@ Those can be useful for precise floating point comparison.

.. versionadded:: 1.4.0

.. c:function:: void npy_set_floatstatus_divbyzero()
.. c:function:: void npy_set_floatstatus_divbyzero(void*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need barriers on the set functions?
reordering becomes interesting when reading the status but setting doesn't really matter as it has no influence on future instructions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed changes to npy_set_floatstatus*

@ahaldane
Copy link
Member

ahaldane commented May 3, 2018

I'm not sure yet about 1.14.4, but let's tag things to see what would go in it.

@ahaldane ahaldane modified the milestones: 1.15.0 release, 1.14.4 May 3, 2018
@tzickel
Copy link

tzickel commented May 3, 2018

@juliantaylor
Copy link
Contributor

the scalarmath code could also be updated to use the functions with the barriers.

@charris
Copy link
Member

charris commented May 6, 2018

I don't know if people usually build numpy wth LTO

There have been some fixes of problems exposed by LTO, on Windows IIRC.

@charris
Copy link
Member

charris commented May 6, 2018

Segmentation fault.

@charris
Copy link
Member

charris commented May 6, 2018

  numpy\core\src\npymath\ieee754.c.src(754) : error C2143: syntax error : missing ';' before 'type'
  numpy\core\src\npymath\ieee754.c.src(761) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(762) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(763) : error C2065: 'fpstatus' : undeclared identifier
  numpy\core\src\npymath\ieee754.c.src(764) : error C2065: 'fpstatus' : undeclared identifier

Also unused variable warnings.

@charris charris merged commit f5758d6 into numpy:master May 10, 2018
.. c:function:: int npy_clear_floatstatus()

Clears the floating point status. Returns the previous status mask.

.. versionadded:: 1.9.0

.. c:function:: int npy_clear_floatstatus(char*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be npy_clear_floatstatus_barrier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it should. This has been merged, so I will fix it somewhere else

/*
* By using a volatile, the compiler cannot reorder this call
*/
if (param != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this check, yet pass x in from npy_get_floatstatus?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct thing to do is call npy_get_floatstatus_barrier directly with a local variable, this currently prevents reordering the call. See for instance _check_ufunc_fperr in numpy/core/src/umath/extobj.c (where extobj may be NULL), or line 866 in scalarmath.c.src which was the original place the reordering was noticed.

When I fix the documentation from the comment above I will expand why the _barrier form is preferable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @eric-wieser was asking why the check was needed at all.

@charris
Copy link
Member

charris commented May 30, 2018

@mattip This is a horror to backport because it touches stuff all over the place. Could you give it a shot? Leave out the documentation stuff and just get the fix and tests backported.

@charris
Copy link
Member

charris commented May 30, 2018

I'll give a backport another shot also.

@charris
Copy link
Member

charris commented May 30, 2018

OK, I squashed the commits and did a backport, looks OK so far.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label May 30, 2018
@charris charris removed this from the 1.14.4 release milestone May 30, 2018
seberg added a commit to seberg/numpy that referenced this pull request Aug 5, 2020
The volatile statement was designed to prevent reordering of
floating point error checks, however, this was more generally
fixed in numpygh-11036, thus removing the need for the volatile
declaration (and bringing the code in line with the rest of
the file).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.min does not always propagate NaNs
6 participants