New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Speedup UFUNC_CHECK_STATUS by avoding heavy clear opition #3739

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
3 participants
@arinkverma
Contributor

arinkverma commented Sep 13, 2013

UFUNC_CHECK_STATUS is just single macro which do both checking clearing the error flags. It clear error flags every time after checking. We should avoid clear operation if not needed, as it is a bit expensive and take significant amount of time.

The way numpy detect divide-by-zero, overflow, underflow, etc., is that before each ufunc loop it clear the FP error flags, and then after the ufunc loop we see if any have become set. And clear again. I have avoided clear if not needed to save time.

Performance result is post on blog. http://www.arinkverma.in/2013/09/speedup-ufunccheckstatus-by-avoiding.html

@juliantaylor

This comment has been minimized.

Show comment
Hide comment
@juliantaylor

juliantaylor Sep 14, 2013

Contributor

this needs rebasing, the code for windows changed a bit
else it looks good

Contributor

juliantaylor commented Sep 14, 2013

this needs rebasing, the code for windows changed a bit
else it looks good

@njsmith

This comment has been minimized.

Show comment
Hide comment
@njsmith

njsmith Sep 14, 2013

Member

On windows, we should use _statusfp (and/or _status87, _statusfp2, see the latest version of this code in master) to check the control flags before clearing them. Right now with this patch windows doesn't actually get the optimization. (And in fact the windows code is probably made pointlessly more complicated...)

Overall I think this patch pretty much does the right thing but it leaves a big pile of pointlessly tangled spaghetti.

What we actually want:

  • PyUFunc_getfperr() just gets the status, doesn't clear it
  • PyUFunc_clearfperr() clears the status (but first checks the status and skips clearing it if it's already cleared, since writing to the fp control word can be expensive)

If you trace through all the code then this is... pretty much what ends up happening. Right now clearfperr calls getfperr, and getfperr calls UFUNC_CHECK_STATUS. And UFUNC_CHECK_STATUS, with this patch, implements some elaborate logic to always check the status, and then clear it if it's not cleared, and then return it. And this works out because in practice we want clearfperr to do both of these things, and in the fast path where it matters getfperr always finds that the status is clear, so it ends up skipping the clearing.

Problem is you actually have to read and understand the macros, and the high-level interface, and the ufunc loop callers, before you can figure out whether this is right or not.

Alternative proposal:

  • Split the platform-specific bit into two operations, one that just gets the flags and converts them to numpy's representation, and one that just unconditionally clears the status.
  • Use the first to implement getfperr, and make clearfperr into if (PyUFUnc_getfperr()) { UFUNC_CLEAR_STATUS(); }
Member

njsmith commented Sep 14, 2013

On windows, we should use _statusfp (and/or _status87, _statusfp2, see the latest version of this code in master) to check the control flags before clearing them. Right now with this patch windows doesn't actually get the optimization. (And in fact the windows code is probably made pointlessly more complicated...)

Overall I think this patch pretty much does the right thing but it leaves a big pile of pointlessly tangled spaghetti.

What we actually want:

  • PyUFunc_getfperr() just gets the status, doesn't clear it
  • PyUFunc_clearfperr() clears the status (but first checks the status and skips clearing it if it's already cleared, since writing to the fp control word can be expensive)

If you trace through all the code then this is... pretty much what ends up happening. Right now clearfperr calls getfperr, and getfperr calls UFUNC_CHECK_STATUS. And UFUNC_CHECK_STATUS, with this patch, implements some elaborate logic to always check the status, and then clear it if it's not cleared, and then return it. And this works out because in practice we want clearfperr to do both of these things, and in the fast path where it matters getfperr always finds that the status is clear, so it ends up skipping the clearing.

Problem is you actually have to read and understand the macros, and the high-level interface, and the ufunc loop callers, before you can figure out whether this is right or not.

Alternative proposal:

  • Split the platform-specific bit into two operations, one that just gets the flags and converts them to numpy's representation, and one that just unconditionally clears the status.
  • Use the first to implement getfperr, and make clearfperr into if (PyUFUnc_getfperr()) { UFUNC_CLEAR_STATUS(); }
int retstatus;
UFUNC_CHECK_STATUS(retstatus);
if (retstatus) {
PyUFunc_clearfperr();

This comment has been minimized.

@arinkverma

arinkverma Sep 14, 2013

Contributor

Not clearing flags here, fails four test.

    raise LinAlgError("SVD did not converge")
LinAlgError: SVD did not converge
@arinkverma

arinkverma Sep 14, 2013

Contributor

Not clearing flags here, fails four test.

    raise LinAlgError("SVD did not converge")
LinAlgError: SVD did not converge

This comment has been minimized.

@njsmith

njsmith Sep 14, 2013

Member

Strange. Sounds like it might be another latent bug somewhere. Perhaps post
the PR so we can see the actual code and Travis output together?
On 14 Sep 2013 20:22, "Arink Verma" notifications@github.com wrote:

In numpy/core/src/umath/ufunc_object.c:

@@ -203,13 +193,23 @@

#undef HANDLEIT

+/UFUNC_API/
+NPY_NO_EXPORT int
+PyUFunc_getfperr(void)
+{

  • int retstatus;
  • UFUNC_CHECK_STATUS(retstatus);
  • if (retstatus) {
  •    PyUFunc_clearfperr();
    

Not clearing flags here, fails four test.

raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/3739/files#r6364598
.

@njsmith

njsmith Sep 14, 2013

Member

Strange. Sounds like it might be another latent bug somewhere. Perhaps post
the PR so we can see the actual code and Travis output together?
On 14 Sep 2013 20:22, "Arink Verma" notifications@github.com wrote:

In numpy/core/src/umath/ufunc_object.c:

@@ -203,13 +193,23 @@

#undef HANDLEIT

+/UFUNC_API/
+NPY_NO_EXPORT int
+PyUFunc_getfperr(void)
+{

  • int retstatus;
  • UFUNC_CHECK_STATUS(retstatus);
  • if (retstatus) {
  •    PyUFunc_clearfperr();
    

Not clearing flags here, fails four test.

raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge


Reply to this email directly or view it on GitHubhttps://github.com/numpy/numpy/pull/3739/files#r6364598
.

This comment has been minimized.

@arinkverma

arinkverma Sep 14, 2013

Contributor

For commit 0b3cd53f9ebdfe4f5084768f26594294d213029d , Travis output is at https://travis-ci.org/arinkverma/numpy/jobs/11364367

@arinkverma

arinkverma Sep 14, 2013

Contributor

For commit 0b3cd53f9ebdfe4f5084768f26594294d213029d , Travis output is at https://travis-ci.org/arinkverma/numpy/jobs/11364367

This comment has been minimized.

@njsmith

njsmith Sep 15, 2013

Member

That's pretty mysterious. The gufunc code does always do clearfperr -> run the loop and stuff -> getfperr. So I don't see how changing getfperr could make new errors suddenly appear. Needs more debugging. I can't easily see that commit's history, but if it includes your other error-object related changes then that could be related, since this error is being raised by an error object callback.

@njsmith

njsmith Sep 15, 2013

Member

That's pretty mysterious. The gufunc code does always do clearfperr -> run the loop and stuff -> getfperr. So I don't see how changing getfperr could make new errors suddenly appear. Needs more debugging. I can't easily see that commit's history, but if it includes your other error-object related changes then that could be related, since this error is being raised by an error object callback.

This comment has been minimized.

@arinkverma

arinkverma Sep 15, 2013

Contributor

Only change I did, adding condition at UFUNC_CHECK_STATUS as

#define UFUNC_CHECK_STATUS(ret) do { \
    int fpstatus = (int) fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | \
                                          FE_UNDERFLOW | FE_INVALID); \
    if (fpstatus == 0) { \
       ret = 0;\
    } else { \
         ret =    ((FE_DIVBYZERO  & fpstatus) ? UFUNC_FPE_DIVIDEBYZERO : 0) \
                | ((FE_OVERFLOW   & fpstatus) ? UFUNC_FPE_OVERFLOW : 0) \
                | ((FE_UNDERFLOW  & fpstatus) ? UFUNC_FPE_UNDERFLOW : 0) \
                | ((FE_INVALID    & fpstatus) ? UFUNC_FPE_INVALID : 0); \  
    }\
} while (0)

and separate out clear operations to UFUNC_CLEAR_STATUS

#define UFUNC_CLEAR_STATUS() { \
        (void) feclearexcept(FE_DIVBYZERO | FE_OVERFLOW | \
                            FE_UNDERFLOW | FE_INVALID); \       
}

For all platform likewise. It doesn't include other error-object changes which I did in different pr #3686.

@arinkverma

arinkverma Sep 15, 2013

Contributor

Only change I did, adding condition at UFUNC_CHECK_STATUS as

#define UFUNC_CHECK_STATUS(ret) do { \
    int fpstatus = (int) fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | \
                                          FE_UNDERFLOW | FE_INVALID); \
    if (fpstatus == 0) { \
       ret = 0;\
    } else { \
         ret =    ((FE_DIVBYZERO  & fpstatus) ? UFUNC_FPE_DIVIDEBYZERO : 0) \
                | ((FE_OVERFLOW   & fpstatus) ? UFUNC_FPE_OVERFLOW : 0) \
                | ((FE_UNDERFLOW  & fpstatus) ? UFUNC_FPE_UNDERFLOW : 0) \
                | ((FE_INVALID    & fpstatus) ? UFUNC_FPE_INVALID : 0); \  
    }\
} while (0)

and separate out clear operations to UFUNC_CLEAR_STATUS

#define UFUNC_CLEAR_STATUS() { \
        (void) feclearexcept(FE_DIVBYZERO | FE_OVERFLOW | \
                            FE_UNDERFLOW | FE_INVALID); \       
}

For all platform likewise. It doesn't include other error-object changes which I did in different pr #3686.

@arinkverma

This comment has been minimized.

Show comment
Hide comment
@arinkverma

arinkverma Sep 19, 2013

Contributor

@juliantaylor What do you think about this?

Contributor

arinkverma commented Sep 19, 2013

@juliantaylor What do you think about this?

@juliantaylor

This comment has been minimized.

Show comment
Hide comment
@juliantaylor

juliantaylor Sep 19, 2013

Contributor

still looks good to me, but it needs rebasing against master.
is the test issue resolved?

it seems to be the case, but is it guaranteed that getting the fpu flags is non destructive/does not clear the flags on all hardware?

Contributor

juliantaylor commented Sep 19, 2013

still looks good to me, but it needs rebasing against master.
is the test issue resolved?

it seems to be the case, but is it guaranteed that getting the fpu flags is non destructive/does not clear the flags on all hardware?

@njsmith

This comment has been minimized.

Show comment
Hide comment
@njsmith

njsmith Sep 19, 2013

Member

@juliantaylor: I don't think we actually need any guarantees that it's non-destructive? In fact right now the WIN64 checking macro does still clear the flags, but this is a bug.

AFAICT right now all our code will work fine even if some platform does only provide a get-and-clear operation, but there are no such platforms. So if we ever discover one of them we'll have to make some tweaks but no big deal.

Member

njsmith commented Sep 19, 2013

@juliantaylor: I don't think we actually need any guarantees that it's non-destructive? In fact right now the WIN64 checking macro does still clear the flags, but this is a bug.

AFAICT right now all our code will work fine even if some platform does only provide a get-and-clear operation, but there are no such platforms. So if we ever discover one of them we'll have to make some tweaks but no big deal.

@njsmith njsmith closed this Sep 19, 2013

@njsmith

This comment has been minimized.

Show comment
Hide comment
@njsmith

njsmith Sep 19, 2013

Member

(sorry, wrong button)

Member

njsmith commented Sep 19, 2013

(sorry, wrong button)

@njsmith njsmith reopened this Sep 19, 2013

@njsmith

View changes

Show outdated Hide outdated numpy/core/include/numpy/ufuncobject.h
#if defined(__BORLANDC__)
#define UFUNC_NOFPE _control87(MCW_EM, MCW_EM);
#endif
#define UFUNC_CHECK_STATUS(ret) { \
#if defined(_WIN64)
#define UFUNC_CHECK_STATUS(ret) do { \
int fpstatus = (int) _clearfp(); \

This comment has been minimized.

@njsmith

njsmith Sep 19, 2013

Member

This should do _statusfp()

@njsmith

njsmith Sep 19, 2013

Member

This should do _statusfp()

@njsmith

View changes

Show outdated Hide outdated numpy/core/include/numpy/ufuncobject.h
} while (0)
#define UFUNC_CLEAR_STATUS() { \
}

This comment has been minimized.

@njsmith

njsmith Sep 19, 2013

Member

And this should call _clearfp()

@njsmith

njsmith Sep 19, 2013

Member

And this should call _clearfp()

ENH: Speedup UFUNC_CHECK_STATUS by avoding heavy clear opition
UFUNC_CHECK_STATUS is just single macro which do both checking
clearing the error flags. It clear error flags every time after
checking. We should avoid clear operation if not needed, as it
is a bit expensive and take significant amount of time.
@juliantaylor

This comment has been minimized.

Show comment
Hide comment
@juliantaylor

juliantaylor Oct 18, 2013

Contributor

as I understand it the test failures are fixed, @arinkverma can you rebase it?
actually I don't think these macros need to be macros, converting them to normal inline functions would make them easier to read and edit.

Contributor

juliantaylor commented Oct 18, 2013

as I understand it the test failures are fixed, @arinkverma can you rebase it?
actually I don't think these macros need to be macros, converting them to normal inline functions would make them easier to read and edit.

@juliantaylor

This comment has been minimized.

Show comment
Hide comment
@juliantaylor

juliantaylor Oct 24, 2013

Contributor

closed, idea is now in #3974

Contributor

juliantaylor commented Oct 24, 2013

closed, idea is now in #3974

juliantaylor added a commit to juliantaylor/numpy that referenced this pull request Nov 6, 2013

ENH: avoid expensive clears in fenv functions
Clearing is 50-100 times more expensive than checking on x86, so check if there
is anything needs to be cleared first. This speeds up scalar operations
by 10%-20%.
Based on Arink Verma code in #3739.

Implement the functions as new C-API functions npy_get_floatstatus and
npy_clear_floatstatus in npy_math.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment