ENH: Speedup UFUNC_CHECK_STATUS by avoiding heavy clear operation #3739
The way numpy detects divide-by-zero, overflow, underflow, etc., is that before each ufunc loop it clears the FP error flags, and then after the ufunc loop it checks whether any have become set, and clears them again. This patch skips the clear when no flags are set, to save time.
Performance results are posted on my blog: http://www.arinkverma.in/2013/09/speedup-ufunccheckstatus-by-avoiding.html
On windows, we should use
Overall I think this patch pretty much does the right thing but it leaves a big pile of pointlessly tangled spaghetti.
What we actually want:
If you trace through all the code then this is... pretty much what ends up happening. Right now clearfperr calls getfperr, and getfperr calls
Problem is you actually have to read and understand the macros, and the high-level interface, and the ufunc loop callers, before you can figure out whether this is right or not.
@juliantaylor: I don't think we actually need any guarantees that it's non-destructive? In fact right now the WIN64 checking macro does still clear the flags, but this is a bug.
AFAICT right now all our code will work fine even if some platform does only provide a get-and-clear operation, but there are no such platforms. So if we ever discover one of them we'll have to make some tweaks but no big deal.