ENH: Speed up UFUNC_CHECK_STATUS by avoiding heavy clear operation #3739
NumPy detects divide-by-zero, overflow, underflow, etc. by clearing the FP error flags before each ufunc loop and then checking after the loop whether any have been set. The flags are then cleared again. This patch skips the clear when it is not needed, to save time.
Performance results are posted on my blog: http://www.arinkverma.in/2013/09/speedup-ufunccheckstatus-by-avoiding.html
On windows, we should use
Overall I think this patch pretty much does the right thing but it leaves a big pile of pointlessly tangled spaghetti.
What we actually want:
If you trace through all the code then this is... pretty much what ends up happening. Right now clearfperr calls getfperr, and getfperr calls
Problem is you actually have to read and understand the macros, and the high-level interface, and the ufunc loop callers, before you can figure out whether this is right or not.
@juliantaylor: I don't think we actually need any guarantees that it's non-destructive? In fact right now the WIN64 checking macro does still clear the flags, but this is a bug.
AFAICT right now all our code will work fine even if some platform does only provide a get-and-clear operation, but there are no such platforms. So if we ever discover one of them we'll have to make some tweaks but no big deal.
UFUNC_CHECK_STATUS is a single macro that both checks and clears the error flags. It clears the error flags every time after checking. We should avoid the clear operation when it is not needed, as it is fairly expensive and takes a significant amount of time.
Clearing is 50-100 times more expensive than checking on x86, so check whether anything needs to be cleared first. This speeds up scalar operations by 10%-20%. Based on Arink Verma's code in numpy#3739. Implement the functions as new C-API functions npy_get_floatstatus and npy_clear_floatstatus in npy_math.