Avoid conversion to NumPy Scalar #2941
After profiling, I noticed that a bottleneck for NumPy scalar operations occurs when extracting the underlying C value from a Python float: the code first converts the Python scalar into its matching NumPy scalar (e.g. PyFloat -> float64) and then extracts the C value from the NumPy scalar. For some types, it is a lot faster to extract the value directly from the Python scalar.

I only did this for PyFloat in this modified code, but the code is laid out so that it can easily be extended to other types such as integers. I did not do them because I was unsure whether there was a special scenario to handle across operating systems and/or between 32- and 64-bit platforms.

The speed ratios for different operations are listed below (old time / new time with the modifications). In other words, the bigger the number, the bigger the speedup.

Tested in Python 2.6 on Windows.

RATIO TEST
1.1 Array * Array
1.1 PyFloat * Array
1.1 Float64 * Array
1.0 PyFloat + Array
1.3 Float64 + Array
1.1 PyFloat * PyFloat
1.0 Float64 * Float64
4.0 PyFloat * Float64
2.9 PyFloat * vector1[1]
3.9 PyFloat + Float64
9.8 PyFloat < Float64
9.9 PyFloat < Float64
1.0 Create array from list
1.0 Assign PyFloat to all
1.0 Assign Float64 to all
4.2 Float64 * pyFloat * pyFloat * pyFloat * pyFloat
1.0 pyFloat * pyFloat * pyFloat * pyFloat * pyFloat
1.0 Float64 * Float64 * Float64 * Float64 * Float64
1.0 Float64 ** 2
1.0 pyFloat ** 2
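To make the old and new paths concrete, here is a minimal, self-contained sketch of the idea. It does not use the CPython API: `object`, `obj_kind`, the `KIND_*` tags, and the `convert_old`/`convert_new` functions are hypothetical stand-ins. In the real template, the check and extraction come from the `@PYCHECKEXACT@` and `@PYEXTRACTCTYPE@` placeholders, which for PyFloat presumably map to something like CPython's `PyFloat_CheckExact` and `PyFloat_AS_DOUBLE`.

```c
/* Hypothetical stand-ins for Python objects (NOT the CPython API):
 * `kind` plays the role of the object's type, `value` its stored double. */
typedef enum { KIND_PYFLOAT, KIND_FLOAT64, KIND_OTHER } obj_kind;

typedef struct {
    obj_kind kind;
    double value;
} object;

/* Counts intermediate float64 scalars created by the old path. */
static int scalar_conversions = 0;

/* Read the C value out of a float64 scalar; fail for anything else. */
static int extract_from_float64(const object *a, double *out)
{
    if (a->kind != KIND_FLOAT64) {
        return -1;
    }
    *out = a->value;
    return 0;
}

/* Old behaviour: PyFloat -> float64 scalar -> C double. */
static int convert_old(const object *a, double *out)
{
    if (a->kind == KIND_PYFLOAT) {
        object tmp;                 /* stands in for a newly built NumPy scalar */
        tmp.kind = KIND_FLOAT64;
        tmp.value = a->value;
        scalar_conversions++;
        return extract_from_float64(&tmp, out);
    }
    return extract_from_float64(a, out);
}

/* New behaviour: an exact PyFloat is read directly, skipping the
 * intermediate scalar; everything else takes the old path. */
static int convert_new(const object *a, double *out)
{
    if (a->kind == KIND_PYFLOAT) {  /* analogous to @PYCHECKEXACT@ */
        *out = a->value;            /* analogous to @PYEXTRACTCTYPE@ */
        return 0;
    }
    return convert_old(a, out);
}
```

In this toy model the saving is only the skipped temporary; in NumPy the skipped step is the allocation and teardown of a real scalar object, which is where the larger ratios for mixed PyFloat/Float64 operations would plausibly come from.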
if (@PYCHECKEXACT@(a)) {
    *arg1 = @PYEXTRACTCTYPE@(a);
    return 0;
}
Indentation here is funny.
Also, it'd be nice if we could avoid duplicating the whole function body just to add this one check. Does anyone more familiar with the @@ templating stuff have an opinion on whether that's doable? I guess we could #define NO_CHECK_EXACT(a) 0 and then use it as the PYCHECKEXACT/PYEXTRACTCTYPE functions for the types which don't have Python scalars?
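A minimal sketch of that suggestion, using a C macro to play the role of the shared `.c.src` template body. All names here (`object`, `KIND_*`, `PYFLOAT_CHECK_EXACT`, `PYFLOAT_AS_DOUBLE`, `DEFINE_CONVERT`) are hypothetical stand-ins, not NumPy or CPython identifiers, and the generic slow path is omitted and modelled as returning -1.

```c
/* Hypothetical stand-ins for Python objects (NOT the CPython API). */
typedef enum { KIND_PYFLOAT, KIND_OTHER } obj_kind;
typedef struct { obj_kind kind; double value; } object;

/* Stand-ins for what @PYCHECKEXACT@ / @PYEXTRACTCTYPE@ supply for double. */
#define PYFLOAT_CHECK_EXACT(a) ((a)->kind == KIND_PYFLOAT)
#define PYFLOAT_AS_DOUBLE(a)   ((a)->value)

/* The suggested placeholder for types without a Python scalar. */
#define NO_CHECK_EXACT(a) 0

/* One macro body playing the role of the shared template. */
#define DEFINE_CONVERT(NAME, CTYPE, CHECK_EXACT, EXTRACT)  \
    static int NAME(const object *a, CTYPE *arg1)         \
    {                                                     \
        (void)a;                                          \
        if (CHECK_EXACT(a)) {                             \
            *arg1 = EXTRACT(a);                           \
            return 0;                                     \
        }                                                 \
        return -1;                                        \
    }

/* double has a Python counterpart, so it gets the fast path. */
DEFINE_CONVERT(double_convert, double, PYFLOAT_CHECK_EXACT, PYFLOAT_AS_DOUBLE)

/* float does not: the branch becomes `if (0)` and the assignment
 * becomes `*arg1 = 0`, which compiles because float accepts an int. */
DEFINE_CONVERT(float_convert, float, NO_CHECK_EXACT, NO_CHECK_EXACT)
```

The dead branch is harmless for arithmetic types, but instantiating the same body for a struct type such as a complex value would turn into `*arg1 = 0`, which C rejects because an int cannot be assigned to a struct.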
I tried your suggestion to unify the code, and I got type warnings and errors such as

numpy\core\src\scalarmathmodule.c.src(693) : error C2440: '=' : cannot convert from 'int' to 'npy_cfloat'

for some of the lines like

*arg1 = NO_CHECK_EXACT(a);

For now, all I did was commit the indentation fixes.
You will need to handle complex types explicitly, I think. Maybe add an @iscomplex@ and then do some preprocessor work to write to arg1->real and arg1->imag if you have a complex type.
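A sketch of what the `@iscomplex@` idea could look like after template substitution, assuming hypothetical per-type flags `DOUBLE_IS_COMPLEX` and `CFLOAT_IS_COMPLEX` and a hypothetical `cfloat_t` struct standing in for `npy_cfloat`. Both function bodies are identical apart from the type and the flag; the preprocessor decides which branch the compiler sees, so the struct type never meets a scalar assignment. Note that because `arg1` is a pointer, the members are written as `arg1->real`, not `*arg1->real`.

```c
/* Hypothetical stand-in for npy_cfloat: a struct with real/imag members. */
typedef struct { float real; float imag; } cfloat_t;

/* Hypothetical stand-ins for Python objects (NOT the CPython API). */
typedef enum { KIND_PYFLOAT, KIND_PYCOMPLEX, KIND_OTHER } obj_kind;
typedef struct { obj_kind kind; double real; double imag; } object;

#define PYFLOAT_CHECK_EXACT(a)   ((a)->kind == KIND_PYFLOAT)
#define PYCOMPLEX_CHECK_EXACT(a) ((a)->kind == KIND_PYCOMPLEX)

/* What a hypothetical @iscomplex@ might be substituted with per type. */
#define DOUBLE_IS_COMPLEX 0
#define CFLOAT_IS_COMPLEX 1

/* Expansion for double: only the #else branch reaches the compiler. */
static int double_convert(const object *a, double *arg1)
{
#if DOUBLE_IS_COMPLEX
    if (PYCOMPLEX_CHECK_EXACT(a)) {
        arg1->real = a->real;
        arg1->imag = a->imag;
        return 0;
    }
#else
    if (PYFLOAT_CHECK_EXACT(a)) {
        *arg1 = a->real;
        return 0;
    }
#endif
    return -1;  /* generic slow path omitted in this sketch */
}

/* Expansion for cfloat: only the complex branch reaches the compiler,
 * so the struct is filled member by member through the pointer. */
static int cfloat_convert(const object *a, cfloat_t *arg1)
{
#if CFLOAT_IS_COMPLEX
    if (PYCOMPLEX_CHECK_EXACT(a)) {
        arg1->real = a->real;
        arg1->imag = a->imag;
        return 0;
    }
#else
    if (PYFLOAT_CHECK_EXACT(a)) {
        *arg1 = a->real;
        return 0;
    }
#endif
    return -1;  /* generic slow path omitted in this sketch */
}
```

In this sketch a Python float passed to the complex converter simply falls through to the (omitted) slow path; a fuller version might also accept exact floats there and set the imaginary part to zero.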
Regarding the (now hidden) conversation about unifying the function templates: I guess the klugey solution would be to use … Or, I think there are other places where we have special-case code like this, and we use a construct like
…
and then below do
…
i.e., use the preprocessor to eliminate the extra checks, so the C compiler never even sees the dead code and thus can't get confused by it. I'd be interested in other people's opinions about which style is best here.
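A minimal sketch of the preprocessor-guard idea, assuming hypothetical per-type flags `DOUBLE_HAS_PY_SCALAR` and `FLOAT_HAS_PY_SCALAR` that a template would substitute; this is not necessarily the construct NumPy uses. The `PYFLOAT32_*` macros inside the disabled block are intentionally never defined: since the preprocessor removes that block, the compiler never sees them.

```c
/* Hypothetical stand-ins for Python objects (NOT the CPython API). */
typedef enum { KIND_PYFLOAT, KIND_OTHER } obj_kind;
typedef struct { obj_kind kind; double value; } object;

#define PYFLOAT_CHECK_EXACT(a) ((a)->kind == KIND_PYFLOAT)
#define PYFLOAT_AS_DOUBLE(a)   ((a)->value)

/* Hypothetical per-type flags, as a template might substitute them. */
#define DOUBLE_HAS_PY_SCALAR 1
#define FLOAT_HAS_PY_SCALAR  0

static int double_convert(const object *a, double *arg1)
{
#if DOUBLE_HAS_PY_SCALAR
    if (PYFLOAT_CHECK_EXACT(a)) {
        *arg1 = PYFLOAT_AS_DOUBLE(a);
        return 0;
    }
#endif
    (void)a;
    (void)arg1;
    return -1;  /* generic slow path omitted in this sketch */
}

static int float_convert(const object *a, float *arg1)
{
#if FLOAT_HAS_PY_SCALAR
    /* Removed by the preprocessor: these hypothetical macros do not
     * exist, and no check or extract function is needed for float. */
    if (PYFLOAT32_CHECK_EXACT(a)) {
        *arg1 = PYFLOAT32_AS_FLOAT(a);
        return 0;
    }
#endif
    (void)a;
    (void)arg1;
    return -1;  /* generic slow path omitted in this sketch */
}
```

Compared with a `#define NO_CHECK_EXACT(a) 0` placeholder, this style avoids type-checking dead code altogether, at the cost of an extra flag per type.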
@njsmith I think that usage is perfectly acceptable as long as it doesn't get out of hand. I broke up some of the templates because they made my head hurt. @raulcota I like the idea, but for now I'd just define the one function without the template, remove the extraneous code, and return -1 if the number doesn't check as exact, because that will be a flaw in numpy somewhere. If it later looks worthwhile to generalize things to try to detect the integer case for Python < 3, we can look for more complicated solutions at that time. More complicated because we will need to figure out which integer that is ;) But for now I don't see much point in being overly general. There should also be a test for this case. There may already be one, but that isn't guaranteed.
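One way to read that suggestion, sketched under assumptions: a single non-templated helper for the double case that returns -1 whenever its argument is not an exact Python float, plus a tiny self-check covering both outcomes. `object`, `KIND_*`, `double_from_pyfloat`, and `test_double_from_pyfloat` are hypothetical names; a real NumPy test would more likely live in the Python test suite.

```c
/* Hypothetical stand-ins for Python objects (NOT the CPython API). */
typedef enum { KIND_PYFLOAT, KIND_PYINT, KIND_OTHER } obj_kind;
typedef struct { obj_kind kind; double value; } object;

#define PYFLOAT_CHECK_EXACT(a) ((a)->kind == KIND_PYFLOAT)
#define PYFLOAT_AS_DOUBLE(a)   ((a)->value)

/* Hypothetical non-templated helper for the double case only.
 * Callers are expected to pass an exact Python float; anything else
 * indicates a bug elsewhere, so report failure with -1. */
static int double_from_pyfloat(const object *a, double *arg1)
{
    if (!PYFLOAT_CHECK_EXACT(a)) {
        return -1;
    }
    *arg1 = PYFLOAT_AS_DOUBLE(a);
    return 0;
}

/* A tiny self-check covering both outcomes; returns the failure count. */
static int test_double_from_pyfloat(void)
{
    object good = {KIND_PYFLOAT, 1.5};
    object bad = {KIND_PYINT, 7.0};
    double out = 0.0;
    int failures = 0;

    if (double_from_pyfloat(&good, &out) != 0 || out != 1.5) {
        failures++;
    }
    out = -1.0;
    if (double_from_pyfloat(&bad, &out) != -1 || out != -1.0) {
        failures++;  /* must fail and leave the output untouched */
    }
    return failures;
}
```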
Concerns and status:
I am biased, of course :), but my personal opinion is that the speedup is very meaningful and worth adding as is, though I understand the concerns. My counterargument is that at least the duplicated code sits close together, and a comment could explain the reasoning and remind people to keep the two versions consistent.
I'm not too worried about code duplication. It would be nice to avoid it, but sometimes clarity beats concision. I'd also not worry about integer types yet; they are going to be more complicated because of platform and Python-version dependencies.
Let me know if there is anything else you think I should do.
Looks fine to me.
Let's get this in. Cleanups/extensions can come later.