Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid conversion to NumPy Scalar #2941

Merged
merged 2 commits into from Jul 29, 2013

Conversation

raulcota
Copy link
Contributor

After profiling I noticed that a bottleneck for NumPy scalar operations
occurs when trying to extract the underlying C value from a Python float
because it first converts the Python scalar into its matching NumPy
scalar (e.g. PyFloat -> float64) and then it extracts the C value from
the NumPy scalar.

For some types, it is a lot faster to just extract the value directly
from the Python scalar.

I only did for PyFloat in this modified code but the code is laid out
such that it can be easily extended to other types such as Integers. I
did not do them because I was unsure if there was a special scenario to
handle across OS and/or between 32 and 64 bit platforms. The ratio of
speed to do different operations are listed below (Old time / New time
with modifications). In other words, the bigger the number, the bigger
the speed up we get.

Tested in Python 2.6 Windows

RATIO TEST
1.1 Array * Array
1.1 PyFloat * Array
1.1 Float64 * Array
1.0 PyFloat + Array
1.3 Float64 + Array
1.1 PyFloat * PyFloat
1.0 Float64 * Float64
4.0 PyFloat * Float64
2.9 PyFloat * vector1[1]
3.9 PyFloat + Float64
9.8 PyFloat < Float64
9.9 PyFloat < Float64
1.0 Create array from list
1.0 Assign PyFloat to all
1.0 Assign Float64 to all
4.2 Float64 * pyFloat * pyFloat * pyFloat * pyFloat
1.0 pyFloat * pyFloat * pyFloat * pyFloat * pyFloat
1.0 Float64 * Float64 * Float64 * Float64 * Float64
1.0 Float64 ** 2
1.0 pyFloat ** 2

After profiling I noticed that a bottleneck for NumPy scalar operations
occurs when trying to extract the underlying C value from a Python float
because it first converts the Python scalar into its matching NumPy
scalar (e.g. PyFloat -> float64) and then it extracts the C value from
the NumPy scalar.

For some types, it is a lot faster to just extract the value directly
from the Python scalar.

I only did for PyFloat in this modified code but the code is laid out
such that it can be easily extended to other types such as Integers. I
did not do them because I was unsure if there was a special scenario to
handle across OS and/or between 32 and 64 bit platforms. The ratio of
speed to do different operations are listed below (Old time / New time
with modifications). In other words, the bigger the number, the bigger
the speed up we get.

Tested in Python 2.6 Windows

RATIO  TEST
1.1  Array * Array
1.1  PyFloat * Array
1.1  Float64 * Array
1.0  PyFloat + Array
1.3  Float64 + Array
1.1  PyFloat * PyFloat
1.0  Float64 * Float64
4.0  PyFloat * Float64
2.9  PyFloat * vector1[1]
3.9  PyFloat + Float64
9.8  PyFloat < Float64
9.9  PyFloat < Float64
1.0  Create array from list
1.0  Assign PyFloat to all
1.0  Assign Float64 to all
4.2  Float64 * pyFloat * pyFloat * pyFloat * pyFloat
1.0  pyFloat * pyFloat * pyFloat * pyFloat * pyFloat
1.0  Float64 * Float64 * Float64 * Float64 * Float64
1.0  Float64 ** 2
1.0  pyFloat ** 2
if (@PYCHECKEXACT@(a)){
*arg1 = @PYEXTRACTCTYPE@(a);
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation here is funny.

Also, it'd be nice if we could avoid duplicating the whole function body just to add this one check -- anyone more familiar with the @@ templating stuff have an opinion on whether that's doable? I guess we could #define NO_CHECK_EXACT(a) 0 and then use it as the PYCHECKEXACT/PYEXTRACTCTYPE functions for the types which don't have python scalars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your suggestion to unify the code and I got warnings and errors on the type,
numpy\core\src\scalarmathmodule.c.src(693) : error C2440: '=' : cannot convert from 'int' to 'npy_cfloat'

for some of the lines,
*arg1 = NO_CHECK_EXACT(a);

for now all I did was commit the indentation fixes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to handle complex explicitly I think. Maybe adding an @iscomplex@ and then doing some preprocessor work to write to *arg1->real and *arg1->imag if you have a complex type.

@njsmith
Copy link
Member

njsmith commented Jan 22, 2013

Regarding the (now hidden) conversation about unifying the function templates: I guess the klugey solution would be to use *arg1 = (@type@) @PYEXTRACTCTYPE@(a); to get rid of the warnings (with a comment explaining that this is only to avoid a warning in the case where this branch is never taken, and is a no-op otherwise!).

Or, I think there are other places where we have special case code like this, and we use a construct like

/* In the template repeat variable setup: */
#HAS_PY_SCALAR = 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0#
/* Or the shorthand that saves some typing: */
#HAS_PY_SCALAR = 0 * 10, 1, 0 * 5#

and then below do

#if HAS_PY_SCALAR
   ...<code here>...
#endif

i.e., use the preprocessor to eliminate the extra checks, so the C compiler never even sees the dead code and thus can't get confused by it.

I'd be interested in other people's opinions about which style is best here.

@charris
Copy link
Member

charris commented Mar 2, 2013

@njsmith I think that usage is perfectly acceptable as long as it doesn't get out of hand. I broke up some of the templates because they made my head hurt. @raulcota I like the idea, but for now I'd just define the one function without the template, remove the extraneous code, and return -1 if the number doesn't check as exact, because that will be a flaw in numpy somewhere. If it later looks worthwhile to generalize things to try to detect the integer case for python < 3, we can look for more complicated solutions at that time. More complicated because we will need to figure out which integer that is ;) But for now I don't see much point in being overly general. There should also be a test for this case. There may already be but it isn't guaranteed.

@charris
Copy link
Member

charris commented Jul 9, 2013

@raulcota @njsmith I'd like to get this in, what needs to be done to bring it to completion?

@raulcota
Copy link
Contributor Author

raulcota commented Jul 9, 2013

Concerns and status:

  1. Too much duplication of code,
    I tried a few things but I couldn't figure out anything practical. I now forgot the details but the attempts that started to get close to working looked way too confusing.

  2. Extend the speed up to other data types (e.g. integers),
    I didn't even try this one because it gets messy with 64 bit, Python 3, etcetera.

I am bias of course :) , my personal opinion is that the speed up is very meaningful and it is worth adding as is but I understand the concerns. My counter argument is that at least the code is close to each other and a comment could explain the reasoning and a reminder to keep them consisten.

@charris
Copy link
Member

charris commented Jul 10, 2013

I'm not too worried about code duplication. It would be nice to avoid it, but sometimes clarity beats concision. I'd also not worry about integer types yet, they are going to be more complicated with platform and Python version dependency.

@raulcota
Copy link
Contributor Author

Let me know if there is anything else you think I should do.
The changes are so old now that I am bit paranoid they will not work anymore but I'd guess the tests would show.

@njsmith
Copy link
Member

njsmith commented Jul 17, 2013

Looks fine to me.

@charris
Copy link
Member

charris commented Jul 29, 2013

Let's get this in. Cleanups/extensions can come later.

@raulcota raulcota deleted the avoid_create-kill_npscalars branch August 5, 2013 17:00
charris added a commit that referenced this pull request Aug 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants