Avoid conversion to NumPy Scalar #2941

raulcota · 2013-01-22T18:38:08Z

After profiling I noticed that a bottleneck for NumPy scalar operations
occurs when trying to extract the underlying C value from a Python float
because it first converts the Python scalar into its matching NumPy
scalar (e.g. PyFloat -> float64) and then it extracts the C value from
the NumPy scalar.

For some types, it is a lot faster to just extract the value directly
from the Python scalar.

I only did for PyFloat in this modified code but the code is laid out
such that it can be easily extended to other types such as Integers. I
did not do them because I was unsure if there was a special scenario to
handle across OS and/or between 32 and 64 bit platforms. The ratio of
speed to do different operations are listed below (Old time / New time
with modifications). In other words, the bigger the number, the bigger
the speed up we get.

Tested in Python 2.6 Windows

RATIO TEST
1.1 Array * Array
1.1 PyFloat * Array
1.1 Float64 * Array
1.0 PyFloat + Array
1.3 Float64 + Array
1.1 PyFloat * PyFloat
1.0 Float64 * Float64
4.0 PyFloat * Float64
2.9 PyFloat * vector1[1]
3.9 PyFloat + Float64
9.8 PyFloat < Float64
9.9 PyFloat < Float64
1.0 Create array from list
1.0 Assign PyFloat to all
1.0 Assign Float64 to all
4.2 Float64 * pyFloat * pyFloat * pyFloat * pyFloat
1.0 pyFloat * pyFloat * pyFloat * pyFloat * pyFloat
1.0 Float64 * Float64 * Float64 * Float64 * Float64
1.0 Float64 ** 2
1.0 pyFloat ** 2

After profiling I noticed that a bottleneck for NumPy scalar operations occurs when trying to extract the underlying C value from a Python float because it first converts the Python scalar into its matching NumPy scalar (e.g. PyFloat -> float64) and then it extracts the C value from the NumPy scalar. For some types, it is a lot faster to just extract the value directly from the Python scalar. I only did for PyFloat in this modified code but the code is laid out such that it can be easily extended to other types such as Integers. I did not do them because I was unsure if there was a special scenario to handle across OS and/or between 32 and 64 bit platforms. The ratio of speed to do different operations are listed below (Old time / New time with modifications). In other words, the bigger the number, the bigger the speed up we get. Tested in Python 2.6 Windows RATIO TEST 1.1 Array * Array 1.1 PyFloat * Array 1.1 Float64 * Array 1.0 PyFloat + Array 1.3 Float64 + Array 1.1 PyFloat * PyFloat 1.0 Float64 * Float64 4.0 PyFloat * Float64 2.9 PyFloat * vector1[1] 3.9 PyFloat + Float64 9.8 PyFloat < Float64 9.9 PyFloat < Float64 1.0 Create array from list 1.0 Assign PyFloat to all 1.0 Assign Float64 to all 4.2 Float64 * pyFloat * pyFloat * pyFloat * pyFloat 1.0 pyFloat * pyFloat * pyFloat * pyFloat * pyFloat 1.0 Float64 * Float64 * Float64 * Float64 * Float64 1.0 Float64 ** 2 1.0 pyFloat ** 2

njsmith · 2013-01-22T19:39:22Z

numpy/core/src/scalarmathmodule.c.src

+	if (@PYCHECKEXACT@(a)){
+		*arg1 = @PYEXTRACTCTYPE@(a);
+		return 0;
+	}


Indentation here is funny.

Also, it'd be nice if we could avoid duplicating the whole function body just to add this one check -- anyone more familiar with the @@ templating stuff have an opinion on whether that's doable? I guess we could #define NO_CHECK_EXACT(a) 0 and then use it as the PYCHECKEXACT/PYEXTRACTCTYPE functions for the types which don't have python scalars?

I tried your suggestion to unify the code and I got warnings and errors on the type,
numpy\core\src\scalarmathmodule.c.src(693) : error C2440: '=' : cannot convert from 'int' to 'npy_cfloat'

for some of the lines,
*arg1 = NO_CHECK_EXACT(a);

for now all I did was commit the indentation fixes

You will need to handle complex explicitly I think. Maybe adding an @iscomplex@ and then doing some preprocessor work to write to *arg1->real and *arg1->imag if you have a complex type.

njsmith · 2013-01-22T23:26:31Z

Regarding the (now hidden) conversation about unifying the function templates: I guess the klugey solution would be to use *arg1 = (@type@) @PYEXTRACTCTYPE@(a); to get rid of the warnings (with a comment explaining that this is only to avoid a warning in the case where this branch is never taken, and is a no-op otherwise!).

Or, I think there are other places where we have special case code like this, and we use a construct like

/* In the template repeat variable setup: */
#HAS_PY_SCALAR = 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0#
/* Or the shorthand that saves some typing: */
#HAS_PY_SCALAR = 0 * 10, 1, 0 * 5#

and then below do

#if HAS_PY_SCALAR
   ...<code here>...
#endif

i.e., use the preprocessor to eliminate the extra checks, so the C compiler never even sees the dead code and thus can't get confused by it.

I'd be interested in other people's opinions about which style is best here.

charris · 2013-03-02T02:23:09Z

@njsmith I think that usage is perfectly acceptable as long as it doesn't get out of hand. I broke up some of the templates because they made my head hurt. @raulcota I like the idea, but for now I'd just define the one function without the template, remove the extraneous code, and return -1 if the number doesn't check as exact, because that will be a flaw in numpy somewhere. If it later looks worthwhile to generalize things to try to detect the integer case for python < 3, we can look for more complicated solutions at that time. More complicated because we will need to figure out which integer that is ;) But for now I don't see much point in being overly general. There should also be a test for this case. There may already be but it isn't guaranteed.

charris · 2013-07-09T15:32:55Z

@raulcota @njsmith I'd like to get this in, what needs to be done to bring it to completion?

raulcota · 2013-07-09T21:29:32Z

Concerns and status:

Too much duplication of code,
I tried a few things but I couldn't figure out anything practical. I now forgot the details but the attempts that started to get close to working looked way too confusing.
Extend the speed up to other data types (e.g. integers),
I didn't even try this one because it gets messy with 64 bit, Python 3, etcetera.

I am bias of course :) , my personal opinion is that the speed up is very meaningful and it is worth adding as is but I understand the concerns. My counter argument is that at least the code is close to each other and a comment could explain the reasoning and a reminder to keep them consisten.

charris · 2013-07-10T01:44:30Z

I'm not too worried about code duplication. It would be nice to avoid it, but sometimes clarity beats concision. I'd also not worry about integer types yet, they are going to be more complicated with platform and Python version dependency.

raulcota · 2013-07-15T14:47:11Z

Let me know if there is anything else you think I should do.
The changes are so old now that I am bit paranoid they will not work anymore but I'd guess the tests would show.

njsmith · 2013-07-17T15:09:17Z

Looks fine to me.

Avoid conversion to NumPy Scalar

charris · 2013-07-29T17:35:46Z

Let's get this in. Cleanups/extensions can come later.

Fixup of PR #4921

njsmith reviewed Jan 22, 2013
View reviewed changes

Fix indentation

dde7ea5

raulcota mentioned this pull request Jun 13, 2013

ENH: Optimize array creation by avoiding errors #3410

Merged

arinkverma mentioned this pull request Jul 28, 2013

ENH: Avoiding conversion to NumPy Scalar. Implementation Copy-paste tag in conv_template.py #3556

Closed

charris added a commit that referenced this pull request Jul 29, 2013

Merge pull request #2941 from raulcota/avoid_create-kill_npscalars

32206dd

Avoid conversion to NumPy Scalar

charris merged commit 32206dd into numpy:master Jul 29, 2013

arinkverma mentioned this pull request Aug 2, 2013

ENH: Avoiding conversion of integer to NumPy Scalar. #3567

Merged

raulcota deleted the avoid_create-kill_npscalars branch August 5, 2013 17:00

charris added a commit that referenced this pull request Aug 23, 2014

Merge pull request #4982 from charris/gh-2941

2059cce

Fixup of PR #4921

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid conversion to NumPy Scalar #2941

Avoid conversion to NumPy Scalar #2941

raulcota commented Jan 22, 2013

njsmith Jan 22, 2013

raulcota Jan 22, 2013

seberg Jan 23, 2013

njsmith commented Jan 22, 2013

charris commented Mar 2, 2013

charris commented Jul 9, 2013

raulcota commented Jul 9, 2013

charris commented Jul 10, 2013

raulcota commented Jul 15, 2013

njsmith commented Jul 17, 2013

charris commented Jul 29, 2013

Avoid conversion to NumPy Scalar #2941

Avoid conversion to NumPy Scalar #2941

Conversation

raulcota commented Jan 22, 2013

njsmith Jan 22, 2013

Choose a reason for hiding this comment

raulcota Jan 22, 2013

Choose a reason for hiding this comment

seberg Jan 23, 2013

Choose a reason for hiding this comment

njsmith commented Jan 22, 2013

charris commented Mar 2, 2013

charris commented Jul 9, 2013

raulcota commented Jul 9, 2013

charris commented Jul 10, 2013

raulcota commented Jul 15, 2013

njsmith commented Jul 17, 2013

charris commented Jul 29, 2013