
Numerical-stable sum (similar to math.fsum) (Trac #1855) #2448

Closed

numpy-gitbot opened this issue Oct 19, 2012 · 12 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/1855 on 2011-06-02 by trac user ling, assigned to unknown.

For some applications, having a numerically stable sum is critical. What I mean is well illustrated by the examples below:

>>> math.fsum([1, 1e-10] * 1000), numpy.sum([1, 1e-10] * 1000)
(1000.0000001, 1000.0000001000171)
>>> math.fsum([1, 1e100, 1, -1e100] * 10000), numpy.sum([1, 1e100, 1, -1e100] * 10000)
(20000.0, 0.0)

Here math.fsum is a numerically stable sum introduced in Python 2.6, which uses the Shewchuk algorithm (apparently the one described in http://code.activestate.com/recipes/393090/).

Other complaints about the plain vanilla numpy.sum can be found in a recent thread: http://article.gmane.org/gmane.comp.python.numeric.general/42756.

I hope numpy.sum can also implement something similar (e.g., the Shewchuk algorithm, Kahan summation, or partial sums). We could add an optional argument "stable=True/False" to numpy.sum and let the user decide whether they want the more accurate (but potentially slower) version.

@numpy-gitbot
Author

trac user ling wrote on 2011-06-02

This could be related to #1696. If the requested algorithm (or a better one) is implemented there, it could also be enabled by a "stable=True" parameter. I.e., users have the choice between the default (fast but may be inaccurate) and the stable one.

@cdeil

cdeil commented Jan 11, 2013

I need a fast and accurate sum for large numpy arrays and did a quick test that simply uses a higher-precision accumulator via data.sum(dtype="float128"):
https://gist.github.com/4509330
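
(For reference, a minimal sketch of that kind of comparison; the array size and dtypes here are illustrative, the actual benchmark is in the gist above.)

import time
import numpy as np

data = np.random.rand(10**7).astype(np.float32)  # illustrative size and dtype

for acc in (np.float32, np.float64, np.longdouble):
    t0 = time.perf_counter()
    total = data.sum(dtype=acc)  # accumulate in the given precision
    elapsed = time.perf_counter() - t0
    print(acc.__name__, total, "%.2e elements/s" % (data.size / elapsed))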

Questions to the numpy experts:

  • Why doesn't the speed of the sum depend on the data or accumulator float precision?
  • Will data.sum(dtype="float128") work with 128 bit precision on all or most machines, or does it sometimes give 64 bit precision?

In any case: this requested feature for a numerically stable sum in numpy would be very useful to me.

@njsmith
Member

njsmith commented Jan 11, 2013

Your speed measure is items/time, so larger is better? That is rather odd. I wouldn't necessarily expect that using a higher-precision accumulator would be slow, since like you say, the problem's probably bounded by the speed of reading out the original arrays -- but then I can't explain why it's just as fast to process the 128-bit input as the 32-bit input. I guess another explanation would be that it actually is bounded by the fp operation time, and Intel has just gone overboard and made 32/64/80-bit fp operations equally fast. (Possibly they all use the 80-bit units internally and then downcast on output.)

As for your second question, I think the best answer is a hollow laugh. dtype=float128 probably won't give 128-bit precision on any machine you care about. What numpy actually exposes is:

np.longdouble <- this is whatever your C compiler calls "long double"
np.float<8 * sizeof(long double)> <- an alias for the above, whose name is based on the memory size of a C "long double" on your machine

On x86-32, "long double" is an 80-bit fp number stored in 96 bits (so numpy calls it "float96"); on x86-64, it's an 80-bit (!) fp number stored in 128 bits (this is the float128 you're using). On other architectures it does other fun things, like on PPC where it's usually "double-double" arithmetic:
https://en.wikipedia.org/wiki/Long_double
https://en.wikipedia.org/wiki/Double-double_%28arithmetic%29#Double-double_arithmetic
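
(A quick, illustrative way to check what long double actually gives you on a given machine:)

import numpy as np

# On x86 this typically reports a 64-bit mantissa (80-bit extended precision),
# even though the dtype may be called float96 or float128.
print(np.longdouble)
print(np.finfo(np.longdouble).nmant, np.finfo(np.longdouble).eps)
print(np.finfo(np.float64).nmant, np.finfo(np.float64).eps)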

Nowadays we actually could expose a real float128, at least on systems where we have working modern compilers (cough_notwindows_cough). But no one's done the work to sort out the dumb existing naming scheme and add this support.

There have been several mailing list discussions about how messed up this all is, usually started by Matthew Brett discovering yet another wacky bug or confusing inconsistency in this code when running on Sparc or something.

My suggestions for using extended precision:

  • Use np.longdouble, not np.float128 (since it's defined to something
    useful more often)
  • Be aware that this is a dusty corner of numpy, so let us know if you
    run into bugs (though it should work and some people do use it)
  • Be prepared for it to be unavailable on some systems (though it
    should be some sort of extended precision whenever it exists)

@njsmith
Member

njsmith commented Jan 11, 2013

On the actual bug's request for a stable summation algorithm:

The thing that makes this less than perfectly obvious is that there actually is no np.sum implementation anywhere in numpy. Instead what we have is addition code (np.add), plus a general mechanism to turn any binary operation into a reduction operation (so the code for np.sum for floating point is also the code for np.prod, and the code for np.sum for integers, etc.). So we can't just go to the np.sum code directly and add Kahan summation or whatever.
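
(Concretely: np.sum is the generic ufunc reduction applied to np.add, and np.prod is the same machinery applied to np.multiply, e.g.:)

import numpy as np

a = np.arange(1.0, 11.0)
# np.sum / np.prod go through the generic reduction machinery:
print(np.sum(a) == np.add.reduce(a))        # True
print(np.prod(a) == np.multiply.reduce(a))  # True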

That said, it'd be lovely to have an exact algorithm like Shewchuk's (as used in Python's fsum, which might be steal-able code), or a bounded-given-condition-number algorithm like Kahan summation, or both, in numpy. The way to do it would be to define a special-purpose "generalized ufunc" for each, for floating point only. Details of how to do this:
http://jayvius.github.com/numpy-dtypes/newgufunc.html
http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html
You'd want to generate one implementation for each version of floating point arithmetic numpy supports; if you look in loops.c.src (which would be a good place for the code to be added) then you can see how to write a kind of "code template" for the algorithm that then gets filled in with each type.

This wouldn't be too hard -- Kahan summation in particular is only about 10 lines of code, and hooking it up into numpy not much more. And then we could add a method= argument to np.sum and ndarray.sum that chooses whether you want to pass on to add.reduce (using the default reduction algorithm, which doesn't know anything about addition in particular), or to one of the new special-case generalized ufuncs.
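
(For illustration, a pure-Python sketch of Kahan / compensated summation -- not the proposed gufunc or C implementation, just the algorithm itself:)

import numpy as np

def kahan_sum(a):
    s = 0.0  # running sum
    c = 0.0  # compensation: carries the low-order bits lost in each addition
    for x in a:
        y = x - c
        t = s + y
        c = (t - s) - y  # (t - s) recovers the rounded-off part of y
        s = t
    return s

a = np.array([1.0, 1e-10] * 1000)
print(kahan_sum(a))     # ~1000.0000001, essentially the math.fsum result
print(sum(a.tolist()))  # a plain left-to-right loop accumulates noticeably more error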

The other thing we might want to consider is making add.reduce do pairwise summation by default:
https://en.wikipedia.org/wiki/Pairwise_summation
In principle this is just as fast as what we do now; not as good as Kahan or Shewchuk, but much better than the naive implementation. And the general principle could be implemented in the shared ufunc reduction code without giving it special knowledge of addition. We have a duty to our users to use good algorithms by default, since so very many people will not stop and think about these things.
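
(A recursive sketch of the idea, in Python for readability; a real implementation would live in the shared C reduction loop and use a larger base-case block:)

def pairwise_sum(a, block=8):
    # Below the block size, just do a plain loop; otherwise split the array
    # in half and add the two partial sums, so rounding error grows like
    # O(log n) rather than O(n).
    n = len(a)
    if n <= block:
        s = 0.0
        for x in a:
            s += x
        return s
    mid = n // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

a = [1.0, 1e-10] * 1000
print(pairwise_sum(a))  # much closer to math.fsum(a) than a plain left-to-right loop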

@cdeil

cdeil commented Jan 11, 2013

Yes, my speed measure is summed elements in GHz. With any data and accumulator dtype I get 1 GHz on my 2.6 GHz Intel Core i7 MacBook. I ran the same test on a 2.6 GHz Intel Xeon machine (see here) and there the speeds are different (0.2 to 1 GHz) for different data / accumulator dtypes. I didn't try to understand what is going on.

I think for my application (adding the per-bin or per-event fit statistic for 1e6 or 1e9 or more bins / events) the speed of the sum might not be relevant, because computing the per-bin or per-event fit statistic is much slower. Precision, however, is an issue: it was pointed out to me that the Sherpa Python fitting package uses Kahan summation and that I should do the same.

Thanks for your detailed comments. I guess using a float128 dtype for the accumulator is not the answer then, but one of the stable adding methods you mentioned is. It would be awesome if this became available in numpy; unfortunately I don't have the expertise / time to contribute it myself.

@kyleabeauchamp

kyleabeauchamp commented Jul 18, 2013

So fsum is already available as np.math.fsum(). I wonder if we should just include some better documentation about using fsum when appropriate.

We could include a note about this function in the docstring (the See Also section?) of np.sum. I think that would be sufficient for many users.

@njsmith
Member

njsmith commented Jul 18, 2013

Please don't use np.math.fsum. That's just an accidental alias for math.fsum.


@kyleabeauchamp

kyleabeauchamp commented Jul 18, 2013

Obviously. The point is that there's already a function to do this, and it's already imported.

@njsmith
Member

njsmith commented Jul 18, 2013

It already being imported is probably a bug.

A reference to math.fsum would make sense in the np.sum docs, but it's not really a full solution: it doesn't support per-axis summation etc. like an np.fsum would.
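
(As a stopgap, per-axis fsum accuracy is possible today with np.apply_along_axis, though it is slow because it calls back into Python for every 1-D slice:)

import math
import numpy as np

x = np.array([[1.0, 1e-10] * 1000,
              [1.0, 1e100, 1.0, -1e100] * 500])
# Per-row sums via math.fsum: accurate, but far slower than x.sum(axis=1).
print(np.apply_along_axis(math.fsum, 1, x))  # [1000.0000001, 1000.0]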

@charris
Member

charris commented Feb 20, 2014

This has been improved by #3685.

In [5]: math.fsum([1, 1e-10] * 1000), numpy.sum([1, 1e-10] * 1000)
Out[5]: (1000.0000001, 1000.0000001)

In [6]: math.fsum([1, 1e100, 1, -1e100] * 10000), numpy.sum([1, 1e100, 1, -1e100] * 10000)
Out[6]: (20000.0, 0.0)

The second sum is heading towards the pathological.

Note that the fsum function could be cythonized pretty easily and made part of a specialized package. It is also a bit compiler/CPU dependent: given the way x87 and SSE registers can get used on 32-bit systems, the lo + hi trick can fail.
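
(For context, the "lo + hi" trick is the error-free transformation underlying fsum-style algorithms; a sketch of Knuth's two-sum, which assumes ordinary round-to-nearest double arithmetic and can indeed misbehave if intermediates are kept in 80-bit x87 registers:)

def two_sum(a, b):
    # Split a + b into the rounded result 'hi' and the exact rounding error 'lo',
    # so that hi + lo equals a + b exactly.
    hi = a + b
    b_virtual = hi - a
    lo = (a - (hi - b_virtual)) + (b - b_virtual)
    return hi, lo

print(two_sum(1e100, 1.0))  # (1e+100, 1.0): the 1.0 lost in 'hi' is recovered in 'lo'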

I'm going to close this as we probably have a good enough solution for most things.

@nschloe
Contributor

nschloe commented Feb 27, 2018

For everyone interested: I've just released accupy to deal with ill-conditioned sums and dot products. The sums are somewhat slower than numpy.sum, but also more accurate; see the README for details.

@cdeil

cdeil commented Feb 27, 2018

@nschloe - Thank you! Nice project name.
