
numpy.gradient() doesn't compute boundary values correctly in version 1.9.0 #5184

Closed
drandreasdr opened this issue Oct 14, 2014 · 19 comments
@drandreasdr

It seems that numpy.gradient() computes the boundary values incorrectly in version 1.9.0 (but correctly in e.g. version 1.8.1).

To reproduce:
Alternative 1:

>>> x = np.array([1, 2, 4], dtype=np.float)
>>> np.gradient(x)
array([ 0.5,  1.5,  2.5])

although the resulting array should be [(2-1)/1, (4-1)/2, (4-2)/1] = [1, 1.5, 2]

Alternative 2:
Just run the first example in the documentation for numpy.gradient():

>>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> np.gradient(x)

The result is

array([ 0.5,  1.5,  2.5,  3.5,  4.5,  5.5])

although the documentation says that it should be

array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])

The latter also seems right, while the former seems wrong.

(see also http://stackoverflow.com/questions/26361007/numpy-gradient-seems-to-produce-erroneous-boundary-values-using-first-differe)
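For reference, the `edge_order` keyword that this thread converged on (available from numpy 1.9.1) makes both behaviours reproducible; a minimal sketch, assuming numpy >= 1.9.1:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])

# edge_order=1: one-sided first differences at the boundaries,
# i.e. [(2-1)/1, (4-1)/2, (4-2)/1] = [1.0, 1.5, 2.0]
g1 = np.gradient(x, edge_order=1)

# edge_order=2: second-order accurate one-sided differences at the
# boundaries, giving [0.5, 1.5, 2.5] as reported above for 1.9.0
g2 = np.gradient(x, edge_order=2)
```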

@drandreasdr changed the title from "numpy.gradient() doesn't work as intended in version 1.9.0" to "numpy.gradient() doesn't compute boundary values correctly in version 1.9.0" Oct 14, 2014
@pseudocubic

It looks like changes were introduced in 332d628 that were intended to make the gradient calculation at boundaries more accurate, but the documentation on the website only goes up to 1.8.1. You can see the updated documentation with print(np.gradient.__doc__).

@tacaswell (Contributor)

This is also the source of matplotlib/matplotlib#3598

@charris (Member) commented Oct 14, 2014

The change makes the gradient second order at the boundaries. In the x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float) example above (a quadratic), the new results are correct. However, edges can be tricky. I think there was a proposal to add a keyword to choose between the old and new.
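The second-order boundary scheme described here uses a one-sided three-point stencil; a minimal sketch of the left-edge formula (helper name hypothetical):

```python
import numpy as np

def left_edge_second_order(f, h=1.0):
    # Hypothetical helper: second-order accurate one-sided
    # difference at the left boundary,
    #   f'(x0) ~ (-3*f[0] + 4*f[1] - f[2]) / (2*h)
    # (the right boundary uses the mirror-image stencil).
    return (-3.0 * f[0] + 4.0 * f[1] - f[2]) / (2.0 * h)

# For the quadratic sequence from the issue the stencil is exact:
# the derivative at the left edge is 0.5, matching the 1.9.0 output.
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
d0 = left_edge_second_order(x)
```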

@charris (Member) commented Oct 14, 2014

Because the new method breaks reproducibility with old code, we should probably make the improvement optional.

@charris (Member) commented Oct 14, 2014

See the recent thread http://thread.gmane.org/gmane.comp.python.numeric.general/58971/focus=58975. Does someone want to make a PR for this? We need to decide on a keyword.

@jenshnielsen

The real issue for matplotlib is the behaviour on masked arrays.

The following is taken from a matplotlib test and rounded to make it easier to read:

import numpy as np
y, x = np.mgrid[-1.2:1.2:8j, -1.2:1.2:8j]
z = 10 * np.cos(x**2 + y**2)
zmask = np.ma.masked_greater(z, 9.9)
z1 = zmask.copy()
print('z before\n')
print(np.round(z1))
dx, dy = np.gradient(z1)
print('\nz after\n')
print(np.round(z1))
print('\ndx\n')
print(np.round(dx))
print('\ndy\n')
print(np.round(dy))

With numpy 1.8.2 this results in:

z before

[[-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]]

z after

[[-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]]

dx

[[4.0 7.0 7.0 6.0 6.0 7.0 7.0 4.0]
 [4.0 6.0 5.0 4.0 4.0 5.0 6.0 4.0]
 [3.0 3.0 2.0 -- -- 2.0 3.0 3.0]
 [1.0 1.0 0.0 -- -- 0.0 1.0 1.0]
 [-1.0 -1.0 -0.0 -- -- -0.0 -1.0 -1.0]
 [-3.0 -3.0 -2.0 -- -- -2.0 -3.0 -3.0]
 [-4.0 -6.0 -5.0 -4.0 -4.0 -5.0 -6.0 -4.0]
 [-4.0 -7.0 -7.0 -6.0 -6.0 -7.0 -7.0 -4.0]]

dy

[[4.0 4.0 3.0 1.0 -1.0 -3.0 -4.0 -4.0]
 [7.0 6.0 3.0 1.0 -1.0 -3.0 -6.0 -7.0]
 [7.0 5.0 -- -- -- -- -5.0 -7.0]
 [6.0 4.0 -- -- -- -- -4.0 -6.0]
 [6.0 4.0 -- -- -- -- -4.0 -6.0]
 [7.0 5.0 -- -- -- -- -5.0 -7.0]
 [7.0 6.0 3.0 1.0 -1.0 -3.0 -6.0 -7.0]
 [4.0 4.0 3.0 1.0 -1.0 -3.0 -4.0 -4.0]]

Whereas with numpy 1.9 this results in:

z before

[[-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [1.0 7.0 10.0 -- -- 10.0 7.0 1.0]
 [-1.0 5.0 9.0 10.0 10.0 9.0 5.0 -1.0]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-10.0 -6.0 -1.0 1.0 1.0 -1.0 -6.0 -10.0]]

z after

[[-- -6.0 -- -- -- -- -6.0 --]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-- 5.0 -- -- -- -- 5.0 --]
 [-- 7.0 -- -- -- -- 7.0 --]
 [-- 7.0 -- -- -- -- 7.0 --]
 [-- 5.0 -- -- -- -- 5.0 --]
 [-6.0 1.0 5.0 7.0 7.0 5.0 1.0 -6.0]
 [-- -6.0 -- -- -- -- -6.0 --]]

dx

[[4.0 8.0 9.0 -- -- 9.0 8.0 4.0]
 [4.0 6.0 5.0 4.0 4.0 5.0 6.0 4.0]
 [3.0 3.0 2.0 -- -- 2.0 3.0 3.0]
 [1.0 1.0 0.0 -- -- 0.0 1.0 1.0]
 [-1.0 -1.0 -0.0 -- -- -0.0 -1.0 -1.0]
 [-3.0 -3.0 -2.0 -- -- -2.0 -3.0 -3.0]
 [-4.0 -6.0 -5.0 -4.0 -4.0 -5.0 -6.0 -4.0]
 [-4.0 -8.0 -9.0 -- -- -9.0 -8.0 -4.0]]

dy

[[-- 4.0 -- -- -- -- -4.0 --]
 [8.0 6.0 3.0 1.0 -1.0 -3.0 -6.0 -8.0]
 [-- 5.0 -- -- -- -- -5.0 --]
 [-- 4.0 -- -- -- -- -4.0 --]
 [-- 4.0 -- -- -- -- -4.0 --]
 [-- 5.0 -- -- -- -- -5.0 --]
 [8.0 6.0 3.0 1.0 -1.0 -3.0 -6.0 -8.0]
 [-- 4.0 -- -- -- -- -4.0 --]]

I.e. many more masked values in the gradients, and a modification of the original array.

Perhaps we should never have used gradient on masked arrays?

@pseudocubic

I created a pull request (#5186) for this issue, which seems like a simple fix. I used second_order as the keyword, but there is perhaps some ambiguity there?

@charris (Member) commented Oct 14, 2014

I'm thinking edge_order, but open to discussion.

@pseudocubic

edge_order taking an int would be appropriate if we want to leave the possibility for higher-order methods later. The default could be 1 to maintain compatibility.

@matthew-brett (Contributor)

Raising an error for e.g. edge_order = 3?
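A minimal sketch of such a check (helper name hypothetical):

```python
def check_edge_order(edge_order):
    # Hypothetical helper: only first- and second-order boundary
    # schemes are implemented, so e.g. edge_order=3 is rejected.
    if edge_order not in (1, 2):
        raise ValueError("'edge_order' must be 1 or 2")
    return edge_order
```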

@pseudocubic

Since the second argument is *varargs, I can't add a keyword argument before *varargs without breaking old code (it would be positional), and to stay compatible with Python 2.6 & 2.7 I can't add a keyword-only argument after *varargs (that is a SyntaxError before Python 3).

Would it be appropriate to use **kwargs and then define the default in the function, e.g.

if 'edge_order' not in kwargs:
    kwargs['edge_order'] = 1

I just want to make sure I use an accepted convention for numpy.

@njsmith (Member) commented Oct 14, 2014

gradient does not match any other conventions, so we're free to do whatever makes sense :-)

**kwargs is a reasonable way to implement a py2-compatible kwarg-only argument. To be thorough I'd suggest something like:

kwarg_only = set(["edge_order"])
for kwarg in kwargs:
    if kwarg not in kwarg_only:
        raise TypeError("%r is not a valid keyword argument" % (kwarg,))
edge_order = kwargs.get("edge_order", 1)

AFAICT that's equivalent to a py3 gradient(x, *varargs, edge_order=1)


@charris (Member) commented Oct 14, 2014

Argghhh! What @njsmith said. I don't think we would accept gradient as is these days, but at this point it is an heirloom.

@seberg (Member) commented Oct 15, 2014

I think I saw something like this as an idiom, as well. But I don't have any opinion on what is best.

edge_order = kwargs.pop('edge_order', default)
if kwargs:
    raise TypeError('"{}" is not a valid keyword argument.'.format(list(kwargs)[0]))

@njsmith (Member) commented Oct 15, 2014

Oh, that's pretty good :-)

@tacaswell (Contributor)

Wouldn't it be better to show all of the extra keys, not just the first one?

@seberg (Member) commented Oct 15, 2014

@tacaswell, yeah it kind of would. But python never does it and inventing a different format just for that seems not worth it.

charris added a commit that referenced this issue Oct 16, 2014

BUG: Fixes #5184 gradient calculation behavior at boundaries
@njsmith (Member) commented Oct 16, 2014

@jenshnielsen: Could you check whether the recently merged change solves your masked array problems? And if -- as I suspect -- you still see buggy effects with edge_order=2 (esp. any in-place modification of the input array!) then could you file another bug for that?

@ everyone who ran into problems from this: do you have an opinion on what the best long-term default here is?

njsmith added a commit to njsmith/numpy that referenced this issue Oct 18, 2014
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in numpygh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in numpygh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
@njsmith (Member) commented Oct 18, 2014

@jenshnielsen: I looked into this a bit more and just submitted #5203 which should fix the issue with the input array getting modified in place. (Also can I just take this opportunity to whinge that everything about ndarray subclassing is so incoherent that the best one can hope for is to rearrange the kluges, and renew my plea to stop using np.ma? I know you probably won't, it just makes me feel better to say.)

njsmith added a commit to njsmith/numpy that referenced this issue Oct 21, 2014
juliantaylor pushed a commit to juliantaylor/numpy that referenced this issue Oct 26, 2014
* Previous expected behavior was that the gradient is computed using central
  differences in the interior and first differences at the boundaries.

* gradient was updated in v1.9.0 so that second-order accurate calculations are
  done at the boundaries, but this breaks expected behavior with old code, so
  `edge_order` keyword (Default: 1) is added to specify whether first or second
  order calculations at the boundaries should be used.

* Since the second argument is *varargs, in order to maintain compatibility
  with old code and compatibility with python 2.6 & 2.7, **kwargs is used, and
  all kwargs that are not `edge_order` raise an error, listing the offending
  kwargs.

* Tests and documentation updated to reflect this change.

* Added `.. versionadded:: 1.9.1` to the new optional kwarg `edge_order`
  documentation, and specified supported `edge_order` kwarg values.

* Clarified documentation for `varargs`.

* Made indentation in docstring consistent with other docstring styles.

* Examples corrected
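A quick sketch of the resulting API, assuming numpy >= 1.9.1 with the documented default of edge_order=1:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])

# The default (edge_order=1) restores the pre-1.9.0 first-difference
# boundaries, matching the documented [1., 1.5, 2.5, 3.5, 4.5, 5.]
g_default = np.gradient(x)

# Opting in to the 1.9.0 second-order boundary scheme instead gives
# [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
g_second = np.gradient(x, edge_order=2)
```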