numpy.spacing documentation inaccuracies #15331

Open
mdickinson opened this issue Jan 15, 2020 · 11 comments
Comments

@mdickinson
Contributor

mdickinson commented Jan 15, 2020

The current numpy.spacing documentation seems inaccurate. At the top, it says:

Return the distance between x and the nearest adjacent number.

But this isn't right for powers of 2: for example, if x = 1.0, the nearest representable float to x is 1.0 - 2.0**-53, so the distance would be 2.0**-53. But in this case, spacing gives 2.0**-52.

It's also not immediately clear from the description what the expected sign of the result is. From experimentation, it looks as though np.spacing(-x) is -np.spacing(x), except in the case of zeros, where the result is the same for both negative and positive zeros.
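
The observations above can be checked with a short sketch. It assumes IEEE 754 binary64 and uses np.nextafter to find the neighbours of 1.0; the variable names are illustrative only.

```python
import numpy as np

x = np.float64(1.0)

# Gaps to the two neighbouring float64 values of 1.0.
gap_below = x - np.nextafter(x, -np.inf)   # 2.0**-53
gap_above = np.nextafter(x, np.inf) - x    # 2.0**-52

# np.spacing matches the gap on the away-from-zero side, not the nearest one.
print(gap_below == 2.0**-53, gap_above == 2.0**-52)
print(np.spacing(x) == gap_above)

# The sign of the result follows the sign of the input ...
print(np.spacing(-x) == -np.spacing(x))

# ... except at zero, where both signed zeros give the same result.
print(np.spacing(0.0) == np.spacing(-0.0))
```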

@mdickinson
Contributor Author

mdickinson commented Jan 15, 2020

One more corner case: I'm not sure whether this counts as a documentation issue or an implementation issue (or both, or neither):

>>> np.spacing(np.finfo(np.float64).max)
inf

It could be argued that the right result here is the distance between that max value and the next float down. (That distance being 2.0**971, for np.float64, assuming IEEE 754 binary64.)
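
A minimal sketch of this corner case, assuming IEEE 754 binary64. It computes the gap to the next float down directly with np.nextafter rather than relying on np.spacing; the variable names are illustrative.

```python
import numpy as np

fmax = np.finfo(np.float64).max

# np.spacing overflows here; silence the floating-point warning for the demo.
with np.errstate(all="ignore"):
    spacing_at_max = np.spacing(fmax)

# Gap between the largest finite float64 and the next float down.
gap_below_max = fmax - np.nextafter(fmax, -np.inf)

print(spacing_at_max)
print(gap_below_max == 2.0**971)
```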

@seberg
Member

seberg commented Jan 15, 2020

So I guess it actually is the distance to the nearest "larger" floating point number (by absolute value)... I agree that "distance" should really be an absolute value. I have never used this function, so I am not sure whether there is some use case where a + np.spacing(a) gives something more useful than if it returned the actual distance.

I think the inf example follows from the other examples, and is just a more extreme case. As long as spacing is the larger distance between both neighbours (so to speak), the inf seems consistent to me (plus it does give a warning for me).

The function is implemented as a bunch of bit-fiddling and could easily be adapted to return the absolute value, were it not for backward compatibility concerns.... The internal function could also be changed to return your value (it has a flag for whether to return the larger or the smaller spacing, and it is set to the larger spacing), but I would think the larger is just as good?

@mdickinson
Contributor Author

On the sign, having the sign of the result match the sign of the input seems a perfectly reasonable choice; I think it's potentially valuable that a + np.spacing(a) gives the next-away-from-zero value from a. I'm not suggesting any behaviour change here - even without backwards compatibility concerns it doesn't seem a clear choice either way (and as you say, with backwards compatibility concerns, it is clear that this shouldn't be changed).

The -0.0 corner case is a bit surprising, in that it's the only case where the sign of the result doesn't match the sign of the input. OTOH, there's an argument for having -0.0 and 0.0 behave identically in almost all numeric contexts, with a few well-known exceptions. Both behaviours seem reasonable, but again it would be good to document.

For the np.finfo(np.float64).max case, I guess I find this surprising because I don't think of the output as being a difference so much as being the value of the least significant place in the given float. OTOH, the "spacing" name clearly indicates that this should be thought of as a difference.

But I agree that all the current choices seem reasonable, and that this is really just a documentation issue.

For context, I was looking at this mostly because CPython just implemented math.ulp, with a similar purpose but a slightly different set of choices: https://bugs.python.org/issue39310.

@eric-wieser
Member

I think it's potentially valuable that a + np.spacing(a) gives the next-away-from-zero value from a.

Well, you can spell that as np.nextafter(a, np.inf), so I'm not certain that provides all that much value.

@mattip
Member

mattip commented Jan 16, 2020

Are the new math.nextafter and math.ulp going to cause us a whole new set of edge-case incompatibilities with the numpy definitions?

@mdickinson
Contributor Author

@mattip I hope not. math.nextafter has identical semantics to numpy.nextafter (not really surprisingly, since they're both thin wrappers around C's nextafter).

I'd hope that math.ulp and numpy.spacing are sufficiently differently named that people won't assume without checking that they do the same thing. But even if they do, the most common use-case is presumably positive finite floats, and there the two agree (with the exception of the largest finite positive float).
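
As a rough illustration of where the two functions agree and differ, here is a sketch assuming Python 3.9+ (for math.ulp) and IEEE 754 binary64. The sample values are arbitrary.

```python
import math
import sys
import numpy as np

# Positive finite floats: math.ulp and np.spacing agree on these samples.
samples = [1.0, 0.1, 3.5, 1e-300, 1e300, 5e-324]
agree = all(math.ulp(v) == np.spacing(v) for v in samples)

# The largest finite float is where they differ.
fmax = sys.float_info.max
ulp_at_max = math.ulp(fmax)            # finite: 2.0**971
with np.errstate(all="ignore"):
    spacing_at_max = np.spacing(fmax)  # overflows to inf

# math.ulp ignores the sign of its argument; np.spacing does not.
ulp_neg = math.ulp(-1.0)
spacing_neg = np.spacing(-1.0)

print(agree, ulp_at_max, spacing_at_max, ulp_neg, spacing_neg)
```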

@eric-wieser I guess my point was that you can't spell that as np.nextafter(a, np.inf): for negative a you need np.nextafter(a, -np.inf) instead, so to cover both cases you'd want something like np.nextafter(a, np.copysign(np.inf, a)). As I commented on the CPython issue for nextafter, what I commonly seem to need is next_away_from_zero(a), and it's a minor nice-to-have to be able to spell that as simply as a + np.spacing(a).
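
The two spellings can be compared side by side. This is only a sketch: next_away_from_zero is a hypothetical helper, not a NumPy function, and signed zeros are left out because np.spacing treats both zeros alike, so the two spellings would not agree for -0.0.

```python
import numpy as np

def next_away_from_zero(a):
    """Hypothetical helper: the next float64 after a, moving away from zero."""
    a = np.asarray(a, dtype=np.float64)
    return np.nextafter(a, np.copysign(np.inf, a))

# Nonzero finite inputs of both signs.
a = np.array([1.0, -1.0, 0.1, -3.5, 1e-300, -1e300])

via_nextafter = next_away_from_zero(a)
via_spacing = a + np.spacing(a)

print(np.array_equal(via_nextafter, via_spacing))
```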

@seberg
Member

seberg commented Jan 16, 2020

Well, right now we still have a chance to change Python. E.g. if we think that our definition for the largest representable float is more reasonable (inf rather than the smaller spacing), Python can still change it.
The name spacing is actually pretty nice, and I wonder if Python thought of it ;), i.e. something that is not ulp but can be guessed to mean ulp. OTOH, Python choosing a different name is better for us ;).

I guess we could at some point add np.ulp and, once we do that, discourage np.spacing. In either case, I will make a note on the bpo to link here; I think they should at least be aware...

@seberg
Member

seberg commented Jan 16, 2020

Sorry, nvm. I see you already commented about the existence of np.spacing. So the only question would be if someone here disagrees with the choice there to say that ulp(float_max) != inf, which seems strange, but the special case also seems a bit strange to me.

@cournape
Member

So just for context, IIRC, I initially implemented those functions to implement some test functions in np.testing, that was the main use case.

@mrmbernardi

As of writing this comment the documentation is still wrong about the nearest adjacent number:

https://numpy.org/devdocs/reference/generated/numpy.spacing.html

@miccoli
Contributor

miccoli commented Dec 2, 2023

I was recently bitten by this inaccuracy, falsely assuming that np.spacing would return a positive value, according to the common meaning of distance.

If nobody else is working on this, I can open a PR:

  • DOC
    • explicitly state that np.spacing returns an oriented distance pointing away from zero
    • define edge cases based on current behaviour
    • state differences with math.ulp
  • testing: check that what is stated in the docs is true (if not already checked).

A possibly useful invariant for visualizing current behaviour could be

np.all(np.abs(np.spacing(a) + a) > np.abs(a))

which holds for all finite a.
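
A quick spot check of this invariant on a handful of finite values might look like the following. The sample array is an arbitrary choice, and a check like this illustrates the claim rather than proving it.

```python
import numpy as np

fmax = np.finfo(np.float64).max
a = np.array([0.0, -0.0, 5e-324, 1.0, -1.0, 0.1, -2.5e10, fmax])

# np.spacing(fmax) overflows, so silence floating-point warnings here.
with np.errstate(all="ignore"):
    s = np.spacing(a)
    invariant_holds = np.all(np.abs(s + a) > np.abs(a))

# The result is signed, so it is not a distance in the usual sense.
has_negative_spacing = bool(np.any(s < 0))

print(invariant_holds, has_negative_spacing)
```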
