Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IEEE floating point signalling bit can be inverted on MIPS (Python assumes its not) #104263

Closed
seberg opened this issue May 7, 2023 · 13 comments
Closed
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@seberg
Copy link
Contributor

seberg commented May 7, 2023

Probably better links for this, but on MIPS, the floating point signalling bit seems typically inverted: https://sourceware.org/binutils/docs/as/MIPS-NaN-Encodings.html

However, Python's _Py_dg_stdnan should create quiet NaNs, but hard-codes the bit to 0.

This again continues to do weird things if mixed with NumPy apparently visible in some fmin downstream test failures (to be clear, I don't think the signalling bit should matter for that, but it is incorrect anyway; the bit should never be set in our part of the world). (See numpy/numpy#23158)

To me, all of that is a bit moot though. Python has this manual definition for NaN and Inf, but also defines Py_NAN and Py_HUGE_VAL. In theory, Py_HUGE_VAL doesn't need to be inf and Py_NAN I am not actually sure what it could be if NaN is not available...

But... If you have IEEE doubles and setting the bit-pattern manually is reasonable. You certainly can assume that both are defined to the right thing and Py_NAN is in fact a NaN value. Doing that simplifies the code, but should also solve the MIPS issue since Py_NAN presumably is correctly defined to a quiet NaN.

This is done in gh-104202. I suppose a test might be:

if hasattr(math, "nan"):
    bits_nan1 = struct.pack("d", math.inf - math.inf)
    bits_nan2 = struct.pack("d", math.nan)
    assert bits_nan1 == bits_nan2
@seberg seberg added the type-bug An unexpected behavior, bug, or error label May 7, 2023
@mdickinson
Copy link
Member

mdickinson commented May 7, 2023

Repeating and clarifying some comments from the PR:

Replacing _Py_dg_stdnan with a use of Py_NaN would change the behaviour on some platforms (including Intel 64). We currently have:

>>> math.copysign(1.0, float('nan'))
1.0

But with the proposed change we'd get instead (at least on my machine):

>>> math.copysign(1.0, float('nan'))
-1.0

(And yes, people shouldn't care about signs on NaNs, but I guarantee that there would be bug reports if this changed.)

Could we look for solutions that only affect MIPS? E.g., a preprocessor define (we could either have something indirect that tells us that the meaning of bit 51 is inverted from its usual meaning, or we could have a #define that specifies the preferred quiet nan bit pattern directly).

FWIW, CPython is now requiring that double uses IEEE 754 format, and IEEE 754 does allow for this inversion - it recommends that the relevant bit be 0 for signaling and 1 for quiet, but it uses the language "should" rather than "shall": "A quiet NaN bit string should be encoded with the first bit (d1) of the trailing significand field T being 1."

@seberg
Copy link
Contributor Author

seberg commented May 7, 2023

How come Py_NaN is defined with negative sign though? Isn't that wrong of Py_NaN?!

My assumption here is that Py_NAN should be a "positive" quiet NaN on all platforms and if it is not, that is a bug. Manually creating a new nan via the bit-pattern seems like an anti-pattern to me that just makes the code harder to understand.

I am very surprised by the above sign issue and I would suggest rather adding a test for copysign and making sure the PR passes the test. Yes, float('nan') must have the correct sign, whether sane or not but using the hard-coded bit-pattern to create a nan when you also have Py_NAN defined and used seems like nothing but clutter.

If CPython requires IEEE 754 float format now, then the PR is just a step in the right direction of removing unnecessarily confusing code. Or am I missing something?

@mdickinson
Copy link
Member

How come Py_NaN is defined with negative sign though?

Because on Intel 64 machines, that's what the default NaN is. It's exactly this sort of machine dependence I'd like to get away from, which is why I'd prefer to move away from Py_NaN and towards something more controllable. We could force the sign bit by adding an fabs call to the Py_NaN macro, but that still doesn't give us control of the payload.

Isn't that wrong of Py_NaN?!

Why would it be?

@mdickinson
Copy link
Member

mdickinson commented May 7, 2023

N.B. The right way to do this in new code, now that we're able to rely both on IEEE 754 double and C99, is probably to use the C99 macros NAN and INFINITY (C99 §7.12p4, §7.12p5). But we do need to be careful not to introduce gratuitous changes in the current behaviour.

@seberg
Copy link
Contributor Author

seberg commented May 7, 2023

But we do need to be careful not to introduce gratuitous changes in the current behaviour.

True, but I would argue it is better to do that via testing the desired behavior if we must:

  • Test your copysign for math.nan and float("nan") and float("-nan")
  • Test payload of them if you insist via struct.pack
  • Test that inf - inf gives the same also (not sure about sign bit there)

My poin is that more than half of even caring to touch this code for me was that it seems like you can remove ~200 lines of code and a layer of complexity that seems to serves no purpose anymore but to be subtly buggy. Adding a "quiet NaN bit has reverse meaning" flag would add another layer of confusion instead.

I don't really see the unexpected sign bit on my computer or https://godbolt.org/z/Eqv78Gz8Y, although I do believe you that it happens on some platforms even without Py_NAN being manually defined to the wrong thing.

What I do wonder if you really want the Py_NAN returned in many math functions to be different then the math.nan value in this case, though. Arguably, even if float("nan") was wrong with the changes, maybe all of those math functions are also giving the undesired result?

So if you say the right way is to use NAN and INFINITY. How about we just modify that Py_NAN definition to (double)NAN, and add Py_INFINITY or change Py_HUGE_VAL to that (but really INFINITY is a better name when it is an infinity)?!

@seberg
Copy link
Contributor Author

seberg commented May 7, 2023

Btw. I didn't care about backporting. If we want to backport this, then I agree the fix should probably just be some weird #if in the code that builds that NaN used by float("nan").

@mdickinson
Copy link
Member

mdickinson commented May 7, 2023

It's definitely true that the current code isn't as clean as it could be, mostly for historical and lack-of-cycles reasons (dating from days when we couldn't even assume that NaNs and infinities existed at all on the target platform); I agree it would be good to clean it up.

People do notice signs of NaNs and complain when they change; I do think we need to avoid changing those. I could be persuaded that no-one notices or cares about payloads, given that to the best of my knowledge modern systems don't bother setting interesting payloads anyway. (Apple's SANE did make use of payloads, but that was some time and some number of architectures ago ...).

Taking things out of order a bit:

I don't really see the unexpected sign bit on my computer or https://godbolt.org/z/Eqv78Gz8Y, although I do believe you that it happens on some platforms even without Py_NAN being manually defined to the wrong thing.

Huh, that's interesting. Looks like builtin_nan and NAN give a "positive" NaN here, too. I was assuming that the result would be the same as that of inf / inf (or inf - inf), which do give a "negative" NaN on my machine, but it looks like I'm wrong.

Adding a "quiet NaN bit has reverse meaning" flag would add another layer of confusion instead.

Fair enough.

  • Test your copysign for math.nan and float("nan") and float("-nan")

Yes please.

  • Test payload of them if you insist via struct.pack

Let's not bother. Unlike signs, I don't have any evidence that anyone cares about payloads in real life. And if we're using struct.pack to get actual bitstrings, we'd potentially have to deal with the issue of MIPS being different again, and end up with tests that fail in obscure ways on obscure platforms. (Slightly OT: do you happen to know what the payload is for the "standard" quiet NaN on MIPS, if there is such a thing? It can't be zero, because that would result in an infinity.)

  • Test that inf - inf gives the same also (not sure about sign bit there)

Again I wouldn't bother, mostly because we're at the mercy of the CPU/FPU here. We either accept that we'll get differently-signed results on different systems, or we try to force the sign of the result by adding code to check whether the subtraction result is a NaN and replace it with our own NaN if so. The latter doesn't seem like a good use of cycles, and would be a needless behaviour change.

What I do wonder if you really want the Py_NAN returned in many math functions [...]

It doesn't actually matter, because those Py_NAN values typically don't see the light of day. There are two cases:

  • Case 1: One or more NaNs are inputs to the math function. In this case the usual pattern should be to return one of the input NaNs unchanged (though I suspect there are a few cases where we're not doing that, and returning a Py_NaN instead; those should probably be fixed).
  • Case 2: No input to the math function is NaN. In that case Python should be raising ValueError instead of returning a NaN in (almost?) all cases, so we never get to see the sign of the returned NaN.

But if it weren't for the above, I don't see as strong a need for standardisation of the returned NaN from those math functions as I do for math.nan or float("nan"); I'd even argue that the "right thing" to do is whatever the underlying math library does, given that (most of) the math library is supposed to be a set of wrappers around the C99 library.

So if you say the right way is to use NAN and INFINITY. How about we just modify that Py_NAN definition to (double)NAN, and add Py_INFINITY or change Py_HUGE_VAL to that (but really INFINITY is a better name when it is an infinity)?!

That could work. I think the risk of unintended consequences is high enough that this is a change that we wouldn't want to backport, though - it goes beyond just a bugfix. So if we want a fix that can be backported to 3.10 and 3.11 for MIPS, that would need to be a separate PR with smaller scope.

@mdickinson
Copy link
Member

mdickinson commented May 7, 2023

Looks like builtin_nan and NAN give a "positive" NaN here, too.

Confirmed with a test. Compiling the following

#include <math.h>
#include <stdio.h>

double sub(double x, double y) {
    return x - y;
}

int main (void) {
    double x = NAN;
    double y = __builtin_nan("");
    double z = sub(INFINITY, INFINITY);

    printf("NAN signbit: %d\n", signbit(x));
    printf("__builtin_nan() signbit: %d\n", signbit(y));
    printf("INF - INF signbit: %d\n", signbit(z));

    return 0;
}

with gcc -Wall -O0 test.c on my Intel MacBook gives:

NAN signbit: 0
__builtin_nan() signbit: 0
INF - INF signbit: 1

If I inline the sub function or increase optimization level then I get a 0 for the INF - INF signbit.

@mdickinson
Copy link
Member

So if we want a fix that can be backported to 3.10 and 3.11 for MIPS, that would need to be a separate PR with smaller scope.

Looks like you already commented about this, and I missed that comment while I was writing the earlier one ... Seems we're on the same page here.

@seberg
Copy link
Contributor Author

seberg commented May 7, 2023

I don't have any evidence that anyone cares about payloads in real life.

OK, that makes things easier. IIRC, R uses payloads to tag on masked values, which is a reasonable use-case for them and I may well be interested in doing similar things as an addition on top of NumPy long-term.
But, such a use should avoid (double)NAN conflicts of the bit-pattern anyway.

Well, if you look at mathmodule.c all of gamma, log<?>, norm, hypot (via norm), dist do return Py_NAN e.g. for infinity inputs. I agree math.nan is far more important, but it will be nice if its all the same bit-pattern.

Trying around a bit, I am surprised, but not actually sure that MIPS doesn't set a weird payload (and who knows, maybe even expects it somehow).

If I inline the sub function or increase optimization level then I get a 0 for the INF - INF signbit.

Heh, but I guess we can agree that Py_NAN should have a positive sign bit (because that is what everyone would expect) and that float("nan") must have it.


TL;DR: I will suggest I add the test for the sign bit to my PR, also for inf to be math.isinf() and a sign bit for that. At that point, PR would still only be a step in the right direction, so happy to add Py_INFINITY to make that step a bit larger.

I am not sure about a test that fails on MIPS, because honestly I can't really think of one that doesn't dive into C and that seems a bit harder than worth it.

@mdickinson
Copy link
Member

and add Py_INFINITY or change Py_HUGE_VAL to that [...]

Agreed that adding and using Py_INFINITY seems like the way to go. Ideally we'd drop Py_HUGE_VAL altogether, but I suspect that would cause too much breakage in 3rd party extensions.

@mdickinson
Copy link
Member

mdickinson commented May 7, 2023

all of gamma, log<?> [...]

Those are examples of cases where the returned Py_NAN doesn't get returned to Python level.

For example:

>>> from math import gamma
>>> gamma(-2.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

mdickinson added a commit that referenced this issue May 10, 2023
This PR removes `_Py_dg_stdnan` and `_Py_dg_infinity` in favour of
using the standard `NAN` and `INFINITY` macros provided by C99.
This change has the side-effect of fixing a bug on MIPS where the
hard-coded value used by `_Py_dg_stdnan` gave a signalling NaN
rather than a quiet NaN.
---------

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
@mdickinson
Copy link
Member

Fixed in #104202. Not backporting the fix to 3.11 or below, as discussed above.

carljm added a commit to carljm/cpython that referenced this issue May 10, 2023
* main:
  pythonGH-102181: Improve specialization stats for SEND (pythonGH-102182)
  pythongh-103000: Optimise `dataclasses.asdict` for the common case (python#104364)
  pythongh-103538: Remove unused TK_AQUA code (pythonGH-103539)
  pythonGH-87695: Fix OSError from `pathlib.Path.glob()` (pythonGH-104292)
  pythongh-104263: Rely on Py_NAN and introduce Py_INFINITY (pythonGH-104202)
  pythongh-104010: Separate and improve docs for `typing.get_origin` and `typing.get_args` (python#104013)
  pythongh-101819: Adapt _io._BufferedIOBase_Type methods to Argument Clinic (python#104355)
  pythongh-103960: Dark mode: invert image brightness (python#103983)
  pythongh-104252: Immortalize Py_EMPTY_KEYS (pythongh-104253)
  pythongh-101819: Clean up _io windows console io after pythongh-104197 (python#104354)
  pythongh-101819: Harden _io init (python#104352)
  pythongh-103247: clear the module cache in a test in test_importlib/extensions/test_loader.py (pythonGH-104226)
  pythongh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (python#103849)
  pythongh-74895: adjust tests to work on Solaris (python#104326)
  pythongh-101819: Refactor _io in preparation for module isolation (python#104334)
  pythongh-90953: Don't use deprecated AST nodes in clinic.py (python#104322)
  pythongh-102327: Extend docs for "url" and "headers" parameters to HTTPConnection.request()
  pythongh-104328: Fix typo in ``typing.Generic`` multiple inheritance error message (python#104335)
carljm added a commit to carljm/cpython that referenced this issue May 11, 2023
* main: (27 commits)
  pythongh-87849: fix SEND specialization family definition (pythonGH-104268)
  pythongh-101819: Adapt _io.IOBase.seek and _io.IOBase.truncate to Argument Clinic (python#104384)
  pythongh-101819: Adapt _io._Buffered* methods to Argument Clinic (python#104367)
  pythongh-101819: Refactor `_io` futher in preparation for module isolation (python#104369)
  pythongh-101819: Adapt _io.TextIOBase methods to Argument Clinic (python#104383)
  pythongh-101117: Improve accuracy of sqlite3.Cursor.rowcount docs (python#104287)
  pythonGH-92184: Convert os.altsep to '/' in filenames when creating ZipInfo objects (python#92185)
  pythongh-104357: fix inlined comprehensions that close over iteration var (python#104368)
  pythonGH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (pythonGH-104141)
  pythonGH-102181: Improve specialization stats for SEND (pythonGH-102182)
  pythongh-103000: Optimise `dataclasses.asdict` for the common case (python#104364)
  pythongh-103538: Remove unused TK_AQUA code (pythonGH-103539)
  pythonGH-87695: Fix OSError from `pathlib.Path.glob()` (pythonGH-104292)
  pythongh-104263: Rely on Py_NAN and introduce Py_INFINITY (pythonGH-104202)
  pythongh-104010: Separate and improve docs for `typing.get_origin` and `typing.get_args` (python#104013)
  pythongh-101819: Adapt _io._BufferedIOBase_Type methods to Argument Clinic (python#104355)
  pythongh-103960: Dark mode: invert image brightness (python#103983)
  pythongh-104252: Immortalize Py_EMPTY_KEYS (pythongh-104253)
  pythongh-101819: Clean up _io windows console io after pythongh-104197 (python#104354)
  pythongh-101819: Harden _io init (python#104352)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants