Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: ordering numeric values, including NaNs, and the consequences thereof #59531

Open
robpike opened this issue Apr 11, 2023 · 10 comments
Open
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@robpike
Copy link
Contributor

robpike commented Apr 11, 2023

There are thorny issues around ordering of numerical values, and rather than add to existing conversations I thought it worth pulling them into their own conversation.

The questions here touch on math.Compare (issue #56491) as well as the proposed ordered constraint (issue #59488). That said, I feel there are higher level issues around comparison that should be discussed and resolved before making any other changes to the library or language. I also feel that math.Compare should be pulled for now, not released, until these issues are better understood.

Go provides less than and less-than-or-equal operators that apply to strings, integers, and scalar numbers. (There is no ordering for complex values.) The proposal to add an ordered constraint somewhere is a priori reasonable as it allows one to write comparators, finders, and sorters that apply to all such types.

Except they don't, not quite. The problem is IEEE-754 NaNs. It is a peculiarity of IEEE-754 and its implementation in both hardware and most languages that NaNs do not behave like other numbers under comparison. All "less than" and related comparison operations, including "equals", return false when comparing NaNs, even when the bit pattern of the operands is identical. This is specified by IEEE-754 and, although odd, provides an easy way to see if a value is a NaN: if it is neither equal nor unequal to itself, it is not a number.

This behavior is the key reason why the map clear builtin was added: One cannot pull out all the values of a map to clear it as it is never possible to match a NaN key.

Similarly, sorting and such will misbehave in the presence of NaNs. This is why math.Compare was added, to provide a total ordering on floating point. But of course, the sort package does not use it, nor could a generic sorter, at least absent a generic compare, which is possibly what should happen instead.

Mischievously, this total ordering of NaNs is also defined by IEEE-754 but is not the standard behavior of less than, either as a hardware instruction or in a programming language.

Thus NaNs are ordered but not by the comparison operator. The consequences of this must be clear before defining any ordering constraint.

That is, the addition of an ordered constraint to the language, however it is done, bumps up against this conflict. If it is defined, as suggested, to be any type that implements the < less than operator, any function that uses that constraint may fail unpredictably if a NaN arises, as IEEE-754 demands. On the other hand, if it is instead defined using the total ordering of math.Compare, programs will always work reliably but the < operator is then not allowed, which is bizarre for a constraint defining an ordering.

Neither of these paths is satisfactory. So what should we do? Short answer, I don't know, but we certainly should not change the language or libraries until we have thoroughly thought this through.

I do not, at least at the moment, propose a solution. Instead I am isolating the issue in a discussion to make sure we do the right thing with ordered, so we don't end up in a situation like the one for maps where the language had to be changed to make them behave properly.

Discuss.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Apr 11, 2023

No answers, just some notes.

Java implements a class java.lang.Double. This is an object that contains a single value of type double (same as Go float64). Double has an equal method that is mostly like using == on a double, except that calling equal on two NaN values returns true, and calling equal on a positive zero and a negative zero returns false. Similarly, Double has a compareTo method and a compare function that reliably sort NaN after all other values (including infinity) and sort positive zero after negative zero.

In Rust the trait core::cmp::Ord (roughly equivalent to Go's golang.org/x/exp/constraints.Ordered) does not permit floating-point types. The Rust trait core::cmp::PartialOrd does permit floating-point types, the difference between that the PartialOrd comparison method is permitted to return None meaning that there is no ordering. Sorting a vector of floating-point numbers requires specifying an ordering function to use.

C++ and Python programmers are best advised to not sort vectors containing NaN values.

In Go we have sort.Float64s today. It sorts NaN values before all other values (which is not what the unreleased math.Compare function does).

@ianlancetaylor
Copy link
Contributor

A conservative approach is to make the Ordered constraint, wherever it winds up, not permit floating-point types. The effect would be that generic functions like sort.Min (or cmp.Min, whereever it winds up) and slices.Sort would not permit floating-point types. This might be occasionally annoying, but it would not be hard to provide a math.Less function that could be used with slices.SortFunc.

However, I have a feeling that the effect would be that people who want to sort a slice of floating-point numbers would reach for slices.Sort and, when it fails, shake their heads in exasperation, do a web search, and call slices.SortFunc without thinking about it further.

So there is another approach we can take. Don't permit floating-point types for Ordered. Add a new constraint, maybe called PartiallyOrdered although there is probably a better name, that does permit floating-point types. Define slices.Sort and sort.Min to take a PartiallyOrdered constraint, and write them to handle NaNs correctly (this can be done in a general and efficient manner as described at #59488 (comment)).

With this approach, the naïve user will use Ordered and the right thing will happen. The sophisticated user will use PartiallyOrdered and, we hope, learn about how to write their comparisons to handle NaNs.

@zephyrtronium
Copy link
Contributor

zephyrtronium commented Apr 11, 2023

With this approach, the naïve user will use Ordered and the right thing will happen. The sophisticated user will use PartiallyOrdered and, we hope, learn about how to write their comparisons to handle NaNs.

In my experience with Rust, the situation I've seen tends to be more like, "the naïve user will use PartialOrd because they want their code to work with f64, and the sophisticated user will use Ord because they understand why PartialOrd exists." That might be a matter of documentation.

I also note that narrowing a constraint from PartiallyOrdered to Ordered (e.g. to choose < or math.Compare in a generic Less) would be especially awkward. It seems like this would make me want #45380 even more.

@dr2chase
Copy link
Contributor

How much would

people who want to sort a slice of floating-point numbers would reach for slices.Sort and, when it fails, shake their heads in exasperation

be helped by a carefully targeted error message?

@dr2chase dr2chase added NeedsFix The path to resolution is known, but the work has not been done. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 11, 2023
@gopherbot gopherbot removed the NeedsFix The path to resolution is known, but the work has not been done. label Apr 11, 2023
@ianlancetaylor
Copy link
Contributor

@zephyrtronium I'm not sure I entirely understand your comment about narrowing the constraint. To be clear, you could still use the < operator with the PartiallyOrdered constraint.

@dr2chase A good error message would help, of course, but my guess is that most people who want to sort floating-point numbers simply don't care about NaN values at all. So it's weird to tell them that the obvious function doesn't work. Especially since we can choose to make it work.

@zephyrtronium
Copy link
Contributor

@ianlancetaylor I think I had the wrong language in mind when I wrote that part. Since < works with PartiallyOrdered, that concern is less relevant. 🙂

@smasher164
Copy link
Member

I always assumed that the use-cases for < and math.Compare were

Sort(slice)
SortFunc(slice, math.Compare)

respectively. It provides an opportunity to opt into a different comparison predicate, while preserving existing behavior. The XFunc variants of generic APIs in general allow for this, which is also why #58559 is relevant, since you can just write SortFunc(slice, operators.Lt).

That being said, I agree that Ordered as a name for a constraint is kind of a misnomer when some of the values it applies to are not comparable. Heck, slices.Sort even documents that it misbehaves with NaN. PartiallyOrdered is a much better name to encapsulate types that support <. It was even mentioned in the discussion at #34915 that Less must be a partial order.

Does renaming Ordered to PartiallyOrdered address this issue? Is Ordered even necessary at that point?

@robpike
Copy link
Contributor Author

robpike commented Apr 12, 2023

I forgot to mention the delights of signed zeros. Not as problematic as NaNs, but again inconsistent with regard to traditional "less than" vs. total ordering: https://gotipplay.golang.org/p/qd16ohSUz_p

package main

import (
	"fmt"
	"math"
)

func main() {
	pzero := math.Float64frombits(0)
	nzero := math.Float64frombits(1 << 63)
	fmt.Println(pzero, nzero, nzero == pzero, nzero < pzero, math.Compare(nzero, pzero))
}

prints

0 -0 true false -1

@atdiar
Copy link

atdiar commented Apr 13, 2023

One little idea, could be to consider NaNs and signed 0 as subtypes.

But that'd be a language change and would be more involved.

Essentially, it'd allow types to be further constrainable so that a non-NaN float64 type can be defined and enforced in user code.

The mechanism would still be quite similar to what is done today for interface (assertions, assignment).
In fact, seems like a straight extension of the theory of the current generic implementation.

I imagine that it could solve comparison and ordering in one go. (pun half intended)

But that's quite involved I think and not sure if that complexity is warranted (maybe it would be fine, needs further investigation about use-cases).

@ianlancetaylor
Copy link
Contributor

An update on the current status.

We removed math.Compare (#60519).

We added cmp.Compare and cmp.Less (#59488). These work for all ordered types including floating-point types. They document how they handle NaN and negative zero (any NaN equals any other NaN, any NaN is is less than any non-NaN, negative zero equals positive zero).

We added the cmp.Ordered constraint (also #59488). It permits floating-point types. We document that people who care about NaN should use cmp.Compare or cmp.Less rather than == or <.

I don't know whether there is anything else to do here. I think we're an OK place right now, but I'm definitely too close to the problem.

@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants