Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a total ordering method for floating-point #53938

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
34 changes: 34 additions & 0 deletions src/libcore/cmp.rs
Expand Up @@ -462,6 +462,40 @@ impl<T: Ord> Ord for Reverse<T> {
}
}

/// A wrapper newtype for providing a total order on a type that's normally partial.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The brief summary seems to indicate that only T: PartialOrds are accepted but the signature does not require that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's completely true, but note that cmp::Reverse also doesn't bound itself.

Copy link
Contributor

@gnzlbg gnzlbg Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, do you know why that is? Appears to be an oversight, what use is reverse if T does not implement PartialOrd ? - If this was an oversight for Reverse, we don't have to make the same mistake here.

///
/// This is most commonly used for floating-point:
///
/// ```rust
/// #![feature(float_total_cmp)]
///
/// use std::cmp::Total;
///
/// assert!(-0.0 == 0.0);
/// assert!(Total(-0.0) < Total(0.0));
/// assert_eq!(std::f32::NAN.partial_cmp(&0.0), None);
/// assert_eq!(Total(std::f32::NAN).partial_cmp(&Total(0.0)), Some(std::cmp::Ordering::Greater));
///
/// let mut a = [3.0, 1.0, 2.0];
/// // a.sort(); // ERROR, because floats are !Ord
/// a.sort_by_key(|x| std::cmp::Total(*x)); // But this works!
/// assert_eq!(a, [1.0, 2.0, 3.0]);
///
/// // By using `Total`, the struct can derive `Eq` and `Ord`.
/// #[derive(PartialEq, Eq, PartialOrd, Ord)]
/// struct MyData {
/// foo: Total<f32>,
/// bar: Total<f64>,
/// }
/// ```
///
/// It can also be used to provide both partial and total order implementations
/// for an enum, if orders between variant don't make sense conceptually but
/// are still desired for use as a `BTreeMap` key or similar.
#[derive(Debug, Copy, Clone)]
#[unstable(feature = "float_total_cmp", issue = "55339")]
pub struct Total<T>(#[unstable(feature = "float_total_cmp", issue = "55339")] pub T);

Copy link
Contributor

@gnzlbg gnzlbg Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PartialOrd/Ord "API" makes the assumption that only a single three-way comparator makes sense for any given type. This API covers the case in which more than one comparator makes sense for a given type, and the one implemented for the type is not total.

This feel like an extremely niche case for such a general API in core::cmp. For example, are we going to cover the opposite case, e.g., by adding Partial<T> as well? What about types with two total orders, where users would like to use the non-default one: are we going to add a Total2<T> to core::cmp to cover that? I don't think adding these kinds of types to core::cmp are a good solution to the problem.

Floats are ubiquitous, and it makes sense to make them easier to use. I think we could add [f32, f64]::total_order(self) -> TotalFloat<Self> methods to the floating-point types, and a TotalFloat type wrapper to core::float (or somewhere else) that implements the float API, but where the default ordering is total instead of partial. That's a smaller and more focused problem to solve.

In std::cmp, either we should try to solve the underlying problem, or provide better duct-tape, e.g., Total<T, Fn(Self, Self)-> Ordering> and Partial<T, Fn(Self,Self)->Option<Ordering>>, such that one can write Total<f32, f32::total_cmp> to select the "default" orderings or similar.


The PartialOrd/Ord provide "the default ordering for a type", but the thing they do not address yet is that partial / total are properties of orderings, not of types, and that multiple orderings with different properties exist for most types.

A less intrusive solution to the problem of adjusting the default ordering of aggregates could be to just use macros to control which ordering is used, e.g., instead of

#[derive(PartialOrd,Ord,PartialEq,Eq)]
struct A {
   x: Total<f32>,
}

one could

#[derive(PartialOrd,Ord,PartialEq,Eq)]
struct A {
   #[Ord = f32::total_cmp, Eq = f32::total_eq]
   x: f32,
}

A better long-term solution for types with multiple orderings in the ecosystem would be to evolve APIs like HashSet from requiring the constraints on the type HashSet<T: Eq + Hash>, which kind of implies that for every T there is only an Eq and Hash impl that makes sense, to a de-coupled API like HashSet<T, O: Eq<T> = <T as Eq> where T: Eq, H: Hash<T> = <T as Hash> where T: Hash> that lets the user decide which ordering / hashing to use when it needs to, while falling back to a default if none are provided.

/// Trait for types that form a [total order](https://en.wikipedia.org/wiki/Total_order).
///
/// An order is a total order if it is (for all `a`, `b` and `c`):
Expand Down
132 changes: 132 additions & 0 deletions src/libcore/num/f32.rs
Expand Up @@ -17,6 +17,7 @@

#![stable(feature = "rust1", since = "1.0.0")]

use cmp;
use mem;
use num::FpCategory;

Expand Down Expand Up @@ -474,4 +475,135 @@ impl f32 {
// It turns out the safety issues with sNaN were overblown! Hooray!
unsafe { mem::transmute(v) }
}

/// A 3-way version of the IEEE `totalOrder` predicate.
///
/// This is most commonly used with `cmp::Total`, not directly.
///
/// This is useful in situations where you run into the fact that `f32` is
/// `PartialOrd`, but not `Ord`, like sorting. If you need a total order
/// on arbitrary floating-point numbers you should use this -- or one of
/// the related `total_*` methods -- they're implemented efficiently using
/// just a few bitwise operations.
///
/// This function mostly agrees with `PartialOrd` and the usual comparison
/// operators in how it orders floating point numbers, but it additionally
/// distinguishes negative from positive zero (it considers -0 as less than
/// +0, whereas `==` considers them equal) and it orders Not-a-Number values
/// relative to the numbers and distinguishes NaNs with different bit patterns.
///
/// NaNs with positive sign are ordered greater than all other floating-point
/// values including positive infinity, while NaNs with negative sign are
/// ordered the least, below negative infinity. Two different NaNs with the
/// same sign are ordered first by whether the are signaling (signaling is
/// less than quiet if the sign is positive, reverse if negative) and second
/// by their payload interpreted as integer (reversed order if the sign is negative).
///
/// This means all different (canonical) floating point bit patterns are
/// placed in a linear order, given below in ascending order:
///
/// - Quiet Not-a-Number with negative sign (ordered by payload, descending)
/// - Signaling Not-a-Number with negative sign (ordered by payload, descending)
/// - Negative infinity
/// - Negative finite non-zero numbers (ordered in the usual way)
/// - Negative zero
/// - Positive zero
/// - Positive finite non-zero numbers (ordered in the usual way)
/// - Positive infinity
/// - Signaling Not-a-Number with positive sign (ordered by payload, ascending)
/// - Quiet Not-a-Number with positive sign (ordered by payload, ascending)
///
/// # Examples
///
/// Most follow the normal ordering:
/// ```
/// #![feature(float_total_cmp)]
/// use std::cmp::Ordering::*;
///
/// assert_eq!(f32::total_cmp(1.0, 2.0), Less);
/// assert_eq!(f32::total_cmp(1.0, 1.0), Equal);
/// assert_eq!(f32::total_cmp(2.0, 1.0), Greater);
/// assert_eq!(f32::total_cmp(-1.0, 1.0), Less);
/// assert_eq!(f32::total_cmp(1.0, -1.0), Greater);
/// assert_eq!(f32::total_cmp(-1.0, -2.0), Greater);
/// assert_eq!(f32::total_cmp(-1.0, -1.0), Equal);
/// assert_eq!(f32::total_cmp(-2.0, -1.0), Less);
/// assert_eq!(f32::total_cmp(-std::f32::MAX, -1.0e9), Less);
/// assert_eq!(f32::total_cmp(std::f32::MAX, 1.0e9), Greater);
/// ```
///
/// Zeros and NaNs:
/// ```
/// #![feature(float_total_cmp)]
/// use std::cmp::Ordering::*;
///
/// assert_eq!(f32::partial_cmp(&0.0, &-0.0), Some(Equal));
/// assert_eq!(f32::total_cmp(-0.0, 0.0), Less);
/// assert_eq!(f32::total_cmp(0.0, -0.0), Greater);
///
/// assert_eq!(f32::partial_cmp(&std::f32::NAN, &std::f32::NAN), None);
/// assert_eq!(f32::total_cmp(-std::f32::NAN, std::f32::NAN), Less);
/// assert_eq!(f32::total_cmp(std::f32::NAN, std::f32::NAN), Equal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 We should probably have examples demonstrating the ordering by signaling bit and payload, constructing some non-default NaNs via from_bits.

/// assert_eq!(f32::total_cmp(std::f32::NAN, -std::f32::NAN), Greater);
/// assert_eq!(f32::total_cmp(std::f32::NAN, std::f32::INFINITY), Greater);
/// assert_eq!(f32::total_cmp(-std::f32::NAN, -std::f32::INFINITY), Less);
/// ```
#[unstable(feature = "float_total_cmp", issue = "55339")]
#[inline]
pub fn total_cmp(self, other: f32) -> cmp::Ordering {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this function take references like Ord::cmp does? It would allow writing things like a.sort_by(f32::total_cmp).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I contemplated that, but decided that would be inconsistent with all the other things that would take references if f32 wasn't copy, but takes owned because it is copy. I note that there's currently not a single &self or &f32 in the inherent impl methods today.

Hopefully the copy type ergonomics stuff and better fn type coercions will make .sort_by(f32::total_cmp) just work sometime in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this too. If one just can't .sort_by(f32::total_cmp) people will definitely reach to unsafe code for going from &[f32]s to &[Total<f32>] and vice-versa.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the other solution to that would be to just make it safe...

self.total_cmp_key().cmp(&other.total_cmp_key())
}

#[inline]
fn total_cmp_key(self) -> i32 {
let x = self.to_bits() as i32;
// Flip the bottom 31 bits if the high bit is set
x ^ ((x >> 31) as u32 >> 1) as i32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really, really needs either an in-depth explanation justifying its correctness, or a reference to an external write-up of such an explanation. It sounds really plausible but I'm also kind of unsure about various aspects, and I've already spent more time thinking about this order and about the binary exchange format than any human being should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a proof for this, but it's equivalent to the one in rust-lang/rfcs#1249 (comment). I suspect the predicate was chosen specifically to allow such a simple implementation, because of things like "the same floating-point datum" being ordered by exponent first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the exact same question for @jpetkau then 😉 I agree with your suspicion (tho note that "same floating point datum" with different exponents only happens in decimal floating point) but I'd rather have it confirmed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

"With their bits interpreted as signed integers, positive IEEE754 floats are ordered as:

+0 < (positive finite floats in order) < +inf < +NANs.

With the reverse order for floats with the sign bit set.

So if we reverse the order of the negative cases, by xoring the sign bit with the other bits, we get the order:

-NANs < -inf < (negative finite floats in order) < -0 < +0 < (positive finite floats in order) < +inf < +NANs.

IEEE754.2008 calls this the 'totalOrder' predicate.

(It also specifies an interpretation of the signaling bit which implies -QNANs < -SNANs < +SNANs < +QNANs, but if we happen to be on an older MIPS architecture with the opposite interpretation of the signaling bit, this won't be true.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds mostly convincing! The main thing that isn't obvious to me is why negating the non-sign bits reverses the order but maybe that's just someone you have to convince yourself of by spot checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that this reinterprets the whole 32- or 64-bit unit, but it's possible that an architecture uses different endianness for floats and ints, then the bytes will appear reversed when stored to memory as a float and loaded back as an integer. However, I don't think we currently support such architectures, and if we wanted to, we'd have to update more code than just this method (in particular to_bits, which would automatically make this code work correctly too).

Copy link

@jpetkau jpetkau Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dispute that it's possible that an architecture uses different endianness for floats and ints. That would be deeply insane and break a lot more than this code.

[edit] of course it's technically possible, just like an architecture could use different endianess for signed and unsigned ints if it really wanted to be perverse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean you dispute that it's possible? There's real physical processors (old ARMs) doing so. We don't support them, and I would be open to concluding we don't ever want to support them (just like e.g. architectures with non-octet memory granularity – also breaks tons of code, but definitely a real thing!), but they exist.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was just being flippant there. Such architectures do exist. But they would break far more than just this code; they'd break everything that relied on the bit pattern of floats, which is, well, everything. Every serialization library, formatter, parser, implementation of math libraries, you name it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endianness doesn't matter, because an f32 and the i32 will still have the same endianness as each other.

Thanks for the answer, I just wasn't sure if this had to be the case, but it seems that for Rust this will need to be the case.

}
}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl PartialEq for cmp::Total<f32> {
fn eq(&self, other: &Self) -> bool {
self.0.to_bits() == other.0.to_bits()
}
}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl Eq for cmp::Total<f32> {}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl PartialOrd for cmp::Total<f32> {
fn partial_cmp(&self, other: &Self) -> Option<cmp::Ordering> {
Some(self.cmp(other))
}
}

/// See `f32::total_cmp` for details.
///
/// Using this to sort floats:
/// ```
/// #![feature(float_total_cmp)]
///
/// let mut a = [
/// 1.0,
/// -1.0,
/// 0.0,
/// -0.0,
/// std::f32::NAN,
/// -std::f32::NAN,
/// std::f32::INFINITY,
/// std::f32::NEG_INFINITY,
/// ];
/// a.sort_by_key(|x| std::cmp::Total(*x));
/// assert_eq!(
/// format!("{:?}", a),
/// "[NaN, -inf, -1.0, -0.0, 0.0, 1.0, inf, NaN]"
/// );
/// ```
#[unstable(feature = "float_total_cmp", issue = "55339")]
impl Ord for cmp::Total<f32> {
fn cmp(&self, other: &Self) -> cmp::Ordering {
self.0.total_cmp(other.0)
}
}
132 changes: 132 additions & 0 deletions src/libcore/num/f64.rs
Expand Up @@ -17,6 +17,7 @@

#![stable(feature = "rust1", since = "1.0.0")]

use cmp;
use mem;
use num::FpCategory;

Expand Down Expand Up @@ -487,4 +488,135 @@ impl f64 {
// It turns out the safety issues with sNaN were overblown! Hooray!
unsafe { mem::transmute(v) }
}

/// A 3-way version of the IEEE `totalOrder` predicate.
///
/// This is most commonly used with `cmp::Total`, not directly.
///
/// This is useful in situations where you run into the fact that `f64` is
/// `PartialOrd`, but not `Ord`, like sorting. If you need a total order
/// on arbitrary floating-point numbers you should use this -- or one of
/// the related `total_*` methods -- they're implemented efficiently using
/// just a few bitwise operations.
///
/// This function mostly agrees with `PartialOrd` and the usual comparison
/// operators in how it orders floating point numbers, but it additionally
/// distinguishes negative from positive zero (it considers -0 as less than
/// +0, whereas `==` considers them equal) and it orders Not-a-Number values
/// relative to the numbers and distinguishes NaNs with different bit patterns.
///
/// NaNs with positive sign are ordered greater than all other floating-point
/// values including positive infinity, while NaNs with negative sign are
/// ordered the least, below negative infinity. Two different NaNs with the
/// same sign are ordered first by whether the are signaling (signaling is
/// less than quiet if the sign is positive, reverse if negative) and second
/// by their payload interpreted as integer (reversed order if the sign is negative).
///
/// This means all different (canonical) floating point bit patterns are
/// placed in a linear order, given below in ascending order:
///
/// - Quiet Not-a-Number with negative sign (ordered by payload, descending)
/// - Signaling Not-a-Number with negative sign (ordered by payload, descending)
/// - Negative infinity
/// - Negative finite non-zero numbers (ordered in the usual way)
/// - Negative zero
/// - Positive zero
/// - Positive finite non-zero numbers (ordered in the usual way)
/// - Positive infinity
/// - Signaling Not-a-Number with positive sign (ordered by payload, ascending)
/// - Quiet Not-a-Number with positive sign (ordered by payload, ascending)
///
/// # Examples
///
/// Most follow the normal ordering:
/// ```
/// #![feature(float_total_cmp)]
/// use std::cmp::Ordering::*;
///
/// assert_eq!(f64::total_cmp(1.0, 2.0), Less);
/// assert_eq!(f64::total_cmp(1.0, 1.0), Equal);
/// assert_eq!(f64::total_cmp(2.0, 1.0), Greater);
/// assert_eq!(f64::total_cmp(-1.0, 1.0), Less);
/// assert_eq!(f64::total_cmp(1.0, -1.0), Greater);
/// assert_eq!(f64::total_cmp(-1.0, -2.0), Greater);
/// assert_eq!(f64::total_cmp(-1.0, -1.0), Equal);
/// assert_eq!(f64::total_cmp(-2.0, -1.0), Less);
/// assert_eq!(f64::total_cmp(-std::f64::MAX, -1.0e9), Less);
/// assert_eq!(f64::total_cmp(std::f64::MAX, 1.0e9), Greater);
/// ```
///
/// Zeros and NaNs:
/// ```
/// #![feature(float_total_cmp)]
/// use std::cmp::Ordering::*;
///
/// assert_eq!(f64::partial_cmp(&0.0, &-0.0), Some(Equal));
/// assert_eq!(f64::total_cmp(-0.0, 0.0), Less);
/// assert_eq!(f64::total_cmp(0.0, -0.0), Greater);
///
/// assert_eq!(f64::partial_cmp(&std::f64::NAN, &std::f64::NAN), None);
/// assert_eq!(f64::total_cmp(-std::f64::NAN, std::f64::NAN), Less);
/// assert_eq!(f64::total_cmp(std::f64::NAN, std::f64::NAN), Equal);
/// assert_eq!(f64::total_cmp(std::f64::NAN, -std::f64::NAN), Greater);
/// assert_eq!(f64::total_cmp(std::f64::NAN, std::f64::INFINITY), Greater);
/// assert_eq!(f64::total_cmp(-std::f64::NAN, -std::f64::INFINITY), Less);
/// ```
#[unstable(feature = "float_total_cmp", issue = "55339")]
#[inline]
pub fn total_cmp(self, other: f64) -> cmp::Ordering {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these be implemented using a macro so that all tests and docs can be shared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plausibly, but that's a general f32.rs/f64.rs problem not specific to this method -- the same is true of min, max, to_bits, etc -- so I will not be taking action on it in this PR.

Copy link
Contributor

@gnzlbg gnzlbg Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because the other methods could be improved does not mean that we have to make the situation ""worse"" with this PR. It would be better to just add a float_total_ord.rs module with a macro to define the method and the tests only once, and use that from f32.rs/f64.rs.

If someone wants to reduce duplication for the other methods, they can do the same, and once we have a couple of files we could move them all to a sub-directory or something, but we should start somewhere.

self.total_cmp_key().cmp(&other.total_cmp_key())
}

#[inline]
fn total_cmp_key(self) -> i64 {
let x = self.to_bits() as i64;
// Flip the bottom 31 bits if the high bit is set
x ^ ((x >> 31) as u64 >> 1) as i64
}
}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl PartialEq for cmp::Total<f64> {
fn eq(&self, other: &Self) -> bool {
self.0.to_bits() == other.0.to_bits()
}
}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl Eq for cmp::Total<f64> {}

#[unstable(feature = "float_total_cmp", issue = "55339")]
impl PartialOrd for cmp::Total<f64> {
fn partial_cmp(&self, other: &Self) -> Option<cmp::Ordering> {
Some(self.cmp(other))
}
}

/// See `f64::total_cmp` for details.
///
/// Using this to sort floats:
/// ```
/// #![feature(float_total_cmp)]
///
/// let mut a = [
/// 1.0,
/// -1.0,
/// 0.0,
/// -0.0,
/// std::f64::NAN,
/// -std::f64::NAN,
/// std::f64::INFINITY,
/// std::f64::NEG_INFINITY,
/// ];
/// a.sort_by_key(|x| std::cmp::Total(*x));
/// assert_eq!(
/// format!("{:?}", a),
/// "[NaN, -inf, -1.0, -0.0, 0.0, 1.0, inf, NaN]"
/// );
/// ```
#[unstable(feature = "float_total_cmp", issue = "55339")]
impl Ord for cmp::Total<f64> {
fn cmp(&self, other: &Self) -> cmp::Ordering {
self.0.total_cmp(other.0)
}
}