
Fix an accuracy regression in f32::to_degrees #48622

Closed
varkor wants to merge 1 commit from the f32-to_degrees branch

Conversation

varkor
Member

@varkor varkor commented Feb 28, 2018

#47919 regressed some test cases for f32::to_degrees, while improving others. This change satisfies all previous test cases and should be more accurate than both previous implementations.

The f32 to f64 cast is lossless, f64::to_degrees should provide more accuracy than a native f32::to_degrees, and conversion back to f32 will be as accurate a conversion as possible.

Fixes #48617.

r? @rkruppe

rust-lang#47919 regressed some test cases for `f32::to_degrees`, while improving others. This change satisfies all previous test cases and should be more accurate than both previous implementations.

The `f32` to `f64` cast is lossless, `f64::to_degrees` should provide more accuracy than a native `f32::to_degrees`, and conversion back to `f32` will be as accurate a conversion as possible. It can be hard to reason about floating-point accuracy, but given that this passes all the tests, I feel confident it's an improvement.
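
For reference, a minimal sketch of the approach described above (the function name is illustrative; the actual change replaces the body of `f32::to_degrees` in `f32.rs`):

// Sketch of the proposed approach: widen losslessly to f64, convert there,
// then narrow back to f32 with a single correctly rounded cast.
fn to_degrees_via_f64(x: f32) -> f32 {
    (x as f64).to_degrees() as f32
}
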
@petrochenkov
Contributor

@bors delegate=rkruppe

@bors
Contributor

bors commented Feb 28, 2018

✌️ @rkruppe can now approve this pull request

@kennytm kennytm added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Mar 1, 2018
@hanna-kruppe
Contributor

It's good to fix the regression for the pi/6 case, but I am worried about this overall approach. It seems we are blindly stumbling through the landscape of possible implementations based on a few simple test cases. Accuracy across the wide range of not-nice inputs (i.e., not a multiple or fraction of a couple of common constants) is at least as important as accuracy on the corner cases that people like to test.

I'm coming to the conclusion that a principled approach is needed, otherwise we're apparently just as likely to regress precision for some inputs not yet in our test suite. Thus I would prefer us to either bite the bullet and implement a completely accurate algorithm as @huonw mentioned in the original issue, or leave the implementation as simple and predictable as possible. Here are my reasons for it:

First there is the fact that we got blindsided by the regression for pi/6. This automatically makes me distrust any judgement coming from the same people at the same level of rigor.

Second, I'm not a fan of this implementation strategy. Using higher precision internally for a single precision calculation can be useful, but it usually just papers over inaccuracies in the algorithm and thus doesn't help f64 (at least unless we use a soft float with > 64 bits for f64::to_degrees, which seems clearly undesirable). Furthermore, it is not obvious to me that this is a choice we should be making for the user -- on some targets, double precision operations can be much more expensive than single precision ones, and require additional libraries to be linked in.

Third, I have doubts about the claim that this is always at least as accurate as an f32 multiply. The cast back to f32 at the end is correctly rounded in itself, yes, but it's still a rounding step. If the result of f64::to_degrees has some bits set in the lower 53 - 24 bits of the significand, this means double rounding (first rounding to 53 bits in to_degrees, then to 24 bits in the cast). This should usually be balanced out by the f64 multiplication being more accurate, but it's far from obvious to me that this won't be less accurate in some cases that produce interesting bit patterns in the low order bits of the significand.
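
(As a decimal analogue of double rounding, for illustration: rounding 2.49 to one decimal place gives 2.5, and rounding 2.5 to an integer with ties rounding up gives 3, whereas rounding 2.49 directly to an integer gives 2. Rounding to 53 bits and then to 24 bits can exhibit the same effect in binary.)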

I don't have an example at hand, but then again the test cases here are very minimal as well, so we're starved for empirical evidence either way. Note that the test cases that result in a nice round number like 1.0 are unproblematic for the scenario I described above, since the low order bits are all zero anyway.

Honestly, at this point convincing me would take trying out all ~ 4 billion f32 inputs and showing there are no accuracy regressions [1] compared to both the original implementation (180.0 / PI) and the current one (dedicated constant). This would take a while to run but it's basically feasible. f64 rounding behavior didn't change at all IIRC since 180f64 / f64::consts::PI happened to be correctly rounded? But as of right now, as I said, I feel like we're flailing around aimlessly.

[1] Note that this probably requires computing the correctly rounded result, which makes this potentially challenging.
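
A rough sketch of what such an exhaustive sweep could look like (the function and closure names are placeholders, and producing the correctly rounded reference is the hard part, as footnote [1] says):

// Hedged sketch: sweep every f32 bit pattern and count inputs where the new
// implementation misses the correctly rounded reference while the old one hit it.
// `old_impl`, `new_impl` and `reference` are placeholders for the three conversions.
fn count_regressions(
    old_impl: impl Fn(f32) -> f32,
    new_impl: impl Fn(f32) -> f32,
    reference: impl Fn(f32) -> f32,
) -> u64 {
    let mut regressions = 0;
    for bits in 0..=u32::MAX {
        let x = f32::from_bits(bits);
        if !x.is_finite() {
            continue; // skip NaNs and infinities
        }
        let exact = reference(x).to_bits();
        if new_impl(x).to_bits() != exact && old_impl(x).to_bits() == exact {
            regressions += 1;
        }
    }
    regressions
}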

@varkor
Member Author

varkor commented Mar 1, 2018

I agree with everything you said; an exhaustive test does seem like the most convincing demonstration that we have a better implementation.

Furthermore, it is not obvious to me that this is a choice we should be making for the user [...]

Yeah, I didn't give this as much weight as I perhaps ought to have. If we can measure the accuracy and double precision turns out to be more accurate, then whether to use it is a separate decision to make (though it's not even clear that's the case yet).

I don't have time to write an exhaustive test right now, but I'll try to get around to it at some point if no-one else wants to give it a shot. I'm happy to either leave the pull request open, or close it for now.

@hanna-kruppe
Contributor

Since the regression for pi/6 is already in beta and this PR is now delayed indefinitely, perhaps it's best to revert the previous PR now and backport to beta?

@varkor
Member Author

varkor commented Mar 2, 2018

@rkruppe: Okay, I'm retracting my earlier comment. I no longer think the current behaviour is incorrect. I think you were right to suggest that the f64 to f32 conversion is causing bad rounding, and I think the current implementation is correct.

Take std::f32::consts::FRAC_PI_6, for instance. Obviously, ideally π/6 = 30° in a mathematical sense, but we want to_degrees to provide the most accuracy for the floating point inputs, not just "what we might expect from the variable names".
The current implementation of FRAC_PI_6.to_degrees() does:

0.52359877559829887307710723054658381_f32 * 57.2957795130823208767981548141051703_f32

In (true) decimal, this corresponds to:

0.52359879016876220703125 * 57.295780181884765625 = 30.00000118501020551775582134723663330078125

And 30.00000118501020551775582134723663330078125_f32 is equal to 30.0000019073486328125, which is exactly what FRAC_PI_6.to_degrees() returns.
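
The worked example above can be reproduced directly. A minimal sketch, assuming the constant-multiply implementation that was current at the time:

// Prints the exact decimal value of the f32 nearest to pi/6, and the result
// of calling to_degrees on it.
fn main() {
    let x = std::f32::consts::FRAC_PI_6;
    println!("{:.30}", x);              // 0.523598790168762207031250...
    println!("{:.30}", x.to_degrees()); // 30.000001907348632812500... with the constant-multiply version
}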

@cuviper
Member

cuviper commented Mar 2, 2018

FWIW, I'm OK with closing #48617 as expected behavior, as long as we're sure that's expected and good, even though it seemed more accurate before (for a few inputs, perhaps not in general).

The current implementation of FRAC_PI_6.to_degrees() does:

0.52359877559829887307710723054658381_f32 * 57.2957795130823208767981548141051703_f32

In (true) decimal, this corresponds to:

0.52359879016876220703125 * 57.295780181884765625 = 30.00000118501020551775582134723663330078125

And 30.00000118501020551775582134723663330078125_f32 is equal to 30.0000019073486328125, which is exactly what FRAC_PI_6.to_degrees() returns.

This doesn't prove accuracy -- it just proves that the compiler can do the math you gave it.

You're giving the compiler a particular pre-rounded constant. There are a few other ways we could re-frame this calculation of x * 180.0 / PI, and it's not obvious to me which will have the least floating point error in general. x * (180.0 / PI), (x * 180.0) / PI, (x / PI) * 180.0, x / (PI / 180.0)...
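
For concreteness, those groupings could be written out as follows (an illustrative sketch; none of these is claimed to be correctly rounded in general):

use std::f32::consts::PI;

// Different ways to group x * 180.0 / PI; each orders the rounding steps differently.
fn to_degrees_variants(x: f32) -> [f32; 4] {
    [
        x * (180.0 / PI), // multiply by a precomputed constant
        (x * 180.0) / PI, // multiply, then divide
        (x / PI) * 180.0, // divide, then multiply
        x / (PI / 180.0), // divide by a precomputed constant
    ]
}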

@varkor
Member Author

varkor commented Mar 2, 2018

This doesn't prove accuracy -- it just proves that the compiler can do the math you gave it.

Yeah, you're right, I'm not thinking straight at all at the moment.

You're giving the compiler a particular pre-rounded constant. There are a few other ways we could re-frame this calculation of x * 180.0 / PI, and it's not obvious to me which will have the least floating point error in general.

Intuitively (though I don't want to try to put any weight behind that, as it doesn't have a good track record), using a more accurate constant should result overall in more accurate results: the previous constant (implemented using the division) was demonstrably less accurate than the new constant.

This makes me feel that, considering every input, the new method should still be more accurate. However, given the quirks of floating-point arithmetic, it might have happened, by chance, that the old constant was actually the better one to use in that context.

I want to rectify my mistakes and actually resolve this. I'll try to set aside some time to test this; if I don't manage, then we can revert the change if no new evidence emerges. It's probably better to remain consistent if we can't prove we're not regressing.

Sorry for all my confusion!

@hanna-kruppe
Contributor

Intuitively (though I don't want to try to put any weight behind that, as it doesn't have a good track record), using a more accurate constant should result overall in more accurate results: the previous constant (implemented using the division) was demonstrably less accurate than the new constant.

Rounding errors can (and often do, with round-to-nearest) cancel out.

@varkor
Member Author

varkor commented Mar 3, 2018

Here's my analysis:

For a range of f32 numbers (not the entire range [at least yet]), I output the results from (a) the original implementation; (b) the updated constant implementation; (c) the f64 cast implementation; as well as a Mathematica implementation (m) that computes the solution with arbitrary precision before rounding to the f32. The Mathematica implementation acts as a ground truth for the others, as in theory it should be maximally accurate [1].

// (a)
x * (180.0f32 / f32::consts::PI)
// (b)
const PIS_IN_180: f32 = 57.2957795130823208767981548141051703_f32;
x * PIS_IN_180
// (c)
const PIS_IN_180: f64 = 57.2957795130823208767981548141051703_f64;
((x as f64) * PIS_IN_180) as f32

Mathematica implementation.

I then compared (a), (b) and (c) to (m) to see for how many inputs their outputs differed for the given range.

Here are my results (each cell contains the number of discrepancies):

Range        (a) Original    (b) Current    (c) Upcasting
0..2^10      0               0              0
2^10..2^11   9               2              0
2^11..2^12   28              4              0
2^12..2^13   67              16             0
2^13..2^14   316             66             0
2^14..2^15   1267            273            0

I could go on, but Mathematica's speed is really a limiting factor here.

The discrepancies for (a) and (b) do differ — hence the loss of accuracy on some inputs, as @cuviper showed. However, (b) clearly seems to be superior to (a) (even if it unfortunately fails for some "well-known" identities).

Interestingly, the upcasting to f64 method seems fairly reliable (I wouldn't be surprised if there were a few inputs it rounded incorrectly on across the entire range, but it's significantly better than the other methods at the very least). If we accept these results, it does then raise @rkruppe's question of which is a greater evil: using f64 calculations for f32, or having some known inaccuracies in our approximation.

[1] This does rely on the correctness of the precision-handling functionality of Mathematica, but I couldn't find any literature on 0.5 ULP correct radians-to-degrees conversion implementations, so I think this is the most confident we're going to come without an (even greater) inordinate amount of effort for such a small feature.

@Badel2
Contributor

Badel2 commented Mar 3, 2018

I just wanted to say that when I use f32 instead of f64 it is because I want to sacrifice precision for performance. It makes no sense to cast the f32 to f64 for an improvement in precision. f64 is the default float type, so when someone chooses to use f32 instead, I doubt that precision loss will be a complaint. Even though the performance loss from casting back and forth is likely negligible on x86_64, some microcontrollers may have a software implementation of floats, and it basically just seems wrong to have a mention of f64 in f32.rs. So I vote for closing #48617 and keeping the current improved implementation.

@Mark-Simulacrum Mark-Simulacrum added the beta-nominated (Nominated for backporting to the compiler in the beta channel.) and beta-accepted (Accepted for backporting to the compiler in the beta channel.) labels, and removed the beta-accepted label, on Mar 4, 2018
@hanna-kruppe
Contributor

@varkor Thanks for collecting data. Some questions:

  • Did you quantify the error or just check for match/no match? It seems unlikely that any of the implementations are more than 1 ULP off, but as I said earlier I'm not trusting any more gut feelings on this topic.
  • How are the inputs generated and bucketed specifically? Is 0..2^10 the bit patterns 0x00000000 ... 0x00000400, is it all the floats between 0 and 1024.0 (including fractional ones), is it the integers 0, 1, 2, ..., or something else entirely?

@varkor
Member Author

varkor commented Mar 5, 2018

@rkruppe: I also checked to make sure none of the implementations were more than 1 ULP off. Fortunately none of them exceeded 1 ULP for the inputs I tested.

Oh, I should have clarified: I meant the bit patterns: 0x00000000..0x00000400 for 0..2^10 (exclusive of the endpoint).
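
A sketch of how one bucket of that comparison, together with the 1-ULP check mentioned above, might look (the helper name and structure are illustrative, not taken from the actual harness):

// ULP distance between two finite f32 values of the same sign: reinterpret the
// bit patterns as integers and take the absolute difference.
fn ulp_distance(a: f32, b: f32) -> u64 {
    (a.to_bits() as i64 - b.to_bits() as i64).unsigned_abs()
}

fn main() {
    // The 0..2^10 bucket: bit patterns 0x0000_0000 up to, but not including, 0x0000_0400.
    for bits in 0x0000_0000u32..0x0000_0400 {
        let x = f32::from_bits(bits);
        let a = x * (180.0f32 / std::f32::consts::PI);                   // (a) original
        let c = ((x as f64) * (180.0f64 / std::f64::consts::PI)) as f32; // (c) upcasting
        assert!(ulp_distance(a, c) <= 1);
    }
}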

@varkor
Member Author

varkor commented Mar 8, 2018

I'm going to close this PR. If we decide that a different method is desirable (e.g. doing a division instead of a multiplication), then we can open another PR.

@varkor varkor closed this Mar 8, 2018
@nikomatsakis nikomatsakis removed the beta-nominated (Nominated for backporting to the compiler in the beta channel.) label on Mar 15, 2018
@varkor varkor deleted the f32-to_degrees branch March 15, 2018 22:41