math: trigonometric argument reduction fails for huge arguments #6794

Open
lexprfuncall opened this Issue Nov 20, 2013 · 21 comments

10 participants
@lexprfuncall

lexprfuncall commented Nov 20, 2013

Here is a test program

http://play.golang.org/p/lROOy0Hekp

The value 2^120 is exactly represented as a float64.  A different, incorrect answer
(0.4779...) is returned by 8g.  The right answer (-0.9258...) is returned by gccgo on
64-bit x86.  The wrong answer is returned by gccgo on 32-bit x86, where the cos
function doesn't seem to compute anything and behaves like the identity function.
@ianlancetaylor

Contributor

ianlancetaylor commented Nov 20, 2013

Comment 1:

GCC gives the same result as gccgo for an equivalent C program compiled with -m32
-funsafe-math-optimizations.  In both cases the program simply executes the fcos
instruction.  I'm not sure what the best gccgo fix will be.

Status changed to Accepted.

@rsc

Contributor

rsc commented Nov 20, 2013

Comment 2:

2^120 may be an exact float64, but the float64s on either side are 2^68 away. The
floating point rounding in that range is jumping over more multiples of 2pi than you can
count in a uint64. If there is any uncertainty at all in the input, the output is
meaningless.

The answers being given all come down to the accuracy of the argument reduction (2^120
mod 2pi). You need around 120 bits of 2pi to even start getting significant bits of the
cosine right, and you need 170 or so to get them all right. Very few implementations
bother, because this only matters if you assume the argument was also accurate to at
least 120 if not 170 or so bits of precision, which at that magnitude is very unlikely.

6g avoids use of x87 instructions, so it must compute cosine in software. The argument
reduction in src/pkg/math/sin.go, which is a translation of C code from Netlib, says:

// Partial loss of accuracy begins to occur at x = 2**30 = 1.074e9.  The loss
// is not gradual, but jumps suddenly to about 1 part in 10e7.  Results may
// be meaningless for x > 2**49 = 5.6e14.

The software Cos(2^120) returns -Inf, presumably due to overflow during argument
reduction. The software Sincos(2^120), which is different code, returns NaN, +Inf.
These are obviously wrong but seem not much worse than picking an arbitrary value from
[-1, +1] like the other implementations.

8g uses the x87 instruction, which is only defined to work well up to around 2^63. You
report 8g returning 0.4779 on your machine, while mine (Intel Core i5) returns
-0.5600625199559539. www.google.com/search?q=cos(2^120) says 0.47796506772 (and it uses
floating point math), so presumably the machine running the calculator for my query has
a CPU more like yours than like mine.

http://www.wolframalpha.com/input/?i=Cos[2^120] says -0.719339800338651, but
http://www.wolframalpha.com/input/?i=Cos[2^120-Floor[2^120/(2*pi)]*2*pi] says
-0.9258790228548378673.
echo 'c(2^120)' | bc -l says 0.89175216265252557587. (GNU bc)
echo 'c(2^120)' | 9 bc -l says -.9258790228548378673. (Research Unix bc)

It does appear that -0.9258790228548378673 is the right answer.

The right fix is probably to translate the corresponding routines from the SunPro C
library (now FreeBSD's C library) to Go, which is where we have obtained a bunch of the
other functions in package math. If someone wants to do this, great. It might make sense
to first verify that the C function gives the desired answer.

Labels changed: added priority-someday, removed priority-triage.


@robpike

Contributor

robpike commented Nov 20, 2013

Comment 3:

Why is it not OK to use 387 instructions on amd64? Or to ask the question another way,
why is there no suitable cosine hardware on amd64?
@ianlancetaylor

Contributor

ianlancetaylor commented Nov 20, 2013

Comment 4:

Using the 387 instructions on amd64 requires moving the floating point value from an SSE
register to a 387 register and then back again.  This is an inefficient transfer as it
requires going through memory.  But it could be done.

I don't know why amd64 does not have a cosine hardware instruction, but in general these
kinds of fancy machine instructions are not normally considered a good tradeoff these
days.
@rsc

Contributor

rsc commented Nov 20, 2013

Comment 5:

The 6g toolchain does not expose the 387 instructions. I plan to merge the 8 and 6
architecture definitions at some point, and that will take care of that. But we haven't
done that. If it were important we could use BYTE directives to get at those.

The only function provided by the SSE2 instruction set, which is the floating point we
use for 6g, is Sqrt, perhaps because it is much easier for hardware to compute than the
trig functions. Although Intel claims the x87 trig instructions are okay to 2^63, in
practice they're not even accurate at magnitudes that large. If nothing else, Cos(2^1000)
would require 1050 or so bits of pi for argument reduction.

The real question is: do we care more about using the machine instruction or about
correctness? We can't have both. (Of course, right now on 6g we have neither. But the
answer would tell us which to fix.)
@rsc

Contributor

rsc commented Nov 20, 2013

Comment 6:

This site examines the Intel cosine accuracy in detail:
http://notabs.org/fpuaccuracy/. It's not very good once you get away from
small numbers. Apparently one Intel manual once admitted that the argument
reduction for pi might use as few as 66 bits.
@robpike

Contributor

robpike commented Nov 20, 2013

Comment 7:

Surprised to hear you say 387 is not exposed on amd64, as I tried just copying the
assembler from sin_386.s to sin_amd64.s and all tests passed.
@rsc

Contributor

rsc commented Nov 20, 2013

Comment 8:

I am surprised, but that's great. Now the question is whether we want to
use it.
@robpike

Contributor

robpike commented Nov 20, 2013

Comment 9:

Benchmark:

existing code:
BenchmarkCos    100000000           14.3 ns/op

using 387 assembler:
BenchmarkCos    50000000            31.7 ns/op
@rsc

Contributor

rsc commented Nov 27, 2013

Comment 10:

Labels changed: added go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 11:

Labels changed: added release-none, removed go1.3maybe.

@rsc

Contributor

rsc commented Dec 4, 2013

Comment 12:

Labels changed: added repo-main.

@gopherbot

gopherbot commented Feb 20, 2014

Comment 13 by rayneolivetti:

@rsc, Wolfram Alpha gives the correct result in both cases; I think you've missed a
quirk of Alpha: for 2^120, it assumes the input is in degrees, not radians.
http://www.wolframalpha.com/input/?i=Cos[2^120]
"Assuming trigonometric arguments in degrees | Use radians instead"
You have to click on "radians" to get the correct behavior.

@gopherbot

gopherbot commented Feb 20, 2014


Comment 14 by rayneolivetti:

I just checked FreeBSD's msun library (as well as netlib/fdlibm); the important piece is
a function that computes x mod pi/2 ≈ r + dr, where dr is the "tail" (a small number)
and r is in [-pi/4, pi/4], and that also returns the result of the division at the same
time. The function's name is __ieee754_rem_pio2.

Porting __ieee754_rem_pio2 (and __kernel_rem_pio2, which it calls) over should be
sufficient to fix this issue.

Then there are routines like __kernel_cos(r, dr) that assume these conditions on r and dr
and don't have to worry about r being too large.
http://www.netlib.org/fdlibm/k_cos.c

@griesemer

Contributor

griesemer commented Apr 9, 2014

Comment 15:

As pointed out by Russ, the issue is catastrophic errors introduced during argument
reduction. For a list of relevant literature see also (duplicate) issue #7110.
@griesemer

Contributor

griesemer commented Apr 9, 2014

Comment 16:

Issue #7110 has been merged into this issue.

@minux

Member

minux commented Aug 14, 2014

Comment 17:

Issue #8526 has been merged into this issue.

@dr2chase

Contributor

dr2chase commented May 31, 2017

I'm pretty sure use of fdlibm would fix this. That's the case for Java, for instance, where I did an implementation of FP stuff once upon a time. 387 math uses a 66-bit approximation of pi for argument reduction (bits 67 and 68 are zero), so things fall apart for inputs as small as sin(machine_rep_of_pi), where by "fall apart" I mean that a float64 answer has about 15 correct bits in its mantissa instead of the hoped-for 53. (sin(machine_rep_of_pi) ≈ pi - machine_rep_of_pi; machine_rep_of_pi is the first 53 bits of pi, appropriately rounded, so what the subtraction is supposed to give you is the NEXT 53 bits of pi (ignoring rounding for now), but only 15 (68-53) are available, and the rest are zero.)

This is substantially worse than the claimed precision at the time, which was one ulp.

Once upon a time, Sun's JIT would wrongly use the hardware instruction when it decided that your code was hot, which meant that it would work right at first, then change to wrong, and it would change to wrong faster on multiprocessors because the JIT compiler got a chance to run early.

@MichaelTJones

Contributor

MichaelTJones commented Jul 31, 2017

The right answer is indeed -0.9258... as stated above.

cos(2^120) = -0.925879022854837867303861764107414946730833209928656460236049372618839291445382039698667367235188503605454959090834167176783445531614718194354868603064477383826066930073681426232927237014395564232737502670263136419514308982212308324981128756091768991384727116127182343614687859734231062568153628963751154387657011813354119526431689238651865868571354193068903448888710998317926597877246757860980497318

@robpike

Contributor

robpike commented Jul 31, 2017

Ivy says
-0.92587902285483786730386176410741494673083320992866
and it's got lots of bits (I ran with 1024, far more than 120) and uses a Taylor series. I trust the result.

@bmkessler

Contributor

bmkessler commented Sep 25, 2017

Quoting the relevant literature list from #7110 to have it centralized here:

The problem is due to catastrophic errors introduced during argument reduction. There's
very little literature besides the classic Payne & Hanek argument reduction paper:

Payne & Hanek, "Radian reduction for trigonometric functions":
http://dl.acm.org/citation.cfm?id=1057602 (ACM paywall)

Sun's implementation: K. C. Ng, "Argument Reduction for Huge Arguments: Good to the Last Bit":
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.5616&rep=rep1&type=pdf
http://www.validlab.com/arg.txt

A newer approach:
http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
http://perso.ens-lyon.fr/jean-michel.muller/RangeReductionIEEETC0305.pdf

Note that the last two papers actually fall back to Payne & Hanek's method when the argument is >= 2^63-1, so it seems like Payne & Hanek is the method to use for very large arguments.

Also, since the root cause of the issue has already been identified, it might make sense to rename this issue to something like "math: trigonometric argument reduction fails for huge arguments".

@robpike robpike changed the title from math: math.Cos incorrectly returns -Inf for large value to math: trigonometric argument reduction fails for huge arguments Sep 25, 2017
