math: trigonometric argument reduction fails for huge arguments #6794
Comments
rsc
Nov 20, 2013
Contributor
2^120 may be an exact float64, but the float64s on either side are 2^68 away. The floating point rounding in that range is jumping over more multiples of 2pi than you can count in a uint64. If there is any uncertainty at all in the input, the output is meaningless.

The answers being given all come down to the accuracy of the argument reduction (2^120 mod 2pi). You need around 120 bits of 2pi to even start getting significant bits of the cosine right, and you need 170 or so to get them all right. Very few implementations bother, because this only matters if you assume the argument was also accurate to at least 120 if not 170 or so bits of precision, which at that magnitude is very unlikely.

6g avoids use of x87 instructions, so it must compute cosine in software. The argument reduction in src/pkg/math/sin.go, which is a translation of C code from Netlib, says:

// Partial loss of accuracy begins to occur at x = 2**30 = 1.074e9. The loss
// is not gradual, but jumps suddenly to about 1 part in 10e7. Results may
// be meaningless for x > 2**49 = 5.6e14.

The software Cos(2^120) returns -Inf, presumably due to overflow during argument reduction. The software Sincos(2^120), which is different code, returns NaN, +Inf. These are obviously wrong but seem not much worse than picking an arbitrary value from [-1, +1] like the other implementations.

8g uses the x87 instruction, which is only defined to work well up to around 2^63. You report 8g returning 0.4779 on your machine, while mine (Intel Core i5) returns -0.5600625199559539.

www.google.com/search?q=cos(2^120) says 0.47796506772 (and it uses floating point math), so presumably the machine running the calculator for my query has a CPU more like yours than like mine.

http://www.wolframalpha.com/input/?i=Cos[2^120] says -0.719339800338651, but http://www.wolframalpha.com/input/?i=Cos[2^120-Floor[2^120/(2*pi)]*2*pi] says -0.9258790228548378673.

echo 'c(2^120)' | bc -l says 0.89175216265252557587. (GNU bc)
echo 'c(2^120)' | 9 bc -l says -.9258790228548378673. (Research Unix bc)

It does appear that -0.9258790228548378673 is the right answer.

The right fix is probably to translate the corresponding routines from the SunPro C library (now FreeBSD's C library) to Go, which is where we have obtained a bunch of the other functions in package math. If someone wants to do this, great. It might make sense to first verify that the C function gives the desired answer.
Labels changed: added priority-someday, removed priority-triage.
ianlancetaylor
Nov 20, 2013
Contributor
Using the 387 instructions on amd64 requires moving the floating point value from an SSE register to a 387 register and then back again. This is an inefficient transfer as it requires going through memory. But it could be done. I don't know why amd64 does not have a cosine hardware instruction, but in general these kinds of fancy machine instructions are not normally considered a good tradeoff these days.
rsc
Nov 20, 2013
Contributor
The 6g toolchain does not expose the 387 instructions. I plan to merge the 8 and 6 architecture definitions at some point, and that will take care of that. But we haven't done that. If it were important we could use BYTE directives to get at those.

The only such function provided by the SSE2 instruction set, which is the floating point we use for 6g, is Sqrt, perhaps because it is much easier for hardware to compute than the trig functions.

Although Intel claims the x87 trig instructions are okay up to 2^63, in practice they're not even great at magnitudes that big. If nothing else, Cos(2^1000) would require 1050 or so bits of pi for argument reduction.

The real question is: do we care more about using the machine instruction or about correctness? We can't have both. (Of course, right now on 6g we have neither. But the answer would tell us which to fix.)
rsc
Nov 20, 2013
Contributor
This site examines the Intel cosine accuracy in detail: http://notabs.org/fpuaccuracy/. It's not very good once you get away from small numbers. Apparently one Intel manual once admitted that the argument reduction might use as few as 66 bits of pi.
gopherbot
Feb 20, 2014
@rsc, Wolfram Alpha gives the correct result in both cases; I think you've missed a quirk of Alpha: for 2^120, it assumes the input is in degrees, not radians. http://www.wolframalpha.com/input/?i=Cos[2^120] "Assuming trigonometric arguments in degrees | Use radians instead" You have to click on "radians" to get the correct behavior.
gopherbot
Feb 20, 2014
I just checked FreeBSD's msun library (as well as netlib/fdlibm); the important piece is a function that computes x mod pi/2 ≈ r + dr, where dr is the "tail" (a small number) and r lies in [-pi/4, pi/4], and that also returns the result of the division at the same time. The function's name is __ieee754_rem_pio2. Porting __ieee754_rem_pio2 (and __kernel_rem_pio2, which it calls) over should be sufficient to fix this issue. Then there are routines like __kernel_cos(r, dr) that assume these conditions on r and dr and don't have to worry about r being too large. http://www.netlib.org/fdlibm/k_cos.c
griesemer
Apr 9, 2014
Contributor
As pointed out by Russ, the issue is catastrophic errors introduced during argument reduction. For a list of relevant literature see also (duplicate) issue #7110.
Issue #7110 has been merged into this issue.
Issue #8526 has been merged into this issue.
lexprfuncall added the accepted label on Aug 14, 2014
griesemer referenced this issue on Dec 8, 2014: (Closed) math: Sin returns value not in [-1, 1] for very large values #7110
minux referenced this issue on Dec 8, 2014: (Closed) math: Calls to math.Sin and math.Cos with values > 1e17 return values outside of [-1,1] #8526
rsc added this to the Unplanned milestone on Apr 10, 2015
rsc removed the priority-someday label on Apr 10, 2015
dr2chase
May 31, 2017
Contributor
I'm pretty sure use of fdlibm would fix this. That's the case for Java, for instance, where I did an implementation of the FP stuff once upon a time. 387 math uses a 66-bit approximation of pi for argument reduction (bits 67 and 68 are zero), so things fall apart for inputs as small as sin(machine_rep_of_pi), where by "fall apart" I mean that a float64 answer has about 15 correct bits in its mantissa instead of the hoped-for 53. (sin(machine_rep_of_pi) ≈ pi - machine_rep_of_pi; machine_rep_of_pi is the first 53 bits of pi, appropriately rounded, so the subtraction is supposed to give you the NEXT 53 bits of pi (ignoring rounding), but only 15 of them (68 - 53) are available, and the rest are zero.)

This is substantially worse than the claimed precision at the time, which was one ulp.

Once upon a time, Sun's JIT would wrongly use the hardware instruction when it decided that your code was hot, which meant that it would work right at first, then change to wrong, and it would change to wrong faster on multiprocessors because the JIT compiler got a chance to run early.
rsc referenced this issue on Jun 5, 2017: (Closed) math: Exp(1.0) returns different values on amd64 vs arm64 #20319
rsc modified the milestones: Go1.9, Unplanned on Jun 19, 2017
MichaelTJones
Jul 31, 2017
Contributor
The right answer is indeed -0.9258... as stated above.
cos(2^120) = -0.925879022854837867303861764107414946730833209928656460236049372618839291445382039698667367235188503605454959090834167176783445531614718194354868603064477383826066930073681426232927237014395564232737502670263136419514308982212308324981128756091768991384727116127182343614687859734231062568153628963751154387657011813354119526431689238651865868571354193068903448888710998317926597877246757860980497318
robpike
Jul 31, 2017
Contributor
Ivy says
-0.92587902285483786730386176410741494673083320992866
and it's got lots of bits (I ran with 1024, far more than 120) and uses a Taylor series. I trust the result.
bmkessler
Sep 25, 2017
Contributor
Quoting the relevant literature list from #7110 to have it centralized here:
The problem is due to catastrophic errors introduced during argument reduction. There's
very little literature besides the classic Payne & Hanek argument reduction paper:
Payne & Hanek: "Radian reduction for trigonometric functions"
http://dl.acm.org/citation.cfm?id=1057602 (ACM paywall)
Sun's implementation:
K. C. Ng, "Argument Reduction for Huge Arguments: Good to the Last Bit"
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.5616&rep=rep1&type=pdf and:
http://www.validlab.com/arg.txt
A newer approach:
http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf
http://perso.ens-lyon.fr/jean-michel.muller/RangeReductionIEEETC0305.pdf
Note, the last two papers actually fall back to Payne & Hanek's method when the argument is >= 2^63-1, so it seems like Payne & Hanek is the method to use for very large arguments.
Also, since the root cause of the issue was already identified, it might make sense to rename this issue to something more like "math: trigonometric argument reduction fails for huge arguments" or similar.
lexprfuncall commented Nov 20, 2013