usage of sincos #41
Hmm. I need to add support for multiple return values (or, equivalently, returning a tuple and unpacking it). However, in case your curiosity is impatient and you don't want to wait for me to fix this issue:

```julia
julia> using SLEEFPirates, SIMDPirates

julia> x = SVec(ntuple(Val(4)) do i Core.VecElement(rand()) end)
SVec{4,Float64}<0.25773116159740384, 0.6887968844827816, 0.3906818208655902, 0.5101711516658685>

julia> sin(x)
SVec{4,Float64}<0.25488730940177806, 0.6356088237080095, 0.38081894899915164, 0.4883266114084775>

julia> cos(x)
SVec{4,Float64}<0.966970764555952, 0.7720112843893673, 0.9246496244973993, 0.8726609425145104>

julia> sincos(x)
(SVec{4,Float64}<0.25488730940177806, 0.6356088237080095, 0.38081894899915164, 0.4883266114084775>, SVec{4,Float64}<0.966970764555952, 0.7720112843893673, 0.9246496244973993, 0.8726609425145105>)
```
```julia
julia> using BenchmarkTools

julia> @benchmark sincos($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     21.699 ns (0.00% GC)
  median time:      21.903 ns (0.00% GC)
  mean time:        22.186 ns (0.00% GC)
  maximum time:     333.447 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark sin($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     20.016 ns (0.00% GC)
  median time:      20.105 ns (0.00% GC)
  mean time:        20.424 ns (0.00% GC)
  maximum time:     326.284 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark cos($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     20.389 ns (0.00% GC)
  median time:      20.481 ns (0.00% GC)
  mean time:        20.926 ns (0.00% GC)
  maximum time:     226.778 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997
```
```julia
julia> versioninfo()
Julia Version 1.5.0-DEV.168
Commit c4c4a13* (2020-01-28 16:23 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.0 (ORCJIT, haswell)
```

To compare with the links, the CPU I tested on is closest to the Intel® Xeon® CPU E5-2699 v4. I assume "HA" and "LA" mean high and low accuracy, respectively?

SLEEFPirates does far better than base Julia, but is still definitely suboptimal. It would be great to improve it or switch to a better library in the future. Note that the C library SLEEF is now on major version 3, while SLEEFPirates.jl was forked from SLEEF.jl, which was based on SLEEF major version 2.
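In the meantime, the "return a tuple and unpack it" pattern already works in scalar code with Base's `sincos`; here is a minimal sketch (plain Julia, no SIMD; the helper name `sincos!` is mine, not SLEEFPirates API):

```julia
# Unpack the (sin, cos) tuple elementwise into two output vectors.
function sincos!(s::Vector{Float64}, c::Vector{Float64}, x::Vector{Float64})
    @inbounds for i in eachindex(x)
        s[i], c[i] = sincos(x[i])
    end
    return s, c
end

x = rand(100)
s, c = sincos!(similar(x), similar(x), x)
```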
Adding to this conversation, since you were interested in fixed-point sine and cosine approximation: I have written an article on Nextjournal about my findings: https://nextjournal.com/zorn/fast-fixed-point-sine-and-cosine-approximation-with-julia
@zsoerenm You may be interested in VectorizedRNG.jl, which I discussed before. I'm still using a floating-point approximation of sin and cosine instead of fixed point, but I am using a very simple approximation I wrote. Floating point has the advantage of not needing to keep track of the exponents, and it lets you avoid all those shifts (although shifts are very fast). I used Remez.jl for my polynomial coefficients. My random … and … Unfortunately, counting the number of leading zeros requires AVX512CD to vectorize, but this is still reasonably fast on AVX2. Putting these together, I get about a 2x improvement on …
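If you want to reproduce the Remez.jl fitting step, a rough sketch looks like this (the degree and interval here are illustrative, not the exact ones I used; `ratfn_minimax` is the package's exported fitting routine):

```julia
# Fit a degree-7 minimax polynomial to sin on [0, π/2]; d = 0 requests a
# plain polynomial rather than a rational approximation. Works in BigFloat.
using Remez

N, D, E, X = ratfn_minimax(sin, (big"0.0", big(π) / 2), 7, 0)
coeffs = Float64.(N)   # polynomial coefficients, constant term first
maxerr = Float64(E)    # worst-case absolute error over the interval
```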
Very interesting!
Sure; however, my requirement for accuracy is very low: 2 or 3 bits would already be enough. Unfortunately there is no built-in Float16 that I can use to make things faster, so Int16 was the best bet.
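For concreteness, a tiny sketch of what Int16 fixed point looks like, using a hypothetical Q1.14 format (the exact format in the article may differ):

```julia
# Q1.14 in Int16: 14 fractional bits, values in [-2, 2), resolution 2^-14.
to_q14(x::Real)    = round(Int16, x * 2^14)
from_q14(q::Int16) = q / 2^14

q = to_q14(sqrt(2) / 2)   # Int16: 11585
from_q14(q)               # ≈ 0.70709, i.e. sin(π/4) to ~14 bits
```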
As far as I understand, it does a sort of least-squares fit, doesn't it? In that case you'll have to pay attention to the boundaries of the quarter sine/cosine approximation; otherwise the function jumps at every quarter of a circle. I have chosen the polynomials in such a way that the boundaries have zero error. Here you'll find a solution for a sixth-order polynomial that has zero error at the boundaries and lets you calculate sin and cos simultaneously: http://www.olliw.eu/2014/fast-functions/
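As a minimal illustration of the boundary-constraint idea (this is the classic second-order version, not the sixth-order polynomial from the link):

```julia
# s(x) = (4/π)x - (4/π²)x² is exact at x = 0, π/2, and π, so a piecewise
# quarter-wave construction built from it has no jumps at the quadrant
# boundaries, unlike an unconstrained least-squares fit.
fastsin(x) = (4 / π) * x - (4 / π^2) * x^2   # intended for 0 ≤ x ≤ π

fastsin(0.0), fastsin(π / 2), fastsin(float(π))   # ≈ (0.0, 1.0, 0.0)
```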
How do you restrict the phase to be between 0 and 2pi? I found that the …
Ah, if you want to minimize the number of bits per number, that makes sense.
It does minimax fitting; that is, it minimizes the maximum error over the given range, which in my case was only the first of the four quadrants.
The inputs are random 64-bit integers. I use the first two bits to ultimately pick the signs of the sine and cosine values; the remaining bits are masked into `[1, 2)`:

```julia
julia> VectorizedRNG.mask(typemin(UInt), Float64)
1.0

julia> VectorizedRNG.mask(typemax(UInt), Float64)
1.9999999999999998
```

From there, I adjust into the open-open interval … The sin approximation is scaled so that numbers in the range … So no need for …

Additionally, because the purpose is as part of a random number generator, and because the only inputs are themselves these random unsigned integers, continuity isn't important: think `sincos(2pi*rand())`. The boundary values are:

```julia
julia> using VectorizedRNG

julia> u = Core.VecElement.((0x0000000000000000, 0x0fffffffffffffff))
(VecElement{UInt64}(0x0000000000000000), VecElement{UInt64}(0x0fffffffffffffff))

julia> VectorizedRNG.randsincos(u, Float64)
((VecElement{Float64}(3.9734697296186967e-7), VecElement{Float64}(1.000000397346973)), (VecElement{Float64}(1.000000397346973), VecElement{Float64}(3.973469727874718e-7)))
```

So it doesn't match exactly at … What I do want is to use a lot of bits to generate the random number, so that a long stream of them would be unique, and so that it passes RNG tests like those in RNGTest.jl. I should run it against BigCrush when I get home tonight, but SmallCrush isn't a problem:

```julia
julia> using RNGTest, Distributions, Random, VectorizedRNG

julia> struct RandNormal01{T<:VectorizedRNG.AbstractPCG} <: Random.AbstractRNG
           pcg::T
       end

julia> function Random.rand!(r::RandNormal01, x::AbstractArray)
           randn!(r.pcg, x)
           x .= cdf.(Normal(0,1), x) # SmallCrush tests uniform numbers, so transform normal to uniform
       end

julia> rngnorm = RNGTest.wrap(RandNormal01(local_pcg()), Float64);

julia> RNGTest.smallcrushTestU01(rngnorm)
========= Summary results of SmallCrush =========

 Version:               TestU01 1.2.3
 Generator:
 Number of statistics:  15
 Total CPU time:        00:00:15.92

 All tests were passed
```

EDIT: …
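For reference, the usual bit trick behind mapping random bits into `[1, 2)` looks something like this sketch (VectorizedRNG's actual `mask` may differ in detail):

```julia
# OR the sign/exponent bits of 1.0 with the top 52 random bits placed in
# the mantissa; the result is uniformly distributed in [1.0, 2.0).
mask12(u::UInt64) = reinterpret(Float64, 0x3ff0000000000000 | (u >> 12))

mask12(typemin(UInt64))  # 1.0
mask12(typemax(UInt64))  # 1.9999999999999998 == prevfloat(2.0)
```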
@zsoerenm You can get the same result as `mod` with a bitwise mask:

```julia
julia> const MODMASK30 = (one(UInt32) << 30) - 0x01

julia> mod30(u) = u & MODMASK30

julia> u = rand(UInt32)
0xea0e3557

julia> mod30(u) === u % (UInt32(2)^30)
true
```

This works for powers of 2 because

```julia
julia> bitstring(one(UInt32) << 30)
"01000000000000000000000000000000"

julia> bitstring(one(UInt32) << 30 - one(UInt32))
"00111111111111111111111111111111"
```

You're just setting all the higher bits to 0, which is the same as subtracting off all the higher powers of 2 to leave only the remainder. Although, if you use `%` with a power-of-2 constant, the compiler already does this for you:

```julia
julia> rem30(u) = u % 0x40000000
rem30 (generic function with 1 method)

julia> @code_llvm debuginfo=:none rem30(u)
define i32 @julia_rem30_19002(i32) {
top:
  %1 = and i32 %0, 1073741823
  ret i32 %1
}

julia> reinterpret(Int32, MODMASK30)
1073741823
```

It's using an `and` with that same mask value anyway.
Yes, I am aware of that. I also use it for the look-up-table comparison in https://nextjournal.com/zorn/fast-fixed-point-sine-and-cosine-approximation-with-julia and in the function …
Here are two variants of a function … I cannot figure out how one can apply `@avx` in front of the `for` loop …

If I refer to https://software.intel.com/en-us/mkl-vmperfdata-sincos and https://software.intel.com/en-us/mkl-vmperfdata-sin, those pages suggest that there is not a big gain to be had when `sin` and `cos` are evaluated with `sincos` rather than separately. But I would like to test it myself.
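A minimal way to run that comparison locally (this sketch uses Base's scalar `sincos`; swap in the SLEEFPirates versions to test the vectorized path):

```julia
# Compare one fused sincos pass against separate sin and cos passes.
using BenchmarkTools

x = rand(1024)
@btime sincos.($x)            # one combined evaluation per element
@btime (sin.($x), cos.($x))   # two separate broadcast passes
```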