In the 2014 Thomson and Haley paper (TH14), space physics data are shown that appear to have _large numbers of peaks_, which may signify the modulating effects of rather minor variations in the solar magnetic field on the solar wind. Because this hypothesis could have far-reaching consequences, we quantify how many significant peaks a multitaper spectrum estimate computed on white noise should have.

In [None]:
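using Multitaper, Random, Plots
pyplot()
N = 1024
Random.seed!(123)
whitenoise = randn(N)            # unit-variance Gaussian white noise: the true spectrum is flat at 1
NW = 4.0; K = 7                  # time-bandwidth product and number of tapers
# a_weight = false: equal weighting of the eigenspectra (no adaptive weighting)
S = multispec(whitenoise, NW = NW, K = K, a_weight = false, guts = true);

plot(S, yscale = :identity, label = "MT, NW = 4.0")
plot!(S.f[[1,end]], ones(2,1), label = "True Spectrum")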

A white noise process has a flat spectrum; note that we have plotted the spectrum on a linear scale. For unit-variance white noise, the multitaper spectrum is the average of $K$ (approximately) independent eigenspectrum estimates, each of which is distributed as $\chi^2_2/2$. Their sum is therefore distributed as $\chi^2_{2K}/2$, and their average is $\Gamma(K, 1/K)$ distributed (shape $K$, scale $1/K$), so we can label the spectrum above with significance levels.
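The distributional claim can be checked by simulation. The sketch below assumes, as in the text, that each eigenspectrum at a given frequency is a unit-mean exponential ($\chi^2_2/2$) and that the $K$ eigenspectra are independent; it draws those directly with `randexp` rather than computing tapers, so it checks only the arithmetic of the averaging step.

```julia
using Random, Statistics

# Assumption: each eigenspectrum value is a unit-mean exponential (chi^2_2 / 2)
# and the K eigenspectra are independent. Their average should then be
# Gamma(K, 1/K): mean 1, variance 1/K.
Random.seed!(1)
K = 7
nsim = 200_000
avg = [mean(randexp(K)) for _ in 1:nsim]

println("mean     ≈ ", round(mean(avg), digits = 3), "  (theory: 1)")
println("variance ≈ ", round(var(avg), digits = 4), "  (theory: ", round(1 / K, digits = 4), ")")
# Empirical 90% point; chi^2 tables give 21.064 / 14 ≈ 1.505 for K = 7.
println("90% point ≈ ", round(quantile(avg, 0.9), digits = 3))
```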

Note that _all significant peaks are spurious._

In [None]:
using Distributions
sig = [0.05, 0.5, 0.9, 0.99, 0.999, 0.9999]
z = quantile.(Gamma(K, 1/K), sig)
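For integer $K$, $\Gamma(K, 1/K)$ is an Erlang distribution with rate $K$, whose CDF has a closed form, so the levels `z` can be reproduced without Distributions.jl. This is a minimal sketch; `erlang_cdf` and `erlang_quantile` are hypothetical helper names, and the quantile is found by plain bisection.

```julia
# Erlang (integer-shape Gamma(K, 1/K)) CDF:
#   F(x) = 1 - exp(-K x) * sum_{j=0}^{K-1} (K x)^j / j!
function erlang_cdf(x::Real, K::Integer)
    x <= 0 && return 0.0
    Kx = K * x
    term = 1.0          # (Kx)^0 / 0!
    s = 1.0
    for j in 1:K-1
        term *= Kx / j  # (Kx)^j / j!
        s += term
    end
    return 1.0 - exp(-Kx) * s
end

# Invert the CDF by bisection.
function erlang_quantile(p::Real, K::Integer; tol = 1e-12)
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, K) < p   # widen the bracket until it contains the quantile
        hi *= 2
    end
    while hi - lo > tol
        mid = (lo + hi) / 2
        if erlang_cdf(mid, K) < p
            lo = mid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

K = 7
sig = [0.05, 0.5, 0.9, 0.99, 0.999, 0.9999]
z_check = erlang_quantile.(sig, K)
println(z_check)
```

Because $\chi^2_{2K}/(2K)$ has the same distribution, `quantile.(Chisq(2K), sig) / (2K)` should give the same values, which is the form used later in the notebook.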

In [None]:
plot!(S.f[[1,end]], kron(ones(2,1),transpose(z)), label = sig', 
    line = :dash, legend = :topright)

So let's count the number of upcrossings that we get.

In [None]:
function upcrossing_times(x::Vector{Float64}, thresh::Float64)
  return findall(y -> y < 0, diff(x .- thresh .< 0.0))
end

ix = upcrossing_times(S.S, z[3])
scatter!(S.f[ix], fill(z[3], length(ix)), label = "Upcrossings above $(100*sig[3])%")


The upcrossing counter below differs only in that it returns the number of upcrossings rather than their locations.

In [None]:
function upcrossing_N(x::Vector{Float64}, thresh::Float64)
  return sum(y -> y < 0, diff(x .- thresh .< 0.0))
end

upcrossing_N(S.S, z[3])

Next is the crossing counter, which counts both up- and downcrossings. We will use the crossing counter generically. Because up- and downcrossings alternate, the crossing count is always within one of twice the upcrossing count.
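The relation between the two counters can be made exact. In this sketch, `upcross` and `crossings` are hypothetical names mirroring `upcrossing_N` and `ccount`, applied to exponential draws standing in for a spectrum estimate. Summing the transitions of the indicator `a = x .> t` gives `a[end] - a[1] = ups - downs`, so the total crossing count is `2ups + a[1] - a[end]`.

```julia
using Random

# Counting rules mirroring upcrossing_N and ccount above.
upcross(x, t) = count(i -> x[i] <= t && x[i+1] > t, 1:length(x)-1)
crossings(x, t) = sum(abs.(diff(x .> t)))

Random.seed!(7)
x = randexp(1000)   # stand-in for a spectrum estimate
t = 1.5
a = x .> t          # indicator of being above the level
u = upcross(x, t)
c = crossings(x, t)
# Exact relation: crossings = 2 * upcrossings + (start above) - (end above)
println((u, c, c == 2u + a[1] - a[end]))
```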

In [None]:
function ccount(x,thresh)
    return sum(abs.(diff(x .> thresh)))
end

ccount(S.S, z[3])

Let's count the crossings at every level.

In [None]:
crsngs = map(x->ccount(S.S,x), z)
hcat(sig*100,z,crsngs)

Great! 

The level-crossing rate of a $\Gamma(K, 1/K)$ process is, in fact, known theoretically.
Barakat and others showed that the level-crossing rate at each threshold is given by equation (4.2) in the Thomson and Haley paper. This rate depends on the serial correlation of the spectrum estimate and its second derivative evaluated at zero lag, eqn (4.3).

One important quantity in that formula is the Rayleigh resolution: the smallest frequency separation that can be resolved from a sample of size $N$ with sampling interval $\Delta t$. Be careful! The Rayleigh resolution is *always* $1/(N\Delta t)$ and _does not change_ with zero-padding.
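A small arithmetic sketch of the distinction: zero-padding to `pad*N` points refines the frequency grid, but the Rayleigh resolution stays tied to the record length $N\Delta t$. It also shows where the 512 used below comes from, assuming unit sampling ($\Delta t = 1$) as in the cells above: the band from 0 to the Nyquist frequency $1/(2\Delta t)$ spans $N/2$ Rayleigh resolutions.

```julia
N  = 1024     # number of samples
dt = 1.0      # sampling interval (unit sampling assumed, as above)

rayleigh = 1 / (N * dt)            # fixed by the record length N*dt
for pad in (1, 2, 4)
    grid = 1 / (pad * N * dt)      # zero-padding only refines the grid spacing
    println("pad = $pad: grid spacing = $grid, Rayleigh resolution = $rayleigh")
end

# Number of Rayleigh resolutions between 0 and the Nyquist frequency 1/(2dt)
num_Ray = (1 / (2dt)) / rayleigh   # = N/2
println(num_Ray)
```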

Continuing, evaluating (4.2) for the multitaper estimate eventually leads to (4.22) and (4.23), and the supplement to TH14 includes precomputed values of both. When N is large, the results change only in the last decimal places, so we have used N = 1024 and extrapolated for the calculation of $\psi$. This is easy to verify for yourself.

Now suppose we have a multitaper spectrum computed with $NW = CR = 4.0$ and $K = \alpha = 7$ tapers, as above. The number of upcrossings we expect in 512 Rayleigh resolutions (frequencies 0 to 0.5) is:

In [None]:
tab, lab = Multitaper.UCtable(K, NW, num_Ray = 512, sig=sig) 
println(join(lab,", "))
println(join(tab[3,:],", "))
println(join(tab[4,:],", "))

`UCtable` returns a small table that contains, for each significance level, the level $z$, the number of upcrossings expected for this multitaper estimate, and the dwell band, namely the average number of Rayleigh resolutions over which the process stays above the level. Let's add the results from our white noise simulation to the table. Since the number of upcrossings is modeled as a Poisson counting process, we can use the square root of the expected number of upcrossings as its standard deviation.
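The Poisson assumption is what justifies the `sqrt` column: a Poisson count with mean $\lambda$ has variance $\lambda$. The sketch below checks this by simulation with Knuth's multiplication method (a simple sampler suitable for moderate $\lambda$) and defines `poisson_z`, a hypothetical helper that expresses an observed count in standard deviations from the expected one.

```julia
using Random, Statistics

# Hypothetical helper: (observed - expected) in units of sqrt(expected).
poisson_z(observed, expected) = (observed - expected) / sqrt(expected)

# Knuth's multiplication method for Poisson variates (fine for moderate λ).
function rand_poisson(rng, λ)
    L = exp(-λ); k = 0; p = 1.0
    while true
        p *= rand(rng)
        p <= L && return k
        k += 1
    end
end

rng = MersenneTwister(3)
λ = 50.0
draws = [rand_poisson(rng, λ) for _ in 1:100_000]
println("mean ≈ ", mean(draws), ", variance ≈ ", var(draws))  # both close to λ
println(poisson_z(λ + 2sqrt(λ), λ))                          # two standard deviations
```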

In [None]:
println(join(lab[1:3],", ")*", sigma, Gauss")
hcat(tab[:,1:3],sqrt.(tab[:,3]),crsngs)

To compare with Table 1 on page 7 of Thomson and Haley, replace 512 with 100000 for the number of Rayleigh resolutions.

In [None]:
tab, lab = Multitaper.UCtable(K, NW, num_Ray = 100000, sig=sig)
tab

We get half as many upcrossings as are expected. Just for kicks, let's illustrate what happens when we average two multitaper spectrum estimates that have no overlap.
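As a sketch of that averaging, assume the two multitaper estimates are statistically independent (for example, computed from non-overlapping data segments) and each uses $K$ tapers on unit-variance white noise, so each is $\Gamma(K, 1/K)$ at a given frequency. Their average is then the mean of $2K$ unit exponentials, i.e. $\Gamma(2K, 1/(2K))$: the variance halves and the significance levels move toward 1. The simulation below draws the exponentials directly rather than computing tapers.

```julia
using Random, Statistics

# Assumption: two independent K-taper estimates, each Gamma(K, 1/K).
Random.seed!(11)
K = 7
nsim = 200_000
single = [mean(randexp(K)) for _ in 1:nsim]    # one estimate: Gamma(K, 1/K)
paired = [mean(randexp(2K)) for _ in 1:nsim]   # average of two: Gamma(2K, 1/(2K))

println("variance: ", round(var(single), digits = 4), " -> ", round(var(paired), digits = 4))
# chi^2 tables: 21.064/14 ≈ 1.505 for one estimate, 37.916/28 ≈ 1.354 for the average
println("90% level: ", round(quantile(single, 0.9), digits = 3), " -> ",
        round(quantile(paired, 0.9), digits = 3))
```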

Now suppose we had wanted to use a periodogram instead. How many upcrossings would we get?

In [None]:
using FFTW

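Pxx = abs2.(fft(whitenoise))/length(whitenoise)   # periodogram; mean 1 for unit-variance white noise
freq = LinRange(0, 1, length(whitenoise) + 1)     # Fourier frequencies k/N, k = 0..N
halffreq = length(whitenoise) ÷ 2 + 1             # index of the Nyquist frequency 0.5
plot(freq[2:halffreq], Pxx[2:halffreq], label = "Periodogram")
z_pg = quantile.(Chisq(2), [0.9])/2               # 90% point of chi^2_2 / 2 (keeps the MT levels z intact)
plot!(freq[[1,halffreq]], kron(ones(2,1),transpose(z_pg)), label = "90%",
    line = :dash, legend = :topright)

In [None]:
E_pgram_crsngs = (N/2)*Multitaper.Pgram_upcrossings.(z_pg, N)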

Compared with the number of spurious peaks in the multitaper estimate (20, above), the periodogram clearly produces many more false detections of line components.
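The periodogram count can also be checked empirically on the same white noise. This sketch uses a naive $O(N^2)$ DFT in place of `fft` so that it needs only the standard library, and counts upcrossings of the 90% level on the Fourier grid only. That grid count is not necessarily the same quantity as the rate computed by `Multitaper.Pgram_upcrossings`. For independent exponential ordinates, each adjacent pair is an upcrossing with probability $(1-p)p$, which gives the grid expectation printed below.

```julia
using Random

# Naive O(N^2) periodogram at f_k = k/N, k = 0..N÷2 (stands in for abs2.(fft(x))/N).
function naive_periodogram(x::Vector{Float64})
    N = length(x)
    P = zeros(N ÷ 2 + 1)
    for k in 0:N÷2
        s = zero(ComplexF64)
        for n in 0:N-1
            s += x[n+1] * cis(-2π * mod(k * n, N) / N)  # mod keeps the phase small
        end
        P[k+1] = abs2(s) / N
    end
    return P
end

# Same rule as upcrossing_N: a below-to-above transition of the threshold.
count_upcrossings(x, thresh) = count(i -> x[i] < thresh && x[i+1] >= thresh, 1:length(x)-1)

Random.seed!(123)
N = 1024
x = randn(N)
P = naive_periodogram(x)[2:end-1]   # drop f = 0 and f = 1/2 (chi^2_1 rather than chi^2_2)

p = 0.1            # probability of exceeding the level
z90 = -log(p)      # 90% point of chi^2_2 / 2, a unit exponential
u = count_upcrossings(P, z90)
expected_grid = (length(P) - 1) * (1 - p) * p
println("observed: ", u, ", expected on the grid: ", round(expected_grid, digits = 1))
```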

Finally, across all significance levels, how many more spurious upcrossings does a periodogram produce than a multitaper estimate? The answer is shown in Figure 2a of Thomson and Haley; here we replicate that figure so you can experiment with NW and K.

In [None]:
# Here sig is the probability of *exceeding* the level (earlier, sig was the CDF level)
sig = LinRange(0.0001, 0.9, 100)
l = [0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001, 0.0002, 0.0001]   # y-axis ticks
m = [0.5, 0.2, 0.1, 0.05, 0.01, 0.001, 0.0001]                              # x-axis ticks

K = 8; NW = 5.0
Z = quantile.(Chisq(2*K), 1.0 .- sig)/(2*K)       # multitaper levels, Gamma(K, 1/K)
U = Multitaper.MT_Upcrossings.(Z, K, NW, 1024)    # MT upcrossing rate per Rayleigh
plot(sig, U, xscale=:log10, xaxis = :flip, yscale=:log10, label="MT", xlabel = "Probability of exceeding level (%)",
    ylabel = "Upcrossing Rate per Rayleigh", 
    yticks = (l,l), xticks=(m,m*100))

Z_pgram = quantile.(Chisq(2),1.0 .- sig)/2        # periodogram levels, chi^2_2 / 2
NCP = Multitaper.Pgram_upcrossings.(Z_pgram, N)   # periodogram upcrossing rate per Rayleigh
plot!(sig, NCP, xaxis = :flip, xscale =:log10, yscale=:log10, label= "Pgram")
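The periodogram levels `Z_pgram` in the cell above have a closed form. Since $\chi^2_2/2$ is a unit-mean exponential with CDF $1 - e^{-x}$, its $(1-p)$ point is $-\log p$, so for a given false-alarm probability the periodogram threshold grows only like $\log(1/p)$. A minimal sketch (`pgram_level` is a hypothetical name):

```julia
# (1 - p) point of chi^2_2 / 2, i.e. quantile(Chisq(2), 1 - p) / 2
pgram_level(p) = -log(p)

p = [0.5, 0.1, 0.01, 0.001]
println(pgram_level.(p))   # ≈ [0.693, 2.303, 4.605, 6.908]
```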