# Analysis of periodicity in GOES data

[Spectral analysis](https://en.wikipedia.org/wiki/Spectral_density_estimation) is an extensively developed set of techniques for identifying the main periodic components of a time series. The most common approaches are based on [Fourier analysis](https://en.wikipedia.org/wiki/Fourier_analysis), which shows that many functions can be written as linear combinations of mutually orthogonal sine and cosine functions. This orthogonality permits the development of fast algorithms, including the [fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) (FFT). However, if the time series is sampled on an irregular grid, these standard fast algorithms cannot be applied directly. It is still possible to obtain an additive decomposition into sinusoidal components, and the calculations can be carried out using [least squares spectral analysis](https://en.wikipedia.org/wiki/Least-squares_spectral_analysis).
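To make the least-squares idea concrete, here is a minimal sketch that works directly on irregularly spaced times. For each candidate angular frequency it fits an intercept plus one cosine/sine pair by ordinary least squares and records the fraction of variance explained (R²). The helper `ls_spectrum` and the simulated data are hypothetical illustrations, not part of the `read` module or of SciPy; the `lombscargle` function used later in this notebook is a closely related and much faster formulation.

```python
import numpy as np

def ls_spectrum(t, y, freqs):
    """Hypothetical least-squares spectrum for irregularly sampled data.

    For each angular frequency w, fit y ~ c + a*cos(w t) + b*sin(w t)
    by ordinary least squares and return the fraction of variance explained.
    """
    y = np.asarray(y, dtype=float)
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    power = np.empty(len(freqs))
    for i, w in enumerate(freqs):
        # Design matrix: intercept plus one cosine/sine pair at frequency w
        X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ coef) ** 2)
        power[i] = 1 - rss / tss  # R^2 at this frequency
    return power

# Usage: irregular sampling times and a noisy sinusoid with period 10
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1000, size=2000))
y = np.cos(2 * np.pi * t / 10) + rng.normal(size=t.size)
freqs = np.linspace(0.1, 2, 400)  # angular frequencies (radians per time unit)
power = ls_spectrum(t, y, freqs)
print(freqs[np.argmax(power)])    # should be close to 2*pi/10
```

Nothing in the fit requires a regular grid, which is the point of the least-squares formulation; the price is one small regression per frequency instead of a single FFT.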

In [None]:
import numpy as np
from read import *
from scipy.signal import lombscargle
import matplotlib.pyplot as plt

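Because the flux values are extremely right-skewed (and likely heavy-tailed), we carry out the periodicity analysis on the log scale. A small constant (1e-7) is added to the flux before taking logs, which keeps the logarithm finite if any flux value is zero.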
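As a rough illustration of why the log scale helps, the sketch below uses simulated lognormal values as a stand-in for flux (an assumption; these are not GOES data) and compares the sample skewness before and after the same offset-plus-log transformation used in the notebook. `scipy.stats.skew` computes the sample skewness.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
# Hypothetical stand-in for flux: positive values spanning several orders of magnitude
flux = rng.lognormal(mean=-12, sigma=1.5, size=10000)
# Same small offset as in the notebook, guarding against log(0)
log_flux = np.log(flux + 1e-7)
print(skew(flux), skew(log_flux))  # large positive skew vs. roughly symmetric
```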

In [None]:
df = get_goes(2017)
df["Flux1_log"] = np.log(df["Flux1"] + 1e-7)

Arrange the data into blocks of 'mp' consecutive observations each. One block covers roughly mp/30 minutes of data, which corresponds to about 2 seconds per observation.

In [None]:
mp = 5000
tix, flx = make_blocks(df, mp, 0, fluxvar="Flux1_log")
print(mp/30)
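The `make_blocks` helper comes from the local `read` module, whose source is not shown here. The sketch below is a hypothetical stand-in that only illustrates the idea of splitting consecutive observations into non-overlapping blocks of length `mp`; it is not the actual implementation, and it ignores the third positional argument that `make_blocks` receives. The 2-second cadence in the usage example is an assumption consistent with the mp/30 minutes figure above.

```python
import numpy as np

def make_blocks_sketch(times, values, mp):
    """Hypothetical stand-in for `make_blocks` from the `read` module.

    Splits consecutive observations into non-overlapping blocks of length mp
    and returns two arrays of shape (n_blocks, mp); any trailing partial
    block is dropped.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // mp
    n = n_blocks * mp
    return times[:n].reshape(n_blocks, mp), values[:n].reshape(n_blocks, mp)

# Usage: one day of observations at an assumed 2-second cadence
t = np.arange(0, 86400, 2.0)
y = np.random.default_rng(2).normal(size=t.size)
tix_demo, flx_demo = make_blocks_sketch(t, y, 5000)
print(tix_demo.shape)                                # (8, 5000)
print((tix_demo[0, -1] - tix_demo[0, 0]) / 60)      # minutes spanned by one block
```

Distinct names (`tix_demo`, `flx_demo`) are used so that running this sketch does not overwrite the real `tix` and `flx` arrays.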

In [None]:
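def make_plot(w, ma, ti):
    """Plot one or more spectra against frequency and against period.

    w  : frequencies in cycles/second (converted to cycles/minute for plotting)
    ma : list of spectra, each evaluated at the frequencies in w
    ti : plot title
    """
    # Spectra versus frequency (cycles per minute)
    plt.clf()
    plt.grid(True)
    for m in ma:
        plt.plot(60*w, m)
    plt.ylabel("Energy", size=15)
    plt.xlabel("Cycles/minute (frequency)", size=15)
    plt.title(ti)
    plt.show()
    # Same spectra versus period (minutes per cycle)
    plt.clf()
    plt.grid(True)
    for m in ma:
        plt.plot(1/(60*w), m)
    plt.ylabel("Energy", size=15)
    plt.xlabel("Minutes/cycle (period)", size=15)
    plt.title(ti)
    plt.show()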

We first use simulated data to check that we are interpreting the plots correctly. The test data contain a single sinusoidal component plus additive white noise. The sinusoid completes one cycle every 10 seconds, a frequency of 0.1 cycles per second or 6 cycles per minute. In the first dataset (noise standard deviation s = 10) the sinusoid is easily detected. The second dataset (s = 35) has a much lower signal-to-noise ratio; the sinusoid is still detected, but other frequencies have nearly as much energy as the true one.
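`scipy.signal.lombscargle` expects angular frequencies (radians per unit time), while the plots are labelled in cycles per minute and minutes per cycle. The short worked conversion below, for the 10-second period used in the simulation, shows how these units relate; the variable names are illustrative only.

```python
import numpy as np

period_s = 10.0                  # seconds per cycle in the simulation
cps = 1 / period_s               # 0.1 cycles/second
cpm = 60 * cps                   # 6 cycles/minute, the x-axis unit in make_plot
omega = 2 * np.pi * cps          # about 0.628 rad/s, inside the fitted range 0.1 to 2
minutes_per_cycle = 1 / cpm      # 1/6 minute, i.e. 10 seconds
print(cps, cpm, omega, minutes_per_cycle)
```

This is why the notebook divides the fitted angular frequencies by 2π before passing them to `make_plot`, which then multiplies by 60.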

In [None]:
period = 10     # True period
w = 1 / period  # True frequency
ti = np.sort(1000*np.random.uniform(size=100000)) # Irregular time points
for s in [10, 35]:
    flux1 = np.cos(w*2*np.pi*ti) + s*np.random.normal(size=ti.shape[0])
    wf = np.linspace(0.1, 2, 400)  # Angular frequencies (radians/second) to fit
    m = lombscargle(ti, flux1, wf, precenter=True, normalize=True)
    ww = wf / (2*np.pi)  # Convert to cycles/second for plotting
    make_plot(ww, [m], "Simulated data (s=%.2f)" % s)
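Reading the peak off the plot can also be done numerically. This sketch simulates data similar to the test data above (with fewer points and less noise, an arbitrary choice), computes the Lomb-Scargle periodogram, and converts the location of the largest value to cycles per minute. The mean is subtracted explicitly, in the same spirit as `precenter=True`, and only the positional arguments of `lombscargle` (times, values, angular frequencies) are used.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(3)
t_sim = np.sort(1000 * rng.uniform(size=20000))  # irregular times in seconds
y_sim = np.cos(2 * np.pi * 0.1 * t_sim) + 3 * rng.normal(size=t_sim.size)
freqs_rad = np.linspace(0.1, 2, 400)             # angular frequencies (rad/s)
# Subtract the mean explicitly before computing the periodogram
pgram = lombscargle(t_sim, y_sim - y_sim.mean(), freqs_rad)
peak_cpm = 60 * freqs_rad[np.argmax(pgram)] / (2 * np.pi)  # peak in cycles/minute
print(peak_cpm)  # expected to be near 6
```

The recovered peak can only be as precise as the frequency grid allows, here about 0.05 cycles/minute between grid points.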

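Next we estimate a spectral density for each block of the series using the Lomb-Scargle (least-squares) periodogram. Blocks whose periodogram contains NaN values are skipped, and only about the first 100 blocks are processed to save time.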

In [None]:
ma = []
w = np.linspace(0.01, 0.5, 800)  # Angular frequencies (radians/second) to fit
ww = w / (2*np.pi)  # Convert to cycles/second for plotting
for k in range(flx.shape[0]):
    if k % 100 == 0:
        print(k)
    m = lombscargle(tix[k, :], flx[k, :], w, precenter=True, normalize=True)
    if np.any(np.isnan(m)):
        continue
    ma.append(m)
    if k >= 100:
        break # Stop here for speed

Each row of the matrix 'ma' is one estimated spectral density, computed from one block of the flux time series.

In [None]:
ma = np.asarray(ma)
ma.shape

Below is the mean of the block-wise spectra.

In [None]:
plt.clf()
plt.grid(True)
plt.plot(60*ww, ma.mean(0))
plt.xlabel("Cycles/minute (frequency)", size=15)
plt.ylabel("Energy", size=15)
plt.title("GOES-1")
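Averaging the block-wise spectra is the same idea as Bartlett's method: a single periodogram is a very noisy estimate of the spectral density, and averaging K roughly independent blocks shrinks the spread of each ordinate by about a factor of 1/√K. The sketch below checks this on simulated white noise (an assumption chosen because its true spectrum is flat), comparing the coefficient of variation across frequencies for one block and for the block average.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(4)
freqs_rad = np.linspace(0.01, 0.5, 200)  # angular frequencies (rad per time unit)
spectra = []
for _ in range(50):                      # 50 independent blocks of pure white noise
    t_blk = np.sort(rng.uniform(0, 2000, size=500))
    y_blk = rng.normal(size=t_blk.size)
    spectra.append(lombscargle(t_blk, y_blk - y_blk.mean(), freqs_rad))
spectra = np.asarray(spectra)            # shape (n_blocks, n_freqs), like 'ma'

def cv(x):
    """Coefficient of variation across frequencies (small for a flat estimate)."""
    return x.std() / x.mean()

print(cv(spectra[0]), cv(spectra.mean(0)))  # single block vs. block average
```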

The power levels are strongly right-skewed, so we may want to apply a power transformation before proceeding. The transformation below is the "Box-Cox" parameterization of the [power transform](https://en.wikipedia.org/wiki/Power_transform), scaled by the geometric mean 'gm' of the power values. As the exponent 'p' tends to zero, the transform approaches the log transform (multiplied by 'gm').

In [None]:
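gm = np.exp(np.mean(np.log(ma)))  # Geometric mean of all power values
p = 0.001                         # Box-Cox exponent; near zero, so close to a log transform
xma = (ma**p - 1) / (p * gm**(p-1))
xmn = xma.mean(0)                 # Mean transformed spectrum across blocks
xmc = xma - xmn                   # Block spectra centered at the mean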

plt.clf()
plt.grid(True)
plt.plot(60*ww, xmn, "blue")
plt.xlabel("Cycles/minute (frequency)", size=15)
plt.ylabel("Transformed energy", size=15)
plt.title("GOES-1")
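The limiting behaviour of the geometric-mean-scaled Box-Cox transform can be checked numerically. Expanding x^p − 1 ≈ p·log(x) for small p shows that (x^p − 1)/(p·gm^(p−1)) ≈ gm·log(x), while p = 1 gives x − 1. The helper `boxcox_gm` and the lognormal test values below are hypothetical illustrations.

```python
import numpy as np

def boxcox_gm(x, p):
    """Hypothetical helper: Box-Cox transform scaled by the geometric mean.

    Returns (x**p - 1) / (p * gm**(p - 1)); as p -> 0 this tends to gm * log(x).
    """
    x = np.asarray(x, dtype=float)
    gm = np.exp(np.mean(np.log(x)))  # geometric mean of all entries
    return (x**p - 1) / (p * gm**(p - 1))

x_demo = np.random.default_rng(5).lognormal(size=1000)
gm_demo = np.exp(np.mean(np.log(x_demo)))
# For a small exponent the transform is nearly gm * log(x)
print(np.max(np.abs(boxcox_gm(x_demo, 1e-4) - gm_demo * np.log(x_demo))))
```

The gm factor keeps the transformed values on a scale comparable to the original data for every p, which makes results for different exponents easier to compare.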

In [None]:
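# PCA of the centered spectra via the thin SVD: xmc = u @ diag(s) @ vt
u, s, vt = np.linalg.svd(xmc, full_matrices=False)
v = vt.T  # Columns of v are the loadings (one value per frequency)
# Scale factor: interquartile range of the first principal component scores
f = np.quantile(s[0] * u[:, 0], 0.75) - np.quantile(s[0] * u[:, 0], 0.25)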
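The cell above treats the SVD of the centered spectra as a principal component analysis: `s[0] * u[:, 0]` are the first-component scores (one per block) and `v[:, 0]` the loadings (one per frequency), so `xmn ± f*v[:, 0]` shows how a spectrum shifts when its score moves by one interquartile range. The sketch below checks these relationships on a small synthetic matrix (hypothetical data with one dominant pattern). It also computes the share of variance carried by the first component, an extra diagnostic that is not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical centered data: 100 "spectra" x 50 frequencies with one dominant pattern
pattern = np.sin(np.linspace(0, np.pi, 50))
X = np.outer(rng.normal(size=100), pattern) + 0.1 * rng.normal(size=(100, 50))
X -= X.mean(0)                           # center columns, as with xmc

U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = S[0] * U[:, 0]                  # PC1 score for each row (block)
loadings = Vt[0]                         # PC1 loading for each column (frequency)
# Scores are the projections of the rows onto the first loading vector
assert np.allclose(scores, X @ loadings)
iqr = np.quantile(scores, 0.75) - np.quantile(scores, 0.25)  # analogue of f
frac = S[0] ** 2 / np.sum(S ** 2)        # share of total variance on PC1
print(iqr, frac)
```

The sign of a singular vector pair is arbitrary, which is why the notebook plots both `xmn + f*v[:, 0]` and `xmn - f*v[:, 0]`.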

In [None]:
plt.clf()
plt.grid(True)
plt.plot(60*ww, xmn + f*v[:, 0], "-", color="grey")
plt.plot(60*ww, xmn - f*v[:, 0], "-", color="grey")
plt.plot(60*ww, xmn, "blue")
plt.xlabel("Cycles/minute (frequency)", size=15)
plt.ylabel("Transformed energy", size=15)
plt.title("GOES-1")

In [None]:
plt.plot(xmn, f*v[:, 0], "o", alpha=0.5)
plt.xlabel("Mean", size=12)
plt.ylabel("Scaled loadings", size=12)
plt.grid(True)