# GOES autocorrelation analysis


[Autocorrelation](https://en.wikipedia.org/wiki/Autocorrelation) analysis considers the statistical dependence between elements of a time series in terms of the amount of elapsed time between them.

The usual measure of time series autocorrelation is based on Pearson correlation.  But for time series with potentially heavy-tailed marginal distributions, a robust form of autocorrelation analysis can be conducted using the [Kendall tau](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) autocorrelation.  Specifically, we can consider the [Kendall tau autocorrelation](https://escholarship.org/uc/item/7jt8s827) at lag $s$, between the series $X(t)$ and the shifted series $X(t+s)$.
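
As a minimal sketch of the lag-$s$ tau autocorrelation described above, the following computes Kendall's tau between a series and its shifted copy. The helper `kendall_autocorr` and the simulated AR(1) series with Student-t innovations are assumptions made for illustration only; they are not part of the GOES analysis.

```python
import numpy as np
from scipy import stats

def kendall_autocorr(x, s):
    """Kendall tau between x(t) and x(t+s) (hypothetical helper for illustration)."""
    n = len(x)
    tau, _ = stats.kendalltau(x[:n - s], x[s:])
    return tau

# Simulated AR(1) series with heavy-tailed (Student t, 3 df) innovations.
rng = np.random.default_rng(0)
n, phi = 2000, 0.8
e = rng.standard_t(3, size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

tau0 = kendall_autocorr(x, 0)    # lag 0 compares the series with itself
tau1 = kendall_autocorr(x, 1)    # strong dependence at a short lag
tau20 = kendall_autocorr(x, 20)  # dependence has largely decayed
print(tau0, tau1, tau20)
```

Because tau depends only on the ranks of the observations, occasional extreme values in the innovations have limited influence on the estimate.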

An important point of consideration in any time series analysis is whether the time series is [stationary](https://en.wikipedia.org/wiki/Stationary_process).  If the time series is not stationary, it is difficult to interpret the autocorrelation function. 

The GOES series are very long (one year of data at a 2-second cadence gives over 14 million observations), and they are unlikely to be stationary.  For example, solar activity follows an 11-year cycle, and any given year covers only a limited portion of this cycle.  To address this, we divide the long X-ray flux time series into blocks of consecutive observations and estimate the autocorrelation function separately for each block.  Below we consider blocks of around 4.4 hours in duration, each consisting of 8000 consecutive observations.  The block size is set below via the variable `bs`.

This analysis is informative about local dependence within each block, as well as about the extent of non-stationarity in the full (year-long) time series.

In [None]:
import numpy as np
from read import get_goes, make_blocks
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.nonparametric.smoothers_lowess import lowess

In [None]:
df = get_goes(2017)
df.head()

Set the block size.  With `bs = 8000` and a 2-second cadence, each block spans about 4.4 hours.

In [None]:
bs = 8000
2*bs / (60 * 60) # the number of hours in a block

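Calculate the standard error of Kendall's tau for iid data.  Under independence and without ties, the variance of the tau estimate for a sample of size $n$ is $2(2n+5)/(9n(n-1))$; here we take $n$ equal to the block size `bs`.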

In [None]:
tse = np.sqrt(2 * (2 * bs + 5) / (9 * bs * (bs - 1)))
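
The iid standard error formula can be checked by simulation. The sketch below is an illustration under stated assumptions: the helper name `kendall_tau_se`, the sample size, and the number of replications are all hypothetical choices, and the simulated pairs are independent Gaussian draws rather than GOES data.

```python
import numpy as np
from scipy import stats

def kendall_tau_se(n):
    """Standard error of Kendall's tau under independence, without ties."""
    return np.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

# Simulate independent pairs and compare the empirical spread of tau
# to the theoretical standard error.
rng = np.random.default_rng(0)
n, nrep = 50, 2000
taus = np.empty(nrep)
for r in range(nrep):
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    taus[r], _ = stats.kendalltau(x, y)

emp_se = taus.std()
theo_se = kendall_tau_se(n)
print(emp_se, theo_se)

# For a block of 8000 observations the standard error is roughly 0.0075.
print(kendall_tau_se(8000))
```

This gives a rough reference scale: block-wise tau autocorrelations that are many multiples of this value are unlikely to reflect sampling variation alone.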

Make blocks of 'bs' consecutive time points with approximately 2-second spacing.

In [None]:
tix, flx = make_blocks(df, bs, 0)
tix.shape
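
The `make_blocks` function comes from the local `read` module and is not shown here. As a rough sketch of the idea only, the hypothetical function below (`make_blocks_sketch`, an assumption and not the actual implementation) trims a series to a whole number of blocks and reshapes it so that each row holds `bs` consecutive observations. The real function may do more, for example handling gaps in the record.

```python
import numpy as np

def make_blocks_sketch(times, flux, bs):
    """Split aligned time and flux arrays into rows of bs consecutive points.

    Hypothetical illustration; trailing points that do not fill a block are dropped.
    """
    nblock = len(flux) // bs
    m = nblock * bs
    tix = np.asarray(times)[:m].reshape(nblock, bs)
    flx = np.asarray(flux)[:m].reshape(nblock, bs)
    return tix, flx

# Toy example: 25 points at 2-second spacing, split into blocks of 10.
times = np.arange(0, 2 * 25, 2)
flux = np.arange(25.0)
tix_demo, flx_demo = make_blocks_sketch(times, flux, bs=10)
print(tix_demo.shape, flx_demo.shape)
```

In this layout the first column of the time array gives the starting time of each block.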

In [None]:
n, p = flx.shape

Consider autocorrelation at these lags, expressed as numbers of observations (each step is approximately 2 seconds).

In [None]:
dlags = np.arange(0, 200, 10)

Convert lags to time in minutes

In [None]:
dtime = dlags * 2 / 60

Calculate these quantiles of the autocorrelations across blocks.

In [None]:
pr = [0.25, 0.5, 0.75]

Get the autocorrelation for each block

In [None]:
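def get_autocor(flx, randomize=False):
    """Return the Kendall tau autocorrelation of each block (row of flx) at each lag in dlags."""
    n, p = flx.shape
    qd = np.zeros((n, len(dlags)))
    for j, d in enumerate(dlags):
        for i in range(n):
            v = flx[i, :]
            if randomize:
                # Shuffle a copy so the original block is left unchanged
                v = v.copy()
                np.random.shuffle(v)
            # Correlate the block with itself shifted by d observations
            qd[i, j] = stats.kendalltau(v[0:p-d], v[d:]).correlation
    return qd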
            
qd = get_autocor(flx)

Below is a spaghetti plot of a random subset of the block-wise autocorrelation functions.

In [None]:
plt.grid(True)
ii = np.random.choice(qd.shape[0], 100, replace=False)
for i in ii:
    plt.plot(dtime, qd[i, :], "-", color="grey", alpha=0.5)
plt.xlabel("Time lag (minutes)", size=15)
plt.ylabel("Tau autocorrelation", size=15)

Plot some pointwise quantiles of the autocorrelation functions.

In [None]:
plt.axes([0.1, 0.1, 0.72, 0.8])
plt.grid(True)
for i, p0 in enumerate(pr):
    qq = np.quantile(qd, p0, axis=0)
    plt.plot(dtime, qq, label="%.2f" % p0)
ha, lb = plt.gca().get_legend_handles_labels()
leg = plt.figlegend(ha, lb, loc="center right")
leg.draw_frame(False)
plt.xlabel("Time lag (minutes)", size=15)
plt.ylabel("Tau autocorrelation quantile", size=15)

For comparison, we randomly permute the observations within each block and recalculate the autocorrelations.  After permutation there is no serial dependence, so these autocorrelations show what to expect from sampling variation alone.

In [None]:
qdr = get_autocor(flx, randomize=True)

plt.grid(True)
ii = np.random.choice(qdr.shape[0], 100, replace=False)
for i in ii:
    plt.plot(dtime, qdr[i, :], "-", color="grey", alpha=0.5)
plt.xlabel("Time lag (minutes)", size=15)
plt.ylabel("Tau autocorrelation", size=15)

Decompose the estimated autocorrelation functions using principal components (PCs).

In [None]:
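# Center the autocorrelation functions at their pointwise mean
qdm = qd.mean(0)
qdc = qd - qdm
# Thin SVD: rows of vt are the PC loadings
u, s, vt = np.linalg.svd(qdc, full_matrices=False)
v = vt.T
# PC scores for each block (equivalent to np.dot(u, np.diag(s)))
scores = u * s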
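
The relationship between the SVD factors, the scores, and the loadings can be illustrated on simulated data. This is a sketch under stated assumptions: the matrix `X` is random noise with unequal column scales, not autocorrelation estimates, and the variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nrow, ncol = 200, 20
# Simulated data matrix whose columns have decreasing scale.
X = rng.normal(size=(nrow, ncol)) * np.linspace(3, 0.5, ncol)

Xc = X - X.mean(0)                      # center each column
u_demo, s_demo, vt_demo = np.linalg.svd(Xc, full_matrices=False)
v_demo = vt_demo.T                      # columns are the loading vectors
scores_demo = u_demo * s_demo           # same as u @ diag(s)

# The scores are also the projections of the centered data onto the loadings.
print(np.allclose(scores_demo, Xc @ v_demo))

# Proportion of variance captured by each component.
var_explained = s_demo**2 / np.sum(s_demo**2)
print(var_explained[:3])

# The standard deviation of each score column equals s / sqrt(nrow).
print(np.allclose(scores_demo.std(0), s_demo / np.sqrt(nrow)))
```

The last identity explains why the standard deviation of the scores is a natural scale for perturbing the mean function along each loading vector.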

A basic analysis of the spectrum:

In [None]:
plt.grid(True)
pp = np.arange(1, len(qdm)+1)
plt.plot(np.log(pp)[:-2], np.log(s)[:-2], "-o")
plt.xlabel("Log position")
plt.ylabel("Log singular value")

Below we plot the corresponding loadings:

In [None]:
plt.grid(True)
for j in range(3):
    plt.plot(dtime, v[:, j], label="PC %d" % (j+1))
plt.figlegend() 
plt.xlabel("Time lag (minutes)")
plt.ylabel("Loading")

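Plot the mean autocorrelation function (black) together with the mean plus (red) and minus (blue) each PC loading vector, scaled by the standard deviation of the corresponding scores.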

In [None]:
for j in range(3):
    plt.clf()
    plt.ylim(0, 1)
    plt.grid(True)
    plt.title("PC %d" % (j + 1))
    sd = scores[:, j].std(0)
    plt.plot(dtime, qdm, '-', color="black")
    plt.plot(dtime, qdm + sd*v[:, j], '-', color="red")
    plt.plot(dtime, qdm - sd*v[:, j], '-', color="blue")
    plt.xlabel("Time lag (minutes)", size=15)
    plt.ylabel("Tau autocorrelation", size=15)
    plt.show()

Plot the PC scores for each factor against time (black), with a lowess smooth of the scores overlaid (red).

In [None]:
for j in range(3):
    yh = lowess(scores[:, j], tix[:, 0], frac=0.1)
    plt.clf()
    plt.axes([0.15, 0.1, 0.72, 0.8])
    plt.grid(True)
    plt.title("PC %d" % (j + 1))
    plt.plot(tix[:, 0], scores[:, j], '-', color="black")
    plt.plot(tix[:, 0], yh[:, 1], '-', color="red")
    plt.xlabel("Time", size=15)
    plt.ylabel("Score for factor %d" % (j + 1), size=15)
    plt.show()