# Long range dependence in the GOES data

A time series exhibits long range dependence if its autocorrelation function is not absolutely summable.  Any time series that we observe in practice is finite, so its sample autocorrelation function has finitely many terms and is always absolutely summable.  We therefore need a more indirect way to assess whether a time series has long range dependence.  There are many approaches for doing this, and it remains an area of active research.  Below we illustrate two methods for estimating the Hurst parameter $H$ using the GOES data; values of $H$ above 1/2 indicate long range dependence.
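For orientation, one standard way to connect the Hurst parameter to the definition above is through a power-law decay of the autocorrelation function at large lags.  The symbols $\rho$, $c$, and $j$ below are notation introduced here for illustration, and the display is a sketch of the usual parameterization rather than a property verified for the GOES data:

$$
\rho(j) \sim c\, j^{2H-2} \quad \text{as } j \to \infty, \qquad 1/2 < H < 1 .
$$

For $H$ in this range the exponent $2H-2$ lies in $(-1, 0)$, so $\sum_j |\rho(j)|$ diverges, which is exactly the failure of absolute summability described above.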

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from read import *

In [None]:
df = get_goes(2017)

A basic fact from elementary statistics is that the variance of the sample mean of $m$ iid observations is $\sigma^2/m$, where $\sigma^2$ is the variance of one observation.  If instead of iid data we have short range dependent stationary data, the variance of the sample mean of $m$ consecutive values is approximately $k/m$ for large $m$, where the constant $k$ depends on the autocovariances.  However, if the data are long range dependent, the variance of the sample mean of $m$ consecutive values has the form $km^{2(H-1)}$ with $1/2 < H < 1$, which decays more slowly than $1/m$.  This gives rise to a means for estimating $H$: we partition the observed series into blocks of size $m$, take the sample mean of each block, then take the variance of these sample means.  We do this for various values of $m$ and relate the log variance to the logarithm of the block size $m$.
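The relationship between the log variance and the log block size can be written out explicitly.  Writing $\bar{X}_m$ for the mean of a block of size $m$ and $b$ for the fitted slope (both symbols are introduced here for illustration), taking logs of the scaling law gives

$$
\log \mathrm{Var}(\bar{X}_m) = \log k + 2(H-1)\log m ,
\qquad\text{so}\qquad
b = 2(H-1), \quad H = 1 + \frac{b}{2}.
$$

In the iid and short range dependent cases the variance behaves like $k/m$, so $b = -1$ and $H = 1/2$.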

In [None]:
def hurst_vs(df, nn, d, fluxvar="Flux1"):
    """Estimate the Hurst parameter using the variance scaling method."""

    r = np.zeros((len(nn), 2))
    for j, m in enumerate(nn):

        # Generate a matrix of non-overlapping blocks of
        # size m.
        _, flx = make_blocks(df, m, d, fluxvar=fluxvar)

        # Calculate the sample mean of each block.
        bm = flx.mean(1)

        # Take the sample variance of the block means.
        r[j, :] = [m, bm.var()]

    # Estimate the Hurst exponent from the variances of
    # the sample means: regress log variance on log block
    # size; the slope b satisfies b = 2(H - 1).
    rl = np.log(r)
    cc = np.cov(rl[:, 0], rl[:, 1])
    b = cc[0, 1] / cc[0, 0]

    return 1 + b/2
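The function above relies on `make_blocks` from the `read` module, which is not shown here.  As a rough, self-contained sketch of the same variance scaling idea, the block below builds the blocks with a plain NumPy reshape and fits the slope with `np.polyfit`.  The helper names `block_mean_variances` and `hurst_from_variances` are hypothetical, and the sketch assumes a gap-free one-dimensional array; it does not reproduce whatever handling of time gaps or differencing `make_blocks` performs.

```python
import numpy as np

def block_mean_variances(x, block_sizes):
    """Variance of non-overlapping block means, one value per block size."""
    x = np.asarray(x, dtype=float)
    out = []
    for m in block_sizes:
        nblock = len(x) // m
        # Drop the incomplete trailing block; each row is then one block.
        blocks = x[:nblock * m].reshape(nblock, m)
        out.append(blocks.mean(axis=1).var())
    return np.asarray(out)

def hurst_from_variances(block_sizes, variances):
    """Regress log variance on log block size; convert slope b to H = 1 + b/2."""
    b, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1 + b / 2

# Simulated iid data (an assumption for illustration, not the GOES series).
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
sizes = np.array([4, 8, 16, 32, 64, 128, 256])
v = block_mean_variances(x, sizes)
h_iid = hurst_from_variances(sizes, v)
print(h_iid)
```

For iid input the estimate should land close to 1/2, since the block-mean variance falls off like $1/m$.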

Another more recent method for estimating the Hurst parameter is the [triangle total areas method](https://www.sciencedirect.com/science/article/pii/S0378437121005616).

In [None]:
def hurst_tta(z, nn):
    """Estimate the Hurst parameter using the triangle total areas method."""
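    # Center at the median and scale by the interquartile range.
    z = z - np.median(z)
    z = z / (np.quantile(z, 0.75) - np.quantile(z, 0.25))

    # Work with the cumulative sum of the standardized series.
    z = np.cumsum(z)

    def h(d):
        # Mean area of the triangles whose vertices are the points
        # of the cumulative series at positions i, i+d, and i+2d.
        u = np.mean(np.abs(z[2*d::d] - 2*z[d:-d:d] + z[0:-2*d:d])) * d / 2
        return u

    f = np.asarray([h(x) for x in nn])

    # Slope of log area on log spacing, minus 1.
    cc = np.cov(np.log(f), np.log(nn))
    return cc[0, 1] / cc[1, 1] - 1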
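The expression inside `h(d)` can be read as a triangle area.  The following is an interpretation of the code above, not a restatement of the linked paper.  For three points of the cumulative series at horizontal spacing $d$, namely $(i, z_i)$, $(i+d, z_{i+d})$, and $(i+2d, z_{i+2d})$, the area of the triangle they form is

$$
A_i(d) = \frac{d}{2}\,\bigl| z_{i+2d} - 2 z_{i+d} + z_i \bigr| .
$$

The function averages $A_i(d)$ over $i = 0, d, 2d, \ldots$ and regresses the log of this mean area on $\log d$.  Returning the slope minus 1 corresponds to assuming that the mean area grows like $d^{H+1}$: one factor of $d$ from the base of the triangle, and $d^{H}$ from the size of the second difference of the cumulative series.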

As a check, estimate the Hurst parameter for IID normal data (the true value of the Hurst parameter here is 1/2).

In [None]:
nn = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
dx = df.iloc[0:100000, :].copy()
dx["Flux1"] = np.random.normal(size=100000)
h_vs = hurst_vs(dx, nn, 0)
h_tta = hurst_tta(dx["Flux1"].values, nn)
print("Estimated Hurst parameter for IID standard normal data:")
print(h_vs)
print(h_tta)

As another check, simulate correlated data with short-range dependence (the true value of the Hurst parameter is still 1/2).

In [None]:
fx = np.random.normal(size=dx.shape[0])
r = 0.5
for i in range(1, len(fx)):
    fx[i] = r*fx[i-1] + np.sqrt(1 - r**2)*fx[i]
dx["Flux1"] = fx
h_vs = hurst_vs(dx, nn, 0)
h_tta = hurst_tta(dx["Flux1"].values, nn)
print("Estimated Hurst parameter for short-range dependent normal data:")
print(h_vs)
print(h_tta)
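The recursion used above is a first-order autoregression with unit marginal variance.  Its autocorrelation at lag $j$ is $r^{j}$, which is geometrically summable, and this is why the true Hurst parameter remains 1/2.  The sketch below compares the sample autocorrelation of a simulated series with $r^{j}$; the helper `sample_acf` is a hypothetical name written for this illustration, and the seed and series length are arbitrary choices.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
r = 0.5
n = 100_000

# Same recursion as in the cell above: the innovations are scaled by
# sqrt(1 - r^2) so the marginal variance stays equal to 1.
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for i in range(1, n):
    x[i] = r * x[i - 1] + np.sqrt(1 - r**2) * e[i]

acf = sample_acf(x, 5)
theory = r ** np.arange(6)

# Sum of r^|j| over all integer lags j; finite for |r| < 1.
total = (1 + r) / (1 - r)
print(np.round(acf, 3), theory, total)
```

Because the sum over all lags is finite, the variance of the block means still decays like $1/m$ for large $m$, only with a larger constant than in the iid case.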

Now we can estimate the Hurst parameter for the GOES data, separately for each of 20 consecutive segments of the series.

In [None]:
for dx in np.array_split(df, 20):
    h0 = hurst_vs(dx, nn, 0)
    h1 = hurst_vs(dx, nn, 1)
    h2 = hurst_tta(dx["Flux1"].values, nn)
    print([h0, h1, h2])

Interestingly, the results are largely unchanged when the data are log-transformed.

In [None]:
df["Flux1_log"] = np.log(df["Flux1"] + 1e-5)
for dx in np.array_split(df, 20):
    h0 = hurst_vs(dx, nn, 0, fluxvar="Flux1_log")
    h1 = hurst_vs(dx, nn, 1, fluxvar="Flux1_log")
    h2 = hurst_tta(dx["Flux1_log"].values, nn)
    print([h0, h1, h2])