# Comparing the False Positive Rate (Type I Error) Under the Null

The purpose of this simulation is to confirm the validity of the tests: under the null hypothesis of independence, the rejection rate (the empirical Type I error) should be close to the significance level $\alpha$. We simulate two independent AR(1) processes from the following model:

\begin{equation}
\begin{bmatrix}
X_t\\
Y_t
\end{bmatrix}
=
\begin{bmatrix}
\phi & 0\\
0 & \phi
\end{bmatrix}
\begin{bmatrix}
X_{t-1}\\
Y_{t-1}
\end{bmatrix} +
\begin{bmatrix}
\epsilon_t\\
\eta_t
\end{bmatrix},
\end{equation}

where $\epsilon_t$ and $\eta_t$ are independent standard normal noise terms. In the first experiment, we vary the length of the time series over $n\in \{10, 20, 30, \ldots, 200\}$ with $\phi=0.5$. In the second experiment, we vary the AR coefficient over $\phi\in\{0.2, 0.25,\ldots, 0.95\}$ with $n=1200$. We use 1000 permutations per replication and 300 replications in total.
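To make the sampling process concrete, here is a minimal NumPy sketch of drawing independent AR(1) pairs. It is not the notebook that produced the stored `.mat` files: the function name `simulate_independent_ar1`, the `burn_in` parameter, the zero starting value, and the output shape `(n_reps, n, 1)` are all assumptions made for illustration.

```python
import numpy as np


def simulate_independent_ar1(n, phi, n_reps=1, burn_in=100, seed=None):
    """Draw n_reps independent pairs (X, Y) of AR(1) series of length n.

    Hypothetical helper: burn_in and the (n_reps, n, 1) output shape are
    illustrative choices, not taken from the original data generation code.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    # Independent standard normal innovations: index 0 drives X, index 1 drives Y.
    noise = rng.standard_normal((2, n_reps, total))
    series = np.zeros_like(noise)  # both series start at zero
    for t in range(1, total):
        # Diagonal coefficient matrix: each series depends only on its own past.
        series[:, :, t] = phi * series[:, :, t - 1] + noise[:, :, t]
    series = series[:, :, burn_in:]  # drop the burn-in samples
    X = series[0][:, :, None]
    Y = series[1][:, :, None]
    return X, Y


X_sim, Y_sim = simulate_independent_ar1(n=200, phi=0.5, n_reps=300, seed=0)
print(X_sim.shape, Y_sim.shape)
```

Because the coefficient matrix is diagonal and the two noise sequences are drawn independently, the simulated $X_t$ and $Y_t$ are independent by construction, which is what places the experiment under the null.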

See here for the wildHSIC and shiftHSIC computations, done in MATLAB, and see here for the data generation notebook.
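The empirical Type I error is the fraction of replications whose p-value falls at or below $\alpha$. With 300 replications and $\alpha = 0.05$, a valid test has a Monte Carlo standard error of about $\sqrt{0.05 \cdot 0.95 / 300} \approx 0.013$, so small deviations from 0.05 are expected. The sketch below shows one way to compute this summary, assuming a results table with `test`, `n`, and `pval` columns. The helper `rejection_rates` and the synthetic `toy` data are hypothetical and are not part of the original analysis.

```python
import numpy as np
import pandas as pd


def rejection_rates(results, alpha=0.05):
    """Per (test, n): share of p-values <= alpha and its Monte Carlo SE.

    Hypothetical helper; assumes columns named "test", "n", and "pval".
    """
    flagged = results.assign(reject=results["pval"] <= alpha)
    out = (
        flagged.groupby(["test", "n"])["reject"]
        .agg(["mean", "count"])
        .reset_index()
        .rename(columns={"mean": "rate", "count": "n_reps"})
    )
    # Binomial standard error of the estimated rejection rate.
    out["mc_se"] = np.sqrt(out["rate"] * (1 - out["rate"]) / out["n_reps"])
    return out


# Synthetic stand-in: uniform p-values, as a valid test would produce
# (approximately) under the null. "toy" is a placeholder test name.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "n": np.repeat([10, 20], 300),
        "pval": rng.uniform(size=600),
        "test": "toy",
    }
)
summary = rejection_rates(toy, alpha=0.05)
print(summary)
```

A rate far above $\alpha$ plus a few standard errors would indicate an invalid (anti-conservative) test, while a rate far below it would indicate a conservative one.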


In [16]:
import pandas as pd
import matplotlib.pyplot as plt
import scipy as sp
import scipy.io  # load the io submodule explicitly so sp.io.loadmat is available
from pathlib import Path
from hyppo.time_series import DcorrX, MGCX, LjungBox
from joblib import Parallel, delayed

In [12]:
data = sp.io.loadmat("./data/1-independent_ar_n.mat")

X = data["X"]
Y = data["Y"]

n_reps = X.shape[0]

ns = list(range(10, 201, 10))

In [21]:
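def worker(X, Y, test, reps=1000):
    """Run one independence test on a single replication and return its p-value."""
    n, d = X.shape  # series length and dimension

    # reps is the number of permutations used to compute the p-value
    res = test.test(X, Y, reps=reps)
    return n, d, res[1]  # p-value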

In [22]:
test_dict = {
    "LjungBox": LjungBox,
    "DcorrX": DcorrX,
    "MGCX": MGCX
}

dfs = []

for test_name, test in test_dict.items():
    # if test_name == "LjungBox":
    #     auto=True
    # else:
    #     auto=False
    # Subset run: 2 series lengths x 3 replications = 6 tasks per test.
    results = Parallel(n_jobs=-2, verbose=1)(
        delayed(worker)(X[i, :n, :], Y[i, :n, :], test(max_lag=1))
        for n in [10, 20]
        for i in range(3)
    )

    df = pd.DataFrame(results, columns=["n", "d", 'pval'])
    df['test'] = test_name
    dfs.append(df)

df = pd.concat(dfs, axis=0, ignore_index=True)
df.to_csv("./outs/indep_ar_n.csv", index=False)

[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.


[Parallel(n_jobs=-2)]: Done   2 out of   6 | elapsed:    1.7s remaining:    3.4s
[Parallel(n_jobs=-2)]: Done   3 out of   6 | elapsed:    1.7s remaining:    1.7s
[Parallel(n_jobs=-2)]: Done   4 out of   6 | elapsed:    1.7s remaining:    0.9s
[Parallel(n_jobs=-2)]: Done   6 out of   6 | elapsed:    1.7s remaining:    0.0s
[Parallel(n_jobs=-2)]: Done   6 out of   6 | elapsed:    1.7s finished
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Done   2 out of   6 | elapsed:   20.7s remaining:   41.4s
[Parallel(n_jobs=-2)]: Done   3 out of   6 | elapsed:   20.8s remaining:   20.8s
[Parallel(n_jobs=-2)]: Done   4 out of   6 | elapsed:   20.8s remaining:   10.4s
[Parallel(n_jobs=-2)]: Done   6 out of   6 | elapsed:   20.9s remaining:    0.0s
[Parallel(n_jobs=-2)]: Done   6 out of   6 | elapsed:   20.9s finished
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Done   2 out of   6 | elapsed:   