# Multivariate Time Series Analysis


In [None]:
# Setup
import numpy as np
import matplotlib.pyplot as plt

## Vector White Noise

For $m=2$ time series in our vector ensemble, each component is a white noise series (the simulation below uses $m=3$):

$$
z^{(1)}_t = \epsilon^{(1)}_t
$$

$$
z^{(2)}_t = \epsilon^{(2)}_t
$$

In [None]:
N = 100 # number of timesteps
m = 3 # number of timeseries
mu = np.zeros(m) # vector of means
Sigma = np.array([[1.0, 0.8, 0.5], 
                  [0.8, 1.0, 0.3], 
                  [0.5, 0.3, 1.0]]) # covariance matrix

np.random.seed(123)
X = np.random.multivariate_normal(mu, Sigma, size=N)

In [None]:
print(X.shape)

In [None]:
fig, axes = plt.subplots(1, m, figsize=(12, 4), sharex=True)

# Plot each time series in a separate subplot
for i in range(m):
    axes[i].plot(X[:, i])
    axes[i].set_title(f"Series {i+1}")

# Display the plot
plt.tight_layout()
plt.show()

In [None]:
plt.title(f"Series 1 and Series 2, Covariance {Sigma[0,1]}")
plt.scatter(X[:, 0], X[:, 1])

In [None]:
plt.title(f"Series 1 and Series 3, Covariance {Sigma[0,2]}")
plt.scatter(X[:, 0], X[:, 2])
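
As a quick check on the simulation above, the sample covariance of the draws should approach $\pmb \Sigma$ as the number of timesteps grows. The sketch below is an illustration rather than part of the original notebook: it assumes NumPy's `default_rng` generator instead of `np.random.seed`, and a larger sample (`X_big`, a name introduced here) so the estimate is tight.

```python
import numpy as np

# Same covariance matrix as above, repeated so the snippet runs on its own
Sigma = np.array([[1.0, 0.8, 0.5],
                  [0.8, 1.0, 0.3],
                  [0.5, 0.3, 1.0]])

rng = np.random.default_rng(123)  # assumption: new-style Generator API
X_big = rng.multivariate_normal(np.zeros(3), Sigma, size=50_000)

# Rows are timesteps and columns are series, hence rowvar=False
Sigma_hat = np.cov(X_big, rowvar=False)
print(np.round(Sigma_hat, 2))
```

The off-diagonal entries of `Sigma_hat` correspond to the covariances quoted in the scatter plot titles.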

## Connection to Differentiation

**Forward Euler Approximation to Derivative:** for time step $h$

$$
\frac{y_{t+h} - y_{t}}{h}\approx \frac{dy}{dt}
$$

For linear functions, this approximation is exact. Consider $h=1$ and the linear function $y=t$, so $\frac{dy}{dt} = 1$:

In [None]:
t = np.arange(20)
y = t.copy()  # linear function y = t
y2 = np.diff(y, 1)  # first difference y[i+1] - y[i], the forward Euler quotient with h=1

plt.plot(t, y, label="y=t")
plt.plot(t[1:], y2, label="First Difference")
plt.plot(t, np.repeat(1, len(t)), 'g--', alpha=.3, label="y=1")
plt.legend()

In [None]:
y2
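
For contrast, the approximation is no longer exact once the function is curved. The sketch below is an illustration added here, not part of the original notebook. It uses $y=t^2$, where $\frac{dy}{dt}=2t$, and compares the first difference against the derivative at the left endpoint of each step.

```python
import numpy as np

t = np.arange(20)
y = t**2                      # quadratic, so dy/dt = 2t
forward_diff = np.diff(y, 1)  # (y_{t+1} - y_t) / h with h = 1
exact = 2 * t[:-1]            # derivative at the left endpoint of each step

# (t+1)^2 - t^2 = 2t + 1, so the forward difference overshoots by exactly 1
error = forward_diff - exact
print(error)
```

The constant offset is the second-order term that the forward Euler quotient drops. It would shrink with a smaller step $h$.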

## VAR(p)

Vector autoregression of order $p$

Multivariate time series $\pmb Z_t$, with $k$ components of length $N$:

$$
\pmb Z_t = 
\begin{pmatrix} 
z_1^{(1)} & z_2^{(1)} & \dots & z_N^{(1)} \\ 
\vdots    & \vdots & \ddots & \vdots   \\
z_1^{(k)} & z_2^{(k)} & \dots & z_N^{(k)} \\ 
\end{pmatrix}
$$

Mean vector $\pmb \mu$, $k\times 1$:
$$
\begin{pmatrix} \mu^{(1)} \\ \vdots \\ \mu^{(k)}\end{pmatrix}
$$

Process defined by:

$$
(\pmb Z_t - \pmb\mu) = \sum_{j=1}^p \pmb\Phi_j(\pmb Z_{t-j}-\pmb \mu) + \pmb a_t
$$

where $\pmb a_t$ is a vector white noise process with covariance matrix $\pmb \Sigma$, and each $\pmb\Phi_j$ is a $k\times k$ coefficient matrix.

### Special Case: VAR(1)

$$
\pmb Z_t = \pmb\Phi \pmb Z_{t-1}+\pmb a_t
$$

Assume mean vector $\pmb \mu$ is zero for convenience. Suppose $k=2$:

$$
\pmb Z_t = 
\begin{bmatrix} 
\phi^{(11)} & \phi^{(12)} \\
\phi^{(21)} & \phi^{(22)}
\end{bmatrix} \pmb Z_{t-1} + 
\begin{bmatrix} a_t^{(1)} \\ a_t^{(2)}\end{bmatrix}
$$

So, written out component by component:

$$
z_t^{(1)} = \phi^{(11)} z_{t-1}^{(1)} + \phi^{(12)} z_{t-1}^{(2)} + a_t^{(1)}
$$

$$
z_t^{(2)} = \phi^{(21)} z_{t-1}^{(1)} + \phi^{(22)} z_{t-1}^{(2)} + a_t^{(2)}
$$

Parameter Interpretation:

* $\phi^{(11)}, \phi^{(22)}$: dependence of each time series on its own past
* $\phi^{(12)}, \phi^{(21)}$: dependence of each time series on the past of the other component
    * If both are non-zero, there is a feedback relationship between components
    * If only one is non-zero, the dependence runs in one direction only
    * If both are zero, there is no dynamic correlation between components. The components are then:
        * Contemporaneously correlated if $\pmb \Sigma$ is non-diagonal
        * Uncorrelated if $\pmb \Sigma$ is diagonal
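
To make the parameter roles concrete, the following sketch simulates a VAR(1) process with $k=2$. The coefficient values, the noise covariance, and the eigenvalue check are assumptions chosen for illustration; they do not come from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 2

# Hypothetical coefficients: phi12 = 0, so series 1 ignores series 2's past,
# while phi21 != 0 lets series 1's past feed into series 2
Phi = np.array([[0.5, 0.0],
                [0.4, 0.3]])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])  # non-diagonal: contemporaneously correlated noise

# Eigenvalues inside the unit circle keep the simulated process from blowing up
assert np.all(np.abs(np.linalg.eigvals(Phi)) < 1)

a = rng.multivariate_normal(np.zeros(k), Sigma, size=N)  # vector white noise a_t
Z = np.zeros((N, k))
for t in range(1, N):
    Z[t] = Phi @ Z[t - 1] + a[t]  # Z_t = Phi Z_{t-1} + a_t
```

Setting both off-diagonal entries of `Phi` to zero while keeping `Sigma` non-diagonal would give two series that are linked only through their simultaneous shocks.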

## Modeling Options

Different model specifications have different physical meanings.

### Shared Parameters, no vector components

A single ARX model: $z_t = \sum_{j=1}^p a_j z_{t-j} + \pmb \beta \pmb X + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$

**Meaning**: each time series is an independent realization of the same process. Different external inputs are the main drivers of differences between time series.

**Pros**: simple; parameters can be trained on one set of time series and then used to forecast time series at unobserved locations.

**Cons**: can't model direct interactions between response components, and assuming the same error variance $\sigma^2$ for every time series may be unreasonable.
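
One way to fit shared parameters is to pool the lagged observations from every series into a single least-squares regression. The sketch below assumes an AR(1) model with one external input per series. The coefficient values and the names `a1`, `beta`, and `X_ext` are hypothetical, and ordinary least squares is only one possible estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, N = 5, 300
a1, beta = 0.6, 2.0  # hypothetical shared AR(1) and input coefficients

# Each series gets its own external input but shares a1, beta, and the error scale
X_ext = rng.normal(size=(n_series, N))
z = np.zeros((n_series, N))
for t in range(1, N):
    noise = rng.normal(scale=0.5, size=n_series)
    z[:, t] = a1 * z[:, t - 1] + beta * X_ext[:, t] + noise

# Stack (z_{t-1}, X_t) -> z_t pairs from all series into one regression
design = np.column_stack([z[:, :-1].ravel(), X_ext[:, 1:].ravel()])
target = z[:, 1:].ravel()
coef, *_ = np.linalg.lstsq(design, target, rcond=None)
print(coef)  # estimates of (a1, beta)
```

Because the fitted coefficients do not depend on how many series were pooled, they can be applied unchanged to a new series with its own inputs.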

### Shared ARX parameters, vector errors

Modify the random error of the ARX model so that it is a vector: $z_t = \sum_{j=1}^p a_j z_{t-j} + \pmb \beta \pmb X + \pmb \epsilon_t, \quad \pmb \epsilon_t \sim N(\pmb 0, \pmb\Sigma)$

**Meaning**: the deterministic dynamics of each series do not interact with those of the other series, but the random errors can be related.

Subcases:
* Diagonal covariance matrix: with $\pmb \epsilon_t = [\epsilon^{(1)}_t ,\dots, \epsilon^{(k)}_t]^T$, each time series has its own intrinsic error variance. This could make sense for sensors at different locations that have degraded over time or otherwise have different error rates.
* Non-diagonal covariance matrix: errors are dependent at each instant. This makes sense with shared external inputs. A shared forecast error for the temperature, for example, might lead to correlated errors across locations.


**Cons**: can't model direct interactions between response components. Fixing dimensionality of error covariance means you can't easily apply a trained model to new locations
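
The two subcases can be compared by simulation. The sketch below runs $k=3$ series with a shared AR(1) coefficient and no external inputs, once with a diagonal and once with a non-diagonal error covariance. The helper `simulate` and all numeric values are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
k, N, a1 = 3, 5000, 0.6  # hypothetical sizes and shared AR(1) coefficient

def simulate(Sigma):
    """Run k series with the same a1 and vector errors drawn from N(0, Sigma)."""
    eps = rng.multivariate_normal(np.zeros(k), Sigma, size=N)
    z = np.zeros((N, k))
    for t in range(1, N):
        z[t] = a1 * z[t - 1] + eps[t]
    return z

Sigma_diag = np.diag([0.5, 1.0, 2.0])    # separate error variance per series
Sigma_full = np.array([[1.0, 0.8, 0.5],
                       [0.8, 1.0, 0.3],
                       [0.5, 0.3, 1.0]])  # correlated shocks across series

corr_diag = np.corrcoef(simulate(Sigma_diag), rowvar=False)
corr_full = np.corrcoef(simulate(Sigma_full), rowvar=False)
print(np.round(corr_diag, 2))
print(np.round(corr_full, 2))
```

In the diagonal case the series are close to uncorrelated, while in the non-diagonal case the shared shocks carry over into correlated series even though no series depends on another's past.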

### Full VARX

$$
\pmb Z_t = \pmb \mu + \pmb A_1 \pmb Z_{t-1} + \dots + \pmb A_p \pmb Z_{t-p} + \pmb \epsilon_t, \quad \pmb \epsilon_t\sim N(\pmb 0, \pmb \Sigma)
$$

**Meaning**: the future of one time series depends directly on its own history and on the history at other locations.

**Pros:** you could use this construction with lagged external inputs like the weather.

**Cons**: Fixing dimensionality of error covariance means you can't easily apply a trained model to new locations
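
As a rough illustration of fitting the full model, the sketch below estimates a VAR(1) coefficient matrix by least squares and the error covariance from the residuals. It assumes $p=1$, a zero mean vector, no external inputs, and a hypothetical true matrix `A1`. A dedicated time series library would normally be used in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 2000, 2
A1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])  # hypothetical true coefficient matrix

# Simulate Z_t = A1 Z_{t-1} + eps_t with identity error covariance
Z = np.zeros((N, k))
for t in range(1, N):
    Z[t] = A1 @ Z[t - 1] + rng.normal(size=k)

# Regress Z_t on Z_{t-1}; in row form, Z[1:] is approximately Z[:-1] @ A1.T
B, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
A1_hat = B.T

# Residual covariance estimates Sigma
resid = Z[1:] - Z[:-1] @ B
Sigma_hat = np.cov(resid, rowvar=False)
print(np.round(A1_hat, 2))
print(np.round(Sigma_hat, 2))
```

Both `A1_hat` and `Sigma_hat` are $k\times k$, which is why a model trained this way is tied to the specific set of locations it was fitted on.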