Analysis of Financial Time Series, Third Edition Chapter 8

# Chapter 8 Multivariate Time Series Analysis and Its Applications


Price movements in one market can affect another market. One must consider markets **jointly** to better understand the **dynamic** structure of global finance, because one market may **lead** another. The next few chapters introduce econometric models and methods that belong to **vector or multivariate time series analysis** and are useful for **jointly studying** multiple return series.

A [vector or] **multivariate time series** consists of **multiple single series** [each] referred to as **components**. 

**Boldface** indicates vectors and matrices. 

Appendix A discusses vector and matrices operations and properties. 

Appendix B discusses the **multivariate normal distribution**, which is widely used in multivariate statistical analysis (Johnson and Wichern, 1998).

Let:   
- $\large \boldsymbol{r}_t =(r_{1t},r_{2t}, \ldots ,r_{kt})' \text{ = log returns of k assets at time t}$ 

where:   
- $\large \boldsymbol{a}'$ denotes the transpose of $\boldsymbol{a}$

For example:  
$r_{1t}$ might denote the daily log return of IBM stock and $r_{2t}$ might denote the daily log return of Microsoft [at day or time t].

[So $\boldsymbol{r}_t$ above is written as the transpose of a row vector; i.e., $\boldsymbol{r}_t$ is a column vector that holds the k returns at time t only, not the entire series.]

This chapter's goals are to:
- (a) explore the basic properties of $\boldsymbol{r}_t$
- (b) study **econometric models** for analyzing the **multivariate data** $\{\boldsymbol{r}_t | t = 1, \ldots, T \}$
- (c) discuss the direct generalization of previous chapters' models and methods to the multivariate case
- (d) discuss the new models and methods that are needed when the generalization must handle complicated relationships between multiple series
- (e) discuss these issues with emphasis on intuition and applications

For statistical theory of multivariate time series analysis, readers are referred to Lutkepohl (2005) and Reinsel (1993).

## 8.1 WEAK STATIONARITY AND CROSS-CORRELATION MATRICES

A k-dimensional time series is denoted [as a vector that is transposed when represented in text where horizontal alignment is better formatting].

$\large \boldsymbol{r}_t = (r_{1t}, \ldots, r_{kt})'$  

The series $\boldsymbol{r}_t$ is **weakly stationary** if its first and second moments [mean vector of all components and covariance matrix of all components] are time invariant: the **mean vector** and **covariance matrix** of a weakly stationary series are **constant over** time, and this book **assumes** that return series of financial assets are weakly stationary.

https://en.wikipedia.org/wiki/Moment_(mathematics)
[In mathematics, the moments of a function are quantitative measures related to the shape of the function's graph. If the function represents mass, then the first moment is the center of the mass, and the second moment is the rotational inertia. **If** the function is a **probability distribution**, then the **first moment** is the expected value, the **second central moment** is the variance, the **third standardized moment** is the skewness, and the **fourth standardized moment** is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.] [Note: those are the exact terms. It isn't the "first central moment" or the "second standardized moment".]

For a weakly stationary time series $\boldsymbol{r}_t$, Tsay defines a **mean vector** and **covariance matrix** [as the expected value of the time series vector $r_t$ and the outerproduct of (a) the difference between $r_t \text{ and } \mu$ and (b) the same vector difference transposed:]

(8.1)
$$\large \boldsymbol{\mu} = E( \boldsymbol{r}_t ), \;\; \boldsymbol{\Gamma}_0 = E[( \boldsymbol{r}_t −  \boldsymbol{\mu} )( \boldsymbol{r}_t −  \boldsymbol{\mu} )']$$

![image.png](attachment:image.png)

where 
- Tsay writes $\large \boldsymbol{\mu} = (\mu_1,\ldots,\mu_k)' \text{ and } \boldsymbol{\Gamma}_0 = [\Gamma_{ij}(0)]$ when the **elements** are needed.
- the expectation is taken **element** by element over the **joint distribution** of $\boldsymbol{r}_t$.  

[Element by element means component by component, which implies that each element's PDF yields its own expected value. It is not entirely clear whether 'over the joint distribution' means that the expectations use joint probabilities of all combinations of component values, but I think I cleared this up later.]

[See the expected value section of https://en.wikipedia.org/wiki/Multivariate_random_variable, which says that the components are random variables on the same probability space ($\large \Omega, F, P$), where $\Omega$ is the **sample space** that I guess defines the ranges for each component, F is the **sigma-algebra** or **collection of all events**, and P is the **probability measure** that returns every event's probability. That again would imply joint probability, but the article goes on to say $E[X] = (E[X_1], \ldots, E[X_n])^T$, so each component's expected value is evaluated separately. It is possible that the probabilities used to evaluate the separate expectations come from the joint PDF, which would make sense.]



The mean $\boldsymbol{\mu}$ is a k-dimensional vector of the **unconditional expectations** of the components of $\boldsymbol{r}_t$. [Unconditional means not conditioned on what? It could mean as it does elsewhere not conditioned on prior values which would be confusing since we might expect to use all values to compute the mean but when e.g. lags are introduced the mean is computed from $\ell$ to T which means it could have been conditioned on the values in the series from t=zero to t=$\ell$, but it isn't.]


The covariance matrix $\boldsymbol{\Gamma}_0$ is a k × k matrix. [k = number of assets.]
- The ith diagonal element [= (i,i)th element] of $\boldsymbol{\Gamma}_0$ is the variance of $r_{it}$, 
- The (i,j)th element of $\boldsymbol{\Gamma}_0$ is the covariance between $r_{it}$ and $r_{jt}$. [Recall that $r_{it}$ is component i at time t; under weak stationarity this covariance is the same for every t.] 

## 8.1.1 Cross-Correlation Matrices

Cross-correlation matrices are used to measure the strength of linear dependence between time series.

Let D be a k × k diagonal matrix consisting of the standard deviations of $\large r_{it}$ for i = 1, ..., k: 

$$\large \boldsymbol{D} = \text{diag}\{\sqrt{\Gamma_{11}(0)}, \ldots,\sqrt{\Gamma_{kk}(0)}\}$$

![image-12.png](attachment:image-12.png)

Then, the **concurrent**, or lag-zero, **cross-correlation matrix** of $\large \boldsymbol{r}_t$ is defined:

$$\large \boldsymbol{\rho}_0 \equiv [\rho_{ij}(0)] = \boldsymbol{D}^{-1} \boldsymbol{\Gamma_0}\boldsymbol{D}^{-1}$$ 

![image-10.png](attachment:image-10.png)

[because D on either side multiplies every correlation by its 2 relevant standard deviations from row and column respectively on L and R forms of D:]

$$\large \boldsymbol{D} \boldsymbol{\rho}_0 \boldsymbol{D} =  \boldsymbol{\Gamma_0}$$ 



More specifically, the (i,j)th element of $\boldsymbol{\rho}_0$ is the correlation coefficient between $r_{it} \text{ and } r_{jt}$:

$$\large \rho_{ij}(0) = \frac{\Gamma_{ij}(0)}{\sqrt{\Gamma_{ii}(0)\Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{jt})}{std(r_{it})std(r_{jt})}$$

![image-9.png](attachment:image-9.png)

[where $std(r_{it})$ is the standard deviation of the i-th component's time series]

In **time series analysis**, such a correlation coefficient $\large \rho_{ij}(0)$ is referred to as a **concurrent**, or **contemporaneous**, correlation coefficient because it is the correlation of the two series **at time t**. 

$$\large \rho_{ij}(0)=\rho_{ji}(0),\;\; −1 ≤ \rho_{ij}(0) ≤ 1, \;\; \rho_{ii}(0) = \rho_{jj}(0) = 1 \text{ for } 1 ≤ i,j ≤ k$$ 

![image-11.png](attachment:image-11.png)

Thus, $\boldsymbol{\rho}_0$ is a **symmetric** matrix with **unit diagonal** elements.
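As a quick numerical check of these definitions, the sketch below builds $\boldsymbol{D}$ and $\boldsymbol{\rho}_0$ from a small covariance matrix with NumPy. The matrix values and the variable names are made up for illustration and do not come from the text.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix Gamma_0 (illustrative values only).
gamma0 = np.array([
    [4.0, 1.2, 0.6],
    [1.2, 1.0, 0.3],
    [0.6, 0.3, 0.25],
])

# D = diag{sqrt(Gamma_11(0)), ..., sqrt(Gamma_kk(0))}
D = np.diag(np.sqrt(np.diag(gamma0)))
D_inv = np.linalg.inv(D)

# rho_0 = D^{-1} Gamma_0 D^{-1}
rho0 = D_inv @ gamma0 @ D_inv

print(np.round(rho0, 3))
```

With these inputs every off-diagonal correlation works out to 0.6, the diagonal is 1, and multiplying back as $\boldsymbol{D}\boldsymbol{\rho}_0\boldsymbol{D}$ recovers the covariance matrix.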

**Lead–lag relationships between component series** are important in multivariate time series analysis [and are represented in **lag-$\ell$ cross correlation matrices**].

[Start with] the **lag-$\ell$ cross-covariance matrix** of $\boldsymbol{r}_t$, which is defined [by an outer product]:

(8.2)
$$\large \boldsymbol{\Gamma}_{\ell} \equiv [ \Gamma_{ij}(\ell)] = E[(\boldsymbol{r}_t − \boldsymbol{\mu})(\boldsymbol{r}_{t-\ell} - \boldsymbol{\mu})']$$

![image-8.png](attachment:image-8.png)

where: 
- $\boldsymbol{\mu}$ is the mean vector of $\boldsymbol{r}_t$. 

Therefore, the (i,j)th element of $\boldsymbol{\Gamma}_{\ell}$ is the covariance between $r_{it} \text{ and } r_{j,t-\ell}$. 

The cross-covariance matrix is [and its entries are] a function of the lag = $\ell$ and not a function of the time index t **for a weakly stationary series**. [Obviously, the entries are a function of the components, i and j, as well.]

The lag-$\ell$ cross-correlation matrix (**CCM**) of $r_t$ is defined as:

(8.3)
$$\large \boldsymbol{\rho}_{\ell} \equiv [\rho_{ij}(\ell)]= \boldsymbol{D}^{-1} \boldsymbol{\Gamma}_{\ell} \boldsymbol{D}^{-1}$$

![image-7.png](attachment:image-7.png)

where, as before with the no-lag cross correlation matrix:
- $\boldsymbol{D}$ is the diagonal matrix of standard deviations of the individual series $r_{it}$

From the definition [which is the same as before, except that $\ell$ was zero there], the (i,j)th element $\rho_{ij}(\ell)$ is the correlation coefficient between $r_{it} \text{ and } r_{j,t-\ell}$:

(8.4)

$$\large \rho_{ij}(\ell) = \frac{\Gamma_{ij}(\ell)}{\sqrt{\Gamma_{ii}(0)\Gamma_{jj}(0)}} = \frac{\text{Cov}(r_{it}, r_{j,t-\ell})}{std(r_{it})std(r_{jt})}$$

![image-6.png](attachment:image-6.png)

[Implications:]
- [In the notation, the component with the second subscript, j, comes first in time and thus leads; the component with the first subscript, i, lags (follows) and thus is the dependent one.]    
- When $\ell > 0$, this correlation coefficient $\rho_{ij}(\ell)$ measures the linear dependence of $r_{it} \text{ on } r_{j,t-\ell}$, which occurred prior to time t. [It seems that $\rho_{ij}(\ell)$ also measures the linear relationship between $r_{it} \text{ and } r_{jt}$ when $\ell = 0$, but Tsay's point is about which subscript carries the lag: the $\ell$ lag goes with the second subscript, j, here.]   

- Consequently, if $\rho_{ij}(\ell) ≠ 0 \text{ and } \ell > 0$, the series $r_{jt}$ **"leads"** the series $r_{it}$ at lag $\ell$. 

- Similarly, the correlation coefficient $\rho_{ji}(\ell)$ measures the linear dependence of $r_{jt} \text{ on } r_{i,t-\ell}$. [The $\ell$ lag goes with the second subscript, i, here.]

- If $\rho_{ji}(\ell) ≠ 0 \text{ and } \ell > 0$, the series $r_{it}$ **"leads"** the series $r_{jt}$ at lag $\ell$. 

[To be clear, the **second** (column) subscript **leads** the **first** (row) subscript, and the element measures the linear dependence of the **first** (row) component **on** the **second** (column) component of the lag-$\ell$ cross-correlation matrix. But there are other conditions for these to be exclusively true.]

- (8.4) also shows that the **diagonal** element $\rho_{ii}(\ell)$ is the lag-$\ell$ **autocorrelation** coefficient of $r_{it}$.
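The lead-lag reading of $\rho_{ij}(\ell)$ can be illustrated with a small simulation. This is only a sketch under assumed data: series 2 is white noise and series 1 is built to follow series 2 by one period, so $\rho_{12}(1)$ should be large while $\rho_{21}(1)$ should be near zero. The helper `lag_corr` and the coefficients 0.8 and 0.6 are hypothetical choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Hypothetical data: x plays the role of r_{2t}, y the role of r_{1t}.
x = rng.standard_normal(T)
y = np.empty(T)
y[0] = rng.standard_normal()
y[1:] = 0.8 * x[:-1] + 0.6 * rng.standard_normal(T - 1)  # r_{1t} follows r_{2,t-1}

def lag_corr(a, b, lag):
    """Sample correlation between a_t and b_{t-lag} for lag >= 0, as in (8.4)."""
    a_c, b_c = a - a.mean(), b - b.mean()
    cov = np.sum(a_c[lag:] * b_c[:len(b) - lag]) / len(a)
    return cov / (a.std() * b.std())

rho_12 = lag_corr(y, x, 1)  # dependence of r_{1t} on r_{2,t-1}
rho_21 = lag_corr(x, y, 1)  # dependence of r_{2t} on r_{1,t-1}
print(round(rho_12, 2), round(rho_21, 2))
```

In this construction series 2 "leads" series 1 at lag 1, which shows up as a nonzero (1,2) element and a near-zero (2,1) element of the lag-1 matrix.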

Important [lag-$\ell$] **cross correlation properties** [descend from these implications] when $\ell > 0$: 

$$\rho_{ij} (\ell) ≠ \rho_{ji} (\ell) \text{ for i ≠ j}$$ 

![image-5.png](attachment:image-5.png)

- First, **the matrices** $\boldsymbol{\Gamma}_{\ell} \text{ and } \boldsymbol{\rho}_{\ell}$ are **in general** not symmetric because the two correlation coefficients measure [two] different linear relationships [that exist] between [four different subsets of the two time series] {$r_{it}$} and {$r_{jt}$}:
    - $r_{i,t}$
    - $r_{j,t}$
    - $r_{i,t-\ell}$
    - $r_{j,t-\ell}$

$$\large \text{Cov}(r_{it},r_{j,t-\ell}) = \text{Cov}(r_{j,t-\ell},r_{it}) = \text{Cov}(r_{jt},r_{i,t+\ell}) = \text{Cov}(r_{jt},r_{i,t-(-\ell)})$$

![image.png](attachment:image.png)

- Second, using [the property that] **Cov(x,y) = Cov(y,x)** and the [assumption of] weak stationarity, [notice that Tsay shows that swapping the order of i and j in the 2nd, 3rd, and 4th covariances leaves the covariance unchanged as long as component i stays $\ell$ periods ahead of component j]:
    - [2nd covariance: the two arguments swap places, and $\ell$ keeps its sign and stays with j.]
    - [3rd and 4th covariances: by weak stationarity both time indices shift forward by $\ell$, so the lag moves from j to i and changes sign.] 
    
[In all 4 cases covariance is measuring the dependence between the **same** subsets of components i and j time series; only the notated order is changing.]

Thus, ... 

$$\large \Gamma_{ij}(\ell) = \Gamma_{ji}(−\ell)$$

![image-2.png](attachment:image-2.png)

Because $\Gamma_{ji}(−\ell)$ is the (j,i)th element of the matrix $\boldsymbol{\Gamma}_{−\ell}$ and the equality holds for 1 ≤ i, j ≤ k, also true are:

$$\large \boldsymbol{\Gamma}_{\ell} = \boldsymbol{\Gamma}_{-\ell}'$$

$$\large \boldsymbol{\rho}_{\ell} = \boldsymbol{\rho}_{-\ell}'$$  

![image-3.png](attachment:image-3.png)

Consequently, **unlike the univariate case**, for a **general** **vector** time series when
$\ell > 0$:

$$\boldsymbol{\rho}_{\ell} \neq \boldsymbol{\rho}_{-\ell}$$

![image-4.png](attachment:image-4.png)

Because $\boldsymbol{\rho}_{\ell} = \boldsymbol{\rho}_{-\ell}'$, it suffices in practice to consider the cross-correlation matrices $\boldsymbol{\rho}_{\ell} \text{ for } \ell ≥ 0$. [That is **don't** fuss with **negative** values for **lags** and **transposed** lagged **correlation** matrices (or **covariance matrices** for that matter).]
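The transpose identity can be checked numerically on sample cross-covariance matrices. The sketch below assumes a divisor of $T$ and a hypothetical helper `cross_cov` that also accepts negative lags; the simulated two-component series is an illustrative assumption.

```python
import numpy as np

def cross_cov(r, lag):
    """Sample lag-`lag` cross-covariance matrix of a (T, k) array; lag may be negative."""
    T = r.shape[0]
    c = r - r.mean(axis=0)
    if lag >= 0:
        lead, lagged = c[lag:], c[:T - lag]    # r_t paired with r_{t-lag}
    else:
        lead, lagged = c[:T + lag], c[-lag:]   # r_t paired with r_{t+|lag|}
    return lead.T @ lagged / T

# Hypothetical data: component 2 depends on the previous value of component 1.
rng = np.random.default_rng(42)
e = rng.standard_normal((500, 2))
r = e.copy()
r[1:, 1] = 0.5 * e[:-1, 0] + e[1:, 1]

g_pos = cross_cov(r, 1)
g_neg = cross_cov(r, -1)
print(np.allclose(g_pos, g_neg.T))   # transpose identity holds
print(np.allclose(g_pos, g_pos.T))   # but the lag-1 matrix itself is not symmetric
```

Because the negative-lag matrix is just the transpose of the positive-lag one, nothing is lost by computing only $\ell \geq 0$.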

## 8.1.2 Linear Dependence

Considered **jointly**, 
the cross-correlation matrices ...

$$\large \{\boldsymbol{\rho}_{\ell} | \ell = 0, 1, \ldots\}$$ 

... of a **weakly stationary** vector time series contain the following information:

1. The diagonal elements $\large \{\rho_{ii}(\ell)| \ell= 0, 1, \ldots\}$ are the **autocorrelation function** of $r_{it}$. [Remember that the autocorrelation function describes a single component i across the full set of lags $\ell$ that label each matrix.]
2. The off-diagonal element $\large \rho_{ij}(0)$ measures the **concurrent linear relationship** between $r_{it} \text{ and } r_{jt}$ [as lag $\ell$ is held constant = zero].
3. For $\ell > 0$, the off-diagonal element $\large \rho_{ij}(\ell)$ measures the **linear dependence** of $r_{it} \text{ on the past value } r_{j,t−\ell}$.  [here "dependence" for $\ell > 0$ and in #2 "relationship" for $\ell = 0$.]

Therefore, ...

$\large \text{if } \rho_{ij}(\ell) = 0 \text{ for all } \ell > 0$, then $r_{it}$ does not **depend** linearly on any past value $r_{j,t−\ell}$ of the $r_{jt}$ series.

[May be worth noting that dependence can run the other way, expectation for a future event can cause an earlier event.]

In **general**, the linear **relationship** between two time series $\{r_{it}\}$ and $\{r_{jt}\}$ can be summarized as follows:
1. $\{r_{it}\} \text{ and } \{r_{jt}\}$ have no linear **relationship** if 

$$\large \rho_{ij}(\ell)=\rho_{ji}(\ell) = 0 \text{ for all } \ell ≥ 0$$

2. $\{r_{it}\} \text{ and } \{r_{jt}\}$ are **concurrently correlated** if 

$$\large \rho_{ij}(0) ≠ 0$$

3. $\{r_{it}\} \text{ and } \{r_{jt}\}$ are **"uncoupled"** defined by no **lead–lag relationship** if 

$$\large \rho_{ij}(\ell) = 0 \text{ and } \rho_{ji}(\ell) = 0 \text{ for all } \ell > 0$$

[Notice that $\ell = 0$ is excluded here.]

4. There is a **unidirectional relationship** from $\{r_{it}\} \text{ to } \{r_{jt}\}$, meaning that
    - $r_{it}$ does not depend on any past value of $r_{jt}$
    - $r_{jt}$ does depend on some past values of $r_{it}$

    if

$$\large \rho_{ij}(\ell) = 0 \text{ for all } \ell > 0 \text{, but } \rho_{ji}(v) \neq 0 \text{ for some } v > 0.$$ 

5. There is a feedback relationship between $\{r_{it}\} \text{ and } \{r_{jt}\}$ if 

$$\large \rho_{ij}(\ell) \neq 0 \text{ for some } \ell > 0 \text{, and } \rho_{ji}(v) \neq 0 \text{ for some } v > 0.$$

The **conditions stated** earlier are **sufficient** conditions. A **more informative approach** to study the relationship between time series is to build a **multivariate model** for the series because a properly specified model considers **simultaneously the serial and cross correlations** among the series [though we are asked to untangle these combined metrics]. [So i may have autocorrelation and lead j in cross correlation. Which is the strongest factor?]
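The lead-lag cases in the list above can be written as a small decision rule. This is a sketch only: the function name `classify_lead_lag`, its return labels, and the `tol` threshold are hypothetical, and with sample correlations one would have to pick a nonzero threshold instead of testing for exact zeros.

```python
import numpy as np

def classify_lead_lag(rho_ij, rho_ji, tol=0.0):
    """Classify the lead-lag relationship from cross-correlations at lags 1..m.

    rho_ij[l-1] holds rho_ij(l) and rho_ji[l-1] holds rho_ji(l), l = 1, ..., m.
    `tol` is a hypothetical cutoff below which a value is treated as zero.
    """
    i_depends_on_past_j = bool(np.any(np.abs(rho_ij) > tol))
    j_depends_on_past_i = bool(np.any(np.abs(rho_ji) > tol))
    if i_depends_on_past_j and j_depends_on_past_i:
        return "feedback"
    if j_depends_on_past_i:
        return "unidirectional: i leads j"
    if i_depends_on_past_j:
        return "unidirectional: j leads i"
    return "uncoupled"

print(classify_lead_lag([0.0, 0.0], [0.3, 0.0]))
print(classify_lead_lag([0.2, 0.0], [0.0, 0.1]))
print(classify_lead_lag([0.0, 0.0], [0.0, 0.0]))
```

The rule deliberately ignores lag 0, so two series classified as "uncoupled" can still be concurrently correlated.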

## 8.1.3 Sample Cross-Correlation Matrices

["cross" connotes correlation between different component series, at any lag including zero]

Given the data ...

$$\large \{\boldsymbol{r}_t | t = 1,\ldots, T \}$$

![image.png](attachment:image.png)

... the $\large \boldsymbol{\Gamma}_{\ell}$ **cross-covariance** matrix is estimated by:

(8.5)
$$\large \boldsymbol{\widehat{\Gamma}}_{\ell} = \frac{1}{T} \sum_{t=\ell+1}^T (\boldsymbol{r}_t − \bar{\boldsymbol{r}})(\boldsymbol{r}_{t-\ell} − \bar{\boldsymbol{r}})'$$

![image-2.png](attachment:image-2.png)

where 
- the vector of sample means is:

$$\large \bar{\boldsymbol{r}} = \frac{\left(\sum_{t=1}^T \boldsymbol{r}_t \right)}{T}$$

![image-3.png](attachment:image-3.png)

- [each $\boldsymbol{r}_t \text{ and } \boldsymbol{r}_{t-\ell}$ is a vector of component log return values, one component log return value for each asset at the subscripted time t or t-$\ell$.]
- [each of the summed items is a matrix created from the outer product of the unlagged vector's components' differences from vector of sample means and lagged vector's components' differences from vector of sample means.]


The $\large \boldsymbol{\rho}_{\ell}$ **cross-correlation** matrix is estimated by:

(8.6)
$$\large \boldsymbol{\widehat{\rho}}_{\ell} = \boldsymbol{\widehat{D}}^{-1} \boldsymbol{\widehat{\Gamma}}_{\ell} \boldsymbol{\widehat{D}}^{-1}, \;\; \ell \geq 0$$

![image-4.png](attachment:image-4.png)

where: 
- $\boldsymbol{\widehat{D}}$ is the k × k **diagonal** matrix of the **sample** standard deviations of the component series.
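Equations (8.5) and (8.6) translate almost line for line into NumPy. The sketch below assumes the data sit in a `(T, k)` array and that $\boldsymbol{\widehat{D}}$ uses standard deviations with divisor $T$ (taken from $\boldsymbol{\widehat{\Gamma}}_0$), so that the lag-0 matrix has a unit diagonal; the function name `sample_ccm` and the random test data are illustrative assumptions.

```python
import numpy as np

def sample_ccm(r, lag):
    """Sample lag-`lag` cross-correlation matrix of a (T, k) array, lag >= 0."""
    T = r.shape[0]
    c = r - r.mean(axis=0)                     # subtract the vector of sample means
    gamma_lag = c[lag:].T @ c[:T - lag] / T    # eq. (8.5): sum of outer products over t = lag+1..T
    d = np.sqrt(np.diag(c.T @ c / T))          # sample standard deviations from Gamma_0 hat
    return gamma_lag / np.outer(d, d)          # eq. (8.6): D^{-1} Gamma_lag D^{-1}

# Hypothetical data: 300 observations of a 3-component series.
rng = np.random.default_rng(7)
r = rng.standard_normal((300, 3))

rho0_hat = sample_ccm(r, 0)
rho1_hat = sample_ccm(r, 1)
print(np.round(rho0_hat, 2))
print(np.round(rho1_hat, 2))
```

Dividing elementwise by the outer product of the standard deviations is equivalent to pre- and post-multiplying by $\boldsymbol{\widehat{D}}^{-1}$, and it avoids forming the diagonal matrix explicitly.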

Like the univariate case, 
- **asymptotic properties** of the sample cross-correlation matrix $\boldsymbol{\widehat{\rho}}_{\ell}$ have been derived under various assumptions and are studied in detail by *Fuller (1976, Chapter 6)*. 
- The estimate $\boldsymbol{\widehat{\rho}}_{\ell}$ is **consistent** but **biased** in a finite sample. For asset return series, the **finite sample distribution** of $\boldsymbol{\widehat{\rho}}_{\ell}$ is complicated by the presence of conditional **heteroscedasticity and high kurtosis**. If the finite-sample distribution of the cross correlations is needed, proper **bootstrap resampling** methods are recommended to approximate it. For many applications, a crude approximation of the variance of $\widehat{\rho}_{ij}(\ell)$ is sufficient.

[From the Wikipedia "Consistent estimator" article's paragraph on bias versus consistency: consistent means that the estimate converges asymptotically to the true (population) value. Biased means that the estimates are systematically off from the true (population) value in the same direction. Biased but consistent is the case where the bias vanishes asymptotically, as for the mean estimator $\frac{1}{n}\sum x_i + \frac{1}{n}$.]

### Example 8.1. 

![image-3.png](attachment:image-3.png)

**Figure 8.1** Time plots of monthly log returns, in percentages, for (a) IBM stock and (b) the S&P 500 index from January 1926 to December 2008.

![image-2.png](attachment:image-2.png)

**Figure 8.2** Some scatterplots for monthly log returns of IBM stock and S&P 500 index: (a) concurrent plot of IBM vs. S&P 500, (b) S&P 500 vs. lag-1 IBM, (c) IBM vs. lag-1 S&P 500, and (d) S&P 500 vs. lag-1 S&P 500.

Consider 996 observations:
- percent log returns including dividend payments
    - $r_{1t}$ = IBM stock 
    - $r_{2t}$ = S&P 500 index 
- monthly Jan 1926 to Dec 2008 
- $r_t = (r_{1t},r_{2t})'$ is a **bivariate time series**
- $r_t$ is shown in a time plot in Fig. 8.1
- $r_{1t},r_{2t}$ are shown in scatterplots in Fig. 8.2, which shows that
    - the two return series $r_{1t},r_{2t}$ are **concurrently correlated**. 
    - the **sample concurrent correlation coefficient** [$\widehat{\rho}_{12}(0) = 0.65$] between the two returns is statistically significant at the 5% level. 
    - the **cross correlations at lag 1** are weak if any.
- Table 8.1 provides 
    - summary statistics 
    - cross-correlation matrices (**CCM**) of $r_{1t},r_{2t}$ the two series [at lagged times]. 
        - For a **bivariate** series, each CCM is a 2 × 2 matrix with 4 correlations. 
        - **Tiao and Box**'s (1981) simplified CCM notation defines a cross-correlation matrix  with “+”, “−”, and “.”:
            1. (+) denotes a correlation coefficient $\widehat{\rho}_{ij} \geq \frac{2}{\sqrt{T}}$
            2. (-) denotes a correlation coefficient $\widehat{\rho}_{ij} \leq -\frac{2}{\sqrt{T}}$
            3. (.) denotes a correlation coefficient $-\frac{2}{\sqrt{T}} < \widehat{\rho}_{ij} < \frac{2}{\sqrt{T}}$
        - $\frac{2}{\sqrt{T}}$ = the **asymptotic** 5% critical value of the **sample** correlation under the **assumption** that $r_t$ is a white noise series. [Why do Tiao and Box use $\frac{2}{\sqrt{T}}$? Under white noise the asymptotic standard error of a sample correlation is $\frac{1}{\sqrt{T}}$, so the cutoff is roughly two standard errors.]
    - Table 8.1(c) shows simplified CCM for IBM stock and S&P 500 index monthly log returns
        - significant cross correlations at the approximate 5% level appear mainly at lags 1 and 3.  [The only CCM elements ≥ 0.10 are at lag 1 and 3.]
        - sample CCMs at these two lags indicates that:   
            (a) S&P 500 index returns have marginal [marginal means weak here?] autocorrelations at lags 1, 2, 3, and 5. [I see + or - in the (2,2) at lags 1, 3, 5, but not at lag 2?]     
            (b) IBM stock returns depend weakly on the previous returns of the S&P 500 index. [Columns lead rows; rows depend on columns: 2nd column is SP500 leading 1st row IBM; 1st row IBM depending on 2nd column SP500] See significant cross correlations in the (1,2)th element of lag-1, lag-2 and lag-5 CCMs.    
            (c) [SP 500 index returns do not depend on IBM returns at any lag; thus it is a **unidirectional relationship** with SP500 leading.]    
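The simplified "+", "−", "." notation is easy to mechanize. The sketch below maps a sample CCM to symbols using the $2/\sqrt{T}$ cutoff; the function name `simplify_ccm` and the 2 × 2 input values are hypothetical and are not the entries of Table 8.1.

```python
import numpy as np

def simplify_ccm(rho_hat, T):
    """Map a sample CCM to Tiao-Box symbols using the 2/sqrt(T) cutoff."""
    crit = 2.0 / np.sqrt(T)
    symbols = np.full(rho_hat.shape, ".", dtype="<U1")
    symbols[rho_hat >= crit] = "+"
    symbols[rho_hat <= -crit] = "-"
    return symbols

# Hypothetical 2x2 sample CCM; T = 996 as in Example 8.1, so the cutoff is about 0.063.
example = np.array([[0.04, 0.10],
                    [-0.08, 0.01]])
print(simplify_ccm(example, 996))
```

Here only the (1,2) and (2,1) entries exceed the cutoff in absolute value, so they become "+" and "-" while the diagonal entries become ".".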


![image.png](attachment:image.png)

Figure 8.3 shows the sample autocorrelations [UL and LR] and cross correlations [UR and LL] of the two series $r_{1t},r_{2t}$. 
- UL plot shows IBM stock returns sample ACF
- UR plot shows IBM stock returns dependence on S&P 500 index lagged returns. 
- Dashed lines represent asymptotic two standard error limits for sample auto- and cross-correlation coefficients. 
- Dynamic relationship is weak between the two return series, but their contemporaneous correlation is statistically significant.  [Where is contemporaneous correlation shown?]


![image-4.png](attachment:image-4.png)

Figure 8.3 Sample auto- and cross-correlation functions (CCF) of two monthly log return series: (a) sample ACF of IBM stock returns, (b) cross-correlations between S&P 500 index and lagged IBM stock returns (lower left), (c) cross correlations between IBM stock and lagged S&P 500 index returns, and (d) sample ACF of S&P 500 index returns. Dashed lines denote 95% limits.

### Example 8.2. 

Consider CRSP database's 696 observations
- US gov bonds simple return
- monthly from Jan 1942 to Dec 1999
- $r_t = (r_{1t}, \ldots, r_{5t})'$ = return series vector with 30y 20y 10y 5y 1y decreasing time to maturity. 

![image.png](attachment:image.png)

Figure 8.4 time plots $r_t$ on the same scales. 1y simple return variability is smaller than longer maturity simple returns. 

The vector of sample means of the simple return time series is $\hat{\boldsymbol{\mu}} = 10^{−2}(0.43, 0.45, 0.45, 0.46, 0.44)'$ 

The vector of sample standard deviations of the simple return time series is $\hat{\boldsymbol{\sigma}} = 10^{−2}(2.53, 2.43, 1.97, 1.39, 0.53)'$

The **concurrent** correlation matrix of the series is

$$\large \widehat{\boldsymbol{\rho}}_{0} = \begin{bmatrix}
1.00 & 0.98 & 0.92 & 0.85 & 0.63 \\
0.98 & 1.00 & 0.91 & 0.86 & 0.64 \\
0.92 & 0.91 & 1.00 & 0.90 & 0.68 \\
0.85 & 0.86 & 0.90 & 1.00 & 0.82 \\
0.63 & 0.64 & 0.68 & 0.82 & 1.00 \\
\end{bmatrix}$$

![image-2.png](attachment:image-2.png)

Observations:
- The high concurrent correlations among the component series are not surprising. 
- Correlations between longer-term bonds are higher [0.98, 0.92, 0.91] than those between short-term bonds [0.63, 0.64, 0.68].

Table 8.2 shows lag-1 and lag-2 cross-correlation matrices of $r_t$ and the corresponding simplified matrices: 
- Most of the significant cross correlations are at lag 1 [LHS].
- The five return series appear to be intercorrelated [except last row indicating 1y is not dependent on longer maturities and has no autocorrelation]. [rows are dependent on columns; columns lead rows.] 
- The 1-year bond returns sample ACFs at lag-1 [0.40] and lag-2 [0.22] are substantially higher than those of other series with longer maturities, [but the lag-1 1y series autocorrelation not significant?].

![image-3.png](attachment:image-3.png)

## 8.1.4 Multivariate Portmanteau Tests

Hosking (1980, 1981) and Li and McLeod (1981) generalized the univariate Ljung–Box statistic Q(m) to the multivariate case. 

The null hypothesis of the test statistic for a multivariate series is 

$$\large H_0 : \boldsymbol{\rho}_1 = \cdots = \boldsymbol{\rho}_m = \boldsymbol{0}$$

![image.png](attachment:image.png)

The alternative hypothesis for a multivariate series is:

$$\large H_a : \boldsymbol{\rho}_i \neq \boldsymbol{0} \text{ for some i } \in \{1, \ldots,m \}$$

![image-2.png](attachment:image-2.png)

The test statistic used to test the null hypothesis that there are **no auto- and cross correlations in the vector series** $\large \boldsymbol{r_t}$ has this form:

(8.7)

$$ \large Q_k(m) = T^2 \sum_{\ell = 1}^m \frac{1}{T-\ell} tr \left( \boldsymbol{\widehat{\Gamma}}_{\ell}' \boldsymbol{\widehat{\Gamma}}_0^{-1} \boldsymbol{\widehat{\Gamma}}_{\ell} \boldsymbol{\widehat{\Gamma}}_0^{-1} \right)$$

where 
- T is the sample size
- k is the dimension of $\large \boldsymbol{r}_t$
- tr($\boldsymbol{A}$) is the trace = the sum of the diagonal elements of the matrix A. 

$\large Q_k$(m) follows asymptotically a chi-squared distribution with $\large k^2m$ degrees of freedom, under the null hypothesis and some regularity conditions.
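Equation (8.7) can be computed directly from the sample cross-covariance matrices. The following is a minimal sketch, assuming a `(T, k)` data array and divisor $T$ in each $\boldsymbol{\widehat{\Gamma}}_{\ell}$; the function name `multivariate_q` and the white-noise test data are illustrative assumptions. It returns the statistic together with the $k^2m$ degrees of freedom and leaves the p-value lookup to the reader.

```python
import numpy as np

def multivariate_q(r, m):
    """Multivariate portmanteau statistic Q_k(m) of eq. (8.7) and its degrees of freedom."""
    T, k = r.shape
    c = r - r.mean(axis=0)
    gamma0_inv = np.linalg.inv(c.T @ c / T)
    total = 0.0
    for lag in range(1, m + 1):
        gamma_l = c[lag:].T @ c[:T - lag] / T
        total += np.trace(gamma_l.T @ gamma0_inv @ gamma_l @ gamma0_inv) / (T - lag)
    return T**2 * total, k * k * m

# Hypothetical bivariate white-noise sample.
rng = np.random.default_rng(3)
r = rng.standard_normal((300, 2))
q_stat, dof = multivariate_q(r, 5)
print(round(q_stat, 2), dof)
```

For $k = 1$ the trace reduces to the squared lag-$\ell$ autocorrelation, so the statistic collapses to $T^2 \sum_{\ell} \widehat{\rho}_{\ell}^2/(T-\ell)$.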

$\large Q_k(m)$ can also be written in terms of the sample cross-correlation matrices $\large \boldsymbol{\widehat{\rho}}_{\ell}$, using the Kronecker product $\large \otimes$ and vectorization of matrices:

$$ \large Q_k(m) = T^2 \sum_{\ell=1}^m \frac{1}{T-\ell} \boldsymbol{b}_{\ell}' \left( \boldsymbol{\widehat{\rho}}_0^{-1} \otimes \boldsymbol{\widehat{\rho}}_0^{-1} \right)  \boldsymbol{b}_{\ell}$$

![image-3.png](attachment:image-3.png)

where
- $\large \boldsymbol{b}_{\ell} = \text{vec}(\boldsymbol{\widehat{\rho}}_{\ell}')$

*Li and McLeod (1981)* propose the following test statistic, which is asymptotically equivalent to $Q_k(m)$: 

$$ \large Q_k^*(m) = T \sum_{\ell=1}^m \boldsymbol{b}_{\ell}' \left( \boldsymbol{\widehat{\rho}}_0^{-1} \otimes \boldsymbol{\widehat{\rho}}_0^{-1} \right)  \boldsymbol{b}_{\ell} + \frac{k^2m(m+1)}{2T}$$

![image-4.png](attachment:image-4.png)

Applying the $Q_k(m)$ portmanteau test to the bivariate monthly log returns of IBM stock and the S&P 500 index (Example 8.1) yields $Q_2(1) = 9.81$, $Q_2(5) = 47.06$, and $Q_2(10) = 71.65$. Based on asymptotic chi-squared distributions with $k^2m = 2^2(1) = 4$, $2^2(5) = 20$, and $2^2(10) = 40$ degrees of freedom, the p values of these $Q_2(m)$ statistics are 0.044, 0.001, and 0.002, respectively, thus confirming the existence of **serial dependence** [auto or cross] **in the bivariate return series** at the 5% significance level. 
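The p values quoted for the three $Q_2(m)$ statistics can be reproduced with the standard library alone. The sketch below relies on the closed-form upper tail of a chi-squared distribution with an even number of degrees of freedom $2n$, namely $P(X > x) = e^{-x/2}\sum_{j=0}^{n-1}(x/2)^j/j!$; it does not cover odd degrees of freedom, and the function name `chi2_sf_even_df` is a hypothetical helper.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even `df`."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0            # j = 0 term of the sum
    for j in range(1, df // 2):
        term *= half / j              # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

k = 2
for m, q in [(1, 9.81), (5, 47.06), (10, 71.65)]:
    df = k * k * m                    # k^2 m degrees of freedom
    print(m, df, chi2_sf_even_df(q, df))
```

Rounded to three decimals, the three tail probabilities agree with the 0.044, 0.001, and 0.002 reported above.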

[$Q_2(1)$ is autocorrelation and cross correlation for the lag 1]

For the 5-dimensional monthly simple returns of bond indexes (Example 8.2), $Q_5(5) = 1065.63$, which is highly significant compared with a chi-squared distribution with $k^2(m) = 5^2(5) = 125$ degrees of freedom.

The $Q_k(m)$ statistic is a joint test for checking the first m cross-correlation matrices of $r_t$ being zero. **If [this portmanteau test] rejects the null hypothesis [that hypothesizes zero correlation], then we build a multivariate model for the series to study the lead–lag relationships between the component series.** Next, Tsay discusses simple vector models useful for modeling the linear dynamic structure of a multivariate financial time series.


## 8.2 VECTOR AUTOREGRESSIVE MODELS

If a multivariate time series $\boldsymbol{r}_t$ follows the vector autoregressive (VAR) model ...

(8.8)

$$\large \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t$$

![image.png](attachment:image.png)

... then it is a vector autoregressive (VAR) process of order 1, VAR(1).

where:
- $\large \boldsymbol{\phi}_0$ is a k-dimensional **vector**,
- $\large \boldsymbol{\phi}$ is a k × k **matrix**, 
- $\large \{\boldsymbol{a}_t\}$ is a sequence of 
    - **serially uncorrelated** 
    - **random vectors** 
    - with mean zero and 
    - with covariance matrix $\large \boldsymbol{\Sigma}$ 
        - that is, in application, required to be positive definite; otherwise, the dimension of $r_t$ can be reduced. 
        - [imagine that if $\large \boldsymbol{\Sigma}$ is not positive definite, then one of $\large \boldsymbol{\Sigma}$'s eigenvalues that is zero might pair with an eigenvector and reduce $\large \boldsymbol{r}_t$. Is this right?]
    - In the literature, $\large \boldsymbol{a}_t$ is often **assumed** to be **multivariate normal**.

In the bivariate case, $\large k=2$ dimensions 
- [the multivariate time series, the random innovation vector, and the **intercept** parameter vector are vectors with 2 components:]

$\large \boldsymbol{r}_t = (r_{1t},r_{2t})'$

$\large \boldsymbol{a}_t = (a_{1t},a_{2t})'$

$\large \boldsymbol{\phi}_0 = (\phi_{10},\phi_{20})'$

![image-2.png](attachment:image-2.png)

- [each of the **coefficient** parameter matrices with k x k = $k^2$ = 4 components:]

$\large \boldsymbol{\phi} = \begin{bmatrix}
\phi_{11} & \phi_{12}\\
\phi_{21} & \phi_{22}
\end{bmatrix}$

The VAR(1) model from (8.8) consists of the following two equations [that just de-bolds variables and converts subscripts from `t` to `it` by adding an i subscript to denote the component]:

$$\large r_{1t} = \phi_{10} + \phi_{11} r_{1,t-1} + \phi_{12} r_{2,t-1} + a_{1t}$$

$$\large r_{2t} = \phi_{20} + \phi_{21} r_{1,t-1} + \phi_{22} r_{2,t-1} + a_{2t}$$

![image-3.png](attachment:image-3.png)

where:
- $\phi_{ij}$ is the (i,j)th element of the matrix $\boldsymbol{\phi}$ 
- $\phi_{i0}$ is the i-th element of the vector $\boldsymbol{\phi}_0$ 
- [i denotes component.]
- [j = 0 for the intercept parameter that is insensitive to the other component.]
- [these equations simply multiply out the component parameter matrix by the t-1 lagged return vector.]
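To see that the two scalar equations are just the matrix form (8.8) multiplied out, the sketch below evaluates one step of a bivariate VAR(1) both ways. All numerical values (`phi0`, `phi`, `r_prev`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values (illustration only).
phi0 = np.array([0.1, -0.2])            # intercept vector phi_0
phi = np.array([[0.5, 0.1],
                [0.3, 0.4]])            # coefficient matrix phi
r_prev = np.array([1.0, 2.0])           # r_{t-1}
a_t = rng.standard_normal(2)            # innovation a_t

# Matrix form, eq. (8.8)
r_t = phi0 + phi @ r_prev + a_t

# The same step written component by component
r_1t = phi0[0] + phi[0, 0] * r_prev[0] + phi[0, 1] * r_prev[1] + a_t[0]
r_2t = phi0[1] + phi[1, 0] * r_prev[0] + phi[1, 1] * r_prev[1] + a_t[1]

print(np.allclose(r_t, [r_1t, r_2t]))
```

Row i of $\boldsymbol{\phi}$ supplies the coefficients of equation i, which is why $\phi_{ij}$ links $r_{it}$ to $r_{j,t-1}$.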

Based on the first equation:
- $\large \phi_{12}$ denotes the linear **dependence** of $r_{1t} \text{ on } r_{2,t−1}$ in the presence of $\large r_{1,t−1}$ [whose impact is described by $\large \phi_{11}$]. 
- Therefore, $\large \phi_{12}$ is the **conditional effect** of $\large r_{2,t-1} \text{ on } r_{1t}$ given $\large r_{1,t−1}$. 
- If $\large \phi_{12} = 0$, then $r_{1t} \text{ does not depend on } r_{2,t-1}$, and the VAR(1) model in (8.8) shows that $r_{1t} \text{ only depends on its own past } r_{1,t-1} \text{, as described by } \phi_{11}$. 

Similarly, the second equation shows that:
- If $\large \phi_{21} = 0$, then $r_{2t} \text{ does not depend on } r_{1,t−1}$ when $\large r_{2,t−1}$ is given. 

Consider the two equations **jointly**; [you don't get these relationships unless jointly]:
- If $\large \phi_{12} = 0, \text{ and } \phi_{21} ≠ 0$, then there is a **unidirectional relationship** from $r_{1t} \text{ [leading] to [its dependent] } r_{2t}$ 
- If $\large \phi_{12} = \phi_{21} = 0$, then there is [no lead–lag relationship between $r_{1t} \text{ and } r_{2t}$] and they are termed **uncoupled**
- If $\large \phi_{12} ≠ 0, \text{ and } \phi_{21} ≠ 0$, then there is a **feedback relationship** between $r_{1t} \text{ and } r_{2t}$.

## 8.2.1 Reduced and Structural Forms

In general, [every element of $\large \boldsymbol{\phi}$ crosses time, and thus] $\large \boldsymbol{\phi}$ the **coefficient** [parameter] matrix of (8.8) measures the **dynamic dependence** of $\large \boldsymbol{r}_t$. [These are **partial correlations**.]

The **concurrent** relationship between $\large r_{1t} \text{ and } r_{2t}$ is shown by the off-diagonal element $\large\sigma_{12}$ of the **covariance** matrix $\large \boldsymbol{\Sigma} \text{ of } \boldsymbol{a}_t$. [Don't miss the point: $\large \boldsymbol{\Sigma} \text{ of } \boldsymbol{a}_t$ shows the concurrent relationships of the $\large r_{it}$.] [He doesn't mention cross-time, same-component relationships because those are measured as correlations and partial correlations in $\boldsymbol{\phi}$.] 
- If $\sigma_{12} = 0$, then there is no **concurrent linear relationship** between the two component series, 1 and 2. 

Econometric literature refers to the VAR(1) model as a **reduced-form model** because (8.8) does not show explicitly the **concurrent dependence** between the component series $\large r_{1t} \text{ and } r_{2t}$. 
- The **structural form** explicitly expresses the **concurrent relationship**.  
- The **structural form** can be deduced from the reduced-form model by a simple linear transformation: 
    - Because the covariance matrix of $\large \boldsymbol{a_t} = \boldsymbol{\Sigma}$ is positive definite, 
        - there exists a lower triangular matrix $\large \boldsymbol{L}$ with unit diagonal elements and 
        - there exists a diagonal matrix $\large \boldsymbol{G}$ 
        - that factor $\large \boldsymbol{\Sigma}$ via Cholesky decomposition:
        
$$\large \boldsymbol{\Sigma} = \boldsymbol{L} \boldsymbol{G} \boldsymbol{L}'$$ 

or

$$\large \boldsymbol{L}^{-1}\boldsymbol{\Sigma}(\boldsymbol{L}')^{-1} = \boldsymbol{G}$$
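A minimal sketch of this factorization, assuming NumPy: `np.linalg.cholesky` returns a lower triangular $\boldsymbol{C}$ with $\boldsymbol{\Sigma} = \boldsymbol{C}\boldsymbol{C}'$, and rescaling its columns by its diagonal gives the unit-diagonal $\boldsymbol{L}$ and the diagonal $\boldsymbol{G}$ used here. The helper name `ldl_from_cholesky` is hypothetical.

```python
import numpy as np

def ldl_from_cholesky(sigma):
    """Return (L, G) with sigma = L @ G @ L.T, L unit lower triangular, G diagonal."""
    sigma = np.asarray(sigma, dtype=float)
    c = np.linalg.cholesky(sigma)   # sigma = c @ c.T, c lower triangular
    d = np.diag(c)                  # positive diagonal of c
    L = c / d                       # divide column j by d[j] -> unit diagonal
    G = np.diag(d ** 2)             # diagonal matrix of squared pivots
    return L, G

# Covariance matrix used in Example 8.3 of these notes.
sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
L, G = ldl_from_cholesky(sigma)
L_inv = np.linalg.inv(L)
G_check = L_inv @ sigma @ L_inv.T   # second form of the identity above
```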

[Might be simpler than the following to simply recognize that …]


[Without reason or interpretation, just as a vehicle at this point,] define:

$$\large \boldsymbol{b}_t = (b_{1t}, \ldots, b_{kt})' = \boldsymbol{L}^{-1} \boldsymbol{a}_t$$

Then [given {$\boldsymbol{a}_t$} has mean zero] 

$\large E(\boldsymbol{b}_t) = \boldsymbol{L}^{-1} E(\boldsymbol{a}_t) = \boldsymbol{0}$

[thus $\large \boldsymbol{b}_t$ has mean zero too, and so the covariance matrix is the expectation of the outer product:]

$\large \begin{align} Cov(\boldsymbol{b}_t)
& = E\{[\boldsymbol{b}_t - E(\boldsymbol{b}_t)] [\boldsymbol{b}_t - E(\boldsymbol{b}_t)]'\}\\
& = E[(\boldsymbol{b}_t - \boldsymbol{0}) (\boldsymbol{b}_t - \boldsymbol{0})']\\
& = E(\boldsymbol{b}_t \boldsymbol{b}_t')\\
& = E[\boldsymbol{L}^{-1} \boldsymbol{a}_t (\boldsymbol{L}^{-1} \boldsymbol{a}_t)']\\
& = E[\boldsymbol{L}^{-1} \boldsymbol{a}_t \boldsymbol{a}_t' (\boldsymbol{L}^{-1})']\\
& = \boldsymbol{L}^{-1} E(\boldsymbol{a}_t \boldsymbol{a}_t') (\boldsymbol{L}^{-1})'\\
& = \boldsymbol{L}^{-1} \boldsymbol{\Sigma} (\boldsymbol{L}^{-1})'\\
& = \boldsymbol{L}^{-1} \boldsymbol{\Sigma} (\boldsymbol{L}')^{-1}\\
& = \boldsymbol{G}\\
\end{align}$

![image.png](attachment:image.png)

Since $\large \boldsymbol{G}$ is a diagonal [**covariance**] matrix, [$\large \boldsymbol{G}$'s off-diagonals are zero, and thus] the components of $\large \boldsymbol{b}_t$ are **uncorrelated**. 

Premultiplying model (8.8) by $\large \boldsymbol{L}^{-1}$ yields:

$$\large \begin{align} \boldsymbol{r}_t & = \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t \;\;&(8.8)\\
\boldsymbol{L}^{−1}\boldsymbol{r}_t & = \boldsymbol{L}^{−1}\boldsymbol{\phi}_0 + \boldsymbol{L}^{−1} \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{L}^{−1}\boldsymbol{a}_t \;\;&\\
& = \boldsymbol{\phi}_0^* + \boldsymbol{\phi}^* \boldsymbol{r}_{t-1} + \boldsymbol{b}_t \;\;&(8.9)
\end{align}$$

where:
- $\large \boldsymbol{\phi}_0^* = \boldsymbol{L}^{−1}\boldsymbol{\phi}_0$ is a newly defined k-dimensional **vector**.
- $\large \boldsymbol{\phi}^* = \boldsymbol{L}^{−1} \boldsymbol{\phi}$ is a newly defined (k x k)-dimensioned **matrix**.
- $\large \boldsymbol{b}_t = \boldsymbol{L}^{−1}\boldsymbol{a}_t$ is the already defined k-dimensional **vector**. 

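Because of $\large \boldsymbol{L}$'s "special matrix structure" ["lower triangular", "with unit diagonal elements"], its inverse $\large \boldsymbol{L}^{-1}$ is also lower triangular with unit diagonal elements, so the kth [last] row of $\large \boldsymbol{L}^{-1}$ is in the form: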

$$\large (w_{k1}, w_{k2}, \ldots, w_{k,k−1}, 1)$$

[... which is already a row vector and so has no ' transpose notation.]

Consequently, the kth equation of model (8.9) is:

$\large \begin{align} 
\boldsymbol{L}^{-1}\boldsymbol{r}_t & = \boldsymbol{\phi}_0^* + \boldsymbol{\phi}^* \boldsymbol{r}_{t-1} + \boldsymbol{b}_t \; & (8.9)\\
(\boldsymbol{L}^{-1}\boldsymbol{r}_t)_{k} & = \phi_{k,0}^* + (\boldsymbol{\phi}^* \boldsymbol{r}_{t-1})_{k} + b_{k,t} \; &\\
(w_{k,1}, w_{k,2}, \ldots, w_{k,k-1}, 1)\boldsymbol{r}_t & = \phi_{k,0}^* + (\boldsymbol{\phi}^* \boldsymbol{r}_{t-1})_{k} + b_{k,t} \; &\\
(w_{k,1}, w_{k,2}, \ldots, w_{k,k-1})\boldsymbol{r}_{1:k-1,t} + (1) r_{k,t} & = \phi_{k,0}^* + (\boldsymbol{\phi}^* \boldsymbol{r}_{t-1})_{k} + b_{k,t} \; &\\
r_{k,t} + (w_{k,1}, w_{k,2}, \ldots, w_{k,k-1})\boldsymbol{r}_{1:k-1,t} & = \phi_{k,0}^* + (\boldsymbol{\phi}^* \boldsymbol{r}_{t-1})_{k} + b_{k,t} \; &\\
r_{k,t} + \sum_{i=1}^{k-1} w_{ki} r_{it} & = \phi_{k,0}^* + \sum_{i=1}^k \phi_{ki}^* r_{i,t-1} + b_{kt} \; & (8.10)\\
\end{align}
$

![image-2.png](attachment:image-2.png)

where 
- $\large \phi_{k,0}^*$ is the k-th element of $\large \boldsymbol{\phi}_0^*$ 
- $\large \phi_{k,i}^*$ is the (k,i)th element of $\large \boldsymbol{\phi}^*$ 

Because $\large b_{kt}$ is uncorrelated with $\large b_{it}$ for 1 ≤ i < k, [G is a diagonal covariance matrix], equation (8.10) shows [in $\large w_{ki}$] explicitly the concurrent linear dependence of $r_{kt} \text{ on } r_{it}$, where 1 ≤ i ≤ k−1. 

This equation is referred to as a **structural equation** for $\large r_{kt}$ in the **econometric** literature.

For any other component $\large r_{it} \text{ of } \boldsymbol{r}_t$, [not sure this is needed because one can just take the appropriate rows of $\large \boldsymbol{L}^{-1}$, but] rearrange the VAR(1) model so that $\large r_{it}$ becomes the last component of $\large \boldsymbol{r}_t$. The prior transformation method can then be applied to obtain a structural equation for $\large r_{it}$. Therefore, the reduced-form model (8.8) is equivalent to the structural form [(8.10)] used in the econometric literature.

In **time series analysis, the reduced-form model is commonly used** for two reasons. The first reason is **ease** in estimation. The second and main reason is that the **concurrent correlations cannot be used in forecasting**.
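The transformation from (8.8) to (8.9) can be sketched in a few lines, assuming NumPy. The helper name `structural_form` is hypothetical, and the parameter values are those of Example 8.3 in these notes. The last row of $\boldsymbol{L}^{-1}$ holds $(w_{k1}, \ldots, w_{k,k-1}, 1)$, so moving the $w_{ki} r_{it}$ terms to the right-hand side of (8.10) gives concurrent coefficients $-w_{ki}$.

```python
import numpy as np

def structural_form(phi0, phi, sigma):
    """Premultiply a reduced-form VAR(1) by L^{-1}, where sigma = L G L'."""
    c = np.linalg.cholesky(sigma)       # sigma = c @ c.T
    L = c / np.diag(c)                  # unit lower triangular factor
    L_inv = np.linalg.inv(L)
    G = L_inv @ sigma @ L_inv.T         # diagonal covariance of b_t
    return L_inv, L_inv @ phi0, L_inv @ phi, G

# Parameters of Example 8.3.
phi0 = np.array([0.2, 0.4])
phi = np.array([[0.2, 0.3],
                [-0.6, 1.1]])
sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

L_inv, phi0_star, phi_star, G = structural_form(phi0, phi, sigma)

# Concurrent coefficients of the last component on the earlier ones.
concurrent = -L_inv[-1, :-1]
```

Using a Cholesky routine and rescaling is one design choice among several; any routine that returns the unit lower triangular factor of $\boldsymbol{\Sigma}$ would serve.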

### Example 8.3. 

To illustrate the transformation 
from a reduced-form model 
to structural equations, 
consider the bivariate AR(1) model:

$$\large 
\begin{bmatrix}
r_{1t}\\
r_{2t}
\end{bmatrix}
=
\begin{bmatrix}
0.2\\
0.4
\end{bmatrix}
+
\begin{bmatrix}
0.2 & 0.3\\
-0.6 & 1.1
\end{bmatrix}
\begin{bmatrix}
r_{1,t-1}\\
r_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{1t}\\
a_{2t}
\end{bmatrix},
\;\;
\boldsymbol{\Sigma} 
= 
\begin{bmatrix}
2 & 1\\
1 &1
\end{bmatrix}
$$

A Cholesky decomposition factors this covariance matrix $\large \boldsymbol{\Sigma}$
using this $\large \boldsymbol{L}^{-1}$ lower triangular matrix:

$$\large  \boldsymbol{L}^{-1} = \begin{bmatrix}1.0 & 0.0 \\ −0.5 & 1.0 \end{bmatrix}$$

[which inverts to $\large\boldsymbol{L}$ as follows]:

$$\begin{align} \large  \boldsymbol{L} 
&= \frac{1}{|\boldsymbol{L}^{-1}|} \begin{bmatrix}d & -b \\ -c & a \end{bmatrix}\\
&= \frac{1}{a*d-b*c} \begin{bmatrix}d & -b \\ -c & a \end{bmatrix}\\
&= \frac{1}{(1.0)(1.0)-(0.0)(-0.5)} \begin{bmatrix}1.0 & 0.0 \\ 0.5 & 1.0 \end{bmatrix} = \begin{bmatrix}1.0 & 0.0 \\ 0.5 & 1.0 \end{bmatrix}\end{align}$$

$$\large  \boldsymbol{L}\large  \boldsymbol{L}^{-1} = \begin{bmatrix}1.0 & 0.0 \\ 0.5 & 1.0 \end{bmatrix} \begin{bmatrix}1.0 & 0.0 \\ −0.5 & 1.0 \end{bmatrix} = \begin{bmatrix}1 & \\  & 1 \end{bmatrix}$$

[and factors $\large \boldsymbol{\Sigma}$ as follows:]

$$\large \begin{align} \boldsymbol{\Sigma} 
& = \large\boldsymbol{L} \large\boldsymbol{G} \large\boldsymbol{L}'\\
\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
& = \begin{bmatrix}1.0 & 0.0 \\ 0.5 & 1.0 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} 
\begin{bmatrix}1.0 & 0.5 \\ 0.0 & 1.0 \end{bmatrix}
\end{align}$$

[and recovers $\large \boldsymbol{G}$ as follows:]

$$\large \begin{align} \boldsymbol{G} 
& = \large\boldsymbol{L}^{-1} \large\boldsymbol{\Sigma} (\large\boldsymbol{L}')^{-1}\\
\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}
& = \begin{bmatrix}1.0 & 0.0 \\ -0.5 & 1.0 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} 
\begin{bmatrix}1.0 & -0.5 \\ 0.0 & 1.0 \end{bmatrix}
\end{align}$$

To find the structural form of this model, premultiply $\large \boldsymbol{L}^{-1}$ to this example's bivariate AR(1) model:

$$\begin{align}
\boldsymbol{r}_t &= \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t \\
\begin{bmatrix}
r_{1t}\\
r_{2t}
\end{bmatrix}
&=
\begin{bmatrix}
0.2\\
0.4
\end{bmatrix}
+
\begin{bmatrix}
0.2 & 0.3\\
-0.6 & 1.1
\end{bmatrix}
\begin{bmatrix}
r_{1,t-1}\\
r_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{1t}\\
a_{2t}
\end{bmatrix}\\
\boldsymbol{L}^{-1} \boldsymbol{r}_t &= \boldsymbol{L}^{-1}\boldsymbol{\phi}_0 + \boldsymbol{L}^{-1}\boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{L}^{-1}\boldsymbol{a}_t \\
\begin{bmatrix}
1.0 & 0.0 \\ 
−0.5 & 1.0 
\end{bmatrix}
\begin{bmatrix}
r_{1t}\\
r_{2t}
\end{bmatrix}
&=
\begin{bmatrix}
1.0 & 0.0 \\ 
−0.5 & 1.0 
\end{bmatrix}
\begin{bmatrix}
0.2\\
0.4
\end{bmatrix}
+
\begin{bmatrix}
1.0 & 0.0 \\ 
−0.5 & 1.0 
\end{bmatrix}
\begin{bmatrix}
0.2 & 0.3\\
-0.6 & 1.1
\end{bmatrix}
\begin{bmatrix}
r_{1,t-1}\\
r_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
1.0 & 0.0 \\ 
−0.5 & 1.0 
\end{bmatrix}
\begin{bmatrix}
a_{1t}\\
a_{2t}
\end{bmatrix}\\
&=
\begin{bmatrix}
0.2\\
0.3
\end{bmatrix}
+
\begin{bmatrix}
0.2 & 0.3 \\ 
−0.7 & 0.95 
\end{bmatrix}
\begin{bmatrix}
r_{1,t-1}\\
r_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{1t}\\
-0.5a_{1t}+a_{2t}
\end{bmatrix}\\
&=
\begin{bmatrix}
0.2\\
0.3
\end{bmatrix}
+
\begin{bmatrix}
0.2 & 0.3 \\ 
−0.7 & 0.95 
\end{bmatrix}
\begin{bmatrix}
r_{1,t-1}\\
r_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
b_{1t}\\
b_{2t}
\end{bmatrix},
\;\;
\boldsymbol{G} 
= 
\begin{bmatrix}
2 & 0\\
0 & 0.5
\end{bmatrix}\\
\boldsymbol{L}^{-1} \boldsymbol{r}_t &= \boldsymbol{\phi}_0^* + \boldsymbol{\phi}^* \boldsymbol{r}_{t-1} + \boldsymbol{b}_t \\
\end{align}$$


where 
- $\large \boldsymbol{G} = \text{Cov}(b_t)$ for this structural form of the model innovation just as $\large \boldsymbol{\Sigma} = \text{Cov}(a_t)$ for the reduced form of the model. 
- The second equation of this transformed [structural] model gives:

$$ \large \begin{align}
\begin{bmatrix} -0.5 & 1.0 \end{bmatrix}
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
& =
0.3
+ 
\begin{bmatrix} -0.7 & 0.95 \end{bmatrix}
\begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix}
+
b_{2t}\\
-0.5 r_{1t} + 1.0 r_{2t} 
& =
0.3
-0.7 r_{1,t-1} + 0.95 r_{2,t-1} 
+
b_{2t}\\
r_{2t} 
& =
0.3
+0.5 r_{1t}
-0.7 r_{1,t-1} + 0.95 r_{2,t-1} 
+
b_{2t}
\end{align}
$$

This structural form, focused on the second equation, shows explicitly the linear dependence of $\large r_{2t}$ on $\large r_{1t}$ [via the concurrent coefficient 0.5; this is a regression coefficient, $\sigma_{12}/\sigma_{11}$, not a correlation].

[One cannot simply read the first equation to obtain a structural formula for $\large r_{1t}$ that includes its dependence on $\large r_{2t}$, because the $\large \boldsymbol{L}^{-1}$ that premultiplies $\large \boldsymbol{r}_t$ is lower triangular: the zero in its upper-right corner eliminates $\large r_{2t}$ from the formula, and with it any hint of how $\large r_{1t}$ depends on $\large r_{2t}$.]

$$ \large \begin{align}
\begin{bmatrix} 1.0 & 0.0 \end{bmatrix}
\begin{bmatrix} r_{1t} \\ r_{2t} \end{bmatrix}
& =
0.2
+ 
\begin{bmatrix} 0.2 & 0.3 \end{bmatrix}
\begin{bmatrix} r_{1,t-1} \\ r_{2,t-1} \end{bmatrix}
+
b_{1t}\\
1.0 r_{1t} + 0.0 r_{2t} 
& =
0.2
+ 0.2 r_{1,t-1} + 0.3 r_{2,t-1} 
+
b_{1t}
\end{align}
$$

So, rearrange the order of the elements of $\large \boldsymbol{r}_t$ in the example model, [permuting both the rows and the columns of $\large \boldsymbol{\phi}$ and $\large \boldsymbol{\Sigma}$]:

$$\large 
\begin{bmatrix}
r_{2t}\\
r_{1t}
\end{bmatrix}
=
\begin{bmatrix}
0.4\\
0.2
\end{bmatrix}
+
\begin{bmatrix}
1.1 & -0.6\\
0.3 & 0.2
\end{bmatrix}
\begin{bmatrix}
r_{2,t-1}\\
r_{1,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{2t}\\
a_{1t}
\end{bmatrix},
\;\;
\boldsymbol{\Sigma} 
= 
\begin{bmatrix}
1 & 1\\
1 & 2
\end{bmatrix}
$$

A Cholesky decomposition factors this covariance matrix $\large \boldsymbol{\Sigma}$
using this $\large \boldsymbol{L}^{-1}$ lower triangular matrix:

$$\large  \boldsymbol{L}^{-1} = \begin{bmatrix}1.0 & 0.0 \\ −1.0 & 1.0 \end{bmatrix}$$

[which inverts to $\large\boldsymbol{L}$ as follows]:

$$\begin{align} \large  \boldsymbol{L} 
&= \frac{1}{|\boldsymbol{L}^{-1}|} \begin{bmatrix}d & -b \\ -c & a \end{bmatrix}\\
&= \frac{1}{a*d-b*c} \begin{bmatrix}d & -b \\ -c & a \end{bmatrix}\\
&= \frac{1}{(1.0)(1.0)-(0.0)(-1.0)} \begin{bmatrix}1.0 & 0.0 \\ 1.0 & 1.0 \end{bmatrix} = \begin{bmatrix}1.0 & 0.0 \\ 1.0 & 1.0 \end{bmatrix}\end{align}$$

$$\large  \boldsymbol{L}\large  \boldsymbol{L}^{-1} = \begin{bmatrix}1.0 & 0.0 \\ 1.0 & 1.0  \end{bmatrix} \begin{bmatrix}1.0 & 0.0 \\ −1.0 & 1.0 \end{bmatrix} = \begin{bmatrix}1 & \\  & 1 \end{bmatrix}$$

[and factors $\large \boldsymbol{\Sigma}$ as follows:]

$$\large \begin{align} \boldsymbol{\Sigma} 
& = \large\boldsymbol{L} \large\boldsymbol{G} \large\boldsymbol{L}'\\
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
& = \begin{bmatrix}1.0 & 0.0 \\ 1.0 & 1.0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} 
\begin{bmatrix}1.0 & 1.0 \\ 0.0 & 1.0 \end{bmatrix}
\end{align}$$

[and recovers $\large \boldsymbol{G}$ as follows:]

$$\large \begin{align} \boldsymbol{G} 
& = \large\boldsymbol{L}^{-1} \large\boldsymbol{\Sigma} (\large\boldsymbol{L}')^{-1}\\
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
& = \begin{bmatrix}1.0 & 0.0 \\ -1.0 & 1.0 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} 
\begin{bmatrix}1.0 & -1.0 \\ 0.0 & 1.0 \end{bmatrix}
\end{align}$$

To find the structural form of this model, premultiply $\large \boldsymbol{L}^{-1}$ to this example's bivariate AR(1) model:

$$\begin{align}
\boldsymbol{r}_t &= \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t \\
\begin{bmatrix}
r_{2t}\\
r_{1t}
\end{bmatrix}
&=
\begin{bmatrix}
0.4\\
0.2
\end{bmatrix}
+
\begin{bmatrix}
1.1 & -0.6 \\
0.3 & 0.2
\end{bmatrix}
\begin{bmatrix}
r_{2,t-1}\\
r_{1,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{2t}\\
a_{1t}
\end{bmatrix}\\
\boldsymbol{L}^{-1} \boldsymbol{r}_t &= \boldsymbol{L}^{-1}\boldsymbol{\phi}_0 + \boldsymbol{L}^{-1}\boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{L}^{-1}\boldsymbol{a}_t \\
\begin{bmatrix}
1.0 & 0.0 \\ 
−1.0 & 1.0 
\end{bmatrix}
\begin{bmatrix}
r_{2t}\\
r_{1t}
\end{bmatrix}
&=
\begin{bmatrix}
1.0 & 0.0 \\ 
−1.0 & 1.0 
\end{bmatrix}
\begin{bmatrix}
0.4\\
0.2
\end{bmatrix}
+
\begin{bmatrix}
1.0 & 0.0 \\ 
−1.0 & 1.0 
\end{bmatrix}
\begin{bmatrix}
1.1 & -0.6\\
0.3 & 0.2
\end{bmatrix}
\begin{bmatrix}
r_{2,t-1}\\
r_{1,t-1}
\end{bmatrix}
+
\begin{bmatrix}
1.0 & 0.0 \\ 
−1.0 & 1.0 
\end{bmatrix}
\begin{bmatrix}
a_{2t}\\
a_{1t}
\end{bmatrix}\\
&=
\begin{bmatrix}
0.4\\
-0.2
\end{bmatrix}
+
\begin{bmatrix}
1.1 & -0.6 \\ 
−0.8 & 0.8 
\end{bmatrix}
\begin{bmatrix}
r_{2,t-1}\\
r_{1,t-1}
\end{bmatrix}
+
\begin{bmatrix}
a_{2t}\\
-a_{2t}+a_{1t}
\end{bmatrix}\\
&=
\begin{bmatrix}
0.4\\
-0.2
\end{bmatrix}
+
\begin{bmatrix}
1.1 & -0.6 \\ 
−0.8 & 0.8 
\end{bmatrix}
\begin{bmatrix}
r_{2,t-1}\\
r_{1,t-1}
\end{bmatrix}
+
\begin{bmatrix}
c_{1t}\\
c_{2t}
\end{bmatrix},
\;\;
\boldsymbol{G} 
= 
\begin{bmatrix}
1 & 0\\
0 & 1
\end{bmatrix}\\
\boldsymbol{L}^{-1} \boldsymbol{r}_t &= \boldsymbol{\phi}_0^* + \boldsymbol{\phi}^* \boldsymbol{r}_{t-1} + \boldsymbol{c}_t \\
\end{align}$$

[Notice how $c_{1t}$ is the innovation for the first equation for $r_{2t}$.]

where 
- $\large \boldsymbol{G} = \text{Cov}(c_t)$ for this structural form of the model innovation just as $\large \boldsymbol{\Sigma} = \text{Cov}(a_t)$ for the reduced form of the model. 
- The second equation of this transformed [structural] model gives:

$$ \large \begin{align}
\begin{bmatrix} -1.0 & 1.0 \end{bmatrix}
\begin{bmatrix} r_{2t} \\ r_{1t} \end{bmatrix}
& =
-0.2
+ 
\begin{bmatrix} -0.8 & 0.8 \end{bmatrix}
\begin{bmatrix} r_{2,t-1} \\ r_{1,t-1} \end{bmatrix}
+
c_{2t}\\
-1.0 r_{2t} + 1.0 r_{1t} 
& =
-0.2
-0.8 r_{2,t-1} + 0.8 r_{1,t-1} 
+
c_{2t}\\
r_{1t} 
& =
-0.2
+1.0 r_{2t}
-0.8 r_{2,t-1} + 0.8 r_{1,t-1} 
+
c_{2t}
\end{align}
$$

This structural form, focused on the second equation, shows explicitly the linear dependence of $\large r_{1t}$ on $\large r_{2t}$ [via the concurrent coefficient 1.0; again a regression coefficient, $\sigma_{12}/\sigma_{22}$, not a correlation].
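The reordering step of this example can be checked numerically. The sketch below, assuming NumPy, swaps the two components with a permutation matrix `P` (a name introduced here for illustration), which permutes both rows and columns of $\boldsymbol{\phi}$ and $\boldsymbol{\Sigma}$, and then repeats the premultiplication by $\boldsymbol{L}^{-1}$.

```python
import numpy as np

# Reduced-form parameters of Example 8.3 in the original order (r1, r2).
phi0 = np.array([0.2, 0.4])
phi = np.array([[0.2, 0.3],
                [-0.6, 1.1]])
sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

# Permutation matrix that reorders the components to (r2, r1).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
phi0_s = P @ phi0
phi_s = P @ phi @ P.T        # rows and columns are both swapped
sigma_s = P @ sigma @ P.T

# Unit lower triangular factor of the reordered covariance matrix.
c = np.linalg.cholesky(sigma_s)
L = c / np.diag(c)
L_inv = np.linalg.inv(L)

phi0_star = L_inv @ phi0_s
phi_star = L_inv @ phi_s
G = L_inv @ sigma_s @ L_inv.T

# Second row: r1t = phi0_star[1] + coef * r2t + phi_star[1] @ (r2,t-1, r1,t-1)' + c2t
coef = -L_inv[1, 0]
```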

## 8.2.2 Stationarity Condition and Moments of a VAR(1) Model

[There's a lot of the assumptions and linear algebra that is known at the end but not incorporated into the middle of this]

**Assume** the VAR(1) model in (8.8) is **weakly stationary**. Taking the expectation of the model, using $E(\boldsymbol{a}_t) = \boldsymbol{0}$, and noting that under weak stationarity $E(\boldsymbol{r}_t)$ is time invariant and thus equals $E(\boldsymbol{r}_{t-1})$, Tsay obtains:

$$\large \begin{align} \boldsymbol{r}_t & = \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t \;\; & (8.8) \\
E(\boldsymbol{r}_t) & = E(\boldsymbol{\phi}_0) + E(\boldsymbol{\phi} \boldsymbol{r}_{t-1}) + E(\boldsymbol{a}_t) & \\
E(\boldsymbol{r}_t) & = \boldsymbol{\phi}_0 + \boldsymbol{\phi} E(\boldsymbol{r}_{t-1}) + 0 & \\
E(\boldsymbol{r}_t) - \boldsymbol{\phi} E(\boldsymbol{r}_{t-1}) & = \boldsymbol{\phi}_0  & \\
\boldsymbol{I} E(\boldsymbol{r}_t) - \boldsymbol{\phi} E(\boldsymbol{r}_t) & = \boldsymbol{\phi}_0  & \\
(\boldsymbol{I} - \boldsymbol{\phi}) E(\boldsymbol{r}_t) & = \boldsymbol{\phi}_0  & \\
(\boldsymbol{I} - \boldsymbol{\phi})^{-1}(\boldsymbol{I} - \boldsymbol{\phi}) E(\boldsymbol{r}_t) & = (\boldsymbol{I} - \boldsymbol{\phi})^{-1}\boldsymbol{\phi}_0  & \\
E(\boldsymbol{r}_t) & = (\boldsymbol{I} - \boldsymbol{\phi})^{-1}\boldsymbol{\phi}_0  & \\
\boldsymbol{\mu} \equiv E(\boldsymbol{r}_t) & = (\boldsymbol{I} - \boldsymbol{\phi})^{-1}\boldsymbol{\phi}_0  & \\
\boldsymbol{\phi}_0 & = (\boldsymbol{I} - \boldsymbol{\phi})\boldsymbol{\mu}  & \\
\end{align}$$

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

provided that 
- the matrix $\large (\boldsymbol{I} - \boldsymbol{\phi})$ is nonsingular
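As a quick numerical sketch of the mean formula, assuming NumPy, one can solve $(\boldsymbol{I} - \boldsymbol{\phi})\boldsymbol{\mu} = \boldsymbol{\phi}_0$ directly rather than forming the inverse. The helper name `var1_mean` is hypothetical; the parameters are borrowed from Example 8.3 purely for illustration.

```python
import numpy as np

def var1_mean(phi0, phi):
    """Mean of a weakly stationary VAR(1): solves (I - phi) mu = phi0."""
    phi = np.asarray(phi, dtype=float)
    k = phi.shape[0]
    # np.linalg.solve raises LinAlgError if (I - phi) is singular.
    return np.linalg.solve(np.eye(k) - phi, np.asarray(phi0, dtype=float))

phi0 = np.array([0.2, 0.4])
phi = np.array([[0.2, 0.3],
                [-0.6, 1.1]])

mu = var1_mean(phi0, phi)

# Reverse relation: phi0 = (I - phi) mu.
phi0_back = (np.eye(2) - phi) @ mu
```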

From the VAR(1) model in (8.8), a model for the **mean-corrected time series** = $\large \boldsymbol{\tilde{r}}_t = \boldsymbol{r}_t - \boldsymbol{\mu}$, can be written by: 
- using $\large \boldsymbol{\phi}_0 = (\boldsymbol{I} - \boldsymbol{\phi})\boldsymbol{\mu}$, 
- and letting $\large \boldsymbol{\tilde{r}}_t = \boldsymbol{r}_t - \boldsymbol{\mu}$

$$\large \begin{align} \boldsymbol{r}_t & = \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t \;\; & (8.8) \\
& = (\boldsymbol{I} - \boldsymbol{\phi})\boldsymbol{\mu} + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t & \\
& = \boldsymbol{\mu} - \boldsymbol{\phi} \boldsymbol{\mu} + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{r}_t - \boldsymbol{\mu} & = -\boldsymbol{\phi} \boldsymbol{\mu} + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t & \\
& = \boldsymbol{\phi} (-\boldsymbol{\mu} + \boldsymbol{r}_{t-1}) + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi} (\boldsymbol{r}_{t-1} - \boldsymbol{\mu}) + \boldsymbol{a}_t & \\
& = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} + \boldsymbol{a}_t & (8.11)\\
\end{align}$$

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

This (8.11) model can be used to derive properties of a VAR(1) model. 

By repeated substitutions, we can rewrite (8.11) as

$$\large  \begin{align} \boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} + \boldsymbol{a}_t & (8.11)\\
\boldsymbol{\tilde{r}}_{t-1} & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-2} + \boldsymbol{a}_{t-1} & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi} (\boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-2} + \boldsymbol{a}_{t-1}) + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^2 \boldsymbol{\tilde{r}}_{t-2} + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_{t-2} & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-3} + \boldsymbol{a}_{t-2} & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^2 (\boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-3} + \boldsymbol{a}_{t-2}) + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^3 \boldsymbol{\tilde{r}}_{t-3} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_{t-3} & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-4} + \boldsymbol{a}_{t-3} & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^3 (\boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-4} + \boldsymbol{a}_{t-3}) + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^4 \boldsymbol{\tilde{r}}_{t-4} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{a}_t & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \boldsymbol{\phi}^4 \boldsymbol{\tilde{r}}_{t-4} & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^0 \boldsymbol{a}_t + \boldsymbol{\phi}^1 \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots  & \\
\boldsymbol{\tilde{r}}_t & = \boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots  & \\
\end{align}$$

![image-5.png](attachment:image-5.png)
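The repeated substitution above can be checked on a simulated path. The sketch below, assuming NumPy, rebuilds $\boldsymbol{\tilde{r}}_t$ from the last $j$ shocks plus the remainder term $\boldsymbol{\phi}^j \boldsymbol{\tilde{r}}_{t-j}$. The helper name `ma_truncation`, the seed, the path length, and the identity shock covariance are all illustrative assumptions; $\boldsymbol{\phi}$ is taken from Example 8.3.

```python
import numpy as np

def ma_truncation(phi, a, r_tilde, t, j):
    """Return sum_{i<j} phi^i a_{t-i} + phi^j r_tilde_{t-j}."""
    shocks = sum(np.linalg.matrix_power(phi, i) @ a[t - i] for i in range(j))
    remainder = np.linalg.matrix_power(phi, j) @ r_tilde[t - j]
    return shocks + remainder

rng = np.random.default_rng(0)
phi = np.array([[0.2, 0.3],
                [-0.6, 1.1]])
T = 60
a = rng.standard_normal((T, 2))     # illustrative shocks
r_tilde = np.zeros((T, 2))          # mean-corrected series, started at zero
for t in range(1, T):
    r_tilde[t] = phi @ r_tilde[t - 1] + a[t]

# Four substitutions, as in the derivation above.
rebuilt = ma_truncation(phi, a, r_tilde, t=T - 1, j=4)

# Size of the weight on the distant past after 40 substitutions.
remainder_weight = np.linalg.norm(np.linalg.matrix_power(phi, 40))
```

For this $\boldsymbol{\phi}$ the remainder weight is already small, which is what allows the remainder term to be dropped in the infinite sum.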

This expression shows several characteristics of a VAR(1) process: 
- "First, since $\large \boldsymbol{a}_t$ is **serially uncorrelated**, it follows that $\large \text{Cov}(\boldsymbol{a}_t, \boldsymbol{r}_{t-1}) = \boldsymbol{0}$" [the covariance of $\large \boldsymbol{a}_t$ with $\large \boldsymbol{r}_{t-1}$ is zero]. [To see this, I resubscript the previous expression to $\large \boldsymbol{\tilde{r}}_{t-1}$, **post**-multiply by $\large \boldsymbol{a}_t'$ to form the outer product, and take the expectation across the whole expression, which gives the covariance **matrix** of $\large \boldsymbol{r}_{t-1}$ and $\large \boldsymbol{a}_t$. Note that $\large E(\boldsymbol{\tilde{r}}_t) = E(\boldsymbol{r}_t - \boldsymbol{\mu}) = \boldsymbol{0}$, as does $\large E(\boldsymbol{a}_t)$. Both properties of $\large \boldsymbol{a}_t$ are used: the zero mean removes the $\large \boldsymbol{\mu} E(\boldsymbol{a}_t')$ term, and serial **un**correlation gives $\large E(\boldsymbol{a}_{t-j} \boldsymbol{a}_t') = \boldsymbol{0}$ for every $j > 0$, which removes every term on the right-hand side:]

    [In this algebra, one can skip the $\boldsymbol{r} - \boldsymbol{\mu}$ steps on the LHS, since $E(\boldsymbol{\tilde{r}}_{t-1} \boldsymbol{a}_t')$ is the same as $\text{Cov}(\boldsymbol{r}_{t-1}, \boldsymbol{a}_t)$: $\boldsymbol{\tilde{r}} = \boldsymbol{r} - \boldsymbol{\mu}$ and $\boldsymbol{a}_t$ already has mean zero.]
    
    $$\large  \begin{align} 
    \boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^0 \boldsymbol{a}_t + \boldsymbol{\phi}^1 \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots \\
    & = \boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots \\
    \boldsymbol{\tilde{r}}_{t-1} & = \boldsymbol{a}_{t-1} + \boldsymbol{\phi} \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-3} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-4} + \cdots \\
    \boldsymbol{r}_{t-1} - \boldsymbol{\mu} & = \boldsymbol{a}_{t-1} + \boldsymbol{\phi} \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-3} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-4} + \cdots \\
    \boldsymbol{r}_{t-1} \boldsymbol{a}_t' - \boldsymbol{\mu} \boldsymbol{a}_t' & = \boldsymbol{a}_{t-1} \boldsymbol{a}_t' + \boldsymbol{\phi} \boldsymbol{a}_{t-2} \boldsymbol{a}_t' + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-3} \boldsymbol{a}_t' + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-4} \boldsymbol{a}_t' + \cdots \\
    E(\boldsymbol{r}_{t-1} \boldsymbol{a}_t') - E(\boldsymbol{\mu} \boldsymbol{a}_t') & = E(\boldsymbol{a}_{t-1} \boldsymbol{a}_t') + E(\boldsymbol{\phi} \boldsymbol{a}_{t-2} \boldsymbol{a}_t') + E(\boldsymbol{\phi}^2 \boldsymbol{a}_{t-3} \boldsymbol{a}_t') + E(\boldsymbol{\phi}^3 \boldsymbol{a}_{t-4} \boldsymbol{a}_t') + \cdots \\
    E(\boldsymbol{r}_{t-1} \boldsymbol{a}_t') - \boldsymbol{\mu} E(\boldsymbol{a}_t') & = E(\boldsymbol{a}_{t-1} \boldsymbol{a}_t') + \boldsymbol{\phi} E(\boldsymbol{a}_{t-2} \boldsymbol{a}_t') + \boldsymbol{\phi}^2 E(\boldsymbol{a}_{t-3}\boldsymbol{a}_t') + \boldsymbol{\phi}^3 E(\boldsymbol{a}_{t-4} \boldsymbol{a}_t') + \cdots \\
    E(\boldsymbol{r}_{t-1} \boldsymbol{a}_t') - \boldsymbol{\mu} (\boldsymbol{0}) & = (\boldsymbol{0}) + \boldsymbol{\phi} (\boldsymbol{0}) + \boldsymbol{\phi}^2 (\boldsymbol{0}) + \boldsymbol{\phi}^3 (\boldsymbol{0}) + \cdots \\
    \text{Cov}(\boldsymbol{r}_{t−1}, \boldsymbol{a}_t) - (\boldsymbol{0}) & = (\boldsymbol{0}) + (\boldsymbol{0}) + (\boldsymbol{0}) + (\boldsymbol{0}) + \cdots \\
    \text{Cov}(\boldsymbol{a}_t,\boldsymbol{r}_{t-1}) = [\text{Cov}(\boldsymbol{r}_{t-1}, \boldsymbol{a}_t)]' & = \boldsymbol{0} \end{align}$$  
    
    ![image-6.png](attachment:image-6.png)
    
    - In fact, $\large \boldsymbol{a}_t$ is not correlated with $\large \boldsymbol{r}_{t−\ell} \text{ for all } \ell > 0$. [Above, all subscripts are reduced by 1 in the 2nd modification to the prior expression above, but all subscripts could equivalently have been reduced by $\large \ell$.]  
    - For this reason, $\large \boldsymbol{a}_t$ is referred to as the **shock or innovation** of the series at time t. 
    - Similar to the **univariate case**, $\large \boldsymbol{a}_t$ is uncorrelated with the past value $\large \boldsymbol{r}_{t−j} (j>0)$ for **all time series models** [not just this VAR(1) model as pointed out in the last bullet point]. 

- Second, [employing the same steps as for the first point, but without resubscripting], postmultiply the expression by $\large \boldsymbol{a}_t'$, take expectation, and use the fact of [zero expected value for and] no serial correlations in the $\large \boldsymbol{a}_t$ process to obtain:

    $$\large  \begin{align} 
    \boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^0 \boldsymbol{a}_t + \boldsymbol{\phi}^1 \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots \\
    & = \boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots \\
    \boldsymbol{r}_t - \boldsymbol{\mu} & = \boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots \\
    \boldsymbol{r}_t \boldsymbol{a}_t' - \boldsymbol{\mu} \boldsymbol{a}_t' & = \boldsymbol{a}_t \boldsymbol{a}_t' + \boldsymbol{\phi} \boldsymbol{a}_{t-1} \boldsymbol{a}_t' + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} \boldsymbol{a}_t' + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} \boldsymbol{a}_t' + \cdots \\
    E(\boldsymbol{r}_t \boldsymbol{a}_t') - E(\boldsymbol{\mu} \boldsymbol{a}_t') & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + E(\boldsymbol{\phi} \boldsymbol{a}_{t-1} \boldsymbol{a}_t') + E(\boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} \boldsymbol{a}_t') + E(\boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} \boldsymbol{a}_t') + \cdots \\
    E(\boldsymbol{r}_t \boldsymbol{a}_t') - \boldsymbol{\mu} E(\boldsymbol{a}_t') & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + \boldsymbol{\phi} E(\boldsymbol{a}_{t-1} \boldsymbol{a}_t') + \boldsymbol{\phi}^2 E(\boldsymbol{a}_{t-2}\boldsymbol{a}_t') + \boldsymbol{\phi}^3 E(\boldsymbol{a}_{t-3} \boldsymbol{a}_t') + \cdots \\
    E(\boldsymbol{r}_t \boldsymbol{a}_t') - \boldsymbol{\mu} (\boldsymbol{0}) & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + \boldsymbol{\phi} (\boldsymbol{0}) + \boldsymbol{\phi}^2 (\boldsymbol{0}) + \boldsymbol{\phi}^3 (\boldsymbol{0}) + \cdots \\
    \text{Cov}(\boldsymbol{r}_t, \boldsymbol{a}_t) - (\boldsymbol{0}) & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + (\boldsymbol{0}) + (\boldsymbol{0}) + (\boldsymbol{0}) + \cdots \\
    \text{Cov}(\boldsymbol{r}_t, \boldsymbol{a}_t) & = E(\boldsymbol{a}_t \boldsymbol{a}_t') \\
    \text{Cov}(\boldsymbol{r}_t, \boldsymbol{a}_t) & = \boldsymbol{\Sigma}
\end{align}$$ 

    ![image-7.png](attachment:image-7.png)
    
- Third, for a VAR(1) model, $\large \boldsymbol{r}_t$ depends on the past innovation $\large \boldsymbol{a}_{t-j}$ with coefficient matrix $\large \boldsymbol{\phi}^{j}$. 
    - For such dependence to be meaningful, $\large \boldsymbol{\phi}^{j}$ must converge to zero as $\large j \longrightarrow \infty$. This means that the k eigenvalues of $\large \boldsymbol{\phi}$ must be less than 1 in modulus; otherwise, $\large \boldsymbol{\phi}^{j}$ will either explode [at least one eigenvalue of $\large \boldsymbol{\phi}$ is ≥ 1 in modulus] or converge to a nonzero matrix [a nonzero limit is bad because these terms are summed: if the j-th power of the $\large \boldsymbol{\phi}$ matrix does not shrink to zero, distant shocks never die out and the variance of the sum grows without bound (it does not explode, but it keeps growing)] as $\large j \longrightarrow \infty$. 
    - As a matter of fact, the requirement that all **eigenvalues of $\large \boldsymbol{\phi}$ are less than 1** in modulus is the **necessary and sufficient condition for weak stationarity** of $\large \boldsymbol{r}_t$ provided that the covariance matrix of $\large \boldsymbol{a}_t$ exists. [See note below on characteristic equation / polynomial.] Notice that this stationarity condition reduces to that of the univariate AR(1) case in which the condition is $\large |\phi| < 1$ [here $\phi$ is a scalar, so $|\phi|$ is its absolute value (modulus), not a determinant]. 
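The first two properties in the list above can be checked by simulation. This is only a sketch under stated assumptions: Gaussian innovations, $\boldsymbol{\phi}_0 = \boldsymbol{0}$ (so $\boldsymbol{\mu} = \boldsymbol{0}$), a fixed seed, and the $\boldsymbol{\phi}$ and $\boldsymbol{\Sigma}$ of Example 8.3. The helper name `cross_cov` is hypothetical. Sample covariances only approximate the population values, so the comparison is up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([[0.2, 0.3],
                [-0.6, 1.1]])
sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
n = 200_000

# Serially uncorrelated innovations with covariance sigma.
a = rng.multivariate_normal(np.zeros(2), sigma, size=n)

# Simulate the mean-corrected VAR(1) of (8.11), started at zero.
r = np.zeros((n, 2))
for t in range(1, n):
    r[t] = phi @ r[t - 1] + a[t]

def cross_cov(x, y):
    """Sample version of E[(x - E x)(y - E y)'] for row-aligned samples."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return xc.T @ yc / len(x)

cov_lag = cross_cov(r[:-1], a[1:])   # estimates Cov(r_{t-1}, a_t), expected near 0
cov_now = cross_cov(r, a)            # estimates Cov(r_t, a_t), expected near sigma
```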

![image-12.png](attachment:image-12.png)


[Copied from the VAR(p) section below: This may be the answer to the quandary vexing me. The quantities required to be *inside* the unit circle are the eigenvalues of $\large \boldsymbol{\phi}$, i.e. the zeros in $\lambda$ of the determinant $\large |\lambda \boldsymbol{I} - \boldsymbol{\phi}|$. The quantities required to be *outside* the unit circle are the zeros in $B$ of the determinant $\large |\boldsymbol{I} - \boldsymbol{\phi} B|$, which are the reciprocals $1/\lambda$ of those eigenvalues. The two conditions are therefore the same.]


"Furthermore, because [see this in action in my example below]:

$$\large |\lambda \boldsymbol{I} − \boldsymbol{\phi}| = \lambda^k |\boldsymbol{I} − \boldsymbol{\phi} \frac{1}{\lambda}|$$

![image-11.png](attachment:image-11.png)

... the eigenvalues of $\large \boldsymbol{\phi}$ are the inverses of the zeros of the determinant $\large |\boldsymbol{I} − \boldsymbol{\phi}B|$".

[https://www.adelaide.edu.au/mathslearning/ua/media/120/evalue-magic-tricks-handout.pdf
"The determinant $\large |\lambda \boldsymbol{I} − \boldsymbol{A}|$ (for unknown $\large \lambda$) is called the **characteristic polynomial** of A. (The zeros of this polynomial are the eigenvalues of A.) The equation $\large |\lambda \boldsymbol{I} − \boldsymbol{A}| = 0$ is called the **characteristic equation** of A. (The solutions of this equation are the eigenvalues of A.)"]

[The eigenvalues of $\large \boldsymbol{\phi}$ are the zeros of the determinant $\large |\boldsymbol{\phi} - \lambda \boldsymbol{I}|$, which are the inverses of the zeros of the determinant $\large |\boldsymbol{I} − \boldsymbol{\phi}B|$, where I believe Tsay denotes by B the unknown and not the backshift operator.]

[Using Tsay's convention:]

$\large \boldsymbol{\phi} =
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}:\\
\begin{align}
\text{det}(\lambda \boldsymbol{I} - \boldsymbol{\phi}) 
& = |\lambda \boldsymbol{I} - \boldsymbol{\phi}| \\
& = det \begin{bmatrix}
\lambda-.8&-.3\\
-.2&\lambda-.7
\end{bmatrix} \\
& = \lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} \\
& = (\lambda - 1)(\lambda - \frac{1}{2})\\
& \large \longrightarrow \lambda_1 = 1, \lambda_2 = \frac{1}{2}
\end{align}$

[Strang LA5 uses the reverse:]

$\large \boldsymbol{\phi} =
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}:\\
\begin{align}
\text{det}(\boldsymbol{\phi} - \lambda \boldsymbol{I}) 
& = |\boldsymbol{\phi} - \lambda \boldsymbol{I}| \\
& = det \begin{bmatrix}
.8-\lambda&.3\\
.2&.7-\lambda
\end{bmatrix} \\
& = \lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} \\
& = (\lambda - 1)(\lambda - \frac{1}{2})\\
& \large \longrightarrow \lambda_1 = 1, \lambda_2 = \frac{1}{2}
\end{align}$

[and now the other equation]

$\large \boldsymbol{\phi} =
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}:\\
\begin{align}
\text{det}(\boldsymbol{I} - \boldsymbol{\phi}B) 
& = |\boldsymbol{I} - \boldsymbol{\phi}B| \\
& = det \begin{bmatrix}
1-.8B&-.3B\\
-.2B&1-.7B
\end{bmatrix} \\
& = (0.56)B^2 - (1.5) B + 1 - (0.06 B^2) \\
& = (0.5)B^2 - (1.5) B + 1 \\
& = (.5B - 1)(B - 1)\\
& \large \longrightarrow B_1 = 2, B_2 = 1
\end{align}$

[the formula above in action:]

$\begin{align}
\lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} & = (\lambda^2)(\frac{1}{2} \frac{1}{\lambda^2} - \frac{3}{2}\frac{1}{\lambda} + 1)\\
& = (\lambda^2)(\frac{1}{2} B^2 - \frac{3}{2}B + 1)\\
\end{align}$  


Thus, an equivalent **sufficient and necessary condition** for **stationarity** of $r_t$ is that all zeros of the determinant $\large |\boldsymbol{\phi}(B)|$ are greater than one in modulus; that is, all zeros are outside the unit circle in the complex plane. 
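The two equivalent conditions can be checked numerically. The following is a minimal sketch in Python with NumPy (the choice of language is an assumption; the notes do not fix one). `phi_example` is the 2 × 2 matrix worked above; `phi_stationary` is a hypothetical matrix chosen only for illustration, and the tolerance `tol` is a numerical safeguard, not part of the theory.

```python
import numpy as np

def eig_and_b_zeros(phi):
    """Return the eigenvalues of phi and the zeros of det(I - phi*B)."""
    eigvals = np.linalg.eigvals(phi)
    # np.poly(phi) gives the coefficients of det(lambda*I - phi), highest power first.
    char_coeffs = np.poly(phi)
    # det(I - phi*B) has the same coefficients in reverse order (as powers of B).
    b_zeros = np.roots(char_coeffs[::-1])
    return eigvals, b_zeros

def is_weakly_stationary(phi, tol=1e-10):
    """True when every eigenvalue of phi is strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(phi)) < 1 - tol))

phi_example = np.array([[0.8, 0.3], [0.2, 0.7]])      # matrix from the worked example
phi_stationary = np.array([[0.2, 0.3], [-0.6, 1.1]])  # hypothetical illustration

for name, phi in [("example", phi_example), ("stationary", phi_stationary)]:
    eigvals, b_zeros = eig_and_b_zeros(phi)
    print(name, "eigenvalues:", np.sort(eigvals.real))
    print(name, "zeros of |I - phi B|:", np.sort(b_zeros.real))
    print(name, "stationary:", is_weakly_stationary(phi))
```

Note that the worked example has an eigenvalue equal to 1, so it sits on the unit circle and fails the strict condition; it illustrates the determinant identity, not a stationary model.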

[Note that the term |$\large \boldsymbol{\phi} (B)$| denotes |$\large \boldsymbol{I} B^0 - \boldsymbol{\phi}_1 B^1|$ = |$\large \boldsymbol{I} - \boldsymbol{\phi}_1 B$| when p = 1; the general definition is taken from Section 8.2.3 below:]

$$\large \boldsymbol{\phi} (B) = \boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p$$

[end note]


- Fourth, using the expression, we obtain [the concurrent covariance of $\large \boldsymbol{r}_t$, namely $\large \boldsymbol{\Gamma}_0$, by postmultiplying the first equation by its own transpose, taking the expectation, and using $\large E(\boldsymbol{a}_{t-i}\boldsymbol{a}_{t-j}') = \boldsymbol{0}$ for all i ≠ j, so that every cross term vanishes in expectation]:

$$ \begin{align}
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi}^0 \boldsymbol{a}_t + \boldsymbol{\phi}^1 \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} + \cdots\\
\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_t' & = (\boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \cdots)(\boldsymbol{a}_t + \boldsymbol{\phi} \boldsymbol{a}_{t-1} + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} + \cdots)'\\
& = \boldsymbol{a}_t \boldsymbol{a}_t' + \boldsymbol{\phi} \boldsymbol{a}_{t-1}(\boldsymbol{\phi} \boldsymbol{a}_{t-1})' + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} (\boldsymbol{\phi}^2 \boldsymbol{a}_{t-2})' + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3}(\boldsymbol{\phi}^3 \boldsymbol{a}_{t-3})' + \cdots + \text{cross terms}\\
& = \boldsymbol{a}_t \boldsymbol{a}_t' + \boldsymbol{\phi} \boldsymbol{a}_{t-1} \boldsymbol{a}_{t-1}' \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} \boldsymbol{a}_{t-2}' (\boldsymbol{\phi}^2)' + \boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} \boldsymbol{a}_{t-3}' (\boldsymbol{\phi}^3)' + \cdots + \text{cross terms}\\
E(\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_t') & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + E(\boldsymbol{\phi} \boldsymbol{a}_{t-1} \boldsymbol{a}_{t-1}'\boldsymbol{\phi}') + E(\boldsymbol{\phi}^2 \boldsymbol{a}_{t-2} \boldsymbol{a}_{t-2}' (\boldsymbol{\phi}^2)') + E(\boldsymbol{\phi}^3 \boldsymbol{a}_{t-3} \boldsymbol{a}_{t-3}' (\boldsymbol{\phi}^3)') + \cdots + \boldsymbol{0}\\
\text{Cov}(\boldsymbol{r}_t) & = E(\boldsymbol{a}_t \boldsymbol{a}_t') + \boldsymbol{\phi} E(\boldsymbol{a}_{t-1} \boldsymbol{a}_{t-1}') \boldsymbol{\phi}' + \boldsymbol{\phi}^2 E(\boldsymbol{a}_{t-2} \boldsymbol{a}_{t-2}') (\boldsymbol{\phi}^2)' + \boldsymbol{\phi}^3 E(\boldsymbol{a}_{t-3} \boldsymbol{a}_{t-3}') (\boldsymbol{\phi}^3)' + \cdots\\
& = \text{Cov}(\boldsymbol{a}_t, \boldsymbol{a}_t) + \boldsymbol{\phi} \text{Cov}(\boldsymbol{a}_{t-1}, \boldsymbol{a}_{t-1}) \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \text{Cov}(\boldsymbol{a}_{t-2}, \boldsymbol{a}_{t-2}) (\boldsymbol{\phi}^2)' + \boldsymbol{\phi}^3 \text{Cov}(\boldsymbol{a}_{t-3}, \boldsymbol{a}_{t-3})(\boldsymbol{\phi}^3)' + \cdots \\
\boldsymbol{\Gamma}_0 & = \boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots \\
& = \sum_{i=0}^{\infty} \boldsymbol{\phi}^i \boldsymbol{\Sigma} (\boldsymbol{\phi}^i)'
\end{align}$$

![image-10.png](attachment:image-10.png)

where 
- $\large \boldsymbol{\phi}^0 = \boldsymbol{I}$, the k × k identity matrix. [I think Tsay points this out because $\large \boldsymbol{\Sigma}$ stands alone as the first term on the RHS of this equation, but is really pre- and post-multiplied by $\large \boldsymbol{\phi}^0 = \boldsymbol{I}$.]

![image-9.png](attachment:image-9.png)


[$\large \boldsymbol{\Sigma}$ is the covariance of $\large (\boldsymbol{a}_t, \boldsymbol{a}_t)$ and, as just proved, also of $\large (\boldsymbol{r}_t, \boldsymbol{a}_t)$, while these covariances were shown to be zero: $\large (\boldsymbol{a}_t, \boldsymbol{a}_{t-1})$ and $\large (\boldsymbol{r}_{t-1}, \boldsymbol{a}_t)$. The sum above shows that the covariance of $\large \boldsymbol{r}_t$ with itself is built from $\large \boldsymbol{\Sigma}$ pre-multiplied by $\large \boldsymbol{\phi}^i$ and post-multiplied by $\large (\boldsymbol{\phi}^i)'$, once for each step $i$ backwards in time, summed all the way back. So $\large \boldsymbol{\Gamma}_0$ accumulates the innovation covariance of every past shock, each weighted by the power of $\large \boldsymbol{\phi}$ that carries that shock forward to time t. Notice that it is the same $\large \boldsymbol{\phi}$ whether the shock is 1 time step or 10 time steps old; only the power changes.]  
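The infinite sum for $\boldsymbol{\Gamma}_0$ can be approximated by truncation. Below is a minimal sketch, assuming Python with NumPy; `phi` and `sigma` are hypothetical values (with eigenvalues of `phi` inside the unit circle), and `n_terms` is an arbitrary truncation point. As a sanity check it uses the identity $\boldsymbol{\Gamma}_0 = \boldsymbol{\phi}\boldsymbol{\Gamma}_0\boldsymbol{\phi}' + \boldsymbol{\Sigma}$, which follows from the sum by shifting the summation index by one.

```python
import numpy as np

def gamma0_from_series(phi, sigma, n_terms=500):
    """Approximate Gamma_0 = sum_i phi^i Sigma (phi^i)' by truncating the sum."""
    gamma0 = np.zeros_like(sigma, dtype=float)
    phi_power = np.eye(phi.shape[0])  # phi^0 = I
    for _ in range(n_terms):
        gamma0 += phi_power @ sigma @ phi_power.T
        phi_power = phi @ phi_power   # move on to the next power of phi
    return gamma0

phi = np.array([[0.2, 0.3], [-0.6, 1.1]])    # hypothetical; eigenvalues 0.5 and 0.8
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])   # hypothetical innovation covariance

gamma0 = gamma0_from_series(phi, sigma)
# Gap in the identity Gamma_0 = phi Gamma_0 phi' + Sigma; should be near zero.
fixed_point_gap = np.max(np.abs(gamma0 - (phi @ gamma0 @ phi.T + sigma)))
print(gamma0)
print(fixed_point_gap)
```

The truncation only makes sense when the stationarity condition holds; otherwise the powers of `phi` do not die out and the sum diverges.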

To find $\large \boldsymbol{\Gamma}_{\ell}$ = the **cross covariance matrix**:
- postmultiply equation (8.11) by $\large \boldsymbol{\tilde{r}}_{t-\ell}'$,
- take expectation,
- use the result $\large \text{Cov}(\boldsymbol{a}_t, \boldsymbol{r}_{t-j}) = E(\boldsymbol{a}_t \boldsymbol{\tilde{r}}_{t-j}') = \boldsymbol{0}$ for j > 0

![image-8.png](attachment:image-8.png)

$$\large \begin{align}
\boldsymbol{\tilde{r}}_t & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} + \boldsymbol{a}_t & (8.11)\\
\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_{t-\ell}' & = \boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} \boldsymbol{\tilde{r}}_{t-\ell}' + \boldsymbol{a}_t \boldsymbol{\tilde{r}}_{t-\ell}' & \\
E(\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_{t-\ell}') & = E(\boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} \boldsymbol{\tilde{r}}_{t-\ell}') + E(\boldsymbol{a}_t \boldsymbol{\tilde{r}}_{t-\ell}') & \\
E(\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_{t-\ell}') & = E(\boldsymbol{\phi} \boldsymbol{\tilde{r}}_{t-1} \boldsymbol{\tilde{r}}_{t-\ell}') + (\boldsymbol{0})  & \\
E(\boldsymbol{\tilde{r}}_t \boldsymbol{\tilde{r}}_{t-\ell}') & = \boldsymbol{\phi} E(\boldsymbol{\tilde{r}}_{t-1} \boldsymbol{\tilde{r}}_{t-\ell}'), \ell > 0  & \\
\text{Cov}(\boldsymbol{r}_t, \boldsymbol{r}_{t-\ell}) & = \boldsymbol{\phi} \text{Cov}(\boldsymbol{r}_{t-1}, \boldsymbol{r}_{t-\ell}), \ell > 0  & \\
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1}, \ell > 0  & (8.12)
\end{align}$$

![image-13.png](attachment:image-13.png)

![image-14.png](attachment:image-14.png)

where:
$\large \boldsymbol{\Gamma}_{j}$ is the lag-j cross-covariance matrix of $\large \boldsymbol{r}_t$. Again, this result is a **generalization of that of a univariate AR(1) process**. 


[FOR NOW IGNORE THIS, a rough intuition only: $\large \boldsymbol{\Gamma}_{\ell}$ is the cross-covariance matrix (crossing time) that relates the component assets' movements in both direction and distance. Entry (i, j) relates component i at time t (row, the dependent series) to component j at time t − ℓ (column, the leader); the diagonal entries (i, i) are each component's own autocovariances. $\large \boldsymbol{\phi}$ describes how this whole set of relationships changes when the lag moves from ℓ − 1 to ℓ or, as we are about to see, from 0 to ℓ. In that loose sense $\large \boldsymbol{\phi}$ plays the role of a one-step rate of change of $\large \boldsymbol{\Gamma}$ with respect to the lag: it is a discrete-time multiplier, $\large \boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell-1}$, rather than a literal derivative. I THINK.] 

By resubscripting (8.12), and repeated substitutions in (8.12), Tsay shows that:

$$\large \begin{align}
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1}, \ell > 0  & (8.12)\\
\boldsymbol{\Gamma}_{\ell-1} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 2}, \ell > 1  & \\
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi} ( \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 2} ) & \\
& = \boldsymbol{\phi}^2 \boldsymbol{\Gamma}_{\ell - 2}  & \\
\boldsymbol{\Gamma}_{\ell-2} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 3}, \ell > 2  & \\
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi}^2 ( \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 3} ) & \\
& = \boldsymbol{\phi}^3 \boldsymbol{\Gamma}_{\ell - 3} \ldots & \\
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{\ell - \ell}, \ell > 0  & \\
& = \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{0}, \ell > 0  & \\
\end{align}$$

![image-15.png](attachment:image-15.png)


[NOTICE:]    

$\boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{0} = \boldsymbol{\phi}^{\ell} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ]$  
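The relation $\boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi}^{\ell}\boldsymbol{\Gamma}_0$ can be cross-checked against a direct computation from the moving-average form of $\boldsymbol{\tilde{r}}_t$. The sketch below assumes Python with NumPy and hypothetical `phi` and `sigma`; the direct formula $\text{Cov}(\boldsymbol{r}_t, \boldsymbol{r}_{t-\ell}) = \sum_i \boldsymbol{\phi}^{i+\ell}\boldsymbol{\Sigma}(\boldsymbol{\phi}^i)'$ comes from multiplying the two infinite sums and keeping only the terms with matching shocks.

```python
import numpy as np

def gamma_direct(phi, sigma, lag, n_terms=500):
    """Cov(r_t, r_{t-lag}) as the truncated sum of phi^(i+lag) Sigma (phi^i)'."""
    k = phi.shape[0]
    total = np.zeros((k, k))
    left = np.linalg.matrix_power(phi, lag)  # phi^(i+lag), starting at i = 0
    right = np.eye(k)                        # phi^i, starting at i = 0
    for _ in range(n_terms):
        total += left @ sigma @ right.T
        left = phi @ left
        right = phi @ right
    return total

phi = np.array([[0.2, 0.3], [-0.6, 1.1]])    # hypothetical; eigenvalues 0.5 and 0.8
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])   # hypothetical innovation covariance

gamma0 = gamma_direct(phi, sigma, lag=0)
max_gaps = {}
for lag in range(1, 5):
    via_power = np.linalg.matrix_power(phi, lag) @ gamma0   # phi^lag Gamma_0
    via_series = gamma_direct(phi, sigma, lag)              # direct MA computation
    max_gaps[lag] = np.max(np.abs(via_power - via_series))
print(max_gaps)
```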


To obtain the **crosscorrelation matrix** 
- Pre- and post-multiply (8.12) by $\large \boldsymbol{D}^{−1/2}$:

$$\large \begin{align}
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1}, \ell > 0  & (8.12)\\
\boldsymbol{D}^{−1/2} \boldsymbol{\Gamma}_{\ell} \boldsymbol{D}^{−1/2} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2} & \\
\boldsymbol{\rho}_{\ell} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2} & \\
\boldsymbol{\rho}_{\ell} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}( \boldsymbol{I} ) \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2} & \\
\boldsymbol{\rho}_{\ell} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}( \boldsymbol{D}^{1/2} \boldsymbol{D}^{−1/2} ) \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2} & \\
& = (\boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}) \boldsymbol{\rho}_{\ell-1} & \\
& = \boldsymbol{\Upsilon} \boldsymbol{\rho}_{\ell-1} & \\
\end{align}$$

![image-18.png](attachment:image-18.png)

where:
- $\large \boldsymbol{\Upsilon} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}$

![image-17.png](attachment:image-17.png)


[NOTICE:]    

$\boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{0} = \boldsymbol{\phi}^{\ell} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ]$ 

$\boldsymbol{D}^{−1/2}\boldsymbol{\Gamma}_{\ell}\boldsymbol{D}^{−1/2} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{0}\boldsymbol{D}^{−1/2} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{−1/2}$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_{0}\boldsymbol{D}^{−1/2} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{−1/2}$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} (\boldsymbol{I}) \boldsymbol{\Gamma}_{0} \boldsymbol{D}^{−1/2} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell}  (\boldsymbol{I})  [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{−1/2}$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell} ( \boldsymbol{D}^{1/2} \boldsymbol{D}^{−1/2} )  \boldsymbol{\Gamma}_{0} \boldsymbol{D}^{−1/2} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}^{\ell}  ( \boldsymbol{D}^{1/2} \boldsymbol{D}^{−1/2} )  [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{−1/2}$ 

$\boldsymbol{\rho}_{\ell} = (\boldsymbol{D}^{-1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{D}^{1/2}) (\boldsymbol{D}^{-1/2} \boldsymbol{\Gamma}_{0} \boldsymbol{D}^{-1/2}) = (\boldsymbol{D}^{-1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{D}^{1/2}) (\boldsymbol{D}^{-1/2} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{-1/2})$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{\Upsilon}^{\ell} \boldsymbol{\rho}_{0} = \boldsymbol{\Upsilon}^{\ell} (\boldsymbol{D}^{-1/2} [\boldsymbol{\Sigma} + \boldsymbol{\phi} \boldsymbol{\Sigma} \boldsymbol{\phi}' + \boldsymbol{\phi}^2 \boldsymbol{\Sigma} (\boldsymbol{\phi}^2)' + \cdots ] \boldsymbol{D}^{-1/2})$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{\Upsilon}^{\ell} \boldsymbol{\rho}_{0} = \boldsymbol{\Upsilon}^{\ell} (\boldsymbol{D}^{-1/2} \boldsymbol{\Gamma}_0 \boldsymbol{D}^{-1/2})$ 

$\boldsymbol{\rho}_{\ell} = \boldsymbol{\Upsilon}^{\ell} \boldsymbol{\rho}_{0}$ 

[... and ...]

$\large \boldsymbol{\Upsilon}^{\ell} = \boldsymbol{D}^{-1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{D}^{1/2}$

[... because $\large \boldsymbol{\Upsilon}^{\ell} = (\boldsymbol{D}^{-1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2})(\boldsymbol{D}^{-1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}) \cdots$, and every interior product $\large \boldsymbol{D}^{1/2} \boldsymbol{D}^{-1/2} = \boldsymbol{I}$ cancels, leaving $\large \boldsymbol{D}^{-1/2} \boldsymbol{\phi}^{\ell} \boldsymbol{D}^{1/2}$.  Moving on ...]

Consequently, the cross correlation matrix (CCM) of a VAR(1) model satisfies:

$$ \large \begin{align} 
\boldsymbol{\rho}_{\ell} & = \boldsymbol{\Upsilon}^1 \boldsymbol{\rho}_{\ell-1}\\
& = \boldsymbol{\Upsilon}^{\ell} \boldsymbol{\rho}_{\ell-\ell} \\
& = \boldsymbol{\Upsilon}^{\ell} \boldsymbol{\rho}_{0}
\end{align}$$

![image-16.png](attachment:image-16.png)
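A numerical sketch of the CCM recursion follows, assuming Python with NumPy. It takes $\boldsymbol{D}$ to be the diagonal matrix of the component *variances* (the diagonal of $\boldsymbol{\Gamma}_0$), so that $\boldsymbol{D}^{-1/2}$ divides by standard deviations; that reading of $\boldsymbol{D}$ is an assumption made here so that $\boldsymbol{\rho}_0$ has a unit diagonal. `phi` and `sigma` are hypothetical values.

```python
import numpy as np

phi = np.array([[0.2, 0.3], [-0.6, 1.1]])    # hypothetical; eigenvalues 0.5 and 0.8
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])   # hypothetical innovation covariance
k = phi.shape[0]

# Gamma_0 as the truncated sum of phi^i Sigma (phi^i)'.
gamma0 = np.zeros((k, k))
phi_power = np.eye(k)
for _ in range(500):
    gamma0 += phi_power @ sigma @ phi_power.T
    phi_power = phi @ phi_power

# Assumption: D holds the variances of the components of r_t.
std = np.sqrt(np.diag(gamma0))
d_sqrt = np.diag(std)            # D^(1/2)
d_inv_sqrt = np.diag(1.0 / std)  # D^(-1/2)

upsilon = d_inv_sqrt @ phi @ d_sqrt        # Upsilon = D^(-1/2) phi D^(1/2)
rho0 = d_inv_sqrt @ gamma0 @ d_inv_sqrt    # lag-0 correlation matrix

rho_gaps = {}
for lag in range(1, 5):
    gamma_lag = np.linalg.matrix_power(phi, lag) @ gamma0
    rho_from_gamma = d_inv_sqrt @ gamma_lag @ d_inv_sqrt
    rho_from_upsilon = np.linalg.matrix_power(upsilon, lag) @ rho0
    rho_gaps[lag] = np.max(np.abs(rho_from_gamma - rho_from_upsilon))
print(rho0)
print(rho_gaps)
```

Both routes to $\boldsymbol{\rho}_{\ell}$ should agree up to rounding, and rescaling in the other direction (`d_sqrt @ upsilon @ d_inv_sqrt`) should return `phi`.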

[Rho is now the cross-correlation matrix (crossing time): it relates the component assets' movements in direction only, with the distance (scale) removed. Entry (i, j) is the correlation between component i at time t (row, the dependent series) and component j at time t − ℓ (column, the leader); the diagonal (i, i) holds each component's own autocorrelations. Upsilon describes how these relationships change as the lag moves from ℓ − 1 to ℓ, or from 0 to ℓ, exactly as Phi does for Gamma. I THINK. Because Rho carries no scale, it cannot be turned back into covariances without the standard deviations of the components, which is what $\large \boldsymbol{D}^{1/2}$ supplies.]

[So to go from Phi (distance and direction) to Upsilon (direction only), pre-multiply by $\large \boldsymbol{D}^{-1/2}$ and post-multiply by $\large \boldsymbol{D}^{1/2}$. The exponent 1/2 turns the variances in $\large \boldsymbol{D}$ into standard deviations, so that the dependent component (row) and the leader component (column) each contribute their own scale.]

$\large \boldsymbol{\Upsilon} = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}$

[So to go from Upsilon (direction only) back to Phi (distance and direction), reverse the scaling: pre-multiply by $\large \boldsymbol{D}^{1/2}$ and post-multiply by $\large \boldsymbol{D}^{-1/2}$.]

[This is the algebra of (8.12) replayed to be clearer on switching from Gamma to Rho and from Phi to Upsilon, both transformations that remove distance from the direction+distance relationship measurements.]

$$\large \begin{align}
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1}\\
\boldsymbol{D}^{−1/2} \boldsymbol{\Gamma}_{\ell} \boldsymbol{D}^{−1/2} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2} \\
\boldsymbol{\rho}_{\ell} & = ( \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2} ) (\boldsymbol{D}^{−1/2}  \boldsymbol{\Gamma}_{\ell - 1} \boldsymbol{D}^{−1/2}) \\
& = (\boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}) \boldsymbol{\rho}_{\ell-1} \\
& = \boldsymbol{\Upsilon} \boldsymbol{\rho}_{\ell-1} \\
\end{align}$$

Adding distance back to Upsilon to recover phi:

$\large \begin{align} \boldsymbol{\Upsilon} & = \boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}\\
\boldsymbol{D}^{1/2} \boldsymbol{\Upsilon} \boldsymbol{D}^{-1/2} & = \boldsymbol{D}^{1/2}\boldsymbol{D}^{−1/2} \boldsymbol{\phi} \boldsymbol{D}^{1/2}  \boldsymbol{D}^{-1/2}\\
\boldsymbol{D}^{1/2} \boldsymbol{\Upsilon}  \boldsymbol{D}^{−1/2} & = \boldsymbol{\phi} 
\end{align}$

Adding distance back to rho to recover the covariance Gamma:

$$\large \begin{align}
\boldsymbol{\rho}_{\ell} & = \boldsymbol{D}^{-1/2} \boldsymbol{\Gamma}_{\ell} \boldsymbol{D}^{-1/2} \\
\boldsymbol{D}^{1/2} \boldsymbol{\rho}_{\ell} \boldsymbol{D}^{1/2} & = \boldsymbol{D}^{1/2} \boldsymbol{D}^{-1/2} \boldsymbol{\Gamma}_{\ell} \boldsymbol{D}^{-1/2} \boldsymbol{D}^{1/2}  \\
\boldsymbol{\Gamma}_{\ell} & = \boldsymbol{D}^{1/2} \boldsymbol{\rho}_{\ell} \boldsymbol{D}^{1/2}  \\
\end{align}$$

[After reviewing my material which follows, the above translates to: the eigenvalues of $\large \boldsymbol{\phi}$ are the inverses of the zeros of the determinant $\large |\boldsymbol{I} - \boldsymbol{\phi}B|$; that is, each zero $\large B$ of that determinant is the inverse of an eigenvalue $\large \lambda$ of $\large \boldsymbol{\phi}$:]

[Or stated more clearly: setting $\large B = \frac{1}{\lambda}$, the determinant $\large |\boldsymbol{I} - \boldsymbol{\phi}(\frac{1}{\lambda})|$ is zero exactly when $\large \lambda$ is a nonzero eigenvalue of $\large \boldsymbol{\phi}$. Thus, where the condition says every eigenvalue of $\large \boldsymbol{\phi}$ must be less than 1 in modulus, the equivalent condition is that every zero of the determinant $\large |\boldsymbol{I} - \boldsymbol{\phi}B|$ must be greater than 1 in modulus:]


[so using Strang's example, where k = 2, and one $\lambda = \frac{1}{2}$, we end up with zeros because we inserted the correct eigenvalue $\lambda = \frac{1}{2}$:]

$\large |\lambda \boldsymbol{I} − \boldsymbol{A}| = \lambda^2|\boldsymbol{I} - A(\frac{1}{\lambda})|\\
\large |(\frac{1}{2}) \boldsymbol{I} − \boldsymbol{A}| = (\frac{1}{2})^2|\boldsymbol{I} - A(\frac{1}{\frac{1}{2}})|\\
\large |(\frac{1}{2}) \boldsymbol{I} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\frac{1}{2})^2|\boldsymbol{I} - 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}
(\frac{1}{\frac{1}{2}})|\\
\large |(\frac{1}{2}) 
\begin{bmatrix}
1&\\
&1
\end{bmatrix} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\frac{1}{2})^2|
\begin{bmatrix}
1&\\
&1
\end{bmatrix} - 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}(2)|\\
\large | 
\begin{bmatrix}
.5&\\
&.5
\end{bmatrix} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\frac{1}{2})^2|
\begin{bmatrix}
1&\\
&1
\end{bmatrix} - 
\begin{bmatrix}
1.6&.6\\
.4&1.4
\end{bmatrix}|\\
\large | 
\begin{bmatrix}
-.3&-.3\\
-.2&-.2
\end{bmatrix}| = 
(\frac{1}{2})^2|
\begin{bmatrix}
-.6&-.6\\
-.4&-.4
\end{bmatrix}|\\
\large (-.3)(-.2)-(-.3)(-.2) = 
(\frac{1}{2})^2[(-.6)(-.4)-(-.6)(-.4)] = 0$

[But if instead, we leave $\lambda$ as a variable] 

$\large |\lambda \boldsymbol{I} − \boldsymbol{A}| = \lambda^2|\boldsymbol{I} - A(\frac{1}{\lambda})|\\
\large |(\lambda) \boldsymbol{I} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\lambda)^2|\boldsymbol{I} - 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}
(\frac{1}{\lambda})|\\
\large |(\lambda) 
\begin{bmatrix}
1&\\
&1
\end{bmatrix} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\lambda)^2|
\begin{bmatrix}
1&\\
&1
\end{bmatrix} - 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}(\frac{1}{\lambda})|\\
\large | 
\begin{bmatrix}
\lambda&\\
&\lambda
\end{bmatrix} − 
\begin{bmatrix}
.8&.3\\
.2&.7
\end{bmatrix}| = 
(\lambda)^2|
\begin{bmatrix}
1&\\
&1
\end{bmatrix} - 
\begin{bmatrix}
.8\frac{1}{\lambda}&.3\frac{1}{\lambda}\\
.2\frac{1}{\lambda}&.7\frac{1}{\lambda}
\end{bmatrix}|\\
\large | 
\begin{bmatrix}
\lambda-.8&-.3\\
-.2&\lambda-.7
\end{bmatrix}| = 
(\lambda)^2|
\begin{bmatrix}
1-.8\frac{1}{\lambda}&-.3\frac{1}{\lambda}\\
-.2\frac{1}{\lambda}&1-.7\frac{1}{\lambda}
\end{bmatrix}|$
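The determinant identity $|\lambda \boldsymbol{I} - \boldsymbol{A}| = \lambda^k |\boldsymbol{I} - \boldsymbol{A}\frac{1}{\lambda}|$ can be spot-checked numerically for any nonzero $\lambda$. This is a small sketch, assuming Python with NumPy; the test values of `lam` are arbitrary choices for illustration and include the two eigenvalues of the example matrix.

```python
import numpy as np

A = np.array([[0.8, 0.3], [0.2, 0.7]])   # matrix from the worked example
k = A.shape[0]
identity = np.eye(k)

def lhs(lam):
    """det(lambda*I - A): the characteristic polynomial evaluated at lambda."""
    return np.linalg.det(lam * identity - A)

def rhs(lam):
    """lambda^k * det(I - A/lambda); requires lambda != 0."""
    return lam**k * np.linalg.det(identity - A / lam)

# Arbitrary nonzero test points, including the eigenvalues 0.5 and 1.0.
test_points = [0.5, 1.0, 2.0, -0.3, 0.25 + 0.75j]
results = [(lam, lhs(lam), rhs(lam)) for lam in test_points]
for lam, left, right in results:
    print(lam, left, right)
```

Both sides vanish (up to rounding) at the eigenvalues and agree elsewhere, which is why the zeros in $B = 1/\lambda$ of $|\boldsymbol{I} - \boldsymbol{A}B|$ are the reciprocals of the eigenvalues.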



## 8.2.3 Vector AR(p) Models
The generalization of VAR(1) to VAR(p) models is straightforward. 

The [k component dimensioned] time series $\large \boldsymbol{r}_t$ follows a VAR(p) model if it satisfies:

$$\large \begin{align} \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_p \boldsymbol{r}_{t−p} + \boldsymbol{a}_t, \;\; p > 0, & (8.13) \end{align}$$

![image.png](attachment:image.png)

where 
- $\large \boldsymbol{\phi}_0$ is defined as before
- $\large \boldsymbol{a}_t$ is defined as before
- $\large \boldsymbol{\phi}_j$, [1 ≤ j ≤ p] are k × k matrices. 

[excerpted from above]

where:
- $\large \boldsymbol{\phi}_0$ is a k-dimensional **vector**,
- $\large \boldsymbol{\phi}$ is a k × k **matrix**, 
- $\large \{\boldsymbol{a}_t\}$ is a sequence of 
    - **serially uncorrelated** 
    - **random vectors** 
    - with mean zero and 
    - with covariance matrix $\large \boldsymbol{\Sigma}$ 
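        - that is, in application, required to be positive definite; otherwise, the dimension of $\large \boldsymbol{r}_t$ can be reduced. 
        - [If $\large \boldsymbol{\Sigma}$ is not positive definite, it has a zero eigenvalue with some eigenvector $\large \boldsymbol{c}$, so $\large \boldsymbol{c}' \boldsymbol{\Sigma} \boldsymbol{c} = \text{Var}(\boldsymbol{c}' \boldsymbol{a}_t) = 0$. That linear combination of the innovations is constant, so one component of $\large \boldsymbol{a}_t$ is an exact linear function of the others, and the dimension can be reduced.]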
    - In the literature, $\large \boldsymbol{a}_t$ is often **assumed** to be **multivariate normal**.

[end excerpt]
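The remark about positive definiteness can be illustrated with a deliberately degenerate innovation series. The sketch below assumes Python with NumPy; the data are hypothetical, with the third component constructed as the exact sum of the first two, so the sample covariance matrix is singular.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical innovations: the third component is an exact linear
# combination (the sum) of the first two, so Sigma cannot be positive definite.
base = rng.standard_normal((1000, 2))
a = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])

sigma = np.cov(a, rowvar=False)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(sigma)
c = eigvecs[:, 0]              # eigenvector of the (near) zero eigenvalue
combo_variance = np.var(a @ c) # variance of the linear combination c'a_t

print(eigvals)
print(c)
print(combo_variance)
```

The eigenvector `c` is proportional to (1, 1, −1), which recovers the built-in dependence: the combination `c'a_t` has no variation, so one component is redundant.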

The VAR(p) model can be written using the back-shift operator B: 

$$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{a}_t,$$

![image-2.png](attachment:image-2.png)

where 
- $\boldsymbol{I}$ is the k × k identity matrix. 

This VAR(p) model representation using the back-shift operator can be written in a compact form as:

$$\large \boldsymbol{\phi} (B) \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{a}_t$$

![image-5.png](attachment:image-5.png)

where ... 

$$\large \boldsymbol{\phi} (B) = \boldsymbol{I} − \boldsymbol{\phi}_1 B − \cdots − \boldsymbol{\phi}_p B^p$$


[... or my representation ...]

$$\large \boldsymbol{\phi} (B) = \boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p$$

![image-4.png](attachment:image-4.png)

... is a matrix polynomial. 

If $\large \boldsymbol{r}_t$ is weakly stationary, then we have:

$$\large \boldsymbol{\mu} = E( \boldsymbol{r}_t) = (\boldsymbol{I} −  \boldsymbol{\phi}_1 − \cdots −  \boldsymbol{\phi}_p)^{−1}  \boldsymbol{\phi}_0 = [ \boldsymbol{\phi}(1)]^{−1}  \boldsymbol{\phi}_0$$

![image-3.png](attachment:image-3.png)

provided that the inverse exists. 
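As a concrete illustration of $\boldsymbol{\mu} = [\boldsymbol{\phi}(1)]^{-1}\boldsymbol{\phi}_0$, here is a minimal sketch for a VAR(2), assuming Python with NumPy. The coefficient values `phi0`, `phi1`, `phi2` are hypothetical and chosen so that the model is stationary. The check at the end iterates the noise-free recursion, which settles at the mean for a stationary model.

```python
import numpy as np

# Hypothetical VAR(2) coefficients (k = 2), chosen to give a stationary model.
phi0 = np.array([1.0, 2.0])
phi1 = np.array([[0.5, 0.1], [0.4, 0.5]])
phi2 = np.array([[0.0, 0.0], [0.25, 0.0]])
k = phi0.shape[0]

# phi(1) is the matrix polynomial phi(B) evaluated at B = 1.
phi_at_1 = np.eye(k) - phi1 - phi2
mu = np.linalg.solve(phi_at_1, phi0)   # solves phi(1) mu = phi0 without forming the inverse

# Noise-free recursion r_t = phi0 + phi1 r_{t-1} + phi2 r_{t-2}, started at zero.
r_prev2 = np.zeros(k)
r_prev1 = np.zeros(k)
for _ in range(500):
    r_now = phi0 + phi1 @ r_prev1 + phi2 @ r_prev2
    r_prev2, r_prev1 = r_prev1, r_now

print(mu)
print(r_prev1)
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical design choice; it raises an error when $\boldsymbol{\phi}(1)$ is singular, which corresponds to the case where the inverse does not exist.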

[My algebra obtains the above this way:]

$$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{a}_t,$$

take expectation:  

$$\large E[(\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) \boldsymbol{r}_t] = E(\boldsymbol{\phi}_0 + \boldsymbol{a}_t),$$

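[The expectation passes through the polynomial because the $\large \boldsymbol{\phi}_i$ are constants; and when $\large \boldsymbol{r}_t$ is weakly stationary, $\large E(\boldsymbol{r}_{t-j}) = \boldsymbol{\mu}$ for every j, so each $\large B^j$ acting on the mean leaves it unchanged:]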

$$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) E(\boldsymbol{r}_t) = E(\boldsymbol{\phi}_0 + \boldsymbol{a}_t),$$

[RHS:  $\large E(\boldsymbol{a}_t) = 0; E(\boldsymbol{\phi}_0) = \boldsymbol{\phi}_0$:]

$$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) E(\boldsymbol{r}_t) = \boldsymbol{\phi}_0 + \boldsymbol{0} = \boldsymbol{\phi}_0 $$

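[Because each $\large B^j$ leaves the constant mean unchanged, set B = 1; then, if the resulting matrix is invertible:]

$$\large \boldsymbol{\mu} = E(\boldsymbol{r}_t) = (\boldsymbol{I} - \boldsymbol{\phi}_1 - \cdots - \boldsymbol{\phi}_p)^{-1} \boldsymbol{\phi}_0 $$

[In Tsay's compact notation, $\large \boldsymbol{\phi}(1)$ is the matrix polynomial $\large \boldsymbol{\phi}(B)$ evaluated at B = 1, not a VAR(1) model:]

$\large [ \boldsymbol{\phi}(1)]^{-1}  \boldsymbol{\phi}_0$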

![image-3.png](attachment:image-3.png)

Let ...

$\large \boldsymbol{\tilde{r}}_t = \boldsymbol{r}_t − \boldsymbol{\mu}$ 

The VAR(p) model becomes [a function of $\large \boldsymbol{\tilde{r}}_t$ this way (my algebra):]

[the original model]

$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{a}_t,$

[expected value of it]

$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) E( \boldsymbol{r}_t ) = E (\boldsymbol{\phi}_0 + \boldsymbol{a}_t),$

$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) E( \boldsymbol{r}_t ) = \boldsymbol{\phi}_0 $

[subtract expected value from original model:]

$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) \boldsymbol{r}_t - (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) E( \boldsymbol{r}_t ) = \boldsymbol{\phi}_0 + \boldsymbol{a}_t - \boldsymbol{\phi}_0$

[factor the LHS and reduce the RHS:]

$\large (\boldsymbol{I} B^0 − \boldsymbol{\phi}_1 B^1 − \cdots − \boldsymbol{\phi}_p B^p) [\boldsymbol{r}_t - E( \boldsymbol{r}_t )] = \boldsymbol{a}_t$

$\large (\boldsymbol{I} B^0 - \boldsymbol{\phi}_1 B^1 - \cdots - \boldsymbol{\phi}_p B^p) [\boldsymbol{r}_t - \boldsymbol{\mu}] = \boldsymbol{a}_t$

[replace with $\large \boldsymbol{\tilde{r}}_t$ on the LHS:]

$\large (\boldsymbol{I} B^0 - \boldsymbol{\phi}_1 B^1 - \cdots - \boldsymbol{\phi}_p B^p) \boldsymbol{\tilde{r}}_t = \boldsymbol{a}_t$

[multiply through on LHS:]

$\large \boldsymbol{I} B^0 \boldsymbol{\tilde{r}}_t - \boldsymbol{\phi}_1 B^1 \boldsymbol{\tilde{r}}_t - \cdots - \boldsymbol{\phi}_p B^p \boldsymbol{\tilde{r}}_t = \boldsymbol{a}_t$

$\large \boldsymbol{\tilde{r}}_t - \boldsymbol{\phi}_1 B^1 \boldsymbol{\tilde{r}}_t - \cdots - \boldsymbol{\phi}_p B^p \boldsymbol{\tilde{r}}_t = \boldsymbol{a}_t$

[Move all but $\large \boldsymbol{\tilde{r}}_t$ to RHS:]

$\large \boldsymbol{\tilde{r}}_t = \boldsymbol{\phi}_1 B^1 \boldsymbol{\tilde{r}}_t + \cdots + \boldsymbol{\phi}_p B^p \boldsymbol{\tilde{r}}_t + \boldsymbol{a}_t$

[Apply the back-shift operators to the mean-corrected returns:]

$\large \boldsymbol{\tilde{r}}_t = \boldsymbol{\phi}_1 \boldsymbol{\tilde{r}}_{t-1} + \cdots + \boldsymbol{\phi}_p \boldsymbol{\tilde{r}}_{t-p} + \boldsymbol{a}_t \;\; (8.14)$

![image-6.png](attachment:image-6.png)

[Notice that algebra eliminates $\large \boldsymbol{\phi}_0$ which makes sense since mean is subtracted from return.]
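The mean-corrected form (8.14) can be checked on a simulated path: subtracting $\boldsymbol{\mu}$ from every observation should remove $\boldsymbol{\phi}_0$ exactly. The sketch below assumes Python with NumPy; the VAR(2) coefficients, the seed, the sample length, and the standard normal innovations are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stationary VAR(2) with k = 2 components.
phi0 = np.array([1.0, 2.0])
phi1 = np.array([[0.5, 0.1], [0.4, 0.5]])
phi2 = np.array([[0.0, 0.0], [0.25, 0.0]])
k, T = 2, 200

mu = np.linalg.solve(np.eye(k) - phi1 - phi2, phi0)

# Simulate r_t = phi0 + phi1 r_{t-1} + phi2 r_{t-2} + a_t (equation 8.13).
a = rng.standard_normal((T, k))
r = np.zeros((T, k))
r[0] = mu   # arbitrary starting values
r[1] = mu
for t in range(2, T):
    r[t] = phi0 + phi1 @ r[t - 1] + phi2 @ r[t - 2] + a[t]

# Mean-corrected series; row t holds r_tilde_t.
r_tilde = r - mu

# (8.14) rearranged: r_tilde_t - phi1 r_tilde_{t-1} - phi2 r_tilde_{t-2} = a_t.
recovered = r_tilde[2:] - r_tilde[1:-1] @ phi1.T - r_tilde[:-2] @ phi2.T
max_gap = np.max(np.abs(recovered - a[2:]))
print(max_gap)
```

The gap is at rounding level for any realized path, because the cancellation of $\boldsymbol{\phi}_0$ is an algebraic identity rather than a statistical approximation.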

Using this equation [for the mean-adjusted return] and the same techniques as those for VAR(1) models, Tsay obtains that:
- $\large \text{Cov}(\boldsymbol{r}_t,\boldsymbol{a}_t) = \boldsymbol{\Sigma}$ = the covariance of $\large \boldsymbol{a}_t$ 
- $\large \text{Cov}(\boldsymbol{r}_{t-\ell},\boldsymbol{a}_t) = 0$ for $\large \ell > 0$.
- $\large \boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi}_1 \boldsymbol{\Gamma}_{\ell - 1} + \cdots + \boldsymbol{\phi}_p \boldsymbol{\Gamma}_{\ell - p}$ for $\large \ell > 0$.

![image-7.png](attachment:image-7.png)
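The moment equations can be verified numerically for a VAR(2). The sketch below assumes Python with NumPy and hypothetical coefficients. It computes the $\boldsymbol{\Gamma}_{\ell}$ from a truncated moving-average representation, $\boldsymbol{\tilde{r}}_t = \sum_i \boldsymbol{\psi}_i \boldsymbol{a}_{t-i}$ with $\boldsymbol{\psi}_0 = \boldsymbol{I}$ and $\boldsymbol{\psi}_i = \sum_j \boldsymbol{\phi}_j \boldsymbol{\psi}_{i-j}$; this representation is a standard result that is not derived in these notes, so treat it as an assumption of the example. For lags where $\ell - j < 0$ it uses $\boldsymbol{\Gamma}_{-j} = \boldsymbol{\Gamma}_j'$.

```python
import numpy as np

def ma_weights(phi_list, n_terms):
    """psi_0 = I and psi_i = sum_j phi_j psi_{i-j} for j = 1..p with i - j >= 0."""
    k = phi_list[0].shape[0]
    psi = [np.eye(k)]
    for i in range(1, n_terms):
        nxt = np.zeros((k, k))
        for j, phi_j in enumerate(phi_list, start=1):
            if i - j >= 0:
                nxt += phi_j @ psi[i - j]
        psi.append(nxt)
    return psi

def gamma_from_ma(lag, psi, sigma):
    """Gamma_lag = sum_i psi_{i+lag} Sigma psi_i' for lag >= 0 (truncated)."""
    return sum(psi[i + lag] @ sigma @ psi[i].T for i in range(len(psi) - lag))

# Hypothetical stationary VAR(2) and innovation covariance.
phi1 = np.array([[0.5, 0.1], [0.4, 0.5]])
phi2 = np.array([[0.0, 0.0], [0.25, 0.0]])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

psi = ma_weights([phi1, phi2], n_terms=400)
gamma = {lag: gamma_from_ma(lag, psi, sigma) for lag in range(0, 5)}
gamma[-1] = gamma[1].T   # Gamma_{-1} = Gamma_1'

moment_gaps = {}
for lag in range(1, 5):
    rhs = phi1 @ gamma[lag - 1] + phi2 @ gamma[lag - 2]
    moment_gaps[lag] = np.max(np.abs(gamma[lag] - rhs))
print(moment_gaps)
```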

The last property [for $\large \boldsymbol{\Gamma}_{\ell}$] is called **the moment equations of a VAR(p) model**. [Here, should state this in terms of $\large \boldsymbol{\Sigma}$.]  It is a multivariate version of the **Yule–Walker equation of a univariate AR(p) model**. In terms of CCM [cross-correlation matrices], the moment equations become:

$$\large \boldsymbol{\rho}_{\ell} = \boldsymbol{\Upsilon}_1 \boldsymbol{\rho}_{\ell - 1} + \cdots + \boldsymbol{\Upsilon}_p \boldsymbol{\rho}_{\ell - p} \text{ for } \ell > 0$$

![image-8.png](attachment:image-8.png)

where:

$$\large \boldsymbol{\Upsilon}_i = \boldsymbol{D}^{−1/2} \boldsymbol{\phi}_i \boldsymbol{D}^{1/2}$$

![image-9.png](attachment:image-9.png)

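[So cool: $\large \boldsymbol{\Upsilon}_i$ does for $\large \boldsymbol{\rho}$ what $\large \boldsymbol{\phi}_i$ does for $\large \boldsymbol{\Gamma}$ and $\large \boldsymbol{r}_t$. The difference from the VAR(1) case, I think, is that there the single matrix is raised to a power as one looks back further in time ($\large \boldsymbol{\Gamma}_{\ell} = \boldsymbol{\phi}^{\ell} \boldsymbol{\Gamma}_0$), whereas here each lag has its own matrix.]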

One approach to understanding properties of the VAR(p) model in equation (8.13) ...

$$\large \begin{align} \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_p \boldsymbol{r}_{t−p} + \boldsymbol{a}_t, \;\; p > 0, & (8.13) \end{align}$$

... is to make use of the results of the VAR(1) model in Eq. (8.8). 

(8.8)

$$\large  \begin{align} \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi} \boldsymbol{r}_{t-1} + \boldsymbol{a}_t  \;\; & (8.8) \end{align}$$

This can be achieved by transforming the VAR(p) model of $\large \boldsymbol{r}_t$ into a **kp-dimensional** VAR(1) model. 

Specifically, let $\large \boldsymbol{x}_t$ and $\large \boldsymbol{b}_t$ be two **kp-dimensional processes**: 

$$\large \boldsymbol{x}_t = (\boldsymbol{\tilde{r}}_{t-p+1}', \boldsymbol{\tilde{r}}_{t-p+2}', \ldots, \boldsymbol{\tilde{r}}_{t}')'$$ 

$$\large \boldsymbol{b}_t = (0, \ldots, 0, \boldsymbol{a}_t')'$$

[Those are bold k-dimensional zero's in $\large \boldsymbol{b}_t$; I think they should be represented as transposed so that $\large \boldsymbol{b}_t$ is kp dimensional: p-1 of k-dimensional transposed zero vectors, $\boldsymbol{0}_1', \ldots, \boldsymbol{0}_{p-1}'$ followed by a single k-dimensional $\boldsymbol{a}_t'$ vector. This helps when visualizing the $\boldsymbol{\Sigma}$ covariance matrix below.]:

[$\large \boldsymbol{x}_t$ is a vector of $\large \boldsymbol{\tilde{r}}_{t-p+i}$ vectors where i ranges from 1 to p $\large \longrightarrow \boldsymbol{\tilde{r}}_{t-p+1}, \ldots, \boldsymbol{\tilde{r}}_{t-p+p}$.  Each of the p $\large \boldsymbol{\tilde{r}}_{t-p+i}$ vectors (inside of $\large \boldsymbol{x}_t$) is k-dimensional, one dimension for each component asset. Hence, $\large \boldsymbol{x}_t$ is kp-dimensional.  $\large \boldsymbol{x}_t$ is used on the RHS of (8.15) below with subscript **t-1** so that it ranges from $\large \boldsymbol{\tilde{r}}_{t-p+(i=1)-1} \text{ to } \boldsymbol{\tilde{r}}_{t-p+(i=p)-1}$ or $\large \boldsymbol{\tilde{r}}_{t-p} \text{ to } \boldsymbol{\tilde{r}}_{t-1}$.]

$$\large \begin{align} \boldsymbol{x}_t & = (\boldsymbol{\tilde{r}}_{t-p+1}', \boldsymbol{\tilde{r}}_{t-p+2}', \ldots, \boldsymbol{\tilde{r}}_{t-p+p}')'\\
& = (\boldsymbol{\tilde{r}}_{t-p+1}', \boldsymbol{\tilde{r}}_{t-p+2}', \ldots, \boldsymbol{\tilde{r}}_{t}')'\\
\boldsymbol{x}_{t-1} & = (\boldsymbol{\tilde{r}}_{t-p+1-1}', \boldsymbol{\tilde{r}}_{t-p+2-1}', \ldots, \boldsymbol{\tilde{r}}_{t-p+p-1}')'\\
& = (\boldsymbol{\tilde{r}}_{t-p}', \boldsymbol{\tilde{r}}_{t-p+1}', \ldots, \boldsymbol{\tilde{r}}_{t-1}')'\end{align}$$ 

The mean of $\large \boldsymbol{b}_t$ is zero.

The covariance matrix of $\large \boldsymbol{b}_t$ is a kp x kp matrix with zero everywhere except for the lower right corner, which is $\large \boldsymbol{\Sigma}$ [describing the covariances of $\large \boldsymbol{a}_t$]. 

The VAR(p) model for $\large \boldsymbol{r}_t$ can then be written in the form:

$$\large \begin{align} \boldsymbol{x}_t = \boldsymbol{\phi}^* \boldsymbol{x}_{t-1} + \boldsymbol{b}_t, \;\; & (8.15) \end{align}$$

![image-10.png](attachment:image-10.png)

where $\large \boldsymbol{\phi}^*$ is a kp x kp matrix:

[Base on way it's denoted, think more apt to say $\large \boldsymbol{\phi}^*$ is a p x p block matrix. Each of the $\large p^2$ components of $\large \boldsymbol{\phi}^*$ is a k x k matrix.]

$$\large \boldsymbol{\phi}^* = \begin{bmatrix}
\boldsymbol{0} & \boldsymbol{I} & \boldsymbol{0} & \boldsymbol{0} & \cdots &\boldsymbol{0} \\ 
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{I} & \boldsymbol{0} & \cdots &\boldsymbol{0} \\
\vdots & \vdots & \vdots &  & \cdots & \vdots \\ 
\\
\boldsymbol{\phi}_p & \boldsymbol{\phi}_{p-1} & \boldsymbol{\phi}_{p-2} & \boldsymbol{\phi}_{p-3} & \cdots & \boldsymbol{\phi}_1 \\ 
\end{bmatrix}$$

![image-11.png](attachment:image-11.png)

where [block matrixes are used]
- each $\large \boldsymbol{0}$ is a k x k zero matrix
- each $\large \boldsymbol{I}$ is a k x k identity matrix

The kp x kp matrix $\large \boldsymbol{\phi}^*$ multiplies $\large \boldsymbol{x}_{t-1}$; here are the first and last components of the resulting kp-dimensional vector (a vector of p blocks, each a k-dimensional vector).

**first component**
The first block row of $\large \boldsymbol{\phi}^*$ has an identity matrix in its second block column, which multiplies the second component of $\large \boldsymbol{x}_{t-1}$, $\large \boldsymbol{\tilde{r}}_{t-p+2-1}$ = $\large \boldsymbol{\tilde{r}}_{t-p+1}$.  This places the $\large \boldsymbol{\tilde{r}}_{t-p+1}$ vector in the first block of the resulting $\large \boldsymbol{\phi}^* \boldsymbol{x}_{t-1}$ vector, which is then summed with $\large \boldsymbol{b}_t$.

**last component**
The last block row of $\large \boldsymbol{\phi}^*$ starts with the MATRIX $\large \boldsymbol{\phi}_p$ (subscripted p), which multiplies the first component of $\large \boldsymbol{x}_{t-1}$, the VECTOR $\large \boldsymbol{\tilde{r}}_{t-p+1-1}$ = $\large \boldsymbol{\tilde{r}}_{t-p}$ (also subscripted "t-p"), giving the VECTOR $\large \boldsymbol{\phi}_p \boldsymbol{\tilde{r}}_{t-p}$. This is added to the other p - 1 VECTORS $\large \boldsymbol{\phi}_{p-i} \boldsymbol{\tilde{r}}_{t-p+i}$, where i ranges from 1 to p - 1, and the sum is placed in the last block of the VECTOR $\large \boldsymbol{\phi}^* \boldsymbol{x}_{t-1}$, which has p blocks, each a VECTOR with k components.

In the literature, $\large \boldsymbol{\phi}^*$ is called the **companion matrix of the matrix polynomial** $\large \boldsymbol{\phi}(B)$ [because it represents all elements of $\large \boldsymbol{\phi}(B)$ in a way that VAR(p) fits in a VAR(1) model.].
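[A minimal sketch of the companion-form construction. The function name `companion`, the random coefficient matrices, and the seed are all assumptions made for illustration. It builds $\boldsymbol{\phi}^*$ from $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_p$ and checks that one step of (8.15) shifts the stacked vector up by one block and reproduces the mean-adjusted VAR(p) recursion (8.14) in the last block.]

```python
import numpy as np

def companion(phis):
    """Stack k x k matrices [phi_1, ..., phi_p] into the kp x kp companion matrix."""
    p = len(phis)
    k = phis[0].shape[0]
    big = np.zeros((k * p, k * p))
    # identity blocks on the block super-diagonal
    big[: k * (p - 1), k:] = np.eye(k * (p - 1))
    # last block row: phi_p, phi_{p-1}, ..., phi_1
    big[k * (p - 1):, :] = np.hstack(phis[::-1])
    return big

rng = np.random.default_rng(0)
k, p = 2, 3
phis = [rng.normal(scale=0.2, size=(k, k)) for _ in range(p)]  # hypothetical phi_1..phi_3
past = [rng.normal(size=k) for _ in range(p)]  # r~_{t-3}, r~_{t-2}, r~_{t-1}
a_t = rng.normal(size=k)

x_prev = np.concatenate(past)                        # x_{t-1}
b_t = np.concatenate([np.zeros(k * (p - 1)), a_t])   # (0, ..., 0, a_t')'
x_t = companion(phis) @ x_prev + b_t                 # equation (8.15)

# Direct VAR(3) recursion for the mean-adjusted series
direct = phis[0] @ past[2] + phis[1] @ past[1] + phis[2] @ past[0] + a_t

print(np.allclose(x_t[-k:], direct))      # last block is r~_t
print(np.allclose(x_t[:-k], x_prev[k:]))  # other blocks are shifted copies
```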

[If e.g. p is 3 for a VAR(p) model and k is 2 for two assets in each return vector:] 

$\large \begin{align} \boldsymbol{x}_t 
& = ( (\boldsymbol{\tilde{r}}_{t-p+1})', (\boldsymbol{\tilde{r}}_{t-p+2})', (\boldsymbol{\tilde{r}}_{t-p+3})' )' \\
& = ( (\boldsymbol{\tilde{r}}_{t-3+1})', (\boldsymbol{\tilde{r}}_{t-3+2})', (\boldsymbol{\tilde{r}}_{t-3+3})' )' \\
& = ( (\tilde{r}_{t-3+1,i=1}, \tilde{r}_{t-3+1,i=2})', (\tilde{r}_{t-3+2,i=1}, \tilde{r}_{t-3+2,i=2})', (\tilde{r}_{t-3+3,i=1}, \tilde{r}_{t-3+3,i=2})' )'\\
& = ( (\tilde{r}_{t-3+1,1}, \tilde{r}_{t-3+1,2})', (\tilde{r}_{t-3+2,1}, \tilde{r}_{t-3+2,2})', (\tilde{r}_{t-3+3,1}, \tilde{r}_{t-3+3,2})' )'\\
& = ( (\tilde{r}_{t-2,1}, \tilde{r}_{t-2,2})', (\tilde{r}_{t-1,1}, \tilde{r}_{t-1,2})', (\tilde{r}_{t,1}, \tilde{r}_{t,2})' )'\\
\end{align}$

[And in (8.15), resubscript to:]

$\large \begin{align} \boldsymbol{x}_{t-1} 
& = ( (\boldsymbol{\tilde{r}}_{t-p+1-1})', (\boldsymbol{\tilde{r}}_{t-p+2-1})', (\boldsymbol{\tilde{r}}_{t-p+3-1})' )' \\
& = ( (\boldsymbol{\tilde{r}}_{t-3+1-1})', (\boldsymbol{\tilde{r}}_{t-3+2-1})', (\boldsymbol{\tilde{r}}_{t-3+3-1})' )' \\
& = ( (\tilde{r}_{t-3+1-1,i=1}, \tilde{r}_{t-3+1-1,i=2})', (\tilde{r}_{t-3+2-1,i=1}, \tilde{r}_{t-3+2-1,i=2})', (\tilde{r}_{t-3+3-1,i=1}, \tilde{r}_{t-3+3-1,i=2})' )'\\
& = ( (\tilde{r}_{t-3+1-1,1}, \tilde{r}_{t-3+1-1,2})', (\tilde{r}_{t-3+2-1,1}, \tilde{r}_{t-3+2-1,2})', (\tilde{r}_{t-3+3-1,1}, \tilde{r}_{t-3+3-1,2})' )'\\
& = ( (\tilde{r}_{t-3,1}, \tilde{r}_{t-3,2})', (\tilde{r}_{t-2,1}, \tilde{r}_{t-2,2})', (\tilde{r}_{t-1,1}, \tilde{r}_{t-1,2})' )'\\
\end{align}$

[$\large \boldsymbol{\phi}^*$ is a kp x kp = 3\*2 x 3\*2 matrix because there are p = 3 matrices $\large \boldsymbol{\phi}_j$ in the bottom block row and each of those quantifies the relationships among k = 2 assets:]

$$\large \begin{align} 
\boldsymbol{\phi}^* 
&= \begin{bmatrix}
\boldsymbol{0} & \boldsymbol{I} & \boldsymbol{0} \\ 
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{I} \\
\boldsymbol{\phi}_{\ell=p-0=3} & \boldsymbol{\phi}_{\ell=p-1=2} & \boldsymbol{\phi}_{\ell=p-2=1} \\ 
\end{bmatrix} \\
&= \begin{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}& \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}& \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \\ 
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
\begin{bmatrix} \phi_{11}(\ell=p) & \phi_{12}(\ell=p) \\ \phi_{21}(\ell=p) & \phi_{22}(\ell=p) \end{bmatrix} & \begin{bmatrix} \phi_{11}(\ell=p-1) & \phi_{12}(\ell=p-1) \\ \phi_{21}(\ell=p-1) & \phi_{22}(\ell=p-1) \end{bmatrix} & \begin{bmatrix} \phi_{11}(\ell=p-2) & \phi_{12}(\ell=p-2) \\ \phi_{21}(\ell=p-2) & \phi_{22}(\ell=p-2) \end{bmatrix}  \\ 
\end{bmatrix}\\
&= \begin{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}& \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}& \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \\ 
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
\begin{bmatrix} \phi_{11}(\ell=3-0=3) & \phi_{12}(\ell=3-0=3) \\ \phi_{21}(\ell=3-0=3) & \phi_{22}(\ell=3-0=3) \end{bmatrix} & \begin{bmatrix} \phi_{11}(\ell=3-1=2) & \phi_{12}(\ell=3-1=2) \\ \phi_{21}(\ell=3-1=2) & \phi_{22}(\ell=3-1=2) \end{bmatrix} & \begin{bmatrix} \phi_{11}(\ell=3-2=1) & \phi_{12}(\ell=3-2=1) \\ \phi_{21}(\ell=3-2=1) & \phi_{22}(\ell=3-2=1) \end{bmatrix}  \\ 
\end{bmatrix}\\
&= \begin{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}& \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}& \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \\ 
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
\begin{bmatrix} \phi_{11}(3) & \phi_{12}(3) \\ \phi_{21}(3) & \phi_{22}(3) \end{bmatrix} & \begin{bmatrix} \phi_{11}(2) & \phi_{12}(2) \\ \phi_{21}(2) & \phi_{22}(2) \end{bmatrix} & \begin{bmatrix} \phi_{11}(1) & \phi_{12}(1) \\ \phi_{21}(1) & \phi_{22}(1) \end{bmatrix}  \\ 
\end{bmatrix}
\end{align}$$

[So $ \large \boldsymbol{\phi}^* \boldsymbol{x}_{t-1} $ looks like:]

$$ \large 
\begin{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}& \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}& \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \\ 
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
\begin{bmatrix} \phi_{11}(3) & \phi_{12}(3) \\ \phi_{21}(3) & \phi_{22}(3) \end{bmatrix} & \begin{bmatrix} \phi_{11}(2) & \phi_{12}(2) \\ \phi_{21}(2) & \phi_{22}(2) \end{bmatrix} & \begin{bmatrix} \phi_{11}(1) & \phi_{12}(1) \\ \phi_{21}(1) & \phi_{22}(1) \end{bmatrix} \\ 
\end{bmatrix}
\begin{bmatrix}
\begin{bmatrix} \tilde{r}_{t-3,1}\\ \tilde{r}_{t-3,2}\end{bmatrix}\\ \begin{bmatrix}\tilde{r}_{t-2,1} \\ \tilde{r}_{t-2,2}\end{bmatrix}\\ \begin{bmatrix} \tilde{r}_{t-1,1} \\ \tilde{r}_{t-1,2} \end{bmatrix} \end{bmatrix}$$


Equation (8.15) is a VAR(1) model for $\large \boldsymbol{x}_t$, which contains $\large \boldsymbol{r}_t$ [strictly, the mean-adjusted $\large \boldsymbol{\tilde{r}}_t$] as its last k components.  [Likewise, $\large \boldsymbol{x}_{t-1}$ contains $\large \boldsymbol{\tilde{r}}_{t-1}$ as its last k components in the equation directly above.] 

The results of a VAR(1) model shown in the previous section can now be used to derive properties of the VAR(p) model via equation (8.15). 

$$\large \begin{align} \boldsymbol{x}_t = \boldsymbol{\phi}^* \boldsymbol{x}_{t-1} + \boldsymbol{b}_t, \;\; & (8.15) \end{align}$$

For example, from the definition, $\large \boldsymbol{x}_t$ is weakly stationary if and only if $\large \boldsymbol{r}_t$ is weakly stationary. Therefore, the necessary and sufficient condition of weak stationarity for the VAR(p) model in (8.13) ...

$$\large \begin{align} \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_p \boldsymbol{r}_{t−p} + \boldsymbol{a}_t, \;\; p > 0, & (8.13) \end{align}$$

... is that all eigenvalues of $\large \boldsymbol{\phi}^*$ in equation (8.15) ... 

$$\large \begin{align} \boldsymbol{x}_t = \boldsymbol{\phi}^* \boldsymbol{x}_{t-1} + \boldsymbol{b}_t, \;\; & (8.15) \end{align}$$

... are less than 1 in modulus. It is easy to show that 

$\large | \boldsymbol{I} - \boldsymbol{\phi}^* B | = | \boldsymbol{\phi} (B)|$. 

![image-12.png](attachment:image-12.png)

Therefore, similar to the VAR(1) case, the necessary and sufficient condition is equivalent to all zeros of the determinant $\large | \boldsymbol{\phi} (B)|$ being outside the unit circle.
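[A minimal numerical sketch of this equivalence for a hypothetical bivariate VAR(2). The coefficient values and helper names are assumptions for illustration. It checks the determinant identity at an arbitrary complex point, then confirms that the reciprocals of the nonzero eigenvalues of $\boldsymbol{\phi}^*$ are zeros of $|\boldsymbol{\phi}(B)|$ and lie outside the unit circle.]

```python
import numpy as np

# Hypothetical VAR(2) coefficient matrices (illustrative values)
phi1 = np.array([[0.5, 0.1],
                 [0.4, 0.5]])
phi2 = np.array([[0.0, 0.0],
                 [0.25, 0.0]])
k = 2

# Companion matrix for p = 2: [[0, I], [phi_2, phi_1]]
companion = np.block([[np.zeros((k, k)), np.eye(k)],
                      [phi2, phi1]])

eig = np.linalg.eigvals(companion)
is_stationary = bool(np.all(np.abs(eig) < 1))

def det_companion(z):
    """|I - phi* z| evaluated at a (possibly complex) point z."""
    return np.linalg.det(np.eye(2 * k) - companion * z)

def det_poly(z):
    """|phi(z)| = |I - phi_1 z - phi_2 z^2|."""
    return np.linalg.det(np.eye(k) - phi1 * z - phi2 * z**2)

z = 0.7 + 0.2j  # arbitrary test point
identity_holds = bool(np.isclose(det_companion(z), det_poly(z)))

# Zeros of |phi(B)| are reciprocals of the nonzero eigenvalues of phi*
zeros = 1.0 / eig[np.abs(eig) > 1e-8]

print(is_stationary, identity_holds)
print(np.abs(zeros))  # each modulus should exceed 1
```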

[Equivalently, by the determinant identity, all zeros of $\large |\boldsymbol{I} - \boldsymbol{\phi}^* B|$ lie outside the unit circle.]

[This may be the answer to the quandary vexing me.  "Inside the unit circle" refers to the eigenvalues $\large \lambda$ of $\large \boldsymbol{\phi}^*$, which solve $\large |\lambda \boldsymbol{I} - \boldsymbol{\phi}^*| = 0$.  "Outside the unit circle" refers to the zeros of $\large |\boldsymbol{I} - \boldsymbol{\phi}^* B|$, which are the values $\large B = 1/\lambda$ for the nonzero eigenvalues.  So $\large |\lambda| < 1$ for every eigenvalue exactly when $\large |B| > 1$ for every zero.]

Of particular relevance to financial time series analysis is the **structure of the coefficient matrices** $\large \boldsymbol{\phi}_{\ell}$ of a VAR(p) model. 

[Think $\large \ell$ is the correct subscript for $\large \boldsymbol{\phi}$ as is for $\large \boldsymbol{\rho}$ and $\large \boldsymbol{\Gamma}$].

- For instance, if the (i,j)th element $\large \phi_{ij}(\ell)$ of $\large \boldsymbol{\phi}_{\ell}$ is zero for all $\large \ell$, then $\large r_{it}$ does not depend on the past values of $\large r_{jt}$. The structure of the coefficient matrices $\large \boldsymbol{\phi}_{\ell}$ thus provides information on the lead–lag relationship between the components of $\large \boldsymbol{r}_t$.


## 8.2.4 Building a VAR(p) Model
As before, an iterative procedure is used to build a vector AR model for a time series:
- order specification
- estimation
- model checking  

[**Order specification**:]

The **partial autocorrelation function** of a **univariate** series **generalizes** to specify the order p of a **vector** series. [Tsay is going to estimate sequentially higher and higher order models to specify the order, in a similar way to how the PACF works.]

Consider these **consecutive** VAR models:

$$\large \begin{align} 
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{a}_t & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{\phi}_2 \boldsymbol{r}_{t−2} + \boldsymbol{a}_t & \\
& \vdots & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_i \, \boldsymbol{r}_{t−i} + \boldsymbol{a}_t & (8.16) \\
& \vdots & \\
\end{align}$$

![image.png](attachment:image.png)

Model parameters can be estimated by the **ordinary least-squares** (OLS) method which in **multivariate statistical analysis** is termed **the multivariate linear regression estimation** ; see Johnson and Wichern (1998).

For the i-th equation in equation (8.16)
- let $\large \boldsymbol{\widehat{\phi}}_j^{(i)}$ be the OLS estimate of $\large \boldsymbol{\phi}_j$ 
- let $\large \boldsymbol{\widehat{\phi}}_0^{(i)}$ be the OLS estimate of $\large \boldsymbol{\phi}_0$

where 
- the superscript (i) is used to denote that the estimates are for a VAR(i) model. 

[This way we know 
- how distant the i-th estimated parameter is from the parameter that looks furthest back.
- how far into the sequence of consecutively higher order VAR models (8.16) we have tested.]

Then the residual is:

$$\large \boldsymbol{\widehat{a}}_t^{(i)} = \boldsymbol{r}_t - \boldsymbol{\widehat{\phi}}_0^{(i)} - \boldsymbol{\widehat{\phi}}_1^{(i)} \boldsymbol{r}_{t-1} - \cdots - \boldsymbol{\widehat{\phi}}_i^{(i)} \boldsymbol{r}_{t-i}.$$

![image-2.png](attachment:image-2.png)

For i = 0, 
[VAR(i=0): $\large \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{a}_t$?], 
the residual is defined as:

$\large \boldsymbol{\widehat{r}}_t^{(0)} = \boldsymbol{r}_t - \boldsymbol{\bar{r}}$

![image-10.png](attachment:image-10.png)

where 
- $\large \boldsymbol{\bar{r}}$ is the sample mean of $\large\boldsymbol{r}_t$. 

[Wonder why not use my model above; then $\large \boldsymbol{\widehat{a}}_t^{(0)} = \boldsymbol{r}_t - \boldsymbol{\widehat{\phi}}_0^{(0)}$ where $\large \boldsymbol{\widehat{\phi}}_0^{(0)} =  \boldsymbol{\bar{r}}$?]  

The **residual covariance matrix** is defined [per each VAR(i) model order level i] as:

$$\large \begin{align} \boldsymbol{\widehat{\Sigma}}_i = \frac{1}{T - 2i - 1} \sum_{t=i+1}^T \boldsymbol{\widehat{a}}_t^{(i)} (\boldsymbol{\widehat{a}}_t^{(i)})'. \;\;i ≥ 0 & \;\; (8.17) \end{align}$$

![image-9.png](attachment:image-9.png)

[As was the case when defining $\boldsymbol{\Sigma}$ via expectations, this equation averages the outer products of each time step's residual vector with itself. The residual vector $\boldsymbol{\widehat{a}}_t^{(i)}$ is the actual return minus the VAR(i)-model fitted value. The sum begins at t = i + 1, i.e. one time step beyond the order i of the model considered, so that there is enough data to compute the first residual vector. Reminder: the superscript (i) is not a power, just the order of the model considered.] [Notice the divisor is T - 2i - 1.] [Also notice that no residual mean is subtracted: the innovation has zero mean by assumption, and OLS residuals from a regression with a constant average to zero.]
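[A minimal sketch of equation (8.17), assuming the returns are held in a T x k NumPy array. The function names `var_residuals` and `sigma_hat` and the simulated data are hypothetical. For i = 0 the residual is the deviation from the sample mean, so $\boldsymbol{\widehat{\Sigma}}_0$ is the ordinary sample covariance matrix with divisor T - 1.]

```python
import numpy as np

def var_residuals(r, i):
    """OLS residuals of a VAR(i) fit; r is a T x k array of returns."""
    T, k = r.shape
    if i == 0:
        return r - r.mean(axis=0)
    # Regressors for rows t = i..T-1 (0-based): constant, r_{t-1}, ..., r_{t-i}
    X = np.hstack([np.ones((T - i, 1))] + [r[i - j: T - j] for j in range(1, i + 1)])
    Y = r[i:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ beta

def sigma_hat(r, i):
    """Residual covariance matrix of equation (8.17), divisor T - 2i - 1."""
    T = r.shape[0]
    a = var_residuals(r, i)
    return a.T @ a / (T - 2 * i - 1)

# Simulated bivariate VAR(1) data (illustrative parameters and seed)
rng = np.random.default_rng(42)
phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
T = 300
r = np.zeros((T, 2))
for t in range(1, T):
    r[t] = phi @ r[t - 1] + rng.normal(size=2)

print(sigma_hat(r, 0))  # sample covariance of r_t
print(sigma_hat(r, 1))  # residual covariance of the VAR(1) fit
```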

To specify the order p, one tests the hypothesis $\large H_0: \boldsymbol{\phi_{\ell}} = \boldsymbol{0}$ versus the alternative hypothesis $\large H_a : \boldsymbol{\phi_{\ell}} ≠ \boldsymbol{0}$ sequentially for $\large \ell$ = 1, 2, ... 

![image-11.png](attachment:image-11.png)

For example, using the **first equation** in Eq. (8.16), ...


$$\large \begin{align} 
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{a}_t & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{\phi}_2 \boldsymbol{r}_{t−2} + \boldsymbol{a}_t & \\
& \vdots & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_i \, \boldsymbol{r}_{t−i} + \boldsymbol{a}_t & (8.16) \\
& \vdots & \\
\end{align}$$

... we can test the hypothesis $\large H_0 : \boldsymbol{\phi_1} = \boldsymbol{0}$ versus the
alternative hypothesis $\large H_a : \boldsymbol{\phi_1} ≠ \boldsymbol{0}$. The test statistic [for i = 1] is:

$$ \large M(1) = −\left( T − k − \frac{5}{2} \right) ln \left( \frac{|\boldsymbol{\widehat{\Sigma}}_1|}{|\boldsymbol{\widehat{\Sigma}}_0|}\right),$$

![image-8.png](attachment:image-8.png)

where 
- $\large \boldsymbol{\widehat{\Sigma}}_i$ is defined in equation (8.17)
- |A| denotes the determinant of the matrix A. 
- ["the superscript (i) is used to denote that the estimates are for a VAR(i) model."]
- ["The product of the n eigenvalues of A is the same as the determinant of A." So, this is the ratio of the products of the eigenvalues of the residual covariance matrices of the VAR(1) model and the zero-order model.]
- [What is $\large \boldsymbol{\widehat{\Sigma}}_0$? Set i = 0 in (8.17): the residual of the i = 0 model is the actual return minus the sample mean, so $\large \boldsymbol{\widehat{\Sigma}}_0$ is the sample covariance matrix of $\large \boldsymbol{r}_t$ with divisor T - 1.]

Under some regularity conditions, the test statistic M(1) [for the i=1 case] is asymptotically a chi-squared distribution with $\large k^2$ degrees of freedom; see *Tiao and Box (1981)*.

In general, 
Tsay suggests using the i-th and (i − 1)th equations in equation (8.16) ...

$$\large \begin{align} 
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{a}_t & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \boldsymbol{\phi}_2 \boldsymbol{r}_{t−2} + \boldsymbol{a}_t & \\
& \vdots & \\
\boldsymbol{r}_t = & \, \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_{t−1} + \cdots + \boldsymbol{\phi}_i \, \boldsymbol{r}_{t−i} + \boldsymbol{a}_t & (8.16) \\
& \vdots & \\
\end{align}$$

... to test $\large H_0 : \boldsymbol{\phi}_i = 0$ versus $\large H_a : \boldsymbol{\phi}_i ≠ 0$; that is, testing a VAR(i) model versus a VAR(i − 1) model. 

The general test statistic is:

$$ \large \begin{align} M(i) = −\left( T − k − i - \frac{3}{2} \right) ln \left( \frac{|\boldsymbol{\widehat{\Sigma}}_i|}{|\boldsymbol{\widehat{\Sigma}}_{i-1}|}\right), \;\; & (8.18) \end{align}$$

![image-7.png](attachment:image-7.png)

Asymptotically, M(i) is distributed as a chi-squared distribution with $\large k^2$ degrees of freedom.
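[A minimal sketch of the sequential test in equation (8.18), using simulated VAR(1) data. The function names, parameter values, and seed are assumptions for illustration; the residual covariance uses the divisor T - 2i - 1 of (8.17). Each M(i) would be compared with a chi-squared critical value with $k^2$ degrees of freedom.]

```python
import numpy as np

def sigma_hat(r, i):
    """Residual covariance of a VAR(i) OLS fit, equation (8.17)."""
    T, k = r.shape
    if i == 0:
        a = r - r.mean(axis=0)
    else:
        X = np.hstack([np.ones((T - i, 1))] + [r[i - j: T - j] for j in range(1, i + 1)])
        beta, *_ = np.linalg.lstsq(X, r[i:], rcond=None)
        a = r[i:] - X @ beta
    return a.T @ a / (T - 2 * i - 1)

def m_stat(r, i):
    """Test statistic M(i) of equation (8.18): VAR(i) versus VAR(i-1)."""
    T, k = r.shape
    ratio = np.linalg.det(sigma_hat(r, i)) / np.linalg.det(sigma_hat(r, i - 1))
    return -(T - k - i - 1.5) * np.log(ratio)

# Simulated bivariate VAR(1) data (illustrative parameters and seed)
rng = np.random.default_rng(1)
phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
T = 500
r = np.zeros((T, 2))
for t in range(1, T):
    r[t] = phi @ r[t - 1] + rng.normal(size=2)

m_values = {i: m_stat(r, i) for i in range(1, 5)}
for i, m in m_values.items():
    print(i, round(float(m), 2))
```

[With data generated from a VAR(1), M(1) should be far above the critical value, while higher-order statistics should mostly be small. M(i) can occasionally come out slightly negative because the two covariance estimates use different divisors and sample ranges.]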

Alternatively, the **Akaike information criterion** (AIC) or its variants can be used to select the order p. 

**Assume** that $\large \boldsymbol{a}_t$ is **multivariate normal** and consider the i-th equation in (8.16). 

One can estimate the model by the **maximum-likelihood** (ML) method. 
- For AR models, the **OLS** estimates $\large \boldsymbol{\widehat{\phi}}_0$ and $\large \boldsymbol{\widehat{\phi}}_j$ are equivalent to [their] **(conditional) ML** estimates. 
- However, [for AR models] there are differences between the [OLS and (conditional) ML] estimates of $\large \boldsymbol{\Sigma}$. 
    - The [conditional] ML estimate of $\large \boldsymbol{\Sigma}$ is

$$\large \begin{align} \boldsymbol{\tilde{\Sigma}}_i = \frac{1}{T} \sum_{t=i+1}^T \boldsymbol{\widehat{a}}_t^{(i)} [\boldsymbol{\widehat{a}}_t^{(i)}]'. \;\;\;\; (8.19) \end{align}$$

![image-6.png](attachment:image-6.png)

[Same form as the OLS version (8.17) except for the divisor: T for ML versus T - 2i - 1 for OLS.]

The AIC of a VAR(i) model under the normality assumption is defined as:

$$\large \text{AIC(i)} = ln(| \boldsymbol{\tilde{\Sigma}}_i |) + \frac{2 k^2 i}{T}$$

![image-5.png](attachment:image-5.png)

For a given vector time series, the procedure is to select the AR order p such that:

$$\large \text{AIC}(p) = \min_{0 \le i \le p_0}  \text{AIC}(i),$$

where 
- $\large p_0$ is a prespecified positive integer.
- [Notation means: select p by finding the i between 0 and $p_0$ that minimizes AIC(i)]

Other information criteria available for VAR(i) models are:

$$\large \text{BIC(i)} = ln(| \boldsymbol{\tilde{\Sigma}}_i |) + \frac{k^2i \, ln(T)}{T}$$

![image-3.png](attachment:image-3.png)

$$\large \text{HQ(i)} = ln(| \boldsymbol{\tilde{\Sigma}}_i |) + \frac{2k^2i \, ln[ln(T)]}{T}$$

![image-4.png](attachment:image-4.png)

The HQ criterion was proposed by Hannan and Quinn (1979).
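[A minimal sketch of order selection with the three criteria, using the ML covariance estimate of equation (8.19). The function names, the simulated VAR(1) data, and the choice $p_0 = 6$ are assumptions for illustration.]

```python
import numpy as np

def sigma_ml(r, i):
    """ML residual covariance of equation (8.19): OLS residuals, divisor T."""
    T, k = r.shape
    if i == 0:
        a = r - r.mean(axis=0)
    else:
        X = np.hstack([np.ones((T - i, 1))] + [r[i - j: T - j] for j in range(1, i + 1)])
        beta, *_ = np.linalg.lstsq(X, r[i:], rcond=None)
        a = r[i:] - X @ beta
    return a.T @ a / T

def criteria(r, i):
    """Return (AIC, BIC, HQ) for a VAR(i) model as defined above."""
    T, k = r.shape
    log_det = np.log(np.linalg.det(sigma_ml(r, i)))
    aic = log_det + 2 * k**2 * i / T
    bic = log_det + k**2 * i * np.log(T) / T
    hq = log_det + 2 * k**2 * i * np.log(np.log(T)) / T
    return aic, bic, hq

# Simulated bivariate VAR(1) data (illustrative parameters and seed)
rng = np.random.default_rng(7)
phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
T = 500
r = np.zeros((T, 2))
for t in range(1, T):
    r[t] = phi @ r[t - 1] + rng.normal(size=2)

p0 = 6  # prespecified maximum order
table = np.array([criteria(r, i) for i in range(p0 + 1)])
aic_order, bic_order, hq_order = table.argmin(axis=0)
print(aic_order, bic_order, hq_order)
```

[Because BIC penalizes each extra lag more heavily than AIC once ln(T) > 2, the order chosen by BIC is never larger than the order chosen by AIC on the same data.]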

### Example 8.4. 

Assuming that the bivariate series of monthly log returns of IBM stock and the S&P 500 index from example 8.1 follows a VAR model, Tsay applies the M(i) statistics and AIC to the data. Table 8.3 shows the results of these statistics. Both statistics indicate that a VAR(5) model might be adequate for the data. Table 8.3's caption says "The 5% and 1% critical values of a chi-squared distribution with 4 degrees of freedom are 9.5 and 13.3." The M(i) statistics are marginally significant at lags 1, 3, and 5 at the 5% level [10.76, 10.34, 12.07 are all > 9.5]. The minimum of AIC [6.782] occurs at order 5. For this particular instance, the M(i) statistic is only marginally significant at the 1% level when i = 2 [M(2) = 13.41 > 13.3 for 1% significance.], confirming the previous observation that the dynamic linear dependence between the two return series is weak.

[So it seems Tsay favors the 5% significance level, which allows him to choose among lags 1, 3, and 5, and AIC selects lag 5.]

![image.png](attachment:image.png)



### Estimation and Model Checking

[Estimation:]
For a specified VAR model, one can estimate the parameters using either the **OLS method** or the **ML method**. The **two methods are asymptotically equivalent**. Under some regularity conditions, the **estimates are asymptotically normal**; see Reinsel (1993). [Means that the estimates are normally distributed around true values.]

[Model checking for adequacy:]
A fitted model should then be checked carefully for any possible inadequacy. The $\large Q_k(m)$ statistic can be applied to the residual series to check the assumption that there are **no serial or cross correlations in the residuals**. For a fitted VAR(p) model, the $\large Q_k(m)$ statistic of the residuals is **asymptotically a chi-squared distribution** with $\large k^2m − g$ **degrees of freedom**, where g is the number of estimated parameters in the AR coefficient matrices; see Lutkepohl (2005).


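[Notation:
- k = number of assets (components)
- m = number of lags of serial and cross correlation tested by the Q statistic
- g = number of estimated parameters in the AR coefficient matrices (as stated above; the constant $\large \boldsymbol{\phi}_0$ is not an AR coefficient matrix, so it is not counted)]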

### Example 8.4 (Continued). 

![image.png](attachment:image.png)

Table 8.4(a) shows the estimation results of a VAR(5) model for the bivariate series of monthly log returns of IBM stock and the S&P 500 index. The specified model is in the form:

$$\large \boldsymbol{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1\boldsymbol{r}_{t−1} + \boldsymbol{\phi}_2\boldsymbol{r}_{t−2} + \boldsymbol{\phi}_3\boldsymbol{r}_{t−3} + \boldsymbol{\phi}_5\boldsymbol{r}_{t−5} + \boldsymbol{a}_t, \;\; (8.20)$$

![image-2.png](attachment:image-2.png)

where 
- the first component of $\large \boldsymbol{r}_t$ denotes IBM stock returns. 
- this particular instance does not use AR coefficient matrix at lag 4 because of the weak serial dependence of the data. 

In general, when the M(i) statistics and the AIC criterion specify a VAR(5) model, all five AR lags should be used. 

Table 8.4(b) shows the estimation results after some statistically insignificant parameters are set to zero. [Six AR **coefficient** parameters are not set to zero. These six, not the subscripts i = 0 to 5 of $\large \boldsymbol{\phi}_i$, give the g = 6 that sets the DOF for the chi-squared test.] The $\large Q_k(m)$ statistics of the residual series for the fitted model in Table 8.4(b) give $\large Q_2(4) = 16.64$ and $\large Q_2(8) = 31.55$. Since the fitted VAR(5) model has **six parameters *in* the AR coefficient matrices**, these two $\large Q_k(m)$ statistics are asymptotically chi-squared with 10 and 26 degrees of freedom [$\large (k=2)^2(m=4)-(g=6) = 10$ and $\large (k=2)^2(m=8)-(g=6) = 26$], respectively. The p-values of the test statistics are 0.083 and 0.208, and hence the fitted model is adequate [cannot reject the null hypothesis of no serial or cross correlation in the residuals] at the 5% significance level. As shown by the univariate analysis [elsewhere?], the return series are likely to have **conditional heteroscedasticity**. We discuss **multivariate volatility** in Chapter 10.
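[A small arithmetic check of the degrees of freedom and p-values quoted above. This is a sketch using only the standard library; the helper name `chi2_sf_even` is hypothetical. It relies on the closed form for the chi-squared upper tail that holds when the degrees of freedom are even, which is the case for both 10 and 26.]

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for chi-squared X with even df.

    For df = 2n the tail equals exp(-x/2) * sum_{j<n} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

k, g = 2, 6  # two return series, six nonzero AR coefficients
results = {}
for m, q in [(4, 16.64), (8, 31.55)]:
    df = k * k * m - g
    results[m] = (df, chi2_sf_even(q, df))
    print(m, df, round(results[m][1], 4))
```

[The computed p-values should land close to the quoted 0.083 and 0.208, both above 0.05, so the null of no residual serial or cross correlation is not rejected at the 5% level.]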

From the fitted model in Table 8.4(b), Tsay makes the following observations: 
- (a) The concurrent correlation coefficient between the two innovational series is ...

$$ \large \frac{\Sigma_{12}}{\sqrt{\Sigma_{11}\Sigma_{22}}} = \frac{24}{\sqrt{48 \times 30}} \approx 0.63$$

... which, as expected, is close to the sample correlation coefficient between $\large r_{1t}$ and $\large r_{2t}$ . 

- (b) The two log return series have positive and significant means, implying that the log prices of the two series had an upward trend over the data span. [See the SCA Demonstration output below.]
- (c) The model shows that

$$\large IBM_t = 1.0 + 0.13 SP5_{t-1} - 0.09 SP5_{t-2} + 0.09 SP5_{t-5} + a_{1t} $$
$$\large SP5_t = 0.4 + 0.08 SP5_{t-1} - 0.06 SP5_{t-3} + 0.09 SP5_{t-5} + a_{2t} $$ 


Consequently, at the 5% significance level, 
- there is a unidirectional dynamic relationship from the monthly S&P 500 index return to the IBM return. If the S&P 500 index represents the U.S. stock market, then IBM return is affected by the past movements of the market.
- However, past movements of IBM stock returns do not significantly affect the U.S. market, even though the two returns have substantial concurrent correlation. 

Finally, the fitted model can be written as ...

$$\large
\begin{bmatrix} IBM_t \\ SP5_t \end{bmatrix}
=
\begin{bmatrix} 1.0 \\ 0.4 \end{bmatrix}
+
\begin{bmatrix} 0.13 \\ 0.08 \end{bmatrix} SP5_{t-1}
-
\begin{bmatrix} 0.09 \\ 0 \end{bmatrix} SP5_{t-2}
-
\begin{bmatrix} 0 \\ 0.06 \end{bmatrix} SP5_{t-3}
+
\begin{bmatrix} 0.09 \\ 0.09 \end{bmatrix} SP5_{t-5}
+
\begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}
$$

![image-3.png](attachment:image-3.png)

... indicating that $\large SP5_t$ is the driving factor of the bivariate series.
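[A minimal sketch that encodes the fitted model as coefficient matrices and reads off the lead-lag structure. The matrices are transcribed from the displayed fitted model, with rows and columns ordered (IBM, SP5); the dictionary layout and the helper name `affects` are assumptions for illustration.]

```python
import numpy as np

# AR coefficient matrices of the refined model; lag 4 is omitted in (8.20).
# Entry [i, j] is the effect of series j at that lag on series i.
phi = {
    1: np.array([[0.0, 0.13], [0.0, 0.08]]),
    2: np.array([[0.0, -0.09], [0.0, 0.0]]),
    3: np.array([[0.0, 0.0], [0.0, -0.06]]),
    5: np.array([[0.0, 0.09], [0.0, 0.09]]),
}
names = ["IBM", "SP5"]

def affects(source, target):
    """True if any lagged coefficient of `source` in the `target` equation is nonzero."""
    j, i = names.index(source), names.index(target)
    return any(mat[i, j] != 0 for mat in phi.values())

# Number of nonzero AR coefficients, the g used in the Q_k(m) degrees of freedom
g = sum(int(np.count_nonzero(mat)) for mat in phi.values())

print(affects("SP5", "IBM"), affects("IBM", "SP5"), g)  # → True False 6
```

[The IBM column is zero at every lag, which is the unidirectional relationship described above: past S&P 500 returns enter the IBM equation, but past IBM returns enter neither equation.]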


### Forecasting
Treating a properly built model as the **true model**, one can apply the **same techniques** as those in the univariate analysis to produce **forecasts** and **standard deviations of** the associated **forecast errors**. 

For a **VAR(p)** model, 

the **1-step-ahead** forecast at the **time origin** **h** is: 

[Actual would be denoted $\large \boldsymbol{r}_{h+1}$; this is forecast notation.]

$$\large \boldsymbol{r}_h(1) = \boldsymbol{\phi}_0 + \sum_{i=1}^p \boldsymbol{\phi}_i \boldsymbol{r}_{h+1-i}$$

![image.png](attachment:image.png)

and the associated forecast error is:

$$\large \boldsymbol{e}_h(1) = \boldsymbol{a}_{h+1}$$

![image-2.png](attachment:image-2.png)

[ That is, $\large \boldsymbol{r}_{h+1} = \boldsymbol{r}_h(1) + \boldsymbol{a}_{h+1} \longrightarrow \boldsymbol{a}_{h+1} = \boldsymbol{r}_{h+1} - \boldsymbol{r}_h(1) $: the 1-step forecast error is the actual return minus its forecast. ]

and the covariance matrix of the forecast error is:

$$\large \boldsymbol{\Sigma}$$

and for the 2-step-ahead forecast, replace $\large \boldsymbol{r}_{h+1}$ by its forecast $\large \boldsymbol{r}_h(1)$ and use the observed data for the remaining lags, in order to obtain:

$$\large \boldsymbol{r}_h(2) = \boldsymbol{\phi}_0 + \boldsymbol{\phi}_1 \boldsymbol{r}_h(1) + \sum_{i=2}^p \boldsymbol{\phi}_i \boldsymbol{r}_{h+2-i}$$

![image-3.png](attachment:image-3.png)

[ The data $\large \boldsymbol{r}_{h+2-i}$ for i ≥ 2 are already observed at time h. ]

and the associated forecast error is:

$$\large \boldsymbol{e}_h(2) = \boldsymbol{a}_{h+2} + \boldsymbol{\phi}_1 [\boldsymbol{r}_{h+1} - \boldsymbol{r}_h(1)] = \boldsymbol{a}_{h+2} + \boldsymbol{\phi}_1 \boldsymbol{a}_{h+1}$$

![image-4.png](attachment:image-4.png)

[ The text prints $\large [\boldsymbol{r}_t - \boldsymbol{r}_h(1)]$ here, which appears to be a typo for $\large [\boldsymbol{r}_{h+1} - \boldsymbol{r}_h(1)]$: only the latter equals $\large \boldsymbol{a}_{h+1}$, as the second equality requires. ]

The covariance matrix of the forecast error [assume Tsay means "for 2-step ahead forecast"] is:

$$\large \boldsymbol{\Sigma} + \boldsymbol{\phi}_1 \boldsymbol{\Sigma} \boldsymbol{\phi}_1'$$

![image-5.png](attachment:image-5.png)

[He doesn't give this new covariance matrix a name, though it is built the same way as the 1-step one: each additional step ahead adds one more term to the sum.]


If $\large \boldsymbol{r}_t$ is weakly stationary, then 
- the $\large \ell$-step-ahead forecast $\large \boldsymbol{r}_h(\ell)$ converges to its mean vector $\large \boldsymbol{\mu}$ as the forecast horizon $\large \ell$ increases and 
- the covariance matrix of its forecast error converges to the covariance matrix of $\large \boldsymbol{r}_t$ [that is, $\large \boldsymbol{\Gamma}_0$].

![image-6.png](attachment:image-6.png)
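[A minimal sketch of the forecast recursion and the forecast-error covariances for a hypothetical stationary VAR(2). All parameter values, the starting history, and the function names are assumptions for illustration. The error covariance at horizon $\ell$ is accumulated as $\sum_{j=0}^{\ell-1} \boldsymbol{\psi}_j \boldsymbol{\Sigma} \boldsymbol{\psi}_j'$ with $\boldsymbol{\psi}_0 = \boldsymbol{I}$ and $\boldsymbol{\psi}_1 = \boldsymbol{\phi}_1$, which reduces to the 1-step and 2-step expressions given above.]

```python
import numpy as np

# Hypothetical stationary VAR(2) (illustrative values)
phi0 = np.array([0.5, 0.2])
phis = [np.array([[0.5, 0.1], [0.4, 0.5]]),
        np.array([[0.0, 0.0], [0.25, 0.0]])]
sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
history = [np.array([1.0, -0.5]), np.array([0.3, 0.7])]  # r_{h-1}, r_h

def forecasts(phi0, phis, history, steps):
    """Recursive forecasts r_h(1), ..., r_h(steps); forecasts replace unknown future values."""
    path = list(history)
    for _ in range(steps):
        nxt = phi0 + sum(phi @ path[-i] for i, phi in enumerate(phis, start=1))
        path.append(nxt)
    return path[len(history):]

def error_covariances(phis, sigma, steps):
    """Covariance matrices of e_h(1), ..., e_h(steps) via the psi-weights."""
    k = sigma.shape[0]
    psi = [np.eye(k)]
    covs, running = [], np.zeros((k, k))
    for _ in range(steps):
        running = running + psi[-1] @ sigma @ psi[-1].T
        covs.append(running)
        j = len(psi)
        psi.append(sum(phis[i - 1] @ psi[j - i] for i in range(1, min(j, len(phis)) + 1)))
    return covs

f = forecasts(phi0, phis, history, 200)
covs = error_covariances(phis, sigma, 200)
mu = np.linalg.solve(np.eye(2) - phis[0] - phis[1], phi0)  # mean vector of r_t

print(np.allclose(covs[0], sigma))
print(np.allclose(covs[1], sigma + phis[0] @ sigma @ phis[0].T))
print(np.allclose(f[-1], mu))
```

[At long horizons the forecast settles at the mean vector and the error covariance stops changing; that limit is the covariance matrix of $\boldsymbol{r}_t$.]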

Table 8.5 provides 1-step- to 6-step-ahead forecasts of IBM and S&P 500 monthly percentage log returns at the forecast origin h = 996 obtained by the refined VAR(5) model in Table 8.4(b). 
- The standard errors of the forecasts converge to the sample standard errors 7.03 and 5.53.  [See the SCA Demonstration output below for those identical figures.]

Summary of building a VAR model in three steps: 
- 1. Use the test statistic M(i) or an information criterion to identify the order.
- 2. Estimate the specified model by using the least-squares method and, if needed, reestimate the model after removing statistically insignificant parameters.
- 3. Use the $\large Q_k(m)$ statistic of the residuals to check the adequacy of a fitted model. 

Other characteristics of the residual series, such as conditional heteroscedasticity and outliers, can also be checked for adequacy of the model. 

If the fitted model is adequate, then 
- obtain forecasts and 
- make inference concerning the dynamic relationship between the variables.

SCA was used to perform this section's analysis; the commands used include miden, mtsm, mest, and mfore, where the prefix m stands for multivariate. 

![image-7.png](attachment:image-7.png)
![image-8.png](attachment:image-8.png)
![image-9.png](attachment:image-9.png)
![image-10.png](attachment:image-10.png)