## Chapter 8 Appendix A: Review of vectors and matrices


Review of algebra and properties of vectors and matrices:

An m x n real-valued matrix is an m x n array of real numbers. 

For example:

$$\large A = 
\begin{bmatrix}
2 & 5 & 8\\
-1 & 3 & 4
\end{bmatrix}
$$


is a 2 x 3 matrix with 2 rows and 3 columns.  

Generally, an m x n matrix is written:

(8.46)

$$\large A \equiv [a_{ij}] =
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1,n-1} & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2,n-1} & a_{2n} \\
\vdots & \vdots &  & \vdots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{m,n-1} & a_{mn} \\
\end{bmatrix}
$$

where:
- m and n are positive integers denoting row dimension and column dimension of A
- $a_{ij}$ is referred to as the (i,j)th element of A.  
- $a_{ii}$ are the **diagonal** elements of the matrix.
- an m x 1 matrix forms an m-dimensional **column vector**.
- a 1 x n matrix is an n-dimensional **row vector**.
- in the literature, a vector is usually taken to be a column vector.
- if m = n then the matrix is a **square matrix**.
- if m = n and $a_{ij} = 0$ for i ≠ j, then A is a **diagonal matrix**.
- if $a_{ij} = 0$ for i ≠ j and $a_{ii} = 1$ for all i, then A is the m x m **identity matrix**, commonly denoted by $I_m$ or **I** if the dimension is clear.
- the notation $A' = [a_{ij}']$ denotes the transpose of A:
    - from the definition, $a_{ij}' = a_{ji}$
    - (A')' = A
    - if A' = A then A is a **symmetric matrix**.
- the n x m matrix A' is the **transpose** of the matrix A [notice m is still the first subscript]:

$$\large A' =
\begin{bmatrix}
a_{11} & a_{21} & \dots & a_{m-1,1} & a_{m1} \\
a_{12} & a_{22} & \dots & a_{m-1,2} & a_{m2} \\
\vdots & \vdots &  & \vdots & \vdots \\
a_{1n} & a_{2n} & \dots & a_{m-1,n} & a_{mn} \\
\end{bmatrix}
$$


For example:

$$\large
\begin{bmatrix}
2 & -1 \\
5 & 3 \\
8 & 4 \\
\end{bmatrix}
\text{ is the transpose of }
\begin{bmatrix}
2 & 5 & 8\\
-1 & 3 & 4
\end{bmatrix}
$$
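As a quick check of the transpose rules, here is a minimal sketch that assumes NumPy is available. The variable names (`A`, `At`, `S`) are illustrative only and are not part of the text.

```python
import numpy as np

# The 2 x 3 example matrix from the text.
A = np.array([[2, 5, 8],
              [-1, 3, 4]])

# Transpose: element (i, j) of A' is element (j, i) of A.
At = A.T
print(A.shape, At.shape)
print(At)

# Transposing twice returns the original matrix: (A')' = A.
double_transpose_ok = np.array_equal(At.T, A)

# A square matrix is symmetric when it equals its own transpose.
S = np.array([[2, 1],
              [1, 2]])
is_symmetric = np.array_equal(S, S.T)
print(double_transpose_ok, is_symmetric)
```

The `.T` attribute returns a view with the axes swapped, so the 2 x 3 array becomes 3 x 2 without copying data.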


### Basic operations

Suppose $A = [a_{ij}]_{\text{m x n}}$ and $C = [c_{ij}]_{\text{p x q}}$ are two matrices with dimensions given in their subscripts.  Let *b* be a real number.  Define the basic matrix operations:

- Addition: $A + C = [a_{ij} + c_{ij}]_{\text{m x n}}$ if m = p and n = q.
- Subtraction: $A - C = [a_{ij} - c_{ij}]_{\text{m x n}}$ if m = p and n = q.
- Scalar multiplication: $bA = [ba_{ij}]_{\text{m x n}}$
- Multiplication: $AC = [ \sum_{v=1}^n a_{iv}c_{vj} ]_{\text{m x q}}$ provided n = p.

When the dimensions of matrices satisfy the condition for multiplication to take place, the two matrices are said to be **conformable**.  For example:


$$\large
\begin{align}
\begin{bmatrix}
2 & 1 \\
1 & 1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 3\\
-1 & 2 & -4
\end{bmatrix}
&=
\begin{bmatrix}
2 \times 1 - 1 \times 1 & 2 \times 2 + 1 \times 2 & 2 \times 3 - 1 \times 4\\
1 \times 1 - 1 \times 1 & 1 \times 2 + 1 \times 2 & 1 \times 3 - 1 \times 4
\end{bmatrix}\\
&=
\begin{bmatrix}
1 & 6 & 2\\
0 & 4 & -1
\end{bmatrix}
\end{align}
$$

Important rules of matrix operations include:
- (a) (AC)' = C'A'
- (b) AC ≠ CA in general
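The worked product and the two rules can be reproduced numerically. This is a sketch assuming NumPy. The matrix `B` is a hypothetical extra example chosen only to show that the order of multiplication matters.

```python
import numpy as np

# The two matrices from the worked example (2 x 2 times 2 x 3).
A = np.array([[2, 1],
              [1, 1]])
C = np.array([[1, 2, 3],
              [-1, 2, -4]])

# Matrix product: defined because A has 2 columns and C has 2 rows.
AC = A @ C
print(AC)

# Rule (a): the transpose of a product reverses the order.
rule_a_holds = np.array_equal(AC.T, C.T @ A.T)

# Rule (b): multiplication is not commutative in general.
# B is an illustrative 2 x 2 matrix, not taken from the text.
B = np.array([[0, 1],
              [1, 0]])
commutes = np.array_equal(A @ B, B @ A)
print(rule_a_holds, commutes)

# Addition and scalar multiplication act element by element.
print(A + B)
print(3 * A)
```

Note that `C @ A` would raise an error here, since a 2 x 3 matrix and a 2 x 2 matrix are not conformable in that order.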

### Inverse, Trace, Eigenvalue and Eigenvector

A square matrix $A_{\text{m x m}}$ is **nonsingular** or **invertible** 
- if there exists a unique matrix $C_{\text{m x m}}$ such that $AC = CA = I_m$ = the m x m identity matrix.  
- **C** is called the inverse matrix of **A** 
- **C** is denoted C = $A^{-1}$

The **trace** of $A_{\text{m x m}}$ is the sum of its diagonal elements:

$$\large \text{tr(A) = }\sum_{i=1}^m a_{ii}$$

[The trace is defined only for square matrices, so there is no short-and-wide or tall-and-thin case to handle, and the name of the summation index (i or j) is immaterial.]

The trace satisfies:

- $\large tr(A + C) = tr(A) + tr(C)$
- $\large tr(A) = tr(A')$
- $\large tr(AC) = tr(CA)$ provided that the two matrices are conformable.
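The inverse and the trace rules are easy to check numerically. The sketch below assumes NumPy. `A` and `B` are illustrative 2 x 2 matrices chosen for this check.

```python
import numpy as np

# Illustrative 2 x 2 matrices (chosen for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Inverse: A_inv satisfies A A_inv = A_inv A = I.
A_inv = np.linalg.inv(A)
inverse_ok = (np.allclose(A @ A_inv, np.eye(2))
              and np.allclose(A_inv @ A, np.eye(2)))

# Trace: sum of the diagonal elements.
tr_A = np.trace(A)

# tr(A + B) = tr(A) + tr(B)
sum_rule = np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
# tr(A) = tr(A')
transpose_rule = np.isclose(np.trace(A), np.trace(A.T))
# tr(AB) = tr(BA), even though AB and BA differ
cyclic_rule = np.isclose(np.trace(A @ B), np.trace(B @ A))

print(inverse_ok, tr_A, sum_rule, transpose_rule, cyclic_rule)
```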

A number $\large \lambda$ and an m x 1 vector **b**, possibly complex valued, are a **right eigenvalue** and **eigenvector** pair of the matrix **A** if $\mathbf{Ab} = \lambda \mathbf{b}$:
- there are m possible eigenvalues for the matrix **A**
- for a real-valued matrix **A**, complex eigenvalues occur in conjugate pairs.
- the matrix **A** is nonsingular if and only if all eigenvalues are nonzero.
- denote eigenvalues 

$\large \{\lambda_i| i = 1, \ldots, m\}$:

- [relationship of trace to eigenvalues:]

    $$\large \text{tr(A) =} \sum_{i=1}^m a_{ii} = \sum_{i=1}^m \lambda_i$$
    

- **determinant** of matrix A can be defined as 

    $$\large \lvert A \rvert = \prod_{i=1}^m \lambda_i$$
    
   
Graybill (1969) is the reference text for this material, in particular for the determinant.

- the **rank** of a matrix $A_{\text{m x m}}$ is the number of nonzero eigenvalues of the symmetric matrix **AA'**.  [So form AA', count its nonzero eigenvalues, and that count is the rank of A.]

- For a nonsingular matrix A, the inverse of the transpose equals the transpose of the inverse:

$$\large (A^{-1})' = (A')^{-1}$$
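The eigenvalue relations above can be checked with a small numerical sketch, assuming NumPy. The matrices `A` and `R` are illustrative choices, and the tolerance `1e-10` used to decide whether an eigenvalue counts as nonzero is an assumption of this sketch.

```python
import numpy as np

# Illustrative non-symmetric matrix with real eigenvalues 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)

# Each column b of eigvecs satisfies A b = lambda b.
pairs_ok = all(np.allclose(A @ b, lam * b)
               for lam, b in zip(eigvals, eigvecs.T))

# Trace is the sum of the eigenvalues; determinant is their product.
trace_ok = np.isclose(np.trace(A), eigvals.sum())
det_ok = np.isclose(np.linalg.det(A), eigvals.prod())

# Rank as the number of nonzero eigenvalues of the symmetric matrix RR'.
# R is singular: its second row is twice its first row.
R = np.array([[1.0, 2.0],
              [2.0, 4.0]])
ev = np.linalg.eigvalsh(R @ R.T)
rank = int(np.sum(np.abs(ev) > 1e-10))

# The inverse of the transpose equals the transpose of the inverse.
inv_transpose_ok = np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))

print(pairs_ok, trace_ok, det_ok, rank, inv_transpose_ok)
```

In practice `np.linalg.matrix_rank` is the usual way to get the rank. The eigenvalue count is shown here only because it mirrors the definition in the text.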

### Positive-Definite matrix

A square matrix A (m x m) is a **positive definite** matrix if 
- A is symmetric and
- all eigenvalues of A are positive

Alternatively, a square (m x m) matrix A, assumed symmetric as above, is a **positive definite** matrix if **b'Ab** > 0 for every nonzero m-dimensional vector b.  [A has to be square here, not rectangular, because the same vector b multiplies A on both sides. To satisfy myself that b'Ab is a real value and not a matrix, I considered this:
- b is m x 1, so b can multiply A from the right only if A has m columns
- b' is 1 x m, so b' can multiply A from the left only if A has m rows
- so **b'A** is [1 x m][m x m] = [1 x m] and **b'Ab** is [1 x m][m x m][m x 1] = 1 x 1, a scalar that can be evaluated as > 0 or not.]
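The two characterizations of positive definiteness can be compared numerically. This sketch assumes NumPy. The matrices `A` and `N`, the random seed, and the number of trial vectors are all illustrative choices. Sampling random vectors only spot-checks the quadratic form and does not prove the property.

```python
import numpy as np

# Illustrative symmetric matrix (eigenvalues 3 and 1).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Definition 1: symmetric with all eigenvalues positive.
is_pd = np.allclose(A, A.T) and bool(np.all(np.linalg.eigvalsh(A) > 0))

# Definition 2: b'Ab > 0 for nonzero b (spot check on random vectors).
rng = np.random.default_rng(0)
quad_forms = []
for _ in range(1000):
    b = rng.standard_normal((2, 1))   # m x 1 column vector
    q = b.T @ A @ b                   # [1 x m][m x m][m x 1] = 1 x 1
    quad_forms.append(q.item())
all_positive = all(q > 0 for q in quad_forms)

# A symmetric matrix that is NOT positive definite (eigenvalues 3 and -1).
N = np.array([[1.0, 2.0],
              [2.0, 1.0]])
b = np.array([[1.0],
              [-1.0]])
q_bad = (b.T @ N @ b).item()          # negative quadratic form

print(is_pd, all_positive, q_bad)
```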

Properties of **positive definite matrix A**
- all eigenvalues of **A** are real and positive
- the matrix can be decomposed in what is referred to as the **spectral decomposition** of the matrix A:

$$\large A = P \Lambda P'$$


where:
- $\large \mathbf{\Lambda}$ is diagonal matrix consisting of all eigenvalues [$\lambda_i$] of A 
- $\large \mathbf{P}$ is an m x m matrix of the m right eigenvectors of A.

It is common to write the eigenvalues as:

$$\large \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$$

and the eigenvectors as 

$$\large e_1, \cdots, e_m$$

such that 

$$\large Ae_i = \lambda_i e_i$$ 

... and the eigenvectors are of unit length and orthogonal, so the matrix P [that holds them as its columns] is referred to as an **orthogonal matrix**.  [Each eigenvector is normalized to unit length by dividing it by its own norm; the $\lambda_i$ in $\Lambda$ play no part in the normalization.]

$$\large e_i'e_i = 1$$

$$\large e_i'e_j = 0 \text{ if i ≠ j and if the eigenvalues are distinct}$$ 

For example, the matrix $\large \Sigma$ is **positive definite**:

$$\large \Sigma =
\begin{bmatrix}
2 & 1 \\
1 & 2 \\
\end{bmatrix}$$


- these simple calculations show that 
    - 3 and 1 are eigenvalues of $\large \Sigma$ 
    - with normalized eigenvectors $\large (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})'$ and $\large (\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}})'$ respectively:

[here are those simple calculations that show the eigenvalues and eigenvectors relating to each other and to $\large \Sigma$; the vectors are left unnormalized, which does not affect the eigenvalue equation]:

$$\large
\begin{bmatrix}
2 & 1 \\
1 & 2 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
1 \\
\end{bmatrix}=
3
\begin{bmatrix}
1 \\
1 \\
\end{bmatrix}$$


$$\large
\begin{bmatrix}
2 & 1 \\
1 & 2 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
-1 \\
\end{bmatrix}=
1
\begin{bmatrix}
1 \\
-1 \\
\end{bmatrix}$$

The spectral decomposition $\large \Sigma = P \Lambda P'$ holds, as one can verify that


$$\large
\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\end{bmatrix}
\begin{bmatrix}
3 & 0 \\
0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\end{bmatrix}
=
\begin{bmatrix}
2 & 1 \\
1 & 2 \\
\end{bmatrix}
$$

[where:]
- [the eigenvectors are normalized to unit length:]  
$\large \sqrt{(\frac{1}{\sqrt{2}})^2 + (\frac{1}{\sqrt{2}})^2} = 1$   
$\large \sqrt{(\frac{1}{\sqrt{2}})^2 + (-\frac{1}{\sqrt{2}})^2} = 1$ 

- [the eigenvectors are orthogonal:]   
$\large (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}) (\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}})' = 0$
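
The spectral decomposition of the 2 x 2 matrix with rows (2, 1) and (1, 2), the matrix used in the eigenvalue calculations above, can be sketched as follows, assuming NumPy. `np.linalg.eigh` returns eigenvalues in ascending order, so the sketch reorders them to match the descending convention in the text. The signs of the computed eigenvectors may differ from those written above, which does not affect the decomposition.

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# eigh is intended for symmetric matrices; eigenvalues come back ascending.
lam, P = np.linalg.eigh(Sigma)

# Reorder so that lambda_1 >= lambda_2, as in the text.
order = np.argsort(lam)[::-1]
lam = lam[order]
P = P[:, order]          # columns of P are the unit-length eigenvectors
Lambda = np.diag(lam)

# Spectral decomposition: Sigma = P Lambda P'.
reconstructed = P @ Lambda @ P.T

# P is orthogonal: P'P = I (unit length and mutually orthogonal columns).
orthogonal_ok = np.allclose(P.T @ P, np.eye(2))

print(lam)
print(np.allclose(reconstructed, Sigma), orthogonal_ok)
```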


#### Cholesky decomposition

For a symmetric matrix **A** there exists 
- a lower triangular matrix **L** with diagonal elements being 1, and
- a diagonal matrix **G** such that [see Strang (1980)]:

$$\large A = LGL'$$

where:
- if **A** is positive definite, then the diagonal elements of **G** are positive [and the decomposition proceeds further] to what is called the **Cholesky decomposition** of A:

$$\large \mathbf A = L\sqrt{G}\sqrt{G}L' = (L\sqrt{G})(L\sqrt{G})'$$

- $\large \mathbf L \sqrt{G}$ is **also** a lower triangular matrix
- the square root is taken element by element.
- the decomposition shows that a positive-definite matrix **A** can be diagonalized as 

$$\large \mathbf L^{-1}A(L')^{-1} = L^{-1} A (L^{-1})' = G$$

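[Note that G is diagonal, but its entries are in general not the eigenvalues $\large \lambda_i$ of A, and L is not the orthogonal eigenvector matrix P of the spectral decomposition. Both $\large A = P \Lambda P'$ and $\large A = LGL'$ reduce A to a diagonal matrix, but by different transformations: the first uses an orthogonal P and yields the eigenvalues in $\large \Lambda$, while the second uses a unit lower triangular L and yields the diagonal of G. In the 2 x 2 example that follows, the diagonal of G holds 2 and 1.5 while the eigenvalues are 3 and 1.]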

Since **L** is a lower triangular matrix with unit diagonal elements, $\large L^{-1}$ is also a lower triangular matrix with unit diagonal elements.  Consider e.g. the prior 2 x 2 matrix $\large \Sigma$ for which it's easy to verify that these satisfy $\large \mathbf \Sigma = LGL'$

$$\large 
\mathbf{L} = 
\begin{bmatrix}
1.0 & 0.0 \\
0.5 & 1.0 \\
\end{bmatrix}
\text{ and }
\mathbf{G} = 
\begin{bmatrix}
2.0 & 0.0 \\
0.0 & 1.5 \\
\end{bmatrix}
$$

Additionally:

$$\large 
\mathbf{L}^{-1} = 
\begin{bmatrix}
1.0 & 0.0 \\
-0.5 & 1.0 \\
\end{bmatrix}
\text{ and }
\mathbf L^{-1}\Sigma(L^{-1})' = G$$
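
One way to obtain L and G numerically is to start from the Cholesky factor and rescale its columns. The sketch below assumes NumPy and uses the 2 x 2 matrix with rows (2, 1) and (1, 2). The name `M` for the Cholesky factor $L\sqrt{G}$ is introduced only for this illustration.

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Cholesky factor: lower triangular M with Sigma = M M'.
# In the notation of the text, M = L sqrt(G).
M = np.linalg.cholesky(Sigma)

# The diagonal of M holds the square roots of the diagonal of G.
d = np.diag(M)
G = np.diag(d ** 2)

# Dividing column j of M by d[j] gives L with unit diagonal.
L = M / d

# Checks: Sigma = L G L' and L^{-1} Sigma (L^{-1})' = G.
L_inv = np.linalg.inv(L)
ldl_ok = np.allclose(L @ G @ L.T, Sigma)
diag_ok = np.allclose(L_inv @ Sigma @ L_inv.T, G)

print(L)
print(G)
print(L_inv)
print(ldl_ok, diag_ok)
```

Here the division `M / d` relies on NumPy broadcasting, which divides every row of `M` element by element by `d`, so each column is scaled by its own diagonal entry.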

### Vectorization and Kronecker Product

Writing an m x n matrix **A** in its columns as:

$$\large \mathbf A = [a_1, \ldots, a_n]$$

[here, you are looking at the matrix represented by matrix columns denoted by $\mathbf a_1, \ldots, a_n$ in a list denoted by brackets.]

define **the stacking operation** ["*vectorization*"] as: 

$$\large \text{vec(A) } = (a_1', a_2', \ldots, a_n')'$$ 

[here each column $\mathbf a_1, \ldots, a_n$ is transposed into a row, the rows are laid end to end, and the outer ' transposes the result back into one tall column. The last index is n, the number of columns of **A**, not m.]

which is an *mn x 1* vector.

[Parentheses () denote vectors and brackets [] denote matrices. Written out element by element, stacking the columns gives the following, which is also what the worked example further on produces:
- $\large \text{vec(A) } = (a_1', a_2', \ldots, a_n')' = ((a_{11}, a_{21}, \ldots, a_{m1}),(a_{12}, a_{22}, \ldots, a_{m2}), \ldots, (a_{1n}, a_{2n}, \ldots, a_{mn}))'$ 
- Stacking the rows instead, $\large ((a_{11}, a_{12}, \ldots, a_{1n}), \ldots, (a_{m1}, a_{m2}, \ldots, a_{mn}))'$, gives vec(**A**'), not vec(**A**).
- This is not a matrix but a vector with mn rows and 1 column.  
- In the representations directly above, the ' outside the $(\ldots)$ indicates a transpose: the vector is written on its side in the book, but it is mn x 1, that is, mn tall and 1 wide.]

For 2 matrices $\large \mathbf{A}_{\text{m x n}}$ and $\large \mathbf{C}_{\text{p x q}}$, the **Kronecker product** between **A** and **C** is:

$$\large \mathbf{A} \bigotimes \mathbf{C} = 
\begin{bmatrix}
a_{11}\mathbf{C} & a_{12}\mathbf{C} & \cdots & a_{1n}\mathbf{C}\\
a_{21}\mathbf{C} & a_{22}\mathbf{C} & \cdots & a_{2n}\mathbf{C}\\
\vdots & \vdots & & \vdots\\
a_{m1}\mathbf{C} & a_{m2}\mathbf{C} & \cdots & a_{mn}\mathbf{C}\\
\end{bmatrix}_{\text{mp x nq}}$$

For example, 

$$\large 
\mathbf{A} = 
\begin{bmatrix}
2 & 1 \\
-1 & 3 \\
\end{bmatrix}, \;\;
\mathbf{C} = 
\begin{bmatrix}
4 & -1 & 3 \\
-2 & 5 & 2 \\
\end{bmatrix}
$$

Then ...

$\large \text{vec} (\mathbf{A}) \text{= (2, -1, 1, 3)'}$

$\large \text{vec} (\mathbf{C}) \text{= (4, -2, -1, 5, 3, 2)'}$


$$\begin{align} \large \mathbf{A} \bigotimes \mathbf{C} &= 
\begin{bmatrix}
2 \times \begin{bmatrix}4 & -1 & 3 \\ -2 & 5 & 2 \end{bmatrix} & 1 \times \begin{bmatrix}4 & -1 & 3 \\ -2 & 5 & 2 \end{bmatrix} \\
-1 \times \begin{bmatrix}4 & -1 & 3 \\ -2 & 5 & 2 \end{bmatrix} & 3 \times \begin{bmatrix}4 & -1 & 3 \\  -2 & 5 & 2 \end{bmatrix} \end{bmatrix}\\
&= \begin{bmatrix}
\begin{bmatrix}8 & -2 & 6 \\ -4 & 10 & 4 \end{bmatrix} & \begin{bmatrix}4 & -1 & 3 \\ -2 & 5 & 2 \end{bmatrix} \\
\begin{bmatrix}-4 & 1 & -3 \\ 2 & -5 & -2 \end{bmatrix} & \begin{bmatrix}12 & -3 & 9 \\  -6 & 15 & 6 \end{bmatrix} \end{bmatrix}\\
&=\begin{bmatrix}
8 & -2 & 6 & 4 & -1 & 3 \\
-4 & 10 & 4 & -2 & 5 & 2 \\
-4 & 1 & -3 & 12 & -3 & 9 \\
2 & -5 & -2 & -6 & 15 & 6 
\end{bmatrix}
\end{align}
$$
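
The example above can be reproduced with a short sketch, assuming NumPy. The helper `vec` is a hypothetical name defined here. It returns a one-dimensional array standing in for the mn x 1 column vector.

```python
import numpy as np

A = np.array([[2, 1],
              [-1, 3]])
C = np.array([[4, -1, 3],
              [-2, 5, 2]])

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

vec_A = vec(A)
vec_C = vec(C)
print(vec_A)
print(vec_C)

# Kronecker product: each element a_ij of A is replaced by the block a_ij * C.
K = np.kron(A, C)
print(K.shape)   # (m*p, n*q)
print(K)
```

The `order="F"` argument asks for Fortran (column-major) order, which matches stacking columns. The default row-major order would instead give vec of the transpose.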

Assuming that the dimensions are appropriate, the following properties are available for the two operators:
1. $\large \mathbf{A} \bigotimes \mathbf{C} ≠ \mathbf{C} \bigotimes \mathbf{A} \text{ in general.}$   
2. $\large (\mathbf{A} \bigotimes \mathbf{C})' = \mathbf{A}' \bigotimes \mathbf{C}' $.
3. $\large \mathbf{A} \bigotimes (\mathbf{C + D}) =  \mathbf{A} \bigotimes \mathbf{C} + \mathbf{A} \bigotimes \mathbf{D}$.
4. $\large (\mathbf{A} \bigotimes \mathbf{C}) (\mathbf{F} \bigotimes \mathbf{G}) = (\mathbf{AF}) \bigotimes (\mathbf{CG})$.
5. $\large \text{If A and C are invertible, then } (\mathbf{A} \bigotimes \mathbf{C})^{-1} = \mathbf{A}^{-1} \bigotimes \mathbf{C}^{-1}$
6. $\large \text{For square matrices A and C, tr}(\mathbf{A} \bigotimes \mathbf{C}) = tr(\mathbf{A}) tr(\mathbf{C})$
7. vec(**A** + **C**) = vec(**A**) + vec(**C**).
8. $\large \text{vec(ABC)} = (C' \bigotimes A) \text{vec(B).}$
9. tr(**AC**) = vec(**C**')' vec(**A**) = vec(**A**')'vec(**C**).
10. $\large \begin{aligned} \text{tr(ABC)} &= \text{vec}(\mathbf{A}')'(\mathbf{C}' \bigotimes \mathbf{I}) \text{vec}(\mathbf{B}) = \text{vec}(\mathbf{A'})' (\mathbf{I} \bigotimes \mathbf{B})  \text{vec}(\mathbf{C})\\
&= \text{vec}(\mathbf{B}')'(\mathbf{A}' \bigotimes \mathbf{I}) \text{vec}(\mathbf{C}) = \text{vec}(\mathbf{B'})' (\mathbf{I} \bigotimes \mathbf{C})  \text{vec}(\mathbf{A})\\
&= \text{vec}(\mathbf{C}')'(\mathbf{B}' \bigotimes \mathbf{I}) \text{vec}(\mathbf{A}) = \text{vec}(\mathbf{C'})' (\mathbf{I} \bigotimes \mathbf{A})  \text{vec}(\mathbf{B}) \end{aligned}$
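
Several of these identities can be spot-checked on random matrices. The sketch below assumes NumPy. The matrix shapes, the seed, and the helper `vec` are illustrative choices, and a numerical check on one random draw is not a proof.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))
D = rng.standard_normal((3, 2))   # conformable with A for tr(AD)
S = rng.standard_normal((2, 2))   # square matrices for the trace rule
T = rng.standard_normal((3, 3))

checks = {
    # Property 2: (A kron C)' = A' kron C'
    "2": np.allclose(np.kron(A, C).T, np.kron(A.T, C.T)),
    # Property 6: tr(S kron T) = tr(S) tr(T)
    "6": np.isclose(np.trace(np.kron(S, T)), np.trace(S) * np.trace(T)),
    # Property 8: vec(ABC) = (C' kron A) vec(B)
    "8": np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)),
    # Property 9: tr(AD) = vec(D')' vec(A) = vec(A')' vec(D)
    "9": (np.isclose(np.trace(A @ D), vec(D.T) @ vec(A))
          and np.isclose(np.trace(A @ D), vec(A.T) @ vec(D))),
    # Property 10 (first line): tr(ABC) = vec(A')'(C' kron I) vec(B)
    #                                    = vec(A')'(I kron B) vec(C)
    "10": (np.isclose(np.trace(A @ B @ C),
                      vec(A.T) @ np.kron(C.T, np.eye(3)) @ vec(B))
           and np.isclose(np.trace(A @ B @ C),
                          vec(A.T) @ np.kron(np.eye(2), B) @ vec(C))),
}
print(checks)
```

The identity matrices in property 10 take whatever size makes the product conformable: 3 x 3 in the first form (the row count of B) and 2 x 2 in the second (the column count of C).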


Multivariate statistical analysis often deals with symmetric matrices.  It is therefore convenient to generalize the stacking operation to the **half-stacking** operation that consists of elements on or below the main diagonal.  Specifically, for a symmetric square matrix $\large \mathbf{A} = [a_{ij}]_{\text{k x k}}$ define:

$$\large \text{vech}(\mathbf{A}) = (\mathbf{a}_{1*}', \mathbf{a}_{2*}', \ldots, \mathbf{a}_{k*}')'$$

where:
- $\large \mathbf{a}_{1*}$ is the first column of **A** 
- $\large \mathbf{a}_{i*} = (a_{ii}, a_{i+1,i}, \ldots, a_{ki})'$ is a (k - i + 1) dimensional vector.
- The dimension of vech(**A**) is $\large \frac{k(k + 1)}{2}$

For example, suppose that k = 3.  Then $\large vech(\mathbf{A}) = (a_{11}, a_{21}, a_{31}, a_{22}, a_{32}, a_{33})'$ which is a six-dimensional vector. [These are the elements on or below the main diagonal of **A**, that is, its lower triangular part, which looks like this:]

$\large \begin{matrix}
a_{11} & & \\
a_{21} & a_{22} & \\ 
a_{31} & a_{32} & a_{33}
\end{matrix}$
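
A minimal sketch of the half-stacking operation, assuming NumPy. The function name `vech` mirrors the notation in the text but is defined here, since NumPy has no built-in of that name. The entries of the example matrix are chosen so that the number 21 stands for $a_{21}$, 32 for $a_{32}$, and so on.

```python
import numpy as np

def vech(A):
    """Stack the on-and-below-diagonal part of each column of a square matrix."""
    A = np.asarray(A)
    k = A.shape[0]
    # Column i contributes rows i, i+1, ..., k-1 (k - i elements, 0-based).
    return np.concatenate([A[i:, i] for i in range(k)])

# Symmetric 3 x 3 example; entry "ij" encodes the position a_ij.
A = np.array([[11, 21, 31],
              [21, 22, 32],
              [31, 32, 33]])

h = vech(A)
print(h)

# The length is k(k + 1)/2.
k = A.shape[0]
print(h.size, k * (k + 1) // 2)
```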