# Linear Algebra Review

In this section, I will add problems reviewing linear algebra concepts that are helpful to know in statistics.

# Notation

A scalar is a real number in $\mathbb {R}$. A vector is a list or array of real numbers in $\mathbb{R}^{d}$ where $d$ is the length of the list/array. A matrix is a 2-dimensional array of numbers. We will denote scalars with lowercase unbolded letters ($c$), vectors with lowercase bolded letters ($\pmb{v}$), and matrices with uppercase unbolded letters ($X$).



## Basic Operations for Vectors

Let $\pmb {v}$ be a $d$-dimensional vector ($\pmb {v} \in {\mathbb R}^{d}$). Denote the entries of the vector to be: 
$\pmb{v} = \begin{pmatrix} v_{1} \\ \vdots \\ v_{d} \end{pmatrix}$

### Addition

Let $\pmb{v} \in \mathbb{R}^{d}$ and $\pmb{w} \in \mathbb{R}^{d}$. Then $\pmb{v}+\pmb{w} \in \mathbb{R}^{d}$ with entries $(\pmb{v} + \pmb{w})_{i} = v_{i} + w_{i}$. More explicitly, we have

$$\pmb{v} + \pmb{w} = \begin{pmatrix} v_{1} \\ \vdots \\ v_{d} \end{pmatrix} + \begin{pmatrix} w_{1} \\ \vdots \\ w_{d} \end{pmatrix} = \begin{pmatrix} v_{1} + w_{1} \\ \vdots \\ v_{d} + w_{d} \end{pmatrix}$$


### Scalar multiplication

When we multiply $\pmb v$ by a scalar $c$, the result $c\pmb{v}$ is also a real vector, whose entries are $(c\pmb{v})_{i} = cv_{i}$ for $i \in \{1, \dots, d\}$.

### Transpose

The transpose of $\pmb{v}$ is $\pmb{v}' =  \begin{pmatrix} v_{1} & ... & v_{d} \end{pmatrix}$.

### Multiplication of two vectors

Inner product: Let $\pmb{v} \in \mathbb{R}^{d}$ and $\pmb{w} \in \mathbb{R}^{d}$. Then $\pmb{v} \cdot \pmb{w} = \pmb{v}'\pmb{w} = \sum_{i=1}^{d} v_{i}w_{i}$. The inner product of two vectors is a scalar, and the two vectors must have the same dimension.

$$\pmb{v}' \pmb{w} = \begin{pmatrix} v_{1}  & ... & v_{d} \end{pmatrix}  \begin{pmatrix} w_{1} \\ \vdots \\ w_{d} \end{pmatrix} = v_{1}  w_{1} + v_{2}  w_{2} + ... + v_{d} w_{d}$$

Outer product: Let $\pmb{v} \in \mathbb{R}^{d}$ and $\pmb{w} \in \mathbb{R}^{d}$. Then the outer product is $\pmb{v}\pmb{w}'$, which is a $d \times d$ matrix whose entry on the $i$th row and $j$th column is $(\pmb{v}\pmb{w}')_{ij} = v_{i}w_{j}$.


$$\pmb{v}\pmb{w}' = \begin{pmatrix} v_{1} \\ \vdots \\ v_{d} \end{pmatrix} \begin{pmatrix} w_{1} & ... & w_{d} \end{pmatrix} =\begin{pmatrix} v_{1} w_{1}  & v_{1} w_{2}&... & v_{1} w_{d} \\ v_{2} w_{1} &  v_{2} w_{2} & ... & v_{2} w_{d} \\ \vdots &  \vdots & \ddots & \vdots \\ v_{d} w_{1} &  v_{d} w_{2} & ... & v_{d} w_{d}  \end{pmatrix} $$

### Norm

The (Euclidean) norm of a vector $\pmb{v}$ is the square root of the inner product of $\pmb{v}$ with itself: $\| \pmb{v} \|_{2} = \sqrt{\pmb{v}'\pmb{v}} = \sqrt{\sum_{i=1}^{d}v_{i}^{2}}$.
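
The vector operations above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `inner`, `outer`, and `norm` and the example vectors are our own choices, not part of any library.

```python
import math

def inner(v, w):
    """Inner product v'w; both vectors must have the same length."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return sum(vi * wi for vi, wi in zip(v, w))

def outer(v, w):
    """Outer product vw': entry (i, j) is v_i * w_j."""
    return [[vi * wj for wj in w] for vi in v]

def norm(v):
    """Euclidean norm: square root of the inner product of v with itself."""
    return math.sqrt(inner(v, v))

v = [3.0, 4.0]
w = [1.0, 2.0]
print(inner(v, w))  # 3*1 + 4*2 = 11.0
print(outer(v, w))  # [[3.0, 6.0], [4.0, 8.0]]
print(norm(v))      # sqrt(9 + 16) = 5.0
```

Note that the inner product returns a scalar while the outer product returns a $d \times d$ array, matching the definitions above.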


## Basic Operations for Matrices 

Let $c$ be a scalar. 

### Addition

Let $A$ be an $n \times m$ matrix and $B$ be an $n \times m$ matrix. Then

$$A+B = 
\begin{pmatrix} 
a_{11} & a_{12} & ... & a_{1m} \\ 
a_{21} & a_{22} & ... & a_{2m} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
a_{n1} & a_{n2} & ... & a_{nm} \end{pmatrix} + 
\begin{pmatrix} 
b_{11} & b_{12} & ... & b_{1m} \\ 
b_{21} & b_{22} & ... & b_{2m} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
b_{n1} & b_{n2} & ... & b_{nm} \end{pmatrix}
 = 
\begin{pmatrix} 
a_{11} + b_{11} & a_{12} + b_{12} & ... & a_{1m} + b_{1m} \\ 
a_{21} + b_{21} & a_{22} + b_{22} & ... & a_{2m} + b_{2m} \\ 
\vdots & \vdots  & \ddots & \vdots  \\ 
a_{n1} + b_{n1} & a_{n2} + b_{n2} & ... & a_{nm}+ b_{nm} \end{pmatrix}$$

### Scalar multiplication

$$cA = \begin{pmatrix} 
ca_{11} & ca_{12} & ... & ca_{1m} \\ 
ca_{21} & ca_{22} & ... & ca_{2m} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
ca_{n1} & ca_{n2} & ... & ca_{nm} \end{pmatrix}$$

### Transpose

The transpose of $A$, denoted as $A'$ is:

$$A' = 
\begin{pmatrix} 
a_{11} & a_{21} & ... & a_{n1} \\ 
a_{12} & a_{22} & ... & a_{n2} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
a_{1m} & a_{2m} & ... & a_{nm} \end{pmatrix}$$

### Multiplication

Multiplying a matrix and a vector (the number of columns of $A$ must equal the length of $\pmb{v}$, so $m = d$ here):

$$A \pmb{v} = \begin{pmatrix} 
a_{11} & a_{12} & ... & a_{1m} \\ 
a_{21} & a_{22} & ... & a_{2m} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
a_{n1} & a_{n2} & ... & a_{nm} \end{pmatrix} \begin{pmatrix} v_{1} \\ \vdots \\ v_{d} \end{pmatrix}
= \begin{pmatrix} 
a_{11} v_{1} + a_{12} v_{2} + ... + a_{1m} v_{d}\\ 
a_{21} v_{1} + a_{22} v_{2} + ... + a_{2m} v_{d}\\ 
\vdots  \\ 
a_{n1} v_{1} + a_{n2} v_{2} + ... + a_{nm} v_{d}\end{pmatrix}$$

Multiplying a matrix and another matrix:

Let $A$ be an $n \times m$ matrix and $B$ be an $m \times p$ matrix. The product $AB$ is an $n \times p$ matrix.

$$AB = 
\begin{pmatrix} 
a_{11} & a_{12} & ... & a_{1m} \\ 
a_{21} & a_{22} & ... & a_{2m} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
a_{n1} & a_{n2} & ... & a_{nm} \end{pmatrix} 
\begin{pmatrix} 
b_{11} & b_{12} & ... & b_{1p} \\ 
b_{21} & b_{22} & ... & b_{2p} \\ 
\vdots &\vdots  & \ddots & \vdots \\ 
b_{m1} & b_{m2} & ... & b_{mp} \end{pmatrix}= 
\begin{pmatrix} 
\sum_{k=1}^{m} a_{1k} b_{k1} & \sum_{k=1}^{m} a_{1k} b_{k2} & ... & \sum_{k=1}^{m} a_{1k} b_{kp} \\ 
\sum_{k=1}^{m} a_{2k} b_{k1} & \sum_{k=1}^{m} a_{2k} b_{k2} & ... & \sum_{k=1}^{m} a_{2k} b_{kp} \\ 
\vdots &\vdots  & \ddots & \vdots  \\ 
\sum_{k=1}^{m} a_{nk} b_{k1} & \sum_{k=1}^{m} a_{nk} b_{k2}& ... &  \sum_{k=1}^{m} a_{nk} b_{kp} \end{pmatrix} 
$$

Also, matrix multiplication is distributive ($A(B + C) = AB + AC$) but, in general, not commutative ($AB \neq BA$).
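
To make the row-by-column rule concrete, here is a minimal pure-Python sketch. The helpers `matmul` and `matadd` and the small matrices are hypothetical examples chosen for illustration; in practice one would use a numerical library.

```python
def matmul(A, B):
    """Product AB of an n x m matrix A and an m x p matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    # Entry (i, j) is the inner product of row i of A and column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [2, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]

# Distributive: A(B + C) equals AB + AC.
print(matmul(A, matadd(B, C)) == matadd(AB, matmul(A, C)))  # True
```

Here $AB$ swaps the columns of $A$ while $BA$ swaps its rows, which is one way to see why the order of multiplication matters.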

### Inverse

If $A \in \mathbb{R}^{d \times d}$ is invertible, its inverse $A^{-1}$ is the $d \times d$ matrix such that $AA^{-1} = A^{-1} A=I_{d}$, where $I_{d}$ is the $d \times d$ identity matrix.

Let $A, B$ be invertible $d \times d$ matrices. Then their product $AB$ is also invertible and

$$(AB)^{-1} = B^{-1}A^{-1}$$

### Inverting a $2 \times 2$ matrix: 

Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $ad - bc \neq 0$.
Then, we have that $A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. If $ad - bc = 0$, then $A$ is not invertible.
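
The $2 \times 2$ formula translates directly into code. The sketch below is a plain-Python illustration; the function name `inverse_2x2` and the example matrix are assumptions made for this example.

```python
def inverse_2x2(A):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2.0, 1.0], [1.0, 1.0]]  # ad - bc = 2*1 - 1*1 = 1
A_inv = inverse_2x2(A)
print(A_inv)             # [[1.0, -1.0], [-1.0, 2.0]]
print(matmul(A, A_inv))  # [[1.0, 0.0], [0.0, 1.0]], the identity
```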


### Special Matrices

 - **Square matrices:** If the number of rows equals the number of columns, then the matrix is square

 - **Idempotent matrices:** If $A^{2} = A$, then $A$ is idempotent.

 - **Symmetric matrices:** If $A' = A$, then $A$ is symmetric.

 - **Diagonal matrices:** If $A_{ij} = 0$ whenever $i \neq j$ (off the diagonal from upper left to bottom right), then $A$ is diagonal.

 - **Identity matrices:** A special square, symmetric, and diagonal matrix is the identity matrix, or a matrix whose only nonzero entries are 1's along the diagonal.
 
 - **Positive definite matrices:** Let $A$ be a $d\times d$ symmetric matrix such that $\pmb{v}' A {\pmb v} > 0$ for all nonzero $\pmb{ v} \in {\mathbb R}^{d}$. Then $A$ is positive definite.

 - **Positive semi-definite matrices:** Let $A$ be a $d\times d$ symmetric matrix such that $\pmb{v}' A {\pmb v} \geq 0$ for all nonzero $\pmb{ v} \in {\mathbb R}^{d}$. Then $A$ is positive semi-definite.
 
 - **Negative definite matrices:** Let $A$ be a $d\times d$ symmetric matrix such that $\pmb{v}' A {\pmb v} < 0$ for all nonzero $\pmb{ v} \in {\mathbb R}^{d}$. Then $A$ is negative definite.

 - **Negative semi-definite matrices:** Let $A$ be a $d\times d$ symmetric matrix such that $\pmb{v}' A {\pmb v} \leq 0$ for all nonzero $\pmb{ v} \in {\mathbb R}^{d}$. Then $A$ is negative semi-definite.
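
As a small worked illustration of the definiteness definitions (the matrices below are chosen only for this example), consider

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

For any $\pmb{v} = \begin{pmatrix} v_{1} & v_{2} \end{pmatrix}'$,

$$\pmb{v}' A \pmb{v} = 2v_{1}^{2} + 2v_{1}v_{2} + 2v_{2}^{2} = v_{1}^{2} + v_{2}^{2} + (v_{1} + v_{2})^{2} > 0 \quad \text{whenever } \pmb{v} \neq \pmb{0},$$

so $A$ is positive definite (and $-A$ is negative definite). In contrast,

$$\pmb{v}' B \pmb{v} = v_{1}^{2} + 2v_{1}v_{2} + v_{2}^{2} = (v_{1} + v_{2})^{2} \geq 0,$$

with equality at the nonzero vector $\pmb{v} = \begin{pmatrix} 1 & -1 \end{pmatrix}'$, so $B$ is positive semi-definite but not positive definite.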
 
### Eigenvalues and eigenvectors

If $A \pmb v = c \pmb v$ where $A$ is a $d \times d$ (square) matrix and $\pmb v$ is a nonzero vector in $\mathbb{R}^{d}$, then $c$ is an eigenvalue of $A$ with corresponding eigenvector $\pmb v$.

Note: The eigenvectors corresponding to the eigenvalue $0$ lie in the null space (kernel) of the matrix $A$.

### Linear Algebra Exercises

Find the eigenvalues and eigenvectors of $X$ where
$$X = \begin{pmatrix} -1 & 4 \\ -2 & 5\end{pmatrix}$$

**Answer:**

$$X - \lambda I = \begin{pmatrix} -1-\lambda & 4 \\ -2 & 5 -\lambda \end{pmatrix}$$

$$\det( X - \lambda I ) = ( -1-\lambda )( 5 -\lambda ) + 8 = -5 - 4\lambda + \lambda^{2} + 8 = \lambda^{2} -4\lambda +3 = (\lambda -3)(\lambda - 1)$$

Then, the eigenvalues are $3$ and $1$.

To find our eigenvectors, $v_{1}, v_{2}$, we use the fact that $Xv_{1} = 3v_{1}$ and $X v_{2} = 1 v_{2}$

$$\begin{pmatrix} -1 & 4 \\ -2 & 5\end{pmatrix} \begin{pmatrix} v_{11}  \\ v_{12}\end{pmatrix} = \begin{pmatrix} 3v_{11}  \\ 3v_{12}\end{pmatrix}$$

$$\begin{pmatrix} -1 & 4 \\ -2 & 5\end{pmatrix} \begin{pmatrix} v_{21}  \\ v_{22}\end{pmatrix} = \begin{pmatrix} v_{21}  \\ v_{22}\end{pmatrix}$$
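
Solving the first system gives $v_{11} = v_{12}$, and solving the second gives $v_{21} = 2v_{22}$, so one possible choice (eigenvectors are only determined up to a nonzero scalar multiple) is $v_{1} = \begin{pmatrix} 1 & 1 \end{pmatrix}'$ and $v_{2} = \begin{pmatrix} 2 & 1 \end{pmatrix}'$. A quick plain-Python check of $Xv = \lambda v$ for these candidates, using a hypothetical helper `matvec`:

```python
def matvec(A, v):
    """Matrix-vector product Av for A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

X = [[-1, 4], [-2, 5]]

# Candidate (eigenvalue, eigenvector) pairs from the hand calculation.
pairs = [(3, [1, 1]), (1, [2, 1])]

for lam, v in pairs:
    lhs = matvec(X, v)          # X v
    rhs = [lam * x for x in v]  # lambda v
    print(lam, lhs, rhs, lhs == rhs)
```

Both comparisons come out equal, confirming that each pair satisfies the eigenvalue equation.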



# Linear Algebra in Statistics

We will extend the notion of a random variable to random vectors. A random vector is a vector where each entry is a random variable, and a random matrix is a matrix where each entry is a random variable. Just as we can find the expectation of a random variable, we can find the expectation of a random vector.

Let $X$ be a random vector with dimension $d$ ($X \in \mathbb{R}^{d}$). In other words, let $X = \begin{pmatrix} X_{1} \\ \vdots \\ X_{d} \\ \end{pmatrix}$

Then

$$\mathbb{E}[X] = \mathbb{E}  \begin{bmatrix} X_{1} \\ \vdots \\ X_{d} \\ \end{bmatrix}  = \begin{pmatrix} \mathbb{E}[X_{1}] \\ \vdots \\ \mathbb{E}[X_{d}]  \end{pmatrix}$$

Furthermore, the expectation of a random matrix $X \in \mathbb{R}^{d \times d}$ is defined as
the matrix whose entries are the expectations of the entries of $X$, so the $ij$-th entry is $(\mathbb{E}[X])_{ij} = \mathbb{E}[X_{ij}]$.


We can also define a covariance of a random vector. Let $X \in \mathbb{R}^{d}$. Then $Cov(X) = Cov(X,X) = \mathbb{E}[(X- \mathbb{E}(X) )(X- \mathbb{E}(X) )']$. We can use matrix multiplication and express this in terms of the individual components of $X$ as follows:

$$Var(X) = Cov(X, X) = \begin{bmatrix} \mathbb{E}[(X_{1} - \mathbb{E}(X_{1}))^{2}] & \dots & \mathbb{E}[(X_{1} - \mathbb{E}(X_{1}) )(X_{d} - \mathbb{E}(X_{d}))] \\
\vdots & \ddots & \vdots \\ \mathbb{E}[(X_{d} - \mathbb{E}(X_{d}))(X_{1} - \mathbb{E}(X_{1}))] & \dots & \mathbb{E}[(X_{d} - \mathbb{E}(X_{d}))^{2}] \end{bmatrix}$$

Furthermore, consider $X \in \mathbb{R}^{d}$ and $Y \in \mathbb{R}^{p}$. Then $Cov(X, Y) = \mathbb{E}[(X- \mathbb{E}(X) )(Y- \mathbb{E}(Y) )']$.


---

Let $A$ be an $n \times d$ constant matrix and $X$ be a $d$-dimensional random vector. Then $\mathbb{E}[AX]= A\mathbb{E}[X]$.

Furthermore, $Cov(AX) = A Cov(X) A'$


---

## Multiple Linear Regression

In simple linear regression, we predict an outcome variable from one predictor variable. In multiple linear regression, we predict an outcome variable from several predictor variables. In this setup, we assume that each observation's response variable is a linear combination of the predictors, plus an error term. In particular, we believe that for the $i$-th observation in our sample, $y_{i} = \beta_{0} + \beta_{1}x_{i, 1} + \dots + \beta_{d}x_{i, d} + \epsilon_{i}$ where $y_{i}$ is the $i$-th observation's value of the response variable, $x_{i, 1}, \dots, x_{i, d}$ are the $i$-th observation's predictor variables, and $\epsilon_{i}$ is the error.


Now, assume we have $n$ observations in our sample, and $d$ variables of interest. We can represent our problem using matrix notation: 

$$\pmb{Y} = X\pmb{\beta} + \pmb{\epsilon}$$

where:

$$\pmb{Y} = \begin{bmatrix} y_{1} \\ \dots \\ y_{n} \end{bmatrix}, X = \begin{bmatrix} 1 & x_{1,1} & \dots & x_{1,d} \\ 1 & x_{2,1} & \dots & x_{2,d} \\ \dots & \dots & \dots & \dots \\ 1 & x_{n,1} & \dots & x_{n,d} \end{bmatrix}, \pmb{\beta} = \begin{bmatrix} \beta_{0} \\ \dots \\ \beta_{d} \end{bmatrix}, \pmb{\epsilon} = \begin{bmatrix} \epsilon_{1} \\ \dots \\ \epsilon_{n} \end{bmatrix}$$
    
Since we have $d$ predictors, we want to find $\beta_{0}, \beta_{1}, ..., \beta_{d}$ such that we minimize some cost function. Let this be the sum of squared errors for convenience. Then we can write the sum as: $\sum_{i=1}^{n}(y_{i} - (\beta_{0} + \beta_{1}x_{i,1} + \dots + \beta_{d}x_{i,d}))^{2} = \| \pmb{Y} - X\pmb{\beta} \|_{2}^{2}$.

We can use matrix calculus to analytically solve for $\hat \beta_{0}, \dots, \hat{\beta}_{d}$:

$$\arg \min_{\pmb \beta}(\pmb{Y} - X\pmb{\beta})'(\pmb{Y} - X\pmb{\beta}) = \arg \min_{\pmb \beta} [\pmb{Y}'\pmb{Y} - 2 \pmb{Y}' X \pmb{\beta} + \pmb{\beta}' X' X \pmb{\beta}]$$

Taking the derivative with respect to $\pmb{\beta}$ and setting it equal to zero gives $-2X'\pmb{Y} + 2X'X\pmb{\beta} = \pmb{0}$, which rearranges to the normal equations: $X' \pmb Y = X'X \pmb{\beta}$

Then, if $X$ has full column rank, $X'X$ is invertible and $$\hat{\pmb{\beta}} = (X'X)^{-1}X'\pmb{Y}$$
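
To see the closed-form estimator $(X'X)^{-1}X'\pmb{Y}$ in action, here is a minimal pure-Python sketch for a single predictor plus an intercept, so that $X'X$ is $2 \times 2$ and can be inverted with the formula from the linear algebra review. The data and the helper names (`transpose`, `matmul`, `inverse_2x2`) are made up for illustration.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular; X lacks full column rank")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy data: n = 4 observations, one predictor.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 6.0]

X = [[1.0, xi] for xi in x]  # design matrix with an intercept column
Y = [[yi] for yi in y]       # response as an n x 1 column

Xt = transpose(X)
XtX = matmul(Xt, X)          # [[4, 6], [6, 14]]
XtY = matmul(Xt, Y)          # [[14], [29]]
beta_hat = matmul(inverse_2x2(XtX), XtY)
print(beta_hat)              # approximately [[1.1], [1.6]]
```

For this toy data the fitted line is roughly $\hat{y} = 1.1 + 1.6x$. With more than one predictor the same formula applies, but a general linear solver would replace `inverse_2x2`.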


To conduct inference in the context of (multiple) linear regression, we assume the following: 

1. There is a linear relationship between the response and predictors ($Y = X\beta + \epsilon$)

2. The $\epsilon_{i}$ in $\epsilon$ are IID with $E[\epsilon_{i}] = 0$ and $Var(\epsilon_{i}) = \sigma^{2}$. 

3. The errors $\epsilon_{i}$ are IID $N(0, \sigma^{2})$ (normality with constant variance, i.e. homoskedasticity).

Strictly speaking, homoskedasticity is not necessary for inference, but it makes our calculations much easier. When heteroskedasticity is present, the usual t-tests are not valid. First, let's examine how we would do inference in the homoskedastic multiple regression case.

Recall that $\pmb Y$ and $\pmb \beta$ in the multiple regression case are vectors while $X$ is a matrix. Suppose we have $n$ observations and $p$ predictors (so $p$ plays the role of $d$ above), and we include the intercept. Quick check: what are the dimensions of $\pmb Y, \pmb \beta, X$?

We want to find the following: $Cov(\hat{\pmb{\beta}} ) = Cov(\hat{\pmb{\beta}} , \hat{\pmb{\beta}})$.

As a quick check, under this setup, what is $Cov( \pmb {\beta})$? It is the zero matrix, since $\pmb \beta$ contains the true (fixed, non-random) coefficients. Notice that $\hat{\pmb{\beta}}$ is a random vector since it is a function of $\pmb Y$, so $Cov(\hat{\pmb{\beta}} )$ is not necessarily $0$.

$$Cov({\hat \beta}) = Cov((X' X)^{-1} X' Y) = [(X' X)^{-1} X'] Cov(Y) [(X' X)^{-1} X']'$$

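Notice that, treating $X$ as fixed, $Cov(Y) = Cov(X\beta + \epsilon) = Cov(\epsilon) = \sigma^{2} I$. Substituting this in, and using the fact that $(X'X)^{-1}$ is symmetric, we get

$$Cov({\hat \beta}) = [(X' X)^{-1} X'] \sigma^{2} I X (X' X)^{-1} =\sigma^{2} (X' X)^{-1} X' X (X' X)^{-1} =\sigma^{2} (X' X)^{-1}$$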



$$Cov(\hat{\pmb{\beta}} ) =Cov \begin{pmatrix} {\hat \beta}_{0} \\ {\hat \beta}_{1} \\  \vdots \\ {\hat \beta}_{p} \end{pmatrix} = \begin{pmatrix} Var({\hat \beta}_{0}) & Cov({\hat \beta}_{0}, {\hat \beta}_{1}) & ... & Cov({\hat \beta}_{0}, {\hat \beta}_{p}) \\ Cov({\hat \beta}_{1}, {\hat \beta}_{0}) & Var({\hat \beta}_{1}) & ... & Cov({\hat \beta}_{1}, {\hat \beta}_{p}) \\  \vdots & ... & \ddots & \vdots \\ Cov( {\hat \beta}_{p}, {\hat \beta}_{0}) & ... & ... & Var({\hat \beta}_{p} ) \end{pmatrix}$$

This tells us that the diagonal entries of this matrix ($\sigma^{2} (X' X)^{-1}$) are the variances $Var({\hat \beta}_{j})$ for $j \in \{0, 1,..., p\}$.

Since we do not know $\sigma^{2}$, we have to estimate it with ${\hat \sigma}^{2} =\frac{ \sum_{i=1}^{n} r_{i}^{2}}{n-(p+1)}$ where $r_{i} = y_{i} - x_{i}'\hat{\pmb{\beta}} = y_{i} - ({\hat \beta}_{0} + {\hat \beta}_{1}x_{i1} + ... + {\hat \beta}_{p} x_{ip})$ are the residuals and $p+1$ is the number of estimated coefficients (the intercept plus $p$ slopes).

Then, since we are estimating $Cov(\hat{\pmb{\beta}} )$, we will denote the estimate by ${\widehat {Cov}}(\hat{\pmb{\beta}} ) = {\hat \sigma}^{2} (X'X)^{-1}$.

Entry by entry, we have

$${\widehat {Cov}}(\hat{\pmb{\beta}} ) = \begin{pmatrix} {\widehat {Var}}({\hat \beta}_{0}) & {\widehat {Cov}}({\hat \beta}_{0}, {\hat \beta}_{1}) & ... & {\widehat {Cov}}({\hat \beta}_{0}, {\hat \beta}_{p}) \\ {\widehat {Cov}}({\hat \beta}_{1}, {\hat \beta}_{0}) & {\widehat {Var}}({\hat \beta}_{1}) & ... & {\widehat {Cov}}({\hat \beta}_{1}, {\hat \beta}_{p}) \\  \vdots & ... & \ddots & \vdots \\ {\widehat {Cov}}( {\hat \beta}_{p}, {\hat \beta}_{0}) & ... & ... & {\widehat {Var}}({\hat \beta}_{p} ) \end{pmatrix}$$
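
As a rough numerical sketch of the estimated covariance matrix, the plain-Python code below computes $\hat{\sigma}^{2}(X'X)^{-1}$ for a made-up data set with one predictor and an intercept. It assumes $\hat{\sigma}^{2}$ is the sum of squared residuals divided by $n$ minus the number of estimated coefficients; the data and helper names are hypothetical.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy data: n = 4 observations, p = 1 predictor plus an intercept.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 6.0]
X = [[1.0, xi] for xi in x]
Y = [[yi] for yi in y]

XtX_inv = inverse_2x2(matmul(transpose(X), X))
beta_hat = matmul(XtX_inv, matmul(transpose(X), Y))

# Residuals r_i = y_i - x_i' beta_hat.
fitted = matmul(X, beta_hat)
residuals = [Y[i][0] - fitted[i][0] for i in range(len(y))]

# sigma^2 estimate: residual sum of squares over n minus number of coefficients.
n, n_coef = len(y), len(X[0])
sigma2_hat = sum(r * r for r in residuals) / (n - n_coef)

# Estimated covariance matrix and standard errors (square roots of the diagonal).
cov_hat = [[sigma2_hat * entry for entry in row] for row in XtX_inv]
std_errors = [math.sqrt(cov_hat[j][j]) for j in range(n_coef)]
print(sigma2_hat)  # approximately 0.1
print(cov_hat)     # approximately [[0.07, -0.03], [-0.03, 0.02]]
print(std_errors)
```

The square roots of the diagonal entries are the standard errors that enter the usual t-statistics for each coefficient.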



### Exercises

1. Show that $Cov(AX) = A Cov(X) A'$:

**Answer:**

$$Cov(AX) = { \mathbb E}[(AX-\mathbb E[AX])(AX-\mathbb E[AX])'] = \mathbb E[A(X-\mathbb E[X]) (X-\mathbb E[X])'A' ]= A \mathbb{E}[(X- \mathbb{E}(X) )(X- \mathbb{E}(X) )'] A' = A Cov(X) A'$$


2. Heteroskedasticity

Now, consider the following model:

$$y_{i} = \beta_{1}+ \beta_{2} x_{i2} + ...+\beta_{d} x_{id} + \epsilon_{i} ~~~~ \forall ~ i \in \{1,..., n\}$$

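Notice that the errors are independent with $\epsilon_{i} \sim N(0, i)$, so the variance of the $i$-th error is $i$ and the errors are not identically distributed.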


Under this model, what is $Cov(\hat{\pmb{\beta}} )$?


**Answer:**


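Since the errors are independent with $Var(\epsilon_{i}) = i$, we have $Cov(Y) = Cov(\epsilon) = \mathrm{diag}(1, 2, \dots, n)$, and

$$Cov(\hat{\pmb{\beta}} ) = Cov((X'X)^{-1} X' Y ) = (X'X)^{-1} X' Cov(Y)  X (X'X)^{-1}$$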

Note: There are lots of ways to deal with heteroskedasticity, including transforming our outcome variable and weighted regression. Another common way to address heteroskedasticity is to use heteroskedasticity-consistent (HC) standard errors, also known as the Eicker-Huber-White standard errors.
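
As a numerical illustration of the answer to exercise 2, the sketch below evaluates $(X'X)^{-1} X' Cov(Y) X (X'X)^{-1}$ with $Cov(Y) = \mathrm{diag}(1, \dots, n)$ for a made-up design with $n = 4$, an intercept, and one predictor. The design matrix and helper names are assumptions chosen only so that $X'X$ is $2 \times 2$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical design: intercept column plus one predictor, n = 4.
x = [0.0, 1.0, 2.0, 3.0]
n = len(x)
X = [[1.0, xi] for xi in x]

# Cov(Y) = diag(1, 2, ..., n): the i-th error has variance i.
cov_Y = [[float(i + 1) if i == j else 0.0 for j in range(n)]
         for i in range(n)]

Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))
middle = matmul(matmul(Xt, cov_Y), X)  # X' Cov(Y) X = [[10, 20], [20, 50]]
cov_beta_hat = matmul(matmul(XtX_inv, middle), XtX_inv)
print(cov_beta_hat)  # approximately [[1.0, -0.5], [-0.5, 0.5]]
```

Because $Cov(Y)$ is no longer $\sigma^{2} I$, the middle factor does not cancel against $(X'X)^{-1}$, and the result differs from the homoskedastic formula $\sigma^{2}(X'X)^{-1}$.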


3. Consider the random vector $X = \begin{pmatrix} X_{1} \\ X_{2} \\ X_{3} \\ X_{4} \\ X_{5} \end{pmatrix}$ with mean vector and variance-covariance matrix:

$$\mu = \begin{bmatrix} 2 \\ 4 \\ -1 \\ 3 \\ 0 \end{bmatrix}$$

$$\Sigma = \begin{bmatrix} 4 & -1 & 1/2 & -1/2 & 0 \\ -1 & 3 & 1 & -1 & 0 \\ 1/2 & 1 & 6 & 1 & -4 \\ -1/2 & -1 & 1 & 4 & 0 \\ 0 & 0 & -4 & 0 & 2\end{bmatrix}$$

Partition $X$ as $X^{(1)} = \begin{bmatrix} X_{1} \\ X_{2} \end{bmatrix}$ and $X^{(2)} = \begin{bmatrix} X_{3} \\ X_{4} \\ X_{5} \end{bmatrix}$. Let $A = \begin{bmatrix} -1 & 1 \\ 1 & 1/2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & 1/2 \\ -2 & 1& -2 \end{bmatrix}$.

a. Find ${\mathbb E} [X^{(1)}]$

**Answer:**

$${\mathbb E} [X^{(1)}] = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$

b. Find ${\mathbb E} [A X^{(1)}]$

**Answer:**

$${\mathbb E} [A X^{(1)}] = A{\mathbb E} [ X^{(1)}] = \begin{bmatrix} -1 & 1 \\ 1 & 1/2 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix}= \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$

c. Find $Cov(X^{(1)})$

**Answer:**

We can notice that $Cov(X^{(1)}) = {\mathbb E}[ ( X^{(1)} - {\mathbb E}[X^{(1)}] )  ( X^{(1)} - {\mathbb E}[X^{(1)}] )']= {\mathbb E}\left(\begin{bmatrix} X_{1} - 2 \\ X_{2} - 4 \end{bmatrix} \begin{bmatrix} X_{1} - 2 & X_{2} - 4 \end{bmatrix} \right)= {\mathbb E}\begin{bmatrix} (X_{1} - 2)(X_{1} - 2) & (X_{1} - 2)(X_{2} - 4)  \\ (X_{2} - 4)(X_{1} - 2) &(X_{2} - 4)(X_{2} - 4) \end{bmatrix} = ...$

or

$Cov(X^{(1)}) = \begin{bmatrix} Cov(X_{1}, X_{1}) & Cov(X_{1}, X_{2}) \\ Cov(X_{2}, X_{1}) & Cov(X_{2}, X_{2})  \end{bmatrix}= \begin{bmatrix} 4 & -1 \\ -1 &  3  \end{bmatrix}$


d. Find $Cov(A X^{(1)})$

**Answer:**

$$Cov(A X^{(1)}) = A Cov(X^{(1)}) A' = \begin{bmatrix} -1 & 1 \\ 1 & 1/2 \end{bmatrix} \begin{bmatrix} 4 & -1 \\ -1 &  3  \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 1/2 \end{bmatrix}' = \begin{bmatrix} 9 & -3 \\ -3 & 3.75 \end{bmatrix}$$


e. Find ${\mathbb E} (X^{(2)} )$

**Answer:**


$${\mathbb E} (X^{(2)} )= \begin{bmatrix} -1 \\ 3 \\ 0 \end{bmatrix}$$

f. Find ${\mathbb E} (B X^{(2)})$

**Answer:**

$${\mathbb E} (B X^{(2)}) = B{\mathbb E} ( X^{(2)}) = 
\begin{bmatrix} 1 & 1 & 1/2 \\ -2 & 1& -2 \end{bmatrix}
\begin{bmatrix} -1 \\ 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$$


g. Find $Cov( X^{(2)} )$

**Answer:**

$$Cov( X^{(2)} )= \begin{bmatrix} 6 & 1 & -4 \\ 1 & 4 & 0 \\ -4 & 0 & 2 \end{bmatrix}$$


h. Find $Cov(B X^{(2)} )$

**Answer:**

$$BCov( X^{(2)} )B' = \begin{bmatrix} 1 & 1 & 1/2 \\ -2 & 1& -2 \end{bmatrix} \begin{bmatrix} 6 & 1 & -4 \\ 1 & 4 & 0 \\ -4 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1/2 \\ -2 & 1& -2 \end{bmatrix}'=\begin{bmatrix} 8.5 & 1 \\ 1 & 0 \end{bmatrix}$$


i. Find $Cov(X^{(1)},  X^{(2)})$

**Answer:**

$$Cov(X^{(1)},  X^{(2)}) = {\mathbb E}[ (X^{(1)} -{\mathbb E}(X^{(1)}) ) (X^{(2)} -{\mathbb E}(X^{(2)}) )' ] = {\mathbb E}\left[ \begin{pmatrix} X_{1}-{\mathbb E}[X_{1}] \\ X_{2}- {\mathbb E}[X_{2}] \end{pmatrix} \begin{pmatrix} X_{3}- {\mathbb E}[X_{3}] & X_{4} - {\mathbb E}[X_{4}] & X_{5} - {\mathbb E}[X_{5}]\end{pmatrix} \right]$$
$$= \begin{bmatrix} Cov(X_{1}, X_{3}) & Cov(X_{1}, X_{4}) &Cov(X_{1}, X_{5}) \\ Cov(X_{2}, X_{3}) & Cov(X_{2}, X_{4}) &Cov(X_{2}, X_{5})  \end{bmatrix}=\begin{bmatrix} 1/2 & -1/2 & 0 \\ 1 & -1 & 0  \end{bmatrix}$$

j. Find $Cov( A X^{(1)}, B X^{(2)} )$

**Answer:**

$$Cov( A X^{(1)}, B X^{(2)} ) = A Cov( X^{(1)}, X^{(2)} )B' =
\begin{bmatrix} -1 & 1 \\ 1 & 1/2 \end{bmatrix}
\begin{bmatrix} 1/2 & -1/2 & 0 \\ 1 & -1 & 0  \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1/2 \\ -2 & 1& -2 \end{bmatrix}'=
\begin{bmatrix} 0 & -1.5 \\ 0 & -3 \end{bmatrix}
$$
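
The matrix products in parts d, h, and j of exercise 3 are easy to get wrong by hand, so here is a short plain-Python check. It slices the blocks of $\Sigma$ that correspond to $X^{(1)}$ and $X^{(2)}$ and recomputes each answer; the helper names `transpose` and `matmul` are our own.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Sigma = [
    [4, -1, 0.5, -0.5, 0],
    [-1, 3, 1, -1, 0],
    [0.5, 1, 6, 1, -4],
    [-0.5, -1, 1, 4, 0],
    [0, 0, -4, 0, 2],
]
A = [[-1, 1], [1, 0.5]]
B = [[1, 1, 0.5], [-2, 1, -2]]

# Blocks of Sigma for the partition X = (X1, X2 | X3, X4, X5).
S11 = [row[:2] for row in Sigma[:2]]  # Cov(X^(1))
S22 = [row[2:] for row in Sigma[2:]]  # Cov(X^(2))
S12 = [row[2:] for row in Sigma[:2]]  # Cov(X^(1), X^(2))

cov_AX1 = matmul(matmul(A, S11), transpose(A))      # part d
cov_BX2 = matmul(matmul(B, S22), transpose(B))      # part h
cov_AX1_BX2 = matmul(matmul(A, S12), transpose(B))  # part j

print(cov_AX1 == [[9, -3], [-3, 3.75]])      # True
print(cov_BX2 == [[8.5, 1], [1, 0]])         # True
print(cov_AX1_BX2 == [[0, -1.5], [0, -3]])   # True
```

All entries here are halves or quarters, so the floating-point comparisons are exact.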
