# Linear Algebra

In this appendix we introduce some basic components of <a href="https://en.wikipedia.org/wiki/Linear_algebra">linear algebra</a> that are encountered in the fields of machine learning, data analysis and computational statistics. Linear algebra, and in particular
<a href="https://en.wikipedia.org/wiki/Numerical_linear_algebra">computational linear algebra</a>, plays
a key role in almost all the material that is discussed in these notes.
In computational linear algebra we are concerned with the solution of linear systems of equations.
We can express such a system in the form

$$A \mathbf{x} = \mathbf{b}$$

We will see under which conditions such a system is solvable. In any case, when we solve such a system
we are interested in

- accuracy; both the stability of the algorithm and the conditioning of the problem play a crucial role
- efficiency; in general we are interested in large systems of equations

In general, the system of equations given above, with $A$ a square matrix, will have a unique solution if and only if

$$\det(A) \neq 0$$

This condition implies that $A$ has linearly independent rows/columns and that the matrix $A$ is invertible.
In this case the system has a unique solution given by

$$ \mathbf{x} = A^{-1}\mathbf{b}$$

Notice however that for the systems we are interested in, computing the matrix inverse i.e. $A^{-1}$ is either
computationally expensive or not feasible. So, although in theory we have a nice representation of the unique solution,
in practice this may not always be very useful. When

$$\det(A) = 0$$

the system has either infinitely many solutions or none. When $\mathbf{b} \in \mathrm{range}(A)$
the system has an infinite number of solutions, whereas when $\mathbf{b} \notin \mathrm{range}(A)$
the system has no solutions.
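The three cases above can be sketched with NumPy on small $2 \times 2$ examples. This is a minimal illustration under stated assumptions: the helper name `classify_system` and the example matrices are hypothetical and not part of the original notes. The sketch compares the rank of $A$ with the rank of the augmented matrix $[A \mid \mathbf{b}]$ rather than testing the determinant, because a rank test is usually a more reliable numerical indicator of singularity.

```python
import numpy as np


def classify_system(A, b):
    """Classify the square system A x = b as 'unique', 'infinite' or 'none'."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]
    rank_A = np.linalg.matrix_rank(A)
    # b lies in range(A) exactly when appending it as a column does not raise the rank
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A == n:
        return "unique"
    return "infinite" if rank_Ab == rank_A else "none"


A_regular = np.array([[2.0, 1.0], [1.0, 3.0]])   # det = 5, invertible
A_singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # det = 0, second row = 2 * first row

b = np.array([3.0, 5.0])
x = np.linalg.solve(A_regular, b)  # solves A x = b without forming the inverse

kind_regular = classify_system(A_regular, b)
kind_in_range = classify_system(A_singular, np.array([1.0, 2.0]))
kind_not_in_range = classify_system(A_singular, np.array([1.0, 0.0]))
```

Note that `np.linalg.solve` is used instead of computing $A^{-1}$ explicitly, in line with the remark above that forming the inverse is rarely practical.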

## Matrix decomposition

Matrix decomposition, or matrix factorisation, is a way to reduce a matrix into simpler constituent
components. Thus, in general, the aim of matrix decomposition is to simplify matrix operations.
There are many ways to decompose a matrix, but in this appendix we will look into the following five techniques


- LU factorisation
- QR factorisation
- Cholesky decomposition
- SVD factorisation
- Eigenvalue decomposition


### LU factorisation

When dealing with square matrices, <a href="https://en.wikipedia.org/wiki/LU_decomposition">LU factorisation</a> is an approach we can use to factor a matrix into
the product of a lower and an upper triangular matrix i.e.

\begin{equation}
A = LU
\end{equation}

where $L$ is the lower triangular matrix and $U$ the upper triangular.

For the matrices of interest in computational statistics, LU decomposition is found using numerical methods.


#### LU factorisation with partial pivoting

These methods can, however, fail even for an invertible matrix, for example when a zero appears in a pivot position. Hence, numerical software implements
LU decomposition with partial pivoting. In this case the rows of the matrix $A$ are permuted and the permuted matrix is decomposed, i.e.

\begin{equation}
PA = LU
\end{equation}

In this approach to LU decomposition, the rows of the matrix $A$ are re-ordered to simplify the decomposition process.
The permutation matrix $P$ records the row exchanges, so it specifies a way to permute the result or return the result to the original order. Equivalently, $A = P^TLU$.
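To make the role of the row exchanges concrete, here is a minimal NumPy sketch of LU factorisation with partial pivoting. It is an illustration only, not the routine a production library would use. The function name `lu_partial_pivoting` and the example matrix are hypothetical, and the sketch adopts the convention $PA = LU$, where $P$ is a permutation matrix that records the row exchanges.

```python
import numpy as np


def lu_partial_pivoting(A):
    """Return P, L, U with P @ A = L @ U (Gaussian elimination with row swaps)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    U = A.copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        # pick the row with the largest absolute entry in column k as the pivot row
        pivot = k + int(np.argmax(np.abs(U[k:, k])))
        if U[pivot, k] == 0.0:
            continue  # nothing to eliminate in this column (A is singular)
        if pivot != k:
            # swap the rows of U and P, and the already computed part of L
            U[[k, pivot], :] = U[[pivot, k], :]
            P[[k, pivot], :] = P[[pivot, k], :]
            L[[k, pivot], :k] = L[[pivot, k], :k]
        # eliminate the entries below the pivot
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U


# A[0, 0] = 0, so elimination without row exchanges would divide by zero
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 1.0, 3.0]])
P, L, U = lu_partial_pivoting(A)
```

Choosing the largest entry in the column as the pivot keeps every multiplier stored in $L$ at most 1 in absolute value, which is the usual motivation for partial pivoting.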

### QR factorisation 

LU decomposition, as presented above, applies to square matrices. Frequently however the matrix of interest is not square.
Consider for example a dataset with 10000 rows indicating the number of data points available and 10
columns indicating the number of features, stored as a $10000 \times 10$ matrix $A$. <a href="https://en.wikipedia.org/wiki/QR_decomposition">QR factorisation</a>
can be used to decompose such matrices into constituent components. In particular, the QR method will decompose
the matrix $A$ as


\begin{equation}
A=QR
\end{equation}

where $Q$ is a matrix with orthonormal columns (an <a href="https://en.wikipedia.org/wiki/Orthogonal_matrix">orthogonal matrix</a> when it is square) and $R$
an upper triangular matrix. If the matrix $A$ is invertible, then the factorization is unique if we require the diagonal elements of $R$ to be positive. The QR decomposition can be used to solve the linear least squares problem and is the basis for a particular eigenvalue algorithm, the QR algorithm [2].
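As a sketch of the least squares use mentioned above, the snippet below factors a tall $10000 \times 10$ matrix with `np.linalg.qr` and solves $R\mathbf{w} = Q^T\mathbf{y}$. The data are synthetic and the names `D`, `w_true`, `y` and `w_qr` are assumptions made for illustration. The result is compared against NumPy's own `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
D = rng.normal(size=(10000, 10))        # tall data matrix: 10000 points, 10 features
w_true = np.arange(1.0, 11.0)           # hypothetical coefficients used to build targets
y = D @ w_true + 0.01 * rng.normal(size=10000)

# reduced QR: Q is 10000 x 10 with orthonormal columns, R is 10 x 10 upper triangular
Q, R = np.linalg.qr(D, mode="reduced")

# least squares: minimise ||D w - y|| by solving the small system R w = Q^T y
# (a general solver is used here; a triangular solver would exploit the structure of R)
w_qr = np.linalg.solve(R, Q.T @ y)

# reference solution from NumPy's least-squares routine
w_ref = np.linalg.lstsq(D, y, rcond=None)[0]
```

Working with $Q$ and $R$ avoids forming $D^TD$, which would square the condition number of the problem.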


### Cholesky decomposition

The <a href="https://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> is suitable for matrices that are symmetric and positive definite, i.e. symmetric matrices for which $\mathbf{x}^TA\mathbf{x} > 0$ for every non-zero vector $\mathbf{x}$ (equivalently, all eigenvalues are greater than zero). In this approach the matrix $A$ is decomposed into the product of a lower triangular matrix $L$ and its transpose (its conjugate transpose in the complex case) i.e.


\begin{equation}
A=LL^T
\end{equation}


This method is useful for efficient numerical solutions, e.g., Monte Carlo simulations. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.

Note that the decomposition can also be written in terms of the upper triangular matrix $U = L^T$ i.e.

\begin{equation}
A=U^TU
\end{equation}
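Both forms of the factorisation can be checked with `np.linalg.cholesky`, which returns the lower triangular factor. The sketch below is an illustration under stated assumptions: the test matrix is built to be symmetric positive definite by construction, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B @ B.T + 4.0 * np.eye(4)     # symmetric positive definite by construction
b = rng.normal(size=4)

L = np.linalg.cholesky(A)         # lower triangular factor, A = L @ L.T
U = L.T                           # the equivalent upper triangular form, A = U.T @ U

# solve A x = b in two stages: L z = b, then L^T x = z
z = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, z)

# a symmetric matrix with a negative eigenvalue (here 3 and -1) is rejected
try:
    np.linalg.cholesky(np.array([[1.0, 2.0], [2.0, 1.0]]))
    is_rejected = False
except np.linalg.LinAlgError:
    is_rejected = True
```

For brevity the two stages call the general solver `np.linalg.solve`. A dedicated triangular solver would be needed to realise the efficiency gain mentioned above.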

### Singular value decomposition

One of the most important matrix factorization techniques is
the <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition</a>, most often abbreviated as SVD. The reason it is so popular is that it is the foundation for many computational techniques. For example, to name just a few:

- Computing pseudo-inverses
- Obtaining low-rank matrix approximations
- Dynamic mode decomposition
- Proper orthogonal decomposition
- Principal components analysis

For a complex matrix $A \in \mathbb{C}^{n\times m}$, its SVD is

$$A = U\Sigma V^{*}$$

where $V^{*}$ is the complex conjugate transpose of $V$. Both $U$ and $V$ are <a href="https://en.wikipedia.org/wiki/Unitary_matrix">unitary matrices</a>, that is, the following holds (and similarly for $V$)

$$UU^{*} = U^{*}U = I$$

In general, if a matrix $W$ is a real matrix i.e. its entries are real numbers, then $W^{*} = W^T$. Thus, if $A \in \mathbb{R}^{n \times m}$ the matrices $U$ and $V$ are real orthogonal matrices i.e. 

$$UU^{T} = U^{T}U = I$$

The matrix $\Sigma$ is a diagonal matrix with real and nonnegative entries on the diagonal. The entries $\Sigma_{ii}$ are called the singular values of $A$. The number of non-zero singular values is equal to the rank of the matrix $A$. Given the popularity of the SVD method, it is not surprising that most linear algebra libraries provide a way to perform it. The following script shows how to compute the SVD in Python using `numpy`:

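```python
import numpy as np

X = np.random.rand(10, 10)
# full SVD; note that NumPy returns the singular values as a 1-D array S
# and the third factor already transposed (V^T, named Vh here)
U, S, Vh = np.linalg.svd(X, full_matrices=True)
# or doing economy SVD
U, S, Vh = np.linalg.svd(X, full_matrices=False)
```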


You can find the documentation at <a href="https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html">numpy.linalg.svd</a>.
SVD is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any $n \times m$ matrix via an extension of the polar decomposition.
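Three of the uses listed earlier (reading off the rank, building a low-rank approximation and computing a pseudo-inverse) can be sketched directly from the factors returned by `np.linalg.svd`. This is a minimal sketch: the test matrix, the variable names and the relative threshold `1e-10` for treating a singular value as zero are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# build a 6 x 4 matrix of rank 2 as the product of 6 x 2 and 2 x 4 factors
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))

U, S, Vh = np.linalg.svd(X, full_matrices=False)   # Vh holds V^T

# numerical rank: number of singular values above a small relative threshold
tol = 1e-10 * S.max()
rank = int(np.sum(S > tol))

# rank-1 approximation: keep only the largest singular value and its vectors
X1 = S[0] * np.outer(U[:, 0], Vh[0, :])
error_rank1 = np.linalg.norm(X - X1, ord=2)        # spectral norm of the residual

# pseudo-inverse: invert only the singular values treated as non-zero
S_inv = np.array([1.0 / s if s > tol else 0.0 for s in S])
X_pinv = Vh.T @ np.diag(S_inv) @ U.T
```

The spectral-norm error of the rank-1 approximation equals the second singular value `S[1]`, which is why truncating the SVD is a natural way to obtain low-rank approximations.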

### Eigenvalue decomposition

Eigendecomposition is the factorization of a matrix $A$ into a canonical form.
The matrix is represented in terms of its eigenvalues and eigenvectors. Eigenvalue decomposition is one of the most widely used types of matrix decomposition. Recall that a non-zero vector $\mathbf{v}$ is an eigenvector of a matrix $A$ if it satisfies the following


$$A\mathbf{v} = \lambda \mathbf{v}$$

where $\lambda$ is an eigenvalue of $A$. What the equation above tells us is that the action of $A$ on the vector $\mathbf{v}$ is equal to the vector itself scaled by a scalar value. In other words, the vector $A\mathbf{v}$ lies along the same line as the eigenvector $\mathbf{v}$.
The eigenvalue determines if the vector is stretched or shrunk or reversed or left unchanged.
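The defining relation $A\mathbf{v} = \lambda \mathbf{v}$ can be checked numerically. In the sketch below the $2 \times 2$ matrix is a hypothetical example chosen so that one eigenvector is stretched ($\lambda = 3$) and the other is reversed ($\lambda = -1$). The eigenpairs are computed with `np.linalg.eig`.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# columns of `vecs` are eigenvectors; vals[i] is paired with vecs[:, i]
vals, vecs = np.linalg.eig(A)

# residual of A v = lambda v for every pair; should be numerically zero
residuals = [
    np.linalg.norm(A @ vecs[:, i] - vals[i] * vecs[:, i])
    for i in range(len(vals))
]

v_stretch = np.array([1.0, 1.0])    # eigenvector for lambda = 3: stretched
v_flip = np.array([1.0, -1.0])      # eigenvector for lambda = -1: reversed
```

Any non-zero multiple of an eigenvector is again an eigenvector, so `v_stretch` and `v_flip` need not match the normalised columns of `vecs`.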

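If the matrix $A$ is real and symmetric, we can represent it as


$$A=Q\Lambda Q^T$$

where $Q$ is an orthogonal matrix whose columns are the eigenvectors of $A$ and $\Lambda$ is the diagonal matrix comprised of the eigenvalues. For a general diagonalizable matrix, $Q^T$ is replaced by $Q^{-1}$.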

Eigenvectors are conventionally normalized to unit vectors, which means that their length or magnitude is equal to 1.0.
Eigenvalues are the scalar values by which the eigenvectors are scaled when they are multiplied by $A$.
For example, a negative eigenvalue reverses the direction of the eigenvector as part of scaling it. A symmetric matrix
that has only positive eigenvalues is referred to as a positive definite matrix, whereas if the
eigenvalues are all negative, it is referred to as a negative definite matrix.
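The representation $A=Q\Lambda Q^T$ and the definiteness test can be sketched for a real symmetric matrix with `np.linalg.eigh`, which returns the eigenvalues in ascending order and orthonormal eigenvectors. The helper name `definiteness`, its tolerance and the example matrices are assumptions made for illustration.

```python
import numpy as np


def definiteness(A, tol=1e-12):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    vals = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending
    if np.all(vals > tol):
        return "positive definite"
    if np.all(vals < -tol):
        return "negative definite"
    return "neither"               # indefinite or only semi-definite


A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric, eigenvalues 1 and 3

vals, Q = np.linalg.eigh(A)        # columns of Q are orthonormal eigenvectors
Lam = np.diag(vals)
A_rebuilt = Q @ Lam @ Q.T          # A = Q Lambda Q^T

column_norms = np.linalg.norm(Q, axis=0)   # each eigenvector has unit length
```

The helper only makes sense for symmetric input. For a non-symmetric matrix the eigenvalues may be complex and the sign test above does not apply.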

## References

1. <a href="https://github.com/fastai/numerical-linear-algebra">Computational Linear Algebra for Coders</a> GitHub repository with material related to Computational linear algebra with a focus on Machine Learning.
2. <a href="https://en.wikipedia.org/wiki/QR_decomposition">QR factorisation</a>
3. <a href="https://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a>
4. <a href="https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix">Eigenvalue decomposition</a>