# Lecture 7: Linear equations; Scalar, Vector and Matrix derivatives


Based on: Geoff Gordon's  [Matrices and Linear Operators lecture notes](https://qna.cs.cmu.edu/#/pages/view/184) and Zico Kolter's [Linear Algebra Review and Reference notes](http://www.cs.cmu.edu/~zkolter/course/linalg/linalg_notes.pdf).


Factorizations such as $A=BC$ are useful when the matrices $B$ and $C$ have convenient properties, such as being easy to invert.

We saw three examples of factorizations:


#### LU decomposition 


The LU decomposition factorizes a matrix into the product of a lower triangular matrix $L$ and an upper triangular matrix $U$ (we consider square matrices; in general a row permutation $P$ is also needed, giving $PA = LU$):

$$
\text{example: } A\in \mathbb{R}^{4\times 4}\\
~\\
A = LU\\
~\\
L = \left[
    \begin{array}{cccc}
    l_{11} & 0 & 0 & 0 \\
    l_{21} & l_{22} & 0 & 0 \\
    l_{31} & l_{32} & l_{33} & 0 \\
    l_{41} & l_{42} & l_{43} & l_{44}
    \end{array}
    \right], ~ ~ ~
  U = \left[
    \begin{array}{cccc}
    u_{11} & u_{12} & u_{13} & u_{14} \\
    0 & u_{22} & u_{23} & u_{24} \\
    0 & 0 & u_{33} & u_{34} \\
    0 & 0 & 0 & u_{44}
    \end{array}
    \right].
$$
    
If $A$ is symmetric and positive definite, the factorization simplifies to:
$$
A = LL^\top,
$$
where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.
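Both factorizations can be checked numerically. This is a minimal sketch, assuming NumPy and SciPy are available. Note that `scipy.linalg.lu` uses partial pivoting, so it also returns a permutation matrix `P`, with $A = PLU$. The random matrix and the small ridge term `1e-3 * np.eye(4)` (added to guarantee positive definiteness) are illustrative choices.

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)

# Random square matrix for the LU example
A = rng.normal(size=(4, 4))

# scipy.linalg.lu uses partial pivoting and returns P, L, U with A = P @ L @ U
P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))    # reconstruction check
print(np.allclose(L, np.tril(L)))   # L is lower triangular
print(np.allclose(U, np.triu(U)))   # U is upper triangular

# Symmetric positive definite matrix: Gram matrix plus a small ridge (assumption)
S = A.T @ A + 1e-3 * np.eye(4)

# numpy.linalg.cholesky returns a lower-triangular L with S = L @ L.T
L_chol = np.linalg.cholesky(S)
print(np.allclose(S, L_chol @ L_chol.T))
print(np.all(np.diag(L_chol) > 0))  # positive diagonal entries
```

All printed checks should be `True`.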
 
 
#### Eigenvalue decomposition 

A real symmetric matrix can be decomposed as:
$$
A = U\Lambda U^T
$$

where $U$ is an orthogonal matrix whose $i$-th column is the $i$-th eigenvector $u_i$ of $A$, and $\Lambda$ is a diagonal matrix whose $i$-th diagonal entry is the $i$-th eigenvalue $\lambda_i$ of $A$. By convention, the eigenvalues are sorted from largest to smallest. The rank of $A$ equals the number of non-zero eigenvalues. The $i$-th eigenvector and eigenvalue of $A$ are related by:

$$
Au_i = \lambda_iu_i \\
$$
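A short sketch of the eigendecomposition with NumPy, using a hypothetical $5\times 5$ symmetric matrix of rank 3. `numpy.linalg.eigh` (meant for symmetric/Hermitian input) returns eigenvalues in ascending order, so the sketch re-sorts them to match the descending convention above. The tolerance used to decide which eigenvalues count as non-zero is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))
# A = B B^T is a 5x5 symmetric matrix of rank 3 (almost surely)
A = B @ B.T

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
eigvals, U = np.linalg.eigh(A)

# Re-sort to the descending convention used in the notes
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]
Lam = np.diag(eigvals)

print(np.allclose(A, U @ Lam @ U.T))                    # A = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(5)))                  # U is orthogonal
print(np.allclose(A @ U[:, 0], eigvals[0] * U[:, 0]))   # A u_1 = lambda_1 u_1

# Rank = number of numerically non-zero eigenvalues (tolerance is an assumption)
tol = 1e-10 * np.abs(eigvals).max()
print(np.sum(np.abs(eigvals) > tol), np.linalg.matrix_rank(A))
```

The last line should print the same count twice.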

#### Singular value decomposition 

Any real matrix $A \in \mathbb{R}^{m\times n}$ can be written as:
$$
A = USV^T
$$

where $U$ is an orthogonal matrix whose columns are the left singular vectors of $A$, $S$ is a (rectangular) diagonal matrix with the singular values of $A$ on its diagonal, and $V$ is an orthogonal matrix whose columns are the right singular vectors of $A$. All singular values are non-negative, $s_i \ge 0$, and the $i$-th left singular vector, singular value and right singular vector are related by:

$$
Av_i = s_iu_i \\
\text{and}\\
A^\top u_i = s_iv_i \\
$$

If $A$ is symmetric, there is a one-to-one correspondence between its eigenvectors, left singular vectors and right singular vectors (up to sign flips). The singular values are the absolute values of the eigenvalues, so the eigenvalue and singular value decompositions are very closely related in this case.
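The relations above can be checked numerically with `numpy.linalg.svd`. This sketch uses a random non-square matrix and, for the symmetric case, the hypothetical matrix `M = A.T @ A - 2 * np.eye(3)`, chosen so that some of its eigenvalues may be negative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))   # the SVD also exists for non-square matrices

# full_matrices=False gives the "thin" SVD: U is 4x3, s has 3 entries, Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

print(np.allclose(A, U @ np.diag(s) @ Vt))          # A = U S V^T
print(np.all(s >= 0))                               # singular values are non-negative
i = 0
print(np.allclose(A @ V[:, i], s[i] * U[:, i]))     # A v_i = s_i u_i
print(np.allclose(A.T @ U[:, i], s[i] * V[:, i]))   # A^T u_i = s_i v_i

# Symmetric case: singular values are the absolute values of the eigenvalues
M = A.T @ A - 2 * np.eye(3)
sv = np.linalg.svd(M, compute_uv=False)   # returned in descending order
ev = np.linalg.eigvalsh(M)
print(np.allclose(sv, np.sort(np.abs(ev))[::-1]))
```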
 


### Solving linear systems 

Matrix factorizations give us alternative ways to invert a matrix. Here we compare the inverse returned by `numpy.linalg.inv` with the inverse computed from the SVD, $A^{-1} = VS^{-1}U^\top$.

```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import multivariate_normal
# from scipy.linalg import toeplitz
import time

A = np.random.normal(size=(10, 10))
# A^T A is a symmetric (Gram) matrix; it is positive definite when A has full rank
A = A.T.dot(A)
# verify that the Gram matrix has full rank
print(np.linalg.matrix_rank(A))
```

```
10
```


```python
from numpy.linalg import inv, svd, cholesky, eig

# direct inverse
inverse = inv(A)

# inverse from the SVD: A = U S V^T  =>  A^{-1} = V S^{-1} U^T
u, s, vt = svd(A)
v = vt.T
s_inv = np.diag(1 / s)
inverse_2 = v.dot(s_inv).dot(u.T)

# compare the top-left 4x4 blocks of both inverses
print(inverse[:4, :4])
print(inverse_2[:4, :4])
```

```
[[ 46.08354502 -17.59423516 -36.09113964  28.80463522]
 [-17.59423516   6.97247579  13.99964044 -11.08290473]
 [-36.09113964  13.99964044  29.02334373 -22.85389976]
 [ 28.80463522 -11.08290473 -22.85389976  18.28457806]]
[[ 46.08354502 -17.59423516 -36.09113964  28.80463522]
 [-17.59423516   6.97247579  13.99964044 -11.08290473]
 [-36.09113964  13.99964044  29.02334373 -22.85389976]
 [ 28.80463522 -11.08290473 -22.85389976  18.28457806]]
```
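To solve $Ax=b$, forming the inverse explicitly is usually unnecessary. As a sketch (assuming SciPy is available), the system can instead be solved with `numpy.linalg.solve`, which is LU-based (LAPACK `gesv`), or, since the Gram matrix is symmetric positive definite, with a Cholesky factorization via `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve`. The printed residual norms give a rough accuracy comparison; their exact values depend on the random matrix.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 10))
A = A.T @ A                 # symmetric positive definite (almost surely)
b = rng.normal(size=10)

# 1) explicit inverse, then a matrix-vector product
x_inv = np.linalg.inv(A) @ b

# 2) LU-based solve, no explicit inverse
x_lu = np.linalg.solve(A, b)

# 3) Cholesky: factor once, then solve two triangular systems
c_and_lower = cho_factor(A)
x_chol = cho_solve(c_and_lower, b)

for name, x in [("inv", x_inv), ("solve", x_lu), ("cholesky", x_chol)]:
    print(name, np.linalg.norm(A @ x - b))
```

The factor returned by `cho_factor` can be reused for many right-hand sides `b`, which is where factoring once pays off.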


### Pseudoinverse

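If $A$ is not invertible, we can still find an approximate solution to $Ax=b$ by setting:

$$
x = A^\dagger b,
$$

where $A^\dagger = VS^\dagger U^T$ is the Moore-Penrose pseudoinverse and $S^\dagger$ is a diagonal matrix (transposed in shape when $A$ is not square) with $1/S_{ii}$ on the diagonal where $S_{ii} \neq 0$, and zero where $S_{ii}=0$. This $x$ minimizes $\|Ax-b\|_2$, and among all minimizers it has the smallest norm $\|x\|_2$.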
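A sketch of the pseudoinverse built from the SVD, assuming a hypothetical rank-2 matrix of shape $6\times 4$. Singular values below a relative cutoff of `1e-10` (an illustrative choice) are treated as zero. The result is compared with `numpy.linalg.pinv` and with the least-squares solution from `numpy.linalg.lstsq`, both given the same cutoff through their `rcond` argument.

```python
import numpy as np

rng = np.random.default_rng(0)
# 6x4 matrix of rank 2, so A^T A is singular and A has no inverse
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))
b = rng.normal(size=6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the numerically non-zero singular values (cutoff is an assumption)
rcond = 1e-10
tol = rcond * s.max()
s_pinv = np.zeros_like(s)
s_pinv[s > tol] = 1.0 / s[s > tol]

# A^+ = V S^+ U^T
A_pinv = Vt.T @ np.diag(s_pinv) @ U.T
x = A_pinv @ b

print(np.allclose(A_pinv, np.linalg.pinv(A, rcond=rcond)))
x_lstsq = np.linalg.lstsq(A, b, rcond=rcond)[0]
print(np.allclose(x, x_lstsq))   # same minimum-norm least-squares solution
```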

# Matrix differentials



### Quick review

The derivative of $x^3−2x^2$ is:

* a.   $x^2−2x$
* b.    $3x^2−4x$
* c.   $3x^2$
* d.    $x^2−2$


The derivative of $\sin(3x)+\cos(x)$ is:

* a. $3\sin(x)+\sin(x)$
* b. $−3\sin(3x)+\cos(x)$
* c. $3\sin(3x)−\cos(x)$
* d. $3\cos(3x)−\sin(x)$


The derivative of $\cos(x^2+2)$ is:

* a. $−2x\sin(x^2+2)$
* b. $−\sin(2x)$
* c. $\cos(2x)$
* d. $x^2+2$


True or false: the derivative of $3x^2\cos(x)$ is $−6x\sin(x)$.

* a.  True
* b.  False



### Example: total and partial derivatives
[From Wikipedia](https://en.wikipedia.org/wiki/Partial_derivative#Geometry)

The volume of a cone depends on its height $h$ and its radius $r$:
$$
V(r, h) = \frac{\pi r^2 h}{3}.
$$

The partial derivative of the volume with respect to $r$ is:
$$
{\displaystyle {\frac {\partial V}{\partial r}}={\frac {2\pi rh}{3}}.}
$$


If the height itself depends on the radius, $h = h(r)$, the total derivative of the volume with respect to $r$ is:
$$
{\displaystyle {\frac {dV}{dr}}=\overbrace {\frac {2\pi rh}{3}} ^{\frac {\partial V}{\partial r}}+\overbrace {\frac {\pi r^{2}}{3}} ^{\frac {\partial V}{\partial h}}{\frac {dh}{dr}}}
$$
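To make the difference concrete, the sketch below assumes (hypothetically) that the height is proportional to the radius, $h = kr$, so that $dh/dr = k$ and the total derivative reduces to $\frac{2\pi k r^2}{3} + \frac{\pi k r^2}{3} = \pi k r^2$. It compares the analytic partial and total derivatives with central finite differences; the values of `k`, `r` and the step `eps` are illustrative.

```python
import math

def volume(r, h):
    """Volume of a cone with radius r and height h."""
    return math.pi * r**2 * h / 3

# Hypothetical assumption: height tied to radius, h = k * r
k = 2.0
r = 1.5
h = k * r

# Analytic partial derivatives from the formulas above
dV_dr_partial = 2 * math.pi * r * h / 3
dV_dh_partial = math.pi * r**2 / 3
dh_dr = k

# Total derivative: chain rule through h(r)
dV_dr_total = dV_dr_partial + dV_dh_partial * dh_dr

# Central finite differences: h held fixed (partial) vs h = k * r (total)
eps = 1e-6
fd_partial = (volume(r + eps, h) - volume(r - eps, h)) / (2 * eps)
fd_total = (volume(r + eps, k * (r + eps)) - volume(r - eps, k * (r - eps))) / (2 * eps)

print(dV_dr_partial, fd_partial)
print(dV_dr_total, fd_total)
```

Each printed pair should agree to several decimal places, and the total derivative is larger because increasing $r$ also increases $h$.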

Product rule example