# Positive Definiteness, Cholesky Decompositions, and Equation Solving

In today's lab, we will figure out how to test whether a matrix is positive definite, and if so, find its Cholesky decomposition. We will then compare equation solving using Cholesky to previous decompositions, like LU and QR.

## Pre-Lab

Go back to Labs 3 and 4, and review the row-reduction code we wrote there. Specifically, make sure you carefully review the `rowredpivot(A)` function you wrote for Lab 3's homework and the `LUSolve(L,U,P,v)` function from Lab 4's homework.

> ## Make a copy of this notebook (File menu -> Make a Copy...)

## Positive Definiteness

**Question 1** Write down what it means for a matrix to be positive definite.   

A symmetric matrix $A$ is positive definite if $x^TAx > 0$ for all nonzero real vectors $x$.
There are 4 ways to test if it is positive definite. 
If it is positive definite, we can write a Cholesky decomposition for it: $A = LL^T$ (a lower triangular factor times its upper triangular transpose).  
Similar to least squares, where the normal equations $A^TAx = A^Tb$ involve the symmetric matrix $A^TA$.

If it is positive definite:  
      It has to be symmetric (A = A^T)

$$u(t) = e^{At}u_0$$


**Question 2** The first thing you should have written down above is that the matrix needs to be *symmetric*. Write a function called `isSym(A)` that returns `True` if $A$ is symmetric and `False` otherwise. Note that matrices generated other than by hand will often have small rounding differences, so instead of checking whether two matrices are exactly equal using `np.array_equal(A,B)`, we want to check whether they're very close using `np.allclose(A,B)`.
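
To illustrate the difference between the two comparisons mentioned above, here is a small sketch. The matrix `S` and the perturbation size `1e-12` are made up for illustration; they stand in for the kind of rounding error a computed matrix might carry.

```python
import numpy as np

# A symmetric matrix, then the same matrix with a rounding-sized error in one entry.
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
S_noisy = S.copy()
S_noisy[0, 1] += 1e-12

exact_match = np.array_equal(S_noisy, S_noisy.T)  # exact, entry-by-entry comparison
close_match = np.allclose(S_noisy, S_noisy.T)     # comparison within a tolerance
print(exact_match, close_match)                   # False True
```

The exact test rejects a matrix that is symmetric for all practical purposes, while the tolerance-based test accepts it.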

In [265]:
#make sure to check that is square, and then you can check for the transpose.
import numpy as np
from Qiureferencefunctions import rowaddmult,fwdsub,backsub,LU,LUSolve
def isSym(A):
    rows, cols = np.shape(A)
    if rows!=cols:
        return False
    return np.allclose(A,A.T)


**Question 3** You probably used a transpose in your code above. Is that always necessary? What kind of matrices can you say immediately aren't symmetric without taking a transpose? Add code to your function above that checks this before taking transposes.

**Question 4** *Sylvester's Criterion* says that an $n\times n$ symmetric matrix is positive definite if and only if all its leading minors are positive. That is:
  * The determinant of its top-left $1\times 1$ corner is positive (the determinant of a number is just the number itself);
  * The determinant of its top-left $2\times 2$ corner is positive;
  * The determinant of its top-left $3\times 3$ corner is positive;
  * $\ldots$
  * The determinant of its top-left $(n-1)\times(n-1)$ corner is positive;
  * The determinant of its top-left $n\times n$ corner (i.e. the whole matrix) is positive.
  
  Write a function `isPosDef(A)` that checks this. Use your `LUdet(A)` code from Lab 9 to compute the determinants. Be sure to check that the matrix is symmetric before you start computing anything! Note also that once you find any determinant that isn't positive, you're done! You can test your code using the following positive definite matrix: $$\begin{bmatrix}4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix}$$
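
As a quick sanity check of the criterion on the test matrix above, the sketch below computes the three leading minors with NumPy's built-in `np.linalg.det` (rather than `LUdet`, which is not reproduced here). By hand, the $2\times 2$ minor is $4\cdot 37 - 12^2 = 4$, and the values should come out close to 4, 4 and 36.

```python
import numpy as np

A = np.array([[  4.,  12., -16.],
              [ 12.,  37., -43.],
              [-16., -43.,  98.]])

# Leading minors: determinants of the top-left k x k corners, k = 1..n.
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(np.round(minors, 6))
```

All three values are positive, which is consistent with the matrix being positive definite.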

In [111]:
# The leading minors are the determinants of the upper-left k x k corners of an n x n matrix
# we can also look up numpy's np.linalg.det; loop over all the minors to find their determinants
# and exit as soon as one is not positive; if the matrix is not symmetric we also exit immediately

def isPosDef(A):
    if not isSym(A):
        return False
    rows, cols = np.shape(A)
    for i in range (1,rows+1):
        if np.linalg.det(A[:i,:i])<=0:
            return False
    return True
A = np.array([[4,12,-16],[12,37,-43],[-16,-43,98]])
print(isPosDef(A))

True


**Question 5** If $A$ is any $n\times n$ matrix with all entries of absolute value less than 1, then the matrix $B=\frac12\left(A+A^T\right) + nI_n$ (where $I_n$ is the $n\times n$ identity matrix) is positive definite. 
  1. Write a function `genPosDef(n)` to generate an $n\times n$ random matrix $A$ (using `np.random.random((n,n))`), and turn it into a positive definite matrix using this formula.<br><br>
  1. For $n=100$, use both a random matrix and a matrix generated from your code to test your function from Question 4 for accuracy and timing.<br><br>
  1. You should find that your code generally works much faster on the random matrix. Why should this be the case? (If it isn't, go back and read the last sentence in the previous question, and refactor your code.)
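
One independent way to convince yourself that the formula in this question produces a positive definite matrix is to look at eigenvalues: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive. The sketch below is only an illustration; the seed, the size `n = 5`, and the use of `np.random.default_rng` (instead of `np.random.random`) are choices made here so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded generator, an assumption for reproducibility
n = 5
A = rng.random((n, n))                # entries in [0, 1)
B = 0.5 * (A + A.T) + n * np.eye(n)   # the formula from Question 5

# eigvalsh is intended for symmetric matrices and returns real eigenvalues.
eigs = np.linalg.eigvalsh(B)
print(np.allclose(B, B.T), eigs.min() > 0)   # True True
```

The diagonal of $B$ is large compared with the sum of the other entries in each row, which is why no eigenvalue can reach zero.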

In [5]:
#genPosDef(n) generates a positive definite matrix. 
def genPosDef(n):
    A = np.random.random((n,n))
    I = np.eye(n)
    B = .5* (A+ A.T) + n*I
    return B
B = genPosDef(100)

#testing:
A = np.random.random((100,100))
print(isPosDef(A))
%timeit isPosDef(A)
B = genPosDef(100)
print(isPosDef(B))
%timeit isPosDef(B)

False
242 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True
10.3 ms ± 563 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Speeding up our Test

Regardless of the method we use to compute determinants, checking up to $n$ determinants is slow. There is a much faster way to check for positive definiteness! Namely, for a symmetric matrix, all these determinants being positive is equivalent to all the *pivots* of the matrix being positive. That is, if the left-most non-zero elements in each row of the row-reduced matrix are all positive, the symmetric matrix that was row-reduced is positive definite.
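
The link between pivots and leading minors can be seen numerically. The sketch below assumes elimination is carried out without any row swaps; under that assumption the $k$-th pivot equals the ratio of the $k$-th leading minor to the $(k-1)$-th. It uses the $3\times 3$ matrix from Question 4, whose pivots work out by hand to 4, 1 and 9.

```python
import numpy as np

A = np.array([[  4.,  12., -16.],
              [ 12.,  37., -43.],
              [-16., -43.,  98.]])

# Gaussian elimination with no row swaps, recording each pivot as we reach it.
U = A.copy()
n = U.shape[0]
pivots = []
for p in range(n):
    pivots.append(U[p, p])
    for i in range(p + 1, n):
        U[i] -= (U[i, p] / U[p, p]) * U[p]

# Ratios of consecutive leading minors (the first ratio is just the 1x1 minor).
minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
ratios = [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, n)]
print(np.round(pivots, 6), np.round(ratios, 6))
```

Because each pivot is a ratio of consecutive minors, the minors are all positive exactly when the pivots are.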

**Question 6** Before we code this idea, we need to deal with a few issues:
1. One of the main operations in row-reduction is swapping rows. When we do this, what happens to the determinant of the matrix? Why is that a problem? What can we do to reverse it?<br><br>
1. Do we need to carry out the row-reduction completely? Why or why not?<br><br>
1. To get RREF, we divided each row by its pivot. Why is this step unnecessary here?

**Question 7** 
1. By modifying your `rowredpivot(A)` function from Lab 3's homework, write a function named `isPosDefRowRed(A)` that tests whether a matrix is positive definite by row-reducing and checking each of its pivots. Be sure to incorporate all the elements of the previous question.<br><br>
1. Test your code using the positive definite matrix from Question 4.<br><br>
1. Repeat your timing tests from Question 5 using your new code. How does it compare?

In [84]:
#a row swap multiplies the determinant by -1, but a positive definite matrix never needs one:
#every pivot is already positive, so we row-reduce without swapping.
#we don't need to get to rref, and we can quit at the first pivot that isn't positive.

def isPosDefRowRed(A):
    if not isSym(A):
        return False
    rows, cols = np.shape(A)
    copy = np.array(A, dtype=float) #float working copy, A itself is untouched
    for pivotrow in range(rows):
        pivot = copy[pivotrow,pivotrow]
        if pivot<=0:
            return False
        for i in range(pivotrow+1,rows):
            #the row operation rowaddmult was used for: add a multiple of the pivot row to row i
            copy[i] += (-copy[i,pivotrow]/pivot)*copy[pivotrow]
    return True
A = np.random.random((5,5))
print(isPosDef(A))
#%timeit isPosDef(A)
B = genPosDef(2)
print(isPosDef(B))
#%timeit isPosDef(B)
print(isPosDefRowRed(A))

print(isPosDefRowRed(B))
#%timeit isPosDefRowRed(A)


False
True
False
True


## Cholesky Decomposition

If a matrix is positive definite, then we can decompose it as $A=LL^T$, where $L$ is lower-triangular (making $L^T$ upper-triangular). We will see that this is a particularly efficient way of solving a set of simultaneous equations $Ax=b$ when we know $A$ is positive definite.

**Question 8** Suppose that $A$ is a positive definite $4\times 4$ matrix with a Cholesky decomposition $LL^T$: 

$$A = \begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{21} & a_{22} & a_{32} & a_{42} \\ a_{31} & a_{32} & a_{33} & a_{43} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix}L_{11} & 0 & 0 & 0 \\ L_{21} & L_{22} & 0 & 0 \\ L_{31} & L_{32} & L_{33} & 0 \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix}\begin{bmatrix}L_{11} & L_{21} & L_{31} & L_{41} \\ 0 & L_{22} & L_{32} & L_{42} \\ 0 & 0 & L_{33} & L_{43} \\ 0 & 0 & 0 & L_{44} \end{bmatrix}$$

1. By writing out the matrix multiplication on the right of this equation, find formulas for each of the entries in $A$ in terms of the entries in $L$:
 * $a_{11}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$ <br><br>
 * $a_{21}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{31}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{22}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{32}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{33}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{41}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{42}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{43}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
 * $a_{44}=\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_$<br><br>
1. Of course, we already know the entries of $A$. What we really want is a way to find the entries of $L$. Solve each of your equations above for the corresponding entry of $L$. For example, solve the second equation for $L_{21}$.<br><br>

1. Notice that you can compute $L_{11}$ directly from $a_{11}$. Next, note that if you know $L_{11}$, you can compute $L_{21}$, and if you know $L_{21}$, you can compute $L_{22}$, and so on. Complete the following equations:<br><br>$$L_{ij} = \frac{1}{L_{jj}} \left(a_{ij} - \displaystyle\sum_{k=1}^{j-1} L_{ik} L_{jk} \right)\mbox{ for } i>j$$ <br><br>$$L_{jj} = \displaystyle\sqrt{a_{jj} - \sum_{k=1}^{j-1}(L_{jk})^{2}}$$<br><br>

1. Note that you can compute an entry of $L$ if you already know all entries to the left of it and above it. Use this to write a function `Chol(A)` that returns the lower-triangular matrix $L$ in the Cholesky decomposition $A=LL^T$. Test your code by decomposing the $3\times 3$ matrix from Question 4, as well as some random positive definite matrices (as in Question 5).

**Note:** The sums in the expressions for the entries of $L$ can be written as dot products. Doing so will greatly speed up your code.
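
As a worked check of the formulas, the sketch below carries them out by hand for the $3\times 3$ matrix from Question 4 (the arithmetic is in the comments) and compares the result with NumPy's built-in `np.linalg.cholesky`, which also returns the lower-triangular factor. It is meant as a reference to test against, not as a replacement for writing `Chol(A)`.

```python
import numpy as np

A = np.array([[  4.,  12., -16.],
              [ 12.,  37., -43.],
              [-16., -43.,  98.]])

# Hand computation with the formulas above:
#   L11 = sqrt(4) = 2
#   L21 = 12 / 2 = 6          L22 = sqrt(37 - 6^2) = 1
#   L31 = -16 / 2 = -8        L32 = (-43 - (-8)(6)) / 1 = 5
#   L33 = sqrt(98 - (-8)^2 - 5^2) = 3
L_hand = np.array([[ 2., 0., 0.],
                   [ 6., 1., 0.],
                   [-8., 5., 3.]])

L_numpy = np.linalg.cholesky(A)   # NumPy's lower-triangular Cholesky factor
print(np.allclose(L_hand, L_numpy), np.allclose(L_hand @ L_hand.T, A))   # True True
```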


In [260]:
#go across the row left to right, top to bottom, to find all the values of L.
#note that the sums are each dot products of different slices of rows. 
def Chol(A):
    rows, cols = np.shape(A)
    L = np.zeros((rows,rows))
    for i in range (0,rows):
        for j in range (0,i+1): #only entries on or below the diagonal
            #entries of L not filled in yet are still 0, so whole-row dot products give the sums up to j-1
            if i==j:
                L[i,j] = np.sqrt(A[i,j] - L[i,:]@L[i,:])
            else:
                L[i,j] = (A[i,j]-L[i,:]@L[j,:])/L[j,j]
    return L

A = np.array([[4,12,-16],[12,37,-43],[-16,-43,98]])
B = genPosDef(6)
#print(A)
print(B)
L = Chol(B)
Lt= L.T
print(L)
print(L@Lt)
#L@Lt should reproduce B

[[6.14171799 0.42868411 0.20376974 0.41297851 0.78187976 0.18664375]
 [0.42868411 6.60802676 0.36847799 0.65008422 0.26623188 0.05204771]
 [0.20376974 0.36847799 6.61990503 0.43372838 0.36267301 0.56637653]
 [0.41297851 0.65008422 0.43372838 6.78754122 0.48817082 0.45067052]
 [0.78187976 0.26623188 0.36267301 0.48817082 6.02357875 0.5842287 ]
 [0.18664375 0.05204771 0.56637653 0.45067052 0.5842287  6.92995827]]
[[2.47824898 0.         0.         0.         0.         0.        ]
 [0.17297863 2.5647817  0.         0.         0.         0.        ]
 [0.08222327 0.13812291 2.56789143 0.         0.         0.        ]
 [0.16664125 0.24222679 0.15053967 2.584267   0.         0.        ]
 [0.31549686 0.08252463 0.12669277 0.1534416  2.42438338 0.        ]
 [0.07531275 0.01521386 0.21733111 0.15544762 0.20946598 2.6093643 ]]
[[6.14171799 0.42868411 0.20376974 0.41297851 0.78187976 0.18664375]
 [0.42868411 6.60802676 0.36847799 0.65008422 0.26623188 0.05204771]
 [0.20376974 0.36847799 6.619905

### Cholesky Test for Positive Definiteness

**Question 9** Try to run your Cholesky function on a matrix that you know is not positive definite. You should get a warning and a nonsensical answer. By looking at your code, or your formulas from the previous question, can you tell where something went wrong?

**Question 10** In fact, a symmetric matrix is positive definite if and only if it has a Cholesky decomposition with no zero entries on the diagonal. Make a small modification to your code to raise a *ValueError* if the step you identified in Question 9 fails. The following code will raise a *ValueError*: 
```python
raise ValueError('Matrix is not Positive Definite!')
```

**Question 11** Explain why the following function tests whether a matrix is positive definite, then run timing tests to compare this to the previous methods of determining positive definiteness:
```python
def isPosDefChol(A):
    PosDef = True
    
    try:
        Chol(A)
    except:
        PosDef = False
        
    return PosDef
```
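
For comparison, the same try/except idea can be sketched with NumPy's own Cholesky routine, which raises `np.linalg.LinAlgError` when the factorization fails. The helper name `is_pos_def_numpy` is hypothetical. Because `np.linalg.cholesky` only reads the lower triangle of its input, this sketch checks symmetry separately first.

```python
import numpy as np

def is_pos_def_numpy(A):
    """Hypothetical helper: the try/except test, using NumPy's Cholesky."""
    A = np.asarray(A, dtype=float)
    # np.linalg.cholesky does not verify symmetry, so check it ourselves.
    if A.ndim != 2 or A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
    except np.linalg.LinAlgError:
        return False
    return True

good = np.array([[4., 12., -16.], [12., 37., -43.], [-16., -43., 98.]])
bad  = np.array([[4., 12., -16.], [12., 37., -43.], [-16., -43.,  8.]])
print(is_pos_def_numpy(good), is_pos_def_numpy(bad))   # True False
```

Catching a specific exception type, rather than using a bare `except:`, keeps unrelated errors (such as a typo inside the function being called) from being silently reported as "not positive definite".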

In [288]:
#If the quantity under the square root for L[j,j] is <= 0, the matrix is not positive definite
#(a zero diagonal entry would also break the division for the entries below it).
#cholesky is much faster than the other methods; if it is positive definite, you already have the decomposition.
A = np.array([[4,12,-16],[12,37,-43],[-16,-43,8]]) #this is a non-positive definite matrix
#print(Chol(A)) #without the check: invalid-value warning from sqrt and a nan entry
def Chol(A):
    if not isSym(A): #Chol only reads the lower triangle, so check symmetry first
        raise ValueError('Matrix is not Positive Definite!')
    rows, cols = np.shape(A)
    L = np.zeros((rows,rows))
    for i in range (0,rows):
        for j in range (0,i+1):
            if i==j:
                d = A[i,i] - L[i,:]@L[i,:] #quantity under the square root
                if d<=0:
                    raise ValueError('Matrix is not Positive Definite!')
                L[i,i] = np.sqrt(d)
            else:
                L[i,j] = (A[i,j]-L[i,:]@L[j,:])/L[j,j]
    return L

#print(Chol(A))

def isPosDefChol(A): #this works because if we don't get a valueerror while doing Chol(A), then we know the cholesky decomp exists
    PosDef = True
    try:
        Chol(A)
    except:
        PosDef = False
    return PosDef
A = np.random.random((5,5))
B = genPosDef(5)
print(A)
print(isPosDef(A))

print(isPosDefChol(A))
#%timeit isPosDefChol(A)

#print(isPosDefChol(B))
#%timeit isPosDefChol(B)

L = Chol(A)
Lt= L.T
#print(L)
print(L@Lt)

[[0.83723025 0.31171493 0.54522465 0.27949491 0.11742283]
 [0.37454834 0.97974014 0.81329792 0.27391087 0.96124852]
 [0.21576001 0.81287545 0.45899904 0.34177297 0.46977853]
 [0.93184471 0.57891904 0.66477344 0.85924974 0.52017873]
 [0.84756403 0.82596372 0.08209349 0.945837   0.26051055]]
False
False


ValueError: Matrix is not Positive Definite!

In [136]:
print(isPosDef(A))
%timeit isPosDef(A)

print(isPosDef(B))
%timeit isPosDef(B)

print(isPosDefRowRed(A))
%timeit isPosDefRowRed(A)

print(isPosDefRowRed(B))
%timeit isPosDefRowRed(B)

False
97.4 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True
151 µs ± 4.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
False
97.3 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True
139 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


## Solving Linear Equations using Cholesky Decomposition

We have seen (in Lab 4 and its homework) that we can solve a system of equations $Ax=v$ using LU decomposition. If $A$ is positive definite, we can also solve these equations by Cholesky decomposition in a similar way. While requiring $A$ to be positive definite may seem onerous, many applications do in fact produce such matrices. For example, in solving the normal equations for least squares ($A^TAx=A^Tv$), the matrix $A^TA$ is positive definite (as long as $A$ has full column rank).
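
The claim about the normal equations can be checked numerically. In the sketch below, the tall matrix `M`, its size, and the seed are arbitrary choices made for illustration; a random matrix of this shape is assumed to have full column rank, which the code also verifies.

```python
import numpy as np

rng = np.random.default_rng(1)   # seeded generator, an assumption for reproducibility
M = rng.random((8, 3))           # tall matrix with (almost surely) independent columns
G = M.T @ M                      # the matrix that appears in the normal equations

full_rank = np.linalg.matrix_rank(M) == M.shape[1]
print(full_rank, np.allclose(G, G.T), np.linalg.eigvalsh(G).min() > 0)
```

The reason is that $x^T(M^TM)x = \|Mx\|^2$, which is positive for every nonzero $x$ when the columns of $M$ are independent.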

**Question 12** Write a function called `CholSolve(A,v)` that uses the Cholesky decomposition to solve the equation $Ax=v$. Note that the Cholesky decomposition will fail (with a `ValueError`) if $A$ is not positive definite, so you don't need to build in a check for that. You can test your code with the following three pairs of arrays and vectors. In all cases, your solutions should be pretty nice numbers:
```python
A = np.array([[ 28.,   8.,  -2.,   8.],
              [  8.,  31.,   4., -10.],
              [ -2.,   4.,  25.,  -8.],
              [  8., -10.,  -8.,  16.]])
               
v = np.array([ -40., -161., -112.,  66.])


A = np.array([[  97.,   -6.,   -8.,  -54.],
              [  -6.,   88.,  -66.,   -8.],
              [  -8.,  -66.,  187.,    6.],
              [ -54.,   -8.,    6.,  178.]]) 
              
v = np.array([ 193.,  -64., -677.,  374.])


A = np.array([[ 85.,  11.,   3.,  14.,   9.],
              [ 11.,  85.,  -3., -14.,  -9.],
              [  3.,  -3.,  69.,   2., -17.],
              [ 14., -14.,   2.,  68., -26.],
              [  9.,  -9., -17., -26.,  45.]])
v = np.array([-289.,  289., -279.,   42., -101.])
```
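
As a reference to test against, here is a minimal sketch of the two-stage solve: factor $A=LL^T$, solve $Ly=v$ by forward substitution, then solve $L^Tx=y$ by back substitution. It uses `np.linalg.cholesky` rather than the lab's `Chol`, and the name `chol_solve_sketch` is hypothetical. Note that both substitution loops divide by the diagonal of $L$, which is generally not 1 (unlike the $L$ from an LU decomposition). For the first system, substituting back confirms that $x=(0,-5,-4,-1)$ satisfies all four equations.

```python
import numpy as np

def chol_solve_sketch(A, v):
    """Hypothetical reference solver: A = L L^T, then L y = v, then L^T x = y."""
    L = np.linalg.cholesky(A)        # raises LinAlgError if the factorization fails
    n = len(v)
    y = np.zeros(n)
    for i in range(n):               # forward substitution, dividing by the diagonal
        y[i] = (v[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):   # back substitution with L^T (row i of L^T is column i of L)
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x

A = np.array([[ 28.,   8.,  -2.,   8.],
              [  8.,  31.,   4., -10.],
              [ -2.,   4.,  25.,  -8.],
              [  8., -10.,  -8.,  16.]])
v = np.array([ -40., -161., -112.,  66.])

x = chol_solve_sketch(A, v)
print(np.round(x, 6), np.allclose(A @ x, v))
```

Checking `A @ x` against `v` is a cheap test that works for any of the three systems, even before you know what the "nice" solution should be.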

In [270]:
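def CholSolve(A,v):
    #A = L L^T, so A x = v becomes L y = v followed by L^T x = y
    L = Chol(A)
    Lt = L.T
    #L does not have 1s on its diagonal (unlike the L from LU), so the substitutions must divide by the diagonal entries
    y = fwdsub(L,v) #forward substitution with the lower-triangular L
    x = backsub(Lt,y) #back substitution with the upper-triangular L^T
    #print(L@Lt)
    return x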
A = np.array([[ 28.,   8.,  -2.,   8.],
              [  8.,  31.,   4., -10.],
              [ -2.,   4.,  25.,  -8.],
              [  8., -10.,  -8.,  16.]])

v = np.array([ -40., -161., -112.,  66.])
print(CholSolve(A,v))


A = np.array([[  97.,   -6.,   -8.,  -54.],
              [  -6.,   88.,  -66.,   -8.],
              [  -8.,  -66.,  187.,    6.],
              [ -54.,   -8.,    6.,  178.]]) 

v = np.array([ 193.,  -64., -677.,  374.])


A = np.array([[ 85.,  11.,   3.,  14.,   9.],
              [ 11.,  85.,  -3., -14.,  -9.],
              [  3.,  -3.,  69.,   2., -17.],
              [ 14., -14.,   2.,  68., -26.],
              [  9.,  -9., -17., -26.,  45.]])
v = np.array([-289.,  289., -279.,   42., -101.])



[-0.60925875 -4.42040966 -3.61366768  0.64939837]


**Question 13** By using the `%timeit` magic function, compare the time it takes to solve the above sets of equations using your `LUSolve(L,U,P,v)` function vs. your new `CholSolve(A,v)` function. To make these comparable, use the wrapper function:
```python
def LUSolveDirect(A,v):
    L,U,P=LU(A)
    soln = LUSolve(L,U,P,v)
    return soln
```
Which method is quicker?
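
Outside a notebook, the standard-library `timeit` module plays the role of `%timeit`. The sketch below shows the pattern using NumPy routines as stand-ins: `np.linalg.solve` for a general LU-based solve, and `np.linalg.cholesky` followed by two further solves for the Cholesky route. These stand-ins are assumptions made here and are not the lab's `LUSolveDirect` or `CholSolve`, so the timings they give say nothing definitive about your own functions.

```python
import timeit
import numpy as np

A = np.array([[ 28.,   8.,  -2.,   8.],
              [  8.,  31.,   4., -10.],
              [ -2.,   4.,  25.,  -8.],
              [  8., -10.,  -8.,  16.]])
v = np.array([ -40., -161., -112.,  66.])

def solve_general(A, v):
    return np.linalg.solve(A, v)      # general-purpose solver

def solve_cholesky(A, v):
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, v)         # stand-in for forward substitution
    return np.linalg.solve(L.T, y)    # stand-in for back substitution

# Time 1000 calls of each; results vary from machine to machine and run to run.
t_general = timeit.timeit(lambda: solve_general(A, v), number=1000)
t_chol = timeit.timeit(lambda: solve_cholesky(A, v), number=1000)
print(f"general: {t_general:.4f} s, cholesky: {t_chol:.4f} s for 1000 runs")
```

Whatever you time, first confirm that both methods return the same solution; a fast wrong answer is not a win.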

In [289]:
A = np.array([[ 85.,  11.,   3.,  14.,   9.],
              [ 11.,  85.,  -3., -14.,  -9.],
              [  3.,  -3.,  69.,   2., -17.],
              [ 14., -14.,   2.,  68., -26.],
              [  9.,  -9., -17., -26.,  45.]])
v = np.array([-289.,  289., -279.,   42., -101.])

#my LUSolve already is a wrapper function
%timeit LUSolveDirect(A,v)
%timeit CholSolve(A,v)
#CholSolve is only slightly faster here (236 µs vs. 273 µs per loop)

273 µs ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
236 µs ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
