# Notes on Terence Parr and Jeremy Howard's Matrix Calculus Course

---



The goal of these notes is to blend the math and the code together so the concepts make sense, we understand the math, and I get stronger at going from code to math and from math to code.

## Introduction

---



* The activation of a single unit in a neural net is computed as the dot product of an edge weight vector, **w**, with an input vector, **x**, plus a scalar bias, *b*

* This is the vector analogue of the line y = mx + b

![alt text](https://explained.ai/matrix-calculus/images/eqn-EEDCFA4252D0992243A283CE0EB777A6-depth003.31.svg)

Where *n* is the number of inputs to the unit (the length of **w** and **x**) and *i* indexes their elements. Can also be written as:

In [0]:
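# dot product of w and x (sum of the elementwise products), plus the scalar bias
z = sum(wi * xi for wi, xi in zip(w, x)) + b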

z is the unit's 'affine function'. It is followed by a rectified linear unit (ReLU), which clips negative values to zero: max(0, z).

![alt text](https://explained.ai/matrix-calculus/images/neuron.png =250x)
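
To make the unit concrete, here is a minimal sketch of one unit's activation. It assumes NumPy, and the function name `unit_activation` and the values of `w`, `x`, and `b` are made up for illustration; they are not from the course.

```python
import numpy as np

def unit_activation(w, x, b):
    """Affine function z = w . x + b followed by a ReLU, max(0, z)."""
    z = np.dot(w, x) + b   # weighted sum of the inputs plus the scalar bias
    return max(0.0, z)     # clip negative values to zero

# Illustrative values only
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])

print(unit_activation(w, x, b=0.5))    # z = 0.5 - 2.0 + 6.0 + 0.5 = 5.0
print(unit_activation(w, x, b=-10.0))  # z = -5.5, clipped to 0.0
```

The second call shows the clipping: the affine function is negative, so the unit outputs zero.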


A neural network is made of many of these units, organized into layers, where the activations of one layer's units are the inputs to the next layer's units.

* Training = choosing **w** and *b* so we get a desired output for all *N* inputs **x**

We need a loss function to 'grade' the final **activation(x)** against the **target(x)** for all inputs.

To minimize the loss we use some variation on gradient descent. All of them require the partial derivatives (the gradient) of ***activation*** **(x)** with respect to **w** and *b*.

Goal: Gradually tweak **w** and *b* so the loss gets close to zero.

We can derive the gradient by differentiating (taking the derivative of) the scalar version of a loss function, here mean squared error (MSE):

![alt text](https://explained.ai/matrix-calculus/images/blkeqn-A129949CD1EF7BE2CA8BD424D34F9930.svg)

In code:

The original:

In [0]:
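def loss(N, targ, actv):
  # mean squared error over N inputs; targ(x) and actv(x) return the
  # target and the activation for input number x
  y = 0
  for x in range(N):
    y += (targ(x) - actv(x))**2
  return y/N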

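With the activation written out in terms of **w** and *b* (this is still the loss itself; the gradient is its derivative with respect to **w** and *b*):

In [0]:
def loss_wb(X, targ, w, b):
  # X is the list of N input vectors; targ maps an input to its target value
  y = 0
  for x in X:
    y += (targ(x) - actf(x, w, b))**2
  return y/len(X)

def actf(x, w, b):
  # max(0, w . x + b): dot product plus bias, clipped at zero
  y = 0
  for i in range(len(x)):
    y += w[i] * x[i]
  return max(0, y + b)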

Where **w** is the weight vector, **x** is one input vector, and *b* is the scalar bias.
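
The training idea in this section (tweak **w** and *b* until the loss approaches zero) can be sketched without any calculus by estimating the gradient numerically. This is only an illustration under stated assumptions: the data, the learning rate `lr`, the step size `h`, and the helper names `activation`, `mse`, and `numeric_grad` are all made up here, and the central-difference estimate stands in for the analytic gradient that the course derives.

```python
def activation(x, w, b):
    """ReLU unit: max(0, w . x + b) for plain Python lists w and x."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def mse(X, targets, w, b):
    """Mean squared error between targets and activations over all inputs."""
    return sum((t - activation(x, w, b)) ** 2 for x, t in zip(X, targets)) / len(X)

def numeric_grad(X, targets, w, b, h=1e-6):
    """Central-difference estimate of d(loss)/dw_i and d(loss)/db."""
    grad_w = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + h] + w[i + 1:]
        down = w[:i] + [w[i] - h] + w[i + 1:]
        grad_w.append((mse(X, targets, up, b) - mse(X, targets, down, b)) / (2 * h))
    grad_b = (mse(X, targets, w, b + h) - mse(X, targets, w, b - h)) / (2 * h)
    return grad_w, grad_b

# Made-up data: the targets were generated with w = [1.0, 2.0], b = 0.5
X = [[1.0, 2.0], [2.0, 0.5], [0.5, 1.5], [3.0, 1.0]]
targets = [5.5, 3.5, 4.0, 5.5]

w, b, lr = [0.1, 0.1], 0.0, 0.01
initial_loss = mse(X, targets, w, b)
for _ in range(500):
    grad_w, grad_b = numeric_grad(X, targets, w, b)
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]  # step against the gradient
    b = b - lr * grad_b
final_loss = mse(X, targets, w, b)

print(initial_loss, final_loss)  # the loss should shrink as training proceeds
```

Numerical gradients need two loss evaluations per parameter, which is one reason to derive the gradient analytically instead.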

## Review: Scalar derivative rules

---

![alt text](https://i.imgur.com/VCZxHC9.png)





**Explanation:**

* **Power Rule**: Multiply the coefficient by the exponent, then reduce the exponent by one (for example, 3x² becomes 6x)

* **Sum Rule**: Take the derivative for f(x) and g(x), and sum them together

* **Difference Rule**:  Take the derivative for f(x) and g(x), and subtract them

* **Product Rule**: Given f(x) and g(x), the derivative of f(x)g(x) is the derivative of f(x) multiplied by g(x), summed with f(x) multiplied by the derivative of g(x)

     ![alt text](https://i.imgur.com/KVE8rHY.png)
     
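* **Chain Rule**: Used for composite functions such as f(g(x)). We start from the inner-most function and work our way out, multiplying the derivatives together: the derivative of f(g(x)) is f'(g(x)) multiplied by g'(x).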

    ![alt_text](https://i.imgur.com/oTfTTpA.png)
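
The sum and difference rules from the list above can be checked with a short SymPy sketch. The two functions `f` and `g` are arbitrary examples chosen only for illustration.

```python
from sympy import Symbol, diff, simplify

x = Symbol('x')
f = 3 * x**2   # example functions, chosen only for illustration
g = 4 * x**3

# Sum rule: d/dx (f + g) equals f' + g'
sum_gap = simplify(diff(f + g, x) - (diff(f, x) + diff(g, x)))

# Difference rule: d/dx (f - g) equals f' - g'
diff_gap = simplify(diff(f - g, x) - (diff(f, x) - diff(g, x)))

print(sum_gap, diff_gap)  # both gaps are 0 when the rules hold
print(diff(f + g, x))
print(diff(f - g, x))
```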

## In Code:

### Power:

In [0]:
from sympy import *
import numpy as np

`Symbol` creates a symbolic variable that SymPy can manipulate algebraically instead of evaluating numerically

In [4]:
x = Symbol('x'); x

x

In [0]:
y = 3*x**2

In [0]:
yprime = y.diff(x)

In [8]:
yprime

6*x

In [0]:
def getDir(y):
  # differentiate the SymPy expression y with respect to x
  x = Symbol('x')
  return y.diff(x)

In [16]:
getDir(y=3*x**2)

6*x
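
The same check works with a symbolic coefficient and exponent, which shows the power rule in general form rather than for one polynomial. A small sketch, where the symbols `a` and `n` are introduced here only for illustration:

```python
from sympy import symbols, diff, simplify

x, a, n = symbols('x a n')

# Differentiate a*x**n with a symbolic coefficient and exponent
derivative = diff(a * x**n, x)

# Power rule prediction: multiply by the exponent, then lower it by one
expected = a * n * x**(n - 1)

print(simplify(derivative - expected))  # 0 when the two agree

# Plugging in a = 3, n = 2 recovers the example above
print(derivative.subs({a: 3, n: 2}))
```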

### Product

In [0]:
fx = 3*x**2
gx = 4*x**2

In [0]:
x = Symbol('x')

In [21]:
(fx.diff(x) * gx) + (fx * gx.diff(x))

48*x**3
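
As a sanity check, the product-rule result should match what SymPy gives when it differentiates the product directly. A minimal sketch, reusing the same `fx` and `gx` (the names `by_rule` and `direct` are just for this example):

```python
from sympy import Symbol, Function, diff, expand

x = Symbol('x')
fx = 3 * x**2
gx = 4 * x**2

by_rule = diff(fx, x) * gx + fx * diff(gx, x)  # f'g + fg'
direct = diff(fx * gx, x)                      # differentiate 12*x**4 directly

print(by_rule, direct)           # both are 48*x**3
print(expand(by_rule - direct))  # 0

# SymPy applies the same rule to unspecified functions f and g
f, g = Function('f'), Function('g')
print(diff(f(x) * g(x), x))
```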

### Chain

In [0]:
x = Symbol('x')

In [0]:
fx = 3*x**2
gx = 4*x**2

In [38]:
fd = getDir(fx); fd

6*x

In [39]:
gd = getDir(gx); gd

8*x

In [0]:
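# plain Python versions of f, g and their derivatives
# (these names replace the SymPy expressions defined above)
def fx(x):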
  return 3*x**2
def gx(x):
  return 4*x**2

def fd(x):
  return 6*x
def gd(x):
  return 8*x

In [47]:
fd(gx(2))*gd(2)

1536
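
The hand-computed value can be cross-checked by letting SymPy build the composite f(g(x)) and differentiate it directly. A short sketch, where the helper symbol `u` (standing in for the inner function's output) is an assumption of this example:

```python
from sympy import Symbol, diff

x = Symbol('x')
u = Symbol('u')   # stand-in for the inner function's output

f = 3 * u**2      # outer function f(u)
g = 4 * x**2      # inner function g(x)

composite = f.subs(u, g)                      # f(g(x)) = 48*x**4
by_rule = diff(f, u).subs(u, g) * diff(g, x)  # f'(g(x)) * g'(x)
direct = diff(composite, x)                   # differentiate the composite directly

print(direct)              # 192*x**3
print(by_rule.subs(x, 2))  # 1536
print(direct.subs(x, 2))   # 1536
```

Both routes agree with `fd(gx(2))*gd(2)` above.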