# Matrix Calculus

The following summarizes a [paper](https://explained.ai/matrix-calculus/index.html) suggested by Jeremy as an introduction to matrix calculus.

## Differentiation as an operator

`d/dx` can be thought of as an operator that maps a function `f(x)` of a single input variable to its derivative `f'(x)`. Treating differentiation as an operator makes derivatives more intuitive and easier to manipulate.

## Vector Calculus
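
One way to see the operator view in code is a higher-order function that takes `f` and returns an approximation of `f'`. The sketch below uses a central finite difference rather than symbolic or automatic differentiation; the name `d_dx` and the step size `h` are illustrative choices and do not come from the paper.

```python
def d_dx(f, h=1e-5):
    """Map a single-variable function f to a numerical approximation of f'."""
    def f_prime(x):
        # central difference: (f(x + h) - f(x - h)) / (2h)
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime

def square(x):
    return x ** 2

d_square = d_dx(square)   # a new function, approximately 2x
print(d_square(3.0))      # close to 6.0
```

The operator takes a function in and hands a function back, which is the same shape as `d/dx` acting on `f(x)`.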

### Partial Derivatives

A function `f(x,y)` of multiple parameters can be differentiated with the help of partial derivatives.  
A partial derivative with respect to `x` gives the rate of change of `f(x,y)` w.r.t. `x` while **keeping** `y` **constant**.  
**NOTE:** Partial derivatives are commonly called partials.

In vector calculus, all the partials of a function are collected into a vector. This vector is called the **gradient**.
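
As a worked example, take the function `f(x,y) = 3x²y` that the code in this section uses. Differentiating w.r.t. one variable while treating the other as a constant gives the two partials, and collecting them gives the gradient:

$$
\frac{\partial f}{\partial x} = 6xy, \qquad
\frac{\partial f}{\partial y} = 3x^2, \qquad
\nabla f(x, y) = \left[\, 6xy,\; 3x^2 \,\right]
$$

At `x = 1, y = 1` this evaluates to `[6, 3]`, which is the value autograd should report for that point.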


In [None]:
# Gradient of a multivariate function
# torch tensors support autograd
from torch import tensor

# initial values
x, y = tensor(1.0, requires_grad=True), tensor(1.0, requires_grad=True)

# f(x, y) = 3 * x^2 * y
def f(x, y):
    return 3*x.pow(2)*y

def gradient(f, x, y):
    if x.grad is not None: x.grad.zero_()
    if y.grad is not None: y.grad.zero_()
    z = f(x, y)
    z.backward() # finds gradient
    # extract all scalars -> put in list -> make tensor
    return tensor([x.grad.item(), y.grad.item()])

gradient(f, x, y)

In [None]:
# A more general version of gradient that accepts any number of arguments
def gradient(f, *args):
    for arg in args: 
        if arg.grad is not None: arg.grad.zero_()
    z = f(*args)
    z.backward()
    # extract all scalars -> put in list -> make tensor
    return tensor([arg.grad.item() for arg in args])

gradient(f, x, y)

In [None]:
x

## Matrix Calculus

Calculating and managing derivatives for multiple functions with multiple parameters moves us from vector calculus to matrix calculus.

For any function `f` with args `X=[x1, x2, x3]`, we can think of the arguments as a vector (or a 1D tensor in code). The convention here is to pass a row vector (a one-dimensional tensor) as the input to every function.  
**NOTE:** In the paper, input vectors like `X` are column vectors. We implement them as row vectors here.  

Finding the partials of `f` w.r.t. `X` results in a gradient vector (here, also a row vector). It therefore makes sense to change the `gradient` function to take a single tensor as input instead of `args`.

In [None]:
for elem in tensor([1.,2.], requires_grad=True):
    print(elem)

In [None]:
""" 
Take tensor as arg
f takes tensor now as argument
"""
def f(inp_t):
    assert inp_t.shape[0] == 2
    return 3*inp_t[0].pow(2)*inp_t[1]

def gradient(f, ip_tensor):
    if ip_tensor.grad is not None: ip_tensor.grad.zero_()
    z = f(ip_tensor)
    z.backward()
    return ip_tensor.grad

In [None]:
x, y = 2., 3.

gradient(f, tensor([x, y], requires_grad=True))

The gradient of a scalar-valued function w.r.t. a vector has the same size as that vector.

## Matrix Calculus

### Jacobian Matrix

For a set of functions `F = [f1, f2...fn]`, we can treat the functions themselves as a vector.  
Applying `F` to `X = [x1, x2...xn]`, we get:  
`Y = F(X)`  
`Y` has the same shape as `F`, which is a row vector.  
The gradient of `Y` w.r.t. `X` is `[grad(f1), grad(f2)...grad(fn)]`, with each gradient taken w.r.t. `X`.  
Stacking these gradients gives a matrix called the **Jacobian matrix**. The paper uses the numerator layout, where each row is the gradient of one function; we will use the denominator layout, which is its transpose.
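
To make the two layouts concrete, here is a minimal sketch that approximates a Jacobian with central finite differences instead of autograd, so it runs without any library. The helper names `jacobian_numerator` and `transpose`, the step size `h`, and the second function `f2` are all hypothetical choices for illustration; only `f1 = 3 * x1^2 * x2` mirrors the function used earlier.

```python
def jacobian_numerator(F, X, h=1e-6):
    """Approximate the Jacobian of F at X; row i holds the partials of f_i."""
    rows = []
    for f in F:
        row = []
        for j in range(len(X)):
            # nudge only x_j, keeping every other input constant
            up = X[:j] + [X[j] + h] + X[j + 1:]
            down = X[:j] + [X[j] - h] + X[j + 1:]
            row.append((f(up) - f(down)) / (2 * h))
        rows.append(row)
    return rows

def transpose(M):
    return [list(col) for col in zip(*M)]

# two illustrative functions of X = [x1, x2]
F = [lambda X: 3 * X[0] ** 2 * X[1],   # f1 = 3 * x1^2 * x2
     lambda X: 2 * X[0] + X[1] ** 3]   # f2 = 2 * x1 + x2^3

X = [2.0, 3.0]
J_num = jacobian_numerator(F, X)   # rows are grad(f1), grad(f2)
J_den = transpose(J_num)           # columns are grad(f1), grad(f2)
print(J_num)
print(J_den)
```

By hand, `grad(f1) = [6*x1*x2, 3*x1^2] = [36, 12]` and `grad(f2) = [2, 3*x2^2] = [2, 27]` at `X = [2, 3]`, so the numerator layout is approximately `[[36, 12], [2, 27]]` and the denominator layout is approximately `[[36, 2], [12, 27]]`.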