# Defining our first primitives

If you haven't already, take a look at the `semiautograd.semiautograd` module.  It contains the fundamental building blocks for automatic differentiation:
 * Scalar - containers for computational results that chain together to form the computational graph
 * Function - differentiable functions and the chain rule
 * backward - execute backpropagation to compute the derivatives

The other important piece of an automatic differentiation system is a set of primitives.  Primitives are differentiable operations, and in `semiautograd` these are instances of `Function`.  Let's start by defining some simple primitives and looking at what happens when we chain them together.  The primitives defined here are also available from the `primatives` module.

In [1]:
from typing import Union, List, Any
from semiautograd import Scalar, Function, trace, backward, reset_grad
pi = 3.14159265359

Our first primitive just multiplies the input by a scalar, 2.
The forward function is `lambda x: 2*x`.  Because the derivative of 2x w.r.t. x is 2, the backward function just returns `[2]`.  Notice that forward returns a number, but backward returns a list of numbers: one derivative w.r.t. each of the input arguments.

In [2]:
Double = Function("Double", lambda x: 2*x, lambda x: [2])

x = Scalar(3.14)
y = Double(x)
z = Double(y)

display([x,y,z])

[3.14, 6.28 = Double(3.14), 12.56 = Double(6.28)]

Here are the guts of a Scalar.  It has the following components:
 * value - the actual numerical value
 * function - the function used to compute this scalar; this lets semiautograd know where to find the backward function
 * parents - the Scalars that were passed to the function as differentiable arguments
 * kwargs - non-differentiated keyword arguments to the function
 * grad - a placeholder for the derivative
 * time - a naughty hack that makes topologically sorting the computation graph easy to do

An important note of convention here - semiautograd uses `*args` to pass differentiable arguments and `**kwargs` to pass non-differentiable arguments.  There's no special reason for that, it's just a convention.
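
The convention can be illustrated without semiautograd at all.  The sketch below is a stand-in, not the library's code: the helper `apply` is a hypothetical name introduced only for this example.  Positional arguments are the ones that receive derivatives, and keyword arguments are forwarded to both functions untouched.

```python
def apply(forward, backward, *args, **kwargs):
    """Hypothetical helper: evaluate a primitive and its local derivatives.

    *args    -- differentiable inputs; backward returns one derivative per entry
    **kwargs -- non-differentiable settings, forwarded unchanged
    """
    value = forward(*args, **kwargs)
    grads = backward(*args, **kwargs)
    assert len(grads) == len(args), "one derivative per differentiable input"
    return value, grads

# x is differentiable (positional); the exponent p is not (keyword).
value, grads = apply(lambda x, p: x**p, lambda x, p: [p * x**(p - 1)], 3.0, p=2)
print(value, grads)  # 9.0 [6.0]
```

Because `p` is a keyword argument, no derivative w.r.t. `p` is ever requested, which is why the backward function returns a single-element list.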

In [3]:
display(z.__dict__)
print('')
print(f'{trace(z)=}')
print('backward(z)')
backward(z)
print(f'{trace(z)=}')

{'value': 12.56,
 'function': Double(),
 'parents': [6.28 = Double(3.14)],
 'kwargs': {},
 'grad': None,
 'time': 2}


trace(z)=[12.56 = Double(6.28), 6.28 = Double(3.14), 3.14]
backward(z)
trace(z)=[12.56 = Double(6.28) <grad=1>, 6.28 = Double(3.14) <grad=2>, 3.14 <grad=4>]


Ooooh, did you catch that computational graph?  `trace` returns a topological sort of the computational graph with parents to the right.  For backpropagation to correctly compute the derivative at a given Scalar, we first need the derivatives at each of its children, so semiautograd backprops through the graph in this order.

The line `backward(z)` is what actually executes backpropagation, and you can see in the trace that follows it that the derivatives (grads) are now populated.  For example, `3.14 <grad=4>` is the variable x we defined in cell 2, showing that the derivative of z w.r.t. x is 4: each `Double` contributes a factor of 2 through the chain rule.
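
To make the ordering argument concrete, here is a minimal sketch of backpropagation over a time-stamped graph.  It is an illustration under assumptions, not semiautograd's implementation: `Node`, `double`, `ancestors` and `backprop` are hypothetical names, and the creation counter only imitates the role the `time` field appears to play.

```python
import itertools

_clock = itertools.count()  # global creation counter, standing in for `time`

class Node:
    """Hypothetical, stripped-down stand-in for a Scalar (not the library class)."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = list(parents)          # inputs this node was computed from
        self.local_grads = list(local_grads)  # d(self)/d(parent), one per parent
        self.grad = 0.0
        self.time = next(_clock)              # later-created nodes get larger times

def double(node):
    return Node(2 * node.value, parents=[node], local_grads=[2])

def ancestors(node, seen=None):
    """Collect the node and everything it depends on."""
    seen = {} if seen is None else seen
    if id(node) not in seen:
        seen[id(node)] = node
        for p in node.parents:
            ancestors(p, seen)
    return list(seen.values())

def backprop(output):
    # A node is always created after its parents, so sorting by creation
    # time, newest first, visits every node after all of its children.
    order = sorted(ancestors(output), key=lambda n: n.time, reverse=True)
    output.grad = 1.0
    for node in order:
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local  # chain rule, summed over paths
    return order

x = Node(3.14)
z = double(double(x))
backprop(z)
print(x.grad)  # 4.0
```

By the time a node's `grad` is pushed to its parents, every child has already contributed to it, which is exactly the property the topological order guarantees.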

Let's define a couple of other useful primitives and test them out with an arbitrary computation.  Make sure you understand why each of the definitions works.  As an aside, the Function class lets you call the function with either Scalar arguments or numbers.  If you pass in a Scalar then you get back a Scalar, and if you pass in a number then you get back a number.  Mixed Scalar and number arguments are not supported, though.

In [4]:
Pow   = Function("Pow",   lambda x,p: x**p,        lambda x, p: [p*(x**(p-1))])

Plus  = Function("Plus",  lambda x,y: x+y,         lambda x,y: [1,1])

Sum   = Function("Sum",   lambda *args: sum(args), lambda *args: [1]*len(args))

Times = Function("Times", lambda x,y: x*y,         lambda x,y: [y, x])

Mod   = Function("Mod",   lambda x,m: x % m,       lambda x, m: [1])

Abs   = Function("Abs",   lambda x: abs(x),        lambda x: [-1 if x<0 else 1])


v = Scalar(3)     # 3
w = Double(v)     # 2*3
x = Times(w,v)    # 3*2*3
y = Pow(x,p=2)    # (3*2*3)**2
z = Sum(v,w,x,y)  # 3 + 2*3 + 3*2*3 + (3*2*3)**2 = 4 * v**4 + 2 * v**2 + 3*v
print(trace(z))
print(f'{3 + 2*3 + 3*2*3 + (3*2*3)**2=}')
backward(z)
print(f'{v.grad=}, {16 * (3**3) + 4*3 + 3=}')     #16 * v**3 + 4*v + 3 = 16 * 3**3 + 4*3 + 3

[351 = Sum(3,6,18,324), 324 = Pow(18,p=2), 18 = Times(6,3), 6 = Double(3), 3]
3 + 2*3 + 3*2*3 + (3*2*3)**2=351
v.grad=447, 16 * (3**3) + 4*3 + 3=447
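
One way to convince yourself that each backward function is right is to compare it against a finite-difference estimate.  The sketch below uses plain Python numbers and copies of the lambdas from the cell above; `numeric_grads` is a hypothetical helper written for this check and is not part of semiautograd.

```python
def numeric_grads(f, args, h=1e-6):
    """Central-difference estimate of df/d(arg) for each positional argument."""
    grads = []
    for i in range(len(args)):
        hi = list(args)
        lo = list(args)
        hi[i] += h
        lo[i] -= h
        grads.append((f(*hi) - f(*lo)) / (2 * h))
    return grads

# name -> (forward, backward, sample positional args, keyword args)
cases = {
    "Plus":  (lambda x, y: x + y, lambda x, y: [1, 1], (1.5, -2.0), {}),
    "Times": (lambda x, y: x * y, lambda x, y: [y, x], (1.5, -2.0), {}),
    "Pow":   (lambda x, p: x**p, lambda x, p: [p * (x**(p - 1))], (1.5,), {"p": 3}),
    "Mod":   (lambda x, m: x % m, lambda x, m: [1], (7.5,), {"m": 2.0}),
    "Abs":   (lambda x: abs(x), lambda x: [-1 if x < 0 else 1], (-0.7,), {}),
}

results = {}
for name, (fwd, bwd, args, kwargs) in cases.items():
    analytic = bwd(*args, **kwargs)
    numeric = numeric_grads(lambda *a: fwd(*a, **kwargs), args)
    results[name] = all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric))
    print(f"{name}: analytic={analytic}, matches finite differences: {results[name]}")

# The chained computation from the cell above, written as one plain function of v.
f = lambda v: v + 2*v + (2*v)*v + ((2*v)*v)**2
print(numeric_grads(f, (3.0,)))  # approximately [447.0]
```

The sample points are deliberately chosen away from the kinks of `Abs` (at 0) and `Mod` (at multiples of `m`), where the derivative is not defined and a finite-difference check would be misleading.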


Now is a good time to think about using primitives versus creating primitives.  The next cell implements cosine and sine by computing their truncated Taylor series with the primitives we created above.  This is an example of _using_ primitives to get some functionality.

In [5]:
def trigtaylor(x: Union[Scalar, float], t: Union[Scalar,float], ii: int):
    ''' Compute the truncated Taylor series for cos or sin

    Arguments:
        x -- The point at which to evaluate the series
        t -- The leading term in the Taylor series
        ii -- The order of the leading term
    '''
    
    notscalar = not isinstance(x,Scalar)
    if notscalar:
        x = Scalar(x)
    t = t if isinstance(t,Scalar) else Scalar(t)
    eps = Scalar(1e-7)
    x = Times(Mod(Abs(x),m=2*pi), Scalar(-1 if x<0 else 1))
    xsqr = Pow(x,p=2)
    r = Scalar(0)
    while Abs(t) > eps:
        ii += 2
        r = Plus(r,t)
        # t * -1 * x**2 / (ii * (ii-1))
        t = Times(t, xsqr)
        t = Times(t, Scalar(-1/(ii*(ii-1)))) 
    r = Plus(r,t)
    if notscalar:
        r = r.value
    return r

def cos(x: Union[Scalar,float]):
    '''Taylor series approximation to cos(x)
    1 - x^2 / 2! + x^4 / 4! - x^6 / 6! + ...
    '''
    return trigtaylor(x, 1, 0)

def sin(x: Union[Scalar,float]):
    '''Taylor series approximation to sin(x)
    x - x^3 / 3! + x^5 / 5! - x^7 / 7! + ...
    '''
    return trigtaylor(x, x, 1) 

sin(1),cos(1),sin(Scalar(1)), cos(Scalar(1))

(0.841470984648068,
 0.5403023058795627,
 0.841470984648068 = Plus(0.8414710097001764,-2.505210838544171e-08),
 0.5403023058795627 = Plus(0.540302303791887,2.0876756987868096e-09))
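
It can help to see the same recurrence with plain floats, stripped of the graph bookkeeping.  The sketch below mirrors the loop in `trigtaylor` for the cosine case only; `cos_taylor` is a hypothetical name, and it uses `math.pi` rather than the notebook's hand-typed `pi`, so the last digits may differ slightly from the outputs above.

```python
import math

def cos_taylor(x, eps=1e-7):
    """Sum 1 - x^2/2! + x^4/4! - ... until a term drops below eps."""
    x = math.copysign(abs(x) % (2 * math.pi), x)  # range reduction, sign kept
    term, total, k = 1.0, 0.0, 0
    while abs(term) > eps:
        total += term
        k += 2
        term *= -x * x / (k * (k - 1))  # next term from the previous one
    return total + term  # also add the first term below eps, as trigtaylor does

for x in (0.0, 1.0, -1.0, 10.0):
    print(x, cos_taylor(x), math.cos(x))
```

Each term is obtained from the previous one with one multiplication by `-x**2 / (k*(k-1))`, which is why the Scalar version adds two `Times` nodes and one `Plus` node to the graph per term.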

The computational graph for computing `cos(0)` already has 13 Scalars in it.

In [6]:
x = Scalar(0)
y = cos(x)
backward(y)
print(f'{x=},  {y=}\n\n{trace(y)=}')

x=0 <grad=0.0>,  y=1.0 = Plus(1,-0.0) <grad=1>

trace(y)=[1.0 = Plus(1,-0.0) <grad=1>, -0.0 = Times(0.0,-0.5) <grad=1>, -0.5 <grad=0.0>, 0.0 = Times(1,0.0) <grad=-0.5>, 1 = Plus(0,1) <grad=1>, 0 <grad=1>, 0.0 = Pow(0.0,p=2) <grad=-0.5>, 0.0 = Times(0.0,1) <grad=0.0>, 1 <grad=0.0>, 0.0 = Mod(0,m=6.28318530718) <grad=0.0>, 0 = Abs(0) <grad=0.0>, 1 <grad=1.0>, 0 <grad=0.0>]


Take a look at the graph for `cos(-1)`!

In [7]:
x = Scalar(-1)
y = cos(x)
backward(y)
print(f'{x=},  {y=}\n\n{trace(y)=}')

x=-1 <grad=0.8414709846480679>,  y=0.5403023058795627 = Plus(0.540302303791887,2.0876756987868096e-09) <grad=1>

trace(y)=[0.5403023058795627 = Plus(0.540302303791887,2.0876756987868096e-09) <grad=1>, 2.0876756987868096e-09 = Times(-2.755731922398589e-07,-0.007575757575757576) <grad=1>, -0.007575757575757576 <grad=-2.755731922398589e-07>, -2.755731922398589e-07 = Times(-2.755731922398589e-07,1.0) <grad=-0.007575757575757576>, 0.540302303791887 = Plus(0.5403025793650793,-2.755731922398589e-07) <grad=1>, -2.755731922398589e-07 = Times(2.4801587301587298e-05,-0.011111111111111112) <grad=0.9924242424242424>, -0.011111111111111112 <grad=2.4613696488696486e-05>, 2.4801587301587298e-05 = Times(2.4801587301587298e-05,1.0) <grad=-0.011026936026936027>, 0.5403025793650793 = Plus(0.5402777777777777,2.4801587301587298e-05) <grad=1>, 2.4801587301587298e-05 = Times(-0.0013888888888888887,-0.017857142857142856) <grad=0.988973063973064>, -0.017857142857142856 <grad=-0.0013735736999625887>, -0.00138888

Another way to get the cosine functionality is to create a new primitive.  In this case, we've used the exact same forward computation, but since we're able to differentiate cosine directly, we can use the closed form of the derivative, -sin(x), for the backward function.

In [8]:
Cos = Function("Cos", lambda x: cos(x), lambda x: [-sin(x)])

x = Scalar(-1)
y = Cos(x)
backward(y)

print(f'{x=},  {y=}\n\n{trace(y)=}')

x=-1 <grad=0.841470984648068>,  y=0.5403023058795627 = Cos(-1) <grad=1>

trace(y)=[0.5403023058795627 = Cos(-1) <grad=1>, -1 <grad=0.841470984648068>]
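
Both approaches report a gradient of about 0.84147 at x = -1.  One way to see why is to differentiate the cosine series term by term: the result is the negated sine series.  The sketch below checks this with plain floats; `dcos_taylor` and its `n_terms` parameter are hypothetical and exist only for this illustration.

```python
import math

def dcos_taylor(x, n_terms=12):
    """Derivative of the truncated cosine series, taken term by term.

    d/dx [(-1)^k x^(2k) / (2k)!] = (-1)^k x^(2k-1) / (2k-1)!   for k >= 1
    """
    total = 0.0
    for k in range(1, n_terms):
        total += (-1) ** k * x ** (2 * k - 1) / math.factorial(2 * k - 1)
    return total

x = -1.0
print(dcos_taylor(x), -math.sin(x))  # both close to 0.84147
```

Backpropagating through the Taylor graph effectively computes this sum node by node, whereas the `Cos` primitive gets the same number from a single call to its backward function.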


### We've created two different differentiable implementations of cosine.  Which is better?  Why?

When using an automatic differentiation library, we're almost always working with existing primitives instead of defining new ones.  When you define a new primitive you have to write down its derivative, and not writing down derivatives is kind of what automatic differentiation is all about.