#### Libs and configs

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from micrograd.tensor import Value

#### Setting up gradient computation

In this demo, we simulate how to compute the gradient of a function $f$ using **automatic differentiation** (a.k.a. auto-diff), a set of techniques for evaluating a function's partial derivatives.

In a few words, auto-diff exploits the fact that every computer calculation executes a sequence of **elementary arithmetic operations** (addition, subtraction, multiplication, division, etc.) and **elementary functions** (exp, log, sin, cos, etc.). All numeric computation is built from these operations, and since we know their derivatives, we can chain them together (by applying the **chain rule** $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdotp \frac{\partial y}{\partial x}$) to arrive at the derivative of the entire function.
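As a quick illustration of the chain rule outside of the library, here is a minimal sketch in plain Python. The functions `y_of_x` and `z_of_y` are hypothetical examples chosen for this note (they are not part of `micrograd`), and the result is compared against a central finite difference.

```python
import math

def y_of_x(x):
    # Inner function: y = x^2
    return x ** 2

def z_of_y(y):
    # Outer function: z = sin(y)
    return math.sin(y)

x = 1.5
y = y_of_x(x)

# Local derivatives of each elementary step
dz_dy = math.cos(y)  # d/dy sin(y)
dy_dx = 2 * x        # d/dx x^2

# Chain rule: dz/dx = dz/dy * dy/dx
dz_dx_chain = dz_dy * dy_dx

# Numerical estimate for comparison (central difference)
h = 1e-6
dz_dx_numeric = (z_of_y(y_of_x(x + h)) - z_of_y(y_of_x(x - h))) / (2 * h)

print(f'chain rule: {dz_dx_chain}')
print(f'numeric:    {dz_dx_numeric}')
```

The two values should agree to several decimal places, which is the property auto-diff relies on when it multiplies local derivatives along the graph.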

The function we are simulating here is a simple neuron. The neuron has the form $$f\left(\sum_{i} w_i x_i + b\right)$$ where $f$ is an activation function (*tanh* for this demo), $x_i$ are the inputs, $w_i$ the weights, and $b$ the bias of the neuron.
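To make the formula concrete, the sketch below evaluates the same neuron with plain Python floats, with the bias added once to the weighted sum. The helper `neuron_output` is a hypothetical name introduced here for illustration; the numbers are the ones used in this demo.

```python
import math

def neuron_output(inputs, weights, bias):
    """Return tanh of the weighted sum of the inputs plus a single bias term."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(weighted_sum + bias)

out = neuron_output(inputs=[2.0, 0.0], weights=[-3.0, 1.0], bias=6.881373)
print(f'out: {out}')
```

This only computes the forward value; it keeps no graph, so it cannot produce gradients the way `Value` does.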

Let's define these variables:

In [4]:
# Inputs
x1 = Value(data=2.0)
x2 = Value(data=0.0)

# Weights
w1 = Value(data=-3.0)
w2 = Value(data=1.0)

# Bias
b = Value(data=6.881373)

# Inside sum
x1w1 = x1 * w1
x2w2 = x2 * w2

x1w1x2w2 = x1w1 + x2w2

# Adding with bias
n = x1w1x2w2 + b
print(f'n: {n}')

# Tanh
o = n.tanh()
print(f'o: {o}')

n: Value(data=0.881373, grad=0.0)
o: Value(data=0.7071064876766542, grad=0.0)


#### Calculating derivatives manually

First, we'll calculate the derivatives of our function $f$ manually. After that, we'll implement the backpropagation algorithm.

The derivative of $o$ with respect to itself, $\frac{\partial o}{\partial o}$, is `1.0`; this is the base case.

In [5]:
o.grad = 1.0

print(f'o: {o}')

o: Value(data=0.7071064876766542, grad=1.0)


The variable $o$ is connected to $n$ in our graph through $o = \tanh(n)$. So, the derivative of $o$ with respect to $n$ $(\frac{\partial o}{\partial n})$ is $1 - \tanh(n)^2$.

In [6]:
n.grad = 1 - o.data ** 2  # about 0.5, since o.data equals tanh(n.data)

print(f'n: {n}')

n: Value(data=0.881373, grad=0.5000004150855857)
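As a sanity check on the $1 - \tanh(n)^2$ formula, the following sketch compares it with a central finite difference using only the standard library. The variable names are illustrative and independent of the `Value` objects above.

```python
import math

n_value = 0.881373  # same pre-activation value as n.data in this demo

# Analytic local derivative of tanh
analytic = 1 - math.tanh(n_value) ** 2

# Central finite difference approximation
h = 1e-6
numeric = (math.tanh(n_value + h) - math.tanh(n_value - h)) / (2 * h)

print(f'analytic: {analytic}')
print(f'numeric:  {numeric}')
```

Both values should be close to the `0.5` stored in `n.grad`.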


To calculate the derivative of $o$ with respect to the sum node $x_1w_1x_2w_2$ (which holds $x_1w_1 + x_2w_2$), written $\frac{\partial o}{\partial x_1w_1x_2w_2}$, it is necessary to use the **chain rule**:

$$\frac{\partial o}{\partial x_1w_1x_2w_2} = \frac{\partial o}{\partial n} \cdotp \frac{\partial n}{\partial x_1w_1x_2w_2}$$

We know that $\frac{\partial n}{\partial x_1w_1x_2w_2}$ is equal to `1.0`, because $n = x_1w_1x_2w_2 + b$. So $\frac{\partial o}{\partial x_1w_1x_2w_2}$ is the same as $\frac{\partial o}{\partial n}$: an **addition** simply passes the incoming gradient on to each of its operands unchanged.

The same applies to $\frac{\partial o}{\partial b}$, $\frac{\partial o}{\partial x_2w_2}$, and $\frac{\partial o}{\partial x_1w_1}$, since each of these nodes also feeds into an addition.

In [7]:
x1w1x2w2.grad = n.grad
b.grad = n.grad
x2w2.grad = n.grad
x1w1.grad = n.grad

print(f'x1w1x2w2: {x1w1x2w2}')
print(f'b: {b}')
print(f'x2w2: {x2w2}')
print(f'x1w1: {x1w1}')

x1w1x2w2: Value(data=-6.0, grad=0.5000004150855857)
b: Value(data=6.881373, grad=0.5000004150855857)
x2w2: Value(data=0.0, grad=0.5000004150855857)
x1w1: Value(data=-6.0, grad=0.5000004150855857)


In a multiplication (such as the node $x_1w_1$, which is the product of the variables $x_1$ and $w_1$), the local derivative with respect to one variable is equal to the other variable's data. For example, $\frac{\partial x_1w_1}{\partial x_1}$ is equal to $w_1$'s value, and $\frac{\partial x_1w_1}{\partial w_1}$ is equal to $x_1$'s value.

By applying the chain rule, we have:

In [8]:
x1.grad = w1.data * x1w1.grad
w1.grad = x1.data * x1w1.grad

x2.grad = w2.data * x2w2.grad
w2.grad = x2.data * x2w2.grad

print(f'x1: {x1}')
print(f'w1: {w1}')
print(f'x2: {x2}')
print(f'w2: {w2}')

x1: Value(data=2.0, grad=-1.500001245256757)
w1: Value(data=-3.0, grad=1.0000008301711714)
x2: Value(data=0.0, grad=0.5000004150855857)
w2: Value(data=1.0, grad=0.0)
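The manually derived gradients can be cross-checked numerically. The sketch below is an assumption-level check written in plain Python (it does not use `Value`); `neuron` and `numeric_grad` are hypothetical helper names. It perturbs each input of the whole neuron by a small step and estimates the derivative with a central difference.

```python
import math

def neuron(x1, w1, x2, w2, b):
    # Same computation as the graph above: tanh(x1*w1 + x2*w2 + b)
    return math.tanh(x1 * w1 + x2 * w2 + b)

params = {'x1': 2.0, 'w1': -3.0, 'x2': 0.0, 'w2': 1.0, 'b': 6.881373}

def numeric_grad(name, h=1e-6):
    """Central-difference estimate of d(neuron)/d(params[name])."""
    up = dict(params)
    up[name] += h
    down = dict(params)
    down[name] -= h
    return (neuron(**up) - neuron(**down)) / (2 * h)

grads = {name: numeric_grad(name) for name in params}

for name, grad in grads.items():
    print(f'{name}: {grad}')
```

Each estimate should match the corresponding `grad` printed above to roughly six decimal places. Finite differences need two forward passes per variable, which is why backpropagation is preferred once the graph grows.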
