#### Libs and configs

In [1]:
%load_ext autoreload
%autoreload 2

In [30]:
from micrograd.tensor import Value

#### Setting up gradient computation

In this demo, we simulate how to compute the gradient of a function using **automatic differentiation** (a.k.a. auto-diff), a set of techniques for evaluating a function's partial derivatives.

In a few words, auto-diff exploits the fact that every computer calculation executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the **chain rule** $(\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x})$ repeatedly to these operations, partial derivatives can be computed automatically.
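As a small illustration of the chain rule, the sketch below differentiates $z = \sin(y)$ with $y = x^2$ by multiplying the two local derivatives, then compares the result with a central finite difference. The functions and the step size `h` are arbitrary choices for this illustration and are not part of the demo.

```python
import math

x = 1.5
y = x ** 2          # inner function: y = x^2
z = math.sin(y)     # outer function: z = sin(y)

# Local derivatives of each elementary step
dz_dy = math.cos(y)  # d(sin y)/dy
dy_dx = 2 * x        # d(x^2)/dx

# Chain rule: dz/dx = dz/dy * dy/dx
dz_dx = dz_dy * dy_dx

# Numerical check with a central finite difference (h is an arbitrary small step)
h = 1e-6
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)

print(f'chain rule: {dz_dx:.6f}, finite difference: {numeric:.6f}')
```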

The function we are simulating here is a simple neuron. It has the form $$f\left(\sum_{i} w_i x_i + b\right)$$ where $f$ is an activation function (*tanh* for this demo), $x_i$ are the inputs, $w_i$ the weights, and $b$ the bias of the neuron. Note that the bias is added once, after the weighted sum.

In [31]:
# Inputs
x1 = Value(data=2.0)
x2 = Value(data=0.0)

# Weights
w1 = Value(data=-3.0)
w2 = Value(data=1.0)

# Bias
b = Value(data=6.881373)

# Inside sum
x1w1 = x1 * w1
x2w2 = x2 * w2

x1w1x2w2 = x1w1 + x2w2

n = x1w1x2w2 + b
print(f'n: {n}')

# Tanh
o = n.tanh()
print(f'o: {o}')

n: Value(data=0.881373, grad=0.0)
o: Value(data=0.7071064876766542, grad=0.0)
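The forward pass can be cross-checked without the `Value` class. The sketch below uses only the standard `math` module; `neuron_forward` is a hypothetical helper introduced for this check, not part of `micrograd`.

```python
import math

def neuron_forward(inputs, weights, bias):
    """Hypothetical helper: return the pre-activation n and tanh(n)."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return n, math.tanh(n)

# Same inputs, weights, and bias as in the cell above
n_check, o_check = neuron_forward([2.0, 0.0], [-3.0, 1.0], 6.881373)

print(f'n: {n_check:.6f}')
print(f'o: {o_check:.6f}')
```

Both values should agree with the `data` fields of `n` and `o` printed above.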


#### Calculating derivatives manually

The derivative of $o$ with respect to itself (the base case $\frac{\partial o}{\partial o}$) is 1.0.

In [42]:
o.grad = 1.0

print(f'o: {o}')

o: Value(data=0.7071064876766542, grad=1.0)


The derivative of $o$ with respect to $n$, $\frac{\partial o}{\partial n}$, is $1 - \tanh(n)^2$. Since $o = \tanh(n)$, this is the same as $1 - o^2$.

In [41]:
n.grad = 1 - o.data ** 2  # 1 - tanh(n)^2, approximately 0.5

print(f'n: {n}')

n: Value(data=0.881373, grad=0.5000004150855857)
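The analytic derivative $1 - \tanh(n)^2$ can be sanity-checked numerically. This is a minimal sketch using a central finite difference on `math.tanh`; the variable names and the step size `h` are assumptions made for this check.

```python
import math

n_val = 0.881373  # pre-activation value from the forward pass

# Analytic derivative of tanh at n_val
analytic = 1 - math.tanh(n_val) ** 2

# Central finite difference approximation (h is an arbitrary small step)
h = 1e-6
numeric = (math.tanh(n_val + h) - math.tanh(n_val - h)) / (2 * h)

print(f'analytic: {analytic:.6f}, finite difference: {numeric:.6f}')
```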


The derivative of $o$ with respect to `x1w1x2w2`, $\frac{\partial o}{\partial (x_1 w_1 + x_2 w_2)}$, is the same as $\frac{\partial o}{\partial n}$. The local derivative of an addition with respect to each of its operands is 1, so by the chain rule a plus node simply passes the incoming gradient through to its inputs. The same reasoning gives $\frac{\partial o}{\partial b}$, $\frac{\partial o}{\partial (x_2 w_2)}$, and $\frac{\partial o}{\partial (x_1 w_1)}$.

In [40]:
x1w1x2w2.grad = n.grad
b.grad = n.grad
x2w2.grad = n.grad
x1w1.grad = n.grad

print(f'x1w1x2w2: {x1w1x2w2}')
print(f'b: {b}')
print(f'x2w2: {x2w2}')
print(f'x1w1: {x1w1}')

x1w1x2w2: Value(data=-6.0, grad=0.5000004150855857)
b: Value(data=6.881373, grad=0.5000004150855857)
x2w2: Value(data=0.0, grad=0.5000004150855857)
x1w1: Value(data=-6.0, grad=0.5000004150855857)
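To confirm that addition only distributes the gradient, the sketch below perturbs each of the three summands in turn and measures the change in the output with a central finite difference. The helpers `output` and `central_diff` are hypothetical names introduced for this check and are independent of the `Value` class.

```python
import math

def output(x1w1, x2w2, b):
    """tanh of the sum of the three summands feeding the neuron."""
    return math.tanh(x1w1 + x2w2 + b)

def central_diff(f, args, i, h=1e-6):
    """Approximate the partial derivative of f with respect to args[i]."""
    up = list(args)
    up[i] += h
    down = list(args)
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

args = [-6.0, 0.0, 6.881373]  # x1w1, x2w2, b from the forward pass
grads = [central_diff(output, args, i) for i in range(3)]

# Every summand should receive the same gradient as n: 1 - tanh(n)^2
expected = 1 - output(*args) ** 2

for name, g in zip(['x1w1', 'x2w2', 'b'], grads):
    print(f'{name}: {g:.6f} (expected {expected:.6f})')
```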
