# Introduction
In this first lesson, Andrej Karpathy gives a step-by-step explanation of backpropagation and neural network training by building the **micrograd** engine. The notebooks should be followed in numbered order.

Link to lesson: https://www.youtube.com/watch?v=VMj-3S1tku0

# What is Micrograd?
Micrograd is an engine that automatically evaluates the gradient of a loss function with respect to the weights of a neural network.

## Installation

In [1]:
!pip3 install micrograd

Collecting micrograd
  Downloading micrograd-0.1.0-py3-none-any.whl (4.9 kB)
Installing collected packages: micrograd
Successfully installed micrograd-0.1.0


### Math Examples

In [22]:
from micrograd.engine import Value

In [44]:
# Creating two Values
a = Value(-4.0)
b = Value(2.0)

In [45]:
# Doing common math operations
c = a + b; print(c)
d = a * b + b ** 3; print(d)
c += c + 1; print(c)
c += 1 + c + (-a); print(c)

Value(data=-2.0, grad=0)
Value(data=0.0, grad=0)
Value(data=-3.0, grad=0)
Value(data=-1.0, grad=0)


In [46]:
# Using relu to squash negative values to zero
d += d * 2 + (b + a).relu(); print(d)
d += 3 * d + (b - a).relu(); print(d)

Value(data=0.0, grad=0)
Value(data=6.0, grad=0)
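
For reference, relu returns its input when the input is positive and zero otherwise. The plain-Python sketch below reproduces the two relu terms from the cell above using ordinary floats. The helper `relu` is a hypothetical stand-in written for this illustration, not micrograd's own method.

```python
def relu(x):
    """Return x if it is positive, otherwise 0.0 (hypothetical float helper)."""
    return x if x > 0 else 0.0

a, b = -4.0, 2.0
print(relu(b + a))  # b + a = -2.0 is negative, so the result is 0.0
print(relu(b - a))  # b - a = 6.0 is positive, so the result is 6.0
```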


In [47]:
# Doing some more math operations
e = c - d; print(e)
f = e ** 2; print(f)
g = f / 2.0; print(g)
g += 10.0 / f; print(g)

Value(data=-7.0, grad=0)
Value(data=49.0, grad=0)
Value(data=24.5, grad=0)
Value(data=24.70408163265306, grad=0)


### Backpropagation
While **micrograd** can evaluate simple math expressions, that is not the most interesting part of the engine. As each expression is evaluated, **micrograd** records how the calculations relate to each other, e.g. that c was calculated by adding a and b. Strictly speaking this record is a directed acyclic graph rather than a tree, since a value such as b can feed into several calculations. This means that for each calculation we can get not only the output of the forward pass (the printed values) but also the result of the backward pass, which is demonstrated next.

The backward pass is calculated using the chain rule, which tells us how to find the derivative of a composite function, e.g. F(x) = f(g(x)). The rule says to take the derivative of the outer function, evaluated at the inner function, and multiply it by the derivative of the inner function: F'(x) = f'(g(x)) * g'(x). The rule is applied recursively, one calculation at a time, from the output back towards the inputs, following the hierarchical structure that **micrograd** recorded.
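
To make the recursion concrete, here is a minimal sketch of the idea. The class `TinyValue` is a hypothetical, stripped-down stand-in written for this notebook, supporting only `+` and `*`. It is not micrograd's actual source, although it follows the same general approach: every result remembers its inputs and a small function that applies the chain rule for that one operation.

```python
class TinyValue:
    """Hypothetical stand-in for a scalar autograd value (add and mul only)."""

    def __init__(self, data, parents=(), op=""):
        self.data = data
        self.grad = 0.0
        self.parents = parents            # the values this one was computed from
        self.op = op                      # the operation that produced it
        self._backward = lambda: None     # leaf values have nothing to propagate

    def __add__(self, other):
        out = TinyValue(self.data + other.data, (self, other), "+")

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = TinyValue(self.data * other.data, (self, other), "*")

        def _backward():
            # d(out)/d(self) = other.data and d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so that every value comes after its inputs
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                   # d(output)/d(output) = 1
        for node in reversed(order):      # walk from the output back to the inputs
            node._backward()


a = TinyValue(-4.0)
b = TinyValue(2.0)
c = a + b          # c remembers that it came from a and b via "+"
out = c * b        # out = (a + b) * b = a*b + b**2
out.backward()
print(out.data)    # -4.0
print(a.grad)      # d(out)/da = b = 2.0
print(b.grad)      # d(out)/db = a + 2*b = 0.0
```

Note that the gradients are accumulated with `+=`. This matters because b is used twice, so its gradient is the sum of the contributions from both paths through the graph.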

In [49]:
# The output of the forward pass for g
print(g)

Value(data=24.70408163265306, grad=0)


In [50]:
# Calculating all gradients (backpropagation)
g.backward()

In [55]:
# Getting all the gradients
print(a)
print(b)
print(c)
print(d)
print(f)
print(g)

Value(data=-4.0, grad=138.83381924198252)
Value(data=2.0, grad=645.5772594752186)
Value(data=-1.0, grad=-6.941690962099126)
Value(data=6.0, grad=6.941690962099126)
Value(data=49.0, grad=0.4958350687213661)
Value(data=24.70408163265306, grad=1)


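These gradients tell us how sensitive the output g is to each value in the expression. For example, the gradient of a is about 138.83, so increasing a by a small amount h increases g by roughly 138.83 * h, and decreasing a by h lowers g by about the same amount. The gradient of b is about 645.58, so g reacts more strongly to changes in b than to changes in a.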
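
One way to sanity-check this interpretation is to nudge a and b by a small amount and measure how much g changes. The sketch below recomputes the same expression with plain Python floats, so it does not depend on micrograd. The names `relu`, `g_of` and the step size `h` are assumptions made for this illustration.

```python
def relu(x):
    """Return x if it is positive, otherwise 0.0 (hypothetical float helper)."""
    return x if x > 0 else 0.0


def g_of(a, b):
    """Recompute g from the cells above using plain floats."""
    c = a + b
    d = a * b + b ** 3
    c += c + 1
    c += 1 + c + (-a)
    d += d * 2 + relu(b + a)
    d += 3 * d + relu(b - a)
    e = c - d
    f = e ** 2
    g = f / 2.0
    g += 10.0 / f
    return g


h = 1e-6  # small nudge; an assumed step size for this illustration
a, b = -4.0, 2.0

# Central differences: change in g divided by the change in the input
dg_da = (g_of(a + h, b) - g_of(a - h, b)) / (2 * h)
dg_db = (g_of(a, b + h) - g_of(a, b - h)) / (2 * h)

print(dg_da)  # close to a.grad, about 138.83
print(dg_db)  # close to b.grad, about 645.58
```

The numerical estimates agree with the gradients from `g.backward()` to several decimal places. Backpropagation gives the same answer without re-evaluating the expression once per input.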