# Micrograd

Micrograd is a small library made by Andrej Karpathy. This is my own reimplementation.
I wanted a version of Micrograd with a ton of documentation after watching Andrej's amazing
video, which [you can view here](https://www.youtube.com/watch?v=VMj-3S1tku0).

The code is licensed under the MIT license and you can use it for whatever you want.

In [147]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import math  # needed by Tensor.tanh further down

# We have to create a data structure (The Engine)

We start by creating a new data structure that wraps a value, so our life is easier later on.

In [50]:
class Tensor:
    def __init__(self, value):
        # Create a constructor for the datastructure
        self.value = value

We are going to add a bit more: the ability to add a Tensor to another Tensor or to a plain constant.

In [52]:
class Tensor:
    def __init__(self, value):
        # Create a constructor for the datastructure
        self.value = value
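    def __add__(self, other):
        # self + other; wrap plain numbers so that Tensor + constant also works
        other = other if isinstance(other, Tensor) else Tensor(other)
        return Tensor(self.value + other.value)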

That seems like enough for now. Next, we should look at what a gradient is.

The gradient tells us how much the final result changes when we nudge a value by a small amount. In other words, it is the derivative of the result with respect to that value.

We need a way to store this information, so we add it to our class.

In [51]:
class Tensor:
    def __init__(self, value):
        # Create a constructor for the datastructure
        self.value = value
        self.gradient = 0.0

And now that we have the possibility to store a gradient, we should look back at what calculus tells us.

# Looking at you Calculus!

So we can look at the definition of a derivative:

$$f'(a) = \lim_{\Delta x \rightarrow 0} \frac{f(a+\Delta x)-f(a)}{\Delta x}$$

This did not tell me anything at first. So let's do an example.

For example, suppose we have a value $a = 3.0$ and a value $b = 0.5$.

To see how each of them affects a result, we add them. We are going to name the result of this operation $c$

$$c = a+b$$

And now we can get the derivative of $c$ with respect to $a$ and the derivative of $c$ with respect to $b$.

So we are going to approximate the _limit_. This might seem scary, but all the definition above tells us is to add a really small amount to the input and see how much the result changes.

So we are going to say: $\Delta x = 0.0001$

Our starting result is $c1 = 3.0 + 0.5 = 3.5$. If we increment the value of $a$ by $0.0001$, we get $c2 = 3.0001 + 0.5 = 3.5001$, and we can plug both values into
the derivative:

$$\frac{c2-c1}{\Delta x} = \frac{ 3.5001-3.5}{0.0001} = \frac{0.0001}{0.0001} = 1$$
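The same check can be wrapped in a small helper so it is easy to repeat for $b$ as well. This is only a sketch: `numerical_derivative` is a name made up for this example, and it uses the same step size of $0.0001$ as above.

```python
def numerical_derivative(f, x, h=0.0001):
    """Approximate the derivative of f at x with a small forward step h."""
    return (f(x + h) - f(x)) / h


a, b = 3.0, 0.5

# Derivative of c = a + b with respect to a (b is held fixed)
dc_da = numerical_derivative(lambda a_: a_ + b, a)

# Derivative of c = a + b with respect to b (a is held fixed)
dc_db = numerical_derivative(lambda b_: a + b_, b)

# Both results should be very close to 1, up to floating point noise
print(dc_da, dc_db)
```

For addition, nudging either input changes the result by the same amount, so both derivatives come out as (approximately) 1.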

Now we can do the same for other operations, such as multiplication. Note that in the cells below $b$ is $1.5$.

In [22]:
a = 3.0
b = 1.5
h = 0.0001

c1 = a*b
c1

4.5

We can estimate the slope with respect to $a$ by going through the same process we did with the addition.

In [23]:
a += h
b = 1.5

c2 = a*b
derivative = (c2-c1)/h
derivative

1.500000000005386

And for the derivative of $c$ with respect to $b$ we can see the following.

In [24]:
a = 3.0
b = 1.5
h = 0.0001

c1 = a*b
b += h
c2 = a*b

(c2-c1)/h

3.00000000000189

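We can see that the values of $a$ and $b$ swap in the derivatives: the derivative with respect to $a$ is the value of $b$ ($1.5$), and the derivative with respect to $b$ is the value of $a$ ($3.0$).
So we can modify our Tensor class to store the gradients of these operations.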
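The swap for multiplication can also be checked by hand from the definition of the derivative. Writing $h$ for the small step (the $\Delta x$ from before) and taking $c = a \cdot b$ with $b$ held fixed:

$$\frac{\partial c}{\partial a} = \lim_{h \rightarrow 0} \frac{(a+h)\,b - a\,b}{h} = \lim_{h \rightarrow 0} \frac{h\,b}{h} = b$$

And in the same way, with $a$ held fixed:

$$\frac{\partial c}{\partial b} = \lim_{h \rightarrow 0} \frac{a\,(b+h) - a\,b}{h} = \lim_{h \rightarrow 0} \frac{a\,h}{h} = a$$

With $a = 3.0$ and $b = 1.5$ this gives $1.5$ and $3.0$, which matches the numerical results up to floating point noise.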

# The Full Engine

In [192]:
class Tensor:
    """
        Stores a single value.
    """
    def __init__(self, value, _children=(), _operation=""):
        # Create a constructor for the datastructure
        self.value = value
        self.gradient = 0.0
        
        # Internal variables
        self._prev = set(_children)
        self._backward = lambda: None
        
        # Debugging
        self._operation = _operation
        
    def __add__(self, other):
        # self + other
        other = other if isinstance(other, Tensor) else Tensor(other)
        output = Tensor(self.value + other.value, (self, other), '+')
        
        def _backward():
            self.gradient += output.gradient
            other.gradient += output.gradient
        output._backward = _backward
        
        return output
    
    def __radd__(self, other):
        return self + other
    
    def __mul__(self, other):
        # self * other
        other = other if isinstance(other, Tensor) else Tensor(other)
        output = Tensor(self.value * other.value, (self, other), '*')
        
        def _backward():
            # chain rule: local derivative (the value of the other operand) times output.gradient
            self.gradient += other.value * output.gradient
            other.gradient += self.value * output.gradient
        output._backward = _backward
        
        return output
    
    def __rmul__(self, other):
        return self * other
    
    def __pow__(self, other):
        # self ** other
        assert isinstance(other, (int, float)), "only supporting int/float powers for now"
        # other is a plain int/float (see the assert), so only self is a child in the graph
        output = Tensor(self.value ** other, (self,), '^')
        
        def _backward():
            self.gradient += (other*self.value ** (other-1))*output.gradient
        
        output._backward = _backward
        
        return output
    
    def backward(self):
        # Topological order of the children in the graph
        # Using the dfs version
        topological_order = []
        visited = set()
        
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topological_order.append(v)
        build_topo(self)
        
        self.gradient = 1.0
        
        for v in reversed(topological_order):
            v._backward()
    
    def __neg__(self):
        return self*-1
    
    def __sub__(self, other):
        return self + (-other)
    
    def __rsub__(self, other):
        return other + (-self)
    
    def __truediv__(self, other):
        return self * other**-1
    
    def __rtruediv__(self, other):
        return other * self**-1
    
    def __repr__(self):
        # This function returns the value formatted when you print it in jupyterlab
        return f"Tensor=({self.value}, gradient={self.gradient})"
    
    # Extra
    def tanh(self):
        x = self.value
        t = (math.exp(2*x) - 1)/(math.exp(2*x) + 1)
        out = Tensor(t, (self, ), 'tanh')

        def _backward(): 
            self.gradient += (1 - t**2) * out.gradient
        
        out._backward = _backward
        
        return out
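The ordering step inside `backward` can be looked at on its own. Below is a minimal sketch of the same depth-first idea, run on a hypothetical graph stored as a plain dictionary of node names instead of `Tensor` objects. The names `topological_order` and `children` are made up for this example.

```python
def topological_order(node, children):
    """Return the nodes so that every node comes after all of its children."""
    order = []
    visited = set()

    def visit(v):
        if v not in visited:
            visited.add(v)
            # Visit every child first, then add the node itself
            for child in children.get(v, ()):
                visit(child)
            order.append(v)

    visit(node)
    return order


# Hypothetical graph for d = (a * b) + a, where c = a * b
children = {"d": ["c", "a"], "c": ["a", "b"]}

order = topological_order("d", children)
backward_order = list(reversed(order))

print(order)           # ['a', 'b', 'c', 'd']
print(backward_order)  # ['d', 'c', 'b', 'a']
```

Walking the reversed list means a node's `_backward` only runs after every node that uses it has already added its share to the gradient. That is why `a`, which is used twice here, comes last.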

# The nn

In [193]:
class Module:
    def zero_grad(self):
        for p in self.parameters():
            p.gradient = 0
    def parameters(self):
        return []

In [194]:
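class Neuron(Module):
    """ Class Neuron inherits from Module """
    
    def __init__(self, nin):
        # nin --> number of inputs of the neuron
        self.w = [Tensor(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Tensor(0)
    
    def __call__(self, x):
        # tanh(w * x + b)
        act = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)
        out = act.tanh()
        return out
    
    def parameters(self):
        # All trainable values of the neuron: the weights and the bias
        return self.w + [self.b]

In [195]:
class Layer(Module):
    def __init__(self, nin, nout):
        # nout neurons, each one receiving the same nin inputs
        self.neurons = [Neuron(nin) for _ in range(nout)]
    
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs
    
    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

In [196]:
class MLP(Module):
    """
        Multi-Layer Perceptron
    """
    def __init__(self, nin, nouts):
        # sz holds the size of the input followed by the size of every layer
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
    
    def __call__(self, x):
        # Feed the output of each layer into the next one
        for layer in self.layers:
            x = layer(x)
        return x
    
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]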
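To see what a single neuron computes without the graph bookkeeping, here is a minimal sketch with plain floats. The function name `neuron_forward` and the weights are made up for this example; the real `Neuron` draws its weights at random.

```python
import math


def neuron_forward(weights, bias, inputs):
    """Weighted sum of the inputs plus the bias, squashed by tanh."""
    act = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(act)


# Hypothetical weights, applied to the same input used in the example further down
weights = [0.5, -0.25, 0.1]
bias = 0.0
x = [2.0, 3.0, -1.0]

out = neuron_forward(weights, bias, x)
print(out)  # always strictly between -1 and 1 because of tanh
```

A `Layer` repeats this once per neuron on the same inputs, and the `MLP` feeds the outputs of one layer into the next.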

# An example

In [197]:
x = [2.0, 3.0, -1.0]
n = MLP(3, [4,4,1])
n(x)

Tensor=(0.07683646136043382, gradient=0.0)

In [198]:
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0]
]
ys = [1.0, -1.0, -1.0, 1.0] # desired targets
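A natural next step is to measure how far the predictions are from these targets. The sketch below uses a sum of squared errors, which is an assumption about the loss (the notebook does not define one), and the predictions in `ypred` are hypothetical numbers, not real output of the network.

```python
ys = [1.0, -1.0, -1.0, 1.0]      # desired targets
ypred = [0.5, -0.5, 0.0, 1.0]    # hypothetical predictions, one per input row

# Sum of squared errors: 0 only when every prediction equals its target
loss = sum((yp - y) ** 2 for yp, y in zip(ypred, ys))
print(loss)  # 1.5
```

With `Tensor` predictions the same expression would build a graph, so calling `backward()` on the loss would fill in the gradient of every parameter.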