In [1]:
! pip install streamlit torch micrograd

Collecting micrograd
  Downloading micrograd-0.1.0-py3-none-any.whl (4.9 kB)
Installing collected packages: micrograd
Successfully installed micrograd-0.1.0


If you have studied Deep Learning or are interested in the subject, you have probably come across the terms backpropagation and gradient descent. If you have been following the field, you have probably also heard more mathematical terms such as matrix multiplication and the Chain Rule. We have all heard about them at some point in our journey, but do we understand how they are used in Neural Networks?

In this short introduction to backpropagation for Deep Learning applications, we will mostly follow Andrej Karpathy's video on the topic, adding a few comments here and there that give my own perspective or clarify something I didn't understand when going through his material. To be clear, everyone looking at this dashboard should go and check out his content: he is both the developer of the micrograd library and the person who gave me the idea to build this dashboard.

In [2]:
import streamlit
import numpy as np
import matplotlib.pyplot as plt
from micrograd.engine import Value

In [3]:
a = Value(2.0)
b = Value(-4.0)
c = Value(7)

a*b + c

Value(data=-1.0, grad=0)
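As a hand check on the expression above, the partial derivatives of a*b + c can be estimated by nudging each input by a small amount. This is a plain-Python sketch that does not use micrograd; the helper name `partial` and the step size `h` are assumptions made for illustration.

```python
def expression(a: float, b: float, c: float) -> float:
    """Same expression as the cell above, evaluated on plain floats."""
    return a * b + c


def partial(func, args, index, h=1e-6):
    """Forward-difference estimate of d(func)/d(args[index])."""
    nudged = list(args)
    nudged[index] += h  # nudge only the input we differentiate against
    return (func(*nudged) - func(*args)) / h


args = (2.0, -4.0, 7.0)
grads = [partial(expression, args, i) for i in range(3)]

# Analytically: d/da = b = -4, d/db = a = 2, d/dc = 1
print(grads)
```

The estimates should land close to -4, 2 and 1, which are the values the Chain Rule gives for this expression.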

In [11]:
def base_function(x: float) -> float:
    return 3 * x**2 + 6*x - 1

In [12]:
base_function(1), base_function(0), base_function(-3)

(8, -1, 8)

In [13]:
def derivative_base_function(x: float) -> float:
    return 6 * x + 6

In [15]:
derivative_base_function(1), derivative_base_function(0), derivative_base_function(-3)

(12, 6, -12)

Can't we approximate the same thing numerically?

In [28]:
x = 1

base_function(x)

8

In [29]:
h = 0.00001

base_function(x + h)

8.0001200003

In [32]:
derivative = (base_function(x + h) - base_function(x))/h

print(f'The value of the derivative at {x} is:', derivative)

The value of the derivative at 1 is: 12.000030000081095
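The estimate above is slightly off from the exact value of 12 because it only looks to one side of x. A common alternative is the central difference, which looks at both sides. The sketch below compares the two; the helper names `forward_difference` and `central_difference` are introduced here for illustration and are not part of the original notebook.

```python
def base_function(x: float) -> float:
    return 3 * x**2 + 6 * x - 1


def forward_difference(func, x, h):
    """One-sided estimate: (f(x + h) - f(x)) / h."""
    return (func(x + h) - func(x)) / h


def central_difference(func, x, h):
    """Two-sided estimate: (f(x + h) - f(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x, h = 1, 0.00001
forward = forward_difference(base_function, x, h)
central = central_difference(base_function, x, h)

print("forward:", forward)
print("central:", central)
```

For a quadratic like this one, the central difference cancels the error term exactly, so any remaining difference from 12 comes from floating-point rounding.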


In [6]:
x_values = np.array([value for value in range(-50, 60, 10)])
y_values = 3 * x_values ** 2 - 2 * x_values - 1

In [7]:
x_values

array([-50, -40, -30, -20, -10,   0,  10,  20,  30,  40,  50])

In [8]:
y_values

array([7599, 4879, 2759, 1239,  319,   -1,  279, 1159, 2639, 4719, 7399])

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

This equation is read as follows: the derivative of the function f with respect to the variable x is the limit, as "h" tends to zero, of the difference between the function evaluated at x + h and the function evaluated at x, divided by the increment "h".
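To see the "when h tends to zero" part in action, the ratio can be evaluated for a sequence of shrinking increments. This is a small sketch that reuses the `base_function` and `derivative_base_function` definitions from earlier cells; the list of step sizes is an arbitrary choice made for illustration.

```python
def base_function(x: float) -> float:
    return 3 * x**2 + 6 * x - 1


def derivative_base_function(x: float) -> float:
    return 6 * x + 6


x = 1
exact = derivative_base_function(x)
steps = [10 ** -k for k in range(1, 6)]  # 0.1, 0.01, ..., 0.00001

errors = []
for h in steps:
    estimate = (base_function(x + h) - base_function(x)) / h
    errors.append(abs(estimate - exact))
    print(f"h={h:<8g} estimate={estimate:.8f} error={errors[-1]:.2e}")
```

For this function the ratio works out to 6x + 6 + 3h, so the error shrinks in proportion to h. With much smaller increments, floating-point rounding would eventually start to dominate.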

In [3]:
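x = 1
h = 0.1

# Note: `float or np.array` evaluates to just `float`, so the hint is spelled out as a string instead
def test_parabola(x: "float | np.ndarray") -> "float | np.ndarray":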
    """Function that builds a test parabola of X**2 + 1

    :param x: Input to pass to the function. Can be a scalar or a numpy array.
    :type x: float or np.array
    :return: The output of squaring the input and adding 1.
    :rtype: float or np.array
    """
    return x**2 + 1

x_values = np.arange(-2, 2.25, 0.25)
y_values_parabola = test_parabola(x_values)

data_parabola = [[x_val, y_val] for x_val, y_val in zip(x_values, y_values_parabola)]

inputs = np.array([x-10, x, x+h, x+10])
output = test_parabola(inputs)

data_derivative = [[x_val, y_val] for x_val, y_val in zip(inputs, output)]

In [4]:
data_parabola

[[-2.0, 5.0],
 [-1.75, 4.0625],
 [-1.5, 3.25],
 [-1.25, 2.5625],
 [-1.0, 2.0],
 [-0.75, 1.5625],
 [-0.5, 1.25],
 [-0.25, 1.0625],
 [0.0, 1.0],
 [0.25, 1.0625],
 [0.5, 1.25],
 [0.75, 1.5625],
 [1.0, 2.0],
 [1.25, 2.5625],
 [1.5, 3.25],
 [1.75, 4.0625],
 [2.0, 5.0]]

In [5]:
data_derivative

[[-9.0, 82.0], [1.0, 2.0], [1.1, 2.21], [11.0, 122.0]]

Notice that this definition does not depend on the particular function we use. As long as the limit exists, we say the function has a derivative at that point. Mathematically, we would find the derivative by evaluating the limit in the definition at the point of interest (x). The problem is that evaluating that limit is not always trivial. Let's work through a simple example to see why.
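For instance, taking the parabola f(x) = x² + 1 used in the cells above, the limit can be worked out by hand. This case is chosen here because the algebra is short; it is only an illustration, and other functions may need considerably more work.

$$
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{\left((x + h)^2 + 1\right) - \left(x^2 + 1\right)}{h} \\
      &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
      &= \lim_{h \to 0} \left(2x + h\right) \\
      &= 2x
\end{aligned}
$$

At x = 1 this gives f'(1) = 2. Even here we had to expand the square and cancel h before taking the limit, since substituting h = 0 directly would give 0/0.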

In [2]:
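# The original call omitted the input; 'hola que tal' is assumed from the output and the next cell
''.join(filter(str.isalnum, 'hola que tal'))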

'holaquetal'

In [4]:
test = 'hola que tal'

interm = test.split(' ')

'-'.join([''.join(filter(str.isalnum, word)) for word in interm])

'hola-que-tal'