Backpropagation, short for "backward propagation of errors," is an algorithm for supervised learning of artificial neural networks using gradient descent. Given an artificial neural network and an error function, the method calculates the gradient of the error function with respect to the neural network's weights.
In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
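
As a minimal sketch of the idea, the loop below applies gradient descent to a one-variable function. The function name `gradient_descent`, the learning rate, the step count, and the example function f(x) = (x - 3)^2 are illustrative assumptions and do not come from micrograd.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # move a small distance in the direction that decreases the function
        x = x - learning_rate * grad(x)
    return x

# Assumed example: f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # very close to 3
```

In a neural network, x is replaced by the full set of weights and grad by the gradient that backpropagation computes.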

The derivative of a with respect to b is written da/db. It is the rate of change of a with respect to b: if a = f(b), then

da/db = lim (h -> 0) [f(b + h) - f(b)] / h

That is, da/db measures how much a changes for a small change in b. In practice it is computed with the rules of calculus, such as the chain rule and the product rule, depending on the functions involved; backpropagation is the chain rule applied repeatedly, from the output of the network back to its weights.
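
The rate-of-change reading of da/db can be checked numerically by nudging b and watching how a responds. The sketch below uses a central difference; the helper name `numerical_derivative`, the step size `h`, and the example function are assumptions chosen for illustration.

```python
def numerical_derivative(f, b, h=1e-6):
    """Central-difference estimate of da/db for a = f(b)."""
    return (f(b + h) - f(b - h)) / (2 * h)

# Assumed example function; by the rules of calculus its derivative is 6*b - 4
f = lambda b: 3 * b**2 - 4 * b + 5

slope = numerical_derivative(f, 3.0)
print(slope)  # close to 6*3 - 4 = 14
```

A numerical estimate like this is a handy check on gradients that were derived by hand or computed by backpropagation.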

An implementation of Andrej Karpathy's micrograd from scratch: https://www.youtube.com/watch?v=VMj-3S1tku0&list=WL&index=15&t=12s

In [1]:
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
class Value:
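    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self.grad = 0.0              # d(output)/d(self); stays 0.0 until backpropagation fills it in
        self._prev = set(_children)  # the Values this one was computed from
        self._op = _op               # the operation that produced this Value ('' for a leaf)
        self.label = label           # optional name, shown by draw_dot
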
    # built in string representation of an object
    def __repr__(self):
        return f"Value(data={self.data})"
    # override add
    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        return out
    # override multiply
    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        return out
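    # override subtract (Python's hook for - is __sub__)
    def __sub__(self, other):
        out = Value(self.data - other.data, (self, other), '-')
        return out
    # override divide (Python's hook for / is __truediv__)
    def __truediv__(self, other):
        out = Value(self.data / other.data, (self, other), '/')
        return out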
    
a = Value(2)
b = Value(-3)
c = Value(10.0)
d = a * b + c
d

Value(data=4.0)

In [3]:
d._prev

{Value(data=-6), Value(data=10.0)}
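
The graph recorded in `_prev` is what backpropagation walks. As a sketch of what that walk computes for d = a * b + c, the chain rule is applied by hand below with plain floats and checked against a finite difference. The intermediate name `e` and the helper `forward` are assumptions introduced for illustration; they are not part of the `Value` class.

```python
a, b, c = 2.0, -3.0, 10.0

def forward(a, b, c):
    e = a * b      # intermediate node, the Value(data=-6) seen in d._prev
    return e + c   # d

# Local derivatives of each operation
dd_de = 1.0   # d = e + c, so a change in e passes straight through to d
dd_dc = 1.0   # likewise for c
de_da = b     # e = a * b, so de/da = b
de_db = a     # and de/db = a

# Chain rule: multiply local derivatives along the path from d back to each input
dd_da = dd_de * de_da   # -3.0
dd_db = dd_de * de_db   # 2.0

# Finite-difference check of dd/da
h = 1e-6
num_da = (forward(a + h, b, c) - forward(a, b, c)) / h
print(dd_da, dd_db, dd_dc, num_da)
```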

In [4]:
from graphviz import Digraph

def trace(root):
  # builds a set of all nodes and edges in a graph
  nodes, edges = set(), set()
  def build(v):
    if v not in nodes:
      nodes.add(v)
      for child in v._prev:
        edges.add((child, v))
        build(child)
  build(root)
  return nodes, edges

def draw_dot(root):
  dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'}) # LR = left to right
  
  nodes, edges = trace(root)
  for n in nodes:
    uid = str(id(n))
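    # for any value in the graph, create a rectangular ('record') node for it
    # label and grad are read defensively, in case a Value was built without them
    label, grad = getattr(n, 'label', ''), getattr(n, 'grad', 0.0)
    dot.node(name = uid, label = "{ %s | data %.4f | grad %.4f }" % (label, n.data, grad), shape='record')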
    if n._op:
      # if this value is a result of some operation, create an op node for it
      dot.node(name = uid + n._op, label = n._op)
      # and connect this node to it
      dot.edge(uid + n._op, uid)

  for n1, n2 in edges:
    # connect n1 to the op node of n2
    dot.edge(str(id(n1)), str(id(n2)) + n2._op)

  return dot