# AutoGrad Engine (Scalar, Reverse-Mode)

This implementation builds a minimal reverse-mode automatic differentiation engine from first principles.

Core idea:
- Forward pass builds a computation DAG dynamically
- Backward pass applies the chain rule in reverse topological order

This is NOT symbolic differentiation.
This is NOT numerical differentiation.
This is graph-based reverse-mode autodiff.

## Node initialization

A Node represents a single value in the computation graph.

Each Node stores:
- value: forward computed scalar value
- grad: accumulated gradient d(output)/d(value)
- parents: nodes this value depends on (graph edges)
- backward_fn: local backward function (closure); None for leaf nodes

The computation graph is implicit:
If each node knows its parents, the graph already exists.

In [17]:
# Core Structure
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn
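
As a quick illustration of the implicit graph, the sketch below wires three nodes together by hand (no operations yet) and walks the `parents` links. The helper `ancestors` is hypothetical and not part of the notebook; it is only there to show that the graph can be recovered from `parents` alone.

```python
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn

# Two leaf nodes (no parents) and one node that depends on both.
a = Node(2.0)
b = Node(3.0)
c = Node(a.value + b.value, parents=(a, b))

def ancestors(node):
    """Hypothetical helper: collect every node reachable through `parents`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in current.parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(c.value)             # 5.0
print(len(ancestors(c)))   # 2
```

No separate graph object is needed: starting from `c`, following `parents` reaches both `a` and `b`.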

## Forward Operations (Graph Construction)

Each operation:
- computes a forward value
- creates exactly one new Node
- records dependencies via parents
- attaches local backward logic as a closure

Important:
- Forward pass ONLY builds the graph
- No gradients are computed here

### add(a, b)

Forward:
- value = a.value + b.value
- parents = (a,b)

Backward (local):
- d(out)/d(a) = 1
- d(out)/d(b) = 1

Gradient contribution is accumulated into parent nodes.

In [11]:
# Addition: out = a + b
def add(a, b):
    out = Node(
        value=a.value + b.value,
        parents=(a, b),
    )
    # Local backward: d(out)/d(a) = d(out)/d(b) = 1, accumulated with +=
    def backward():
        a.grad += out.grad * 1
        b.grad += out.grad * 1
    out.backward_fn = backward
    return out


### mul(a, b)

Forward:
- value = a.value * b.value
- parents = (a,b)

Backward (local):
- d(out)/d(a) = b.value
- d(out)/d(b) = a.value

The backward closure captures a and b, so the forward values a.value and b.value are still available when the backward pass runs.

In [12]:
# Multiplication: out = a * b
def mul(a, b):
    out = Node(
        value=a.value * b.value,
        parents=(a, b),
    )
    # Local backward: d(out)/d(a) = b.value, d(out)/d(b) = a.value, accumulated with +=
    def backward():
        a.grad += out.grad * b.value
        b.grad += out.grad * a.value
    out.backward_fn = backward
    return out
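
To see a local backward function in isolation, one can seed `out.grad` by hand and call the closure directly, without any scheduling engine. The sketch below repeats the `Node` and `mul` definitions so that it runs on its own; the input values 3.0 and 4.0 are arbitrary.

```python
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn

def mul(a, b):
    out = Node(value=a.value * b.value, parents=(a, b))
    def backward():
        a.grad += out.grad * b.value
        b.grad += out.grad * a.value
    out.backward_fn = backward
    return out

x = Node(3.0)
y = Node(4.0)
out = mul(x, y)

out.grad = 1        # seed by hand: d(out)/d(out) = 1
out.backward_fn()   # run only this node's local backward step

print(out.value, x.grad, y.grad)   # 12.0 4.0 3.0
```

Each input receives the other input's forward value, which is the product rule applied to a single node.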

## Gradient Accumulation

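Gradients are accumulated (+=), not overwritten. A node that feeds several operations must receive one contribution from each of them, and the sum of those contributions is its gradient.

Calling backward multiple times without resetting gradients will accumulate contributions from every call.

This mirrors real frameworks like PyTorch.
Resetting gradients is the user's responsibility.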
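
A minimal sketch of what resetting could look like, assuming a hypothetical helper `zero_grad(nodes)` that is not part of the notebook. The example calls one local backward function twice to show the accumulation, then resets and runs it once more.

```python
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn

def add(a, b):
    out = Node(value=a.value + b.value, parents=(a, b))
    def backward():
        a.grad += out.grad * 1
        b.grad += out.grad * 1
    out.backward_fn = backward
    return out

def zero_grad(nodes):
    """Hypothetical helper: reset the gradient of every node in `nodes`."""
    for node in nodes:
        node.grad = 0

x = Node(3.0)
out = add(x, x)

# Two backward steps without a reset: contributions pile up.
out.grad = 1
out.backward_fn()
out.backward_fn()
after_two_calls = x.grad
print(after_two_calls)   # 4

# Reset, re-seed, and run once: only the fresh contribution remains.
zero_grad([x, out])
out.grad = 1
out.backward_fn()
print(x.grad)            # 2
```

The caller has to pass every node whose gradient should be cleared; the helper does not discover the graph by itself.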

## post_dfs(node)

Post-order Depth-First Search over the computation DAG.

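Guarantees:
- Each node is visited exactly once
- Parents are appended to order before the node itself
- order is a valid topological ordering (dependencies first)

Why post-order:
The backward pass needs a node's gradient to be complete (every node that consumes it has contributed) before that node propagates gradients to its parents. Walking a post-order listing in reverse gives exactly that order.

Note: visited and order live at module level, so they persist between calls. Unless both are cleared, nodes from an earlier traversal remain in order.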

In [13]:
visited = set()
order = []
def post_dfs(node):
    if node in visited:
        return
    visited.add(node)
    for parent in node.parents:
        post_dfs(parent)
    order.append(node)
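
One way to avoid traversal state carrying over between calls is to keep `visited` and `order` local. The sketch below is a variant under that assumption; the name `topo_order` is hypothetical and not part of the notebook. It is exercised on a small graph in which one node is used twice.

```python
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn

def topo_order(root):
    """Hypothetical variant of post_dfs: returns a fresh post-order list."""
    visited = set()
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in node.parents:
            visit(parent)
        order.append(node)   # appended only after all of its parents

    visit(root)
    return order

# x and y are leaves; s depends on both; z uses s twice.
x = Node(1.0)
y = Node(2.0)
s = Node(x.value + y.value, parents=(x, y))
z = Node(s.value * s.value, parents=(s, s))

first = topo_order(z)
second = topo_order(z)
print([n.value for n in first])   # [1.0, 2.0, 3.0, 9.0]
print(len(first), len(second))    # 4 4
```

Even though `s` appears twice in `z.parents`, it shows up once in the result, and a second call starts from empty state instead of extending the first list.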

## Backward Pass (Core Autograd Engine)

Computes gradients for all nodes contributing to the output.

Steps:
1. Discover the computation graph via post-order DFS
2. This yields a topological order (dependencies first)
3. Seed the output gradient: d(out)/d(out) = 1
4. Execute backward functions in reverse topological order

This separates:
- Scheduling (engine)
- Math (local backward functions)

In [20]:
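# Backward pass
def backward(out_node):
    # Append out_node's graph to the module-level order (dependencies first)
    post_dfs(out_node)
    # Seed: d(out)/d(out) = 1
    out_node.grad = 1
    # Walk from the output back toward the leaves; leaves have no backward_fn
    for node in reversed(order):
        if node.backward_fn:
            node.backward_fn()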
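
The following is a self-contained sketch of the whole engine with the traversal state kept inside `backward`, so that each call only replays the graph of the node it was given. This is an assumed variant, not the notebook's own cell: the nested `visit` function and the manual gradient reset between the two calls are additions for illustration.

```python
class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.grad = 0
        self.parents = parents
        self.backward_fn = backward_fn

def add(a, b):
    out = Node(value=a.value + b.value, parents=(a, b))
    def backward():
        a.grad += out.grad * 1
        b.grad += out.grad * 1
    out.backward_fn = backward
    return out

def mul(a, b):
    out = Node(value=a.value * b.value, parents=(a, b))
    def backward():
        a.grad += out.grad * b.value
        b.grad += out.grad * a.value
    out.backward_fn = backward
    return out

def backward(out_node):
    # Assumed variant: traversal state is local, so nothing leaks between calls.
    visited = set()
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(out_node)
    out_node.grad = 1                 # seed: d(out)/d(out) = 1
    for node in reversed(order):      # output first, leaves last
        if node.backward_fn:
            node.backward_fn()

x = Node(5)
y = Node(4)

z1 = add(x, x)
backward(z1)
print(z1.grad, x.grad, y.grad)   # 1 2 0

# Reset the leaf gradients by hand before differentiating a new expression.
x.grad = 0
y.grad = 0

z2 = mul(add(x, y), add(x, y))
backward(z2)
print(z2.grad, x.grad, y.grad)   # 1 18 18
```

With the reset in place, the second result matches the analytic gradient of (x + y)**2, which is 2 * (x + y) = 18 for both inputs at x = 5, y = 4.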


## Summary

This implementation demonstrates the core of reverse-mode autodiff:

- Dynamic graph construction
- Implicit DAG representation
- Topological sorting via post-order DFS
- Reverse execution of local backward functions

This is the foundational mechanism behind modern deep learning frameworks.


In [22]:
x = Node(5)
y = Node(4)
z = add(x,x)
backward(z)
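# Prints 1 2 0: x feeds both inputs of add, so it receives two contributions.
# y is not part of this graph, so its gradient stays 0.
print(z.grad, x.grad, y.grad)
z = mul(add(x,y),add(x,y))
backward(z)
# Prints 1 22 18. The gradient of (x + y)**2 is 2 * (x + y) = 18 for both x and y.
# x shows 22 instead of 18 for two reasons:
#   - x.grad still held 2 from the first call, because it was never reset
#   - the first graph is still in the module-level order, so its backward
#     function ran a second time and added another 2
print(z.grad, x.grad, y.grad)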

1 2 0
1 22 18
