# Week 1 — Backprop as a Functor (Micro‑Demo)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sridharmahadevan/Category-Theory-for-AGI-UMass-CMPSCI-692CT/blob/main/notebooks/week01_backprop_as_functor.ipynb)

### Environment (run first)
This pins major versions of a minimal stack (`torch` itself is left unpinned). GPU is **optional**; the notebooks run on CPU.

In [None]:
%%capture
# Core scientific stack + causal / graph tooling
%pip install -q numpy==1.* pandas==2.* matplotlib==3.* networkx==3.* pgmpy==0.1.* graphviz==0.20.*
# Torch CPU by default (Colab often preinstalls a GPU build; this is a safe fallback)
%pip install -q torch --extra-index-url https://download.pytorch.org/whl/cpu

In [None]:
# Install system graphviz only if available (Colab & many Linux envs). Safe to skip elsewhere.
!command -v apt-get >/dev/null && apt-get -y -qq install graphviz || echo "apt-get not available; skipping system graphviz"

In [None]:
import platform
print("Python:", platform.python_version())
try:
    import torch
    print("Torch:", torch.__version__, "| CUDA available?", torch.cuda.is_available())
    device = "cuda" if torch.cuda.is_available() else "cpu"
except Exception as e:  # torch missing or a broken install
    print("Torch import failed:", e)
    print("Later cells need torch; rerun the install cell above.")
    device = "cpu"
device


## Learning goals
- See how a **computation graph** (objects: the spaces that inputs and activations live in; morphisms: parametrized layers) is mapped to **compositional learners**.
- Observe that composition is preserved: learning `g ∘ f` amounts to composing the learners for `f` and `g`, which is the chain rule at work.


## 10‑line micro‑demo

In [None]:

import torch
torch.manual_seed(0)
# Two arrows in Graph: x --f--> h --g--> yhat   (ReLU in the middle)
f = torch.nn.Linear(2, 3, bias=False)
g = torch.nn.Linear(3, 1, bias=False)
relu = torch.nn.ReLU()

x = torch.tensor([[1.0, -1.0]])
y = torch.tensor([[0.5]])

def forward(x):            # (g ∘ ReLU ∘ f)
    return g(relu(f(x)))

opt = torch.optim.SGD(list(f.parameters()) + list(g.parameters()), lr=0.1)

yhat = forward(x)
loss = ((yhat - y)**2).mean()
opt.zero_grad(); loss.backward(); opt.step()

float(loss.detach())



### Key observation
The gradient on `f` is the loss gradient **pulled back** through `g ∘ ReLU`: by the chain rule, it is multiplied by the transposed Jacobians of `g` and `ReLU` on its way back to `f`. That is the hallmark of functoriality: composition in the graph corresponds to composition in the learner category.
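
To make the chain-rule claim concrete, the sketch below recomputes the gradients of the micro-demo by hand and compares them with what autograd stores. It reuses the demo's seed, layers, and data; the names `pre`, `d_yhat`, `d_h`, `d_pre`, `manual_grad_f`, and `manual_grad_g` are introduced here for illustration only.

```python
import torch

torch.manual_seed(0)
f = torch.nn.Linear(2, 3, bias=False)
g = torch.nn.Linear(3, 1, bias=False)

x = torch.tensor([[1.0, -1.0]])
y = torch.tensor([[0.5]])

# Forward pass, keeping every intermediate value.
pre = f(x)               # pre-activation, shape (1, 3)
h = torch.relu(pre)      # hidden activation, shape (1, 3)
yhat = g(h)              # prediction, shape (1, 1)
loss = ((yhat - y) ** 2).mean()
loss.backward()          # autograd fills f.weight.grad and g.weight.grad

with torch.no_grad():
    # Gradient of the mean squared error with respect to the prediction.
    d_yhat = 2 * (yhat - y) / yhat.numel()
    # Pull back through g: multiply by g's weight matrix (transposed Jacobian).
    d_h = d_yhat @ g.weight
    # Pull back through ReLU: zero out units that were inactive.
    d_pre = d_h * (pre > 0).float()
    # Gradients on the two weight matrices.
    manual_grad_f = d_pre.t() @ x     # shape (3, 2), same as f.weight
    manual_grad_g = d_yhat.t() @ h    # shape (1, 3), same as g.weight

f_matches = torch.allclose(manual_grad_f, f.weight.grad, atol=1e-6)
g_matches = torch.allclose(manual_grad_g, g.weight.grad, atol=1e-6)
print("manual grad on f matches autograd:", f_matches)
print("manual grad on g matches autograd:", g_matches)
```

The gradient on `f` only ever sees `g` through `d_h`, which is why the two layers can be treated as separate learners that exchange a single backward message.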


## Worked example: compare composing before vs. after a step

In [None]:

# Sanity check: two identically initialised copies of (g ∘ ReLU ∘ f) take the same joint SGD step.
torch.manual_seed(42)
relu = torch.nn.ReLU()
f1 = torch.nn.Linear(2, 3, bias=False); g1 = torch.nn.Linear(3, 1, bias=False)
f2 = torch.nn.Linear(2, 3, bias=False); g2 = torch.nn.Linear(3, 1, bias=False)
with torch.no_grad():  # copy the weights without going through .data
    f2.weight.copy_(f1.weight); g2.weight.copy_(g1.weight)

x = torch.tensor([[0.2, 0.8]]); y = torch.tensor([[0.7]])
def F(x, f, g): return g(relu(f(x)))

def step_pair(f, g, x, y, lr=0.1):
    """Take one joint SGD step on f and g; return the pre-step loss and the post-step weights."""
    opt = torch.optim.SGD(list(f.parameters())+list(g.parameters()), lr=lr)
    yhat = F(x, f, g); loss = ((yhat - y)**2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item(), f.weight.detach().clone(), g.weight.detach().clone()

# Joint step on the first copy
L_joint, fW_joint, gW_joint = step_pair(f1, g1, x, y)

# Same joint step on the second copy (a stand-in for a truly sequential update)
L_seq, fW_seq, gW_seq = step_pair(f2, g2, x, y)

print("Losses close?", abs(L_joint - L_seq) < 1e-6)
print("L1 gap between f weights:", float((fW_joint - fW_seq).abs().sum()))
print("L1 gap between g weights:", float((gW_joint - gW_seq).abs().sum()))
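
One way to make the sequential view genuinely sequential is sketched below. Here `g` is updated from its own local gradient and hands a backward message to `f`, which then updates from that message alone. The term `request` for the backward message and the names `f_joint`, `g_joint`, `f_seq`, `g_seq`, `h_in`, `f_gap`, and `g_gap` are illustrative choices made for this sketch. It assumes plain SGD with no momentum.

```python
import torch

torch.manual_seed(42)
f_joint = torch.nn.Linear(2, 3, bias=False)
g_joint = torch.nn.Linear(3, 1, bias=False)
f_seq = torch.nn.Linear(2, 3, bias=False)
g_seq = torch.nn.Linear(3, 1, bias=False)
f_seq.load_state_dict(f_joint.state_dict())   # identical starting weights
g_seq.load_state_dict(g_joint.state_dict())

x = torch.tensor([[0.2, 0.8]])
y = torch.tensor([[0.7]])
lr = 0.1

# Joint view: one optimizer step on the composite g ∘ ReLU ∘ f.
opt = torch.optim.SGD(list(f_joint.parameters()) + list(g_joint.parameters()), lr=lr)
loss = ((g_joint(torch.relu(f_joint(x))) - y) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()

# Sequential view: each learner only sees its own input and a backward message.
h = torch.relu(f_seq(x))                      # output of the first learner
h_in = h.detach().requires_grad_(True)        # g treats h as a plain input
loss_g = ((g_seq(h_in) - y) ** 2).mean()
# g computes its own weight gradient and the message ("request") for f.
grad_g, request = torch.autograd.grad(loss_g, [g_seq.weight, h_in])
# f turns the message into a weight gradient without ever seeing g or the loss.
(grad_f,) = torch.autograd.grad(h, f_seq.weight, grad_outputs=request)
with torch.no_grad():
    g_seq.weight -= lr * grad_g
    f_seq.weight -= lr * grad_f

f_gap = (f_joint.weight - f_seq.weight).abs().max().item()
g_gap = (g_joint.weight - g_seq.weight).abs().max().item()
print("max |Δ| in f weights:", f_gap)
print("max |Δ| in g weights:", g_gap)
```

Both gaps should be at floating-point noise level, since the sequential updates are the joint update factored through the chain rule. Note that the message to `f` is computed from `g`'s weights before `g` is updated.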


## Exercises (ungraded, quick)

In [None]:

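# 1) Replace ReLU with Tanh and observe how the chain-rule factor (the activation's derivative) changes the gradient on f.
# 2) Change the loss from MSE to MAE and repeat.
# 3) Add a skip connection h' = ReLU(f(x)) + x and observe how the composition changes.
#    Shapes must match: x has 2 features but f(x) has 3, so resize f or project x first.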
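
A possible starting point for exercises 1 and 2 is sketched below. The helper `grad_on_f` is hypothetical, not part of the notebook. It rebuilds the micro-demo with a swappable activation and loss so the gradients on `f` can be compared side by side, and it uses the same data and seed as the micro-demo.

```python
import torch

def grad_on_f(activation, loss_fn, seed=0):
    """Return the gradient on f's weights for g(activation(f(x))) under loss_fn."""
    torch.manual_seed(seed)                      # same initial weights on every call
    f = torch.nn.Linear(2, 3, bias=False)
    g = torch.nn.Linear(3, 1, bias=False)
    x = torch.tensor([[1.0, -1.0]])
    y = torch.tensor([[0.5]])
    loss = loss_fn(g(activation(f(x))), y)
    loss.backward()
    return f.weight.grad.clone()

mse = torch.nn.functional.mse_loss   # mean squared error
mae = torch.nn.functional.l1_loss    # mean absolute error

results = {
    "relu + mse": grad_on_f(torch.relu, mse),
    "tanh + mse": grad_on_f(torch.tanh, mse),
    "relu + mae": grad_on_f(torch.relu, mae),
    "tanh + mae": grad_on_f(torch.tanh, mae),
}
for name, grad in results.items():
    print(f"{name}: gradient norm on f = {grad.norm().item():.6f}")
```

Because the seed is fixed inside the helper, any difference between rows comes from the activation or the loss, not from the initial weights.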
